
ReferralAgent Pipeline Architecture

Healthcare Referral Analytics & Optimization System

v3.1 | January 2026
0. Visit Data & Compliance
dagster job execute -f repository.py -j neo4j_phase_b_full
CRM Data Load
Load company records from LACRM into Neo4j graph database
Entities: 440 CRM companies
Source: LACRMProject v4
Index: Integer CRMIndex assigned
Visit History
Import visit records and assign scheduled_clinic to each visit
Clinics: Addicks, Cinco Ranch, Harvest Green
Categories: CULTIVATE, ACCELERATE, MAINTAIN
Tracking: Visit counts per practice
Schedule Parsing
Parse field rep schedules from email attachments
Source: Microsoft Graph API
Format: Excel attachments
Output: Weekly visit schedules
Compliance Analytics
Calculate distance compliance for each clinic
Thresholds: 5, 10, 15, 20 miles
Output: Compliance % by clinic
Trend: Weekly tracking
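One plausible reading of distance compliance is the share of visits that fall within each threshold distance of the scheduled clinic. The sketch below assumes that definition; the function name and the sample distances are hypothetical and not taken from the pipeline.

```python
from typing import Dict, Iterable, Sequence

THRESHOLDS_MILES = (5, 10, 15, 20)  # thresholds listed in the card above


def compliance_by_threshold(
    distances_miles: Iterable[float],
    thresholds: Sequence[float] = THRESHOLDS_MILES,
) -> Dict[float, float]:
    """Return the percentage of visits within each distance threshold."""
    distances = list(distances_miles)
    if not distances:
        return {t: 0.0 for t in thresholds}
    return {
        t: 100.0 * sum(d <= t for d in distances) / len(distances)
        for t in thresholds
    }


# Hypothetical visit-to-clinic distances (miles) for a single clinic.
sample_distances = [2.0, 4.5, 7.0, 12.0, 18.0, 25.0, 9.9, 3.1]
compliance = compliance_by_threshold(sample_distances)
```

Running the same function once per clinic and once per week would give the per-clinic percentages and the weekly trend described above.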
1. ML Matching Pipeline
dagster job execute -f repository.py -j neo4j_comparison_pipeline
Referral Ingestion
Fetch referrals from Google Sheets and aggregate to unique sources
Raw Referrals: 625+
Unique Sources: ~237
Years: 2024-2025
Data Enrichment
NPI lookup, geocoding, specialty extraction
NPI Rate: 86%+ match
Geocoding: Google Maps API
Free Sources: NPPES, CMS PECOS
Embeddings & Top-K
Generate semantic embeddings and filter candidate pairs
Model: text-embedding-3-large
Dimensions: 1536D vectors
Reduction: 95%+ pair filtering
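Top-K filtering can be illustrated with plain cosine similarity: for each referral source, keep only the k most similar CRM companies and discard the remaining pairs before the matching model runs. This is a dependency-free sketch with toy 3-D vectors; the function names, the toy data, and the value of k are assumptions, since the pipeline's actual k and similarity measure are not stated.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_candidates(
    referral_vecs: Dict[str, Vector],
    crm_vecs: Dict[str, Vector],
    k: int,
) -> Dict[str, List[Tuple[str, float]]]:
    """For each referral source, keep the k most similar CRM companies."""
    return {
        ref_id: heapq.nlargest(
            k,
            ((crm_id, cosine(ref_vec, crm_vec)) for crm_id, crm_vec in crm_vecs.items()),
            key=lambda pair: pair[1],
        )
        for ref_id, ref_vec in referral_vecs.items()
    }


# Toy embeddings (hypothetical); real vectors would come from the embedding model.
referrals = {"ref_a": [1.0, 0.0, 0.0]}
crm = {
    "crm_1": [0.9, 0.1, 0.0],
    "crm_2": [0.0, 1.0, 0.0],
    "crm_3": [0.7, 0.7, 0.0],
}
top = top_k_candidates(referrals, crm, k=2)
```

As a rough check on the reduction figure: with about 237 sources and 440 CRM companies there are roughly 104,000 possible pairs, and an assumed k = 20 would keep 20 of 440 candidates per source, discarding about 95% of them.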
CatBoost Matching
ML model predicts referral→CRM matches
Features: 40+ engineered
Threshold: 0.25
Match Rate: 88%+
2. Future Doctors Intelligence
dagster job execute -f repository.py -j neo4j_future_doctors
Provider Discovery
Identify unvisited providers in service area
Sources: NPI Registry, CMS
Radius: 15 miles per clinic
Filter: PT-referring specialties
Opportunity Scoring
Rank prospects by conversion potential
Factors: Volume, specialty, distance
Output: High/Medium/Low tiers
Update: Weekly refresh
Geographic Assignment
Assign each prospect to nearest FYZICAL clinic
Method: Haversine distance
Tag: nearest_clinic property
Output: FutureDoctor nodes in Neo4j
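The nearest-clinic assignment can be sketched with the standard Haversine great-circle formula. The clinic coordinates below are rough placeholders for illustration only, not the clinics' actual locations, and the function names are hypothetical.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_clinic(
    lat: float, lon: float, clinics: Dict[str, Tuple[float, float]]
) -> Tuple[str, float]:
    """Return (clinic name, distance in miles) for the closest clinic."""
    return min(
        ((name, haversine_miles(lat, lon, clat, clon)) for name, (clat, clon) in clinics.items()),
        key=lambda pair: pair[1],
    )


# Placeholder coordinates (assumed, approximate); replace with real clinic locations.
CLINICS = {
    "Addicks": (29.78, -95.64),
    "Cinco Ranch": (29.74, -95.76),
    "Harvest Green": (29.62, -95.72),
}

# Hypothetical prospect location.
clinic_name, miles = nearest_clinic(29.73, -95.75, CLINICS)
```

The returned name is what would be stored in the `nearest_clinic` property, and the distance can be compared against the 15-mile discovery radius.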
3. ML Forecasting (5 Models)
dagster job execute -f repository.py -j run_ec2_forecasting
TFT (Temporal Fusion Transformer)
Attention-based deep learning model for multi-horizon forecasting
Architecture: Multi-head attention + LSTM
Covariates: Visits, category, clinic, time
Output: Conversion probability per practice
Key Use: Powers CP-SAT optimizer scoring
LSTM Neural Network
Deep learning time series forecasting
Architecture: 2-layer, 128 hidden units
Features: Visits + Referrals (daily data)
Uncertainty: Monte-Carlo dropout (20 samples)
Training: ~365 days history
Prophet
Meta's forecasting library with seasonality detection
Seasonality: Weekly + Yearly patterns
Tuning: Grid search cross-validation
Confidence: 80% prediction intervals
Darts Ensemble
Auto-select best statistical model via MAPE
Models: ExponentialSmoothing, Theta, ARIMA
Selection: 5-fold cross-validation
Fallback: Naive if others fail
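The select-by-MAPE-with-fallback logic can be sketched without any forecasting library. The candidates below are simple stand-in functions (mean and drift), not the Darts models named above, and the expanding-window backtest is an assumed interpretation of the 5-fold cross-validation; all names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Forecaster = Callable[[Sequence[float], int], List[float]]


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, skipping zero actuals."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def naive_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def mean_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Stand-in candidate: repeat the historical mean."""
    return [sum(history) / len(history)] * horizon


def drift_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Stand-in candidate: extend the line through the first and last points."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]


def backtest_mape(model: Forecaster, series: Sequence[float], folds: int, horizon: int) -> float:
    """Average MAPE over expanding-window folds taken from the end of the series."""
    scores = []
    for i in range(folds, 0, -1):
        cut = len(series) - i * horizon
        if cut < 2:
            continue  # not enough history for this fold
        scores.append(mape(series[cut:cut + horizon], model(series[:cut], horizon)))
    if not scores:
        raise ValueError("series too short to backtest")
    return sum(scores) / len(scores)


def select_model(
    series: Sequence[float],
    candidates: Dict[str, Forecaster],
    folds: int = 5,
    horizon: int = 1,
) -> Tuple[str, Forecaster]:
    """Pick the candidate with the lowest backtest MAPE; fall back to naive."""
    scores: Dict[str, float] = {}
    for name, model in candidates.items():
        try:
            scores[name] = backtest_mape(model, series, folds, horizon)
        except Exception:
            continue  # a failing candidate is dropped from the comparison
    if not scores:
        return "naive", naive_forecast
    best = min(scores, key=lambda name: scores[name])
    return best, candidates[best]


# Hypothetical weekly referral counts with a steady upward trend.
weekly = [10, 12, 14, 16, 18, 20, 22, 24]
best_name, best_model = select_model(weekly, {"mean": mean_forecast, "drift": drift_forecast})
fallback_name, _ = select_model(weekly, {"broken": lambda history, horizon: 1 / 0})
```

On this perfectly linear toy series the drift candidate wins, and a candidate that raises is skipped so that the naive forecast is used when nothing else succeeds.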
Per-Clinic + Aggregate
All 5 models run for each clinic + combined
Clinics: Addicks, Cinco Ranch, Harvest Green, All (combined)
Horizon: 12 weeks
Total Models: 5 × 4 = 20 forecasts
4. CP-SAT Optimizer & Two-List System
python optimal_visit_optimizer.py --two-list
Two-Stage Optimization
Stage 1: Geographic clustering, Stage 2: CP-SAT selection
Cluster Radius: 3 miles (same-stop: 0.5 mi)
Solver: Google OR-Tools CP-SAT
Decision Var: x[c] ∈ {0,1} per cluster
Objective: Max expected referral value
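The Stage 2 model (one binary variable per cluster, maximize expected referral value under a mileage limit) can be illustrated on a tiny instance. To stay dependency-free, this sketch enumerates all 0/1 assignments by brute force rather than calling OR-Tools CP-SAT, so it only demonstrates the model, not the solver. The cluster data, field names, and the single mileage constraint are assumptions.

```python
from itertools import product
from typing import List, NamedTuple, Sequence, Tuple


class Cluster(NamedTuple):
    name: str
    expected_value: float  # expected referral value of visiting the cluster
    miles: float           # assumed driving cost of including the cluster


def select_clusters(
    clusters: Sequence[Cluster], budget_miles: float
) -> Tuple[List[str], float]:
    """Choose x[c] in {0, 1} maximizing total expected value within the budget."""
    best_value = 0.0
    best_choice: Tuple[int, ...] = tuple(0 for _ in clusters)
    for x in product((0, 1), repeat=len(clusters)):
        miles = sum(xi * c.miles for xi, c in zip(x, clusters))
        if miles > budget_miles:
            continue  # violates the driving budget
        value = sum(xi * c.expected_value for xi, c in zip(x, clusters))
        if value > best_value:
            best_value, best_choice = value, x
    chosen = [c.name for xi, c in zip(best_choice, clusters) if xi]
    return chosen, best_value


# Hypothetical clusters for a single day with a 25-mile limit.
day_clusters = [
    Cluster("A", 8.0, 10.0),
    Cluster("B", 5.0, 8.0),
    Cluster("C", 6.0, 9.0),
    Cluster("D", 3.0, 4.0),
]
chosen, total_value = select_clusters(day_clusters, budget_miles=25.0)
```

Brute force is exponential in the number of clusters, which is why a constraint solver such as CP-SAT is the practical choice once category and rotation constraints are added.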
Driving Budget Constraints
Realistic physical visit limits based on driving
Weekly Budget: 125 miles total
Daily Limit: 25 miles/day × 5 days
Physical Visits: ~30/week (10 per clinic)
Per-Clinic: Addicks 75mi, Cinco 65mi, Harvest 60mi
Two-List Output
Separates achievable visits from phone outreach
Physical: ACCELERATE/REACTIVATE/CULTIVATE (in budget)
Phone: MAINTAIN + overflow practices
Piggyback: MAINTAIN near route → physical
Files: physical_visit_schedule.csv, phone_outreach_list.csv
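The split rules above can be sketched as a single pass over practices that are already ranked by priority. This is a simplified illustration: the `near_route` flag, the per-practice `added_miles` cost, and the assumption that piggyback visits add no mileage are all hypothetical simplifications, not the optimizer's actual logic.

```python
from typing import List, NamedTuple, Sequence, Tuple

PHYSICAL_CATEGORIES = {"ACCELERATE", "REACTIVATE", "CULTIVATE"}


class Practice(NamedTuple):
    name: str
    category: str
    added_miles: float       # assumed extra driving needed to visit
    near_route: bool = False  # assumed flag: lies close to an existing stop


def split_two_lists(
    ranked: Sequence[Practice], budget_miles: float
) -> Tuple[List[str], List[str]]:
    """Split ranked practices into (physical visits, phone outreach)."""
    physical: List[str] = []
    phone: List[str] = []
    remaining = budget_miles
    for p in ranked:
        if p.category == "MAINTAIN":
            # Piggyback: MAINTAIN practices near the route become physical visits.
            (physical if p.near_route else phone).append(p.name)
        elif p.category in PHYSICAL_CATEGORIES and p.added_miles <= remaining:
            physical.append(p.name)
            remaining -= p.added_miles
        else:
            phone.append(p.name)  # overflow beyond the driving budget
    return physical, phone


# Hypothetical ranked practices and a 25-mile budget.
ranked_practices = [
    Practice("P1", "ACCELERATE", 10.0),
    Practice("P2", "MAINTAIN", 0.0, near_route=True),
    Practice("P3", "CULTIVATE", 12.0),
    Practice("P4", "REACTIVATE", 8.0),
    Practice("P5", "MAINTAIN", 0.0),
]
physical_list, phone_list = split_two_lists(ranked_practices, budget_miles=25.0)
```

Each list could then be written out with `csv.writer` to the two files named above.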
Category Constraints
Balance priority categories in schedule
ACCELERATE: ≥15% (high ROI, close to magic 7)
REACTIVATE: ≥5% (prevent relationship decay)
CULTIVATE: ≥10% (pipeline building)
MAINTAIN: ≤35% (phone calls OK)
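A schedule can be checked against these category shares after the fact. The sketch below assumes the percentages apply to the share of visits in a schedule; the function names and sample schedules are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

MIN_SHARE = {"ACCELERATE": 0.15, "REACTIVATE": 0.05, "CULTIVATE": 0.10}
MAX_SHARE = {"MAINTAIN": 0.35}


def category_shares(schedule: Sequence[str]) -> Dict[str, float]:
    """Fraction of scheduled visits in each category."""
    total = len(schedule)
    if total == 0:
        return {}
    return {cat: count / total for cat, count in Counter(schedule).items()}


def violated_categories(schedule: Sequence[str]) -> List[str]:
    """Return the categories whose minimum or maximum share is violated."""
    shares = category_shares(schedule)
    violations = [c for c, floor in MIN_SHARE.items() if shares.get(c, 0.0) < floor]
    violations += [c for c, cap in MAX_SHARE.items() if shares.get(c, 0.0) > cap]
    return violations


# Hypothetical 20-visit schedules, one category label per visit.
too_much_maintain = (
    ["ACCELERATE"] * 4 + ["REACTIVATE"] * 2 + ["CULTIVATE"] * 6 + ["MAINTAIN"] * 8
)
balanced = (
    ["ACCELERATE"] * 4 + ["REACTIVATE"] * 2 + ["CULTIVATE"] * 8 + ["MAINTAIN"] * 6
)
problems = violated_categories(too_much_maintain)
```

Inside a solver the same bounds would be expressed as linear constraints on the selected visits rather than checked afterwards.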
Coverage Rotation (12-Week)
Minimum cooldown between visits to same practice
ACCELERATE: 2-week gap → 6 visits in 12 wks
CULTIVATE: 2-week gap → 6 visits (2×/month)
MAINTAIN: 10-week gap → 1 visit (quarterly)
Goal: Reach magic 7 in 3-4 months
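The visit counts above are consistent with dividing the 12-week horizon by the cooldown gap and rounding down. A minimal sketch under that assumption follows; the eligibility helper and its week-number convention are hypothetical, and no gap is listed for REACTIVATE, so it is left out.

```python
from typing import Optional

GAP_WEEKS = {"ACCELERATE": 2, "CULTIVATE": 2, "MAINTAIN": 10}  # gaps listed above
HORIZON_WEEKS = 12


def max_visits(category: str, horizon_weeks: int = HORIZON_WEEKS) -> int:
    """Upper bound on visits to one practice within the horizon."""
    return horizon_weeks // GAP_WEEKS[category]


def is_eligible(category: str, last_visit_week: Optional[int], current_week: int) -> bool:
    """A practice can be visited once its cooldown gap has elapsed."""
    if last_visit_week is None:
        return True  # never visited, so no cooldown applies
    return current_week - last_visit_week >= GAP_WEEKS[category]


visit_caps = {category: max_visits(category) for category in GAP_WEEKS}
```

Here `visit_caps` reproduces the 6, 6, and 1 visits shown for the three categories.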
Expected Value Formula
Per-practice scoring using TFT + volume proxies
EV = volume × conv_prob × specialty_rate
     × category_weight × proximity_bonus
Proximity: +50% if 1-3 visits from magic 7
Underutilized Org: +15-50% boost
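The formula translates directly into a small scoring function. The sketch assumes the proximity bonus is a 1.5 multiplier when a practice needs 1 to 3 more visits to reach seven, and models the underutilized-organization boost as an optional extra multiplier; the parameter names and sample inputs are hypothetical.

```python
MAGIC_VISITS = 7  # the "magic 7" visit target


def proximity_bonus(visits_so_far: int) -> float:
    """1.5 when the practice is 1 to 3 visits short of the target, else 1.0."""
    remaining = MAGIC_VISITS - visits_so_far
    return 1.5 if 1 <= remaining <= 3 else 1.0


def expected_value(
    volume: float,
    conv_prob: float,
    specialty_rate: float,
    category_weight: float,
    visits_so_far: int,
    org_boost: float = 1.0,  # assumed range 1.15 to 1.50 for underutilized orgs
) -> float:
    """EV = volume x conv_prob x specialty_rate x category_weight x proximity_bonus."""
    return (
        volume
        * conv_prob
        * specialty_rate
        * category_weight
        * proximity_bonus(visits_so_far)
        * org_boost
    )


# Hypothetical practice: five visits so far, so two short of seven.
ev = expected_value(
    volume=40.0, conv_prob=0.5, specialty_rate=0.5, category_weight=1.2, visits_so_far=5
)
```

In the pipeline described above, `conv_prob` would come from the TFT output, while the other inputs here are placeholders.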
5. Closed-Loop Forecasting
dagster job execute -f repository.py -j neo4j_forecast_lift
TFT Re-run with Schedule
Re-run TFT using CP-SAT schedule as covariates
Scenario: cpsat_optimized
Input: cpsat_schedule_export.csv
Output: What-if referral forecasts
Forecast Lift Analysis
Compare baseline vs optimized scenarios
Metrics: Lift %, ROI, CI bands
Breakdown: By clinic & category
Toggle: 90% confidence intervals
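Lift is conventionally the relative change of the optimized forecast over the baseline. The sketch below assumes that definition and uses made-up per-clinic totals; ROI and confidence bands are not covered, and the function names are hypothetical.

```python
from typing import Dict


def lift_pct(baseline: float, optimized: float) -> float:
    """Percentage lift of the optimized forecast over the baseline."""
    if baseline == 0:
        raise ValueError("lift is undefined for a zero baseline")
    return 100.0 * (optimized - baseline) / baseline


def lift_by_group(baseline: Dict[str, float], optimized: Dict[str, float]) -> Dict[str, float]:
    """Lift per group (for example clinic or category) present in both scenarios."""
    return {
        key: lift_pct(baseline[key], optimized[key])
        for key in baseline
        if key in optimized
    }


# Hypothetical 12-week referral forecasts per clinic.
baseline_forecast = {"Addicks": 40.0, "Cinco Ranch": 50.0, "Harvest Green": 30.0}
optimized_forecast = {"Addicks": 46.0, "Cinco Ranch": 55.0, "Harvest Green": 33.0}
lift = lift_by_group(baseline_forecast, optimized_forecast)
```

The same function applies to a breakdown by category by keying the dictionaries on category instead of clinic.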
6. Comprehensive Reports
dagster job execute -f repository.py -j neo4j_comprehensive_reports
Executive Summary
Key metrics, trends, and recommendations
Sections: 7 executive views
Charts: Funnel, KPI cards
Audience: Leadership
Visits Analysis
Priority timeline, weekly breakdowns
Charts: Stacked bar, timeline
Metrics: By category & clinic
Trend: 52-week history
Forecasting & ML
Interactive model comparison charts
Traditional: LSTM, Prophet, Darts
Optimization: Current, Optimal, TFT Optimized
Toggles: All / Traditional / Optimization
Interactive Map
Geographic visualization layers
Layers: Future Docs, Priority, Sources
Radius: 5/10/15 mile circles
Export: Standalone HTML

Pipeline Performance

Match Rate: 88%
F1 Score: 1.0
Referrals: 625+
Clinics: 3
Forecast: 12 weeks
Visits/Week: 85