ReferralAgent Pipeline Architecture

0

Visit Data & Compliance

dagster job execute -f repository.py -j neo4j_phase_b_full

🏢

CRM Data Load

Load company records from LACRM into Neo4j graph database

Entities: 440 CRM companies
Source: LACRMProject v4
Index: Integer CRMIndex assigned

📅

Visit History

Import visit records and assign scheduled_clinic to each visit

Clinics: Addicks, Cinco Ranch, Harvest Green
Categories: CULTIVATE, ACCELERATE, MAINTAIN
Tracking: Visit counts per practice

📊

Schedule Parsing

Parse field rep schedules from email attachments

Source: Microsoft Graph API
Format: Excel attachments
Output: Weekly visit schedules

✅

Compliance Analytics

Calculate distance compliance for each clinic

Thresholds: 5, 10, 15, 20 miles
Output: Compliance % by clinic
Trend: Weekly tracking
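
The threshold calculation can be sketched as below. This is a minimal illustration, assuming "compliance" means the share of visits whose distance to the scheduled clinic is at or within each threshold; the function name and the sample distances are hypothetical and not taken from the pipeline.

```python
from typing import Dict, Iterable

THRESHOLDS_MILES = (5, 10, 15, 20)  # thresholds listed on the card


def compliance_by_threshold(distances_miles: Iterable[float]) -> Dict[int, float]:
    """Return the percentage of visits at or within each distance threshold."""
    distances = list(distances_miles)
    if not distances:
        # No visits recorded: report 0% rather than dividing by zero.
        return {t: 0.0 for t in THRESHOLDS_MILES}
    return {
        t: 100.0 * sum(d <= t for d in distances) / len(distances)
        for t in THRESHOLDS_MILES
    }


# Hypothetical visit-to-clinic distances (miles) for one clinic in one week.
sample_distances = [2.0, 4.5, 7.0, 12.0, 18.0, 25.0, 9.5, 3.0]
print(compliance_by_threshold(sample_distances))
```

Running the same function once per clinic and per week would give the "Compliance % by clinic" and weekly trend outputs.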

1

ML Matching Pipeline

dagster job execute -f repository.py -j neo4j_comparison_pipeline

📥

Referral Ingestion

Fetch referrals from Google Sheets and aggregate to unique sources

Raw Referrals: 625+
Unique Sources: ~237
Years: 2024-2025

🔍

Data Enrichment

NPI lookup, geocoding, specialty extraction

NPI Rate: 86%+ match
Geocoding: Google Maps API
Free Sources: NPPES, CMS PECOS

🧠

Embeddings & Top-K

Generate semantic embeddings and filter candidate pairs

Model: text-embedding-3-large
Dimensions: 1536D vectors
Reduction: 95%+ pair filtering
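
Top-K filtering can be sketched as follows: for each referral source, keep only the K most similar CRM companies by cosine similarity and discard the remaining pairs. The function names, the toy 2-D vectors, and the choice of K are assumptions for illustration; the real pipeline's similarity measure and K are not stated on this page.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_candidates(
    referrals: Dict[str, Vector], companies: Dict[str, Vector], k: int
) -> List[Tuple[str, str, float]]:
    """Keep the k most similar CRM companies for each referral source."""
    pairs = []
    for ref_id, ref_vec in referrals.items():
        scored = ((cosine(ref_vec, vec), crm_id) for crm_id, vec in companies.items())
        for score, crm_id in heapq.nlargest(k, scored):
            pairs.append((ref_id, crm_id, score))
    return pairs


# Toy embeddings (hypothetical); real vectors would come from the embedding model.
referrals = {"ref_1": [1.0, 0.0]}
companies = {"crm_a": [1.0, 0.0], "crm_b": [0.0, 1.0], "crm_c": [1.0, 1.0]}
print(top_k_candidates(referrals, companies, k=2))
```

As a rough check on the reduction figure: with about 237 sources and 440 CRM companies there are 104,280 possible pairs, and a hypothetical K of 20 would keep 4,740 of them, a reduction of roughly 95%.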

🎯

CatBoost Matching

ML model predicts referral→CRM matches

Features: 40+ engineered
Threshold: 0.25
Match Rate: 88%+

2

Future Doctors Intelligence

dagster job execute -f repository.py -j neo4j_future_doctors

👨‍⚕️

Provider Discovery

Identify unvisited providers in service area

Sources: NPI Registry, CMS
Radius: 15 miles per clinic
Filter: PT-referring specialties

📊

Opportunity Scoring

Rank prospects by conversion potential

Factors: Volume, specialty, distance
Output: High/Medium/Low tiers
Update: Weekly refresh

🗺️

Geographic Assignment

Assign each prospect to nearest FYZICAL clinic

Method: Haversine distance
Tag: nearest_clinic property
Output: FutureDoctor nodes in Neo4j
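
The nearest-clinic assignment can be sketched with the standard haversine formula. The clinic coordinates below are rough placeholders, not the clinics' actual addresses, and the function names are hypothetical; only the method (haversine distance) and the `nearest_clinic` tag come from the card.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

# Placeholder coordinates (assumed for illustration only).
CLINICS: Dict[str, Tuple[float, float]] = {
    "Addicks": (29.78, -95.64),
    "Cinco Ranch": (29.74, -95.76),
    "Harvest Green": (29.62, -95.70),
}


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two latitude/longitude points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_clinic(lat: float, lon: float) -> Tuple[str, float]:
    """Return (clinic name, distance in miles) for the closest clinic."""
    name = min(CLINICS, key=lambda c: haversine_miles(lat, lon, *CLINICS[c]))
    return name, haversine_miles(lat, lon, *CLINICS[name])


# A hypothetical prospect location.
print(nearest_clinic(29.75, -95.75))
```

The returned name is what would be written to the `nearest_clinic` property on each FutureDoctor node.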

3

ML Forecasting (5 Models)

dagster job execute -f repository.py -j run_ec2_forecasting

🧠

TFT (Temporal Fusion Transformer)

Attention-based deep learning for multi-horizon forecasting

Architecture: Multi-head attention + LSTM
Covariates: Visits, category, clinic, time
Output: Conversion probability per practice
Key Use: Powers CP-SAT optimizer scoring

🔮

LSTM Neural Network

Deep learning time series forecasting

Architecture: 2-layer, 128 hidden units
Features: Visits + Referrals (daily data)
Uncertainty: Monte-Carlo dropout (20 samples)
Training: ~365 days history

📈

Prophet

Meta's forecasting library with seasonality detection

Seasonality: Weekly + Yearly patterns
Tuning: Grid search cross-validation
Confidence: 80% prediction intervals

🎯

Darts Ensemble

Auto-select best statistical model via MAPE

Models: ExponentialSmoothing, Theta, ARIMA
Selection: 5-fold cross-validation
Fallback: Naive if others fail
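
MAPE-based selection with a naive fallback can be sketched as below. This is a simplified illustration: it scores each candidate on a single holdout window rather than the 5-fold cross-validation named on the card, and the candidate forecasters are plain-Python stand-ins, not the Darts ExponentialSmoothing, Theta, or ARIMA models. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Forecaster = Callable[[Sequence[float], int], List[float]]


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, skipping points where actual is zero."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def select_model(
    history: Sequence[float], holdout: Sequence[float], candidates: Dict[str, Forecaster]
) -> str:
    """Return the candidate with the lowest holdout MAPE, or 'naive' if all fail."""
    scores = {}
    for name, forecast in candidates.items():
        try:
            scores[name] = mape(holdout, forecast(history, len(holdout)))
        except Exception:
            continue  # a model that fails to fit or predict is simply skipped
    return min(scores, key=scores.get) if scores else "naive"


# Stand-in forecasters (assumptions, not the Darts models).
def mean_forecast(history: Sequence[float], horizon: int) -> List[float]:
    return [sum(history) / len(history)] * horizon


def drift_forecast(history: Sequence[float], horizon: int) -> List[float]:
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (step + 1) for step in range(horizon)]


def broken_forecast(history: Sequence[float], horizon: int) -> List[float]:
    raise RuntimeError("model failed to fit")


candidates = {"mean": mean_forecast, "drift": drift_forecast, "broken": broken_forecast}
print(select_model([10, 12, 14, 16], [18, 20], candidates))
```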

🏥

Per-Clinic + Aggregate

All 5 models run for each clinic + combined

Clinics: Addicks, Cinco Ranch, Harvest Green, All (combined)
Horizon: 12 weeks
Total Models: 5 × 4 = 20 forecasts

4

CP-SAT Optimizer & Two-List System

python optimal_visit_optimizer.py --two-list

⚡

Two-Stage Optimization

Stage 1: geographic clustering; Stage 2: CP-SAT selection

Cluster Radius: 3 miles (same-stop: 0.5 mi)
Solver: Google OR-Tools CP-SAT
Decision Var: x[c] ∈ {0,1} per cluster
Objective: Max expected referral value

🚗

Driving Budget Constraints

Realistic physical visit limits based on driving

Weekly Budget: 125 miles total
Daily Limit: 25 miles/day × 5 days
Physical Visits: ~30/week (10 per clinic)
Per-Clinic: Addicks 75mi, Cinco 65mi, Harvest 60mi
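
The selection problem behind the budget can be sketched as a 0/1 choice per cluster: pick the clusters that maximize expected referral value without exceeding the weekly mileage. The sketch below swaps in exhaustive enumeration for the CP-SAT solver so that it runs with the standard library alone; it only illustrates the model and is not how the optimizer is implemented. Cluster names, values, and mileages are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

WEEKLY_BUDGET_MILES = 125  # weekly budget from the card


def select_clusters(
    clusters: Dict[str, Tuple[float, float]], budget: float = WEEKLY_BUDGET_MILES
) -> Tuple[List[str], float]:
    """clusters maps name -> (expected_value, driving_miles).

    Tries every 0/1 assignment x[c] and keeps the feasible one with the
    highest total expected value. This is exponential in the number of
    clusters, so it is only suitable for small examples.
    """
    names = list(clusters)
    best_choice: List[str] = []
    best_value = 0.0
    for x in product((0, 1), repeat=len(names)):
        chosen = [n for n, xi in zip(names, x) if xi]
        miles = sum(clusters[n][1] for n in chosen)
        if miles > budget:
            continue  # violates the driving budget
        value = sum(clusters[n][0] for n in chosen)
        if value > best_value:
            best_choice, best_value = chosen, value
    return best_choice, best_value


# Hypothetical clusters: (expected referral value, miles to visit).
example = {"A": (9.0, 60), "B": (7.0, 50), "C": (6.0, 40), "D": (2.0, 30)}
print(select_clusters(example))
```

A constraint solver such as CP-SAT handles the same model at realistic sizes, where enumeration is not feasible, and lets further constraints be added alongside the budget.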

📋

Two-List Output

Separates achievable visits from phone outreach

Physical: ACCELERATE/REACTIVATE/CULTIVATE (in budget)
Phone: MAINTAIN + overflow practices
Piggyback: MAINTAIN near route → physical
Files: physical_visit_schedule.csv, phone_outreach_list.csv

📊

Category Constraints

Balance priority categories in schedule

ACCELERATE: ≥15% (high ROI, close to magic 7)
REACTIVATE: ≥5% (prevent relationship decay)
CULTIVATE: ≥10% (pipeline building)
MAINTAIN: ≤35% (phone calls OK)
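
A check of these bounds against a candidate schedule can be sketched as below, assuming each percentage is a share of the scheduled visits. The function name and the sample schedule are hypothetical.

```python
from collections import Counter
from typing import List

# Bounds from the card, expressed as fractions of scheduled visits.
MIN_SHARE = {"ACCELERATE": 0.15, "REACTIVATE": 0.05, "CULTIVATE": 0.10}
MAX_SHARE = {"MAINTAIN": 0.35}


def category_violations(schedule: List[str]) -> List[str]:
    """Return a human-readable message for every bound the schedule breaks."""
    total = len(schedule)
    if total == 0:
        return ["schedule is empty"]
    counts = Counter(schedule)
    problems = []
    for category, floor in MIN_SHARE.items():
        share = counts[category] / total
        if share < floor:
            problems.append(f"{category} share {share:.0%} is below {floor:.0%}")
    for category, cap in MAX_SHARE.items():
        share = counts[category] / total
        if share > cap:
            problems.append(f"{category} share {share:.0%} is above {cap:.0%}")
    return problems


# Hypothetical 20-visit schedule, one category label per visit.
schedule = ["ACCELERATE"] * 5 + ["REACTIVATE"] * 2 + ["CULTIVATE"] * 8 + ["MAINTAIN"] * 5
print(category_violations(schedule))
```

In the optimizer these bounds would act as constraints during selection rather than as a check afterwards; the sketch only makes the arithmetic explicit.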

🔄

Coverage Rotation (12-Week)

Minimum cooldown between visits to same practice

ACCELERATE: 2-week gap → 6 visits in 12 wks
CULTIVATE: 2-week gap → 6 visits (2×/month)
MAINTAIN: 10-week gap → 1 visit (quarterly)
Goal: Reach magic 7 in 3-4 months
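
The cooldown rule can be sketched as below. Dividing the 12-week horizon by the gap and rounding down reproduces the visit counts on the card (6, 6, and 1); that reading, and the function names, are assumptions rather than the optimizer's documented logic.

```python
from typing import Optional

COOLDOWN_WEEKS = {"ACCELERATE": 2, "CULTIVATE": 2, "MAINTAIN": 10}  # from the card
HORIZON_WEEKS = 12


def max_visits(category: str, horizon: int = HORIZON_WEEKS) -> int:
    """Visits that fit in the horizon when one visit is allowed per cooldown gap."""
    return horizon // COOLDOWN_WEEKS[category]


def is_eligible(category: str, last_visit_week: Optional[int], current_week: int) -> bool:
    """A practice can be scheduled again once its cooldown has elapsed."""
    if last_visit_week is None:
        return True  # never visited, so no cooldown applies
    return current_week - last_visit_week >= COOLDOWN_WEEKS[category]


for category in COOLDOWN_WEEKS:
    print(category, max_visits(category))
print(is_eligible("MAINTAIN", last_visit_week=1, current_week=6))
```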

🧮

Expected Value Formula

Per-practice scoring using TFT + volume proxies

EV = volume × conv_prob × specialty_rate × category_weight × proximity_bonus
Proximity: +50% if 1-3 visits from magic 7
Underutilized Org: +15-50% boost
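
The scoring formula can be sketched as a plain function. This assumes "magic 7" means seven visits to a practice, that the proximity bonus is a 1.5 multiplier when one to three visits remain, and that the underutilized-organization boost enters as one more multiplier between 1.15 and 1.50. Parameter names beyond those in the formula, and the sample inputs, are hypothetical.

```python
MAGIC_VISITS = 7  # assumed meaning of "magic 7"


def proximity_bonus(visits_so_far: int) -> float:
    """+50% when the practice is 1-3 visits away from the magic number."""
    remaining = MAGIC_VISITS - visits_so_far
    return 1.5 if 1 <= remaining <= 3 else 1.0


def expected_value(
    volume: float,
    conv_prob: float,
    specialty_rate: float,
    category_weight: float,
    visits_so_far: int,
    org_boost: float = 1.0,  # assumed multiplier, 1.15-1.50 for underutilized orgs
) -> float:
    """EV = volume x conv_prob x specialty_rate x category_weight x proximity_bonus."""
    return (
        volume
        * conv_prob
        * specialty_rate
        * category_weight
        * proximity_bonus(visits_so_far)
        * org_boost
    )


# Hypothetical practice: 5 visits so far, so 2 visits from the magic number.
print(expected_value(volume=40, conv_prob=0.5, specialty_rate=0.5,
                     category_weight=1.0, visits_so_far=5))
```

Here `conv_prob` stands for the TFT conversion probability per practice described in the forecasting phase.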

5

Closed-Loop Forecasting

dagster job execute -f repository.py -j neo4j_forecast_lift

🔄

TFT Re-run with Schedule

Re-run TFT using CP-SAT schedule as covariates

Scenario: cpsat_optimized
Input: cpsat_schedule_export.csv
Output: What-if referral forecasts

📈

Forecast Lift Analysis

Compare baseline vs optimized scenarios

Metrics: Lift %, ROI, CI bands
Breakdown: By clinic & category
Toggle: 90% confidence intervals
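
The lift metric can be sketched as the percentage change in total forecast referrals between the baseline and optimized scenarios. The function name and the weekly forecast values are hypothetical, and the ROI and confidence-interval calculations are not covered here.

```python
from typing import Dict, Sequence


def lift_percent(baseline: Sequence[float], optimized: Sequence[float]) -> float:
    """Percentage change in total forecast referrals, optimized vs baseline."""
    base_total, opt_total = sum(baseline), sum(optimized)
    if base_total == 0:
        raise ValueError("lift is undefined when the baseline total is zero")
    return 100.0 * (opt_total - base_total) / base_total


# Hypothetical weekly referral forecasts per clinic: (baseline, optimized).
forecasts: Dict[str, tuple] = {
    "Addicks": ([10, 11, 12], [12, 13, 14]),
    "Cinco Ranch": ([8, 8, 9], [9, 9, 10]),
}
for clinic, (baseline, optimized) in forecasts.items():
    print(clinic, round(lift_percent(baseline, optimized), 1))
```

Grouping the same calculation by category instead of clinic would give the other breakdown listed on the card.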

6

Comprehensive Reports

dagster job execute -f repository.py -j neo4j_comprehensive_reports

📑

Executive Summary

Key metrics, trends, and recommendations

Sections: 7 executive views
Charts: Funnel, KPI cards
Audience: Leadership

📊

Visits Analysis

Priority timeline, weekly breakdowns

Charts: Stacked bar, timeline
Metrics: By category & clinic
Trend: 52-week history

🔮

Forecasting & ML

Interactive model comparison charts

Traditional: LSTM, Prophet, Darts
Optimization: Current, Optimal, TFT Optimized
Toggles: All / Traditional / Optimization

🗺️

Interactive Map

Geographic visualization layers

Layers: Future Docs, Priority, Sources
Radius: 5/10/15 mile circles
Export: Standalone HTML

Pipeline Performance