Missing Data Strategy
Two-phase, privacy-first missing data consultation: the AI first generates a study-design-aware profiling script (including MAR diagnostics and Little's MCAR test) for you to run locally, then delivers a complete handling strategy based on the anonymized output. Raw data never leaves your device.
- Fill in your study design, dataset description, and missing data patterns (which variables, estimated %).
- Click AI Run — the AI generates a profiling script tailored to your data (R or Python).
- Run the script locally on your dataset and paste the output back into the chat.
- Receive your complete missing data strategy: recommended method, implementation code, Methods section draft, and sensitivity analysis plan.
What this tool does
A two-phase missing data consultant for clinical researchers. Phase 1: the AI generates a study-design-aware profiling script covering variable types, missing rates, MAR correlation diagnostics, and design-specific checks (RCT arm balance, longitudinal dropout, outcome-stratified missingness). Phase 2: the AI assesses the most plausible missingness mechanism from the profiler output and delivers a complete strategy: not just a method recommendation, but a full decision with assumptions, implementation code, and a sensitivity analysis plan.
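To make the Phase 1 checks concrete, here is a minimal sketch of three of them: per-variable missing rates, missing rate by trial arm, and a crude MAR diagnostic that compares an observed covariate between rows with and without the target value. It is not the script the tool generates. The function names, the `None`-as-missing convention, and the toy rows are all assumptions made for illustration.

```python
from statistics import mean

def missing_rates(rows, columns):
    """Fraction of rows in which each column is missing (None)."""
    n = len(rows)
    return {col: sum(r[col] is None for r in rows) / n for col in columns}

def missing_rate_by_group(rows, target, group):
    """Missing rate of `target` within each level of `group` (e.g. RCT arm)."""
    counts = {}
    for r in rows:
        total, missing = counts.get(r[group], (0, 0))
        counts[r[group]] = (total + 1, missing + (r[target] is None))
    return {g: missing / total for g, (total, missing) in counts.items()}

def missingness_gap(rows, target, predictor):
    """Mean of `predictor` where `target` is missing minus its mean where
    `target` is observed. A large gap is a hint of MAR, not a proof."""
    miss = [r[predictor] for r in rows
            if r[target] is None and r[predictor] is not None]
    obs = [r[predictor] for r in rows
           if r[target] is not None and r[predictor] is not None]
    if not miss or not obs:
        return None
    return mean(miss) - mean(obs)

# Invented toy data: older participants are more likely to lack HbA1c.
rows = [
    {"age": 72, "arm": "A", "hba1c": None},
    {"age": 68, "arm": "A", "hba1c": None},
    {"age": 41, "arm": "A", "hba1c": 6.1},
    {"age": 66, "arm": "B", "hba1c": None},
    {"age": 38, "arm": "B", "hba1c": 5.8},
    {"age": 35, "arm": "B", "hba1c": 5.5},
]

rates = missing_rates(rows, ["age", "arm", "hba1c"])
by_arm = missing_rate_by_group(rows, "hba1c", "arm")
gap = missingness_gap(rows, "hba1c", "age")

# Only aggregates are printed; no individual row is echoed.
print("missing rates:", rates)
print("hba1c missing rate by arm:", by_arm)
print("age gap (missing minus observed):", round(gap, 2))
```

Every printed value is an aggregate, which is the kind of output intended to be pasted back into the chat.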
Missing data mechanism reference
| Mechanism | What it means | Profiler signal | Typical approach |
|---|---|---|---|
| MCAR | Missingness independent of both observed and unobserved values | Little's test p > 0.05 (consistent with MCAR, not proof of it), no correlation pattern | Complete case or simple imputation |
| MAR | Missingness depends on observed variables | Missingness indicator correlates with observed variables | Multiple imputation (mice/Amelia) or inverse probability weighting (IPW) |
| MNAR | Missingness depends on the missing value itself | Often no signal in the observed data, but clinical logic suggests a pattern | Pattern-mixture models; must disclose assumptions |
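For the MNAR row, one simple member of the pattern-mixture family is a delta adjustment: assume the unobserved values differ from the observed ones by a fixed shift `delta`, then see how the estimate moves as `delta` varies. The sketch below does this for a single mean. The function name, the `delta` grid, and the numbers are hypothetical, and it takes the observed mean as the `delta = 0` baseline, where a real analysis would typically start from MAR-based imputations.

```python
def delta_adjusted_mean(observed, n_missing, delta):
    """Overall mean if the missing cases average (observed mean + delta).

    delta = 0 reproduces the observed mean; nonzero delta encodes an
    explicit, disclosed MNAR assumption about the unobserved pattern.
    """
    n_obs = len(observed)
    obs_mean = sum(observed) / n_obs
    missing_mean = obs_mean + delta
    return (n_obs * obs_mean + n_missing * missing_mean) / (n_obs + n_missing)

# Invented example: 3 observed outcomes, 1 missing.
observed = [10.0, 12.0, 14.0]
sensitivity = {d: delta_adjusted_mean(observed, n_missing=1, delta=d)
               for d in (-4.0, 0.0, 4.0)}

for d, est in sensitivity.items():
    print(f"delta={d:+.1f} -> estimated mean {est:.2f}")
```

Reporting the estimate across a range of `delta` values keeps the untestable assumption visible so the researcher can decide which shifts are clinically plausible.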
Key built-in safeguards
- Binary, categorical, and count variables are never mean- or median-imputed.
- MNAR data cannot be silently imputed: the AI must list its assumptions for the researcher to decide.
- If any key variable exceeds 20% missing, a mandatory Impact Report is generated showing 2–3 approaches with explicit trade-offs.
- All clarification questions are batched into one message (no one-by-one interrogation).
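Two of these safeguards are simple enough to express as rules. The following is a hypothetical sketch of such a check, not the tool's internal logic; `review_plan`, the type labels, and the message strings are invented for illustration, and only the 20% threshold and the mean/median restriction come from the text above.

```python
MEAN_MEDIAN = {"mean", "median"}
NON_CONTINUOUS = {"binary", "categorical", "count"}

def review_plan(var_type, method, missing_rate, threshold=0.20):
    """Return a list of safeguard violations for one variable's plan."""
    issues = []
    # Safeguard 1: no mean/median imputation for non-continuous variables.
    if var_type in NON_CONTINUOUS and method in MEAN_MEDIAN:
        issues.append(f"{method} imputation is not allowed for {var_type} variables")
    # Safeguard 2: more than 20% missing triggers the Impact Report.
    if missing_rate > threshold:
        issues.append("impact report required (missing rate above threshold)")
    return issues

print(review_plan("binary", "mean", 0.25))
print(review_plan("continuous", "median", 0.05))
```

The first call returns both violations; the second returns an empty list because a continuous variable with 5% missing breaks neither rule.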
What this tool will not do
It generates imputation code for you to run — it does not impute data directly. MNAR cannot be confirmed from observed data alone; the AI flags it but cannot prove it. It does not replace statistician judgment for complex regulatory submissions.
Data privacy
The profiling script runs entirely on your local machine. Only summary statistics are pasted into the conversation — raw patient data never leaves your computer.