Why the two-phase approach?

The profiling script runs entirely on your local machine. It outputs only aggregate statistics — variable types, missing rates, and correlation signals — never individual patient records. This gives the AI objective MAR diagnostic data (instead of relying on your description) while keeping raw data private. The dynamic script also tailors diagnostics to your study design: RCT scripts check treatment-arm imbalance; longitudinal scripts detect monotone dropout patterns.
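To make the "aggregates only" idea concrete, here is a minimal Python sketch of the kind of summary such a script could print. It is not the script the AI generates. The function names `infer_type` and `profile`, the toy records, and the type rules (0/1 as binary, other numbers as numeric, everything else as categorical) are illustrative assumptions. It returns one row per variable and never echoes an individual record.

```python
def infer_type(values):
    """Rough type guess from observed values (illustrative rules only)."""
    observed = [v for v in values if v is not None]
    if not observed:
        return "empty"
    if set(observed) <= {0, 1}:        # 0/1 or True/False
        return "binary"
    if all(isinstance(v, (int, float)) for v in observed):
        return "numeric"
    return "categorical"


def profile(records):
    """Per-variable aggregates only: type, missing count and missing rate.
    records: list of dicts, with None (or an absent key) marking a missing value."""
    n = len(records)
    variables = sorted({key for record in records for key in record})
    summary = {}
    for var in variables:
        column = [record.get(var) for record in records]
        n_missing = sum(v is None for v in column)
        summary[var] = {
            "type": infer_type(column),
            "n_missing": n_missing,
            "missing_rate": n_missing / n if n else 0.0,
        }
    return summary


# Toy records (made up); in practice the raw records stay on your machine.
records = [
    {"age": 64, "sex": "F", "bmi": 27.1, "smoker": 1},
    {"age": 71, "sex": "M", "bmi": None, "smoker": 0},
    {"age": 58, "sex": None, "bmi": 31.4, "smoker": None},
    {"age": 80, "sex": "F", "bmi": None, "smoker": 0},
]
for var, stats in profile(records).items():
    print(f"{var:<8} {stats['type']:<12} missing {stats['missing_rate']:.0%}")
```

A real profiler would also distinguish count and ordinal variables and add the MAR correlation and design-specific checks.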

I don't have R or Python installed. What do I do?

Tell the AI your software preference in Phase 1. If you have neither, ask the AI to generate a script for whichever is easier to install (R is typically simpler for clinical researchers). Alternatively, describe your data structure manually in Phase 1 and ask the AI to skip script generation and proceed directly to Phase 2 analysis. Expect a somewhat less reliable strategy, because the AI will not have the profiler's objective signals.

How do I paste the profiler output?

Copy the entire block the script prints to your console, from the '=== MISSING DATA PROFILER OUTPUT ===' header through the matching footer, including both lines. Paste it into the same chat window. If you have a clinical hypothesis about why data are missing, add it below the pasted block. The AI automatically recognizes the pasted block as the Phase 2 trigger.

Should I provide a clinical hypothesis about why data is missing?

Yes, if you have one. It significantly improves mechanism diagnosis because it supplies clinical context the profiler cannot detect, for example 'Patients with worse outcomes were less likely to return for follow-up' or 'Lab values were only ordered for high-risk patients'. If you are unsure, leave it blank. The AI will then use the profiler's MAR correlation signals to infer the most likely mechanism and will flag its uncertainty explicitly.

What is the difference between MCAR, MAR, and MNAR?

MCAR (Missing Completely At Random): The probability that a value is missing is unrelated to any variable, observed or unobserved, like coffee randomly spilled on a data sheet. Complete-case analysis remains unbiased, at the cost of some power. MAR (Missing At Random): The probability of missingness depends only on observed variables, like patients who live far from the hospital missing more follow-up visits, provided distance is recorded in the dataset. Multiple imputation that includes those variables works well here. MNAR (Missing Not At Random): The missingness depends on the unobserved value itself, like patients who stop reporting pain scores because their pain worsened. No standard imputation method is unbiased; pattern-mixture models or sensitivity analyses are needed.
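The three mechanisms can be illustrated with a small simulation. This is a hypothetical sketch, not part of the tool. An always-observed covariate `x` (think distance from the hospital) predicts an outcome `y` whose true mean is 0, and `y` is deleted under each mechanism. The sketch compares the complete-case mean with a simple regression adjustment on `x`, used here only as a stand-in for MAR-based methods such as multiple imputation.

```python
import random
import statistics


def simulate(mechanism, n=20000, seed=1):
    """Toy data: x is always observed; y = 0.5*x + noise has true mean 0.
    Returns x values, y values and a missing flag for y."""
    rng = random.Random(seed)
    xs, ys, missing = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 0.5 * x + rng.gauss(0, 1)
        if mechanism == "MCAR":
            p = 0.3                          # unrelated to anything
        elif mechanism == "MAR":
            p = 0.6 if x > 0 else 0.1        # depends on observed x only
        elif mechanism == "MNAR":
            p = 0.6 if y > 0 else 0.1        # depends on y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        xs.append(x)
        ys.append(y)
        missing.append(rng.random() < p)
    return xs, ys, missing


def complete_case_mean(ys, missing):
    return statistics.fmean(y for y, m in zip(ys, missing) if not m)


def regression_adjusted_mean(xs, ys, missing):
    """Fit y ~ x on complete cases, then average the predictions over all x.
    Valid when missingness depends only on x (MAR); not under MNAR."""
    cx = [x for x, m in zip(xs, missing) if not m]
    cy = [y for y, m in zip(ys, missing) if not m]
    mx, my = statistics.fmean(cx), statistics.fmean(cy)
    slope = (sum((a - mx) * (b - my) for a, b in zip(cx, cy))
             / sum((a - mx) ** 2 for a in cx))
    intercept = my - slope * mx
    return intercept + slope * statistics.fmean(xs)


for mechanism in ("MCAR", "MAR", "MNAR"):
    xs, ys, missing = simulate(mechanism)
    print(mechanism,
          round(complete_case_mean(ys, missing), 3),
          round(regression_adjusted_mean(xs, ys, missing), 3))
```

Under MCAR both estimates should land near 0. Under MAR the complete-case mean is pulled downward while the adjustment on `x` recovers the truth. Under MNAR both stay biased, because no observed variable explains the missingness. Exact values depend on the seed.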

When should I use complete case analysis?

Complete case analysis (listwise deletion) is appropriate only when: (1) the data are plausibly MCAR, supported by a non-significant Little's MCAR test (p > 0.05) and no profiler MAR correlations with |r| > 0.2; AND (2) overall missingness is below 5%. A non-significant Little's test cannot prove MCAR; it only fails to detect a departure from it. Outside these conditions, complete case analysis can produce biased estimates and discards statistical power. The AI will explicitly warn you if these conditions are not met.
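The decision rule in this answer can be written as a small function. This is a hypothetical sketch that only encodes the three thresholds stated above. It does not compute Little's test itself (that requires estimating means and covariances from incomplete data), so the p-value is assumed to come from a statistics package and is passed in as an input.

```python
def complete_case_eligible(little_p, max_abs_r, overall_missing_rate,
                           alpha=0.05, r_threshold=0.2, max_missing=0.05):
    """Check the three complete-case conditions; return (eligible, reasons).
    little_p comes from a Little's MCAR test run elsewhere (assumed input)."""
    reasons = []
    if little_p <= alpha:
        reasons.append(f"Little's test p = {little_p:.3f} (not > {alpha})")
    if max_abs_r > r_threshold:
        reasons.append(f"missingness correlation |r| = {max_abs_r:.2f} > {r_threshold}")
    if overall_missing_rate >= max_missing:
        reasons.append(f"overall missingness {overall_missing_rate:.1%} (not < {max_missing:.0%})")
    return (len(reasons) == 0, reasons)


print(complete_case_eligible(0.42, 0.08, 0.03))   # (True, [])
print(complete_case_eligible(0.42, 0.31, 0.03))   # fails on the correlation signal
```

Passing all three checks makes complete-case analysis defensible; it does not prove the data are MCAR.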

Why not always use multiple imputation?

Multiple imputation assumes data are MAR. If the mechanism is MNAR, multiple imputation produces systematically biased results, sometimes no better or even worse than complete case analysis, because it fills in plausible-looking but wrong values. In addition, the imputation model must match each variable's type: binary and ordinal variables need methods such as logistic or proportional-odds imputation, or predictive mean matching. Mean imputation is always incorrect for these types, regardless of the mechanism, because it creates values that are not valid categories.
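A tiny illustration, on made-up data, of why mean imputation fails for a binary variable: the filled-in value is not a valid category, and the variance shrinks because every imputed case sits exactly at the mean. A type-appropriate alternative draws imputations from a model suited to the variable; in R's mice package, for instance, the methods `"logreg"` (binary), `"polr"` (ordered factors), and `"pmm"` are available.

```python
import statistics

# Toy binary variable (1 = smoker), two values missing; made-up data.
smoker = [1, 0, 0, 1, None, 0, None, 1, 0, 0]

observed = [v for v in smoker if v is not None]
mean_obs = statistics.fmean(observed)                     # 3 / 8 = 0.375
mean_imputed = [mean_obs if v is None else v for v in smoker]

# The imputed value 0.375 is not a valid category for a 0/1 variable.
print(sorted(set(mean_imputed)))                          # [0, 0.375, 1]

# Every imputed case sits exactly at the mean, so the variance shrinks.
print(statistics.pvariance(observed), statistics.pvariance(mean_imputed))  # 0.234375 0.1875
```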

What does the MAR diagnostic in the profiler actually detect?

For each variable with >5% missing, the profiler creates a binary missingness indicator (0 = observed, 1 = missing) and computes its correlation with every other observed variable. A correlation with |r| > 0.2 suggests that whether that variable is missing is associated with the values of another variable, a classic MAR signal. For example, if BMI_missing correlates with Age (r = 0.35), older patients are more likely to have missing BMI, which is MAR with respect to age. This check cannot detect missingness driven by the unobserved BMI values themselves, which would be MNAR. The profiler reports these signals so the AI can cite specific evidence rather than guessing.

Missing Data Strategy -- medPrompt

Usage Guide

Fill in your study design, dataset description, and missing data patterns (which variables, estimated %).
Click AI Run — the AI generates a profiling script tailored to your data (R or Python).
Run the script locally on your dataset and paste the output back into the chat.
Receive your complete missing data strategy: recommended method, implementation code, Methods section draft, and sensitivity analysis plan.

What this tool does

A two-phase missing data consultant for clinical researchers. Phase 1: the AI generates a study-design-aware profiling script covering variable types, missing rates, MAR correlation diagnostics, and design-specific checks (RCT arm balance, longitudinal dropout, outcome-stratified missingness). Phase 2: the AI diagnoses the missingness mechanism and delivers a complete strategy — not just a method recommendation, but a full decision with assumptions, implementation code, and sensitivity analysis.
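Two of the design-specific checks mentioned here are easy to sketch. The following is an illustrative Python version, not the tool's generated code; the function names and input layout (one list of visit values per subject, `None` for missing) are assumptions. A subject's pattern counts as monotone dropout if, once a visit is missing, every later visit is also missing; any other pattern with gaps is intermittent. The arm-level check simply compares outcome missing rates between randomized arms.

```python
def is_monotone(visits):
    """True if, once a visit is missing (None), every later visit is also missing."""
    seen_missing = False
    for value in visits:
        if value is None:
            seen_missing = True
        elif seen_missing:
            return False  # observed again after a gap: intermittent
    return True


def dropout_summary(subjects):
    """Count complete, monotone-dropout and intermittent patterns.
    subjects: dict of subject id -> list of visit values in time order."""
    counts = {"complete": 0, "monotone_dropout": 0, "intermittent": 0}
    for visits in subjects.values():
        if all(v is not None for v in visits):
            counts["complete"] += 1
        elif is_monotone(visits):
            counts["monotone_dropout"] += 1
        else:
            counts["intermittent"] += 1
    return counts


def missing_rate_by_arm(arms, outcome):
    """Outcome missing rate per randomized arm (None = missing)."""
    totals, missing = {}, {}
    for arm, y in zip(arms, outcome):
        totals[arm] = totals.get(arm, 0) + 1
        missing[arm] = missing.get(arm, 0) + (y is None)
    return {arm: missing[arm] / totals[arm] for arm in totals}


# Toy example (made-up values)
subjects = {"p1": [5, 4, 3], "p2": [6, None, None], "p3": [7, None, 5]}
print(dropout_summary(subjects))
print(missing_rate_by_arm(["A", "A", "B", "B"], [1.0, None, None, None]))
```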

Missing data mechanism reference

Mechanism	What it means	Profiler signal	Typical approach
MCAR	Missingness independent of all variables	Little's test p > 0.05, no correlation pattern	Complete case or simple imputation
MAR	Missingness depends on observed variables	Correlation signal detected	Multiple imputation (mice/Amelia) or IPW
MNAR	Missingness depends on the missing value itself	No signal, but clinical logic suggests pattern	Pattern-mixture models; must disclose assumptions
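For MNAR, one common way to disclose assumptions is a delta-adjusted (tipping-point) sensitivity analysis, a simple form of pattern-mixture model. The sketch below is a deliberately simplified, hypothetical version: it works on point estimates only, uses no covariates (so imputing under MAR reduces to filling in the arm's observed mean), and shifts only the treatment arm's missing outcomes by `delta`. A real analysis would apply the shift inside multiple imputation and examine the confidence interval, not just the point estimate.

```python
def delta_adjusted_mean(observed, n_missing, delta):
    """Arm mean if each missing value equals the observed mean plus delta.
    delta = 0 reproduces the MAR (no-covariate) answer."""
    n_obs = len(observed)
    mean_obs = sum(observed) / n_obs
    return (n_obs * mean_obs + n_missing * (mean_obs + delta)) / (n_obs + n_missing)


def tipping_point(treat_obs, treat_missing, ctrl_obs, ctrl_missing, deltas):
    """Scan deltas (ordered by increasing magnitude), shifting only the
    treatment arm's missing outcomes; return the first (delta, difference)
    at which treatment - control is no longer positive, else None."""
    ctrl_mean = delta_adjusted_mean(ctrl_obs, ctrl_missing, 0.0)
    for delta in deltas:
        diff = delta_adjusted_mean(treat_obs, treat_missing, delta) - ctrl_mean
        if diff <= 0:
            return delta, diff
    return None


# Made-up numbers: higher outcome = better
result = tipping_point(treat_obs=[6, 7, 8], treat_missing=1,
                       ctrl_obs=[5, 5, 5, 5], ctrl_missing=0,
                       deltas=[0, -2, -4, -6, -8, -10])
print(result)  # (-8, 0.0)
```

In this toy case the 2-point treatment advantage disappears only if the missing treatment-arm outcomes are on average 8 points worse than MAR predicts; the clinical team then judges whether a shift that large is plausible.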

Key built-in safeguards

Binary/categorical/count variables are never mean- or median-imputed. MNAR data cannot be silently imputed — the AI must list assumptions for the researcher to decide. If any key variable exceeds 20% missing, a mandatory Impact Report is generated showing 2–3 approaches with explicit trade-offs. All clarification questions are batched into one message (no one-by-one interrogation).
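Two of these safeguards are mechanical enough to express as a check on a proposed plan. This hypothetical sketch (the function name, variable-type labels, and plan layout are assumptions, not the tool's internals) rejects mean or median imputation for non-continuous variables and lists the key variables whose missingness exceeds 20%, the condition that triggers the Impact Report. The MNAR and batching rules govern how the AI communicates and are not modelled here.

```python
NON_CONTINUOUS = {"binary", "categorical", "ordinal", "count"}


def check_plan(plan, key_vars, impact_threshold=0.20):
    """plan: variable -> (var_type, missing_rate, proposed_method).
    Returns (violations, key variables needing an Impact Report)."""
    violations = [
        f"{name}: {method} imputation is not allowed for a {vtype} variable"
        for name, (vtype, _rate, method) in plan.items()
        if method in {"mean", "median"} and vtype in NON_CONTINUOUS
    ]
    needs_impact_report = sorted(
        name for name, (_vtype, rate, _method) in plan.items()
        if name in key_vars and rate > impact_threshold
    )
    return violations, needs_impact_report


# Hypothetical plan for illustration
plan = {
    "smoker": ("binary", 0.08, "mean"),
    "hba1c": ("continuous", 0.27, "pmm"),
    "age": ("continuous", 0.00, None),
}
violations, impact = check_plan(plan, key_vars={"hba1c", "age"})
print(violations)  # ['smoker: mean imputation is not allowed for a binary variable']
print(impact)      # ['hba1c']
```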

What this tool will not do

It generates imputation code for you to run — it does not impute data directly. MNAR cannot be confirmed from observed data alone; the AI flags it but cannot prove it. It does not replace statistician judgment for complex regulatory submissions.

Data privacy

The profiling script runs entirely on your local machine. Only summary statistics are pasted into the conversation — raw patient data never leaves your computer.
