A Three-Arm In-Silico Comparative Effectiveness Study
Background & Rationale
Prior authorization imposes one of the largest administrative burdens in US primary care and is a leading contributor to clinician burnout. Published analyses indicate that more than 93% of Medicare Advantage prior authorization requests are ultimately approved, while 82% of appealed denials are overturned — a pattern suggesting that a substantial fraction of the administrative process does not alter clinical decision-making and is a candidate for automation.
The regulatory environment is converging on the FHIR-based standards that Study 1's Arm C implements. The CMS Interoperability and Prior Authorization Final Rule (CMS-0057-F), finalized in January 2024, requires impacted Medicare Advantage, Medicaid, CHIP, and Qualified Health Plan payers to implement FHIR-based Prior Authorization APIs. Operational provisions took effect in January 2026, and the API requirements apply from January 2027. CMS explicitly recommends the HL7 Da Vinci implementation guides (Coverage Requirements Discovery, Documentation Templates and Rules, and Prior Authorization Support) as the preferred implementation path. Arm C therefore represents the standards-based approach that regulated payers are being directed to adopt.
In a structured literature review covering NEJM AI, Lancet Primary Care, JAMA, JAMA Network Open, JAMIA, and adjacent venues through April 2026, we identified no original research evaluating agentic AI for provider-side Medicare prior authorization automation. Study 1 is designed to address this gap.
Without rigorous, blinded, board-certified physician adjudication, no comparison between an AI system and a standards-body reference implementation is scientifically valid. The reference standard is the science. — Protocol v2.0, §3.7 Reference Standard Adjudication
Study 1 is designed in tandem with a planned follow-on study (Study 1b) that will extend the comparison to a real human clinic-staff arm under a full human-subjects regulatory pathway. Every methodological decision in this protocol — case construction, adjudication rubric, reference-standard derivation, data storage, and infrastructure — is made so the case bank and findings can be carried forward to Study 1b without re-deriving ground truth or re-adjudicating cases. Adjudicators recruited now are the foundational cohort for both studies.
Methods Overview
Each synthetic case is independently randomized to one of three intervention arms, processed end-to-end by the assigned arm in a fully automated experimental harness, and the output is recorded under blinded human adjudication.
Arm A (agentic AI): Multi-agent orchestration on the OpenClaw runtime, in which five specialized agent plugins coordinate criteria retrieval, documentation assembly, clinical reasoning, and submission packet construction. Includes retrieval-augmented generation (RAG) over codified payer exemplar corpora, a human-in-the-loop checkpoint, and chain-of-thought prompting.
claude-haiku-4-5-20251001 · OpenClaw · RAG · 5 agent plugins
Arm B (unassisted LLM): The same underlying language model as Arm A, prompted in a single turn to act as a clinical prior authorization coordinator. No tool use, no retrieval, no multi-agent scaffolding. This arm isolates the contribution of the agentic architecture itself.
claude-haiku-4-5-20251001 · single-call · frozen prompt
Arm C (HL7 Da Vinci rule-based): Reference implementations of Coverage Requirements Discovery, Documentation Templates and Rules, and Prior Authorization Support, deployed per the HL7 Da Vinci specifications. This arm represents the non-AI, rule-based approach CMS recommends for meeting the 2027 Prior Authorization API requirements of CMS-0057-F.
HL7-DaVinci/CRD · prior-auth · CDS-Library · CQL rules
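The per-case randomization described in the Methods Overview can be sketched as follows. This sketch assumes simple (unstratified, unblocked) allocation with a seeded pseudo-random generator, so the assignment can be regenerated exactly from a recorded seed. The function name `randomize_cases`, the case-ID format, and the seed value are hypothetical. The protocol's actual allocation procedure may differ; for example, it could block or stratify by payer and service category.

```python
import random

# Hypothetical arm codes: A = agentic AI, B = unassisted LLM, C = HL7 Da Vinci rules
ARMS = ("A", "B", "C")


def randomize_cases(case_ids, seed):
    """Independently assign each case to one arm with equal probability.

    A dedicated random.Random instance keeps the allocation reproducible:
    the same case IDs (in the same order) and seed always give the same result.
    """
    rng = random.Random(seed)
    return {case_id: rng.choice(ARMS) for case_id in case_ids}


if __name__ == "__main__":
    # Hypothetical case IDs for a bank of roughly 2,340 cases
    cases = [f"case-{i:04d}" for i in range(1, 2341)]
    allocation = randomize_cases(cases, seed=20260401)
    counts = {arm: sum(1 for a in allocation.values() if a == arm) for arm in ARMS}
    print(counts)
```

Simple randomization does not guarantee equal arm sizes. With a bank of this size, each arm will be close to, but rarely exactly, one third of the cases.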
The case bank consists of approximately 2,340 synthetic Medicare primary-care prior authorization cases generated using the Synthea open-source patient simulation platform, structured as FHIR R4 bundles with realistic clinical documentation. Cases are stratified across four payer exemplars (CMS Fee-for-Service, UnitedHealthcare Medicare Advantage, BCBS Medicare Advantage, and a Regional MA plan) and five CMS service categories (advanced imaging, durable medical equipment, Part B drugs, specialty referral, and other).
Upon completion of reference-standard adjudication, the case bank is frozen and deposited to Zenodo with a permanent DOI before any arm begins processing. This deposit is referenced in the preregistration and manuscript and is a non-negotiable methodological commitment of the protocol.
The Adjudicator's Role
Adjudication is conducted entirely remotely via a secure web interface. Each case is presented with all materials needed to make a determination. Your independent determination contributes to the reference-standard label for each case you review.
Each case displays on a single screen: the synthetic patient record (FHIR bundle rendered as a clinic-style chart), the prior authorization order, the applicable payer coverage criteria (verbatim and codified), and the programmatic initial determination produced by the case-derivation module. No tab switching. No external lookups required.
Classify each case as Should Approve, Should Deny, or Ambiguous / Insufficient Information against the payer's stated coverage criteria. Optionally annotate the specific criterion driving your decision. All ratings are timestamped and locked. You cannot see another adjudicator's ratings.
Cases on which two adjudicators disagree are automatically routed to a third adjudicator for tiebreaking. You are not asked to revise your rating based on what another adjudicator decided — independence is the entire point. Inter-rater reliability (Cohen's κ) is computed per batch. If κ drops below 0.70, a brief recalibration discussion is scheduled before the next batch.
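The reference-label logic for a single case can be sketched as follows. This illustrates the routing rule under stated assumptions and is not the study's software. The function names are hypothetical. The protocol does not specify how to handle a tiebreak rating that matches neither primary rating, so this sketch returns it as unresolved (`None`) by assumption.

```python
from collections import Counter
from typing import Optional

LABELS = ("Should Approve", "Should Deny", "Ambiguous / Insufficient Information")


def needs_tiebreak(rating_1: str, rating_2: str) -> bool:
    """A case is routed to a third adjudicator only when the two primary ratings differ."""
    return rating_1 != rating_2


def reference_label(rating_1: str, rating_2: str,
                    tiebreak: Optional[str] = None) -> Optional[str]:
    """Derive a case's reference-standard label from independent ratings."""
    for r in (rating_1, rating_2, tiebreak):
        if r is not None and r not in LABELS:
            raise ValueError(f"unknown label: {r!r}")
    if not needs_tiebreak(rating_1, rating_2):
        return rating_1  # primary adjudicators agree
    if tiebreak is None:
        raise ValueError("disagreement requires a third adjudicator's rating")
    label, votes = Counter([rating_1, rating_2, tiebreak]).most_common(1)[0]
    # Assumption: if the third rating matches neither primary rating, the case
    # is left unresolved (None) for handling outside this sketch.
    return label if votes >= 2 else None
```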
Cases are released in batches of 100. There is no per-session minimum: log in when convenient, work through as many cases as you wish, and log out. The interface tracks your queue and resumes exactly where you left off. We expect most adjudicators to complete a batch of 100 in 3–5 sittings spread over one to two weeks.
Before primary adjudication begins, all adjudicators independently rate the same 10 pilot cases. The PI computes pairwise κ. If κ ≥ 0.70 across all adjudicator pairs, the rubric is frozen and primary adjudication begins. If κ falls short, discrepant cases are reviewed jointly, the rubric is refined, and calibration repeats.
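The calibration gate can be sketched as pairwise Cohen's κ = (p_o − p_e) / (1 − p_e), computed for every adjudicator pair and compared against the 0.70 threshold. Here p_o is observed agreement and p_e is the agreement expected by chance from each rater's marginal label frequencies. The function names are hypothetical. The protocol does not address the case where both raters use a single identical category, so returning NaN there (which fails the gate) is an assumption.

```python
import math
from itertools import combinations


def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters' labels on the same cases, in the same order."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(r1)
    categories = set(r1) | set(r2)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over labels
    p_e = sum((r1.count(c) / n) * (r2.count(c) / n) for c in categories)
    if p_e == 1.0:
        return math.nan  # undefined: both raters used one identical category
    return (p_o - p_e) / (1 - p_e)


def calibration_passes(ratings_by_adjudicator, threshold=0.70):
    """Return (passed, kappas); passes only if every adjudicator pair meets the threshold."""
    kappas = {
        (a, b): cohen_kappa(ratings_by_adjudicator[a], ratings_by_adjudicator[b])
        for a, b in combinations(sorted(ratings_by_adjudicator), 2)
    }
    # NaN compares False against the threshold, so an undefined kappa fails the gate
    passed = all(k >= threshold for k in kappas.values())
    return passed, kappas
```

With only 10 pilot cases, each pairwise κ is a coarse estimate, and a single discordant case can move it substantially.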
Adjudication is designed to fit alongside an active clinical schedule. Our time estimates are conservative; reviewers experienced with Medicare PA criteria typically speed up after their first 50 cases.
Eligibility
Required criteria are non-negotiable. Preferred qualifications strengthen your application, but their absence is not disqualifying.
What you receive
We are transparent about what participation involves — and what it offers. No monetary compensation is provided. The benefits below are professional and recognitional.
All adjudicators will be named in the supplementary acknowledgments of the primary manuscript, which is targeted to NEJM AI. Adjudicators who meet all four ICMJE criteria (substantial contribution to data acquisition, drafting or critical revision, final approval, and accountability) are eligible for co-authorship, which is discussed individually with the PI before submission.
The onboarding session and adjudication rubric provide expert-guided exposure to CMS LCD/NCD criteria, UnitedHealthcare MA policy, and BCBS MA coverage standards across imaging, DME, Part B drugs, specialty referrals, and PT/OT/home health.
After adjudication closes, you gain an early view of how agentic AI, an unassisted LLM, and a FHIR-standard rule-based system handle realistic PA complexity, before findings are published anywhere. It is a substantive, early perspective on technology that is actively reshaping primary-care workflows.
Upon study completion, the PI provides a formal letter documenting your adjudicator service. The letter is suitable for academic CVs, promotion files, and clinical research portfolios, and can support institutional research credit.
PACE-AI is a four-study program. Study 1b — a four-arm extension adding human clinic-staff comparators under a full human-subjects regulatory pathway — uses the same adjudication infrastructure. Study 1 adjudicators are prioritized for continued involvement.
The study protocol is under IRB review (approval pending) and follows ICH-GCP principles for data handling and adjudication. A data-handling agreement is provided before onboarding, and your participation is formally documented and auditable.
Application Process
We aim to complete screening within five business days of application and have all adjudicators trained before the case bank is finalized.
Complete the form below. The PI reviews every application personally; there is no algorithmic screening.
5–10 minutes
If you meet the criteria, you receive a personal email from the PI with a scheduling link for the onboarding call. If not, we explain why and, where appropriate, suggest a future opportunity.
Within 5 business days
A brief data-handling agreement is sent for e-signature. The patient records you review are fully synthetic (Synthea-generated, not de-identified real records), but this agreement is a protective formality consistent with GCP standards.
~10 minutes · e-sign
A video call with the PI covering the adjudication rubric, payer criteria reference materials, web-interface navigation, and the 10-case calibration run. Inter-rater reliability is computed after calibration.
1 hour · Zoom or Google Meet
Once κ ≥ 0.70 is confirmed, your queue is populated with the first batch of 100 cases. We expect most adjudicators to complete their assigned batches in 2–3 weeks of part-time work.
Months 3–4 of the study
Frequently Asked Questions
No. All cases use fully synthetic patient records generated with the Synthea open-source patient simulation platform. There is no real patient data anywhere in the case bank. The synthetic records are designed to be clinically plausible for Medicare-aged primary-care patients, not to represent any real individual.
No. Your role is to adjudicate the underlying PA cases using clinical judgment and payer coverage criteria, as you would in clinical practice. You are not evaluating AI outputs directly. The three arms process cases separately, you do not see their outputs, and they do not see your ratings.
The protocol specifies board certification in Family Medicine or Internal Medicine, matching the Medicare primary-care PA case population. Physicians with dual certification, a closely related primary board, or a subspecialty certification held through an FM or IM primary board (e.g., geriatric medicine) are encouraged to apply and will be reviewed individually.
Disagreements are expected and built into the protocol. When two adjudicators reach different determinations, the case is automatically flagged for a third-adjudicator tiebreaker. You are never asked to change your rating based on what another adjudicator decided. Independence is the entire scientific point.
Authorship follows the ICMJE criteria, all four of which must be met: substantial contribution to data acquisition or analysis, drafting or critical revision, final approval, and accountability. Co-authorship eligibility is discussed individually with the PI before submission. All adjudicators are named in the supplementary acknowledgments regardless of authorship status.
No. The study is funded by a sponsoring nonprofit, which also covers the Anthropic API costs for Arms A and B. We are transparent about this: what we offer is professional recognition, structured training in Medicare PA criteria, early access to the study's findings, and the opportunity to contribute to research addressing one of primary care's most pressing administrative burdens.
The calibration threshold exists to ensure adjudication quality, not to screen out good physicians. If κ is below 0.70 after the 10-case pilot, the PI schedules a brief discussion to review discrepant cases and refine the rubric — calibration is then repeated. The rubric is intentionally adjustable at this stage.
Yes. Participation is entirely voluntary. If you withdraw, notify the PI. Your completed cases are retained if you consent and removed from the dataset if you do not. We ask only that you let us know promptly so a replacement adjudicator can be onboarded if needed.
Arms A and B (the agentic AI arm and the unassisted-LLM arm) both use Anthropic's claude-haiku-4-5-20251001 via the Anthropic API. Anthropic has no role in study design, conduct, analysis, or publication decisions. API costs are funded by the sponsoring nonprofit. This will be disclosed in full in the manuscript at submission.
Apply
Complete the form below. The PI reviews every application personally and replies within five business days. Thorough answers in the specialty and motivation fields significantly speed up screening.