Research Preview

Monte Carlo Committee Simulation for HTA Prediction

A neurosymbolic framework that simulates multi-panelist deliberation to predict CDA-AMC drug reimbursement recommendations and their associated conditions.

Prospectively validated on 49 recommendations published after the model's training cutoff.
93.2% accuracy on confident predictions | AUROC 0.817 | Calibrated uncertainty


Validated Performance

Temporal external validation on CDA-AMC recommendations (n=49) published after the GPT-5 knowledge cutoff, so the outcomes could not have appeared in the model's training data.

Accuracy: 93.2% (on confident predictions)

AUROC: 0.817 (vs. 0.50 chance baseline)

ECE: 0.091 (expected calibration error)

Conditions: 86.3% (Hamming accuracy)
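Expected calibration error (ECE) is commonly computed by grouping predictions into confidence bins and averaging the gap between each bin's mean confidence and its observed accuracy, weighted by bin size. The page does not state the binning scheme behind the 0.091 figure, so the sketch below assumes 10 equal-width bins; the inputs are made-up illustrative values, not study data.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin size / N) * |accuracy - mean confidence|.

    confidences: probability assigned to the predicted outcome, each in [0, 1].
    correct: 1 if the prediction matched the actual recommendation, else 0.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins (an assumption); a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


# Illustrative values only (not from the validation set).
confs = [0.95, 0.92, 0.85, 0.62, 0.55]
hits = [1, 1, 1, 0, 1]
print(round(expected_calibration_error(confs, hits), 3))
```

Lower is better: an ECE of 0 means that, within every bin, the stated confidence equals the observed accuracy.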

Uncertainty-Aware Predictions

The Strength of Mandate metric stratifies predictions into confidence tiers, enabling selective prediction where users can trade coverage for accuracy.

High Mandate: 96.8% accuracy (63% of cases)

Contested: 84.6% accuracy (27% of cases)

Weak Mandate: 40.0% accuracy (10% of cases)

83.3% of errors occurred in Contested or Weak Mandate predictions, indicating that the uncertainty estimates flag most of the difficult cases.
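The tier figures can be cross-checked with simple arithmetic. The sketch below uses per-tier case counts back-calculated from the rounded percentages with n=49 (31, 13 and 5 cases); these integer counts are an inference, not values stated on this page. It walks the selective-prediction trade-off by admitting tiers from most to least confident. Under this inference, the High Mandate plus Contested subset gives 41/44 = 93.2%, which matches the headline "confident predictions" accuracy, and 5 of the 6 errors fall outside High Mandate.

```python
from fractions import Fraction

# (tier, cases, correct) back-calculated from the rounded percentages with n = 49.
# These integer counts are an inference, not reported values.
TIERS = [
    ("High Mandate", 31, 30),  # 63% of cases, 96.8% accuracy
    ("Contested", 13, 11),     # 27% of cases, 84.6% accuracy
    ("Weak Mandate", 5, 2),    # 10% of cases, 40.0% accuracy
]


def coverage_accuracy_curve(tiers):
    """Selective prediction: admit tiers from most to least confident and
    return (tier, coverage, accuracy) after each tier is added."""
    total = sum(cases for _, cases, _ in tiers)
    curve = []
    cases_so_far = correct_so_far = 0
    for name, cases, correct in tiers:
        cases_so_far += cases
        correct_so_far += correct
        curve.append((name, Fraction(cases_so_far, total), Fraction(correct_so_far, cases_so_far)))
    return curve


for name, cov, acc in coverage_accuracy_curve(TIERS):
    print(f"up to {name:<12}  coverage {float(cov):.1%}  accuracy {float(acc):.1%}")

# Where do the errors fall?
errors = {name: cases - correct for name, cases, correct in TIERS}
total_errors = sum(errors.values())
uncertain_errors = total_errors - errors["High Mandate"]
print(f"errors in Contested/Weak: {uncertain_errors}/{total_errors} = {uncertain_errors / total_errors:.1%}")
```

Dropping the Weak Mandate tier gives up roughly a tenth of the cases in exchange for the higher accuracy on the remainder.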

Neurosymbolic Architecture

Neural components (LLM panelists) perform evidence interpretation while symbolic components (voting rules, convergence criteria) provide calibrated uncertainty.

14 Persona-Conditioned Panelists

7 panelist types representing HTA committee expertise: clinicians, health economists, patient representatives, policy experts, and more.

Panelist types: CLIN, HE, PAT, POL, ITC, SRCLIN, GEN

Monte Carlo Sampling

Multiple deliberation rounds with stochastic sampling (temperature=1.0) generate probability distributions over outcomes.

Median rounds to convergence: 15

Maximum rounds: 50

Stability threshold: 3%
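The page does not spell out the convergence rule, so the sketch below is one plausible reading rather than the confirmed method. After each round, the running outcome distribution is updated. The loop stops once no outcome's probability has moved by 3 percentage points or more since the previous round, or after 50 rounds. `sample_round` is a hypothetical stand-in for one stochastic (temperature=1.0) committee deliberation; here it simply draws from fixed, made-up odds.

```python
import random
from collections import Counter

OUTCOMES = ("Reimburse", "RWC", "DNR")


def sample_round(rng):
    """Hypothetical stand-in for one stochastic committee deliberation.
    A real round would query the LLM panelists; this draws from made-up odds."""
    return rng.choices(OUTCOMES, weights=[0.05, 0.80, 0.15])[0]


def run_until_stable(sample, rng, max_rounds=50, threshold=0.03):
    """Accumulate round outcomes into a running distribution and stop when every
    outcome's probability changes by less than `threshold` between consecutive
    rounds (assumed stability rule), or when `max_rounds` is reached."""
    counts = Counter()
    previous = None
    for rounds in range(1, max_rounds + 1):
        counts[sample(rng)] += 1
        current = {o: counts[o] / rounds for o in OUTCOMES}
        if previous is not None:
            if max(abs(current[o] - previous[o]) for o in OUTCOMES) < threshold:
                return current, rounds
        previous = current
    return current, max_rounds


rng = random.Random(0)
dist, n_rounds = run_until_stable(sample_round, rng)
print(n_rounds, {o: round(p, 2) for o, p in dist.items()})
```

Under this particular rule, a single round can shift a running frequency by at most 1/t. The loop therefore always stops well before the 50-round cap. A stricter rule, for example requiring stability over several consecutive rounds, would run longer.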

Weighted Voting

Framework panelists (given the full CDA-AMC deliberative prompts) are weighted 2x relative to simplified panelists, reflecting their structured domain assessment.

Framework weight: 2.0

Simplified weight: 1.0

Effective committee: 21 votes (7 framework panelists × 2.0 + 7 simplified panelists × 1.0)
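A minimal sketch of a weighted tally, assuming each of the 14 panelists casts one vote and that framework and simplified panelists count 2.0 and 1.0 respectively, so a full committee carries 21 weighted votes. The roster, in which each of the 7 panelist types appears once in each form, and the `weighted_tally` helper are hypothetical illustrations, not the framework's confirmed implementation.

```python
from collections import defaultdict

FRAMEWORK_WEIGHT = 2.0
SIMPLIFIED_WEIGHT = 1.0
PANELIST_TYPES = ["CLIN", "HE", "PAT", "POL", "ITC", "SRCLIN", "GEN"]

# Hypothetical roster: each type appears once as a framework panelist (True)
# and once as a simplified panelist (False): 14 panelists, 21 weighted votes.
ROSTER = [(t, True) for t in PANELIST_TYPES] + [(t, False) for t in PANELIST_TYPES]


def weighted_tally(votes):
    """votes: list of (panelist_type, is_framework, outcome).
    Returns each outcome's share of the total weighted vote."""
    totals = defaultdict(float)
    for _, is_framework, outcome in votes:
        totals[outcome] += FRAMEWORK_WEIGHT if is_framework else SIMPLIFIED_WEIGHT
    grand_total = sum(totals.values())
    return {outcome: weight / grand_total for outcome, weight in totals.items()}


# Example: all 7 framework panelists and 4 simplified panelists vote RWC,
# the remaining 3 simplified panelists vote DNR.
votes = [(t, fw, "RWC") for t, fw in ROSTER[:11]] + [(t, fw, "DNR") for t, fw in ROSTER[11:]]
shares = weighted_tally(votes)
print(max(shares, key=shares.get), {o: round(s, 3) for o, s in shares.items()})
```

In this example RWC receives 18 of the 21 weighted votes. Under unweighted voting it would have received 11 of 14.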

Conditions Prediction

The first prospective prediction of specific reimbursement conditions, not just categorical outcomes, making the output actionable when preparing for formulary negotiations.

Hamming Accuracy: 86.3%

The proportion of individual condition categories predicted correctly. On average, 4.3 of 5 categories are correct per submission.

Subset Accuracy: 48.8%

The proportion of submissions for which all 5 categories are predicted correctly at once. This is a strict metric, because each submission has 32 possible condition combinations (2⁵).
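The two metrics differ only in the unit that is scored: single category labels versus whole 5-category vectors. A minimal sketch, assuming each submission's conditions are encoded as a 0/1 vector with one entry per category (1 = condition imposed). The category names and vectors below are hypothetical illustrations.

```python
# Hypothetical category order for each 0/1 condition vector (1 = condition imposed).
CATEGORIES = ("population", "prescriber_setting", "continuation", "economic", "evidence")


def _check(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equal-length, non-empty label lists")
    for vec in list(y_true) + list(y_pred):
        if len(vec) != len(CATEGORIES):
            raise ValueError(f"each vector needs {len(CATEGORIES)} entries")


def hamming_accuracy(y_true, y_pred):
    """Fraction of individual (submission, category) labels predicted correctly."""
    _check(y_true, y_pred)
    matches = sum(t == p for truth, pred in zip(y_true, y_pred) for t, p in zip(truth, pred))
    return matches / (len(y_true) * len(CATEGORIES))


def subset_accuracy(y_true, y_pred):
    """Fraction of submissions whose whole condition vector is exactly right."""
    _check(y_true, y_pred)
    exact = sum(tuple(truth) == tuple(pred) for truth, pred in zip(y_true, y_pred))
    return exact / len(y_true)


y_true = [(1, 1, 0, 1, 1), (1, 0, 1, 1, 1), (0, 1, 1, 1, 1)]
y_pred = [(1, 1, 0, 1, 1), (1, 1, 1, 1, 1), (1, 1, 0, 1, 1)]
print(hamming_accuracy(y_true, y_pred), subset_accuracy(y_true, y_pred))
```

Here 12 of 15 labels are right (Hamming 0.8), but only 1 of 3 vectors is exactly right (subset 0.33). When every submission is scored on all five categories, Hamming accuracy equals the mean of the per-category accuracies. This is consistent with the five category scores reported for the taxonomy averaging 86.3%.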

5-Category Condition Taxonomy

Population Restrictions: 90.2%

Prescriber/Setting: 75.6%

Continuation: 68.3%

Economic: 97.6%

Evidence: 100%

Per-category accuracy. Continuation Conditions achieved an AUROC of 0.896, showing strong discriminative ability in a category with enough cases of each class for AUROC to be informative.
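The class-balance caveat matters because AUROC ranks positive cases against negative ones. A category in which nearly every submission has the same label gives the metric little to rank, and one with a single class gives it nothing. A minimal sketch using the pairwise (Mann-Whitney) formulation, which returns `None` when only one class is present; the labels and scores are illustrative, not study data.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive case scores higher than
    a random negative case (ties count half). Returns None when only one class
    is present, because the metric is undefined without both classes."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have equal length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative: 1 = condition imposed, score = predicted probability of the condition.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(auroc([1, 1, 1], [0.9, 0.7, 0.8]))  # single class, so AUROC is undefined
```

The pairwise loop is quadratic, which is fine at the scale of a few dozen submissions.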

Prediction Outcomes

Reimburse: Positive recommendation without conditions. Rare in practice (0% of cases in the evaluation period).

RWC (Reimburse with Conditions): Most common outcome (92% of cases). Conditions include population restrictions, price reductions, and prescriber requirements.

DNR (Do Not Reimburse): Negative recommendation. The committee does not support public funding for this drug in this indication.

Temporal External Validation

Unlike most LLM validation studies, our evaluation addresses data contamination concerns. All 49 test recommendations were published after the GPT-5 knowledge cutoff (September 30, 2024), ensuring the system reasons from evidence rather than retrieving memorized outcomes.

Calibration set: Oct-Dec 2024 (n=18)

Test set: Jan-Dec 2025 (n=49)

Thresholds frozen before test evaluation

Ground truth from official CDA-AMC data
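The split protocol above amounts to a date filter with a guard that both evaluation windows start after the model cutoff. A minimal sketch, assuming each record carries a publication date. The record format, the exact day boundaries of the windows, and `split_by_date` are hypothetical illustrations.

```python
from datetime import date

# Cutoff and windows as stated on this page; exact day boundaries are assumed.
MODEL_CUTOFF = date(2024, 9, 30)
CALIBRATION_WINDOW = (date(2024, 10, 1), date(2024, 12, 31))
TEST_WINDOW = (date(2025, 1, 1), date(2025, 12, 31))


def split_by_date(records):
    """Partition (record_id, published_date) pairs into calibration and test sets.
    Records outside both windows, including any published before the cutoff, are dropped."""
    for start, _ in (CALIBRATION_WINDOW, TEST_WINDOW):
        if start <= MODEL_CUTOFF:
            raise ValueError("evaluation windows must start after the model cutoff")
    calibration = [r for r in records if CALIBRATION_WINDOW[0] <= r[1] <= CALIBRATION_WINDOW[1]]
    test = [r for r in records if TEST_WINDOW[0] <= r[1] <= TEST_WINDOW[1]]
    return calibration, test


records = [("a", date(2024, 9, 1)), ("b", date(2024, 11, 5)), ("c", date(2025, 3, 2))]
calibration, test = split_by_date(records)
print(calibration, test)
```

Any tier thresholds would be tuned on `calibration` only and then applied unchanged to `test`.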

Research Preview Access

This tool is currently available by invitation only for research collaborators and pharmaceutical sponsors participating in validation studies.

Monte Carlo Committee Simulation is a research tool for forecasting HTA outcomes.
Predictions should inform strategic preparation, not replace formal regulatory processes.