The rise of continually-learning SaMD (software as a medical device) challenges traditional trial models and regulatory frameworks, demanding a clear blueprint for safe model updates, pre-specified retraining triggers, and hybrid evidence generation in adaptive clinical trials. This article offers a practical, regulator-ready roadmap—combining risk management, operational controls, and statistical design—to help sponsors and developers run “trial by algorithm” programs that keep patient safety central while enabling learning systems to evolve.
Why continually-learning SaMD needs a new playbook
Unlike locked algorithms, continually-learning SaMD adapts over time as it ingests new data. That ability can improve clinical performance, personalize care, and accelerate innovation, but it also introduces dynamic risk: performance may drift, bias may emerge, or unknown interactions may surface when models update in the real world. Adaptive clinical trials that incorporate model retraining must therefore embed pre-specified rules and oversight mechanisms so updates are transparent, auditable, and safe.
Core regulatory expectations
- Transparency and traceability: Regulators expect change control (who changed what, when, and why), versioning, and clear audit trails for data, training pipelines, and model artifacts.
- Pre-specified change protocols: Define an Algorithm Change Protocol (ACP), typically within a Predetermined Change Control Plan (PCCP), describing allowed update types, triggers, and validation requirements.
- Risk-based validation: Higher-risk functions require more extensive prospective evaluation and may need clinical data before deployment.
- Post-market surveillance: Continuous monitoring plans and real-world evidence (RWE) strategies must be in place to detect safety signals quickly.
Designing an Algorithm Change Protocol (ACP)
An ACP formalizes how and when a SaMD model may change. It should be part of the trial master file and included in regulatory submissions. Key elements include the following (a minimal configuration sketch follows the list):
- Change taxonomy: Distinguish minor operational updates (e.g., retraining on more labeled examples within the same distribution) from major architecture or label definition changes.
- Pre-specified triggers: Define objective triggers for retraining or deployment (see next section).
- Validation gates: Quantitative thresholds for safety and efficacy that must be met before a new model is promoted.
- Rollback plan: How to revert to a previous model on detection of harm.
- Governance: Roles for data scientists, clinicians, quality assurance, and an independent safety board.
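To make these elements auditable, some teams capture the ACP as a structured, machine-readable artifact alongside the narrative protocol. The Python sketch below is one hypothetical way to do that; the field names, metrics, thresholds, and version strings are illustrative assumptions, not a prescribed regulatory format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

class ChangeType(Enum):
    MINOR_RETRAIN = "retrain_same_distribution"    # more labeled data, same architecture
    MAJOR_CHANGE = "architecture_or_label_change"  # falls outside the ACP's allowed scope

@dataclass
class ValidationGate:
    metric: str          # e.g. "sensitivity", "auc", "calibration_slope"
    threshold: float     # minimum acceptable value on the external holdout set

@dataclass
class AlgorithmChangeProtocol:
    allowed_changes: List[ChangeType]
    retraining_triggers: List[str]     # names of pre-specified trigger definitions
    validation_gates: List[ValidationGate]
    rollback_model_version: str        # version to revert to if safety metrics fail post-deployment
    approvers: List[str]               # roles that must sign off before promotion

# Illustrative instance for a hypothetical triage model
acp = AlgorithmChangeProtocol(
    allowed_changes=[ChangeType.MINOR_RETRAIN],
    retraining_triggers=["input_distribution_shift", "performance_degradation",
                         "edge_case_accumulation"],
    validation_gates=[ValidationGate("sensitivity", 0.92), ValidationGate("auc", 0.90)],
    rollback_model_version="v1.4.2",
    approvers=["clinical_lead", "quality_assurance", "independent_safety_board"],
)
```

Keeping the ACP in a versioned, structured form makes it easier to show regulators exactly which rules were in force when a given model version was promoted.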
Pre-specified retraining triggers: objective and auditable
Retraining should not be ad hoc. Triggers should be measurable, clinically meaningful, and tied to patient safety or performance goals; a drift- and degradation-check sketch follows the list below. Common classes of triggers include:
- Data distribution shift: Statistical drift in input features or covariate distribution beyond pre-defined bounds (e.g., KL divergence, population shift metrics).
- Performance degradation: Drop in sensitivity/specificity, AUC, calibration, or other clinical performance metrics on sentinel or holdout data below pre-specified limits.
- New clinical subpopulation: Sufficient influx of previously underrepresented groups that could change model behavior and require retraining or revalidation.
- Edge-case accumulation: Discovery of clinically important failure modes or corrected labels that accumulate past a threshold count.
- Regulatory or safety events: Any adverse event or complaint that implicates algorithm performance.
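For the distribution-shift, performance-degradation, and edge-case triggers above, the checks can be implemented as a small, auditable function run on a monitoring schedule. The Python sketch below uses a binned KL divergence and simple counting rules; the metric choices and the 0.10, 0.05, and 50 limits are illustrative assumptions that a real ACP would justify and pre-register.

```python
import numpy as np

def kl_divergence(reference, current, bins=20, eps=1e-9):
    """Binned KL divergence D(current || reference) for a single input feature."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    current = np.clip(current, edges[0], edges[-1])   # keep out-of-range values in the edge bins
    p = np.histogram(current, bins=edges)[0].astype(float)
    q = np.histogram(reference, bins=edges)[0].astype(float)
    p = (p + eps) / (p.sum() + bins * eps)            # smoothed relative frequencies
    q = (q + eps) / (q.sum() + bins * eps)
    return float(np.sum(p * np.log(p / q)))

def evaluate_triggers(reference_feature, current_feature,
                      baseline_sensitivity, current_sensitivity,
                      corrected_false_negatives,
                      kl_limit=0.10, sensitivity_drop_limit=0.05, fn_limit=50):
    """Return which pre-specified triggers fired; all limits here are illustrative."""
    fired = []
    if kl_divergence(reference_feature, current_feature) > kl_limit:
        fired.append("input_distribution_shift")
    if baseline_sensitivity - current_sensitivity > sensitivity_drop_limit:
        fired.append("performance_degradation")
    if corrected_false_negatives >= fn_limit:
        fired.append("edge_case_accumulation")
    return fired
```

Logging the inputs and outputs of every evaluation run provides the audit trail that regulators expect for trigger decisions.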
How to set thresholds
Use historical simulation, holdout datasets, and worst-case scenario analysis to set thresholds that balance sensitivity to true problems against the false-alarm rate. Incorporate clinical input to ensure metric changes are clinically meaningful, not just statistically detectable.
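One way to ground these choices is to replay candidate thresholds against stable historical monitoring data and estimate how often each would fire by chance. The Python sketch below does this with a simple bootstrap; the window size, the simulated history, and the candidate thresholds are assumptions for illustration, and clinical review still decides which false-alarm rate is acceptable.

```python
import numpy as np

def false_alarm_rate(historical_sensitivities, drop_threshold,
                     window=30, n_sims=10_000, seed=0):
    """Estimate how often a 'sensitivity dropped by more than drop_threshold' alarm
    would fire purely by chance, by resampling stable historical monitoring data."""
    rng = np.random.default_rng(seed)
    history = np.asarray(historical_sensitivities, dtype=float)
    baseline = history.mean()
    resampled = rng.choice(history, size=(n_sims, window), replace=True)
    drops = baseline - resampled.mean(axis=1)
    return float((drops > drop_threshold).mean())

# Illustrative use: 52 weekly sensitivity estimates from a stable historical period
history = np.random.default_rng(1).normal(loc=0.93, scale=0.01, size=52)
for t in (0.02, 0.03, 0.05):
    print(f"alarm at drop > {t:.2f}: estimated false-alarm rate {false_alarm_rate(history, t):.3f}")
```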
Operationalizing safe model updates
Operational controls translate policy into practice. The following components are essential:
- Versioning and provenance: Immutable artifacts with timestamps, model hashes, data snapshots, and training pipeline logs (a provenance-logging sketch follows this list).
- Shadow deployment: Run candidate models in parallel (“shadow mode”) to collect prospective performance data without affecting patient care.
- Automated validation pipelines: Continuous integration for models that runs unit tests, fairness checks, clinical metric evaluation, security scans, and simulated rollouts.
- Independent review: Pre-deployment sign-off by a cross-functional committee including clinicians and quality management system (QMS) representatives.
- Real-time monitoring: Telemetry for input distributions, outputs, and outcome linkage to detect early degradation.
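As an example of the versioning and provenance item, the sketch below records a content hash of the model artifact and the training data snapshot together with pipeline metadata. The record structure, file layout, and function names are assumptions; in practice the record would live inside the QMS and a model registry rather than loose JSON files.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256_of(path: str) -> str:
    """Content hash so the exact artifact can be re-identified later."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_provenance_record(model_path: str, data_snapshot_path: str,
                            model_version: str, pipeline_commit: str,
                            out_dir: str = "provenance") -> Path:
    """Write an append-only JSON record linking model, data, and pipeline versions."""
    record = {
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "model_sha256": sha256_of(model_path),
        "data_snapshot_sha256": sha256_of(data_snapshot_path),
        "training_pipeline_commit": pipeline_commit,
    }
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    path = out / f"{model_version}.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```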
Hybrid evidence generation: combining adaptive trials and RWE
Adaptive clinical trials offer efficient, ethical ways to evaluate both the device and its learning process. Pairing them with hybrid evidence generation—randomized adaptive arms plus RWE—creates continuous learning loops that regulators increasingly accept.
Design options
- Bayesian adaptive trials: Use Bayesian updating to allow interim model evaluation and allocation adjustments as evidence accrues (a worked interim-analysis sketch follows this list).
- Platform trials: Evaluate multiple model versions or competing algorithms under a shared control arm, enabling efficient head-to-head comparisons.
- Registry-linked surveillance: Link trial participants and general-use deployments to registries for long-term outcomes and safety monitoring.
- Pragmatic post-market studies: Use real-world datasets (electronic health records, claims, registries) to assess generalizability and rare adverse events after updates.
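For the Bayesian adaptive option, interim decisions often reduce to a posterior probability compared against pre-specified boundaries. The sketch below is a minimal Beta-Binomial illustration comparing a candidate-model arm with a control arm; the counts, flat priors, and the 0.975/0.10 boundaries are assumptions chosen for demonstration, not recommended operating characteristics.

```python
import numpy as np

def posterior_prob_superior(successes_a, n_a, successes_b, n_b,
                            prior_alpha=1.0, prior_beta=1.0, draws=100_000, seed=0):
    """P(rate_A > rate_B | data) under independent Beta-Binomial models."""
    rng = np.random.default_rng(seed)
    post_a = rng.beta(prior_alpha + successes_a, prior_beta + n_a - successes_a, draws)
    post_b = rng.beta(prior_alpha + successes_b, prior_beta + n_b - successes_b, draws)
    return float(np.mean(post_a > post_b))

# Illustrative interim look: candidate-model arm vs control arm
p_sup = posterior_prob_superior(successes_a=84, n_a=120, successes_b=70, n_b=118)
if p_sup > 0.975:          # pre-specified efficacy boundary (illustrative)
    decision = "promote candidate / adjust allocation"
elif p_sup < 0.10:         # pre-specified futility boundary (illustrative)
    decision = "drop candidate arm"
else:
    decision = "continue enrolment"
print(f"P(candidate better than control) = {p_sup:.3f} -> {decision}")
```

In a real design, the boundaries would be calibrated by simulation to control error rates and documented in the statistical analysis plan.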
Risk management and ethical considerations
Maintain patient safety and trust by embedding risk management, transparency, and informed consent into every phase:
- Labeling and user guidance: Document the adaptive behavior of the SaMD, intended use, and known limitations.
- Informed consent: Clearly explain to trial participants (and where appropriate, users) that the algorithm may adapt and what safeguards are in place.
- Bias and fairness monitoring: Regular subgroup analyses and fairness metrics as part of validation gates (a subgroup-check sketch follows this list).
- Cybersecurity and data privacy: Secure pipelines for retraining data and ensure de-identification where needed.
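For the bias and fairness item, one simple, auditable gate is a per-subgroup sensitivity comparison against the overall value. The sketch below assumes binary labels and predictions plus a subgroup identifier per case, with at least one positive case overall; the 5-percentage-point gap limit is an illustrative assumption.

```python
import numpy as np

def subgroup_sensitivity_gaps(y_true, y_pred, groups, max_gap=0.05):
    """Compute sensitivity per subgroup and flag gaps from the overall value
    that exceed a pre-specified limit (max_gap here is illustrative)."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    positives = y_true == 1                    # assumes at least one positive case
    overall = float((y_pred[positives] == 1).mean())
    report = {}
    for g in np.unique(groups):
        mask = positives & (groups == g)
        if mask.sum() == 0:
            continue                           # no positives in this subgroup to evaluate
        sens = float((y_pred[mask] == 1).mean())
        report[str(g)] = {"sensitivity": sens,
                          "gap": overall - sens,
                          "flag": bool(overall - sens > max_gap)}
    return overall, report
```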
Checklist for sponsors and developers
- Draft and include a comprehensive Algorithm Change Protocol in submissions.
- Define measurable retraining triggers and validation thresholds using historical simulation.
- Implement immutable versioning, shadow deployments, and automated validation pipelines.
- Create governance and independent review procedures; document rollbacks and emergency stop criteria.
- Plan hybrid evidence generation: adaptive trials + registries + RWE strategies.
- Publish labeling and informed consent materials that describe continual learning behavior.
Practical example
Imagine a chest X‑ray triage SaMD deployed in multiple hospitals. Pre-specified triggers might include a 5% absolute drop in sensitivity on sentinel hospitals, a demographic shift where >10% of cases come from a previously underrepresented age group, or accumulation of 50 manually corrected false negatives. When a trigger fires, the ACP requires shadow retraining, clinical validation on an external holdout set, independent safety committee review, and only then a staged rollout with close telemetry and an automatic rollback if safety metrics cross thresholds.
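The final step of that workflow, automatic rollback when safety metrics cross thresholds, can be expressed as a small guard evaluated on live telemetry during the staged rollout. The sketch below is hypothetical; the telemetry keys, the 0.90 sensitivity floor, and the 12-per-1,000 false-negative cap are assumptions, not values drawn from the example or from any guidance.

```python
def rollback_decision(telemetry, min_sensitivity=0.90, max_fn_per_1000=12):
    """Decide whether the staged rollout continues or reverts to the prior model.
    Thresholds are illustrative assumptions, not recommendations."""
    reasons = []
    if telemetry["sentinel_sensitivity"] < min_sensitivity:
        reasons.append(f"sentinel sensitivity {telemetry['sentinel_sensitivity']:.2f} "
                       f"below floor {min_sensitivity:.2f}")
    if telemetry["false_negatives_per_1000"] > max_fn_per_1000:
        reasons.append(f"false negatives per 1000 studies "
                       f"{telemetry['false_negatives_per_1000']} above cap {max_fn_per_1000}")
    return ("rollback", reasons) if reasons else ("continue", [])

# Illustrative telemetry snapshot from the first rollout stage
action, reasons = rollback_decision({"sentinel_sensitivity": 0.88,
                                     "false_negatives_per_1000": 9})
print(action, reasons)   # sensitivity below the floor, so the guard recommends rollback
```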
Conclusion
Continually-learning SaMD can deliver substantial clinical benefits when governed by a thoughtful regulatory and operational blueprint: pre-specified retraining triggers, robust validation gates, transparent change control, and hybrid evidence generation form the backbone of safe adaptive clinical trials. By combining rigorous design, cross-functional governance, and ongoing real-world surveillance, sponsors can enable learning systems that improve over time without compromising patient safety.
Ready to operationalize your Algorithm Change Protocol and adaptive trial design? Contact a regulatory expert to map these principles to your product and jurisdiction.
