AI-Generated Control Arms in Clinical Trials: Evidence, Opportunities, and Strategic Considerations

Authored by

Nageatte Ibrahim

Date Released

March 26, 2026

Clinical trials are undergoing structural change. As therapies become more targeted and patient populations more fragmented, the traditional randomized controlled trial (RCT) is increasingly difficult to execute. This is particularly true in oncology, rare diseases, and precision medicine, where eligible patients may number in the dozens rather than the thousands. ¹

In response, regulators, sponsors, and researchers are exploring new trial designs that rely on external or synthetic control arms, which are comparators built from historical or real-world data instead of newly enrolled patients. ²,³

These approaches rely on advanced statistical methods, and emerging work is investigating the role of artificial intelligence in their construction. ¹,⁴

Synthetic control arms are no longer purely experimental. They have already supported multiple regulatory decisions, particularly in rare diseases and oncology, although their use remains context-dependent and methodologically sensitive. ²,⁵

Academic analyses suggest that, under certain conditions, outcomes derived from external control groups can approximate those observed in randomized trials. ⁶

However, the method introduces important scientific, operational, and regulatory considerations. For sponsors, the key question is not simply whether synthetic control arms are viable, but when they are appropriate and how to implement them rigorously. ²

What Is a Synthetic or AI-Generated Control Arm?

In a traditional randomized trial, patients are prospectively assigned to either an experimental treatment or a control arm receiving placebo or standard of care. Randomization balances patient characteristics across groups, allowing investigators to isolate the treatment effect.²

In trials using a synthetic or external control arm, the comparator group is constructed from existing datasets rather than newly enrolled participants. These datasets may include:²

Historical clinical trials
Electronic health records
Disease registries
Claims databases
Other real-world evidence sources

Statistical methods, and in some cases machine-learning approaches, are used to identify patients in these datasets who closely match the characteristics of participants in the experimental arm. ⁴

The outcomes of these matched patients are then used as the comparator for the investigational therapy.⁶
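
The matching step described above can be illustrated with a minimal sketch. It assumes greedy 1:1 nearest-neighbor matching on standardized covariates with a distance caliper, which is only one of several possible approaches and is not drawn from any specific trial. The function names, the caliper value, and the toy covariates (age and ECOG performance status) are hypothetical.

```python
import math

def standardize(rows):
    """Scale each covariate column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def match_controls(trial, external, caliper=1.0):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Returns (trial_index, external_index) pairs. A trial patient stays
    unmatched when no unused external patient lies within `caliper`.
    """
    scaled = standardize(list(trial) + list(external))
    t_scaled, e_scaled = scaled[:len(trial)], scaled[len(trial):]
    used, pairs = set(), []
    for i, t in enumerate(t_scaled):
        best, best_d = None, caliper
        for j, e in enumerate(e_scaled):
            if j in used:
                continue
            d = math.dist(t, e)  # Euclidean distance on standardized covariates
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs

# Hypothetical covariates: (age in years, ECOG performance status)
trial = [(62, 1), (48, 0)]
external = [(80, 2), (61, 1), (47, 0), (35, 0)]
pairs = match_controls(trial, external)
print(pairs)  # [(0, 1), (1, 2)]
```

In practice the covariate set, distance metric, and caliper would be prespecified in the analysis plan, and balance between the matched groups would be checked before any outcomes are compared.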

The objective is to estimate treatment effect without enrolling a contemporaneous control group, in settings where randomization may be impractical, unethical, or infeasible, while maintaining a level of scientific rigor acceptable to regulators. ²,⁴

Why the Industry Is Moving Toward Data-Based Control Arms

Randomized controlled trials remain the gold standard for demonstrating efficacy and safety. Yet several structural forces are making traditional RCTs harder to conduct in many therapeutic areas.²

Precision medicine has divided common diseases into numerous molecular subtypes. In oncology, therapies are increasingly targeted to patients with specific genomic alterations. This leads to small, highly specific patient populations that may be geographically dispersed. ¹

At the same time, ethical considerations have intensified. Patients with life-threatening conditions may be unwilling to enroll in trials where they risk receiving placebo or outdated therapies. As a result, recruitment into control arms can be challenging. ²

Rare diseases present an even greater challenge. With patient populations sometimes numbering only a few hundred worldwide, randomized trials may be impractical. In these settings, regulators have shown increasing openness to externally controlled designs. ²,⁵

Academic analyses confirm this shift. A review of regulatory submissions found that a subset of therapies, particularly in rare diseases and oncology, relied on synthetic or external control evidence. In several cases, this evidence supported either initial approval or label expansion. ⁵

Evidence From Peer-Reviewed Research

The credibility of synthetic control arms depends on whether they can reproduce results similar to those seen in randomized trials. Recent peer-reviewed studies provide mixed but encouraging evidence: ²,⁶

A 2025 systematic review in oncology found that 6 of 8 comparisons showed similar survival outcomes between real-world external controls and randomized trial control groups.
An analysis of 180 externally controlled trials reported that:
4% used real-world clinical data
37.2% used prior clinical trial data
Only 16.1% prespecified their external control methods
Only 7.8% formally assessed external data quality

These findings highlight both the promise and the methodological variability of externally controlled trials. Reviews in high-impact journals consistently emphasize that the reliability of synthetic controls depends on data quality, matching methods, confounding adjustment, and transparent, prespecified analysis plans. ²,⁴

How AI and Advanced Analytics Enable Synthetic Controls

Early externally controlled trials relied on relatively simple statistical approaches, such as direct cohort matching or basic regression adjustments. While these methods provided a starting point, they were often limited in their ability to account for the complex, multidimensional differences between patients in clinical trials and those represented in historical or real-world datasets. ⁴

Advances in artificial intelligence and data science have significantly expanded the analytical capabilities available for constructing synthetic control arms. Modern approaches can incorporate large numbers of patient variables simultaneously, including demographic characteristics, disease severity markers, comorbidities, treatment histories, laboratory values, and genomic data. Rather than relying on one-to-one matching based on a small set of characteristics, machine-learning models can identify patterns across hundreds or thousands of variables, creating more nuanced and statistically balanced comparisons. ¹,⁴

In many cases, these methods are used to estimate the probability that a given patient would have received the investigational therapy or the standard of care. This probability is then used to weight or match patients across datasets, creating a comparator group that more closely resembles the experimental cohort. Bayesian hierarchical models and other advanced statistical frameworks can also be used to combine information from multiple data sources, including clinical trials and real-world evidence, while accounting for uncertainty and heterogeneity. ³,⁴
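
The probability-based weighting described above is commonly implemented with propensity scores. The sketch below is illustrative only. It assumes a single standardized covariate, a logistic regression fitted by plain gradient descent, and odds weights p / (1 - p) for external patients, which reweight the external cohort toward the trial population. The function names, learning rate, and toy data are hypothetical; a real analysis would normally use a validated statistical package.

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    Returns [intercept, coefficient_1, ...].
    """
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def propensity(w, x):
    """Estimated probability that a patient with covariates x is in the trial."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def att_weights(X, in_trial):
    """Weight 1 for trial patients, odds p / (1 - p) for external patients."""
    w = fit_logistic(X, in_trial)
    weights = []
    for x, t in zip(X, in_trial):
        p = propensity(w, x)
        weights.append(1.0 if t else p / (1.0 - p))
    return weights

# Hypothetical standardized covariate (for example, age); 1 = trial, 0 = external.
X = [[-1.0], [-1.0], [-0.5], [-0.5], [0.0], [0.0], [0.5], [0.5], [1.0], [1.0]]
in_trial = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
weights = att_weights(X, in_trial)

# Weighted covariate mean of the external cohort (unweighted mean is -0.4,
# trial mean is 0.4 in this toy data).
ext = [(x[0], wt) for x, wt, t in zip(X, weights, in_trial) if not t]
weighted_mean = sum(x * wt for x, wt in ext) / sum(wt for _, wt in ext)
print(round(weighted_mean, 3))
```

External patients who resemble trial participants receive larger weights, so the weighted external cohort shifts toward the trial population. Weighting can only balance the covariates that were actually recorded, which is why unmeasured confounding remains the central limitation.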

Machine learning is particularly valuable in modeling disease progression. Survival models and predictive algorithms can estimate how patients in the synthetic control arm would have fared under standard treatment, even when their clinical trajectories differ from those of trial participants. These models are especially useful in oncology and rare diseases, where patient characteristics and treatment responses can vary widely. ¹
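
As a point of reference for the survival modeling mentioned above, the sketch below implements the Kaplan-Meier estimator, a standard nonparametric building block for summarizing time-to-event outcomes in a control cohort. It is not a machine-learning model and is not presented as the method used in any cited study. The function name and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times[i]  : follow-up time for patient i
    events[i] : 1 if the event was observed at times[i], 0 if censored
    Returns [(time, survival probability)] at each distinct event time.
    """
    at_risk = len(times)
    surv, curve = 1.0, []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

# Hypothetical follow-up in months for six external control patients.
times = [2, 3, 3, 5, 8, 8]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
# 2 0.833
# 3 0.667
# 5 0.444
# 8 0.222
```

Censored patients leave the risk set without counting as events, which is what allows irregular real-world follow-up to be used without treating incomplete observation as survival or death.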

Despite these technological advances, academic literature consistently emphasizes that artificial intelligence does not eliminate the fundamental risks associated with non-randomized comparisons. Model outputs remain highly dependent on the quality, completeness, and representativeness of the underlying data. If important confounding variables are missing or poorly recorded, even the most sophisticated algorithms cannot fully correct for bias. ²,⁴

For this reason, regulators and methodological experts stress the importance of transparency, prespecified analytical strategies, and rigorous validation. AI-enabled synthetic controls are best viewed as an extension of established causal inference methods rather than a replacement for them. Their credibility depends not only on the algorithms used, but also on careful trial design, high-quality data sources, and clear regulatory alignment. ²,⁴

Regulatory Experience With Synthetic Control Arms

Regulatory agencies have taken a cautious but increasingly pragmatic stance toward synthetic control evidence.⁵

Both the U.S. Food and Drug Administration and the European Medicines Agency have accepted externally controlled evidence in specific situations, particularly in rare diseases, life-threatening conditions, and settings where randomization is impractical or unethical.⁵

Several regulatory decisions have incorporated such approaches. In some rare diseases, therapies have been approved based on single-arm trials supported by external control data. In oncology, external datasets have also contributed to certain label expansions.⁵

Nevertheless, regulators consistently emphasize that randomized trials remain the preferred standard whenever feasible. Synthetic control arms are generally considered acceptable only when strong justification exists and when external data are robust and comparable to trial populations.²

Advantages of Synthetic Control Arms

The appeal of externally controlled designs is rooted in a combination of ethical, operational, and economic factors. As drug development shifts toward increasingly specialized therapies, particularly in oncology and rare diseases, traditional randomized control arms can become difficult to implement.²

One potential advantage is the ability to reduce the number of patients required for a trial. In rare diseases or highly stratified oncology populations, eligible patients may be extremely limited, and external controls may enable comparative analyses without enrolling large numbers of control patients.²

Externally controlled designs may also accelerate trial timelines by reducing or eliminating the need for a contemporaneous control cohort, although this effect varies by study.²

Ethical considerations can also play a role. In severe or progressive diseases, some patients may be reluctant to enroll in trials where they could receive placebo or outdated therapies, and external controls may help address these concerns.²

From an operational perspective, smaller trials typically translate into lower development costs. Fewer enrolled patients mean reduced site management, monitoring, and data collection expenses. In some cases, this can substantially decrease the overall cost of bringing a therapy to market. ²

These advantages have made synthetic control arms particularly relevant in rare diseases, precision oncology, and other settings where traditional randomized trials face significant constraints. However, these benefits must be weighed against the scientific and regulatory challenges inherent in non-randomized comparisons.⁴

Major Scientific and Operational Concerns

Despite their potential, synthetic control arms present significant challenges. The most important concern is bias. Without randomization, treated and control patients may differ in subtle but important ways, and these differences can influence outcomes. ²

Academic studies highlight several forms of bias that may occur, including selection bias, survivor bias, and measurement bias resulting from differences between clinical trial data and real-world datasets. These biases can exaggerate or obscure treatment effects if not carefully controlled. ²

Data quality is another critical issue. Real-world datasets often contain missing values, inconsistent variable definitions, and irregular follow-up intervals. These limitations complicate matching and outcome comparisons. ⁴

A recent academic analysis found that only a small fraction of externally controlled trials formally assessed the quality of their external data sources. This suggests that methodological rigor varies widely across studies and underscores the need for standardized approaches. ²

Key Advantages and Risks at a Glance

Advantages ²
Reduced sample size requirements
Faster recruitment and trial completion
Improved patient willingness to enroll
Feasibility in rare or highly stratified diseases
Lower operational costs

Key risks ²,⁴
Confounding and selection bias
Incomplete or inconsistent real-world data
Lack of methodological standardization
Regulatory uncertainty in some indications

Where Synthetic Control Arms Are Used Most

The use of external control arms is concentrated in specific therapeutic areas. ²

Oncology is the most active field, driven by the rise of precision medicine and molecularly defined patient subgroups. ²

Rare diseases represent another major area of adoption, where limited patient populations make randomization difficult. Pediatric and life-threatening conditions also frequently rely on externally controlled designs, particularly when placebo use raises ethical concerns. ²,⁵

Hybrid Trial Designs: The Emerging Middle Ground

While some trials replace control arms entirely, many sponsors are adopting hybrid approaches. In a hybrid design, a smaller randomized control arm is included, and external real-world data are used to supplement the comparator group. Statistical methods then combine evidence from both sources. ³

This approach preserves the benefits of randomization while reducing sample size requirements. Hybrid designs are particularly useful in low-prevalence diseases, early-phase oncology trials, and situations where partial randomization is feasible. ³
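
One way to combine a small randomized control arm with external data is a power prior, sketched below for a binary endpoint. This is an illustrative assumption rather than a method named in the article: the external data are down-weighted by a fixed discount factor a0 between 0 (ignore external data) and 1 (pool fully), and a Beta(1, 1) prior is used. The function names, the value of a0, and the response counts are hypothetical.

```python
def power_prior_posterior(x_rand, n_rand, x_ext, n_ext, a0, prior=(1.0, 1.0)):
    """Beta posterior for the control response rate under a fixed power prior.

    x_rand / n_rand : responders / patients in the randomized control arm
    x_ext / n_ext   : responders / patients in the external control data
    a0              : discount in [0, 1] applied to the external data
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    alpha = prior[0] + x_rand + a0 * x_ext
    beta = prior[1] + (n_rand - x_rand) + a0 * (n_ext - x_ext)
    return alpha, beta

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical data: 6 of 20 randomized controls and 40 of 100 external
# controls responded; external data are counted at half weight.
alpha, beta = power_prior_posterior(6, 20, 40, 100, a0=0.5)
print(alpha, beta, beta_mean(alpha, beta))  # 27.0 45.0 0.375
```

With a0 = 0.5, the 100 external patients contribute about as much information as 50 randomized controls. The choice of a0 drives how much the external data can pull the estimate, so it would normally be justified and prespecified, or replaced by a method that adapts the discount to the observed agreement between the two sources.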

When Synthetic Control Arms Are Most Appropriate

Based on academic literature and regulatory experience, synthetic control arms perform best under specific conditions. They are most appropriate when the disease has a well-characterized natural history, standard-of-care outcomes are predictable, high-quality external data are available, and randomization is impractical or unethical. ³,⁴

They are less suitable in common diseases where large randomized trials are feasible, in situations with heterogeneous or poorly characterized real-world data, or when expected treatment effects are small. ⁴

Strategic Considerations for Sponsors

For sponsors, synthetic control arms offer a powerful but complex tool. Their success depends less on the technology itself and more on the strategic and methodological decisions surrounding it. ²

Sponsors must carefully evaluate whether an externally controlled design is appropriate, assess the quality and relevance of available data sources, and ensure that the proposed approach will be acceptable to regulators. ⁴

Early engagement with regulatory agencies is often critical. Authorities typically expect clear justification for the externally controlled design, transparent analytical methods, and evidence that external data are comparable to trial populations. ²

The Future of AI-Enabled Control Arms

Interest in AI-driven synthetic control arms is expected to grow as analytical methods evolve and regulatory guidance develops. Hybrid designs and the integration of multiple real-world data sources may become more common in certain development scenarios, although their use will likely remain context-dependent. ²,⁶

Conclusion: A Strategic Complement to Randomization

Synthetic control arms represent one of the most significant methodological innovations in modern clinical development. They offer clear advantages in rare diseases, precision oncology, and ethically complex settings. ²

Evidence from high-impact academic journals suggests that, when carefully designed, externally controlled trials can produce results comparable to randomized controls. ⁶

However, concerns about bias, data quality, and methodological rigor remain central. ²,⁴

For sponsors, the challenge is not simply adopting synthetic control arms, but deploying them strategically and responsibly. Success depends on high-quality data, robust analytical methods, early regulatory alignment, and experienced clinical development strategy. ²

When these elements are in place, AI-enabled control arms can accelerate development while maintaining scientific credibility. ⁴

Key Takeaways

External or synthetic control arms use historical clinical trial data or real-world datasets (such as electronic health records, registries, or claims data) instead of newly enrolled control patients.
In a study of 180 externally controlled trials, 4% of external control arms were derived from real-world clinical data, and 37.2% were derived from prior clinical trial datasets.
The same analysis found that only 16.1% of trials prespecified external control methods, and only 7.8% formally assessed the quality of external data before analysis.
A systematic review in oncology reported that 6 of 8 studies showed similar survival outcomes between real-world external control arms and randomized trial control groups.
Randomized controlled trials remain the preferred standard for demonstrating treatment efficacy, and external control arms are generally used when randomization is difficult or impractical, such as in rare diseases or highly stratified oncology populations.
Academic literature consistently reports that non-randomized comparisons carry risks of bias, including selection bias, confounding, and data quality limitations, which must be addressed through rigorous study design and analysis.

About Arc Nouvel Clinical Development Consulting

Arc Nouvel Clinical Development Consulting LLC is a boutique consulting firm dedicated to delivering comprehensive clinical development solutions for pharmaceutical, biotechnology, and investment clients. We specialize in strategic consulting, operational support, and executive coaching tailored to clinical trial programs from early to late phases. With deep expertise in oncology and a commitment to innovation, quality, and client success, we partner closely with sponsors from concept through completion, bridging scientific rigor with operational excellence.