Thoughts on Healthcare Markets and Technology

Clinical Trials Are the New Bottleneck: AI Drug Discovery Has Created an Evidence Infrastructure Crisis

Apr 01, 2026

Abstract

The core argument: AI has dramatically compressed preclinical drug discovery, but clinical development timelines remain stuck. The bottleneck has shifted from molecule identification to evidence generation. The next durable health-tech companies won’t discover drugs. They’ll prove they work.

Key claims:

- AI-driven structure-based and generative methods have increased preclinical throughput substantially, pushing more candidates into already-strained development pipelines

- Long-run clinical success rates only recently began recovering after decades of decline, per a 2025 Nature Communications analysis – meaning the industry hasn’t solved translational efficiency at scale

- FDA’s 2025 draft guidance on externally controlled trials effectively outlines a technical stack (phenotype normalization, covariate harmonization, temporal alignment, endpoint ontology mapping) that nobody has fully built yet

- A 2025 Nature Medicine TrialTranslator study showed that real-world oncology survival is often ~6 months worse than RCT results, and that ~1 in 5 real-world patients wouldn’t qualify for phase 3 trials

- The 2025 FedECA paper in Nature Communications introduces federated external control arms for distributed settings – a direct blueprint for privacy-preserving comparator networks (a minimal sketch of the pattern follows this list)

- TrialGPT (Nature Communications, 2024) and successor systems suggest patient-trial matching is solvable, but recruitment alone is too narrow a moat without a broader trial-state architecture

- Five infrastructure layers where durable category-defining companies will likely form: comparator infrastructure, phenotype infrastructure, continuous measurement infrastructure, model assurance infrastructure, and adaptive protocol infrastructure
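
Before the table of contents, one of these claims is worth making concrete. The FedECA-style federated pattern rests on a simple inversion: patient-level rows never leave a site; only aggregate statistics travel to the coordinator. The sketch below is a toy Python illustration of that inversion – pooling one baseline covariate across sites – and emphatically not the FedECA method or API; every name in it is invented for the example.

```python
from dataclasses import dataclass

@dataclass
class SiteSummary:
    """Aggregates a site shares with the coordinator; no patient rows leave the site."""
    n: int          # eligible comparator patients at this site
    total: float    # sum of one baseline covariate (e.g., age)
    total_sq: float # sum of squares, so the coordinator can pool variance

def summarize_site(values: list[float]) -> SiteSummary:
    # Runs inside each institution's firewall, on its own records.
    return SiteSummary(len(values), sum(values), sum(v * v for v in values))

def pool(summaries: list[SiteSummary]) -> tuple[float, float]:
    # The coordinator only ever sees aggregates, never individual patients.
    n = sum(s.n for s in summaries)
    mean = sum(s.total for s in summaries) / n
    var = sum(s.total_sq for s in summaries) / n - mean ** 2
    return mean, var

# Toy run: three hospitals contribute ages toward a pooled comparator cohort.
sites = [summarize_site(v) for v in ([61.0, 58.0], [70.0, 66.0, 59.0], [64.0])]
print(pool(sites))  # pooled mean and variance, computed without centralizing data
```

Real systems federate much heavier machinery – propensity models, survival estimates – but the trust boundary is the same, and that boundary is what makes privacy-preserving comparator networks plausible at all.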

Table of Contents

The paradox nobody talks about out loud

Why this bottleneck exists now

The technical stack academia is quietly building

Regulators are opening the door but raising the bar

What this means for founders, CROs, and venture underwriting

The five infrastructure layers that matter

The paradox nobody talks about out loud

Here’s a thing that should be more disorienting than it is: the AI-in-biopharma narrative has largely convinced the industry, investors, and the trade press that the hard part of drug development is finding good molecules. Structure-based design, generative chemistry, AlphaFold derivatives, target ID from genomic embeddings – all of it has gotten genuinely impressive. Not vaporware impressive. Actually impressive. Some of these tools are running live in discovery programs at major sponsors right now.

The problem is that making the front end of discovery faster is a little like widening the on-ramp to a highway that’s already gridlocked. You don’t get more cars to their destinations any faster. You just lengthen the backup.

Clinical development is the backup. It takes, on average, somewhere between six and ten years from first-in-human to approval, and the bottleneck during most of that period is not computational. It’s not even primarily biological in the narrow sense. It’s evidentiary. The industry is slow at assembling the kind of regulatory-grade, causally defensible, generalizable evidence packages that the FDA and its international equivalents actually need to say yes with confidence. And as AI accelerates the front end, that evidentiary bottleneck becomes more acute, not less.

This is the paradox that people in clinical development talk about quietly but that rarely makes it into the funding narratives for health-tech companies. Everyone wants to fund the “AI for drug discovery” story because it maps to a familiar venture pattern: scientific insight, clever model, platform play, licensing revenue or acquisition. The evidence generation story is harder to pitch because the product is less romantic. It involves phenotype normalization, external control cohort assembly, federated data governance, digital twin validation frameworks, and adaptive master protocol software. None of that fits on a TED slide.
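
To give a flavor of what one layer of that stack actually does, here is a minimal sketch of external control cohort reweighting: fit a propensity model for trial membership on harmonized baseline covariates, then weight external patients toward the trial population. The column names and two-DataFrame setup are assumptions for illustration only; real pipelines spend most of their effort making the inputs to this step defensible in the first place.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical covariates, assumed already normalized across both sources --
# the phenotype/ontology harmonization work that dominates real-world effort.
COVARIATES = ["age", "ecog", "prior_lines"]

def weight_external_controls(trial: pd.DataFrame, external: pd.DataFrame) -> pd.Series:
    """Inverse-odds weights that reshape an external cohort toward the trial population."""
    X = pd.concat([trial[COVARIATES], external[COVARIATES]], ignore_index=True)
    in_trial = [1] * len(trial) + [0] * len(external)

    # Propensity of being a trial patient, given baseline covariates.
    model = LogisticRegression(max_iter=1000).fit(X, in_trial)
    p = model.predict_proba(external[COVARIATES])[:, 1]

    # Odds weights: external patients who resemble trial patients count for more.
    return pd.Series(p / (1 - p), index=external.index, name="weight")
```

The weighting math itself is decades old. The business opportunity is everything upstream of it: making the covariates trustworthy enough that a regulator will accept what comes out.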

But that’s exactly where the alpha is.

Why this bottleneck exists now

To understand why this is happening now specifically, it helps to separate two different timelines that are running at very different speeds.

The first timeline is discovery throughput. Over the past five or six years, the combination of structure prediction, generative molecular design, and large-scale biological embedding has genuinely changed the rate at which credible drug candidates can be identified. The number of AI-discovered compounds entering clinical trials – while still small relative to the total industry pipeline – is growing. More importantly, the capital and talent flowing into “AI-native” biotech are enormous, which means the pipeline of candidates heading toward IND filings is expanding.

The second timeline is development infrastructure. This one has barely moved. The FDA’s average review timeline hasn’t compressed dramatically. Phase 2 and phase 3 success rates, though they have shown some recent improvement after decades of decline per the 2025 Nature Communications analysis, remain deeply sobering: depending on the therapeutic area, roughly 10 to 15 percent of candidates that enter phase 1 ultimately reach approval. Enrollment velocity is still plagued by the same problems it faced twenty years ago: sites are overextended, patients are hard to identify, eligibility criteria are often written for clean populations that don’t exist in the wild, and sponsors regularly discover late in development that their trial population doesn’t look much like the patients who will actually use the drug if it’s approved.
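
That 10-to-15-percent figure is just multiplicative attrition. A back-of-envelope version, using round illustrative per-phase rates rather than measured ones:

```python
# Round, illustrative per-phase transition rates -- not measured data.
phases = [("phase 1 -> 2", 0.60), ("phase 2 -> 3", 0.30), ("phase 3 -> approval", 0.60)]

cumulative = 1.0
for name, rate in phases:
    cumulative *= rate
    print(f"{name}: {rate:.0%}, cumulative {cumulative:.1%}")
# Lands at ~10.8%, squarely inside the 10-15% range above.
```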

The structural cause of this divergence is that discovery innovation has been driven by computational methods that are relatively cheap to deploy and iterate, while development innovation requires touching the actual trial infrastructure: sites, IRBs, patient populations, regulatory submissions, comparator data sets, endpoint definitions. That stuff is slow, bureaucratic, and deeply institutional. It’s not the kind of thing you can iterate on quickly with gradient descent.

What makes this moment distinct is that multiple academic and regulatory threads are now converging on a shared diagnosis, and regulators themselves are beginning to acknowledge that the evidence generation stack needs to be rebuilt rather than patched. That combination – a growing pipeline of AI-discovered candidates, a strained development infrastructure, and a regulatory environment signaling conditional openness to new methodologies – is the setup for a very large business opportunity.

The technical stack academia is quietly building
