AI in Drug Discovery: Where the Real Progress Is Happening
Every major pharmaceutical company now has an AI drug discovery initiative. Most of them are quietly struggling. But a few are generating genuinely interesting results, and the pattern of what works (and what doesn’t) is becoming clearer.
I’ve been tracking this space for four years. The hype in 2021 was enormous - AI would revolutionize drug development, cutting timelines from a decade to months. The reality has been more nuanced, but that doesn’t mean the technology isn’t valuable. It means we’re past the hype cycle and into the harder work of actual deployment.
What’s Actually Working
Let me start with the wins, because they’re real.
Protein structure prediction is the clearest success story. AlphaFold2 and its successors have fundamentally changed structural biology. Structures that once took months of lab work to determine can now often be predicted computationally in hours. This doesn’t directly create drugs, but it accelerates early-stage research significantly.
Insilico Medicine got a molecule designed with AI assistance into Phase 2 trials. The company claims its approach reduced the time from target identification to candidate selection from years to months. That’s a single data point, and Phase 2 is still early - it is the stage where the largest share of candidates fail, and Phase 3 still lies beyond it. But it’s evidence that AI-accelerated discovery can reach human trials.
Virtual screening at scale is the other proven application. Drug discovery traditionally involves testing thousands or millions of compounds to find ones that might work. AI can predict which compounds are worth testing, dramatically reducing the screening needed. This isn’t revolutionary - computational chemistry has done this for decades - but modern AI models are significantly more accurate.
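To make the triage idea concrete, here is a minimal sketch of the "predict first, test the shortlist" pattern. Everything in it is illustrative: the `shortlist` helper, the compound names, and the scores are made up, and the lookup table stands in for whatever trained model a real pipeline would use.

```python
from typing import Callable


def shortlist(
    compounds: list[str],
    predict: Callable[[str], float],
    top_fraction: float,
) -> list[str]:
    """Return the highest-scoring fraction of the library, best first."""
    keep = max(1, round(len(compounds) * top_fraction))
    ranked = sorted(compounds, key=predict, reverse=True)
    return ranked[:keep]


# Placeholder scores standing in for a model's predictions (invented values).
toy_scores = {
    "cmpd-A": 0.91,
    "cmpd-B": 0.12,
    "cmpd-C": 0.57,
    "cmpd-D": 0.08,
    "cmpd-E": 0.76,
}

# Send only the top 40% of the library on to physical screening.
picked = shortlist(list(toy_scores), lambda c: toy_scores[c], top_fraction=0.4)
print(picked)  # ['cmpd-A', 'cmpd-E']
```

The lab then tests two compounds instead of five. The savings scale with library size, but only as far as the model's ranking can be trusted.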
Recursion Pharmaceuticals has built what they call the world’s largest biological dataset, using automated lab systems to generate training data for their AI models. Their approach essentially treats drug discovery as a massive pattern-matching problem. They’ve got multiple programs in clinical trials.
The Sobering Reality
Here’s what doesn’t get talked about enough: drug discovery is a pipeline problem, and AI only helps with certain stages.
The typical drug development process looks like this:
1. Target identification (what biological mechanism should we target?)
2. Hit finding (what molecules might affect that target?)
3. Lead optimization (how do we improve those molecules?)
4. Preclinical testing (does it work in animal models?)
5. Clinical trials (does it work in humans?)
6. Regulatory approval
AI has demonstrated value in stages 2 and 3. It can help find and optimize candidate molecules faster than traditional methods. That’s genuinely useful.
But stages 4-6 still take years and cost billions. An AI-discovered molecule still has to go through the same clinical trials as a traditionally discovered one. The biology doesn’t care how the molecule was found.
A senior researcher at a major pharma company put it to me this way: “AI has maybe reduced our discovery timeline by 20-30%. But discovery was only 15% of our total development time. The clinical work still dominates.”
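The arithmetic behind that quote is worth spelling out. The sketch below simply multiplies the researcher's two figures. The 120-month total is my own illustrative assumption (the "decade" mentioned earlier), not a number the researcher gave.

```python
def overall_reduction(stage_share: float, stage_reduction: float) -> float:
    """Fraction of the total timeline saved when a single stage gets shorter."""
    return stage_share * stage_reduction


discovery_share = 0.15  # discovery as a share of total development time (from the quote)
low = overall_reduction(discovery_share, 0.20)   # 20% faster discovery
high = overall_reduction(discovery_share, 0.30)  # 30% faster discovery

print(f"Overall timeline reduction: {low:.1%} to {high:.1%}")
# prints: Overall timeline reduction: 3.0% to 4.5%

total_months = 120  # assumed ten-year program, for illustration only
print(f"Months saved: {low * total_months:.1f} to {high * total_months:.1f}")
# prints: Months saved: 3.6 to 5.4
```

A 20-30% speedup in discovery works out to roughly 3-4.5% of the whole program, or a few months on a ten-year timeline under that assumption.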
The Data Challenge
Here’s a problem that gets less attention than it should: pharma companies are terrible at data.
Most pharmaceutical research data sits in incompatible formats across different systems, with inconsistent labeling and incomplete metadata. Training AI models requires large, clean, well-annotated datasets. Most pharma companies don’t have that.
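For a feel of what "incompatible formats" and "inconsistent labeling" mean in practice, here is a small sketch of harmonizing assay records from two hypothetical legacy systems. The field names, units, and the `harmonize` helper are all invented for illustration. Real datasets involve far more fields and far messier edge cases.

```python
# Conversion factors to nanomolar, keyed by lowercased unit label.
TO_NANOMOLAR = {"nm": 1.0, "um": 1_000.0, "mm": 1_000_000.0}


def first_present(record: dict, *keys: str):
    """Return the value of the first key found in the record, else None."""
    for key in keys:
        if key in record:
            return record[key]
    return None


def harmonize(record: dict) -> dict:
    """Map one raw assay record onto a shared schema with IC50 in nanomolar."""
    # Hypothetical aliases for the same field across legacy systems.
    compound = first_present(record, "compound_id", "CompoundID")
    value = first_present(record, "ic50", "IC50_value")
    unit = first_present(record, "unit", "units")
    factor = TO_NANOMOLAR.get(str(unit).lower())
    if compound is None or value is None or factor is None:
        raise ValueError(f"cannot harmonize record: {record}")
    return {
        "compound_id": str(compound).strip().upper(),
        "ic50_nm": float(value) * factor,
    }


# The same measurement as two different systems might have stored it.
system_a = {"compound_id": "abc-1", "ic50": 500, "unit": "nM"}
system_b = {"CompoundID": " ABC-1 ", "IC50_value": "0.5", "units": "uM"}

print(harmonize(system_a) == harmonize(system_b))  # True
```

Records that can't be mapped raise an error rather than passing through silently. Dropping or guessing at unlabeled values would quietly corrupt the training data.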
The companies making progress - Recursion, Tempus, Insitro - are the ones that built data infrastructure from scratch. They’ve invested heavily in automated lab systems that generate consistent, machine-readable data. They essentially treated the data problem as primary and the AI problem as secondary.
Traditional pharma companies trying to retrofit AI onto existing data infrastructure are struggling. “We’ve spent two years just trying to harmonize our historical datasets,” one R&D director told me. “The AI part was the easy part. The data part was brutal.”
What to Watch
If you’re tracking this space for emerging opportunities, here’s where I’d focus attention:
Generative models for chemistry are improving rapidly. These systems can propose novel molecular structures with desired properties, rather than just screening existing compounds. The latest models can generate molecules that are synthesizable (a major limitation of earlier approaches) and optimize for multiple properties simultaneously.
Multi-omics integration is the next frontier. The most interesting work combines genomics, proteomics, transcriptomics, and clinical data to understand disease at a systems level. This is where AI’s ability to find patterns in high-dimensional data could genuinely provide insights humans would miss.
Clinical trial optimization is underexplored. AI could potentially help design better trials, identify better patient populations, and predict which trials are likely to fail before hundreds of millions of dollars are spent. There is some early work here, but nothing proven at scale yet.
For Innovation Leaders
If you’re at a pharma company or biotech, the strategic question isn’t whether to use AI - that ship has sailed. The question is where to focus.
My honest advice: don’t try to build everything internally. The leading AI drug discovery platforms have spent hundreds of millions on data infrastructure and model development. Unless you’re willing to match that investment, partnering makes more sense than building.
The decision-makers I talk to who seem to be getting value are the ones treating AI as one tool among many, not a magic solution. They’re identifying specific bottlenecks in their discovery pipeline where AI can help, running rigorous pilots, and scaling what works.
The ones struggling are the ones who set up “AI labs” without clear integration into actual drug programs, or who expected AI to solve problems that require fundamentally new biology.
Biotech AI is real and valuable. It’s just not magic. The companies that understand the difference are the ones generating returns.
For organizations looking to build AI capabilities in this space, firms like Team400 work with biotech and healthcare companies on practical AI implementations - though the data infrastructure work remains the harder challenge regardless of who builds the models.