At the ai-Pulse conference in Paris, three leading biotech innovators revealed the unique challenges of applying artificial intelligence to healthcare and drug discovery, painting a picture of a field where massive potential meets unprecedented technical hurdles.
I had the pleasure of moderating a conversation that included Julia Gimbernat Mayol of InstaDeep, Jean-Philippe Vert of Bioptimus, and Robert Marino of Qubit Pharmaceuticals – each tackling different aspects of scaling AI in healthcare but all facing similar fundamental challenges.
"Biology is a broad term," explained Vert, whose company Bioptimus is building what he calls a "foundation model of biology" that spans from molecules to cells to entire organisms. "We want to train one model to summarize all biology because this is what we need to answer questions like, if you change a molecule, how does it affect the patient?"
This ambitious goal highlights the first major challenge in scaling healthcare AI: data acquisition. While some biological data is publicly available, such as DNA and protein sequences, the most valuable medical data often sits locked away in hospitals and pharmaceutical companies, protected by privacy regulations and commercial interests.
Vert described a delicate dance of partnership-building with healthcare providers, noting that success depends on finding "win-win situations" where both parties benefit. Interestingly, Bioptimus's approach to foundation models actually makes data access easier in some ways. "We don't want the metadata," Vert explained. "We typically negotiate partnerships where we want access to large amounts of images, but we don't need the clinical state of the patients."
InstaDeep's Mayol emphasized the importance of understanding not just the data itself, but how it's generated. Using genomic sequencing as an example, she described how knowing the technical details of data generation helps build better models: "If you want to build a good model, you have to understand the data really well, but understand it to the point where you even know how the technology generated the data."
This deep understanding becomes crucial when dealing with varying levels of data confidence – some genomic regions might have excellent coverage while others are sparsely sampled, affecting how much weight models should give to different data points.
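To make the idea concrete, here is a minimal sketch of coverage-aware weighting. It illustrates the general technique only and is not InstaDeep's method: the function names, the `saturation` constant, and the exponential weighting curve are all assumptions chosen for the example.

```python
import math

def coverage_weights(coverage, saturation=30.0):
    """Map per-position read depth to a weight in [0, 1).

    `saturation` is a hypothetical depth beyond which extra reads add
    little confidence; it is an illustrative choice, not a panel figure.
    """
    return [1.0 - math.exp(-depth / saturation) for depth in coverage]

def weighted_squared_error(predictions, targets, weights):
    """Average squared error, with each position counted in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("all weights are zero; no position has any coverage")
    weighted_sum = sum(
        w * (p - t) ** 2 for p, t, w in zip(predictions, targets, weights)
    )
    return weighted_sum / total_weight

# Two well-covered positions, one sparsely sampled, one with no reads at all.
coverage = [60, 45, 2, 0]
weights = coverage_weights(coverage)
```

In this sketch, a position with no reads contributes nothing to the loss, and a sparsely sampled position contributes far less than a deeply sequenced one.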
Qubit Pharmaceuticals takes a unique approach to the data challenge. Working in drug discovery, where experimental data is scarce and closely guarded by pharmaceutical companies, they've developed a hybrid system combining quantum computing, AI, and high-performance computing to generate synthetic data.
"When a pharma works on projects, they will generate about 5,000 to 10,000 experimental data points," Marino explained. The challenge? The better their platform performs at predicting which compounds will work, the fewer experimental data points they generate – creating a potential data shortage for their AI models.
Their solution involves using quantum computing to generate highly accurate synthetic data that can train their neural networks. "Quantum computers today are still prototypes," Marino noted, adding that their computing power is more than doubling every year.
Beyond data challenges, the panel revealed how scaling AI in healthcare requires rethinking traditional AI architectures. Bioptimus, for instance, has had to develop entirely new model structures to handle different types of biological data. Typical multimodal AI systems combine different views of the same thing, like text and images describing the same scene. Biological data rarely comes paired that way.
"How do you train a model jointly on a billion protein sequences and a million tumor images?" Vert asked. "The link is subtle – it's not as if there were two views of the same thing... but there is a real link that exists, which is that the proteins are in the cells of the patients."
InstaDeep addresses scaling challenges through sophisticated engineering approaches, including model sharding (breaking models into parts that run on different machines) and hardware-optimized algorithms. "You can actually adjust the algorithm to the hardware you're going to use," Mayol explained, noting how matching matrix sizes to hardware specifications can significantly speed up a model.
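Both ideas can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not InstaDeep's implementation: the 128-wide tile, the helper names, and the contiguous column split are all hypothetical. It rounds a layer dimension up to a multiple of the assumed hardware tile, then divides the layer's columns evenly across devices.

```python
def round_up_to_multiple(size, multiple):
    """Return the smallest multiple of `multiple` that is >= size."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return ((size + multiple - 1) // multiple) * multiple

def shard_columns(num_columns, num_devices):
    """Split column indices into contiguous, near-equal (start, end) ranges, one per device."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    base, extra = divmod(num_columns, num_devices)
    ranges, start = [], 0
    for device in range(num_devices):
        # The first `extra` devices take one additional column each.
        width = base + (1 if device < extra else 0)
        ranges.append((start, start + width))
        start += width
    return ranges

TILE = 128  # hypothetical tile width; real values depend on the accelerator

# A 1000-wide layer padded to the next tile multiple, then split over 4 devices.
hidden = round_up_to_multiple(1000, TILE)
shards = shard_columns(hidden, 4)
```

Padding first means every device in this example receives a shard whose width is itself a multiple of the tile, so no device is left with a ragged remainder.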
The final frontier? Getting these sophisticated tools into the hands of users who can benefit from them.
Because Bioptimus is still in its earliest stages, the company is only just starting to confront this challenge. Qubit Pharmaceuticals sidesteps the user adoption challenge entirely by selling drug candidates rather than their platform, letting pharmaceutical companies evaluate results using their existing processes.
However, this issue is paramount for InstaDeep. Mayol said that while it's great to build general models, the company is always looking for the high-value use cases.
To find them, the company spends a lot of time talking to end users. One of its main channels for doing so is its open source community. The company currently has its models on Hugging Face, where they've been downloaded more than 700,000 times.
"We just go out there and ask them what are your challenges running these models," Mayol said. "And it's actually quite surprising. We see things like just basic inference for a computational biologist can sometimes be super challenging. Even though we make it accessible, it is sometimes quite hard for them."