Build the Data Infrastructure Precision Medicine Demands

AI drug discovery tools are now open and operational but the infrastructure to run them barely exists

2 speakers onstage at Pharma Barcelona 2026

Content produced using generative AI. Published July 1, 2026

SPEAKERS:
Gary McAllister, CTO Healthcare & Public Sector, Dell Technologies
Hasan Jouni, EMEA Senior Partner Manager, Healthcare & Life Sciences, NVIDIA

KEY TAKEAWAYS:
- NVIDIA BioNeMo enabling AI-driven biology and drug discovery workflows
- Lilly and NVIDIA investing up to $1 billion in talent, infrastructure, and compute over five years to support a new AI co-innovation lab
- Petabytes of pharma data remain architecturally unusable for AI training at scale
- The competitive moat in precision medicine is inverting from algorithm access to infrastructure ownership

"We've got petabytes of information available to us," Gary McAllister argued at Pharma 2026, "but we've not managed our data architectures to a point where a lot of this information is useful to us for the purposes of the development of AI."

That indictment is the organizing problem of this moment in pharma AI. Not capability. Not access. Architecture. While the industry has spent the better part of a decade acquiring data, the infrastructure required to make that data usable for AI-driven discovery has not kept pace, and the gap is wider than most strategic planning cycles have accounted for. Dell Technologies has been working directly with healthcare and life sciences organizations to address this — building the data infrastructure, compute environments, and sovereign hosting architectures that approved clinical and research workloads actually require.

The AI layer, enabled by NVIDIA's accelerated computing and software, is further along than many executives realize. Hasan Jouni was unambiguous: the agentic workflows that compress drug discovery from years to weeks are, in his words, "not kind of a vision or anything like that — this is now and this is happening." The paradox shaping the next phase of precision medicine is precisely this inversion: the tools exist, the data exists, and the infrastructure connecting them largely does not.

The Discovery Pipeline That Already Works
The industry's mental model of AI in drug discovery is still largely single-task: one model predicts binding affinity, another screens for ADMET liabilities. That model is already obsolete at the frontier.

Drug discovery has moved to multi-agent orchestration, and the architectural complexity of that shift is easy to understate. "What you're doing is you're training one model, for example, to generate a protein, but you also want to fold that protein as the next step. And then you want to identify a candidate and dock and then evaluate that," Jouni explained. "That is multiple models, there's multiple agents. That is a very complex workflow. That's a big blueprint." The implication is organizational. A single-model deployment can be managed as a point solution. A multi-agent pipeline requires coordinated data flows, API contracts between models, and compute infrastructure capable of sustaining concurrent workloads across the full discovery sequence. Delivering that kind of environment at scale is precisely where Dell's healthcare and life sciences infrastructure practice operates — provisioning and integrating the underlying systems that make these workflows viable in production.
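
The sequence Jouni describes (generate, fold, dock, evaluate) can be sketched as a chain of stages with an explicit data contract between them. This is a minimal illustration under stated assumptions, not NVIDIA's blueprint or any real model API: every stage below is a hypothetical stub, the names `Candidate` and `run_pipeline` are invented for the example, and the input sequences stand in for the output of a generative model.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Candidate:
    """Data contract passed between stages; fields fill in as stages run."""
    sequence: str
    structure: Optional[str] = None        # placeholder for folded coordinates
    docking_score: Optional[float] = None  # toy value, not a physical quantity
    passed: Optional[bool] = None


Stage = Callable[[Candidate], Candidate]


def fold(c: Candidate) -> Candidate:
    # Hypothetical stub: a real stage would call a structure-prediction model.
    return replace(c, structure=f"folded({c.sequence})")


def dock(c: Candidate) -> Candidate:
    # Each stage checks the part of the contract it depends on.
    if c.structure is None:
        raise ValueError("dock requires a folded structure")
    # Deterministic placeholder score so the example is reproducible.
    return replace(c, docking_score=-float(len(c.sequence)))


def evaluate(c: Candidate) -> Candidate:
    if c.docking_score is None:
        raise ValueError("evaluate requires a docking score")
    # Assumed cutoff: keep candidates scoring at or below -5.0.
    return replace(c, passed=c.docking_score <= -5.0)


def run_pipeline(sequences: List[str], stages: List[Stage]) -> List[Candidate]:
    """Run every candidate through the stages in order."""
    results = []
    for seq in sequences:
        candidate = Candidate(sequence=seq)
        for stage in stages:
            candidate = stage(candidate)
        results.append(candidate)
    return results


# The two sequences stand in for generator output.
results = run_pipeline(["MKTAYIA", "MKT"], [fold, dock, evaluate])
```

The point of the sketch is the contract, not the stubs: because `dock` refuses input that has not been folded, reordering or skipping a stage fails loudly instead of producing silent nonsense, which is the coordination burden a single-model deployment never has to carry.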

What that pipeline looks like in operational terms is worth examining directly. A researcher prompts an AI agent with a target molecule. The agent conducts an autonomous literature search, identifies drug candidates, generates three-dimensional models, performs molecular docking simulations, and ranks results by binding affinity and predicted toxicity. It then "sends orders to a robotic wet lab where certain assays are performed… and then feeds back to the scientific agent with the results in natural language and presents it back to the user." This is not augmentation of the scientific process. It is automation of the iterative loop that historically consumed the most time in early-stage discovery, and the scientist is no longer executing that loop but supervising it.
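
The ranking step in that loop, ordering docked candidates by binding affinity and predicted toxicity, can be illustrated with a few lines. This is a sketch under assumptions rather than a description of any vendor's agent: the field names, the toxicity scale (a score between 0 and 1), the 0.5 cutoff, and the candidate values are all invented for illustration. It follows the common docking convention that a more negative binding energy indicates stronger predicted binding.

```python
from typing import List, NamedTuple


class DockedCandidate(NamedTuple):
    name: str
    binding_energy: float  # kcal/mol; more negative = stronger predicted binding
    toxicity: float        # assumed score in [0, 1]; higher = more likely toxic


def rank(candidates: List[DockedCandidate],
         max_toxicity: float = 0.5) -> List[DockedCandidate]:
    """Drop candidates above the toxicity cutoff, then sort the rest by
    binding energy (strongest first), breaking ties on lower toxicity."""
    acceptable = [c for c in candidates if c.toxicity <= max_toxicity]
    return sorted(acceptable, key=lambda c: (c.binding_energy, c.toxicity))


# Made-up values purely to exercise the function.
docked = [
    DockedCandidate("cand-A", -9.1, 0.20),
    DockedCandidate("cand-B", -10.4, 0.80),  # strongest binder, but filtered out
    DockedCandidate("cand-C", -7.3, 0.10),
    DockedCandidate("cand-D", -9.1, 0.05),
]

ranked_names = [c.name for c in rank(docked)]
```

Filtering before sorting is a design choice: it keeps a strongly binding but likely toxic compound from ever reaching the wet-lab order queue, rather than relying on a blended score to push it down the list.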

NVIDIA's response to that complexity is a set of reference architectures: blueprints that provide a structured starting point for specific discovery objectives, available as mostly open-source frameworks any organization can adopt and extend. Deployed on purpose-built Dell infrastructure, these blueprints give organizations a meaningful head start toward those objectives. The remaining effort is almost entirely at the infrastructure layer: data harmonization pipelines, compute provisioning, security architecture, and sovereign hosting. The blueprint tells you what to build. Dell's healthcare infrastructure practice helps you build the foundation it stands on.

The scale of institutional commitment to this architecture is no longer speculative. Jouni disclosed that NVIDIA signed a deal with Lilly to build a co-innovation lab that will leverage Lilly's experience discovering, developing and manufacturing medicines with NVIDIA's leadership in AI, accelerated computing and AI infrastructure. The two companies will invest up to $1 billion in talent, infrastructure and compute over five years to support the new AI co-innovation lab. The collaboration will initially focus on creating a continuous learning system that tightly connects Lilly's agentic wet labs with computational dry labs, enabling 24/7 AI-assisted experimentation to support biologists and chemists. This scientist-in-the-loop framework aims to enable experiments, data generation and AI model development to continuously inform and improve one another.

The Infrastructure Gap Nobody Budgeted For
The pharma industry's relationship with data infrastructure has a specific historical failure mode. The 2010s investment thesis was volumetric: acquire more data, build larger data lakes, migrate to cloud at scale. Those investments solved a storage and retrieval problem. They did not solve the problem that AI-driven discovery actually presents.

McAllister characterized the actual data environment with precision: "We're now working in multimodal data sets, which consist of clinical documentation, raw data sets that come from EMRs and unstructured semi-structured data that comes from genomic labs, wet lab systems, histopathology systems, etc." Each of those data types carries different structure, different provenance, different regulatory handling requirements, and different formatting conventions. A data lake that stores all of them is not the same as a data architecture that harmonizes them into AI-training-ready formats. The former was achievable with 2015-era technology and strategy. The latter requires purpose-built infrastructure designed from the ground up for the specific ingestion and preprocessing demands of foundation model training — and this is where Dell's healthcare and life sciences infrastructure solutions are specifically designed to close the gap.
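
The difference between storing multimodal data and harmonizing it can be made concrete with a small sketch. Everything here is an assumption for illustration: the source field names (`patient_id`, `SubjectID`, `run_date`), the two date formats, and the `HarmonizedRecord` schema are hypothetical and do not describe any specific EMR, lab system, or Dell product.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class HarmonizedRecord:
    """One common shape for every modality, with provenance attached."""
    subject_id: str
    modality: str
    observed_on: date
    payload: Dict[str, Any]
    source_system: str


def from_emr(row: Dict[str, str]) -> HarmonizedRecord:
    # Assumed EMR export: ISO dates, lower-case IDs with stray whitespace.
    return HarmonizedRecord(
        subject_id=row["patient_id"].strip().upper(),
        modality="clinical",
        observed_on=date.fromisoformat(row["encounter_date"]),
        payload={"note": row["note"]},
        source_system="emr",
    )


def from_genomics(row: Dict[str, str]) -> HarmonizedRecord:
    # Assumed lab export: different key names and day/month/year dates.
    return HarmonizedRecord(
        subject_id=row["SubjectID"].strip().upper(),
        modality="genomic",
        observed_on=datetime.strptime(row["run_date"], "%d/%m/%Y").date(),
        payload={"variant": row["variant"]},
        source_system="genomics_lab",
    )


ADAPTERS: Dict[str, Callable[[Dict[str, str]], HarmonizedRecord]] = {
    "emr": from_emr,
    "genomics_lab": from_genomics,
}


def harmonize(source: str, row: Dict[str, str]) -> HarmonizedRecord:
    """Route a raw row to the adapter for its source system."""
    if source not in ADAPTERS:
        raise KeyError(f"no adapter registered for source {source!r}")
    return ADAPTERS[source](row)


clinical = harmonize("emr", {
    "patient_id": " p-001 ", "encounter_date": "2026-03-04", "note": "stable",
})
genomic = harmonize("genomics_lab", {
    "SubjectID": "p-001", "run_date": "04/03/2026", "variant": "PLACEHOLDER",
})
```

A data lake would hold both raw rows as they arrived. Only after this kind of normalization do the two records share a subject identifier and a comparable date, which is the minimum needed before they can be joined into a training set.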

Even with NVIDIA providing AI reference blueprints that get an organization "at least 30% there" toward a defined discovery objective, the remaining work is almost entirely at the infrastructure layer: harmonizing heterogeneous data sources, provisioning compute at the scale multi-agent workloads require, and ensuring the resulting environment meets both security and sovereignty requirements. The 30% is the part NVIDIA can hand you. The 70% is the part that determines whether you can use it — and it is the part that purpose-built infrastructure, deployed and integrated by partners with deep healthcare domain knowledge, is built to address.

What a purpose-built solution looks like at meaningful scale is illustrated by an NHS initiative McAllister described: "We're creating what we call the NHS Supercomputer, which is a Cambridge-driven initiative from the University of Cambridge to collate health data across the whole of the NHS into a single supercomputer to support the creation of foundation models for cancer and new foundation models for drug discovery." The Cambridge project is notable not because it is large, though it is, but because of what it required conceptually: a decision to treat health data as sovereign infrastructure rather than as an enterprise IT asset. That distinction has organizational and political dimensions extending well beyond procurement. For most pharma companies, the equivalent question is whether their data architecture is designed to serve AI workloads or merely to store the inputs to them. Those are different systems built on different assumptions, and converting one to the other is not a configuration exercise.

Discover more on this topic at Pharma Commercial Data & Tech Europe 2026 (4-5 November, London), Europe's collaborative home for data and tech pioneers.

Three Risks That Compound, Not Coexist
"The problem of global uncertainty is that we kind of don't know who to trust anymore, which is a big problem for all of us in this day and age when it comes to where does our data live." McAllister's framing resets the data sovereignty conversation in a way that compliance-oriented discussions rarely do. GDPR and the US Cloud Act established the regulatory perimeter. What McAllister is describing is something different: a geopolitical trust environment in which the question of where data lives is inseparable from the question of whose interests that location serves. For pharma organizations holding genomic data, clinical trial results, and proprietary compound libraries, that question carries competitive and national security dimensions that most enterprise data governance frameworks were not designed to address.

The trust problem does not stop at the data layer. McAllister argued that infrastructure infiltration is an active, present-tense threat: "There is infrastructure infiltration happening across the globe, which means that when you purchase any asset, you need to make sure that you have a full audit and compliance process. That doesn't just start from the moment where infrastructure gets to your organization. It starts from the moment that the infrastructure is actually manufactured." Most pharma IT security architectures treat the delivery point as the security perimeter. Extending that perimeter to the factory floor requires supplier audit capabilities, chain-of-custody documentation, and procurement relationships that most organizations have not developed.
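
What "starting at the moment of manufacture" means for an audit record can be sketched as a simple chain-of-custody check. This is an illustrative assumption about how such records might be modelled, not a description of Dell's or any supplier's actual process: the `CustodyEvent` fields, the stage names, and the party names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CustodyEvent:
    stage: str          # e.g. "manufactured", "shipped", "received"
    holder: str         # party holding the asset after this event
    received_from: str  # previous holder; empty for the first event


def audit_chain(events: List[CustodyEvent]) -> List[str]:
    """Return a list of problems; an empty list means the chain is unbroken."""
    if not events:
        return ["no custody records"]
    problems = []
    if events[0].stage != "manufactured":
        problems.append("chain does not start at manufacture")
    # Every handoff must name the previous holder as its source.
    for prev, cur in zip(events, events[1:]):
        if cur.received_from != prev.holder:
            problems.append(f"gap between {prev.holder} and {cur.holder}")
    return problems


complete = [
    CustodyEvent("manufactured", "factory", ""),
    CustodyEvent("shipped", "carrier", "factory"),
    CustodyEvent("received", "pharma-dc", "carrier"),
]
# A record that only begins at the loading dock, the perimeter most
# organizations audit from today.
delivery_only = [CustodyEvent("received", "pharma-dc", "carrier")]

complete_problems = audit_chain(complete)
delivery_problems = audit_chain(delivery_only)
```

The delivery-only record passes every handoff check it contains and still fails the audit, because the failure is in what the record omits rather than in anything it states.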

The Moat Has Already Moved
Jouni's characterisation of NVIDIA's AI model strategy is worth understanding in terms of what it means for customers. "All these models that we've developed or helped co-develop, they're all open source, they're all free to use," he acknowledged. "It's not a lock-in thing. It's more of a let's help you accelerate it. We can't solve that problem for you, but we can help you get there faster." The practical implication for pharma organisations is that access to AI models is no longer the constraint. What determines outcomes is the infrastructure built to run them: the compute environment, the data harmonisation pipelines, the security architecture, and the integration work that connects open-source AI capability to proprietary data at scale. Organisations that invest in that infrastructure layer — pairing NVIDIA's accelerated computing and software with purpose-built deployment environments — are the ones positioned to convert model availability into genuine discovery advantage.

The Lilly deal makes this concrete. An investment of up to $1 billion over five years is not buying Lilly access to AI models, which are available to any organisation with a GPU cluster and a download link. It is funding the talent, infrastructure and compute needed to run those models on proprietary data at a scale and speed a shared cloud environment would struggle to match. The competitive differentiator Lilly is purchasing is not the algorithm. It is the architecture.

That reframing carries a forward-looking implication worth considering. McAllister observed that "we used to work in the multimillion parameter AI model space, we're now getting into multibillion model parameter AI space. And I think when we're talking about the arms race to AGI, artificial generalized intelligence, we're going to be in the trillion model AI space. And I think a lot of what we need to get to is the trillion parameter models in order to get to a space where we can trust AI." Whether or not that precise trajectory holds, the directional point is clear: the infrastructure demands of precision medicine are not static. Organisations sizing their infrastructure investment for today's workloads risk building for requirements that their procurement cycles will outlive.
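
The jump from millions to billions to trillions of parameters translates directly into hardware. As rough arithmetic only, the sketch below estimates the memory needed just to hold a model's weights, assuming 16-bit precision (2 bytes per parameter) and decimal units. Both are assumptions, and training requires substantially more than this for gradients, optimizer state, and activations.

```python
def weight_bytes(n_params: int, bytes_per_param: int = 2) -> int:
    """Memory to hold the weights alone; 2 bytes/param assumes 16-bit precision."""
    return n_params * bytes_per_param


def human(n_bytes: float) -> str:
    """Format a byte count using decimal (power-of-1000) units."""
    for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
        if n_bytes < 1000 or unit == "PB":
            return f"{n_bytes:g} {unit}"
        n_bytes /= 1000
    return f"{n_bytes:g} PB"  # unreachable; keeps type checkers satisfied


scales = {
    "million": 10**6,
    "billion": 10**9,
    "trillion": 10**12,
}

footprints = {name: human(weight_bytes(n)) for name, n in scales.items()}
```

Under these assumptions a one-trillion-parameter model needs about 2 TB for its weights alone, far more than a single accelerator holds today, so the workload has to be spread across many devices before any training or inference happens.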

Three diagnostic questions follow. Can your current data architecture ingest and harmonize multimodal data — genomic, clinical, imaging, wet lab — into AI-training-ready formats, or does it stop at storage and retrieval? Do you know where your AI compute hardware was manufactured, assembled, and audited before it entered your environment? Is your infrastructure investment thesis sized for the model scale precision medicine will actually require, or for the scale it requires today?

The organisations that answer all three correctly are not necessarily the ones with the largest AI budgets. They are the ones that recognised, early enough to act, that this was always an infrastructure problem wearing an AI problem's clothes.

To get you highlights of Pharma 2026 faster, we are using generative AI technology to summarise the transcripts of the sessions. If you have any feedback about the summary, please contact lucy.fisher@thomsonreuters.com.
