SPEAKERS:
Dr William McCully, Co-Founder & Director, AITHENA
Dr Hank Du, Co-Founder & Director, AITHENA
KEY TAKEAWAYS:
- Human MLR reviewers detect as few as 37% of errors on their worst-performing days
- Three in four recent UK pharma enforcement cases trace to failures Medical, Legal and Regulatory (MLR) review should have caught
- AI achieves 95% average sensitivity but over-flags; human contextual precision remains essential
- Content volume demands expected to triple against infrastructure already at its cognitive ceiling
- Human-AI collaboration simultaneously delivers peak review performance and neutralises adoption resistance
The Safety Net Is the Source
"In our experiment we saw 91% worst case sensitivity versus in human reviewers 37% on those days. That's when errors get slipped through the net and that is when problems are caught further down the stream and the materials get kicked back and the cycle begins again. And at worst a complaint is raised," says Dr Hank Du.
On a bad day, a human MLR reviewer misses nearly two in three errors that pass in front of them. The materials clear the process. They reach HCPs. Some of them reach patients.
The downstream consequence is not hypothetical. As Dr William McCully argued in the same session, "nearly 75% of those cases are due to things that should have been picked up in the MLR process. Mandatory information that either was wrong or was missing, claims that were not substantiated or misleading. It's hard to believe that in 2026 we are still seeing cases of material that goes out the door, that's prescribing information that's out of date."
These are not two separate problems. The 37% figure is the mechanism producing the 75% figure. Pharma has been managing MLR as a speed and cost constraint. The evidence reframes it as a compounding compliance liability, one that worsens automatically as content volume grows.
The Liability Hidden Inside the Process
The 15-day average approval time is the figure most commonly cited when MLR performance comes up in executive discussions. McCully argued that it systematically obscures the operational reality: "The average time to approve a piece of content is 15 days. But that doesn't tell the whole story. Many of these assets take much, much longer. Websites, detail aids, videos. It can take up to two, maybe even three months to approve. And a lot of that time is actually dead time."
The long tail matters strategically, not just operationally. The assets that take two to three months to clear (websites, detail aids and promotional videos) are precisely the assets with the highest HCP and patient visibility. Extended approval cycles mean either that outdated materials stay in market while updated versions queue, or that launches stall while reviews run. Both outcomes carry regulatory exposure; the first is harder to detect until enforcement arrives.
McCully estimated that a mid-sized affiliate spends £1 to £2 million annually on MLR process costs alone, before any outsourcing to third-party review organisations. The industry is allocating significant budget to a system that, on the evidence presented, misses the majority of errors under peak volume pressure. That is not a poor return on investment. It is an inverted one.
"Six months ago on a similar stage in London, I was doing a similar talk and in the morning during breakfast, I went to the top 10 pharmaceutical websites on the product pages. I found that six out of 10 of those pages were down for maintenance. Fast forward six months, it's still six out of 10."
The unchanged finding over six months shifts this from anecdote to diagnosis. These companies were presumably aware of the content governance failure at the first audit. They had not resolved it by the second. That persistence suggests the problem is not episodic but endemic, and raises a structural question about whether MLR reform carries sufficient executive sponsorship or whether it remains trapped as a mid-management operational concern, adequately funded but insufficiently prioritised.
The cognitive science behind the 37% figure is relevant here. Performance degradation in sustained attention tasks under volume pressure is well documented in adjacent fields: radiology screening and financial audit have both confronted versions of this problem. The finding is not evidence of inadequate reviewers; it is a predictable feature of human cognitive limits under load. Hiring additional reviewers delays the ceiling but does not eliminate it. Each new reviewer will eventually trace the same degradation curve when workload scales. The risk is architectural, and it compounds automatically as content demands increase.
What the Controlled Study Reveals and What It Doesn't
AITHENA's controlled study measured two distinct dimensions of review performance: sensitivity (the proportion of real errors detected) and precision (the proportion of flagged items that are genuine errors). The results on each dimension point in different directions.
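The two metrics defined above can be sketched in a few lines of Python. This is a minimal illustration, not the study's method: the function names and the review counts are hypothetical and chosen only to show how a reviewer can score high on one metric and low on the other.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of real errors that the reviewer detected."""
    real_errors = true_positives + false_negatives
    if real_errors == 0:
        raise ValueError("sample contains no real errors")
    return true_positives / real_errors


def precision(true_positives: int, false_positives: int) -> float:
    """Proportion of flagged items that are genuine errors."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("reviewer flagged nothing")
    return true_positives / flagged


# Hypothetical asset containing 20 real errors (not data from the study).
# Reviewer A flags 38 items, 19 of them real: catches a lot, over-flags.
# Reviewer B flags 15 items, 14 of them real: misses more, rarely wrong.
reviewer_a = (sensitivity(19, 1), precision(19, 19))
reviewer_b = (sensitivity(14, 6), precision(14, 1))

print(f"A: sensitivity={reviewer_a[0]:.2f} precision={reviewer_a[1]:.2f}")
print(f"B: sensitivity={reviewer_b[0]:.2f} precision={reviewer_b[1]:.2f}")
```

In this made-up case Reviewer A finds more of the real errors but half of its flags are false positives, while Reviewer B's flags are almost always right but cover fewer errors. That is why the two numbers have to be read together.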
"The sensitivity of AI system averaged around 95% versus 70% in human reviewers. AI system caught more things. But if you look at precision, humans understand the nuance about an error and understand the context around an error so they are more precise, more accurate." Du's framing establishes a complementary performance profile rather than a simple hierarchy. AI over-flags; humans contextualise. An organisation that deploys AI review without human oversight gains sensitivity at the cost of precision, generating volumes of false positives that consume the reviewer time it was designed to free.
Consistency is the operationally decisive variable, and here the gap is harder to dismiss. McCully's observation that the industry currently expects to "produce three times as much content as we're doing now" describes a system already at its ceiling being asked to absorb a tripling of load. Under those conditions, the gap between AI's 91% worst-case sensitivity and human reviewers' 37% worst case is not merely a quality metric. It is a predictive model for where regulatory enforcement exposure will concentrate as volume scales.
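A rough way to see why the worst-case gap matters more as volume scales is to multiply the errors present by the share that each reviewer type misses. The sketch below uses the two worst-case sensitivities quoted in the session; the baseline of 100 errors per period is a hypothetical assumption, as is the simplification that errors scale in proportion to content volume and that every day is a worst-case day.

```python
# Worst-case sensitivities quoted in the session.
WORST_CASE_SENSITIVITY = {"human": 0.37, "ai": 0.91}

# Hypothetical assumption: errors present in the content reviewed per period.
BASELINE_ERRORS = 100


def expected_missed(errors_present: float, sensitivity: float) -> float:
    """Errors expected to pass review undetected at a given sensitivity."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    return errors_present * (1.0 - sensitivity)


def missed_by_volume(multiplier: float) -> dict[str, float]:
    """Expected missed errors per reviewer type when content volume scales."""
    errors_present = BASELINE_ERRORS * multiplier
    return {
        name: round(expected_missed(errors_present, s), 1)
        for name, s in WORST_CASE_SENSITIVITY.items()
    }


# Compare current volume with the tripling the speakers anticipate.
for multiplier in (1, 3):
    print(f"{multiplier}x volume:", missed_by_volume(multiplier))
```

Under these assumptions the difference in missed errors between the two reviewer types grows from 54 to 162 when volume triples. The absolute numbers are illustrative only; the point is that a fixed sensitivity gap produces a linearly widening exposure gap.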
For organisations operating across multiple jurisdictions, or in highly specialised therapeutic areas where treatment guidelines shift rapidly, the precision advantage that human reviewers hold may be wider than this study captures. Contextual judgment (cross-referencing evolving clinical evidence, interpreting ambiguous regulatory language across markets and weighing promotional claims against emerging safety signals) becomes more valuable as content complexity increases, not less. Executives in those environments should read the sensitivity data as confirming the necessity of human review, not as a case for reducing it.
"When human reviewers using AI systems, that's when you get the best quality of reviews." Du's prescriptive conclusion follows directly from the evidence: neither performance profile alone represents the ceiling. The question is not AI versus reviewer but what the combination of both, deployed appropriately, can achieve together.
Discover more on this topic at Pharma Commercial Data & Tech Europe 2026 (4-5 November, London), Europe’s collaborative home for data and tech pioneers. Visit the website here.
Diagnosing the Right Bottleneck
Most organisations approach MLR performance through a single variable: capacity. Add reviewers. Extend review hours. Outsource to a third-party specialist. The investment is real; the improvement is consistently smaller than projected. Du's diagnostic framework explains why.
"There are three key elements in the speed of MLR approval. First, capacity. That is how many reviews you can run at any single one time. Quality is how good is your review, and complexity is how difficult is your content to approve or your campaign. All three elements work together and affect your speed of approval. And if you only take one thing away from this equation today, it will be if in your organization you are only optimizing for one of these three factors, you could be leaving a lot of potential on the table."
The framework also explains a recurring executive frustration with MLR technology investments. Workflow automation tools, including parallel routing, digital submission portals and automated briefing templates, address capacity. If quality and complexity remain unchanged, the structural improvement is limited to one-third of the equation. The speed gains are real but bounded. The appropriate diagnostic question is not "How fast is our MLR process?" but "Which of the three variables is currently the binding constraint, and are our investments concentrated against it?"
McCully's headline efficiency finding illustrates what multi-variable optimisation delivers when all three are addressed simultaneously: "Six months ago, we were able to show that using AI in this process can save up to 90% of our approval time. That's 75 hours or 9.4 days per asset." AI improves capacity through parallel review, quality through consistent sensitivity that does not degrade under volume, and complexity handling through rule-based assessment against codified regulatory standards. The 90% figure is not achievable by optimising any single variable alone.
The efficiency data is compelling. Both speakers were candid that the primary barrier to capturing these gains is not technical.
The Dual-Function Strategy
"When I was working in COVID-19, that was the only time I've seen the MLR process work. It was six months where people just really had a vision, worked together really closely as a small team and everything just flew through. Guidelines changed every day, but we were on top of it. But that's not the reality, that's the dream."
McCully's COVID-19 account is useful not as inspiration but as diagnosis. MLR operated at speed when a small team shared a clear mission and an external forcing function compressed the decision cycle. The question is what structural change can replicate that alignment without requiring a crisis to generate it. The evidence points toward removing the volume pressure that produces cognitive degradation, which is precisely what consistent AI sensitivity does.
The adoption barrier is the harder problem. McCully's behavioural observation is precise: "There's a whole bunch of change management that needs to happen. People don't like using new systems, new tools. It's almost a bit like your pension. Everyone should save more for a rainy day and get the benefit later on. But not many people want to do it." The technology is validated. The obstacle is organisational.
The human-AI partnership framing that McCully and Du developed throughout this session serves a dual strategic function they did not explicitly name. It is simultaneously the empirically highest-performing review architecture (the controlled study demonstrates that human-AI collaboration outperforms either alone on the metrics that matter most) and the most effective change management positioning available. Framing AI as a complement to human reviewers rather than a replacement directly neutralises the job displacement anxiety McCully identified as the primary adoption barrier. Executives who recognise this dual function can use the performance data to justify the investment at board level while using the collaboration narrative to secure the internal buy-in that makes implementation viable. These are not separate conversations requiring separate arguments. They are the same argument deployed in two directions.
The compliance risk gap between AI-augmented and human-only review widens with every additional asset pushed through the system. Organisations that move first compound their advantage; organisations that wait compound their exposure.
"The gap between what your team knows and what the HCPs hear is currently measured in weeks if not months. AI can reduce that to days and ultimately what your customer hears is how the patient is treated."
That gap exists because of a process limitation the data suggests is now solvable. The weeks between knowledge and reach are not an inherent feature of pharma marketing. They are a feature of infrastructure that was never designed to carry the load now being placed on it, and the controlled evidence for what replaces it is no longer theoretical. The decision about whether to act on it is not a technical question. It is a governance one, and it belongs on the agenda above the level where MLR has traditionally been managed.
To get you highlights of Pharma 2026 faster, we are using generative AI technology to summarise the transcripts of the sessions. If you have any feedback about the summary, please contact lucy.fisher@thomsonreuters.com.