Qualifying Large Language Models for Oversight in Patient Safety Governance
CERSI-AI Funded
Delivered by network partner: University of York
This University of York project evaluates how large language models can support NHS patient safety governance. It compares LLM reviews with expert oversight, identifies risks, and develops qualification frameworks to inform safe, transparent and scalable regulatory use in non-diagnostic settings.

Mission & Vision
To explore how LLMs can be qualified and safely deployed to support oversight and learning in patient safety governance. The vision is trustworthy, transparent, and scalable AI-assisted oversight, ensuring systems thinking and compassionate engagement while maintaining regulatory integrity.
What is the project about?
This project investigates whether LLMs can automate review of NHS patient safety learning responses using the HSSIB Learning Response Review and Improvement Tool (LRRIT). It compares LLM outputs with expert reviews, identifies risks, and develops a framework for qualification and risk classification in non-diagnostic governance roles.


Why is this project commissioned?
Current oversight processes under the Patient Safety Incident Response Framework (PSIRF) are time-consuming and limited in scale, making it hard to ensure consistent systems-based learning. Regulators lack tools to evaluate AI in reflective governance tasks. This project addresses these gaps by testing LLMs for oversight, assessing their reasoning quality, and defining safe deployment boundaries to inform smarter regulation.
Who are the intended users?
- Primary users: NHS governance teams, patient safety leads, and regulators (e.g., HSSIB, NHS England).
- Secondary stakeholders: Policy makers, AI developers, and academic researchers in regulatory science.
How are we making this project a reality?
The project prepares anonymised PSIRF reports, configures LLMs for review, and develops an evaluation framework comparing AI outputs to expert scoring. Analysis identifies error patterns and governance risks, with outputs translated into regulatory recommendations through stakeholder consultations and policy briefs. Data access, report variability, and model risks are mitigated through secure NHS sharing, standalone LLM setups, and experienced research teams.
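As an illustration of the kind of comparison described above (a sketch only, not the project's actual evaluation tooling), agreement between LLM and expert scores on an LRRIT-style criterion could be quantified with chance-corrected agreement such as Cohen's kappa; the score scale and values below are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two sets of categorical ratings."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items both raters scored identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical paired scores on one criterion (1 = not met .. 3 = fully met).
expert = [3, 2, 2, 1, 3, 3, 2, 1, 2, 3]
llm    = [3, 2, 1, 1, 3, 2, 2, 1, 2, 3]
print(round(cohens_kappa(expert, llm), 3))  # → 0.697
```

A kappa well below expert inter-rater agreement on the same reports would be one signal that the LLM is not yet qualified for that criterion.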

Other Projects
December 15, 2025
Optimising SDEs for AI Health Technologies
December 15, 2025
Qualification and Classification of Large Language Models
December 15, 2025
The AI Readiness Checklist
December 15, 2025
Hardian Regulatory Intelligence Platform (HaRi)
Model stress testing using causal generative AI
CERSI-AI Funded
Delivered by network partner: Imperial College London
This Imperial College London project uses causal generative AI to stress test medical imaging models with realistic synthetic data. It identifies performance weaknesses across populations and conditions, supporting safer clinical deployment and informing MHRA guidance on AI evaluation.

Mission & Vision
To develop and validate in-silico stress testing methods using causal generative AI for assessing safety and reliability of medical imaging AI systems. The vision is robust, equitable, and transparent AI performance evaluation across diverse populations and imaging conditions, supporting safer clinical deployment and regulatory frameworks.
What is the project about?
This project uses advanced generative AI to create highly realistic synthetic medical images for stress testing AI models, focusing on chest radiograph disease detection and mammographic density prediction. It aims to inform MHRA guidance on AI safety and provide tools for independent performance evaluation.


Why is this project commissioned?
AI adoption is limited by data distribution shifts—variations in demographics or imaging conditions that can cause models to fail. Regulators and providers lack scalable methods for comprehensive testing. This project addresses that gap by creating an in-silico framework to identify weaknesses before deployment and improve regulatory assurance.
Who are the intended users?
- Primary users: Regulators (e.g., MHRA), clinical end-users (radiologists, healthcare providers).
- Secondary stakeholders: AI developers, policymakers, patient representatives, and research communities in regulatory science.
How are we making this project a reality?
Causal generative AI models are retrained on diverse datasets, used to stress-test image classification systems, and analysed statistically to identify vulnerabilities. Outputs and insights are shared in workshops with regulators, clinicians, and patient representatives to shape future regulatory frameworks. Mitigations include multiple generative AI approaches, curated datasets, and experienced research staff.
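The subgroup analysis described above can be sketched as follows (an illustrative example only; the subgroup names and numbers are hypothetical, not project data): stratifying accuracy by an acquisition or demographic variable exposes performance gaps that aggregate metrics hide.

```python
def subgroup_accuracy(records):
    """Per-subgroup accuracy over (subgroup, was_correct) prediction records."""
    totals, hits = {}, {}
    for group, correct in records:
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + int(correct)
    return {g: hits[g] / totals[g] for g in totals}

# Hypothetical results on synthetic stress-test images, stratified by scanner.
records = ([("scanner_A", True)] * 90 + [("scanner_A", False)] * 10
           + [("scanner_B", True)] * 70 + [("scanner_B", False)] * 30)
acc = subgroup_accuracy(records)
print(acc)  # a 20-point gap between scanners would flag a vulnerability
```

Because the causal generative model controls which factor varies between image sets, a gap like this can be attributed to that factor rather than to confounded differences in the test data.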

ARIES GenAI tool producing radiology-style reports in 3 imaging modalities
CERSI-AI Funded
Delivered by network partner: University of Glasgow
ARIES-MED evaluates a generative AI tool producing radiology-style reports across CT trauma, CT head and chest X-ray imaging. Led by the University of Glasgow, the project assesses clinical accuracy, safety and acceptability to support safe NHS adoption and regulatory guidance.

Mission & Vision
To generate evidence for the safe, effective, and equitable use of generative AI in emergency medical imaging, ensuring timely and accurate interpretation of scans to improve patient care. The vision is to integrate trustworthy AI into clinical workflows, reducing diagnostic delays and supporting regulatory frameworks for AI as software as a medical device (SaMD).
What is the project about?
ARIES-MED evaluates a generative AI tool that produces radiology-style reports across trauma CT, chest X-ray, and CT head in emergency departments. AI-generated reports are compared with radiologist and teleradiology outputs to assess accuracy, sensitivity, specificity, and clinical acceptability.


Why is this project commissioned?
Emergency departments face delays in imaging interpretation due to radiologist shortages, particularly out-of-hours. Generative AI could reduce these delays and standardise reporting. However, evidence on its real-world performance across multiple modalities is lacking. This project fills that gap and informs regulatory guidance, risk classification, and post-market surveillance for generative AI tools.
Who are the intended users?
- Primary users: Emergency medicine clinicians and radiologists who rely on imaging reports for decision-making.
- Secondary stakeholders: NHS decision-makers, regulators (MHRA), AI developers, and policy bodies (e.g., DHSC, NICE).
How are we making this project a reality?
The project extracts and de-identifies imaging studies, conducts blinded expert review and evaluation by ED clinicians, performs statistical analysis to validate accuracy and reliability, and translates findings into regulatory insights through white papers and stakeholder briefings. Risks such as reviewer bias or data quality issues are mitigated via blinded randomisation, rater training, Safe Haven approvals, and structured scoring protocols.
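The accuracy metrics named above reduce to confusion-matrix arithmetic; as a minimal sketch (the counts below are hypothetical, not study results), sensitivity and specificity for one modality could be computed like this:

```python
def sens_spec(tp, fn, tn, fp):
    """Sensitivity and specificity from confusion-matrix counts, where the
    blinded expert read is treated as ground truth."""
    sensitivity = tp / (tp + fn)  # abnormal scans the AI report correctly flags
    specificity = tn / (tn + fp)  # normal scans the AI report correctly clears
    return sensitivity, specificity

# Hypothetical counts: AI-generated report vs blinded expert read, one modality.
sensitivity, specificity = sens_spec(tp=88, fn=12, tn=180, fp=20)
print(sensitivity, specificity)  # → 0.88 0.9
```

Reporting these per modality (trauma CT, CT head, chest X-ray) rather than pooled matters, since a tool acceptable on one modality may miss findings on another.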

Autonomous Reporting of Chest X-Rays: side-by-side comparison of two CXR tools in Glasgow
CERSI-AI Funded
Delivered by network partner: University of Glasgow
This University of Glasgow project compares two autonomous AI chest X-ray tools using 110,000 NHS scans. It evaluates whether AI can safely identify normal images, reduce reporting delays, and inform future NHS regulation and clinical adoption.

Mission & Vision
The project’s mission is to generate high-quality evidence to support the safe, effective, and scalable adoption of autonomous AI for chest X-ray reporting in the NHS. Its vision is to enable risk-based regulatory frameworks that allow AI to autonomously handle low-risk cases, improving efficiency and patient care while maintaining safety and public trust.
What is the project about?
This project evaluates two AI tools—qXR and Annalise Enterprise CXR—using 110,000 chest X-rays from NHS Greater Glasgow and Clyde. It assesses whether these tools can accurately identify normal scans without missing clinically important findings, potentially reducing reporting delays caused by radiologist shortages.


Why is this project commissioned?
Chest X-rays are the most common imaging test in the NHS, yet reporting delays due to workforce shortages pose patient safety risks. AI tools could help triage normal scans, but real-world evidence is limited. NICE has called for more data to inform regulatory guidance. This project addresses that gap and explores whether human review requirements can be safely updated.
Who are the intended users?
- Primary: NHS radiology teams and clinicians benefiting from faster reporting workflows.
- Secondary: Regulators (e.g., MHRA and bodies enforcing IR(ME)R), NICE, NHS England, AI developers, and patients.
How are we making this project a reality?
The project engages patients and the public in study design, securely prepares and manages the large CXR dataset, and ensures robust governance through ethics approvals and collaboration agreements. AI performance is evaluated against radiologist reviews, including bias analyses, and findings are disseminated through reports, publications, regulatory white papers, and NHS guidance.
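For autonomous triage of normal scans, the safety question above comes down to what happens among scans the AI clears as normal. A minimal sketch (hypothetical counts, not study data) of the two headline numbers, negative predictive value and miss rate:

```python
def triage_safety(true_normal, missed_abnormal):
    """Among scans the AI labels 'normal': NPV and miss rate.
    true_normal      -- AI-cleared scans the radiologist also read as normal
    missed_abnormal  -- AI-cleared scans with a clinically important finding
    """
    total = true_normal + missed_abnormal
    npv = true_normal / total
    return npv, 1.0 - npv

# Hypothetical: of 5,000 scans cleared by the AI, 25 had a missed finding.
npv, miss_rate = triage_safety(true_normal=4975, missed_abnormal=25)
print(npv, miss_rate)
```

Whether a given miss rate is acceptable (and for which findings) is exactly the risk-based judgment the project's evidence is meant to inform; the bias analyses would repeat this calculation across patient subgroups.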

CLIO: extension to the current Newton's Tree UK project monitoring the risk of ambient scribes
CERSI-AI Funded
Delivered by network partner: Newton’s Tree
Aims: to implement and evaluate a federated monitoring platform for AI scribe tools across NHS outpatient settings, enabling monitoring of input quality, AI output drift, and clinician over-reliance.

Mission & Vision
The project aligns with CERSI-AI’s vision to ensure AI-enabled healthcare is safe, effective, and equitable. It aims to develop scalable, systematic, lifecycle-oriented oversight frameworks rooted in real-world evidence, supporting the UK’s ambition to be a global leader in regulating and deploying trustworthy AI in healthcare.
What is the project about?
The CLIO (Clinical LLM Intelligent Oversight) extension project expands an InnovateUK study that monitors risks of AI Ambient Scribes—tools that generate clinical notes from doctor–patient conversations. The extension will include three additional NHS Trusts to evaluate audio quality, AI output drift, and clinician reliance, aiming to detect early risks like poor performance or over-reliance and ensure safe adoption of AI scribing tools.


Why is this project commissioned?
AI scribing tools reduce administrative burden but introduce safety risks such as documentation errors, automation bias, and performance drift. Current monitoring mechanisms are inadequate. CLIO addresses this gap by implementing a federated monitoring platform integrated with clinical risk management processes, ensuring these tools remain safe and effective in real-world NHS settings.
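The drift monitoring described above can be sketched in miniature (an illustrative toy, not CLIO's platform; the window size, quality floor, and scores are hypothetical): track a rolling mean of a per-encounter quality score and raise an alert for human review when it falls below a floor.

```python
from collections import deque

class DriftMonitor:
    """Flags drift when the rolling mean of a quality score drops below a floor."""
    def __init__(self, window=50, floor=0.8):
        self.scores = deque(maxlen=window)  # keeps only the last `window` scores
        self.floor = floor

    def update(self, score):
        self.scores.append(score)
        mean = sum(self.scores) / len(self.scores)
        return mean < self.floor  # True -> escalate for human review

# Hypothetical per-encounter quality scores showing gradual degradation.
monitor = DriftMonitor(window=5, floor=0.8)
alerts = [monitor.update(s) for s in [0.9, 0.85, 0.9, 0.6, 0.55, 0.5]]
print(alerts)  # → [False, False, False, False, True, True]
```

In a federated setup, each site would run such checks locally and share only aggregate alert statistics, which is what allows cross-site comparison without moving patient-level data.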
Who are the intended users?
- Primary users: NHS clinicians and governance teams using AI scribing tools.
- Secondary stakeholders: Regulators (MHRA), NHS England, AI vendors, and patient groups who benefit from safer AI adoption.
How are we making this project a reality?
- The programme uses a coordinated, multi-site approach to enable safe, effective and scalable evaluation of AI technologies in real clinical settings. Sites are first supported to establish the necessary contracting, ethics and governance foundations, ensuring each location is ready to participate securely.
- Once onboarded, sites connect to a shared monitoring platform that integrates local clinical data and captures insights from over 500 patient encounters per site. Independent clinical evaluators apply a standardised review process to generate robust evidence on performance, usability and clinical impact.
- Insights from all sites are brought together through shared dashboards and facilitated workshops, enabling cross-site learning, comparison and the identification of best practices. This ensures that local findings contribute to national understanding and support the wider adoption of safe, effective AI tools.
- Strong governance underpins delivery, supported by flexible contracting, protected evaluator time, strict data anonymisation and oversight from a dedicated steering group.
