27th November 2025
Recently, we spoke with Alicja Rudnicka, lead author of a groundbreaking study on automated retinal image analysis systems (ARIAS), which sets a new standard for transparency and fairness in medical AI evaluation.
Q: What problem were you aiming to address with this research?
Most current evaluations of artificial intelligence (AI) / automated retinal image analysis systems (ARIAS) are carried out by the vendors themselves, often using datasets and testing environments of their own choosing. These single-algorithm, vendor-delivered studies tend to overestimate performance and rarely reflect how systems might behave in real-world clinical settings. Differences in patient populations, image capture systems, and evaluation protocols make it nearly impossible to compare algorithms fairly, particularly when key details such as pre-selection or preprocessing of images are unclear. This approach limits transparency, comparability, and ultimately, trust in AI-driven healthcare tools.
Q: How did your study take a different approach?
Recognising these shortcomings, this study took a fundamentally different path. It created a vendor-independent evaluation platform using the largest, most ethnically diverse NHS Diabetic Eye Screening dataset in North-East London. Multiple state-of-the-art ARIAS, each certified as a medical device (with CE Class IIa certification held or pending), were assessed on the same dataset under identical computational conditions. This ensured fair, direct, and reproducible comparisons while also testing for algorithmic fairness across diverse population subgroups (e.g. ethnicity and age). The result is a transparent, sustainable model for evaluating medical AI systems that mirrors real-world use.
Q: What are the implications for policy and future practice?
This work sets a new benchmark for independent, population-based evaluation of AI in healthcare, demonstrating how multi-vendor comparisons can be done impartially and at scale in the intended healthcare setting. It provides a blueprint for regulators, policymakers, and health systems to demand higher standards of evidence before deployment, and it offers manufacturers a more predictable adoption pathway. Beyond eye screening, the same principles could inform policy frameworks for AI evaluation across other disease areas, helping ensure that future health AI technologies are not only effective and safe for patients but also equitable and trustworthy.
Read the full research paper here: Automated retinal image analysis systems to triage for grading of diabetic retinopathy: a large-scale, open-label, national screening programme in England – The Lancet Digital Health
This study exemplifies the type of rigorous, independent research that CERSI-AI champions. By setting new standards for transparency and fairness in AI evaluation, it helps pave the way for safer, more equitable adoption of digital health technologies.
Want to learn more? Explore our latest projects, policy insights, and collaborative opportunities at CERSI-AI and join us in shaping the future of trustworthy AI in healthcare.
Professor Alicja Rudnicka
Professor Alicja Rudnicka is a Professor in Statistical Epidemiology in the Population Health Research Institute. Professor Rudnicka has been involved in a wide spectrum of epidemiological enquiry, including large-scale population-based studies and the application of artificial intelligence (AI) technology for analysing retinal images for risk prediction and disease detection.
Professor Adnan Tufail
Professor Adnan Tufail is a consultant ophthalmologist in the Medical Retina Service at Moorfields Eye Hospital, London, with special expertise in medical and inflammatory diseases of the retina and choroid. He is a Professor of Ophthalmology at University College London with extensive clinical and research experience. Professor Tufail is also a highly experienced cataract surgeon with expertise in the complex management of patients with cataract and the above retinal conditions.