Events Calendar

25th International Conference on Dermatology & Skin Care
2020-04-27 - 2020-04-28, All Day
About the Conference: Derma 2020 welcomes all the attendees, lecturers, patrons and other research experts from all over the world to the 25th International Conference on Dermatology & [...]

Insurance AI and Innovative Tech Virtual
2020-05-27 - 2020-05-28, All Day
In light of the rapidly evolving impact of COVID-19 globally, we have made the decision to turn Insurance AI and Innovative Tech 2020 into a [...]

Insurance AI and Innovative Tech USA Virtual
2020-05-27 - 2020-05-28, All Day
2020 has seen the insurance industry change in an unprecedented fashion. What were once viewed as long-term development strategies have now been fast-tracked into today’s [...]

National Standard Unveiled for Scalable, Safe Healthcare AI

EMR Industry

Researchers at Duke University School of Medicine have developed two innovative frameworks to assess the performance, safety, and reliability of large language models in healthcare.

Published in npj Digital Medicine and the Journal of the American Medical Informatics Association (JAMIA), two new studies present a novel approach to ensuring that AI systems used in clinical environments adhere to the highest standards of quality, safety, and accountability.

As large language models become more integrated into healthcare—supporting tasks such as clinical note generation, conversation summarization, and patient communication—health systems face increasing challenges in evaluating these technologies in a rigorous yet scalable way. The Duke University-led research, headed by Chuan Hong, Ph.D., assistant professor in Biostatistics and Bioinformatics, aims to address this critical need.

The study published in npj Digital Medicine introduces SCRIBE, a structured evaluation framework for Ambient Digital Scribing tools. These AI-driven systems are designed to generate clinical documentation by capturing real-time conversations between patients and providers. SCRIBE combines expert clinical review, automated performance scoring, and simulated edge-case testing to assess tools across key metrics such as accuracy, fairness, coherence, and resilience.
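The paper itself defines SCRIBE's actual metrics and review workflow; purely to illustrate the general shape of such an evaluation harness, the Python sketch below combines clinician review scores with simple automated proxies for accuracy and coherence. Every name, metric, and scoring rule here is an assumption made for illustration, not the published SCRIBE implementation.

```python
# Illustrative only: names, metrics, and scoring rules below are assumptions,
# not the published SCRIBE implementation.
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class ScribeCase:
    """One patient-provider conversation plus the AI-generated clinical note."""
    transcript: str
    generated_note: str
    expert_scores: Dict[str, float] = field(default_factory=dict)  # clinician review, 0-1


def accuracy(case: ScribeCase) -> float:
    """Toy proxy: fraction of transcript sentences echoed in the note."""
    facts = [s.strip() for s in case.transcript.split(".") if s.strip()]
    hits = sum(1 for f in facts if f.lower() in case.generated_note.lower())
    return hits / max(len(facts), 1)


def coherence(case: ScribeCase) -> float:
    """Toy proxy: penalize notes made up of repeated sentences."""
    sents = [s.strip() for s in case.generated_note.split(".") if s.strip()]
    return len(set(sents)) / max(len(sents), 1)


AUTOMATED_METRICS: Dict[str, Callable[[ScribeCase], float]] = {
    "accuracy": accuracy,
    "coherence": coherence,
}


def evaluate(cases: List[ScribeCase]) -> Dict[str, float]:
    """Combine automated scoring with expert clinical review, per metric."""
    report: Dict[str, float] = {}
    for name, metric in AUTOMATED_METRICS.items():
        report[f"auto_{name}"] = mean(metric(c) for c in cases)
    for name in sorted({k for c in cases for k in c.expert_scores}):
        report[f"expert_{name}"] = mean(c.expert_scores.get(name, 0.0) for c in cases)
    return report
```

In a real deployment, the toy proxies would be replaced by the framework's validated measures, and the simulated edge-case testing the paper describes would feed additional synthetic conversations through the same evaluation loop.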

“Ambient AI has significant potential to ease documentation burdens for clinicians,” Hong noted. “But careful evaluation is crucial. Without it, there’s a risk of deploying systems that introduce bias, omit vital details, or compromise care quality. SCRIBE is built to safeguard against those risks.”

A second, related study published in JAMIA introduces a complementary framework for evaluating large language models integrated into the Epic electronic medical record system, specifically those used to generate draft responses to patient messages. The study assesses these AI-generated replies by comparing clinician feedback with automated evaluation metrics, focusing on attributes such as clarity, completeness, and safety.
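As a rough illustration of the kind of comparison the JAMIA study describes, the sketch below places clinician ratings of drafted replies alongside a toy automated completeness proxy. The attribute names, the 1-5 rating scale, and the proxy itself are assumptions for illustration, not the study's actual rubric or metrics.

```python
# Illustrative only: the attribute names, rating scale, and completeness proxy
# below are assumptions, not the JAMIA study's actual rubric or metrics.
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class DraftReply:
    """An AI-drafted response to a patient message, plus clinician ratings (1-5)."""
    patient_message: str
    draft: str
    clinician_ratings: Dict[str, int]  # e.g. {"clarity": 4, "completeness": 2, "safety": 5}


def automated_completeness(reply: DraftReply) -> float:
    """Toy proxy: share of questions in the patient message that the draft
    appears to address (a real evaluation would use stronger NLP)."""
    questions = [q.strip() for q in reply.patient_message.split("?")[:-1] if q.strip()]
    if not questions:
        return 1.0
    addressed = sum(
        1 for q in questions
        if any(word.lower() in reply.draft.lower() for word in q.split()[-3:])
    )
    return addressed / len(questions)


def compare(replies: List[DraftReply]) -> Dict[str, float]:
    """Place rescaled clinician ratings next to the automated proxy."""
    return {
        "clinician_completeness": mean(
            r.clinician_ratings.get("completeness", 0) / 5 for r in replies
        ),
        "automated_completeness": mean(automated_completeness(r) for r in replies),
    }
```

Comparing the two values in such a report is one simple way to see where automated scores and clinician judgment diverge for a given attribute.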

While the models demonstrated strong performance in tone and readability, the study identified notable gaps in response completeness—highlighting the critical need for ongoing evaluation in real-world settings.

“This research helps bridge the gap between cutting-edge algorithms and meaningful clinical application,” said Michael Pencina, Ph.D., Chief Data Scientist at Duke Health and co-author of both studies. “It underscores that responsible AI implementation requires rigorous, ongoing evaluation as part of the technology’s entire life cycle—not just as a final step.”

Together, these two frameworks provide a robust foundation for the responsible integration of AI in healthcare. They equip clinical leaders, developers, and regulators with the tools necessary to evaluate AI models prior to deployment and to continuously monitor their performance—ensuring that these technologies enhance care delivery without compromising patient safety or trust.