Hossein Estiri, Director of the Clinical Augmented Intelligence (CLAI) research group and associate professor of medicine at Massachusetts General Hospital | The General Hospital Corporation
Patient Daily | Jan 27, 2026

Mass General Brigham develops autonomous AI tool for detecting cognitive impairment

A research team from Mass General Brigham has developed a fully autonomous artificial intelligence system designed to screen for cognitive impairment using routine clinical notes. The system, which operates without human intervention after deployment, demonstrated 98% specificity in real-world validation tests. The findings were published in npj Digital Medicine.

In addition to the study, the team introduced Pythia, an open-source tool that lets healthcare systems and research institutions apply autonomous prompt optimization to their own AI screening applications.
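
The article does not detail Pythia's interface, but autonomous prompt optimization generally follows a propose-score-accept loop: draft a prompt, measure it against labeled examples, and keep only revisions that improve the measurement. Below is a minimal Python sketch of that pattern; the function names, the keyword-heuristic stand-in for the model, and the toy examples are illustrative assumptions, not Pythia's actual API or the study's prompts.

```python
# Minimal sketch of a propose-score-accept loop for autonomous prompt
# optimization, in the spirit of what the article describes for Pythia.
# Everything here (examples, cue words, heuristic "model") is hypothetical.

# Hypothetical labeled examples: (clinical note text, true impairment flag).
EXAMPLES = [
    ("pt reports word-finding difficulty, repeats questions", True),
    ("annual physical, no cognitive concerns noted", False),
    ("family reports memory lapses over past year", True),
    ("follow-up for hypertension, alert and oriented x3", False),
]

CUES = ("memory", "word-finding", "repeats")

def run_model(prompt: str, note: str) -> bool:
    """Stand-in for a local LLM call: flags a note if it mentions any cue
    word that the current prompt tells the model to watch for."""
    active = [cue for cue in CUES if cue in prompt]
    return any(cue in note for cue in active)

def score(prompt: str) -> float:
    """Accuracy of the prompt over the labeled examples."""
    hits = sum(run_model(prompt, note) == label for note, label in EXAMPLES)
    return hits / len(EXAMPLES)

def propose_revision(prompt: str, round_no: int) -> str:
    """Stand-in for an LLM critiquing and rewriting the prompt; here it
    simply proposes adding one cue word per round."""
    extra = CUES[round_no % len(CUES)]
    return prompt if extra in prompt else prompt + f" Watch for '{extra}'."

def optimize(prompt: str, target: float = 1.0, max_rounds: int = 20) -> str:
    """Accept only revisions that improve the score, stopping once the
    performance target is hit or no candidate improves further."""
    best, best_score = prompt, score(prompt)
    for r in range(max_rounds):
        if best_score >= target:
            break
        cand = propose_revision(best, r)
        cand_score = score(cand)
        if cand_score > best_score:
            best, best_score = cand, cand_score
    return best

print(optimize("Flag notes suggesting cognitive impairment."))
```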

"We didn't build a single AI model - we built a digital clinical team," said Hossein Estiri, PhD, director of the Clinical Augmented Intelligence (CLAI) research group and associate professor of medicine at Massachusetts General Hospital. "This AI system includes five specialized agents that critique each other and refine their reasoning, just like clinicians would in a case conference."

Cognitive impairment is often underdiagnosed because standard screening tools are resource-intensive and many patients have limited access to them. Early detection is becoming more important, as new Alzheimer's disease therapies are most effective when administered early.

"By the time many patients receive a formal diagnosis, the optimal treatment window may have closed," said Lidia Moura, MD, PhD, MPH, director of Population Health and the Center for Healthcare Intelligence in the Department of Neurology at Mass General Brigham MGB Neurology Department.

The new AI system uses an open-weight large language model that can be deployed locally within a hospital's IT infrastructure. Five agents with distinct functions work together to make and refine clinical determinations, catching errors and improving accuracy along the way. The agents operate autonomously in an iterative process until performance goals are met or the results converge. Patient data remains on local servers and is not transmitted externally.
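
The article does not spell out the five agents' individual roles, but the pattern it describes, agents that draft, critique, and refine a determination until no objections remain, can be sketched in miniature. The three stub roles below are hypothetical stand-ins; in the real system each would be a call to the locally hosted open-weight model, and there would be five of them.

```python
# A minimal sketch of the agentic loop the article describes: draft a
# determination, raise objections, revise, and repeat until the critic has
# no remaining objections (convergence) or an iteration cap is reached.
# The roles and keyword heuristics are hypothetical stand-ins for LLM calls.
from dataclasses import dataclass

@dataclass
class Determination:
    label: str       # "suspected" or "not suspected"
    rationale: str

def screener(note: str) -> Determination:
    """Drafts an initial determination (an LLM call in the real system)."""
    suspected = "memory" in note or "confusion" in note
    return Determination("suspected" if suspected else "not suspected",
                         "initial keyword screen of the note")

def critic(note: str, d: Determination) -> list[str]:
    """Raises objections to the current rationale (an LLM call in practice)."""
    issues = []
    if (d.label == "suspected" and "problem list" in note
            and "supporting narrative" not in d.rationale):
        issues.append("concern appears only on a problem list; "
                      "no supporting narrative was cited")
    return issues

def refiner(d: Determination, issues: list[str]) -> Determination:
    """Revises the determination to address the critic's objections."""
    return Determination(d.label, d.rationale + "; " + "; ".join(issues))

def adjudicate(note: str, max_rounds: int = 5) -> Determination:
    """Iterate draft -> critique -> refine until convergence or the cap."""
    d = screener(note)
    for _ in range(max_rounds):
        issues = critic(note, d)
        if not issues:
            break  # converged: no outstanding objections
        d = refiner(d, issues)
    return d

print(adjudicate("problem list: memory loss, no recent visit notes"))
```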

Researchers analyzed over 3,300 clinical notes from 200 anonymized patients at Mass General Brigham. By examining documentation from regular healthcare visits, the system can identify potential cognitive issues that might otherwise go unnoticed.

"Clinical notes contain whispers of cognitive decline that busy clinicians can't systematically surface," said Moura. "This system listens at scale."

When disagreements arose between the AI system and human reviewers, independent experts re-evaluated cases. In 58% of these instances, experts validated the AI's reasoning over initial human assessments.

Errors made by the AI were traced to documentation gaps or missing domain knowledge, particularly when cognitive concerns appeared only on problem lists without a supporting narrative or broader context. Sensitivity reached 91% in balanced testing scenarios but dropped to 62% under real-world conditions, where about one-third of cases were positive; specificity remained high at 98%. The researchers highlighted these calibration challenges in the interest of transparency and future improvement.
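
Plugging the reported figures into the standard predictive-value formulas shows what that trade-off means in practice. The short calculation below uses the article's numbers (62% sensitivity, 98% specificity, roughly one-third prevalence); the resulting predictive values are an illustration, not results reported by the study.

```python
# Illustrative arithmetic using the figures quoted in the article
# (sensitivity 0.62, specificity 0.98, ~1/3 prevalence). The derived
# predictive values are a back-of-the-envelope calculation, not numbers
# reported by the study itself.
sens, spec, prev = 0.62, 0.98, 1 / 3

# Positive predictive value: share of flagged patients truly impaired.
ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
# Negative predictive value: share of unflagged patients truly unimpaired.
npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

print(f"PPV ~ {ppv:.2f}, NPV ~ {npv:.2f}")
# PPV ~ 0.94, NPV ~ 0.84: a positive screen is highly reliable, while a
# negative screen still misses roughly 38% of truly impaired patients.
```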

"We're publishing exactly the areas in which AI struggles," said Estiri. "The field needs to stop hiding these calibration challenges if we want clinical AI to be trusted."
