A new artificial intelligence system called MedVersa has shown potential in analyzing a wide range of medical scans, according to a study published in NEJM AI. Developed as a generalist model, MedVersa was trained on tens of millions of medical images from 91 public datasets, covering imaging modalities such as X-ray, CT, and MRI scans, and including both images and clinical text.
Unlike traditional AI models that are typically designed for specific tasks or data types, MedVersa is structured to interpret different kinds of scans and generate reports within one unified framework. The system uses a large language model (LLM) as an orchestrator to evaluate user requests—such as identifying the location of a tumor—and then selects the appropriate internal vision modules for analysis.
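The orchestrator pattern described above can be sketched roughly as follows. The module names and the keyword-based router here are illustrative assumptions standing in for the LLM-driven planning the study describes; they are not MedVersa's actual implementation or API.

```python
# Hypothetical sketch of an orchestrator routing user requests to vision modules.
# In the real system, a large language model interprets the request and selects
# the tool; a simple keyword lookup plays that role here for illustration.

def detect_lesions(scan):
    """Hypothetical detection module: returns bounding boxes."""
    return [{"label": "nodule", "box": (34, 60, 52, 81)}]

def segment_organ(scan):
    """Hypothetical segmentation module: returns a mask stub."""
    return {"organ": "liver", "mask": "<mask>"}

def generate_report(scan):
    """Hypothetical report-generation module."""
    return "No acute cardiopulmonary abnormality."

MODULES = {
    "detect": detect_lesions,
    "segment": segment_organ,
    "report": generate_report,
}

def orchestrate(user_request: str, scan) -> object:
    """Stand-in for the LLM planner: map a free-text request to a module."""
    for keyword, module in MODULES.items():
        if keyword in user_request.lower():
            return module(scan)
    return generate_report(scan)  # default to drafting a report

print(orchestrate("Detect the tumor location", scan=None))
```

The design point is the unified entry: one interface accepts heterogeneous requests and dispatches to specialized internal components, rather than exposing a separate model per task.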
In performance tests, MedVersa was compared with approved specialist AI models and board-certified radiologists across nine imaging tasks. Experts reviewed chest X-ray reports generated by human radiologists, GPT-4o, and MedVersa without knowing their source. The evaluation focused on clinical accuracy and efficiency of report generation.
Results showed that MedVersa's performance was competitive with existing specialist models in object detection and segmentation tasks. On report-generation metrics, it outperformed several of the other AI models tested: its BLEU-4 score (which measures n-gram overlap with reference text; higher is better) was 17.8 versus MAIRA's 14.2, and its RadCliQ score (which measures deviation from human reporting; lower is better) was 2.71 versus MAIRA's 3.10.
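To make the BLEU-4 figures above concrete, here is a minimal, simplified sentence-level BLEU-4 in plain Python: the geometric mean of 1- through 4-gram precisions, scaled by a brevity penalty. Real evaluations use corpus-level BLEU with proper smoothing (e.g. via established NLP toolkits); this sketch only shows the idea behind the metric and is not the study's evaluation code.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate: str, reference: str) -> float:
    """Simplified sentence-level BLEU-4 against a single reference."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, 5):
        cand_ng, ref_ng = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((cand_ng & ref_ng).values())  # clipped n-gram matches
        total = max(sum(cand_ng.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # crude smoothing, avoids log(0)
    # Brevity penalty: discourage candidates much shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)

print(bleu4("no acute findings in the chest", "no acute findings in the chest"))  # → 1.0
```

An identical candidate and reference score 1.0; a report sharing no four-word phrases with the reference scores near zero, which is why BLEU-4 rewards fluent, reference-like phrasing.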
When compared directly with human experts, MedVersa's reports were considered clinically comparable to those written by radiologists in 64% of cases overall—a figure that rose to 91% for scans showing normal findings. However, for abnormal or complex cases involving intricate pathologies, human-written reports remained preferred by reviewing radiologists.
The study also found that using MedVersa helped doctors complete draft reports more quickly, and it reduced urgent discrepancies by about 20% compared with GPT-4o-generated drafts within the five-to-ten-minute reporting interval.
Researchers highlighted that while recent advances have led to several specialized AI tools being approved for clinical use, these systems often cannot handle multiple data types or adapt easily to complex workflows where patients require holistic evaluation across different imaging methods.
The authors concluded: "The present study reveals that MedVersa represents an important step toward developing a unified clinical assistant rather than relying on traditionally fragmented AI tools." They noted that the model’s architecture allowed it to match or surpass specialist AIs on many tasks while helping streamline radiologist workflows.
However, they cautioned that expert supervision remains necessary: "While MedVersa excelled at routine cases, board-certified radiologists remain preferred for complex, abnormal cases involving intricate pathologies." They also acknowledged ongoing challenges related to generalizing performance across all imaging modalities since some datasets used were limited mainly to segmentation rather than full diagnostic interpretation.
Looking ahead, researchers suggested future generalist medical AIs should be trained on even broader datasets—including genetic information and electronic health records—to better support comprehensive patient care alongside human experts.