While the artist Claude Monet’s paintings can be blurry and indistinct, a new foundation model of the same name may help bring clarity to medical artificial intelligence systems.
In a recent paper published in the journal Nature Medicine, a team of researchers at the University of Washington and Stanford University co-led by Allen School professor Su-In Lee introduced a medical concept retriever, or MONET, that can connect images of skin diseases to semantically meaningful medical concept terms. Beyond annotating dermatology images, MONET has the potential to improve transparency and trustworthiness throughout the entire AI development pipeline, from data curation to model development.
“We took a very different approach from current medical AI research, which often focuses on training large medical foundation models with the goal of achieving high performance in diagnostic tasks,” said Allen School Ph.D. student and lead author of the paper Chanwoo Kim, who works with Lee in the AI for bioMedical Sciences (AIMS) Lab. “We leverage these large foundation models’ capabilities to enhance the transparency of existing medical AI models with a focus on explainability.”
Prior to MONET, annotating medical images was a manual process that was difficult to do at scale. MONET automates this process using an AI technique called contrastive learning, which teaches the model to match images with plain-language descriptions. The researchers trained MONET on more than 100,000 dermatology image-text pairs drawn from PubMed articles and medical textbooks, then had the model score each image based on how well it represents a given concept. These medical concepts are “terms that a physician can understand and would use to make a diagnosis such as dome-shaped, asymmetrical or ulcer,” Kim said.
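As a rough illustration of this kind of contrastive image-text scoring, the sketch below uses a generic pretrained CLIP checkpoint from Hugging Face rather than MONET’s actual weights; the prompt template, concept list and image path are hypothetical stand-ins.

```python
# A minimal sketch of CLIP-style concept scoring, assuming a generic
# pretrained checkpoint rather than MONET's actual weights.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

concepts = ["dome-shaped", "asymmetrical", "ulcer"]  # concept terms from the article
prompts = [f"a photo of a skin lesion that is {c}" for c in concepts]  # hypothetical template

image = Image.open("lesion.jpg")  # placeholder path
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds temperature-scaled cosine similarities between the
# image embedding and each concept prompt; softmax turns them into relative
# concept scores, where higher means the image expresses the concept more.
scores = outputs.logits_per_image.softmax(dim=-1).squeeze()
for concept, score in zip(concepts, scores):
    print(f"{concept}: {score:.3f}")
```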
The team found that MONET could accurately annotate concepts across dermatology images, as verified by board-certified dermatologists, and that its performance was comparable to that of supervised models trained on smaller, manually concept-annotated dermatology datasets.
These annotations can help researchers detect potential biases in datasets and undesirable behavior within AI systems. The researchers used MONET to audit the International Skin Imaging Collaboration (ISIC) dataset, the largest collection of its kind, with more than 70,000 dermoscopic images commonly used to train dermatology AI models, and found that certain visual concepts correlate with benign or malignant labels in unexpected ways. For example, MONET showed that images of skin lesions marked with orange stickers placed by dermatologists were mostly benign, a correlation that does not hold in general. One explanation is that the orange stickers were often used for pediatric patients, who tended to have benign cases, Kim noted.
This insight is crucial for understanding which factors affect the transferability of medical AI models across different sites. Usually, such data auditing at scale is not feasible due to the lack of concept labels.
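To make the idea of concept-based auditing concrete, here is a minimal sketch that correlates a single concept score with benign/malignant labels; the column names and data are invented for illustration and do not come from ISIC.

```python
# A minimal sketch of the kind of dataset audit described above, assuming
# precomputed MONET-style concept scores and binary malignancy labels.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    # hypothetical score in [0, 1] for an "orange sticker" concept
    "orange_sticker_score": rng.uniform(0, 1, n),
    # 1 = malignant, 0 = benign (dummy labels for the sketch)
    "malignant": rng.integers(0, 2, n),
})

# Point-biserial correlation between a concept score and the label;
# a strong value flags a potential spurious shortcut in the dataset.
corr = df["orange_sticker_score"].corr(df["malignant"])
print(f"concept-label correlation: {corr:+.3f}")

# Compare the mean concept score across benign vs. malignant images.
print(df.groupby("malignant")["orange_sticker_score"].mean())
```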
“In the AI pipeline, MONET works at the entry level, providing a ‘lens’ through which each image can be ‘featurized’ based on available information to map it with relevant language-based features,” Lee said. “This allows MONET to be combined with an existing medical AI development pipeline, including data curation and model development, in a plug-and-play manner.
“You don’t have to worry about it going through a model as it goes right to the data — that’s one way we can make dataset and model auditing more transparent and trustworthy,” Lee continued.
MONET’s framework can also help medical AI developers create inherently interpretable models. Physicians in particular are interested in such models, like concept bottleneck models (CBMs), because they make it easy to understand which factors influence the AI’s decisions. However, CBMs are limited because they require concept annotations in the training data, which are not always available; MONET’s automatic annotation has the potential to help build CBMs that were previously infeasible, as the sketch below suggests.
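As a loose sketch of why CBMs are interpretable, the example below trains a linear prediction head directly on concept scores, so every learned weight ties a named concept to the final decision; the dimensions, data and training loop are illustrative, not the paper’s actual setup.

```python
# A minimal concept bottleneck sketch trained on automatically generated
# concept scores; the numbers and data here are hypothetical.
import torch
import torch.nn as nn

NUM_CONCEPTS = 48   # e.g., dermatology concept terms scored by MONET
NUM_CLASSES = 2     # benign vs. malignant

# The "bottleneck": predictions depend only on human-readable concept
# scores, so each weight links a named concept to the final decision.
head = nn.Linear(NUM_CONCEPTS, NUM_CLASSES)

concept_scores = torch.rand(16, NUM_CONCEPTS)   # batch of MONET-style scores
labels = torch.randint(0, NUM_CLASSES, (16,))   # dummy diagnosis labels

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss = nn.functional.cross_entropy(head(concept_scores), labels)
loss.backward()
optimizer.step()

# Inspecting the weights explains the model: the largest entries show
# which concepts push a prediction toward the "malignant" class.
print(head.weight[1].topk(5).indices)
```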
“While we only focused on a foundation model based on OpenAI’s CLIP model, we expect that this whole idea can be applied to other more advanced large foundation models,” Kim noted. “Nowadays, AI is developing very rapidly but our framework of using large foundation models’ amazing capabilities to improve transparency of medical AI systems will still be applicable.”
This is part of a broader research effort in the AIMS Lab to ensure AI in dermatology and medical imaging is safe, transparent and explainable. Other projects include a new framework for auditing AI-powered medical image classifiers that can help dermatologists understand how a model determines whether an image depicts melanoma or a benign skin condition. Another paper sheds light on the reasoning process medical image classifiers use to identify patients’ sex from images of skin lesions. Additionally, counterfactual AI prompts have the potential to show how patient data may change based on genetic mutations, treatments or other factors. These research initiatives have potential applications beyond dermatology to other medical specialties, Lee said.
Lee and Kim’s co-authors on the paper include Allen School Ph.D. students Soham Gadgil and Alex DeGrave; Stanford University postdocs Zhuo Ran Cai, M.D., and Jesutofunmi Omiye, M.D.; and Roxana Daneshjou, M.D., Ph.D., a faculty member in the Department of Biomedical Data Science and in Dermatology at Stanford University.
Read the full paper in Nature Medicine.