Single-cell genomics is revolutionizing biomedical research by enabling high-volume analysis of gene expression at the cellular level to understand the origins of disease and identify targets for potential treatment. To accelerate this progress, researchers are increasingly turning to artificial intelligence (AI) tools to analyze single-cell data at scale. But the size and complexity of these datasets, combined with noise and systematic biases in experimentation, make it difficult to build meaningful AI models from which to derive new biological insights.
Professors Su-In Lee and Sara Mostafavi of the Allen School’s Computational Biology group are working on new solutions to the problem, supported by two competitive grants from the Chan Zuckerberg Initiative’s (CZI) Data Insights program. The program supports the advancement of tools and resources that make it possible to gain greater insights into health and disease from single-cell biology datasets.
Lee directs the University of Washington’s AIMS Lab, shorthand for AI for bioMedical Sciences, where she and her collaborators develop explainable AI techniques for lifting the so-called black box on models to make them more transparent and interpretable in biomedical sciences and clinical settings. Newer deep neural network architectures used in single-cell genomics, such as transformers and graph neural networks (GNNs), are ripe for such tools. While they have been used to good effect by researchers investigating the mechanisms of gene regulation and cell identity in complex tissues across multiple single-cell datasets, how they arrive at their results remains shrouded in mystery.
One of the CZI Data Insights grants will support a project led by Lee, working in collaboration with professor Jian Ma at Carnegie Mellon University, to fill that void by extending principled explainable AI (XAI) methods, such as a new framework for computing Shapley values using a learned explainer model, to transformers and GNNs. The results will enable researchers to understand which features contributed to the models’ predictions, and to what extent.
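For readers unfamiliar with Shapley values, the idea can be illustrated with a minimal sketch. A Shapley value assigns each input feature its average marginal contribution to a prediction, averaged over every possible subset of the other features. The example below computes these values exactly by brute-force enumeration for a hypothetical three-feature toy model. It is not the learned-explainer framework described above, which is aimed at cases where this kind of enumeration is far too expensive. The names `toy_model` and `value_fn`, the input values, and the zero baseline are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every subset of the other features."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = (
                factorial(size) * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical input: three made-up feature values (not real expression data).
x = [1.0, 2.0, 3.0]


def toy_model(g):
    # Hypothetical model: one additive term plus one interaction term.
    return 2.0 * g[0] + g[1] * g[2]


def value_fn(coalition):
    # Assumption: features outside the coalition are replaced by a baseline of 0.
    masked = [x[j] if j in coalition else 0.0 for j in range(len(x))]
    return toy_model(masked)


phi = shapley_values(value_fn, len(x))
print(phi)
```

In this toy case the first feature receives a value of about 2, and the two interacting features split their joint contribution of 6 equally, at about 3 each. The three values sum to the difference between the model's output on the full input and on the all-zero baseline. Because the enumeration visits every subset, its cost grows exponentially with the number of features, which is one reason approximate and learned approaches are of interest for large models.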
“There is an urgent need for new, explainable AI techniques that can be applied to complex neural network architectures,” said Lee. “This approach will enable researchers to rigorously interpret these models to enable data-driven biological discoveries in single-cell regulatory genomics for which a ‘wave’ of new datasets is expected and enhance our fundamental understanding of how a cell works.”
A second CZI grant will support a project led by Mostafavi, working in collaboration with Lee, to develop methods for predicting how cells respond differently to various environmental factors. This direction extends Mostafavi’s previous research into the use of deep neural networks to predict when and how genetic variation between people leads to differences in disease susceptibility.
“Combining recent advances in AI with emerging single-cell datasets is a promising approach for understanding the role of genetic determinants of heritable diseases such as Alzheimer’s and cancer in rare or previously unknown cell populations,” explained Mostafavi, who is principal investigator on the project. “But we need to address issues of accuracy, scalability, and interpretability in the models in order to gain meaningful biological insights.”
Mostafavi and Lee’s awards are among three earned by University of Washington researchers in this latest cycle of CZI Data Insights grants. Allen School adjunct professor William Noble, professor of genome sciences at the UW, is part of a project to develop new computational methods that will significantly improve the quantitative accuracy of single-cell proteomics data.