
One medical model to rule them all: AI takes center stage at Allen School’s 2024 Research Showcase

Sheng Wang onstage speaking to a crowd of people seated at round tables with a slide behind him titled Four Paradigms in AI for Medicine

While the Allen School’s annual Research Showcase and Open House highlights both the breadth and depth of computing innovation at the state’s flagship university, the 2024 event at the University of Washington last week had a decidedly AI flavor. From a presentation on advances in AI for medicine, to technical sessions devoted to topics such as safety and sustainability, to the over 100 student research projects featured at the evening poster session, the school’s work to advance the foundations of AI and its ever-expanding range of applications took center stage.

“Medicine is inherently multimodal”

In his luncheon keynote on generative AI for multimodal biomedicine, Allen School professor Sheng Wang shared his recent work towards building foundation models that bring together medical imaging data from multiple sources — such as pathology, X-ray and ultrasound — to assist doctors with diagnosing and treating disease. 

“Medicine is inherently multimodal,” noted Wang. “There are lots of complicated diseases, like diabetes, hypertension, cancer, Alzheimer’s or even Covid…and we will see signals all over the body.”

The ability to capture these signals using multiple imaging modalities requires overcoming a number of challenges. For example, pathology images are too large for existing AI models to analyze at sufficient resolution — 100,000 by 100,000 pixels, large enough to cover a tennis court. Typically, images encountered by AI models are closer to 256 by 256 pixels, which, in keeping with Wang’s analogy, is akin to a single tennis ball.

Sheng Wang gestures as he speaks at a podium displaying the Allen School logo
“In the future the AI model could be like a clinical lab test every doctor can order.” Allen School professor Sheng Wang shares his vision for using generative AI in medical imaging.

To make pathology images more manageable, Wang and his collaborators looked to generative AI. Despite the stark difference in domains, “the challenge or the solution here is very similar to the underlying problem behind ChatGPT,” Wang explained. ChatGPT can understand and summarize long documents; by converting large pathology slide images to a “long sentence” of smaller images, Wang and his colleagues determined that AI could then summarize these image-sentences to obtain an overview of a patient’s status. Based on that idea, Wang and his team developed GigaPath, the first foundation model for whole-slide pathology. GigaPath, which achieved state-of-the-art performance on 25 out of 26 tasks, is “one model fits all,” meaning it can be applied to different types of cancer. Since its release, the tool has averaged 200,000 downloads per month.
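For readers who want a concrete picture of the “long sentence” idea, the toy sketch below tiles an oversized image into 256-by-256 patches, embeds each patch and pools the resulting sequence into a single slide-level summary. The patch encoder and pooling step here are random stand-ins chosen only for illustration, not GigaPath’s actual architecture.

```python
import numpy as np

PATCH = 256  # patch size in pixels: each tile is a "word" in the image-sentence

def tile_slide(slide: np.ndarray) -> np.ndarray:
    """Cut an H x W x 3 slide into a sequence of PATCH x PATCH tiles."""
    h, w, _ = slide.shape
    tiles = [
        slide[y:y + PATCH, x:x + PATCH]
        for y in range(0, h - PATCH + 1, PATCH)
        for x in range(0, w - PATCH + 1, PATCH)
    ]
    return np.stack(tiles)  # shape: (num_tiles, PATCH, PATCH, 3)

def embed_patches(tiles: np.ndarray, dim: int = 16) -> np.ndarray:
    """Stand-in patch encoder: a random projection of each flattened tile."""
    rng = np.random.default_rng(0)
    proj = rng.normal(size=(tiles[0].size, dim))
    flat = tiles.reshape(len(tiles), -1).astype(np.float32)
    return flat @ proj  # shape: (num_tiles, dim)

def summarize(embeddings: np.ndarray) -> np.ndarray:
    """Stand-in for a long-sequence model: pool the patch sequence into one vector."""
    return embeddings.mean(axis=0)

slide = np.zeros((4 * PATCH, 4 * PATCH, 3), dtype=np.uint8)  # toy 1,024 x 1,024 "slide"
summary = summarize(embed_patches(tile_slide(slide)))
print(summary.shape)  # one slide-level representation
```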

One task for which AI models typically do not perform well is predicting which treatment to recommend for a particular patient. So Wang and his colleagues borrowed another concept from generative AI, chain-of-thought, which calls for decomposing a complicated task into multiple small subtasks. The model is then asked to solve those smaller tasks individually on the way to addressing the bigger, more challenging task.

“The question is, how can we apply chain-of-thought to medicine?” Wang asked. “This has never been done before.” The answer is to use clinical guidelines as the chain to instruct a large language model (LLM). By breaking the chain into subtasks such as predicting cancer subtype and patient biomarkers, the LLM then arrives at a prediction of the appropriate treatment.
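As a rough illustration of guideline-driven chain-of-thought, the sketch below decomposes a treatment recommendation into the kinds of subtasks Wang described, with each step’s answer fed into the next prompt. The ask_llm() helper and the specific prompts are hypothetical placeholders, not the team’s actual pipeline.

```python
def ask_llm(prompt: str) -> str:
    """Stub standing in for a real LLM client; replace with an actual API call."""
    return f"[LLM answer to: {prompt.splitlines()[0]}]"

def recommend_treatment(patient_report: str) -> str:
    # Step 1: a subtask a guideline might require first, e.g. subtype prediction.
    subtype = ask_llm(f"What is the cancer subtype?\nReport:\n{patient_report}")
    # Step 2: biomarker assessment, conditioned on the subtype.
    biomarkers = ask_llm(f"Which biomarkers are relevant?\nSubtype: {subtype}\nReport:\n{patient_report}")
    # Step 3: only now ask the harder question, with the intermediate answers in context.
    return ask_llm(
        "Following the clinical guideline, recommend a treatment.\n"
        f"Subtype: {subtype}\nBiomarkers: {biomarkers}\nReport:\n{patient_report}"
    )

print(recommend_treatment("Example pathology report text."))
```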

Yet another challenge is how to apply AI to 3D medical imaging. Here again, Wang and his colleagues achieved a milestone by developing the first 3D OCT foundation model. OCT is short for optical coherence tomography, a type of imaging used to diagnose retinal diseases.

“Our model can comprehensively understand the entire 3D structure to make a diagnosis,” said Wang, who aims to extend this approach to other types of 3D medical imaging, like MRI and CT scans — and eventually, to create one model that can handle everything. This is challenging even for general-domain machine learning; the state of the art, CLIP, is limited to two modalities, Wang noted, while he wants to build a medical model that can integrate as many as nine.

To overcome the problem, Wang and his fellow researchers drew inspiration from Esperanto, a constructed language that provides a common means of communication among people who speak different languages. They devised an approach, BiomedParse, in which they built one foundation model for each modality and then used the medical imaging equivalent of Esperanto — in this case, human language in the form of text from the associated clinical reports — as the common space into which to project the millions of 2D and 3D images from the different modalities.
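The toy sketch below illustrates that shared-space idea: a separate encoder per modality maps images into the same vector space as clinical-report text, so a single similarity measure works regardless of where an image came from. The encoders here are random stand-ins for illustration, not BiomedParse’s trained models.

```python
import numpy as np

DIM = 32
rng = np.random.default_rng(0)

def text_encoder(report: str) -> np.ndarray:
    """Placeholder text encoder (ignores its input) returning a unit vector."""
    vec = rng.normal(size=DIM)
    return vec / np.linalg.norm(vec)

def make_image_encoder(modality: str):
    """One projection per modality into the shared text space."""
    proj = rng.normal(size=(100, DIM))
    def encode(image_features: np.ndarray) -> np.ndarray:
        vec = image_features @ proj
        return vec / np.linalg.norm(vec)
    return encode

encoders = {m: make_image_encoder(m) for m in ["pathology", "xray", "oct"]}

# All modalities land in the same space as the report text, so one cosine
# similarity works for pathology, X-ray or OCT alike.
report_vec = text_encoder("Findings consistent with diabetic retinopathy.")
oct_vec = encoders["oct"](rng.normal(size=100))
print(float(report_vec @ oct_vec))
```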

But Wang wants to go beyond multi-modal to multi-agent. Using the example of a molecular tumor board, in which multiple experts convene to discuss challenging cases to determine a course of treatment, he suggested that AI models developed for different imaging modalities could help doctors efficiently and accurately determine a treatment plan — analogous to a Microsoft 365 for cancer research. And while some doctors may worry about AI replacing them, Wang’s approach is focused on advancing human-AI collaboration: Medical experts still develop the high-level guidelines for the model, with the AI handling the individual steps.

“In the future the AI model could be like a clinical lab test every doctor can order,” Wang suggested. “The doctor can order an AI test to do a specific task, and then the doctor will make a decision based on the AI output.”

“It’s just really exciting to see all this great work”

The event culminated with the announcement of the recipients of the Madrona Prize, which is selected by local venture capital firm and longtime Allen School supporter Madrona Venture Group to recognize innovative research at the Allen School with commercial potential. Rounding out the evening was the presentation of the People’s Choice Award, which is given to the team with the favorite poster or demo as voted on by attendees during the event — or in this case, their top two.

Managing Director Tim Porter presented the Madrona Prize, which went to one winner and two runners-up. Noting that previous honorees have gone on to raise hundreds of millions of dollars and be acquired by the likes of Google and Nvidia, he said, “It’s just really exciting to see all this great work turning into things that have long-term impact on the world through commercial businesses and beyond.”

A group of eight people standing onstage smiling
Award winners and presenters, left to right: Magdalena Balazinska, professor and director of the Allen School; Jon Turow, partner at Madrona Venture Group; Madrona Prize runner-up Vidya Srinivas; Chris Picardo, partner at Madrona Venture Group; Madrona Prize winner Ruotong Wang; Tim Porter, managing director at Madrona Venture Group; People’s Choice winner Chu Li; and professor Shwetak Patel

Madrona Prize winner / Designing AI systems to support team communication in remote work

Allen School Ph.D. student Ruotong Wang accepted Madrona’s top prize for a pair of projects that aim to transform workplace communication — Meeting Bridges and PaperPing.

The Covid-19 pandemic has led to a rise in remote meetings, as well as complaints of “Zoom fatigue” and “collaboration overload.” To help alleviate this negative impact on worker productivity, Wang proposed meeting bridges, or information artifacts that support post-meeting collaboration and help shift work to periods before and after meetings. Based on surveys and interviews with study participants, the team devised a set of design principles for creating effective meeting bridges, such as the incorporation of multiple data types and media formats and the ability to put information into a broader context.

Meanwhile, PaperPing supports researcher productivity in the context of group chats by suggesting papers relevant to their discussion based on social signals from past exchanges, including previous paper citations, comments and emojis. The system is an implementation of Social-RAG, an AI agent workflow based on the concept of retrieval-augmented generation that feeds the context of prior interactions among the group’s members and with the agent itself into a large language model (LLM) to explain its current recommendations.
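The sketch below shows one way a Social-RAG-style loop could be wired together: retrieve past group interactions that match the current chat message, then hand them to an LLM as context for a suggestion it can explain. The Interaction fields, the retrieval heuristic and the ask_llm() helper are illustrative assumptions, not PaperPing’s implementation.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    paper_title: str
    comment: str
    emoji_count: int

def ask_llm(prompt: str) -> str:
    """Stub standing in for a real LLM client."""
    return f"[LLM response to a prompt of {len(prompt)} characters]"

def retrieve_social_context(history: list[Interaction], chat_message: str, k: int = 3) -> list[Interaction]:
    """Toy retriever: rank past interactions by word overlap with the chat plus emoji reactions."""
    words = set(chat_message.lower().split())
    def score(it: Interaction) -> int:
        return len(words & set((it.paper_title + " " + it.comment).lower().split())) + it.emoji_count
    return sorted(history, key=score, reverse=True)[:k]

def recommend_paper(history: list[Interaction], chat_message: str) -> str:
    context = retrieve_social_context(history, chat_message)
    context_text = "\n".join(
        f"- {it.paper_title}: {it.comment} ({it.emoji_count} reactions)" for it in context
    )
    return ask_llm(
        f"Group chat message: {chat_message}\n"
        f"Past group interactions:\n{context_text}\n"
        "Suggest a relevant paper and explain the suggestion using the interactions above."
    )

history = [Interaction("Attention Is All You Need", "great background on transformers", 4)]
print(recommend_paper(history, "anyone have a good transformer reference?"))
```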

Additional authors on Meeting Bridges include Allen School alum Lin Qui (B.S. ‘23) and professor Amy Zhang, as well as Maestro AI co-founder Justin Cranshaw. In addition to Zhang and Qui, Allen School postdoc Xinyi Zhou and Allen Institute for AI’s Joseph Chee Chang and Jonathan Bragg (Ph.D., ‘18) contributed to PaperPing.

Madrona Prize runner-up / Interpreting nanopore signals to enable single-molecule protein sequencing

For one of two runners-up, Madrona singled out a team of researchers in the Allen School’s Molecular Information Systems Laboratory (MISL) for developing a method for long-range, single-molecule protein sequencing using commercially available nanopore sensing devices from Oxford Nanopore Technologies. Determining protein sequences, or the order that amino acids are arranged within a protein molecule, is key to understanding their role in different biological processes. This technology could help researchers develop medications targeting specific proteins for the treatment of cancer and neurological diseases such as Alzheimer’s.

The research team includes Allen School Ph.D. students Daphne Kontogiorgos-Heintz and Melissa Queen, current Master’s student Sangbeom Yang (B.S., ‘24), former postdoc Keisuke Motone, now a faculty member at Osaka University, and research professor Jeff Nivala; MISL undergraduate researchers Jasmine Wee, Yishu Fang and Kyoko Kurihara, lab manager Gwendolin Roote and research scientist Oren E. Fox; UW Molecular Engineering and Science Institute Ph.D. student Mattias Tolhurst and alum Nicolas Cardozo; and Miten Jain, now a professor of bioengineering and physics at Northeastern University.

Madrona Prize runner-up / Knowledge boosting during low-latency inference

Another team of researchers earned accolades for their work on knowledge boosting, a technique for bridging potential communication delays between small AI models running locally on edge devices and larger, remote models to support low-latency applications. This approach can be used to improve the performance of a small model operating on headphones, for example, with the help of a larger model running on a smartphone or in the cloud. Potential uses for the technology include noise cancellation features, augmented reality and virtual reality headsets, and other mobile devices that run AI software locally.

Lead author Vidya Srinivas accepted the award on behalf of the team, which includes fellow Allen School Ph.D. student Tuochao Chen and professor Shyam Gollakota; Malek Itani, a Ph.D. student in the UW Department of Electrical & Computer Engineering; Microsoft Principal Researcher Emre Sefik Eskimez and Director of Research at AssemblyAI Takuya Yoshioka.

People’s Choice Award (tie) / AHA: A vision-language-model for detecting and reasoning over failures in robotic manipulation

Jiafei Duan gestures as he explains the contents of an adjacent research poster to another person
An “AHA” moment: Ph.D. student Jiafei Duan (right) explains his vision-language-model for robotics

Attendees could not decide on a single favorite presentation of the night, leading to a tie for the People’s Choice Award.

While advances in LLMs and vision-language models have expanded robots’ problem-solving, object recognition and spatial reasoning capabilities, the models fall short when it comes to recognizing and reasoning about failures — which hinders their deployment in dynamic, real-world settings. The research team behind People’s Choice honoree AHA: A Vision-Language-Model for Detecting and Reasoning over Failures in Robotic Manipulation designed an open-source VLM that identifies failures and provides detailed natural-language explanations for those failures.

“Our work focuses on the reasoning aspect of robotics, which is often overlooked but essential, especially with the rise of multimodal large language models for robotics,” explained lead author and Allen School Ph.D. student Jiafei Duan. “We explore how robotics could benefit from these models, particularly by providing these models with the capabilities to reason about failures in robotic execution and hence help improve downstream robotic systems.”

Using a scalable simulation framework for demonstrating failures, the team developed AHA to effectively generalize to a variety of robotic systems, tasks and environments. Duan’s co-authors include Allen School Ph.D. student Yi Ru Wang, alum Wentao Yuan (Ph.D. ‘24) and professors Ranjay Krishna and Dieter Fox; Wilbert Pumacay, a Master’s student at the Universidad Católica San Pablo; Nishanth Kumar, Ph.D. student at the Massachusetts Institute of Technology; Shulin Tian, an undergraduate researcher at Nanyang Technological University; and research scientists Ajay Mandlekar and Yijie Guo of Nvidia.

People’s Choice Award (tie) / AltGeoViz: Facilitating accessible geovisualization

The other People’s Choice Award winner was AltGeoViz, a system that enables screen-reader users to explore geovisualizations by automatically generating alt-text descriptions based on the user’s current map view. While conventional alt-text is static, AltGeoViz dynamically communicates visual information such as viewport boundaries, zoom levels, spatial patterns and other statistics to the user in real time as they navigate the map — inviting them to interact with and learn from the data in ways they previously could not. 

“Coming from an urban planning background, my motivation for pursuing a Ph.D. in human-computer interaction originates from my passion for helping people design better cities,” lead author and Allen School Ph.D. student Chu Li said. “AltGeoViz represents a step towards this goal — by making spatial data visualization accessible to blind and low-vision users, we can enable broader participation in the urban planning process and shape more inclusive environments.”

Li’s co-authors include Allen School Ph.D. students Rock Yuren Pang, Ather Sharif and Arnavi Chheda-Kothary and professors Jeffrey Heer and Jon Froehlich.

For more about the Allen School’s 2024 Research Showcase and Open House, read GeekWire’s coverage of the daytime sessions here and the award winners here, and Madrona Venture Group’s announcement here.

Kristine White contributed to this story.


Seeing the world through your pet’s eyes: Allen School professor Ben Shapiro receives JLS 2023 Outstanding Paper of the Year Award

A black dog lies in front of a row of toys and picks out a frog toy to play with.
Unsplash/Mathew Coulton

Pets can do more than just provide us with companionship and cuddles. Our love for our pets can improve science education and lead to innovative ways to use augmented reality (AR) to see the world through a canine or feline friend’s eyes.

Portrait of Allen School Professor Ben Shapiro
Ben Shapiro

In a paper titled “Reconfiguring science education through caring human inquiry and design with pets,” a team of researchers led by Allen School professor Ben Shapiro introduced AR tools to help teenage study participants in a virtual summer camp design investigations to understand their pets’ sensory experiences of the world around them and find ways to improve their quality of life. While science and science education typically emphasize a separation between scientists and the phenomena they study, the teens’ learning was organized around the framework of naturecultures, which emphasizes people’s relationships with non-human subjects in a shared world and encourages practices of perspective-taking and care. The team’s research shows how these relational practices can instead enhance science and engineering education.

The paper won the 2023 Outstanding Paper of the Year Award from the Journal of the Learning Sciences – the top journal in Shapiro’s field.

“The jumping off point for the project was wondering if those feelings of love and care for your pets could anchor and motivate people to learn more about science. We wondered if learning science in that way could help people to reimagine what science is, or what it should be,” said Shapiro, the co-director of the University of Washington’s Center for Learning, Computing and Imagination. “Then, we wanted to build wearables that let people put on those animal senses and use that as a way into developing greater empathy with their pets and better understanding of how animals experience the shared environment.”

Science begins at home

When the Covid-19 pandemic pushed everything online in 2020, it was a “surprising positive” for the team’s plan to host a pet-themed science summer camp, Shapiro said. Teens could now study how their pets’ experiences were shaped by their home environment and how well those environments satisfied pets’ preferences, and researchers could support their learning together with their pets in their homes. Shapiro and the team developed “DoggyVision” and “KittyVision” filters that used red-green colorblindness, diminished visual acuity and reduced brightness to approximate how dogs and cats see. Guided by the AR filter tools, the study participants then designed structured experiments to answer questions such as “What is my pet’s favorite color?”
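For a rough sense of what such a filter might involve, the sketch below applies a commonly used deuteranopia (red-green deficiency) color matrix, dims the image and blurs it to mimic reduced acuity. The constants and the approach are assumptions for illustration, not the study’s actual DoggyVision implementation.

```python
import numpy as np

# A widely used approximation of deuteranopia (red-green color deficiency);
# the exact matrix is an assumption, not taken from the paper.
DEUTERANOPIA = np.array([
    [0.625, 0.375, 0.0],
    [0.700, 0.300, 0.0],
    [0.000, 0.300, 0.7],
])

def box_blur(img: np.ndarray, radius: int = 2) -> np.ndarray:
    """Crude blur: average each pixel with its neighbors along both axes."""
    out = img.astype(np.float32).copy()
    for axis in (0, 1):
        acc = np.zeros_like(out)
        for shift in range(-radius, radius + 1):
            acc += np.roll(out, shift, axis=axis)
        out = acc / (2 * radius + 1)
    return out

def doggy_vision(rgb: np.ndarray, brightness: float = 0.8) -> np.ndarray:
    """Apply color shift, dimming and blur to an H x W x 3 uint8 image."""
    shifted = rgb.astype(np.float32) @ DEUTERANOPIA.T
    dimmed = np.clip(shifted * brightness, 0, 255)
    return box_blur(dimmed).astype(np.uint8)

frame = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
print(doggy_vision(frame).shape)
```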

“We wanted to organize student inquiry around the idea of their pets as whole beings with personalities and preferences and whose experiences are shaped by the places they are at. Those places are designed environments, and we wanted youth to think about how those designs serve both humans and non-humans,” Shapiro said. “We drew on prior work in animal-computer interaction to help students develop personality profiles of their pets called ‘pet-sonas.’”

For example, study participant Violet enjoyed buying colorful toys for her dog Billie; however, using the AR filter, she found that Billie could not distinguish between many colors. To see if Billie had a color preference, Violet designed a simple investigation in which she placed treats on top of different colored sheets of paper and observed which one Billie chose. Using the “DoggyVision” filter, Violet learned that shades of blue appeared bright in contrast to the treats — Billie chose treats off of blue sheets of paper in all three tests. She used the results of her experiments to further her investigations into what kinds of toys Billie would like.

In figure 1.1a, Billie chooses a treat off of a blue piece of paper. Figure 1.2 shows how the treats on colored pieces of paper look through Billie’s eyes using the DoggyVision filter.

“The students were doing legitimate scientific inquiry — but they did so through closeness and care, rather than in a distant and dispassionate way about something they may not care about. They’re doing it together with creatures that are part of their lives, that they have a lot of curiosity about and that they have love for,” Shapiro said. “You don’t do worse science because you root it in passion, love, care and closeness, even if today’s prevailing scientific norms emphasize distance and objectivity.”

Next, Shapiro is looking to explore other ways that pet owners can better understand their dogs. This includes working with a team of undergraduates in the Allen School and UW Department of Human Centered Design & Engineering to design wearables for dogs that give pet owners information about their pet’s anxiety and emotions so they can plan better outings with them.

Priyanka Parekh, a researcher in the Northern Arizona University STEM education and Learning Sciences program, is lead author of the paper. It was also co-authored by University of Colorado Learning Sciences and Human Development professor Joseph Polman and Google researcher Shaun Kane.

Read the full paper in the Journal of the Learning Sciences.


A way with words and trillions of tokens: Allen School researchers recognized at ACL 2024 for expanding the capabilities of language models

An artist’s illustration of artificial intelligence (AI). This illustration depicts language models which generate text.
Pexels/Wes Cockx

Would you call your favorite fizzy drink a soda or a pop? Just because you speak the same language does not mean you speak the same dialect, given variations in vocabulary, pronunciation and grammar. And whatever the language, most models used in artificial intelligence research are far from an open book, making them difficult to study.

At the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) in August, Allen School researchers took home multiple awards for their work to address these challenges. Their research ranged from introducing more dialects into language technology benchmarks to evaluating the reliability and fairness of language models and increasing the transparency and replicability of large language model training as well as evaluations across languages.

Best Social Impact Paper: DialectBench

The benchmarks used in natural language processing (NLP) research and evaluation are often limited to standard language varieties, making them less useful in real-world cases. To address this gap, Allen School researchers introduced DialectBench, the first large-scale NLP benchmark for language varieties that covers 40 different language clusters with 281 varieties across 10 NLP tasks. 

While DialectBench can give researchers a comprehensive overview of the current state of NLP across language varieties, it also has the potential to bring more languages and their varieties into NLP models in the future.

Portrait of Yulia Tsvetkov with leafy trees in the background
Yulia Tsvetkov

“Language variation like African American or Indian English dialects in NLP is often treated as noise, however in the real world, language variation often reflects regional, social and cultural differences,” said senior author and Allen School professor Yulia Tsvetkov. “We developed a robust framework to evaluate the quality of multilingual models on a wide range of language varieties. We found huge performance disparities between standard languages and their respective varieties, highlighting directions for future NLP research.”

Benchmarking helps researchers track the progress the NLP field has made across various tasks by comparing it to standard points of reference. However, it is difficult to test the robustness of multilingual models without an established NLP evaluation framework that covers many language clusters, or groups of standard languages alongside their closely related varieties. For DialectBench, the researchers constructed several clusters, such as the Hindustani cluster encompassing Fiji Hindi and Hindi. They then selected tasks that test models’ linguistic and demographic utility.

The researchers used DialectBench to report the disparities across standard and non-standard language varieties. For example, they found that the highest-performing varieties were mostly standard high-resource languages, such as English, along with a few high-resource dialects, including Norwegian dialects. On the other hand, the majority of the lowest-performing varieties were low-resource language varieties.

Additional authors of the DialectBench paper include Allen School Ph.D. students Orevaoghene Ahia, co-first author, and Kabir Ahuja; George Mason University Ph.D. student Fahim Faisal, lead author and professor Antonios Anastasopoulos; and University of Notre Dame Ph.D. student Aarohi Srivastava and professor David Chiang.

This was not the only ACL award-winning paper to come out of Tsvetkov’s research group, the TsvetShop. Another paper focusing on improving the reliability of large language models and preventing hallucinations from knowledge gaps won an Outstanding Paper Award and an Area Chair Award in the QA track.

Best Theme and Best Resource Papers: OLMo and Dolma

Two papers from Allen School professors Hanna Hajishirzi and Noah Smith, co-directors of the Open Language Model effort at the Allen Institute for Artificial Intelligence (AI2), along with their collaborators, earned accolades at ACL 2024 for advancing the state of open language models. 

As language models have become more common in commercial products, important details about these models’ training data, architectures and development have become hidden behind proprietary interfaces. Without these details, it may be difficult to scientifically study these models’ strengths, weaknesses and their potential biases and risks.

Portrait of Allen School professor Noah Smith
Noah Smith

The researchers built a competitive, truly open language model, OLMo, to help fill this knowledge gap and inspire other scientists’ innovations. Alongside OLMo, the team also released its entire framework from the open training data to evaluation tools. The researchers earned Best Theme Paper at ACL for their work titled “OLMo: Accelerating the Science of Language Models.”

“Language models are a decades-old idea that have recently become the backbone of modern AI. Today the most famous models are built as commercial products by huge tech firms, and many details of their design are closely guarded secrets,” said Smith, the Amazon Professor of Machine Learning in the Allen School. “We launched the OLMo effort as a collaboration between the Allen Institute for AI and the Allen School to create a fully open alternative that scientists could study, because it’s important that we fully understand these artifacts.”

While this paper presents the team’s first release of OLMo, they intend to continue to support and extend the model and its framework, bringing in different model sizes, modalities, datasets and more. Since OLMo’s original release, the researchers have already improved the data and training; for example, the Massive Multitask Language Understanding scores, which measure knowledge acquired during pretraining, went up by 24 points to 52%.

Hajishirzi and Smith’s co-authors on the OLMo paper include Allen School professor Luke Zettlemoyer, postdocs Abhilasha Ravichander and Yanai Elazar, Ph.D. students Ananya Harsh Jha, Hamish Ivison, Ian Magnusson and Yizhong Wang, and alumnus Jacob Morrison (B.S. Computer Science, ‘17/M.S., Computational Linguistics, ‘22), now a researcher at AI2, and Mitchell Wortsman (Ph.D., ‘24), now a member of the technical staff at Anthropic; AI2 researchers Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yuling Gu, Jack Hessel, Tushar Khot, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Jesse Dodge, Kyle Lo and Luca Soldaini; and New York University Ph.D. student William Merrill.

The OLMo effort to advance research into language models would not be complete without its counterpart, Dolma, an English corpus of three trillion tokens drawn from web content, scientific papers and public-domain books.

While there has been progress toward making model parameters more accessible, pretraining datasets, which are fundamental to developing capable language models, are not as open and available. The researchers built and released OLMo’s pretraining dataset “Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research” to help facilitate open research into language models — and earned Best Resource Paper at ACL in the process.

“Even among open models, there are differences in what researchers can work with. With OLMo, we wanted a competitive, strong model whose data was also fully available for inspection,” said Smith. “Dolma is the dataset used to pretrain OLMo. It is extensively documented, and the paper includes analyses and discussion of lessons learned through data curation. We also released open-source data curation tools to enable reproduction and improvement of our work.”

As with OLMo, this is just the beginning for Dolma. The researchers continue to make advancements as part of follow-on releases that, for example, yield significant performance improvements on downstream tasks.

Additional authors on the Dolma paper include Zettlemoyer, Ravichander, Jha, Elazar, Magnusson, Morrison, Soldaini, Kinney, Bhagia, Schwenk, Atkinson, Authur, Chandu, Dumas, Lambert, Muennighoff, Naik, Nam, Peters, Richardson, Strubell, Subramani, Tafjord, Walsh, Beltagy, Groeneveld and Dodge along with Russell Authur, Ben Bogin, Valentin Hofmann and Xinxi Lyu of AI2; University of California, Berkeley Ph.D. student Li Lucy; Carnegie Mellon University Ph.D. student Aakanksha Naik; and MIT Ph.D. student Zejiang Shen.


Allen School researchers showcase speech and audio as the new frontier for human-AI interaction at Interspeech 2024

An overhead view of people sitting at a table using laptops, phones and headphones while working on a project.
Unsplash/Marvin Meyer

Trying to work or record interviews in busy and loud cafes may soon be easier thanks to new artificial intelligence models.

A team of University of Washington, Microsoft and AssemblyAI researchers led by Allen School professor Shyam Gollakota, who heads the Mobile Intelligence Lab, built two AI-powered models that can help reduce the noise. By analyzing turn-taking dynamics while people are talking, the team developed the target conversation extraction approach that can single out the main speakers from background audio in a recording. Similar kinds of technology may be difficult to run in real time on smaller devices like headphones, but the researchers also introduced knowledge boosting, a technique whereby a larger model remotely helps with inference for a smaller on-device model.

Portrait of Shyam Gollakota
Shyam Gollakota

The team presented its papers describing both innovations at the Interspeech 2024 Conference in Kos Island, Greece, earlier this month.

“Speech and audio are now the new frontier for human-AI interaction,” said Gollakota, the Washington Research Foundation/Thomas J. Cable Professor in the Allen School.

What did you say?: Target conversation extraction

One of the problems Gollakota and his colleagues sought to solve was how an AI model can determine who the main speakers are in an audio recording with lots of background chatter. The researchers trained the neural network using conversation datasets in both English and Mandarin to recognize “the unique characteristics of people talking over each other in conversation,” Gollakota said. Across both language datasets, the researchers found the turn-taking dynamic held up with up to four speakers in conversation.

“If there are other people in the recording who are having a parallel conversation amongst themselves, they don’t follow this temporal pattern,” said lead author and Allen School Ph.D. student Tuochao Chen. “What that means is that there is way more overlap between them and my voice, and I can use that information to create an AI which can extract out who is involved in the conversation with me and remove everyone else.”
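A toy heuristic can make that turn-taking signal concrete. The sketch below is not the team’s neural model; it simply measures how much each speaker’s voice activity overlaps with the target speaker’s and keeps only those with little overlap, the way conversation partners naturally alternate turns.

```python
def overlap_seconds(a: list[tuple[float, float]], b: list[tuple[float, float]]) -> float:
    """Total time two sets of (start, end) speech intervals overlap."""
    return sum(
        max(0.0, min(a_end, b_end) - max(a_start, b_start))
        for a_start, a_end in a
        for b_start, b_end in b
    )

def speaking_seconds(segs: list[tuple[float, float]]) -> float:
    return sum(end - start for start, end in segs)

def likely_conversation_partners(target, others, threshold=0.2):
    """Keep speakers whose overlap with the target is a small fraction of their own speech."""
    return [
        name for name, segs in others.items()
        if overlap_seconds(target, segs) / max(speaking_seconds(segs), 1e-9) < threshold
    ]

me = [(0.0, 2.0), (4.0, 6.0)]
others = {
    "partner": [(2.1, 3.9), (6.1, 8.0)],   # alternates turns with me
    "bystander": [(0.5, 5.5)],             # talks right over me (parallel conversation)
}
print(likely_conversation_partners(me, others))  # ['partner']
```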

While the AI model leverages the turn-taking dynamic, it still preserves any backchannels happening within the conversation. These backchannels are small overlaps that happen when people are talking and showing each other that they are listening, such as laughter or saying “yeah.” Without these backchannels, the recording would not be an authentic representation of the conversation and would lose some of the vocal cues between speakers, Gollakota explained. 

“These cues are extremely important in conversations to understand how the other person is actually reacting,” Gollakota said. “Let’s say I’m having a phone call with you. These backchannel cues where we overlap each other with ‘mhm’ create the cadence of our conversation that we want to preserve.”

The AI model can work on any device that has a microphone and can record audio, including laptops and smartphones, without needing any additional hardware, Gollakota noted.

Additional co-authors on the target conversation extraction paper include Malek Itani, a Ph.D. student in the UW Department of Electrical & Computer Engineering, Allen School undergraduate researchers Qirui Wang and Bohan Wu (B.S., ‘24), Microsoft Principal Researcher Sefik Emre Eskimez and Director of Research at AssemblyAI Takuya Yoshioka.

Turning up the power: Knowledge boosting

Target conversation extraction and other AI-enabled software that work in real time would be difficult to run on smaller devices like headphones due to size and power constraints. Instead, Gollakota and his team introduced knowledge boosting, which can increase the performance of the small model operating on headphones, for example, with the help of a remote model running on a smartphone or in the cloud. Knowledge boosting can potentially be applied to noise cancellation features, augmented reality and virtual reality headsets, or other mobile devices that run AI software locally.

However, because the small model has to feed information to the larger remote model, there is a slight delay in the noise cancellation.

“Imagine that while I’m talking, there is a teacher remotely telling me how to improve my performance through delayed feedback or hints,” said lead author and Allen School Ph.D. student Vidya Srinivas. “This is how knowledge boosting can improve small models’ performance despite large models not having the latest information.”

To work around the delay, the larger model attempts to predict what is going to happen milliseconds into the future so it can react to it. The larger model is “always looking at things which are 40–50 milliseconds in the past,” Gollakota said.
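The sketch below schematically shows that arrangement: the on-device model produces output for every incoming frame, while the remote model only ever sees frames buffered several hops in the past and sends back a hint that arrives late. The frame sizes, delay and stand-in models are assumptions for illustration, not the team’s networks.

```python
import numpy as np
from collections import deque

DELAY_FRAMES = 5  # e.g. 5 x 10 ms hops, roughly the 40-50 ms delay described above

def small_model(frame: np.ndarray, hint: np.ndarray) -> np.ndarray:
    """Stand-in for the low-latency on-device network."""
    return np.tanh(frame + hint)

def large_model(delayed_frame: np.ndarray) -> np.ndarray:
    """Stand-in for the remote network that produces a helpful hint."""
    return 0.5 * delayed_frame

buffer = deque([np.zeros(8)] * DELAY_FRAMES, maxlen=DELAY_FRAMES)

for t in range(20):
    frame = np.random.randn(8)
    # The remote model works on the oldest buffered frame (the past), so its
    # hint always lags the frame being processed right now.
    hint = large_model(buffer[0])
    buffer.append(frame)
    output = small_model(frame, hint)  # low-latency output for every frame

print(output.shape)
```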

The larger model’s prediction capabilities open up the door for further research into AI systems that can anticipate and autocomplete what and how someone is speaking, Gollakota noted.

In addition to Gollakota and Srinivas, co-authors on the knowledge boosting paper include Itani, Chen, Eskimez and Yoshioka. 

This is the latest work from Gollakota and his colleagues to advance new AI-enabled audio capabilities, including headphones that allow the wearer to focus on a specific voice in a crowd just by looking at them and a system for selecting which sounds to hear and which ones to cancel out.

Read the full papers on target conversation extraction and knowledge boosting here.


Singular sensation: Allen School researchers develop new method for sequencing proteins using nanopores

A zoomed in image of a protein reading
Second Bay Studios

Determining protein sequences, or the order in which amino acids are arranged within a protein molecule, is key to understanding their role in different biological processes and diseases. However, current methods for protein sequencing, including mass spectrometry, are limited and may not be sensitive enough to capture all the varying combinations of molecules in their entirety.

Jeff Nivala
Jeff Nivala

In a recent paper published in the journal Nature, a team of University of Washington researchers introduced a new approach to long-range, single-molecule protein sequencing using commercially available devices from Oxford Nanopore Technologies (ONT). The team, led by senior author and Allen School research professor Jeff Nivala, demonstrated how to read each protein molecule by pulling it through a nanopore sensor. Nanopore technology uses ionic currents that flow through small nanometer-sized pores within a membrane, enabling the detection of molecules that pass through the pore. This can be done multiple times for the same molecule, increasing the sequencing accuracy. 

The approach has the potential to help researchers gain a clearer picture of what exists at the protein level within living organisms.

“This research is a foundational advance towards the holy grail of being able to determine the sequence of individual full-length proteins,” said Nivala, co-director of the Molecular Information Systems Lab (MISL).

The technique uses a two-step approach. First, an electrophoretic force pushes the target proteins through a CsgG protein nanopore. Then, a molecular motor called a ClpX unfoldase pulls and controls the translocation of the protein back through the nanopore sensor. Giving each protein multiple passes through the sensor helps eliminate the “noise associated with a single read,” Nivala explained. The team is then able to take the average of all the passes to get a more accurate sequencing readout as well as a detailed detection of any amino acid substitutions and post-translational modifications across the long protein strand.
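A simplified example shows why multiple passes help: if each pass yields a noisy version of the same underlying current trace, averaging the (already aligned) passes suppresses the noise. The numbers below are synthetic and stand in for the study’s far more involved signal processing.

```python
import numpy as np

rng = np.random.default_rng(0)
true_signal = np.repeat(rng.normal(size=50), 10)      # idealized per-residue current levels

def one_pass(noise_std: float = 0.8) -> np.ndarray:
    """One noisy read of the same molecule through the pore."""
    return true_signal + rng.normal(scale=noise_std, size=true_signal.size)

reads = np.stack([one_pass() for _ in range(10)])     # ten passes of the same molecule
consensus = reads.mean(axis=0)                        # assumes passes are already aligned

single_err = np.abs(reads[0] - true_signal).mean()
consensus_err = np.abs(consensus - true_signal).mean()
print(f"single-read error {single_err:.2f} vs consensus error {consensus_err:.2f}")
```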

This method differs from mass spectrometry, which does not look at each individual molecule, but takes the average of an ensemble of different proteins to characterize the sample — potentially losing out on information as each protein can have multiple variations within a cell, Nivala noted.

“One major advantage of nanopore technology is its ability to read individual molecules. However, analyzing these signals at the single-molecule level is challenging because of the variability in the signals, which persist to some extent even after applying normalization and alignment algorithms,” said co-lead author Daphne Kontogiorgos-Heintz, an Allen School Ph.D. student who works with Nivala in the MISL. “This is why I am so excited that we found a method to reread the same molecule multiple times.”

With a more detailed understanding of the protein sequences, this technology can help researchers develop medications that can target specific proteins, tackling cancer and neurological diseases like Alzheimer’s, Nivala explained.

“This will shed light into new diagnostics by having the ability to determine new biomarkers that might be associated with disease that currently we’re not able to read,” Nivala said. “It will also develop more opportunities to find new therapeutic targets, because we can find out which proteins could be manifesting the disease and be able to now target those specific variants.”

While the technology can help analyze natural biological proteins, it can also help read synthetic protein molecules. For example, synthetic protein molecules could be designed as data storage devices to record the molecular history of the cell, which would not be possible without the detailed readings from nanopore sensing, Nivala explained. The next step for this research would be working toward increasing the accuracy and resolution to achieve de novo sequencing of single molecule proteins using nanopores, which does not require a reference database.

Nivala and the team were able to conduct this research by modifying ONT technology toward nanopore protein sequencing. 

“This study highlights the remarkable versatility of the Oxford Nanopore sensing platform,” said Lakmal Jayasinghe, the company’s SVP of R&D Biologics. “Beyond its established use in sequencing DNA and RNA, the platform can now be adapted for novel applications such as protein sequencing. With its distinctive features including portability, affordability and real-time data analysis, researchers can delve into proteomics at an unprecedented level by performing sequencing of entire proteins using the nanopore platform. Exciting developments lie ahead for the field of proteomics with this groundbreaking advancement.”

Additional authors include former postdoc Keisuke Motone, current Ph.D. student Melissa Queen and current Master’s student Sangbeom Yang (B.S. ‘24) of the Allen School; MISL undergraduate researchers Jasmine Wee, Yishu Fang and Kyoko Kurihara, lab manager Gwendolin Roote and research scientist Oren E. Fox; UW Molecular Engineering and Science Institute Ph.D. student Mattias Tolhurst and Ph.D. alum Nicolas Cardozo; and Miten Jain, a professor of bioengineering and physics at Northeastern University.

Read the full paper in the journal Nature.


MONET helps paint a clearer picture of medical AI systems

A glowing brain emerges from a stack of books flinging pages with dermatology images around.
Ella Maru Studio

While the artist Claude Monet’s paintings can be blurry and indistinct, a new foundation model of the same name may help bring clarity to medical artificial intelligence systems.

In a recent paper published in the journal Nature Medicine, a team of researchers at the University of Washington and Stanford University co-led by Allen School professor Su-In Lee introduced a medical concept retriever, or MONET, that can connect images of skin diseases to semantically meaningful medical concept terms. Beyond annotating dermatology images, MONET has the potential to improve transparency and trustworthiness throughout the entire AI development pipeline, from data curation to model development.

Headshot of Chanwoo Kim
Chanwoo Kim

“We took a very different approach from current medical AI research, which often focuses on training large medical foundation models with the goal of achieving high performance in diagnostic tasks,” said Allen School Ph.D. student and lead author of the paper Chanwoo Kim, who works with Lee in the AI for bioMedical Sciences (AIMS) Lab. “We leverage these large foundation models’ capabilities to enhance the transparency of existing medical AI models with a focus on explainability.”

Prior to MONET, annotating medical images was a manual process that was difficult to do at scale. MONET automates this process by employing an AI technique called contrastive learning, which enables it to connect images with plain-language descriptions. The researchers trained MONET on over 100,000 dermatology image-text pairs from PubMed articles and medical textbooks and then had the model score each image based on how well it represents a given concept. These medical concepts are “terms that a physician can understand and would use to make a diagnosis such as dome-shaped, asymmetrical or ulcer,” Kim said.
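Conceptually, the scoring step resembles CLIP-style retrieval: embed the image and each concept term in a shared space and rank concepts by cosine similarity. The sketch below illustrates that idea with random vectors standing in for trained encoders; it is not MONET’s actual code.

```python
import numpy as np

DIM = 64
rng = np.random.default_rng(0)
concepts = ["dome-shaped", "asymmetrical", "ulcer"]

def embed_text(term: str) -> np.ndarray:
    """Placeholder text encoder (ignores its input) returning a unit vector."""
    v = rng.normal(size=DIM)
    return v / np.linalg.norm(v)

def embed_image(image: np.ndarray) -> np.ndarray:
    """Placeholder image encoder (ignores its input) returning a unit vector."""
    v = rng.normal(size=DIM)
    return v / np.linalg.norm(v)

concept_vecs = {c: embed_text(c) for c in concepts}
image_vec = embed_image(np.zeros((224, 224, 3)))

# Higher cosine similarity = the image expresses the concept more strongly.
scores = {c: float(image_vec @ v) for c, v in concept_vecs.items()}
print(scores)
```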

The team found that MONET could accurately annotate concepts across dermatology images, as verified by board-certified dermatologists, and that it was comparable to supervised models built on small dermatology datasets that had previously been annotated with concepts.

These annotations can help researchers detect potential biases in datasets and undesirable behavior within AI systems. The researchers used MONET to audit the International Skin Imaging Collaboration (ISIC) dataset, a collection of over 70,000 dermoscopic images that is the largest commonly used in training dermatology AI models, and found differences in how concepts correlate with being benign or malignant. For example, MONET showed that images of skin lesions on which dermatologists had placed orange stickers were mostly benign, a correlation that does not always hold. One explanation is that the orange stickers were often used with pediatric patients, who tended to have benign cases, Kim noted.

This insight is crucial for understanding which factors affect the transferability of medical AI models across different sites. Usually, such data auditing at scale is not feasible due to the lack of concept labels. 

Su-In Lee wearing a black suit seated at a table in front of a whiteboard, holding pen in one hand with a coffee mug and laptop on the table in front of her
Su-In Lee

“In the AI pipeline, MONET works at the entry level, providing a ‘lens’ through which each image can be ‘featurized’ based on available information to map it with relevant language-based features,” Lee said. “This allows MONET to be combined with an existing medical AI development pipeline, including data curation and model development, in a plug-and-play manner.

“You don’t have to worry about it going through a model as it goes right to the data — that’s one way we can make dataset and model auditing more transparent and trustworthy,” Lee continued.

The framework of MONET can also help medical AI model developers create inherently interpretable models. Physicians, in particular, are interested in such models, like concept bottleneck models (CBMs), because they make it easy to decipher and understand what factors are influencing the AI’s decisions. However, CBMs are limited because they require concept annotations in the training data, which may not always be available; MONET’s automatic annotation has the potential to help build CBMs that were previously impossible.
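A minimal sketch of that pairing is shown below, using synthetic data and a plain logistic regression as the bottleneck classifier: each learned weight attaches to a named concept, which is what makes the model’s reasoning easy to inspect. The concepts and data here are illustrative assumptions, not the paper’s experiments.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

concepts = ["dome-shaped", "asymmetrical", "ulcer"]
rng = np.random.default_rng(0)

# Synthetic stand-in for per-image concept scores produced by an annotator model.
X = rng.uniform(size=(200, len(concepts)))
y = (0.2 * X[:, 0] + 1.5 * X[:, 1] + 1.0 * X[:, 2]
     + rng.normal(scale=0.3, size=200) > 1.4).astype(int)

# The "bottleneck": the final prediction depends only on the named concepts.
bottleneck = LogisticRegression().fit(X, y)

# Each coefficient is tied to a concept, so a clinician can read off which
# concepts push the prediction toward the positive class.
for concept, weight in zip(concepts, bottleneck.coef_[0]):
    print(f"{concept}: {weight:+.2f}")
```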

“While we only focused on a foundation model based on OpenAI’s CLIP model, we expect that this whole idea can be applied to other more advanced large foundation models,” Kim noted. “Nowadays, AI is developing very rapidly but our framework of using large foundation models’ amazing capabilities to improve transparency of medical AI systems will still be applicable.” 

This is part of a broader research effort in the AIMS Lab to ensure AI in dermatology and medical imaging is safe, transparent and explainable. Other projects include a new framework for auditing AI-powered medical-image classifiers that can help dermatologists understand how the model determines whether an image depicts melanoma or a benign skin condition. Another paper sheds light on the reasoning process medical image classifiers use to identify patients’ sex from images of skin lesions. Additionally, counterfactual AI prompts have the potential to show how patient data may change based on genetic mutations, treatments or other factors. These research initiatives have potential applications beyond dermatology to other medical specialties, Lee said.

Lee and Kim’s co-authors on the paper include Allen School Ph.D. students Soham Gadgil and Alex DeGrave, Stanford University postdocs Zhuo Ran Cai, M.D. and Jesutofunmi Omiye, M.D., and Roxana Daneshjou, M.D., Ph.D., a faculty member in the Department of Biomedical Data Sciences and in Dermatology at Stanford University.

Read the full paper in Nature Medicine.


Pushing beyond the silos: Allen School’s Jon Froehlich aims to build a unified approach to urban science as part of new NSF-funded project

Urban street scene depicting hot pink streetcar entering an intersection, with a car on the opposite side of the road and painted pedestrian crossing visible against a backdrop of street trees and low-rise buildings, with high-rise buildings in distant background
Mark Stone/University of Washington

According to the United Nations, more than half of the world’s population — 55% — lives in urban areas, with that figure projected to rise to 70% by the year 2050. While urban communities are vibrant centers of economic, cultural and civic activity, this vibrancy tends to be accompanied by concerns about housing affordability, aging or inadequate infrastructure, environmental impacts and more.

Such vibrancy also yields a lot of data that could provide a window onto how our urban environments function, with a view to creating more sustainable and equitable communities as well as planning for future growth. But it’s difficult to extract usable insights, let alone turn those insights into practical actions, when the data is stored in different formats, spread across different systems and maintained by different agencies.

A team of researchers that includes professor Jon Froehlich, director of the Allen School’s Makeability Lab, is pursuing a more unified approach that will democratize data analysis and exploration at scale and empower urban communities. Froehlich is co-principal investigator on a new, five-year project dubbed OSCUR — short for Open-Source Cyberinfrastructure for Urban Computing — that recently earned a $5 million grant from the National Science Foundation. The project is spearheaded by the NYU Tandon School of Engineering’s Visualization Imaging and Data Analytics Research Center (VIDA) in partnership with the University of Washington and the University of Illinois Chicago.

Portrait of Jon Froehlich wearing a t-shirt branded with an M logo for the Makeability Lab
Jon Froehlich

“The true beauty and promise of OSCUR is in how it attempts to unify long-standing and deeply interconnected problems in urban science that often have disparate approaches spread across disciplines,” Froehlich said in the project announcement. “We are trying to develop standardized tools, datasets, and data standards to address problems related to climate change (e.g., urban heat island effects), walkability and bikeability, urban accessibility for people with disabilities, and more.”

To that end, Froehlich and his colleagues intend to cultivate a cohesive urban computing community spanning computer science, data science, urban planning, civil engineering, environmental sciences and other expertise. They will harness this combined wisdom to develop a set of scalable, open-source tools to enable interactive exploration and analysis of complex data, with an emphasis on findability, usability, interoperability, transparency and reproducibility. This would enable a variety of stakeholders, from researchers, to practitioners, to residents, to collaboratively address common challenges — without having to build something from scratch that may quickly become obsolete. 

Communities stand to benefit from this more integrated approach in multiple ways. For example, the team envisions a set of tools that would enable agencies to make more robust use of citywide sensor data to monitor and mitigate noise pollution and improve quality of life. In addition, they could combine data from different sources to gain a more comprehensive understanding of how their infrastructure might withstand disaster — and where they may need to shore up their resilience. Such an approach would also enable communities to glean new insights into how the built environment affects pedestrian mobility, with a view to making their communities more accessible for all residents.

Froehlich is no stranger to urban accessibility issues — or collaborating with residents and decision makers to extend the real-world impact of his research. He previously co-founded Project Sidewalk, an effort that combines advances in artificial intelligence with the power of citizen science to identify and map pedestrian accessibility issues such as missing curb ramps and uneven surfaces. The initiative has spread to 21 cities in the U.S. and around the world, including Seattle and Chicago in the U.S., Mexico City, Mexico, and Amsterdam in the Netherlands. To date, contributors to Project Sidewalk have mapped more than 11,400 miles of infrastructure and contributed over one million labels — data that has been leveraged to improve city planning, build new interactive tools and train AI to automatically log accessibility issues.

“I have worked in urban computing for more than a decade,” said Froehlich. “OSCUR is one of those rare opportunities to push beyond the silos of academia and develop tools for and with communities that will take them far into the 21st century.”

Read the project announcement here.


Looking into places others are not: Allen School’s Miranda Wei receives 2024 Karat Award for contributions to usable security and privacy

Headshot of Miranda Wei wearing a black sweater against a brick background
Miranda Wei

When Miranda Wei attended her first Symposium on Usable Privacy and Security (SOUPS) conference in 2017, she had little experience in the field; she had only recently graduated with a degree in political science from the University of Chicago. But the community of researchers at the conference welcomed her in. That experience paved the way for her to continue doing research on privacy and security and, eventually, to pursue a Ph.D. at the Allen School.

Seven years after her first foray into the SOUPS community, Wei received the 2024 John Karat Usable Privacy and Security Student Research Award at the conference for her interdisciplinary contributions to the field and strong leadership. The award, named for the late researcher John Karat, recognizes graduate students for their research in usable privacy and security, efforts to mentor others and community service.

“As a researcher, we publish in many different venues, but SOUPS is the first conference that I went to and is the closest to my heart,” Wei said. “It’s a huge honor to be recognized for the work that I’ve done, especially as someone who came to this field from a non-traditional background.”

For Allen School professor Franziska Roesner, one of Wei’s Ph.D. co-advisors and co-director of the Security and Privacy Research Lab alongside colleague Tadayoshi Kohno, Wei is already a “superstar in usable security and privacy.” Wei’s research focuses on how societal factors can impact individuals’ security and privacy. For example, her paper presented at the 2022 SOUPS conference analyzed how TikTok creators shared information on how to leverage technology to monitor or control others, especially within families or with romantic partners. The research was one of the first to consider the platform as a data source in the field. 

“When I think of thought leadership within a field, I think of those who look into places that others are not,” Kohno said. “Miranda’s work with TikTok as a data source is a great example of such leadership.”

Wei’s other work has also made strides in the privacy and security field. Her 2023 paper at the IEEE Symposium on Security and Privacy was one of the first to explore gender stereotypes within computer security and privacy practices, Roesner noted. There is still more research to do: in work presented at the 2024 USENIX Security Symposium, Wei analyzed the field’s apparent lack of knowledge on how sociodemographic factors affect computer security behaviors. For her research advancing usable security and privacy, Wei was also one of 75 graduate students from around the world selected for the 2023 Google Ph.D. Fellowship program and one of only four working in the privacy and security field.

“Miranda’s work is often deep and nuanced, drawing on methodology and theory from multiple fields (such as computer security, human-computer interaction and social science) to ask fundamental questions situated in complex social and societal dynamics,” said Roesner, the Brett Helsel Career Development Professor in the Allen School. “This includes exploring constructs of power and gender, and challenging the field’s norms around what we know and how we develop knowledge.”

Outside of her research contributions, Wei is heavily involved in mentorship and community building. As a senior Ph.D. student in the Security and Privacy Research Lab, Wei serves as a sounding board for other students and has served in an advisory role on multiple research projects. She also co-founded and volunteers with the Allen School’s Pre-Application Mentorship Service (PAMS), which advises prospective graduate students. At the 2024 SOUPS conference, Wei co-organized the inaugural Gender, Online Safety and Sexuality (GOSS) workshop to help integrate feminist, LGBTQ+ and critical theories into research on online safety.

“Her research vision and agenda around advancing computer security, privacy and safety for all inherently embody a global ambition for social good,” Roesner said. “She cares deeply about expanding access to opportunities for and improving the experience of people in and around computer science.”

For Wei, the award was not something she achieved on her own.

“All of my research papers and projects I’ve worked on have benefited from my friends in the Security and Privacy Research Lab and my mentors across the world,” Wei said. “I really think it takes a village.”


Learn more about the Karat Award here.

Marvelous mutants: Allen School’s René Just and Michael Ernst receive FSE Most Influential Paper Award for showing the validity of mutants in software testing

A large bolt of lightning illuminates clouds in a bluish-purple sky
Photo by NOAA on Unsplash

In the Marvel Universe, mutants known as the X-Men wield superhuman abilities ranging from shape-shifting to storm-summoning. 

In the software universe, mutants may not bring the thunder, but they are no less marvelous. In 2014, Allen School professors René Just and Michael Ernst, along with their collaborators, demonstrated that mutants function as an effective substitute for real defects in software testing. Their work, which spawned a robust line of follow-on research over the ensuing decade, earned them the Most Influential Paper Award at the ACM International Conference on the Foundations of Software Engineering (FSE 2024) last month in Porto de Galinhas, Brazil.

Mutants are artificial defects (bugs) intentionally embedded throughout a program. If a test suite is good at detecting these artificial defects, it may be good at detecting real defects. Testing is an important element of the software development cycle; buggy code can be annoying, like when a video game glitches, or it can grind industries to a halt, as the world witnessed during the recent CrowdStrike incident. According to the Consortium for Information & Software Quality (CISQ), the costs associated with defective software surpassed $2 trillion in 2022 in the United States alone.
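
To make the idea concrete, here is a minimal, hypothetical sketch of a single mutant and two test suites of differing strength (an illustration only; the study itself examined large Java programs):

```python
# A hypothetical illustration of mutation testing (not code from the paper):
# a mutant is created by making one small syntactic change to a program, and a
# test suite "kills" the mutant if at least one of its tests fails on it.

def is_adult(age):
    """Original implementation: adults are 18 or older."""
    return age >= 18

def is_adult_mutant(age):
    """Mutant: the comparison operator >= was changed to >."""
    return age > 18

def weak_suite(impl):
    """Misses the boundary case, so it cannot tell the mutant from the original."""
    return impl(30) and not impl(10)

def strong_suite(impl):
    """Adds the boundary case (age 18), which kills the mutant."""
    return impl(30) and not impl(10) and impl(18)

if __name__ == "__main__":
    print("weak suite kills the mutant:  ", not weak_suite(is_adult_mutant))   # False
    print("strong suite kills the mutant:", not strong_suite(is_adult_mutant)) # True
```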

Portrait of René Just
René Just

Among the solutions CISQ emphasized in its report were “tools for understanding, finding and fixing deficiencies.” Mutants play an integral role in the development and evaluation of such tools. While the software community historically had assumed such artificial defects were valid stand-ins for real ones, no one had empirically established that this was, indeed, the case.

“We can’t know what real errors might be in a program’s code, so researchers and practitioners relied on mutants as a proxy. But there was very little evidence to support that approach,” Ernst said. “So we decided to test the conventional wisdom and determine whether the practice held up under scrutiny.”

Ernst, Just and their colleagues applied this scrutiny through a series of experiments using 230,000 mutants and over 350 real defects contained in five open-source Java programs comprising 321,000 lines of code. To recover the real defects, which had already been identified and fixed by developers, the researchers examined the version history for bug-fixing commits. They then ran both developer-written and automatically generated test suites to ascertain how their ability to find known mutants in a program correlated with their ability to identify the real defects. During their testing, the researchers controlled for code coverage, or the proportion of each program’s code that was executed during the test, which otherwise could have confounded the results.
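
The flavor of that analysis can be sketched as follows, with invented numbers standing in for the study’s measurements:

```python
# Illustrative only: the measurements below are invented, and the study's actual
# statistics were far richer. For each test suite we record the fraction of
# mutants it kills and the fraction of real defects it detects, then check how
# strongly the two measures are correlated. (In the actual study, suites were
# also grouped by code coverage to rule out coverage as a confound.)

from statistics import mean

mutant_detection_rate = [0.35, 0.52, 0.61, 0.78, 0.90]       # hypothetical suites 1-5
real_defect_detection_rate = [0.30, 0.48, 0.55, 0.70, 0.85]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(f"correlation: {pearson(mutant_detection_rate, real_defect_detection_rate):.3f}")
```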

Those results revealed a statistically significant relationship between a test suite’s effectiveness at detecting mutants and its effectiveness at detecting real defects. But while the team’s findings confirmed the conventional wisdom in one respect, they upended it in another.

Portrait of Michael Ernst
Michael Ernst

“Our findings validated the use of mutants in software test development,” said Just, who was first author of the paper and a postdoctoral researcher in the Allen School at the time of publication. “It also yielded a number of other new and practical insights — one being that a test suite’s ability to detect mutants is a better predictor of its performance on real defects than code coverage.”

Another of the paper’s insights was confirmation that a coupling effect exists between mutants and real defects. This effect is observed between a complex defect and a set of simple defects when a test that detects the latter also succeeds in detecting the former. While prior work had shown that the same effect exists between simple and complex mutants, it was unclear whether a similar coupling effect applied between real defects and simple mutants. The researchers found that this was, indeed, the case: 73% of real defects were coupled to mutants. Based on an analysis of the 27% that did not exhibit this coupling effect, the team recommended a set of concrete approaches for improving mutation analysis — and by extension, the effectiveness of test suites.
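
A toy example of what such coupling can look like (again, our illustration rather than code from the study):

```python
# A toy illustration of the coupling effect: a test case written to kill a
# simple mutant also exposes a more complex, realistic defect.

def average(values):
    """Correct implementation."""
    return sum(values) / len(values)

def average_mutant(values):
    """Simple mutant: the divisor was changed from len(values) to len(values) + 1."""
    return sum(values) / (len(values) + 1)

def average_real_defect(values):
    """A more complex, realistic bug: non-positive values are silently dropped,
    yet the sum is still divided by the original length."""
    kept = [v for v in values if v > 0]
    return sum(kept) / len(values)

def test(impl):
    """One test case; returns True if impl passes it."""
    return impl([-2, 4, 4]) == 2.0

if __name__ == "__main__":
    print("correct version passes:", test(average))                  # True
    print("mutant killed:         ", not test(average_mutant))       # True
    print("real defect exposed:   ", not test(average_real_defect))  # True
```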

In addition to Just and Ernst, co-authors of the paper include Allen School alum Darioush Jalali (M.S., ‘14), now a software engineer at Ava Labs; then-Ph.D. student Laura Inozemtseva and professor Reid Holmes of the University of Waterloo, now a senior software engineer at Karius and a faculty member at the University of British Columbia, respectively; and University of Sheffield professor Gordon Fraser, now a faculty member at the University of Passau.

Read the full paper here.

Mind over model: Allen School’s Rajesh Rao proposes brain-inspired AI architecture to make complex problems simpler to solve

A glowing hologram of a brain emerges from a circuit board.
Vecteezy/abdulbayzid

When you reach out to pet a dog, you expect it to feel soft. If it doesn’t feel like how you expect, your brain uses that feedback to inform your next action — maybe you pull your hand away. Previous models of how the brain works have typically separated perception and action. For Allen School professor Rajesh Rao, those two processes are closely intertwined, and their relationship can be mapped using a computational algorithm. 

“This flips the traditional paradigm of perception occurring before action,” said Rao, the Cherng Jia and Elizabeth Yun Hwang Professor in the Allen School and University of Washington Department of Electrical & Computer Engineering and co-director of the Center for Neurotechnology.

In a recent paper titled “A sensory-motor theory of the neocortex” published in the journal Nature Neuroscience, Rao posited that the brain uses active predictive coding (APC) to understand the world and break down complicated problems into simpler tasks using a hierarchy. This architecture, which is inspired by previous work in artificial intelligence (AI), could in turn potentially be used to train AI algorithms on increasingly complex problems with less data and to better predict outcomes.

“Data from neuroscience suggests the brain uses a hierarchical generative model to constantly predict the consequences of actions,” Rao said. “The brain is creating its hypotheses saying, ‘Here’s what I predict will happen in the world. Now, let’s check this hypothesis with what’s really coming in through the sensors in my body.’ Errors in prediction can then be used to correct the hypothesis.” 

For Rao, the anatomy of the neocortex indicates a “tight coupling” between sensory and motor processes, or perception and action, similar to the mathematical model used in reinforcement learning in AI. Reinforcement learning utilizes a generative model to capture the relationship between an agent’s motor output and the sensory input it receives from the environment. You reach out to pet a dog, for example, and you feel the texture of its fur. 

Portrait of a smiling Rajesh Rao wearing wire-rimmed eyeglasses and a dark grey suit jacket over a pale grey button-up shirt, with concrete and brick features and catwalk lighting in the Paul G. Allen Center atrium visible in the background.
Allen School professor Rajesh Rao

This generative model is also called a world model, represented mathematically as a state transition function specifying how the agent’s actions change its world. Alongside the world model is a policy function that selects a sequence of actions. These functions work together to help you learn, perform new tasks and predict the consequences of actions, such as what you expect this dog’s fur to feel like compared to other dogs you have touched.
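
In code, a bare-bones version of that loop might look like the following sketch, which is our simplified illustration rather than Rao’s model:

```python
# A simplified sketch of the two ingredients described above: a world model
# predicts how an action changes the state, a policy picks actions, and the gap
# between prediction and reality is the error signal used to correct hypotheses.

import random

MOVES = {"left": -1, "right": +1, "stay": 0}

def world_model(state, action):
    """State transition function: the predicted next state for a given action."""
    return state + MOVES[action]

def policy(state, goal):
    """Choose the action whose predicted outcome lands closest to the goal."""
    return min(MOVES, key=lambda a: abs(world_model(state, a) - goal))

def environment_step(state, action):
    """The real world: usually matches the model, but occasionally slips."""
    next_state = state + MOVES[action]
    if random.random() < 0.1:
        next_state += random.choice([-1, 1])
    return next_state

state, goal = 0, 5
for _ in range(8):
    action = policy(state, goal)
    predicted = world_model(state, action)
    state = environment_step(state, action)
    error = state - predicted   # prediction error feeds back into the model
    print(f"action={action:>5} predicted={predicted:2d} actual={state:2d} error={error:+d}")
```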

Anatomically, each section of the neocortex is “like a six-layered layer cake,” Rao explained, “with the middle and top layers processing sensory information and the bottom layers sending information to action centers of the brain.” The model suggests that areas higher up in the hierarchy can modulate the dynamics of the lower-level neural networks and change the function being computed there, similar to hypernetworks in AI.
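
For readers unfamiliar with hypernetworks, a toy version of the idea (purely illustrative, and not part of the APC model itself) might look like this:

```python
# A toy version of the hypernetwork idea: a higher-level context generates the
# parameters of a lower-level unit, thereby changing the function it computes.

def hypernetwork(context):
    """Map a high-level context vector to the (weight, bias) of a lower-level unit.
    A real hypernetwork would be a learned neural network; here it is a fixed map."""
    w = 1.5 * context[0] - 0.5 * context[1]
    b = 0.25 * context[1]
    return w, b

def lower_level_unit(x, params):
    """The lower-level computation; what it does depends on the generated params."""
    w, b = params
    return w * x + b

# Two different high-level contexts make the same lower-level unit compute
# two different functions of its input.
for context in [(1.0, 0.0), (0.0, 2.0)]:
    params = hypernetwork(context)
    outputs = [round(lower_level_unit(x, params), 2) for x in (1.0, 2.0, 3.0)]
    print(f"context={context} -> params={params} outputs={outputs}")
```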

“The ability to change the functions being computed at different levels of a computational hierarchy endows the APC network with remarkable versatility,” Rao said.   

Break it down: How AI can learn from the brain

The APC model’s hierarchical approach can break down a complicated, abstract problem into smaller parts in a process known as compositionality. If the higher-level goal, for example, is to go to the grocery store, that task can be decomposed into a sequence of simpler steps, Rao explained. First, you might walk to the garage where your car is parked, open the garage door and then unlock the car door. Eventually, the tasks can be broken down to the level of specific muscles in the hand, all the way down to the lowest level in the spinal cord controlling your hand.

Compositionality may help address one of the problems holding back traditional AI models: for every new problem an AI model faces, it needs to be trained, potentially using reinforcement learning, on large amounts of new data, whereas the human brain is very good at generalizing quickly from little data, Rao noted.

Instead of relying on trial-and-error learning or planning each discrete step, an agent that has already learned to solve simpler tasks, such as navigating smaller rooms or moving from corner to corner, can use that knowledge to break a new task down into a sequence of simpler tasks it already knows how to solve.

“The compositionality inherent in the APC model allows it to compose solutions to new problems in the world really quickly,” Rao said. “Suppose I already learned how to get into the car. I can keep using that policy function for all kinds of other tasks like driving to school or going to meet a friend.”
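
That kind of reuse can be sketched in a few lines; the following is a hypothetical illustration of hierarchical decomposition, not the APC model’s actual machinery:

```python
# A hypothetical sketch of hierarchical decomposition and reuse: a high-level
# goal expands into subtasks, which expand into primitive actions shared across goals.

SUBTASKS = {
    "go_to_grocery_store": ["walk_to_garage", "open_garage_door", "get_into_car", "drive_to_store"],
    "drive_to_school":     ["walk_to_garage", "open_garage_door", "get_into_car", "drive_school_route"],
    "get_into_car":        ["unlock_car_door", "open_car_door", "sit_down"],
}

def decompose(task):
    """Recursively expand a task into the primitive actions that realize it."""
    if task not in SUBTASKS:          # already a primitive action
        return [task]
    actions = []
    for subtask in SUBTASKS[task]:
        actions.extend(decompose(subtask))
    return actions

print(decompose("go_to_grocery_store"))
print(decompose("drive_to_school"))   # reuses the already-learned "get_into_car" routine
```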

The same APC model architecture can also be used for visual perception and learning. An APC model learns to read and write the number eight, for example, by breaking it down into different strokes. It can then use those parts to compose other new characters.

“The APC model builds on past ideas of hierarchical reinforcement learning but goes beyond the usual hierarchy of policy functions,” Rao said. “The architecture of the neocortex suggests that there is a great benefit to modeling the world itself as a hierarchy. Coupling such a hierarchical world model with a hierarchy of policy functions may be how our brain tackles the complexity of the world we live in.”

The next step for Rao and his students in the Allen School’s Neural Systems Laboratory is to look at how to apply this architecture to large-scale AI problems such as language and robotics and test the model’s predictions in collaboration with neuroscientists.

Read the paper in Nature Neuroscience here.

