In honor of the late Allen School professor Gaetano Borriello, whose work focused on applying mobile technologies to tackle issues of global equity and social and environmental justice, the ubiquitous computing community each year recognizes Ph.D. students who are following in his footsteps.
This year, those footsteps led back to the Allen School when Ph.D. student Anandghan Waghmare won the 2024 Gaetano Borriello Outstanding Student Award at the ACM International Joint Conference on Pervasive and Ubiquitous Computing/International Symposium on Wearable Computing (UbiComp/ISWC) in Melbourne, Australia, in October. Waghmare’s contributions to the ubiquitous and pervasive computing field, the societal impact of his work, and his community service embody Borriello’s legacy and research passions.
“This award means a lot to me. I always looked up to and respected each year’s winners for their good work and research,” Waghmare said. “This award is a significant honor in the UbiComp community.”
The focus of Waghmare’s thesis is investigating ways to add small and inexpensive pieces of hardware to existing devices to make them do more than they were designed to do. For example, Waghmare designed a smartphone-based glucose and prediabetes screening tool called GlucoScreen. More than one in three U.S. adults has prediabetes, or elevated blood sugar levels that can develop into type 2 diabetes, according to the U.S. Centers for Disease Control and Prevention (CDC). However, more than 80% do not know they have the condition, which can lead to complications such as heart disease, vision loss and kidney failure, the CDC found.
Current blood testing methods require visits to a health care facility, which can be expensive and inaccessible, especially for those in developing countries, Waghmare explained. With GlucoScreen’s low-cost and disposable rapid tests, patients can easily screen for prediabetes at home without the need for additional equipment, and potentially reverse the condition with diet and exercise under the care of a physician. The research was published in the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT) and presented at last year’s UbiComp/ISWC conference.
“A lot of technology is already built into these devices like smartphones — the computer power, the networking, the display,” Waghmare said. “I try to leverage everything in existing devices and add what’s missing and find clever engineering ways to do it.”
His other projects follow a similar theme. He developed WatchLink, which enables users to add sensors to smartwatches to measure UV light, body temperature and breath alcohol levels, paving the way for more cost-effective, personalized and user-centric wearables. Waghmare also builds wearable devices for intuitive input, such as the Z-Ring, which uses radio frequency sensing to facilitate context-aware hand interactions.
“Anand’s work would make Gaetano so proud given the focus on solving socially meaningful problems in a practical way using clever hardware/software solutions,” said Shwetak Patel, Waghmare’s advisor in the University of Washington’s UbiComp Lab and the Washington Research Foundation Entrepreneurship Endowed Professor in the Allen School and the UW Department of Electrical & Computer Engineering.
Outside of his research contributions, Waghmare is heavily involved in community service and mentorship. Since 2019, he has been a peer reviewer for multiple conferences including IMWUT, Human Factors in Computing Systems (CHI) and the International Conference on Interactive Surfaces and Spaces. This year, he also volunteered as video previews co-chair at the ACM Symposium on User Interface Software and Technology (UIST), web co-chair at the International Symposium on Mixed and Augmented Reality and special topic session chair at the ACM Symposium on Spatial User Interaction. In upcoming 2025 conferences, Waghmare is serving as publications co-chair at CHI, video previews co-chair at UIST and registration co-chair at UbiComp.
At the same time, Waghmare has been working on encouraging more students to pursue STEM fields. This includes exposing students to research early in their careers and sharing his own work and experiences, as well as hosting high school students over the years. For Patel, Waghmare is a “great citizen of the research community.”
“I am excited and honored to join the ranks of the amazing researchers who have won this award before me, and I hope to inspire other Ph.D. students to apply because you never know,” Waghmare said.
“Having traveled to and experienced different countries, I bring a more practical and global outlook to computer science. This definitely gives me a different understanding and appreciation for the subject compared to some of my peers.”
That sentiment, offered by Fei Huang, a veteran and current undergraduate, is one of the themes we found among Allen School students with military backgrounds. To mark Veterans Appreciation Week at the University of Washington, we spoke with three students about how being a veteran gives them a unique perspective on their studies and future career paths.
Fei Huang
Before joining the Allen School, Huang served six years in the Navy. The same thing that drove Huang to the military helped fuel his journey to the Allen School: intellectual curiosity. Huang described himself as a “curious person” and joining the Navy gave him the opportunity to see the world and experience different aspects of life. At the same time, his interest in computer science grew as he learned how the technology had transformed the world over the decades.
“I wanted to contribute to humanity and future generations, as I believe computer science is the foundation of our future world. I chose the Allen School because it’s one of the best computer science programs globally, attracting top talent and being situated in a city at the forefront of technology,” Huang said.
In the Navy, Huang learned how to quickly adapt to new environments, making for a smooth transition from military to university life. One of the biggest adjustments he faced was time management. In the Navy, Huang said he had his whole day scheduled for him; university life required him to manage his own time, but it also gave him “the freedom to pursue his passions.”
Huang’s advice for any other veterans looking to study at the Allen School is to see their military experience as a strength.
“At first, you might feel like an outsider because your way of thinking and your experiences are different from those of your classmates. Embrace this difference, and use it to lead your peers by sharing real-world perspectives,” Huang said. “The military places a strong emphasis on leadership, and for me, the most important aspect of leadership is ownership. I encourage my peers to take ownership of everything they do.”
Makalapua Goodness
Allen School undergraduate student and veteran Makalapua Goodness grew up in a small town and enlisted in the Air Force as a way to get out and see the world. He served seven years in the military before following his interest in technology and computer science to the Allen School. For Goodness, the skills he learned during his military service helped make the transition to being a university student easier.
“Veterans and civilians have different mindsets around things. Veterans can handle adversity better and are used to handling stressful situations,” Goodness said. “That’s what I tell myself when I’m lost in a class or a project — that I’ve been through tougher times than this.”
His military background also comes out in how he approaches his assignments and coursework. In the later part of his Air Force career, he was often in a supervisor role and focused on team dynamics. Now, when working with others at the Allen School, he thinks about “how to involve and put everyone in the best position to succeed, both as a group and individually.”
He credits the other skills he gained through the military such as timeliness and being goal-oriented for helping him find success at the Allen School.
Larrianna Warner
Allen School undergraduate student and veteran Larrianna Warner said she was unsure exactly what she wanted to pursue after high school, but she knew she loved learning languages. That passion led her to enlist in the Air Force and serve four years as a Russian Cryptologic Language Analyst focusing on translating messages. In the Air Force, Warner became interested in large language models and natural language processing and how they can be used for both translating languages as well as intelligence purposes. Studying at the Allen School became a perfect fit for her.
“The perspective I bring to computer science is that I can see the way it can be used in military applications. I’m really dedicated to the idea of human-centered technology, specifically artificial intelligence,” Warner said. “I don’t think many people fully grasp the idea of where AI is headed in regards to the military and government sector so I think it’s important to have at least one person who really understands the impact of it all and has seen it with their own eyes.”
At the University of Washington’s annual Veterans Day Ceremony on Monday, Warner was recognized for her military service and her work as a veteran peer mentor in the Office of Student Veteran Life. As part of her role, Warner supports veterans in their transition into university life.
“It’s a huge life decision to separate from the military and come back to school and it’s very easy to question whether or not you made the right choice,” Warner said. “But as soon as I voice these thoughts to other veterans, they come running to remind me that the Allen School wouldn’t have accepted me if they didn’t see something in me, and it is my job to tell them the same thing when they voice those imposter syndrome-induced thoughts.”
For any veterans on the fence about joining the Allen School, Warner emphasized that Student Veteran Life is there to lend a helping hand and build community with other veterans on campus.
For more information on the many ways the UW community celebrates the contributions of our veterans, visit the Veterans Appreciation Week website.
While the Allen School’s annual Research Showcase and Open House highlights both the breadth and depth of computing innovation at the state’s flagship university, the 2024 event at the University of Washington last week had a decidedly AI flavor. From a presentation on advances in AI for medicine, to technical sessions devoted to topics such as safety and sustainability, to the over 100 student research projects featured at the evening poster session, the school’s work to advance the foundations of AI and its ever-expanding range of applications took center stage.
“Medicine is inherently multimodal”
In his luncheon keynote on generative AI for multimodal biomedicine, Allen School professor Sheng Wang shared his recent work towards building foundation models that bring together medical imaging data from multiple sources — such as pathology, X-ray and ultrasound — to assist doctors with diagnosing and treating disease.
“Medicine is inherently multimodal,” noted Wang. “There are lots of complicated diseases, like diabetes, hypertension, cancer, Alzheimer’s or even Covid…and we will see signals all over the body.”
The ability to capture these signals using multiple imaging modalities requires overcoming a number of challenges. For example, pathology images are too large for existing AI models to analyze at sufficiently detailed resolution — 100,000 by 100,000 pixels, large enough to cover a tennis court. Typically, images encountered by AI models are closer to 256 by 256 pixels, which, in keeping with Wang’s analogy, is akin to a single tennis ball.
To make pathology images more manageable, Wang and his collaborators looked to generative AI. Despite the stark difference in domains, “the challenge or the solution here is very similar to the underlying problem behind ChatGPT,” Wang explained. ChatGPT can understand and summarize long documents; by converting large pathology slide images to a “long sentence” of smaller images, Wang and his colleagues determined AI could then summarize these image-sentences to obtain an overview of a patient’s status. Based on that idea, Wang and his team developed GigaPath, the first foundation model for whole-slide pathology. GigaPath, which achieved state-of-the-art performance on 25 out of 26 tasks, is “one model fits all,” meaning it can be applied to different types of cancer. Since its release, the tool is averaging 200,000 downloads per month.
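To illustrate the “image as a long sentence” idea in rough terms, the sketch below tiles a whole-slide image into 256-by-256 patches and aggregates placeholder tile embeddings into a single slide-level summary. It is a minimal, hypothetical illustration of the concept only; the stand-in encoder and aggregator are not GigaPath’s actual architecture.

```python
import numpy as np

def tile_slide(slide: np.ndarray, tile_size: int = 256) -> np.ndarray:
    """Split a whole-slide image (H, W, 3) into tile_size x tile_size patches,
    analogous to turning a long document into a sequence of tokens."""
    h, w, c = slide.shape
    h_crop, w_crop = h - h % tile_size, w - w % tile_size
    tiles = (
        slide[:h_crop, :w_crop]
        .reshape(h_crop // tile_size, tile_size, w_crop // tile_size, tile_size, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(-1, tile_size, tile_size, c)
    )
    return tiles  # shape: (num_tiles, tile_size, tile_size, 3)

def encode_tiles(tiles: np.ndarray) -> np.ndarray:
    """Stand-in tile encoder; in practice this would be a pretrained vision model."""
    return tiles.reshape(len(tiles), -1).mean(axis=1, keepdims=True)

def summarize_slide(tile_embeddings: np.ndarray) -> np.ndarray:
    """Stand-in aggregator; in practice a long-sequence transformer would summarize
    the tile 'sentence' into one slide-level representation."""
    return tile_embeddings.mean(axis=0)

# Example with a small synthetic "slide" (real slides are ~100,000 x 100,000 pixels).
slide = np.random.rand(1024, 1024, 3)
tiles = tile_slide(slide)
summary = summarize_slide(encode_tiles(tiles))
print(tiles.shape, summary.shape)
```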
One task for which AI models typically do not perform well is predicting which treatment to recommend for a particular patient. So Wang and his colleagues borrowed another concept from generative AI, chain-of-thought, which calls for decomposing a complicated task into multiple smaller subtasks. The model is then asked to solve those smaller tasks individually on the way to addressing the bigger, more challenging task.
“The question is, how can we apply chain-of-thought to medicine?” Wang asked. “This has never been done before.” The answer is to use clinical guidelines as the chain to instruct a large language model (LLM). By breaking the chain into subtasks such as predicting cancer subtype and patient biomarkers, the LLM then arrives at a prediction of the appropriate treatment.
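In rough code terms, the guideline-as-chain idea might look like the sketch below, where each subtask is posed to the model in sequence and earlier answers feed later prompts. The ask_llm helper, the subtasks and the prompt wording are illustrative assumptions rather than the team’s actual pipeline.

```python
# A minimal sketch of guideline-driven chain-of-thought, assuming a generic
# ask_llm(prompt) helper; the subtasks and prompts here are illustrative only.
def ask_llm(prompt: str) -> str:
    """Placeholder for a call to a large language model."""
    return f"<answer to: {prompt[:40]}...>"

def predict_treatment(patient_record: str) -> str:
    # Each step mirrors one stage of a clinical guideline rather than asking
    # the model for a treatment recommendation in a single leap.
    subtype = ask_llm(f"Given this record, what is the cancer subtype?\n{patient_record}")
    biomarkers = ask_llm(f"Given subtype {subtype} and this record, which biomarkers are present?\n{patient_record}")
    return ask_llm(
        "Following the clinical guideline, recommend a treatment given "
        f"subtype {subtype} and biomarkers {biomarkers}."
    )

print(predict_treatment("65-year-old patient, biopsy report ..."))
```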
Yet another challenge is how to apply AI to 3D medical imaging. Here again, Wang and his colleagues achieved a milestone by developing the first 3D OCT foundation model. OCT is short for optical coherence tomography, a type of imaging used to diagnose retinal diseases.
“Our model can comprehensively understand the entire 3D structure to make a diagnosis,” said Wang, who aims to extend this approach to other types of medical 3D imaging, like MRI and CT scans — and eventually, to create one model that can handle everything. This is challenging even for general-domain machine learning: the state of the art, CLIP, is limited to two modalities, Wang noted, while he wants to build a medical model that can integrate as many as nine.
To overcome the problem, Wang and his fellow researchers drew inspiration from Esperanto, a constructed language that provides a common means of communication among a group of people who speak different languages. They devised an approach, BiomedParse, in which they built one foundation model for each modality, and then projected everything into the medical imaging equivalent of Esperanto — in this case, human language in the form of text from the associated clinical reports — as the common space into which they can project the millions of images, both 2D and 3D, from the different modalities.
But Wang wants to go beyond multi-modal to multi-agent. Using the example of a molecular tumor board, in which multiple experts convene to discuss challenging cases to determine a course of treatment, he suggested that AI models developed for different imaging modalities could help doctors efficiently and accurately determine a treatment plan — analogous to a Microsoft 365 for cancer research. And while some doctors may worry about AI replacing them, Wang’s approach is focused on advancing human-AI collaboration: Medical experts still develop the high-level guidelines for the model, with the AI handling the individual steps.
“In the future the AI model could be like a clinical lab test every doctor can order,” Wang suggested. “The doctor can order an AI test to do a specific task, and then the doctor will make a decision based on the AI output.”
“It’s just really exciting to see all this great work”
The event culminated with the announcement of the recipients of the Madrona Prize, which is selected by local venture capital firm and longtime Allen School supporter Madrona Venture Group to recognize innovative research at the Allen School with commercial potential. Rounding out the evening was the presentation of the People’s Choice Award, given to the team whose poster or demo is voted the favorite by attendees during the event — or in this case, their top two.
Managing Director Tim Porter presented the Madrona Prize, which went to one winner and two runners-up. Noting that previous honorees have gone on to raise hundreds of millions of dollars and get acquired by the likes of Google and Nvidia, he said, “It’s just really exciting to see all this great work turning into things that have long-term impact on the world through commercial businesses and beyond.”
Madrona Prize winner / Designing AI systems to support team communication in remote work
Allen School Ph.D. student Ruotong Wang accepted Madrona’s top prize for a pair of projects that aim to transform workplace communication — Meeting Bridges and PaperPing.
The Covid-19 pandemic has led to a rise in remote meetings, as well as complaints of “Zoom fatigue” and “collaboration overload.” To help alleviate this negative impact on worker productivity, Wang proposed meeting bridges, or information artifacts that support post-meeting collaboration and help shift work to periods before and after meetings. Based on surveys and interviews with study participants, the team devised a set of design principles for creating effective meeting bridges, such as the incorporation of multiple data types and media formats and the ability to put information into a broader context.
Meanwhile, PaperPing supports researcher productivity in the context of group chats by suggesting papers relevant to their discussion based on social signals from past exchanges, including previous paper citations, comments and emojis. The system is an implementation of Social-RAG, an AI agent workflow based on the concept of retrieval-augmented generation that feeds the context of prior interactions among the group’s members and with the agent itself into a large language model (LLM) to explain its current recommendations.
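As a rough illustration of the Social-RAG idea, the sketch below retrieves a few past group interactions relevant to a new chat message and folds them into a prompt for a language model. The Interaction structure, weighting and keyword-overlap retrieval are simplified assumptions, not the PaperPing implementation.

```python
# A minimal sketch of a Social-RAG-style step; the data structures and scoring
# here are illustrative only.
from dataclasses import dataclass

@dataclass
class Interaction:
    text: str            # e.g. a past paper citation, comment, or emoji reaction
    score_weight: float  # e.g. emoji reactions might weigh less than citations

def retrieve_context(message: str, history: list, k: int = 3) -> list:
    """Rank past group interactions by naive keyword overlap with the new message."""
    def overlap(item: Interaction) -> float:
        shared = set(message.lower().split()) & set(item.text.lower().split())
        return item.score_weight * len(shared)
    return [i.text for i in sorted(history, key=overlap, reverse=True)[:k]]

def recommend_paper_prompt(message: str, history: list) -> str:
    context = "\n".join(retrieve_context(message, history))
    return (
        "Group chat message: " + message + "\n"
        "Relevant past interactions:\n" + context + "\n"
        "Suggest a relevant paper and explain why, citing the interactions above."
    )  # in the real workflow, this prompt would be sent to an LLM

history = [
    Interaction("cited 'Retrieval-Augmented Generation for NLP'", 1.0),
    Interaction("reacted with a thumbs-up emoji to a post about agents", 0.3),
]
print(recommend_paper_prompt("Anyone know good papers on retrieval for chat agents?", history))
```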
Additional authors on Meeting Bridges include Allen School alum Lin Qui (B.S. ‘23) and professor Amy Zhang, as well as Maestro AI co-founder Justin Cranshaw. In addition to Zhang and Qui, Allen School postdoc Xinyi Zhou and Allen Institute for AI’s Joseph Chee Chang and Jonathan Bragg (Ph.D., ‘18) contributed to PaperPing.
Madrona Prize runner-up / Interpreting nanopore signals to enable single-molecule protein sequencing
For one of two runners up, Madrona singled out a team of researchers in the Allen School’s Molecular Information Systems Laboratory (MISL) for developing a method for long-range, single-molecule protein sequencing using commercially available nanopore sensing devices from Oxford Nanopore Technologies. Determining protein sequences, or the order that amino acids are arranged within a protein molecule, is key to understanding their role in different biological processes. This technology could help researchers develop medications targeting specific proteins for the treatment of cancer and neurological diseases such as Alzheimer’s.
Madrona Prize runner-up / Knowledge boosting during low-latency inference
Another team of researchers earned accolades for their work on knowledge boosting, a technique for bridging potential communication delays between small AI models running locally on edge devices and larger, remote models to support low-latency applications. This approach can be used to improve the performance of a small model operating on headphones, for example, with the help of a larger model running on a smartphone or in the cloud. Potential uses for the technology include noise cancellation features, augmented reality and virtual reality headsets, and other mobile devices that run AI software locally.
Lead author Vidya Srinivas accepted the award on behalf of the team, which includes fellow Allen School Ph.D. student Tuochao Chen and professor Shyam Gollakota; Malek Itani, a Ph.D. student in the UW Department of Electrical & Computer Engineering; Microsoft Principal Researcher Sefik Emre Eskimez; and Director of Research at AssemblyAI Takuya Yoshioka.
People’s Choice Award (tie) / AHA: A vision-language-model for detecting and reasoning over failures in robotic manipulation
Attendees could not decide on a single favorite presentation of the night, leading to a tie for the People’s Choice Award.
While advances in LLMs and vision-language models may have expanded robots’ problem-solving, object recognition and spatial reasoning capabilities, they’re lacking when it comes to recognizing and reasoning about failures — which hinders their deployment in dynamic, real-world settings. The research team behind People’s Choice honoree AHA: A Vision-Language-Model for Detecting and Reasoning over Failures in Robotic Manipulation designed an open-source VLM that identifies failures and provides detailed natural-language explanations for those failures.
“Our work focuses on the reasoning aspect of robotics, often overlooked but essential, especially with the rise of multimodal large language models for robotics,” explained lead author and Allen School Ph.D. student Jiafei Duan. “We explore how robotics could benefit from these models, particularly by providing them with the capability to reason about failures in robot execution and hence helping to improve downstream robotic systems.”
Using a scalable simulation framework for demonstrating failures, the team developed AHA to effectively generalize to a variety of robotic systems, tasks and environments. Duan’s co-authors include Allen School Ph.D. student Yi Ru Wang, alum Wentao Yuan (Ph.D. ‘24) and professors Ranjay Krishna and Dieter Fox; Wilbert Pumacay, a Master’s student at the Universidad Católica San Pablo; Nishanth Kumar, Ph.D. student at the Massachusetts Institute of Technology; Shulin Tian, an undergraduate researcher at Nanyang Technological University; and research scientists Ajay Mandlekar and Yijie Guo of Nvidia.
People’s Choice Award (tie) / AltGeoViz: Facilitating accessible geovisualization
The other People’s Choice Award winner was AltGeoViz, a system that enables screen-reader users to explore geovisualizations by automatically generating alt-text descriptions based on the user’s current map view. While conventional alt-text is static, AltGeoViz dynamically communicates visual information such as viewport boundaries, zoom levels, spatial patterns and other statistics to the user in real time as they navigate the map — inviting them to interact with and learn from the data in ways they previously could not.
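Conceptually, the dynamic descriptions can be thought of as a template filled in from the current viewport, as in the hypothetical sketch below; the viewport format, wording and statistics shown are illustrative assumptions rather than AltGeoViz’s actual output.

```python
# A minimal sketch of generating dynamic alt-text from the current map view;
# the inputs and wording are illustrative assumptions, not AltGeoViz's API.
def describe_view(bounds, zoom, region_values):
    """Compose a short alt-text description of what is currently on screen."""
    (south, west), (north, east) = bounds
    top = max(region_values, key=region_values.get)
    mean_val = sum(region_values.values()) / len(region_values)
    return (
        f"Map view at zoom level {zoom}, covering latitudes {south} to {north} "
        f"and longitudes {west} to {east}. {len(region_values)} regions are visible; "
        f"values average {mean_val:.1f}, with the highest value in {top} "
        f"({region_values[top]:.1f})."
    )

print(describe_view(((47.4, -122.5), (47.8, -122.0)), 11,
                    {"Ballard": 42.0, "Capitol Hill": 57.5, "Rainier Valley": 38.2}))
```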
“Coming from an urban planning background, my motivation for pursuing a Ph.D. in human-computer interaction originates from my passion for helping people design better cities,” lead author and Allen School Ph.D. student Chu Li said. “AltGeoViz represents a step towards this goal — by making spatial data visualization accessible to blind and low-vision users, we can enable broader participation in the urban planning process and shape more inclusive environments.”
Pets can do more than just provide us with companionship and cuddles. Our love for our pets can improve science education and lead to innovative ways to use augmented reality (AR) to see the world through a canine or feline friend’s eyes.
In a paper titled “Reconfiguring science education through caring human inquiry and design with pets,” a team of researchers led by Allen School professor Ben Shapiro introduced AR tools to help teenage study participants in a virtual summer camp design investigations to understand their pets’ sensory experiences of the world around them and find ways to improve their quality of life. While science and science education typically emphasize a separation between scientists and the phenomena they study, the teens’ experience organizes learning around the framework of naturecultures, which emphasizes people’s relationships with non-human subjects in a shared world and encourages practices of perspective-taking and care. The team’s research shows how these relational practices can instead enhance science and engineering education.
The paper won the 2023 Outstanding Paper of the Year Award from the Journal of the Learning Sciences – the top journal in Shapiro’s field.
“The jumping off point for the project was wondering if those feelings of love and care for your pets could anchor and motivate people to learn more about science. We wondered if learning science in that way could help people to reimagine what science is, or what it should be,” said Shapiro, the co-director of the University of Washington’s Center for Learning, Computing and Imagination. “Then, we wanted to build wearables that let people put on those animal senses and use that as a way into developing greater empathy with their pets and better understanding of how animals experience the shared environment.”
Science begins at home
When the Covid-19 pandemic pushed everything online in 2020, it was a “surprising positive” for the team’s plan to host a pet-themed science summer camp, Shapiro said. Now, teens could study how their pets’ experiences were shaped by their home environment and how well those environments satisfied their pets’ preferences, and researchers could support their learning together with their pets in their homes. Shapiro and the team developed “DoggyVision” and “KittyVision” filters that used red-green colorblindness, diminished visual acuity and reduced brightness to approximate how dogs and cats see. The study participants then designed structured experiments, guided by the AR filter tools, to answer questions such as “what is my pet’s favorite color?”
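As a rough idea of what such a filter involves, the sketch below merges the red and green channels, dims the image and applies a light blur to an RGB array. The specific coefficients are illustrative guesses, not the study’s actual “DoggyVision” filter.

```python
import numpy as np

def doggy_vision(image: np.ndarray) -> np.ndarray:
    """Rough approximation of a 'DoggyVision'-style filter on an RGB image in [0, 1]:
    merge red and green (dichromatic vision), dim the image, and blur it slightly.
    The coefficients here are illustrative, not the study's exact filter."""
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    yellow = 0.5 * (r + g)                                   # red and green appear as similar hues
    filtered = np.stack([yellow, yellow, b], axis=-1) * 0.7  # reduced brightness
    # Cheap blur: average each interior pixel with its four neighbors.
    blurred = filtered.copy()
    blurred[1:-1, 1:-1] = (
        filtered[1:-1, 1:-1] + filtered[:-2, 1:-1] + filtered[2:, 1:-1]
        + filtered[1:-1, :-2] + filtered[1:-1, 2:]
    ) / 5.0
    return blurred

print(doggy_vision(np.random.rand(64, 64, 3)).shape)
```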
“We wanted to organize student inquiry around the idea of their pets as whole beings with personalities and preferences and whose experiences are shaped by the places they are at. Those places are designed environments, and we wanted youth to think about how those designs serve both humans and non-humans,” Shapiro said. “We drew on prior work in animal-computer interaction to help students develop personality profiles of their pets called ‘pet-sonas.’”
For example, study participant Violet enjoyed buying colorful toys for her dog Billie; however, using the AR filter, she found that Billie could not distinguish between many colors. To see if Billie had a color preference, Violet designed a simple investigation in which she placed treats on top of different colored sheets of paper and observed which one Billie chose. Violet learned from using the “DoggyVision” filter that shades of blue appeared bright in contrast to the treats — Billie chose treats off of blue sheets of paper in all three tests. She used the results of her experiments to further her investigations into what kinds of toys Billie would like.
“The students were doing legitimate scientific inquiry — but they did so through closeness and care, rather than in a distant and dispassionate way about something they may not care about. They’re doing it together with creatures that are part of their lives, that they have a lot of curiosity about and that they have love for,” Shapiro said. “You don’t do worse science because you root it in passion, love, care and closeness, even if today’s prevailing scientific norms emphasize distance and objectivity.”
Next, Shapiro is looking to explore other ways that pet owners can better understand their dogs. This includes working with a team of undergraduates in the Allen School and UW Department of Human Centered Design & Engineering to design wearables for dogs that give pet owners information about their pet’s anxiety and emotions so they can plan better outings with them.
Priyanka Parekh, a researcher in the Northern Arizona University STEM education and Learning Sciences program, is lead author of the paper. It was also co-authored by University of Colorado Learning Sciences and Human Development professor Joseph Polman and Google researcher Shaun Kane.
Read the full paper in the Journal of the Learning Sciences.
Would you call your favorite fizzy drink a soda or a pop? Just because you speak the same language, does not mean you speak the same dialect based on variations in vocabulary, pronunciation and grammar. And whatever the language, most models used in artificial intelligence research are far from an open book, making them difficult to study.
At the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) in August, Allen School researchers took home multiple awards for their work to address these challenges. Their research ranged from introducing more dialects into language technology benchmarks to evaluating the reliability and fairness of language models and increasing the transparency and replicability of large language model training as well as evaluations across languages.
Best Social Impact Paper: DialectBench
The benchmarks used in natural language processing (NLP) research and evaluation are often limited to standard language varieties, making them less useful in real-world cases. To address this gap, Allen School researchers introduced DialectBench, the first large-scale NLP benchmark for language varieties that covers 40 different language clusters with 281 varieties across 10 NLP tasks.
While DialectBench can give researchers a comprehensive overview of the current state of NLP for language varieties, it also has the potential to bring more of those varieties into NLP models in the future.
“Language variation, like African American or Indian English dialects, is often treated as noise in NLP; however, in the real world, language variation often reflects regional, social and cultural differences,” said senior author and Allen School professor Yulia Tsvetkov. “We developed a robust framework to evaluate the quality of multilingual models on a wide range of language varieties. We found huge performance disparities between standard languages and their respective varieties, highlighting directions for future NLP research.”
Benchmarking helps researchers track the progress the NLP field has made across various tasks by comparing models against standard points of reference. However, it is difficult to test the robustness of multilingual models without an established NLP evaluation framework that covers many language clusters, or groups of standard languages alongside their closely related varieties. For DialectBench, the researchers constructed several clusters, such as the Hindustani cluster encompassing Fiji Hindi and Hindi, and then selected tasks that test models’ linguistic and demographic utility.
The researchers used DialectBench to report the disparities across standard and non-standard language varieties. For example, they found that the highest-performing varieties were mostly standard, high-resource languages, such as English, along with a few high-resource dialects, including Norwegian dialects. The majority of the lowest-performing variants, on the other hand, were low-resource language varieties.
Best Theme Paper: OLMo

As language models have become more common in commercial products, important details about these models’ training data, architectures and development have become hidden behind proprietary interfaces. Without these details, it is difficult to scientifically study the models’ strengths, weaknesses and potential biases and risks.
The researchers built a competitive, truly open language model, OLMo, to help fill this knowledge gap and inspire other scientists’ innovations. Alongside OLMo, the team also released its entire framework from the open training data to evaluation tools. The researchers earned Best Theme Paper at ACL for their work titled “OLMo: Accelerating the Science of Language Models.”
“Language models are a decades-old idea that has recently become the backbone of modern AI. Today the most famous models are built as commercial products by huge tech firms, and many details of their design are closely guarded secrets,” said Noah Smith, the Amazon Professor of Machine Learning in the Allen School. “We launched the OLMo effort as a collaboration between the Allen Institute for AI and the Allen School to create a fully open alternative that scientists could study, because it’s important that we fully understand these artifacts.”
While this paper presents the team’s first release of OLMo, they intend to continue to support and extend the model and its framework, bringing in different model sizes, modalities, datasets and more. Already since OLMo’s original release, the researchers have improved the data and training; for example, the Massive Multitask Language Understanding scores, which measure knowledge acquired during pretraining, went up by 24 points to 52%.
Best Resource Paper: Dolma

The OLMo effort to advance research into language models would not be complete without its counterpart, Dolma, an English corpus containing three trillion tokens spanning everything from web content to scientific papers to public-domain books.
While there has been progress toward making model parameters more accessible, pretraining datasets, which are fundamental to developing capable language models, are not as open and available. The researchers built and released OLMo’s pretraining dataset, Dolma, described in the paper “Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research,” to help facilitate open research into language models — and earned Best Resource Paper at ACL in the process.
“Even among open models, there are differences in what researchers can work with. With OLMo, we wanted a competitive, strong model whose data was also fully available for inspection,” said Smith. “Dolma is the dataset used to pretrain OLMo. It is extensively documented, and the paper includes analyses and discussion of lessons learned through data curation. We also released open-source data curation tools to enable reproduction and improvement of our work.”
As with OLMo, this is just the beginning for Dolma. The researchers continue to make advancements as part of follow-on releases that, for example, yield significant performance improvements on downstream tasks.
Additional authors on the Dolma paper include Zettlemoyer, Ravichander, Jha, Elazar, Magnusson, Morrison, Soldaini, Kinney, Bhagia, Schwenk, Atkinson, Authur, Chandu, Dumas, Lambert, Muennighoff, Naik, Nam, Peters, Richardson, Strubell, Subramani, Tafjord, Walsh, Beltagy, Groeneveld and Dodge along with Russell Authur, Ben Bogin, Valentin Hofmann and Xinxi Lyu of AI2; University of California, Berkeley Ph.D. student Li Lucy; Carnegie Mellon University Ph.D. student Aakanksha Naik; and MIT Ph.D. student Zejiang Shen.
Trying to work or record interviews in busy and loud cafes may soon be easier thanks to new artificial intelligence models.
A team of University of Washington, Microsoft and AssemblyAI researchers led by Allen School professor Shyam Gollakota, who heads the Mobile Intelligence Lab, built two AI-powered models that can help reduce the noise. By analyzing turn-taking dynamics while people are talking, the team developed the target conversation extraction approach that can single out the main speakers from background audio in a recording. Similar kinds of technology may be difficult to run in real time on smaller devices like headphones, but the researchers also introduced knowledge boosting, a technique whereby a larger model remotely helps with inference for a smaller on-device model.
The team presented papers describing both innovations at the Interspeech 2024 conference on the island of Kos, Greece, earlier this month.
One of the problems Gollakota and his colleagues sought to solve was how an AI model can identify the main speakers in an audio recording with lots of background chatter. The researchers trained the neural network using conversation datasets in both English and Mandarin to recognize “the unique characteristics of people talking over each other in conversation,” Gollakota said. Across both language datasets, the researchers found the turn-taking dynamic held up with up to four speakers in conversation.
“If there are other people in the recording who are having a parallel conversation amongst themselves, they don’t follow this temporal pattern,” said lead author and Allen School Ph.D. student Tuochao Chen. “What that means is that there is way more overlap between them and my voice, and I can use that information to create an AI which can extract out who is involved in the conversation with me and remove everyone else.”
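The intuition can be illustrated with a toy heuristic: if another speaker’s voice activity rarely overlaps with mine, we are probably taking turns in the same conversation, while heavy overlap suggests a parallel conversation. The sketch below computes that overlap ratio from hypothetical voice-activity segments; the real system is a trained neural network, not this rule.

```python
# A toy illustration of the turn-taking intuition (the actual system is a neural
# network trained on conversation data, not this heuristic).
def overlap_seconds(a, b):
    """Total time (s) during which two sets of (start, end) segments overlap."""
    return sum(
        max(0.0, min(e1, e2) - max(s1, s2))
        for s1, e1 in a for s2, e2 in b
    )

def in_my_conversation(my_segments, other_segments, threshold=0.2):
    """Low overlap with my speech suggests the other speaker is taking turns with me."""
    other_total = sum(e - s for s, e in other_segments)
    ratio = overlap_seconds(my_segments, other_segments) / max(other_total, 1e-9)
    return ratio < threshold

me = [(0.0, 2.0), (4.0, 6.0)]
partner = [(2.1, 3.9), (6.1, 7.5)]   # speaks when I pause
background = [(0.5, 5.5)]            # talks straight through my turns
print(in_my_conversation(me, partner), in_my_conversation(me, background))
```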
While the AI model leverages the turn-taking dynamic, it still preserves any backchannels happening within the conversation. These backchannels are small overlaps that happen when people are talking and showing each other that they are listening, such as laughter or saying “yeah.” Without these backchannels, the recording would not be an authentic representation of the conversation and would lose some of the vocal cues between speakers, Gollakota explained.
“These cues are extremely important in conversations to understand how the other person is actually reacting,” Gollakota said. “Let’s say I’m having a phone call with you. These backchannel cues where we overlap each other with ‘mhm’ create the cadence of our conversation that we want to preserve.”
The AI model can work on any device that has a microphone and can record audio, including laptops and smartphones, without needing any additional hardware, Gollakota noted.
Additional co-authors on the target conversation extraction paper include Malek Itani, a Ph.D. student in the UW Department of Electrical & Computer Engineering, Allen School undergraduate researchers Qirui Wang and Bohan Wu (B.S., ‘24), Microsoft Principal Researcher Sefik Emre Eskimez and Director of Research at AssemblyAI Takuya Yoshioka.
Turning up the power: Knowledge boosting
Target conversation extraction and other AI-enabled software that works in real time would be difficult to run on smaller devices like headphones due to size and power constraints. Instead, Gollakota and his team introduced knowledge boosting, which can increase the performance of the small model operating on headphones, for example, with the help of a remote model running on a smartphone or in the cloud. Knowledge boosting can potentially be applied to noise cancellation features, augmented reality and virtual reality headsets, or other mobile devices that run AI software locally.
However, because the small model has to feed information to the larger remote model, there is a slight delay in the noise cancellation.
“Imagine that while I’m talking, there is a teacher remotely telling me how to improve my performance through delayed feedback or hints,” said lead author and Allen School Ph.D. student Vidya Srinivas. “This is how knowledge boosting can improve small models’ performance despite large models not having the latest information.”
To work around the delay, the larger model attempts to predict what is going to happen milliseconds into the future so it can react to it. The larger model is “always looking at things which are 40–50 milliseconds in the past,” Gollakota said.
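The timing trick can be sketched as a simple loop in which the on-device model consumes hints that are several frames old while the remote model keeps sending new ones. The frame size, delay and stand-in “models” below are illustrative assumptions only.

```python
# A toy sketch of the knowledge-boosting timing, assuming a 10 ms frame size and
# a 40 ms link delay; the "models" here are trivial stand-ins.
from collections import deque

FRAME_MS, DELAY_MS = 10, 40
delay_frames = DELAY_MS // FRAME_MS

def small_model(frame, hint):
    # On-device model: combines the current frame with a (stale) remote hint.
    return frame + (hint or 0.0)

def large_model(frame):
    # Remote model: works on past frames, trying to produce a hint that is still
    # useful when it arrives despite the transmission delay.
    return 0.1 * frame

hints = deque([None] * delay_frames)   # hints currently in flight over the link
for t, frame in enumerate([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]):
    output = small_model(frame, hints.popleft())   # uses a hint from 40 ms ago
    hints.append(large_model(frame))               # this hint arrives 40 ms later
    print(f"t={t * FRAME_MS} ms -> output {output:.2f}")
```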
The larger model’s prediction capabilities open up the door for further research into AI systems that can anticipate and autocomplete what and how someone is speaking, Gollakota noted.
In addition to Gollakota and Srinivas, co-authors on the knowledge boosting paper include Itani, Chen, Eskimez and Yoshioka.
This is the latest work from Gollakota and his colleagues to advance new AI-enabled audio capabilities, including headphones that allow the wearer to focus on a specific voice in a crowd just by looking at them and a system for selecting which sounds to hear and which ones to cancel out.
Determining protein sequences, or the order in which amino acids are arranged within a protein molecule, is key to understanding their role in different biological processes and diseases. However, current methods for protein sequencing, including mass spectrometry, are limited and may not be sensitive enough to capture all the varying combinations of molecules in their entirety.
In a recent paper published in the journal Nature, a team of University of Washington researchers introduced a new approach to long-range, single-molecule protein sequencing using commercially available devices from Oxford Nanopore Technologies (ONT). The team, led by senior author and Allen School research professor Jeff Nivala, demonstrated how to read each protein molecule by pulling it through a nanopore sensor. Nanopore technology uses ionic currents that flow through small nanometer-sized pores within a membrane, enabling the detection of molecules that pass through the pore. This can be done multiple times for the same molecule, increasing the sequencing accuracy.
The approach has the potential to help researchers gain a clearer picture of what exists at the protein level within living organisms.
“This research is a foundational advance towards the holy grail of being able to determine the sequence of individual full-length proteins,” said Nivala, co-director of the Molecular Information Systems Lab (MISL).
The technique uses a two-step approach. First, an electrophoretic force pushes the target proteins through a CsgG protein nanopore. Then, a molecular motor called a ClpX unfoldase pulls and controls the translocation of the protein back through the nanopore sensor. Giving each protein multiple passes through the sensor helps eliminate the “noise associated with a single read,” Nivala explained. The team is then able to take the average of all the passes to get a more accurate sequencing readout as well as a detailed detection of any amino acid substitutions and post-translational modifications across the long protein strand.
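The benefit of rereading can be illustrated with a toy example: averaging several noisy passes over the same underlying signal suppresses the noise of any single read. The synthetic signal and noise level below are illustrative only; the actual pipeline also relies on normalization and alignment before averaging.

```python
import numpy as np

# A toy sketch of why rereading helps, assuming the repeated passes are already
# aligned to a common length.
rng = np.random.default_rng(0)
true_signal = np.sin(np.linspace(0, 6 * np.pi, 200))           # stand-in for the pore current
passes = [true_signal + rng.normal(0, 0.5, true_signal.size)   # each pass is noisy
          for _ in range(10)]

single_read_error = np.abs(passes[0] - true_signal).mean()
averaged_error = np.abs(np.mean(passes, axis=0) - true_signal).mean()
print(f"single read error: {single_read_error:.3f}, 10-pass average: {averaged_error:.3f}")
```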
This method differs from mass spectrometry, which does not look at each individual molecule, but takes the average of an ensemble of different proteins to characterize the sample — potentially losing out on information as each protein can have multiple variations within a cell, Nivala noted.
“One major advantage of nanopore technology is its ability to read individual molecules. However, analyzing these signals at the single-molecule level is challenging because of the variability in the signals, which persist to some extent even after applying normalization and alignment algorithms,” said co-lead author Daphne Kontogiorgos-Heintz, an Allen School Ph.D. student who works with Nivala in the MISL. “This is why I am so excited that we found a method to reread the same molecule multiple times.”
With a more detailed understanding of the protein sequences, this technology can help researchers develop medications that can target specific proteins, tackling cancer and neurological diseases like Alzheimer’s, Nivala explained.
“This will shed light into new diagnostics by having the ability to determine new biomarkers that might be associated with disease that currently we’re not able to read,” Nivala said. “It will also develop more opportunities to find new therapeutic targets, because we can find out which proteins could be manifesting the disease and be able to now target those specific variants.”
While the technology can help analyze natural biological proteins, it can also help read synthetic protein molecules. For example, synthetic protein molecules could be designed as data storage devices to record the molecular history of the cell, which would not be possible without the detailed readings from nanopore sensing, Nivala explained. The next step for this research would be working toward increasing the accuracy and resolution to achieve de novo sequencing of single molecule proteins using nanopores, which does not require a reference database.
Nivala and the team were able to conduct this research by modifying ONT technology toward nanopore protein sequencing.
“This study highlights the remarkable versatility of the Oxford Nanopore sensing platform,” said Lakmal Jayasinghe, the company’s SVP of R&D Biologics. “Beyond its established use in sequencing DNA and RNA, the platform can now be adapted for novel applications such as protein sequencing. With its distinctive features including portability, affordability and real-time data analysis, researchers can delve into proteomics at an unprecedented level by performing sequencing of entire proteins using the nanopore platform. Exciting developments lie ahead for the field of proteomics with this groundbreaking advancement.”
While the artist Claude Monet’s paintings can be blurry and indistinct, a new foundation model of the same name may help bring clarity to other medical artificial intelligence systems.
In a recent paper published in the journal Nature Medicine, a team of researchers at the University of Washington and Stanford University co-led by Allen School professor Su-In Lee introduced a medical concept retriever, or MONET, that can connect images of skin diseases to semantically meaningful medical concept terms. Beyond annotating dermatology images, MONET has the potential to improve transparency and trustworthiness throughout the entire AI development pipeline, from data curation to model development.
“We took a very different approach from current medical AI research, which often focuses on training large medical foundation models with the goal of achieving high performance in diagnostic tasks,” said Allen School Ph.D. student and lead author of the paper Chanwoo Kim, who works with Lee in the AI for bioMedical Sciences (AIMS) Lab. “We leverage these large foundation models’ capabilities to enhance the transparency of existing medical AI models with a focus on explainability.”
Prior to MONET, annotating medical images was a manual process that was difficult to do at scale. MONET automates this process by employing an AI technique called contrastive learning, which enables it to generate plain language descriptions of images. The researchers trained MONET on over 100,000 dermatology image-text pairs from PubMed articles and medical textbooks and then had the model score each image based on how well it represents a given concept. These medical concepts are “terms that a physician can understand and would use to make a diagnosis such as dome-shaped, asymmetrical or ulcer,” Kim said.
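In spirit, the concept scoring resembles the CLIP-style sketch below, where an image and a concept term are embedded in a shared space and scored by cosine similarity. The placeholder encoders and random embeddings are assumptions for illustration, not MONET’s trained model.

```python
import numpy as np

# A minimal sketch of contrastive-style concept scoring, assuming placeholder
# encoders that map images and concept terms into a shared embedding space
# (MONET itself builds on a CLIP-style model trained on image-text pairs).
rng = np.random.default_rng(1)

def embed_image(image):    # stand-in for a trained image encoder
    return rng.normal(size=64)

def embed_text(concept):   # stand-in for a trained text encoder
    return rng.normal(size=64)

def concept_score(image, concept):
    a, b = embed_image(image), embed_text(concept)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))  # cosine similarity

for concept in ["dome-shaped", "asymmetrical", "ulcer"]:
    print(concept, round(concept_score("lesion.png", concept), 3))
```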
The team found that MONET could accurately annotate concepts across dermatology images, as verified by board-certified dermatologists, and that it performed comparably to supervised models trained on smaller, previously concept-annotated dermatology datasets.
These annotations can help researchers detect potential biases in datasets and undesirable behavior within AI systems. The researchers used MONET to audit the International Skin Imaging Collaboration (ISIC) dataset, the largest collection of dermoscopic images commonly used in training dermatology AI models at over 70,000 images, and found differences in how concepts correlate with a lesion being benign or malignant. For example, MONET showed that images of skin lesions on which dermatologists had placed orange stickers were mostly benign, a correlation that does not always hold. One explanation is that the orange stickers were often used with pediatric patients, who tended to have benign cases, Kim noted.
This insight is crucial for understanding which factors affect the transferability of medical AI models across different sites. Usually, such data auditing at scale is not feasible due to the lack of concept labels.
“In the AI pipeline, MONET works at the entry level, providing a ‘lens’ through which each image can be ‘featurized’ based on available information to map it with relevant language-based features,” Lee said. “This allows MONET to be combined with an existing medical AI development pipeline, including data curation and model development, in a plug-and-play manner.
“You don’t have to worry about it going through a model as it goes right to the data — that’s one way we can make dataset and model auditing more transparent and trustworthy,” Lee continued.
The framework of MONET can also help medical AI model developers create inherently interpretable models. Physicians, in particular, are interested in such models, like concept bottleneck models (CBMs), because they make it easy to decipher and understand which factors are influencing the AI’s decisions. However, CBMs are limited because they require concept annotations in the training data, which may not always be available; MONET’s automatic annotation has the potential to help build CBMs that were previously impossible.
“While we only focused on a foundation model based on OpenAI’s CLIP model, we expect that this whole idea can be applied to other more advanced large foundation models,” Kim noted. “Nowadays, AI is developing very rapidly but our framework of using large foundation models’ amazing capabilities to improve transparency of medical AI systems will still be applicable.”
This is part of a broader research effort in the AIMS Lab to ensure AI in dermatology and medical imaging is safe, transparent and explainable. Other projects include a new framework for auditing AI-powered medical-image classifiers that can help dermatologists understand how the model determines whether an image depicts melanoma or a benign skin condition. Another paper sheds light on the reasoning process medical image classifiers use to identify patients’ sex from images of skin lesions. Additionally, counterfactual AI prompts have the potential to show how patient data may change based on genetic mutations, treatments or other factors. These research initiatives have potential applications beyond dermatology to other medical specialties, Lee said.
According to the United Nations, more than half of the world’s population — 55% — lives in urban areas, with that figure projected to rise to 70% by the year 2050. While urban communities are vibrant centers of economic, cultural and civic activity, this vibrancy tends to be accompanied by concerns about housing affordability, aging or inadequate infrastructure, environmental impacts and more.
Such vibrancy also yields a lot of data that could provide a window onto how our urban environments function, with a view to creating more sustainable and equitable communities as well as planning for future growth. But it’s difficult to extract usable insights, let alone turn those insights into practical actions, when the data is stored in different formats, spread across different systems and maintained by different agencies.
A team of researchers that includes professor Jon Froehlich, director of the Allen School’s Makeability Lab, is pursuing a more unified approach that will democratize data analysis and exploration at scale and empower urban communities. Froehlich is co-principal investigator on a new, five-year project dubbed OSCUR — short for Open-Source Cyberinfrastructure for Urban Computing — that recently earned a $5 million grant from the National Science Foundation. The project is spearheaded by the NYU Tandon School of Engineering’s Visualization Imaging and Data Analytics Research Center (VIDA) in partnership with the University of Washington and the University of Illinois Chicago.
“The true beauty and promise of OSCUR is in how it attempts to unify long-standing and deeply interconnected problems in urban science that often have disparate approaches spread across disciplines,” Froehlich said in the project announcement. “We are trying to develop standardized tools, datasets, and data standards to address problems related to climate change (e.g., urban heat island effects), walkability and bikeability, urban accessibility for people with disabilities, and more.”
To that end, Froehlich and his colleagues intend to cultivate a cohesive urban computing community spanning computer science, data science, urban planning, civil engineering, environmental sciences and other expertise. They will harness this combined wisdom to develop a set of scalable, open-source tools to enable interactive exploration and analysis of complex data, with an emphasis on findability, usability, interoperability, transparency and reproducibility. This would enable a variety of stakeholders, from researchers, to practitioners, to residents, to collaboratively address common challenges — without having to build something from scratch that may quickly become obsolete.
Communities stand to benefit from this more integrated approach in multiple ways. For example, the team envisions a set of tools that would enable agencies to make more robust use of citywide sensor data to monitor and mitigate noise pollution and improve quality of life. In addition, they could combine data from different sources to gain a more comprehensive understanding of how their infrastructure might withstand disaster — and where they may need to shore up their resilience. Such an approach would also enable communities to glean new insights into how the built environment affects pedestrian mobility, with a view to making their communities more accessible for all residents.
Froehlich is no stranger to urban accessibility issues — or to collaborating with residents and decision makers to extend the real-world impact of his research. He previously co-founded Project Sidewalk, an effort that combines advances in artificial intelligence with the power of citizen science to identify and map pedestrian accessibility issues such as missing curb ramps and uneven surfaces. The initiative has spread to 21 cities in the U.S. and around the world, including Seattle and Chicago; Mexico City, Mexico; and Amsterdam in the Netherlands. To date, contributors to Project Sidewalk have mapped more than 11,400 miles of infrastructure and contributed over one million labels — data that has been leveraged to improve city planning, build new interactive tools and train AI to automatically log accessibility issues.
“I have worked in urban computing for more than a decade,” said Froehlich. “OSCUR is one of those rare opportunities to push beyond the silos of academia and develop tools for and with communities that will take them far into the 21st century.”
When Miranda Wei attended her first Symposium on Usable Privacy and Security (SOUPS) conference in 2017, she had little experience in the field; she had only recently graduated with a degree in political science from the University of Chicago. But the community of researchers at the conference welcomed her in. That experience paved the way for her to continue doing research on privacy and security and, eventually, to pursue a Ph.D. at the Allen School.
Seven years after her first foray into the SOUPS community, Wei received the 2024 John Karat Usable Privacy and Security Student Research Award at the conference for her interdisciplinary contributions to the field and strong leadership. The award, named for the late researcher John Karat, recognizes graduate students for their research in usable privacy and security, efforts to mentor others and community service.
“As a researcher, we publish in many different venues, but SOUPS is the first conference that I went to and is the closest to my heart,” Wei said. “It’s a huge honor to be recognized for the work that I’ve done, especially as someone who came to this field from a non-traditional background.”
For Allen School professor Franziska Roesner, one of Wei’s Ph.D. co-advisors and co-director of the Security and Privacy Research Lab alongside colleague Tadayoshi Kohno, Wei is already a “superstar in usable security and privacy.” Wei’s research focuses on how societal factors can impact individuals’ security and privacy. For example, her paper presented at the 2022 SOUPS conference analyzed how TikTok creators shared information on how to leverage technology to monitor or control others, especially within families or with romantic partners. The research was one of the first to consider the platform as a data source in the field.
“When I think of thought leadership within a field, I think of those who look into places that others are not,” Kohno said. “Miranda’s work with TikTok as a data source is a great example of such leadership.”
Wei’s other work has also made strides in the privacy and security field. Her 2023 paper at the IEEE Symposium on Security and Privacy was one of the first to explore gender stereotypes within computer security and privacy practices, Roesner noted. There is still more research to do: in work presented at the 2024 USENIX Security Symposium, Wei analyzed the field’s apparent lack of knowledge on how sociodemographic factors affect computer security behaviors. For her research advancing usable security and privacy, Wei was also one of 75 graduate students from around the world selected for the 2023 Google Ph.D. Fellowship program, and one of just four working in the privacy and security field.
“Miranda’s work is often deep and nuanced, drawing on methodology and theory from multiple fields (such as computer security, human-computer interaction and social science) to ask fundamental questions situated in complex social and societal dynamics,” said Roesner, the Brett Helsel Career Development Professor in the Allen School. “This includes exploring constructs of power and gender, and challenging the field’s norms around what we know and how we develop knowledge.”
Outside of her research contributions, Wei is heavily involved in mentorship and community building. As a senior Ph.D. student in the Security and Privacy Research Lab, Wei works as a sounding board for other students and has served in an advisory role on multiple research projects. She also co-founded and volunteers with the Allen School’s Pre-Application Mentorship Service (PAMS), which advises prospective graduate students. At the 2024 SOUPS conference, Wei co-organized the inaugural Gender, Online Safety and Sexuality (GOSS) workshop to help integrate feminist, LGBTQ+ and critical theories into research on online safety.
“Her research vision and agenda around advancing computer security, privacy and safety for all inherently embody a global ambition for social good,” Roesner said. “She cares deeply about expanding access to opportunities for and improving the experience of people in and around computer science.”
Wei is quick to note that she did not achieve this award on her own.
“All of my research papers and projects I’ve worked on have benefited from my friends in the Security and Privacy Research Lab and my mentors across the world,” Wei said. “I really think it takes a village.”