GEN1 members tabled and gave out self-care packages for National First-Gen Day.
About a quarter of all Allen School students are first-generation, or first-gen, meaning they are among the first in their family to pursue a bachelor’s degree in the U.S., according to Allen School demographics data.
In 2020, a group of first-gen Allen School students including Aerin Malana (B.S. ‘22) approached Chloe Dolese Mandeville, the Allen School’s Assistant Director for Diversity & Access, and Leslie Ikeda, who manages the Allen School Scholars Program, wanting to build a club to help support and recognize their fellow first-gen students. Today, the club called GEN1 has helped first-gen students find community, connect with resources at the Allen School and University of Washington and more — all with the goal of highlighting the first-gen experience.
“A really important goal of GEN1 was to celebrate that being a trailblazer and coming into this space has a lot of weight, and also there’s a lot of joy and things to celebrate in that journey,” said former GEN1 adviser Ikeda.
Fostering community
For Ikeda, who was also a first-gen student, the experience can feel like an “invisible identity” that comes with unique challenges.
“The first-generation experience can feel very isolating, especially coming into a larger institution and a competitive program such as the Allen School,” Ikeda said. “If you don’t have the social capital or the networks, you can feel really lost.”
GEN1 hosted a coding game night in collaboration with the student organization Computing Community (COM2).
GEN1’s goal is to make the first-gen community more visible and provide members with a space to share their stories with other first-gen students. Many students come to the club looking for others who understand their experience.
“I joined GEN1 because I was looking for a computer science and engineering community who have similar backgrounds as me,” said Christy Nguyen, Allen School undergraduate student and GEN1 vice chair. “I felt really behind compared to my peers when starting my computer science journey because I couldn’t ask my parents about coding-related difficulties. I’m really happy that I’m in GEN1 because we share these experiences and help each other in times when our parents can’t.”
Over the years, the club has led multiple initiatives and programs to help students thrive in their academics, future careers and overall wellbeing. These include bi-weekly study sessions, an alumni mentorship program, an internship program to onboard new club officers and self-care packages during midterms and finals. GEN1 also hosts social events such as pumpkin painting to help students destress and unwind. GEN1 also collaborates with other groups at the Allen School, such as Women in Computing, recognizing how “intersectional” the first-gen identity can be, Ikeda said.
Czarina Dela Cruz, Allen School undergraduate student and GEN1 chair, has been involved with GEN1 since her freshman year and ran for the position to provide other students with the same sense of community that welcomed her in.
GEN1 members at the First-Gen Graduation celebration for the class of 2024.
“As someone who came into the Allen School without a strong technical background, joining GEN1 has helped me find a community who I can rely on for advice, laughs and connections,” Dela Cruz said.
Dela Cruz said her goal for this year as the GEN1 chair is to “increase GEN1’s engagement and reach all first-gen students in the Allen School, to encourage and support as they go along their journey in computing and beyond.” For example, to celebrate National First-Generation Day on Nov. 8, GEN1 hosted a career night featuring companies such as Google and Microsoft along with technical interview workshops. The holiday commemorates the signing of the Higher Education Act, which ushered in programming to support first-gen college students.
But National First-Generation Day is not the only time during the year when the Allen School highlights first-gen students. Shortly after the club started, GEN1 began hosting a first-generation Allen School graduation ceremony. The event has grown beyond just a celebration for the graduates into a chance for the Allen School community to come together and show its support.
“This celebration aspect of GEN1 has been really impactful,” Dolese Mandeville said. “Being here at the Allen School is a huge accomplishment, but it’s also important to highlight everything they do beyond. Having GEN1 as a space for first-gen students is amazing, but I hope they know that everyone else is also rooting for them to succeed.”
Each year, the Infosys Science Foundation (ISF) recognizes the achievements and contributions of researchers and scientists of Indian origin who are making waves in their field and beyond.
Allen School professor Shyam Gollakota, who leads the Mobile Intelligence Lab, received this year’s Infosys Prize in Engineering and Computer Science for his research that uses artificial intelligence to change the way we think about speech and audio. He is one of six award winners who will be honored at a ceremony in Bangalore, India, next month and receive a prize of $100,000.
“This prize supports our work on creating a symbiosis between humans, hardware and AI to create superhuman capabilities like superhearing, with the potential to transform billions of headphones and AirPods and improve the lives of millions of people who have hearing loss,” said Gollakota, the Washington Research Foundation/Thomas J. Cable Professor in the Allen School.
The award is one of the largest in India recognizing science and research excellence. This year, the ISF decided that the award will honor researchers younger than 40, “emphasizing the need for early recognition of exceptional talent,” the organization said in a statement.
For the past few years, Gollakota has been building innovative ways to boost the power of headphones using AI. Most recently, he developed a prototype for AI-powered headphones that create a “sound bubble” around the wearer. The headphones use an AI algorithm that allows the wearer to hear others speaking inside the bubble, while sounds outside of it are quieted. Gollakota said that the award money from the Infosys Prize will go toward commercializing the technology.
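To make the idea concrete, here is a minimal sketch of the “sound bubble” logic in Python. The helper functions stand in for the neural networks that do the real work on the headset; this is an illustration of the concept under those assumptions, not the team’s implementation.

```python
# Toy sketch of a "sound bubble": keep speech estimated to come from inside a
# radius and attenuate everything else. separate_sources() and estimate_distance_m()
# are placeholders for the neural models that do the real work on the device.
import numpy as np

BUBBLE_RADIUS_M = 1.5   # sources within ~1.5 m are kept; farther ones are quieted

def apply_sound_bubble(mic_frames, separate_sources, estimate_distance_m):
    """separate_sources: multi-mic frames -> list of per-source waveforms (float arrays).
    estimate_distance_m: waveform -> estimated distance of that source in meters."""
    sources = separate_sources(mic_frames)
    mixed = np.zeros_like(sources[0])
    for waveform in sources:
        gain = 1.0 if estimate_distance_m(waveform) <= BUBBLE_RADIUS_M else 0.05
        mixed += gain * waveform          # keep nearby speech, quiet distant sound
    return mixed
```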
AI-enabled software that works in real time can be difficult to run on smaller devices like headphones due to size and power constraints. To address this, Gollakota helped create knowledge boosting, a technique that increases the performance of a small model operating on headphones with the help of a remote model running on a smartphone or in the cloud.
“His work on mobile and wireless communications is game-changing,” said Jayathi Murthy, Engineering and Computer Science Infosys Prize jury chair. “Particularly impressive is his work on active sonar systems for physiological sensing, battery-free communications and the use of AI to selectively tailor acoustic landscapes. These innovations will continue to benefit humanity for years to come.”
Allen School Ph.D. student Joe Breda holds a smartphone against a patient’s head to show how the FeverPhone app works. Dennis Wise/University of Washington
When you need to test if you are running a fever, you may not have an accurate at-home thermometer handy. However, many people may have the next best thing right in their pocket — a smartphone.
A team of researchers in the Allen School’s UbiComp Lab and UW School of Medicine developed the app FeverPhone that turns smartphones into thermometers without the need for additional hardware. The research, which was originally published in the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, received an IMWUT Distinguished Paper Award at the ACM International Joint Conference on Pervasive and Ubiquitous Computing/International Symposium on Wearable Computing (UbiComp/ISWC) in Melbourne, Australia, in October.
Joe Breda
“It feels great to have this work spotlighted by the community like this because it brings light to the still unexpected utility inside these — already ubiquitous — devices,” lead author Joe Breda, a Ph.D. student in the Allen School, said. “I like to do this type of research as it demonstrates what can be done with the sensor hardware already in people’s pockets so the next generation of devices might include these kinds of techniques by default. This is particularly important for applications like health sensing where ensuring access to diagnostics at the population scale can have serious impacts.”
The app is the first to use smartphones’ existing sensors to gauge whether or not someone has a fever. Inside most off-the-shelf smartphones are small sensors called thermistors that monitor the temperature of the battery. These sensors happen to be the same ones clinical-grade thermometers use to estimate body temperature. The researchers found that the smartphone’s touchscreen could sense skin-to-phone contact, while the thermistors could estimate the air temperature and the rise in heat when the phone was pressed against someone’s forehead.
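A minimal sketch of that estimation step might look like the following, with made-up calibration constants standing in for the model the team fit to clinical data:

```python
# Minimal sketch of the FeverPhone idea: the battery thermistor estimates ambient
# temperature before contact and the heat picked up while the touchscreen reports
# a forehead press; a calibrated model maps both to core body temperature.
# The coefficients below are invented for illustration, not the published model.

def estimate_body_temp_c(ambient_c, contact_readings_c, c0=34.0, c1=1.2, c2=-0.05):
    peak_c = max(contact_readings_c)     # warmest reading during the press
    rise_c = peak_c - ambient_c          # heat transferred from the forehead
    return c0 + c1 * rise_c + c2 * (ambient_c - 22.0)

# Example: ambient ~24 C, thermistor climbs to ~26.6 C over a 90-second press.
print(round(estimate_body_temp_c(24.0, [24.3, 25.5, 26.6]), 1))  # ~37.0
```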
The team tested out FeverPhone’s temperature-sensing capabilities against a traditional oral thermometer on patients at the UW School of Medicine’s Emergency Department. FeverPhone’s readings were within the clinically acceptable range. Although the app still needs more testing before it can be widely used, FeverPhone’s potential to help during sudden spikes in demand, when thermometers may be harder to come by, is still exciting for doctors.
“People come to the ER all the time saying, ‘I think I was running a fever.’ And that’s very different than saying ‘I was running a fever,’” said study co-author Mastafa Springston, M.D., in a UW News release. “In a wave of influenza, for instance, people running to the ER can take five days, or even a week sometimes. So if people were to share fever results with public health agencies through the app, similar to how we signed up for COVID exposure warnings, this earlier sign could help us intervene much sooner.”
Since the team published the paper last year, some smartphone makers have introduced their own body temperature sensors.
“For example, it’s not unlikely that this paper directly inspired Google Pixel to introduce this new functionality, which was exactly why I pursued this work in the first place,” Breda said. “I wanted to advocate for these big tech companies to consider minimal hardware or software changes to these ubiquitous devices to make health care more accessible. I actually met with someone on the team shortly after this work was submitted to share my findings.”
Breda has turned his attention toward other devices with potential health care capabilities. For example, he is currently researching how smartwatches can be used for early detection of influenza-like illnesses. Breda has been collaborating with the National Institutes of Health (NIH) to build models that can passively detect if the wearer is getting sick, starting on the first day of virus exposure, using signals from their heart rate, temperature and skin conductance. Last October, he traveled to Washington, D.C. to test the technology on patients at the NIH.
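As a toy illustration of that kind of passive sensing, a risk score might combine deviations from a wearer’s personal baselines; the features and weights below are invented for illustration, not the models being built with the NIH:

```python
# Toy illustration of passive illness detection from wearable signals.
# The weights and normalizers are made up; real models are trained on clinical data.

def illness_risk(resting_hr_delta, skin_temp_delta_c, skin_conductance_delta):
    """Each delta is that signal's change from the wearer's personal baseline."""
    score = (0.5 * resting_hr_delta / 10.0     # elevated resting heart rate
             + 0.3 * skin_temp_delta_c / 0.5   # warmer skin
             + 0.2 * skin_conductance_delta)   # higher electrodermal activity
    return min(max(score, 0.0), 1.0)

print(round(illness_risk(resting_hr_delta=8, skin_temp_delta_c=0.4,
                         skin_conductance_delta=0.1), 2))  # 0.66
```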
“In the next phase of my career, I am looking at how ubiquitous computing and artificial intelligence-powered sensing on these devices can improve public health through analyzing biomarkers for disease prevention, and even more broadly through improving urbanism and digital civics,” Breda said. “These devices offer a unique opportunity to detect, or in some cases even predict, personal and public health issues before they even happen.”
In honor of the late Allen School professor Gaetano Borriello, whose work focused on applying mobile technologies to tackle issues of global equity and social and environmental justice, the ubiquitous computing community each year recognizes Ph.D. students who are following in his footsteps.
This year, those footsteps led back to the Allen School when Ph.D. student Anandghan Waghmare won the 2024 Gaetano Borriello Outstanding Student Award at the ACM International Joint Conference on Pervasive and Ubiquitous Computing/International Symposium on Wearable Computing (UbiComp/ISWC) in Melbourne, Australia, in October. Waghmare’s contributions to the ubiquitous and pervasive computing field, the societal impact of his work as well as his community service embody Borriello’s legacy and research passions.
“This award means a lot to me. I always looked up to and respected each year’s winners for their good work and research,” Waghmare said. “This award is a significant honor in the UbiComp community.”
The focus of Waghmare’s thesis is investigating ways to add small and inexpensive pieces of hardware to existing devices to make them do more than they were designed to do. For example, Waghmare designed a smartphone-based glucose and prediabetes screening tool called GlucoScreen. More than one out of three adults in the country has prediabetes, or elevated blood sugar levels that can develop into type 2 diabetes, according to the U.S. Centers for Disease Control and Prevention (CDC). However, more than 80% do not know they have the condition, which can lead to complications such as heart disease, vision loss and kidney failure, the CDC found.
Current blood testing methods require visits to a health care facility, which can be expensive and inaccessible, especially for those in developing countries, Waghmare explained. With GlucoScreen’s low-cost and disposable rapid tests, patients can easily screen for prediabetes at home without the need for additional equipment, and potentially reverse the condition under the care of a physician with diet and exercise. The research was published in the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT) and presented at last year’s UbiComp/ISWC conference.
“A lot of technology is already built into these devices like smartphones — the computer power, the networking, the display,” Waghmare said. “I try to leverage everything in existing devices and add what’s missing and find clever engineering ways to do it.”
His other projects follow a similar theme. He developed WatchLink, which enables users to add additional sensors to smartwatches to measure UV light, body temperature and breath alcohol levels. This research paves the way for more cost-effective, personalized, user-centric wearables. Waghmare also works on building wearable devices for intuitive input, such as the Z-Ring wearable that uses radio frequency sensing to facilitate context-aware hand interactions.
“Anand’s work would make Gaetano so proud given the focus on solving socially meaningful problems in a practical way using clever hardware/software solutions,” said Shwetak Patel, Waghmare’s advisor in the University of Washington’s UbiComp Lab and the Washington Research Foundation Entrepreneurship Endowed Professor in the Allen School and the UW Department of Electrical & Computer Engineering.
“Solving socially meaningful problems in a practical way”: Waghmare with his modified glucose test strip to enable at-home screening for prediabetes
Outside of his research contributions, Waghmare is heavily involved in community service and mentorship. Since 2019, he has been a peer reviewer for multiple conferences including IMWUT, Human Factors in Computing Systems (CHI) and the International Conference on Interactive Surfaces and Spaces. This year, he also volunteered as video previews co-chair at the ACM Symposium on User Interface Software and Technology (UIST), web co-chair at the International Symposium on Mixed and Augmented Reality and special topic session chair at the ACM Symposium on Spatial User Interaction. In upcoming 2025 conferences, Waghmare is serving as publications co-chair at CHI, video previews co-chair at UIST and registration co-chair at UbiComp.
At the same time, Waghmare has been working on encouraging more students to pursue STEM fields. This includes exposing students to research early in their careers and sharing his own work and experiences, as well as hosting high school students over the years. For Patel, Waghmare is a “great citizen of the research community.”
“I am excited and honored to join the ranks of the amazing researchers who have won this award before me, and I hope to inspire other Ph.D. students to apply because you never know,” Waghmare said.
Reserve Officers’ Training Corps (ROTC) members carry flags during the University of Washington’s annual Veterans Day Ceremony. Photo by University of Washington
“Having traveled to and experienced different countries, I bring a more practical and global outlook to computer science. This definitely gives me a different understanding and appreciation for the subject compared to some of my peers.”
That sentiment, offered by Fei Huang, veteran and current undergraduate, is one of the themes we found among Allen School students with military backgrounds. To mark Veterans Appreciation Week at the University of Washington, we spoke with three students on how being a veteran gives them a unique perspective on their studies and future career path.
Fei Huang
Before joining the Allen School, Huang served six years in the Navy. The same thing that drove Huang to the military helped fuel his journey to the Allen School: intellectual curiosity. Huang described himself as a “curious person” and joining the Navy gave him the opportunity to see the world and experience different aspects of life. At the same time, his interest in computer science grew as he learned how the technology had transformed the world over the decades.
“I wanted to contribute to humanity and future generations, as I believe computer science is the foundation of our future world. I chose the Allen School because it’s one of the best computer science programs globally, attracting top talent and being situated in a city at the forefront of technology,” Huang said.
In the Navy, Huang learned how to quickly adapt to new environments, making for a smooth transition from military to university life. One of the biggest adjustments he faced was time management. In the Navy, Huang said he had his whole day scheduled for him; university life required him to manage his own time, but it also gave him “the freedom to pursue his passions.”
Huang’s advice for any other veterans looking to study at the Allen School is to see their military experience as a strength.
“At first, you might feel like an outsider because your way of thinking and your experiences are different from those of your classmates. Embrace this difference, and use it to lead your peers by sharing real-world perspectives,” Huang said. “The military places a strong emphasis on leadership, and for me, the most important aspect of leadership is ownership. I encourage my peers to take ownership of everything they do.”
Makalapua Goodness
Makalapua Goodness (center)
Allen School undergraduate student and veteran Makalapua Goodness grew up in a small town and enlisted in the Air Force as a way to get out and see the world. He served seven years in the military before following his interest in technology and computer science to the Allen School. For Goodness, the skills he learned during his military service helped make the transition to being a university student easier.
“Veterans and civilians have different mindsets around things. Veterans can handle adversity better and are used to handling stressful situations,” Goodness said. “That’s what I tell myself when I’m lost in a class or a project — that I’ve been through tougher times than this.”
His military background also comes out in how he approaches his assignments and coursework. In the later part of his Air Force career, he was often in a supervisor role and focused on team dynamics. Now, when working with others at the Allen School, he thinks about “how to involve and put everyone in the best position to succeed, both as a group and individually.”
He credits the other skills he gained through the military such as timeliness and being goal-oriented for helping him find success at the Allen School.
Larrianna Warner
Larrianna Warner giving a speech at UW’s annual Veterans Day Ceremony. Photo by University of Washington
Allen School undergraduate student and veteran Larrianna Warner said she was unsure exactly what she wanted to pursue after high school, but she knew she loved learning languages. That passion led her to enlist in the Air Force and serve four years as a Russian Cryptologic Language Analyst focusing on translating messages. In the Air Force, Warner became interested in large language models and natural language processing and how they can be used for both translating languages as well as intelligence purposes. Studying at the Allen School became a perfect fit for her.
“The perspective I bring to computer science is that I can see the way it can be used in military applications. I’m really dedicated to the idea of human-centered technology, specifically artificial intelligence,” Warner said. “I don’t think many people fully grasp the idea of where AI is headed in regards to the military and government sector so I think it’s important to have at least one person who really understands the impact of it all and has seen it with their own eyes.”
At the University of Washington’s annual Veterans Day Ceremony on Monday, Warner was recognized for her military service and her work as a veteran peer mentor in the Office of Student Veteran Life. As part of her role, Warner supports veterans in their transition into university life.
“It’s a huge life decision to separate from the military and come back to school and it’s very easy to question whether or not you made the right choice,” Warner said. “But as soon as I voice these thoughts to other veterans, they come running to remind me that the Allen School wouldn’t have accepted me if they didn’t see something in me, and it is my job to tell them the same thing when they voice those imposter syndrome-induced thoughts.”
For any veterans on the fence about joining the Allen School, Warner emphasized that Student Veteran Life is there to lend a helping hand and build community with other veterans on campus.
For more information on the many ways the UW community celebrates the contributions of our veterans, visit the Veterans Appreciation Week website.
While the Allen School’s annual Research Showcase and Open House highlights both the breadth and depth of computing innovation at the state’s flagship university, the 2024 event at the University of Washington last week had a decidedly AI flavor. From a presentation on advances in AI for medicine, to technical sessions devoted to topics such as safety and sustainability, to the over 100 student research projects featured at the evening poster session, the school’s work to advance the foundations of AI and its ever-expanding range of applications took center stage.
“Medicine is inherently multimodal”
In his luncheon keynote on generative AI for multimodal biomedicine, Allen School professor Sheng Wang shared his recent work towards building foundation models that bring together medical imaging data from multiple sources — such as pathology, X-ray and ultrasound — to assist doctors with diagnosing and treating disease.
“Medicine is inherently multimodal,” noted Wang. “There are lots of complicated diseases, like diabetes, hypertension, cancer, Alzheimer’s or even Covid…and we will see signals all over the body.”
The ability to capture these signals using multiple imaging modalities requires overcoming a number of challenges. For example, pathology images are too large for existing AI models to analyze in sufficiently detailed resolution — 100,000 by 100,000 pixels, large enough to cover a tennis court. Typically, images encountered by AI models are closer to 256 by 256 pixels which, in keeping with Wang’s analogy, is akin to a single tennis ball.
“In the future the AI model could be like a clinical lab test every doctor can order.” Allen School professor Sheng Wang shares his vision for using generative AI in medical imaging.
To make pathology images more manageable, Wang and his collaborators looked to generative AI. Despite the stark difference in domains, “the challenge or the solution here is very similar to the underlying problem behind ChatGPT,” Wang explained. ChatGPT can understand and summarize long documents; by converting large pathology slide images to a “long sentence” of smaller images, Wang and his colleagues determined AI could then summarize these image-sentences to obtain an overview of a patient’s status. Based on that idea, Wang and his team developed GigaPath, the first foundation model for whole-slide pathology. GigaPath, which achieved state-of-the-art performance on 25 out of 26 tasks, is “one model fits all,” meaning it can be applied to different types of cancer. Since its release, the tool has averaged 200,000 downloads per month.
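The “long sentence” idea can be sketched in a few lines of Python; the tile size and encoder below are placeholders rather than GigaPath’s actual pipeline:

```python
# Sketch of treating a gigapixel pathology slide as a "long sentence" of tiles.
# The tile size and encoder are illustrative; GigaPath's pipeline differs in detail.
import numpy as np

TILE = 256  # each "word" is a small patch, roughly one tennis ball vs. the whole court

def slide_to_sentence(slide: np.ndarray, encode_tile) -> np.ndarray:
    """slide: (H, W, 3) whole-slide image; encode_tile: patch -> embedding vector.
    Returns a (num_tiles, dim) sequence that a slide-level model can summarize."""
    h, w, _ = slide.shape
    embeddings = []
    for y in range(0, h - TILE + 1, TILE):
        for x in range(0, w - TILE + 1, TILE):
            patch = slide[y:y + TILE, x:x + TILE]
            embeddings.append(encode_tile(patch))
    return np.stack(embeddings)  # the "sentence" the slide-level model reads

# A 100,000 x 100,000 slide yields roughly 152,000 tiles -- a very long "sentence".
```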
One task for which AI models typically do not perform well is predicting which treatment to recommend for a particular patient. So Wang and his colleagues borrowed another concept from generative AI, chain-of-thought, which calls for decomposing a complicated task into multiple smaller subtasks. The model is then asked to solve those smaller tasks individually on the way to addressing the bigger, more challenging task.
“The question is, how can we apply chain-of-thought to medicine?” Wang asked. “This has never been done before.” The answer is to use clinical guidelines as the chain to instruct a large language model (LLM). By breaking the chain into subtasks such as predicting cancer subtype and patient biomarkers, the LLM then arrives at a prediction of the appropriate treatment.
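In code, the guideline-as-chain idea looks roughly like the following sketch, where the subtasks, prompts and the ask_llm helper are hypothetical stand-ins rather than the team’s actual pipeline:

```python
# Sketch of using clinical guidelines as a chain of thought for an LLM.
# ask_llm is whatever LLM client is available; the subtasks and prompts are
# illustrative placeholders.

def recommend_treatment(patient_record: str, ask_llm) -> str:
    # Step 1: a small, well-defined subtask drawn from the guideline.
    subtype = ask_llm(f"Given this record, what is the cancer subtype?\n{patient_record}")
    # Step 2: another subtask the model can answer reliably in isolation.
    biomarkers = ask_llm(f"List the relevant biomarkers in this record.\n{patient_record}")
    # Step 3: only now ask the harder question, conditioned on the intermediate answers.
    return ask_llm(
        "Following the clinical guideline, recommend a treatment given "
        f"subtype={subtype} and biomarkers={biomarkers}.\nRecord:\n{patient_record}"
    )
```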
Yet another challenge is how to apply AI to 3D medical imaging. Here again, Wang and his colleagues achieved a milestone by developing the first 3D OCT foundation model. OCT is short for optical coherence tomography, a type of imaging used to diagnose retinal diseases.
“Our model can comprehensively understand the entire 3D structure to make a diagnosis,” said Wang, who aims to extend this approach to other types of medical 3D imaging, like MRI and CT scans — and eventually, to create one model that can handle everything. This is challenging even for general-domain machine learning; the state of the art, CLIP, is limited to two modalities, Wang noted, while he wants to build a medical model that can integrate as many as nine.
To overcome the problem, Wang and his fellow researchers drew inspiration from Esperanto, a constructed language that provides a common means of communication among people who speak different languages. Their approach, BiomedParse, builds one foundation model for each modality and then projects everything into the medical imaging equivalent of Esperanto: human language, in the form of text from the associated clinical reports. That shared space lets them bring together millions of images, both 2D and 3D, from the different modalities.
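Conceptually, the shared space works like cross-modal retrieval; the sketch below, with placeholder encoders, illustrates the idea rather than BiomedParse’s implementation:

```python
# Sketch of aligning per-modality image encoders to one shared text space.
# The encoder functions are placeholders, not BiomedParse's code.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def retrieve_report(image, modality, image_encoders, text_encoder, candidate_reports):
    """One encoder per modality (pathology, X-ray, OCT, ...); all project into the
    same text space, so any image can be compared against clinical-report text."""
    img_vec = image_encoders[modality](image)
    scored = [(cosine(img_vec, text_encoder(r)), r) for r in candidate_reports]
    return max(scored)[1]  # report text closest to the image in the shared space
```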
But Wang wants to go beyond multi-modal to multi-agent. Using the example of a molecular tumor board, in which multiple experts convene to discuss challenging cases to determine a course of treatment, he suggested that AI models developed for different imaging modalities could help doctors efficiently and accurately determine a treatment plan — analogous to a Microsoft 365 for cancer research. And while some doctors may worry about AI replacing them, Wang’s approach is focused on advancing human-AI collaboration: Medical experts still develop the high-level guidelines for the model, with the AI handling the individual steps.
“In the future the AI model could be like a clinical lab test every doctor can order,” Wang suggested. “The doctor can order an AI test to do a specific task, and then the doctor will make a decision based on the AI output.”
“It’s just really exciting to see all this great work”
The event culminated with the announcement of the recipients of the Madrona Prize, which is selected by local venture capital firm and longtime Allen School supporter Madrona Venture Group to recognize innovative research at the Allen School with commercial potential. Rounding out the evening was the presentation of the People’s Choice Award, which is given to the team with the favorite poster or demo as voted on by attendees during the event — or in this case, their top two.
Managing Director Tim Porter presented the Madrona Prize, which went to one winner and two runners-up. Noting that previous honorees have gone on to raise hundreds of millions of dollars and get acquired by the likes of Google and Nvidia, he said, “It’s just really exciting to see all this great work turning into things that have long-term impact on the world through commercial businesses and beyond.”
Award winners and presenters, left to right: Magdalena Balazinska, professor and director of the Allen School; Jon Turow, partner at Madrona Venture Group; Madrona Prize runner-up Vidya Srinivas; Chris Picardo, partner at Madrona Venture Group; Madrona Prize winner Ruotong Wang; Tim Porter, managing director at Madrona Venture Group; People’s Choice winner Chu Li; and professor Shwetak Patel
Madrona Prize winner / Designing AI systems to support team communication in remote work
Allen School Ph.D. student Ruotong Wang accepted Madrona’s top prize for a pair of projects that aim to transform workplace communication — Meeting Bridges and PaperPing.
The Covid-19 pandemic has led to a rise in remote meetings, as well as complaints of “Zoom fatigue” and “collaboration overload.” To help alleviate this negative impact on worker productivity, Wang proposed meeting bridges, or information artifacts that support post-meeting collaboration and help shift work to periods before and after meetings. Based on surveys and interviews with study participants, the team devised a set of design principles for creating effective meeting bridges, such as the incorporation of multiple data types and media formats and the ability to put information into a broader context.
Meanwhile, PaperPing supports researcher productivity in the context of group chats by suggesting papers relevant to their discussion based on social signals from past exchanges, including previous paper citations, comments and emojis. The system is an implementation of Social-RAG, an AI agent workflow based on the concept of retrieval-augmented generation that feeds the context of prior interactions among the group’s members and with the agent itself into a large language model (LLM) to explain its current recommendations.
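A condensed sketch of a Social-RAG-style flow might look like this, with hypothetical helpers standing in for the retrieval and LLM components:

```python
# Condensed sketch of a Social-RAG-style flow: retrieve the group's own signals
# (past paper shares, comments, emoji reactions), then ask an LLM to recommend
# and explain a paper in that context. Helper functions are placeholders.

def recommend_paper(chat_history, interaction_log, search_papers, ask_llm):
    signals = [e for e in interaction_log          # prior exchanges about papers
               if e["type"] in ("citation", "comment", "emoji")]
    candidates = search_papers(chat_history[-10:])  # papers relevant to the discussion
    prompt = (
        "Group's past signals about papers:\n"
        + "\n".join(f"- {s['type']}: {s['text']}" for s in signals[-20:])
        + "\n\nCurrent discussion:\n" + "\n".join(chat_history[-10:])
        + "\n\nCandidates:\n" + "\n".join(c["title"] for c in candidates)
        + "\n\nPick one paper and explain the recommendation using the signals above."
    )
    return ask_llm(prompt)
```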
Additional authors on Meeting Bridges include Allen School alum Lin Qui (B.S. ‘23) and professor Amy Zhang, as well as Maestro AI co-founder Justin Cranshaw. In addition to Zhang and Qui, Allen School postdoc Xinyi Zhou and Allen Institute for AI’s Joseph Chee Chang and Jonathan Bragg (Ph.D., ‘18) contributed to PaperPing.
Madrona Prize runner-up / Interpreting nanopore signals to enable single-molecule protein sequencing
For one of two runners-up, Madrona singled out a team of researchers in the Allen School’s Molecular Information Systems Laboratory (MISL) for developing a method for long-range, single-molecule protein sequencing using commercially available nanopore sensing devices from Oxford Nanopore Technologies. Determining protein sequences, or the order in which amino acids are arranged within a protein molecule, is key to understanding their role in different biological processes. This technology could help researchers develop medications targeting specific proteins for the treatment of cancer and neurological diseases such as Alzheimer’s.
Madrona Prize runner-up / Knowledge boosting during low-latency inference
Another team of researchers earned accolades for their work on knowledge boosting, a technique for bridging potential communication delays between small AI models running locally on edge devices and larger, remote models to support low-latency applications. This approach can be used to improve the performance of a small model operating on headphones, for example, with the help of a larger model running on a smartphone or in the cloud. Potential uses for the technology include noise cancellation features, augmented reality and virtual reality headsets, and other mobile devices that run AI software locally.
Lead author Vidya Srinivas accepted the award on behalf of the team, which includes fellow Allen School Ph.D. student Tuochao Chen and professor Shyam Gollakota; Malek Itani, a Ph.D. student in the UW Department of Electrical & Computer Engineering; Microsoft Principal Researcher Sefik Emre Eskimez; and Director of Research at AssemblyAI Takuya Yoshioka.
People’s Choice Award (tie) / AHA: A vision-language-model for detecting and reasoning over failures in robotic manipulation
An “AHA” moment: Ph.D. student Jiafei Duan (right) explains his vision-language-model for robotics
Attendees could not decide on a single favorite presentation of the night, leading to a tie for the People’s Choice Award.
While advances in LLMs and vision-language models may have expanded robots’ problem solving, object recognition and spatial reasoning capabilities, they’re lacking when it comes to recognizing and reasoning about failures — which hinders their deployment in dynamic, real-world settings. The research team behind People’s Choice honoree AHA: A Vision-Language-Model for Detecting and Reasoning over Failures in Robotic Manipulation designed an open-source VLM that identifies failures and provides detailed natural-language explanations for those failures.
“Our work focuses on the reasoning aspect of robotics, often overlooked but essential, especially with the rise of multimodal large language models for robotics,” explained lead author and Allen School Ph.D. student Jiafei Duan. “We explore how robotics could benefit from these models, particularly by providing these models with the capabilities to reason about failures in the robotics execution and hence helping to improve the downstream robotic systems.”
Using a scalable simulation framework for demonstrating failures, the team developed AHA to effectively generalize to a variety of robotic systems, tasks and environments. Duan’s co-authors include Allen School Ph.D. student Yi Ru Wang, alum Wentao Yuan (Ph.D. ‘24) and professors Ranjay Krishna and Dieter Fox; Wilbert Pumacay, a Master’s student at the Universidad Católica San Pablo; Nishanth Kumar, Ph.D. student at the Massachusetts Institute of Technology; Shulin Tian, an undergraduate researcher at Nanyang Technological University; and research scientists Ajay Mandlekar and Yijie Guo of Nvidia.
People’s Choice Award (tie) / AltGeoViz: Facilitating accessible geovisualization
The other People’s Choice Award winner was AltGeoViz, a system that enables screen-reader users to explore geovisualizations by automatically generating alt-text descriptions based on the user’s current map view. While conventional alt-text is static, AltGeoViz dynamically communicates visual information such as viewport boundaries, zoom levels, spatial patterns and other statistics to the user in real time as they navigate the map — inviting them to interact with and learn from the data in ways they previously could not.
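The core loop can be sketched as follows: each time the viewport changes, summarize what is now in view and hand the description to the screen reader. The helpers below are placeholders, not the AltGeoViz codebase:

```python
# Sketch of dynamic alt-text for a geovisualization: regenerate a short description
# whenever the viewport changes, so a screen reader announces what is now in view.
# summarize_pattern() and announce() stand in for the real system's logic.

def on_viewport_change(viewport, data_layer, summarize_pattern, announce):
    stats = data_layer.stats_within(viewport.bounds)   # e.g. min/max of the mapped variable
    alt_text = (
        f"Zoom level {viewport.zoom}, showing {viewport.bounds.describe()}. "
        f"{summarize_pattern(data_layer, viewport.bounds)} "
        f"Values range from {stats['min']} to {stats['max']}."
    )
    announce(alt_text)   # delivered to the screen reader as a live update
```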
“Coming from an urban planning background, my motivation for pursuing a Ph.D. in human-computer interaction originates from my passion for helping people design better cities,” lead author and Allen School Ph.D. student Chu Li said. “AltGeoViz represents a step towards this goal — by making spatial data visualization accessible to blind and low-vision users, we can enable broader participation in the urban planning process and shape more inclusive environments.”
Pets can do more than just provide us with companionship and cuddles. Our love for our pets can improve science education and lead to innovative ways to use augmented reality (AR) to see the world through a canine or feline friend’s eyes.
Ben Shapiro
In a paper titled “Reconfiguring science education through caring human inquiry and design with pets,” a team of researchers led by Allen School professor Ben Shapiro introduced AR tools to help teenage study participants in a virtual summer camp design investigations to understand their pets’ sensory experiences of the world around them and find ways to improve their quality of life. While science and science education typically emphasize a separation between scientists and the phenomena they study, the teens’ experience organizes learning around the framework of naturecultures, which emphasizes peoples’ relationships with non-human subjects in a shared world and encourages practices of perspective-taking and care. The team’s research shows how these relational practices can instead enhance science and engineering education.
The paper won the 2023 Outstanding Paper of the Year Award from the Journal of the Learning Sciences – the top journal in Shapiro’s field.
“The jumping off point for the project was wondering if those feelings of love and care for your pets could anchor and motivate people to learn more about science. We wondered if learning science in that way could help people to reimagine what science is, or what it should be,” said Shapiro, the co-director of the University of Washington’s Center for Learning, Computing and Imagination. “Then, we wanted to build wearables that let people put on those animal senses and use that as a way into developing greater empathy with their pets and better understanding of how animals experience the shared environment.”
Science begins at home
When the Covid-19 pandemic pushed everything online in 2020, it was a “surprising positive” for the team’s plan to host a pet-themed science summer camp, Shapiro said. Now, teens could study how their pets’ experiences were shaped by their home environment and how well those environments satisfied pets’ preferences, and researchers could support their learning together with their pets in their homes. Shapiro and the team developed “DoggyVision” and “KittyVision” filters that used red-green colorblindness, diminished visual acuity and reduced brightness to approximate how dogs and cats see. The study participants then designed structured experiments, guided by the AR filter tools, to answer questions such as “what is my pet’s favorite color?”
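A rough approximation of such a filter can be written in a few lines of image processing; this is an illustrative stand-in for the filters used in the study:

```python
# Rough approximation of a "DoggyVision"-style filter: simulate red-green color
# blindness, reduced acuity (blur) and reduced brightness. Illustrative only.
import numpy as np
from scipy.ndimage import gaussian_filter

def doggy_vision(rgb: np.ndarray) -> np.ndarray:
    """rgb: (H, W, 3) float array with values in [0, 1]."""
    out = rgb.copy()
    red_green = out[..., 0:2].mean(axis=-1, keepdims=True)  # collapse red and green
    out[..., 0:2] = red_green                                # crude dichromacy simulation
    out = gaussian_filter(out, sigma=(2, 2, 0))              # diminished visual acuity
    return np.clip(out * 0.7, 0.0, 1.0)                      # reduced brightness
```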
“We wanted to organize student inquiry around the idea of their pets as whole beings with personalities and preferences and whose experiences are shaped by the places they are at. Those places are designed environments, and we wanted youth to think about how those designs serve both humans and non-humans,” Shapiro said. “We drew on prior work in animal-computer interaction to help students develop personality profiles of their pets called ‘pet-sonas.’”
For example, study participant Violet enjoyed buying colorful toys for her dog Billie. However, using the AR filter, she found out that Billie could not distinguish between many colors. To see if Billie had a color preference, Violet designed a simple investigation where she placed treats on top of different colored sheets of paper and observed which one Billie chose. Violet learned from using the “DoggyVision” filter that shades of blue appeared bright in contrast to the treats — Billie chose treats off of blue sheets of paper in all three tests. She used the results of her experiments to further her investigations into what kinds of toys Billie would like.
In Figure 1.1a, Billie chooses a treat off of a blue piece of paper. Figure 1.2 shows how the treats on colored pieces of paper look through Billie’s eyes using the DoggyVision filter.
“The students were doing legitimate scientific inquiry — but they did so through closeness and care, rather than in a distant and dispassionate way about something they may not care about. They’re doing it together with creatures that are part of their lives, that they have a lot of curiosity about and that they have love for,” Shapiro said. “You don’t do worse science because you root it in passion, love, care and closeness, even if today’s prevailing scientific norms emphasize distance and objectivity.”
Next, Shapiro is looking to explore other ways that pet owners can better understand their dogs. This includes working with a team of undergraduates in the Allen School and UW Department of Human Centered Design & Engineering to design wearables for dogs that give pet owners information about their pet’s anxiety and emotions so they can plan better outings with them.
Priyanka Parekh, a researcher in the Northern Arizona University STEM education and Learning Sciences program, is lead author of the paper. It was also co-authored by University of Colorado Learning Sciences and Human Development professor Joseph Polman and Google researcher Shaun Kane.
Read the full paper in the Journal of the Learning Sciences.
Would you call your favorite fizzy drink a soda or a pop? Just because you speak the same language, does not mean you speak the same dialect based on variations in vocabulary, pronunciation and grammar. And whatever the language, most models used in artificial intelligence research are far from an open book, making them difficult to study.
At the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) in August, Allen School researchers took home multiple awards for their work to address these challenges. Their research ranged from introducing more dialects into language technology benchmarks to evaluating the reliability and fairness of language models and increasing the transparency and replicability of large language model training as well as evaluations across languages.
Best Social Impact Paper: DialectBench
The benchmarks used in natural language processing (NLP) research and evaluation are often limited to standard language varieties, making them less useful in real-world cases. To address this gap, Allen School researchers introduced DialectBench, the first large-scale NLP benchmark for language varieties that covers 40 different language clusters with 281 varieties across 10 NLP tasks.
While DialectBench can give researchers a comprehensive overview of the current state of NLP, it also has the potential to bring more language varieties into NLP models in the future.
Yulia Tsvetkov
“Language variation like African American or Indian English dialects in NLP is often treated as noise, however in the real world, language variation often reflects regional, social and cultural differences,” said senior author and Allen School professor Yulia Tsvetkov. “We developed a robust framework to evaluate the quality of multilingual models on a wide range of language varieties. We found huge performance disparities between standard languages and their respective varieties, highlighting directions for future NLP research.”
Benchmarking helps researchers track the progress the NLP field has made across various tasks by comparing it to other standard points of reference. However, it is difficult to test the robustness of multilingual models without an established NLP evaluation framework that covers many language clusters, or groups of standard languages alongside their closely related varieties. For DialectBench, the researchers constructed several clusters, such as the Hindustani cluster, which encompasses Fiji Hindi and Hindi. Then, they selected tasks that test the models’ linguistic and demographic utility.
The researchers used DialectBench to report the disparities across standard and non-standard language varieties. For example, they found that the highest-performing varieties were mostly standard high-resource languages, such as English, and a few high-resource dialects, including Norwegian dialects. On the other hand, the majority of the lowest-performing variants were low-resource language varieties.
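The disparity analysis boils down to comparing each variety’s score with the standard variety in its cluster, as in this sketch with illustrative numbers rather than actual benchmark results:

```python
# Sketch of the kind of disparity reporting DialectBench enables: compare each
# variety's score against the standard variety in its cluster. The layout and
# numbers are illustrative, not actual benchmark results.

scores = {  # cluster -> {variety: task score}
    "English":    {"English (standard)": 0.86, "Indian English": 0.71},
    "Hindustani": {"Hindi (standard)": 0.81, "Fiji Hindi": 0.52},
}

for cluster, varieties in scores.items():
    standard = next(v for v in varieties if "(standard)" in v)
    for variety, score in varieties.items():
        if variety == standard:
            continue
        gap = varieties[standard] - score
        print(f"{cluster}: {variety} trails {standard} by {gap:.2f}")
```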
Best Theme Paper: OLMo
As language models have become more common in commercial products, important details about these models’ training data, architectures and development have become hidden behind proprietary interfaces. Without these details, it is difficult to scientifically study the models’ strengths, weaknesses and their potential biases and risks.
Noah Smith
The researchers built a competitive, truly open language model, OLMo, to help fill this knowledge gap and inspire other scientists’ innovations. Alongside OLMo, the team also released its entire framework, from the open training data to evaluation tools. The researchers earned Best Theme Paper at ACL for their work titled “OLMo: Accelerating the Science of Language Models.”
“Language models are a decades-old idea that have recently become the backbone of modern AI. Today the most famous models are built as commercial products by huge tech firms, and many details of their design are closely guarded secrets,” said Noah Smith, the Amazon Professor of Machine Learning in the Allen School. “We launched the OLMo effort as a collaboration between the Allen Institute for AI and the Allen School to create a fully open alternative that scientists could study, because it’s important that we fully understand these artifacts.”
While this paper presents the team’s first release of OLMo, they intend to continue to support and extend the model and its framework, bringing in different model sizes, modalities, datasets and more. Since OLMo’s original release, the researchers have already improved the data and training; for example, the Massive Multitask Language Understanding scores, which measure knowledge acquired during pretraining, went up by 24 points to 52%.
Best Resource Paper: Dolma
OLMo’s effort to advance research into language models would not be complete without its counterpart, Dolma, an English corpus of three trillion tokens drawn from sources ranging from web content to scientific papers and public-domain books.
While there has been progress toward making model parameters more accessible, pretraining datasets, which are fundamental to developing capable language models, are not as open and available. The researchers built and released OLMo’s pretraining dataset, described in “Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research,” to help facilitate open research into language models — and earned Best Resource Paper at ACL in the process.
“Even among open models, there are differences in what researchers can work with. With OLMo, we wanted a competitive, strong model whose data was also fully available for inspection,” said Smith. “Dolma is the dataset used to pretrain OLMo. It is extensively documented, and the paper includes analyses and discussion of lessons learned through data curation. We also released open-source data curation tools to enable reproduction and improvement of our work.”
Like with OLMo, this is just the beginning for Dolma. The researchers continue to make advancements as part of follow-on releases that, for example, yield significant performance improvements on downstream tasks.
Additional authors on the Dolma paper include Zettlemoyer, Ravichander, Jha, Elazar, Magnusson, Morrison, Soldaini, Kinney, Bhagia, Schwenk, Atkinson, Authur, Chandu, Dumas, Lambert, Muennighoff, Naik, Nam, Peters, Richardson, Strubell, Subramani, Tafjord, Walsh, Beltagy, Groeneveld and Dodge along with Russell Authur, Ben Bogin, Valentin Hofmann and Xinxi Lyu of AI2; University of California, Berkeley Ph.D. student Li Lucy; Carnegie Mellon University Ph.D. student Aakanksha Naik; and MIT Ph.D. student Zejiang Shen.
Trying to work or record interviews in busy and loud cafes may soon be easier thanks to new artificial intelligence models.
A team of University of Washington, Microsoft and AssemblyAI researchers led by Allen School professor Shyam Gollakota, who heads the Mobile Intelligence Lab, built two AI-powered models that can help reduce the noise. By analyzing turn-taking dynamics while people are talking, the team developed the target conversation extraction approach that can single out the main speakers from background audio in a recording. Similar kinds of technology may be difficult to run in real time on smaller devices like headphones, but the researchers also introduced knowledge boosting, a technique whereby a larger model remotely helps with inference for a smaller on-device model.
Shyam Gollakota
The team presented its papers describing both innovations at the Interspeech 2024 Conference in Kos Island, Greece, earlier this month.
One of the problems Gollakota and his colleagues sought to solve was how the AI model would know who the main speakers are in an audio recording with lots of background chatter. The researchers trained the neural network using conversation datasets in both English and Mandarin to recognize “the unique characteristics of people talking over each other in conversation,” Gollakota said. Across both language datasets, the researchers found the turn-taking dynamic held up with up to four speakers in conversation.
“If there are other people in the recording who are having a parallel conversation amongst themselves, they don’t follow this temporal pattern,” said lead author and Allen School Ph.D. student Tuochao Chen. “What that means is that there is way more overlap between them and my voice, and I can use that information to create an AI which can extract out who is involved in the conversation with me and remove everyone else.”
While the AI model leverages the turn-taking dynamic, it still preserves any backchannels happening within the conversation. These backchannels are small overlaps that happen when people are talking and showing each other that they are listening, such as laughter or saying “yeah.” Without these backchannels, the recording would not be an authentic representation of the conversation and would lose some of the vocal cues between speakers, Gollakota explained.
“These cues are extremely important in conversations to understand how the other person is actually reacting,” Gollakota said. “Let’s say I’m having a phone call with you. These backchannel cues where we overlap each other with ‘mhm’ create the cadence of our conversation that we want to preserve.”
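The turn-taking cue can be made concrete with a toy overlap statistic; the real system learns this pattern with a neural network, so the code below captures only the intuition:

```python
# Toy version of the turn-taking cue: people talking *with* me mostly alternate
# with my speech, while a parallel conversation overlaps it freely.
import numpy as np

def overlap_ratio(my_activity: np.ndarray, other_activity: np.ndarray) -> float:
    """Boolean voice-activity arrays sampled on the same time grid."""
    both = np.logical_and(my_activity, other_activity).sum()
    return float(both) / max(int(other_activity.sum()), 1)

def conversation_partners(my_activity, others, threshold=0.2):
    # Keep speakers whose speech rarely overlaps mine (turn-taking, plus brief
    # backchannels); drop those who talk over me as if in a separate conversation.
    return [name for name, activity in others.items()
            if overlap_ratio(my_activity, activity) < threshold]
```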
The AI model can work on any device that has a microphone and can record audio, including laptops and smartphones, without needing any additional hardware, Gollakota noted.
Additional co-authors on the target conversation extraction paper include Malek Itani, a Ph.D. student in the UW Department of Electrical & Computer Engineering, Allen School undergraduate researchers Qirui Wang and Bohan Wu (B.S., ‘24), Microsoft Principal Researcher Sefik Emre Eskimez and Director of Research at AssemblyAI Takuya Yoshioka.
Turning up the power: Knowledge boosting
Target conversation extraction and other AI-enabled software that works in real time can be difficult to run on smaller devices like headphones due to size and power constraints. To get around this, Gollakota and his team introduced knowledge boosting, which can increase the performance of the small model operating on headphones, for example, with the help of a remote model running on a smartphone or in the cloud. Knowledge boosting can potentially be applied to noise cancellation features, augmented reality and virtual reality headsets, or other mobile devices that run AI software locally.
However, because the small model has to feed information to the larger remote model, there is a slight delay in the noise cancellation.
“Imagine that while I’m talking, there is a teacher remotely telling me how to improve my performance through delayed feedback or hints,” said lead author and Allen School Ph.D. student Vidya Srinivas. “This is how knowledge boosting can improve small models’ performance despite large models not having the latest information.”
To work around the delay, the larger model attempts to predict what is going to happen milliseconds into the future so it can react to it. The larger model is “always looking at things which are 40–50 milliseconds in the past,” Gollakota said.
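In pseudocode form, the arrangement looks roughly like the sketch below, where the model interfaces are placeholders for the actual neural networks:

```python
# Sketch of knowledge boosting: the small on-device model runs every frame with
# low latency; the large remote model sends back "hints" computed on audio that
# is ~40-50 ms old, and the small model conditions on the latest hint available.
from collections import deque

HINT_DELAY_FRAMES = 5   # ~40-50 ms of round-trip delay at ~10 ms per frame

def run_stream(frames, small_model, large_model):
    pending = deque()           # hints in flight from the remote model
    latest_hint = None
    outputs = []
    for t, frame in enumerate(frames):
        pending.append((t + HINT_DELAY_FRAMES, large_model.hint(frame)))
        while pending and pending[0][0] <= t:       # hint has "arrived" by now
            _, latest_hint = pending.popleft()
        outputs.append(small_model.process(frame, hint=latest_hint))
    return outputs
```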
The larger model’s prediction capabilities open up the door for further research into AI systems that can anticipate and autocomplete what and how someone is speaking, Gollakota noted.
In addition to Gollakota and Srinivas, co-authors on the knowledge boosting paper include Itani, Chen, Eskimez and Yoshioka.
This is the latest work from Gollakota and his colleagues to advance new AI-enabled audio capabilities, including headphones that allow the wearer to focus on a specific voice in a crowd just by looking at them and a system for selecting which sounds to hear and which ones to cancel out.
Determining protein sequences, or the order in which amino acids are arranged within a protein molecule, is key to understanding their role in different biological processes and diseases. However, current methods for protein sequencing, including mass spectrometry, are limited and may not be sensitive enough to capture all the varying combinations of molecules in their entirety.
Jeff Nivala
In a recent paper published in the journal Nature, a team of University of Washington researchers introduced a new approach to long-range, single-molecule protein sequencing using commercially available devices from Oxford Nanopore Technologies (ONT). The team, led by senior author and Allen School research professor Jeff Nivala, demonstrated how to read each protein molecule by pulling it through a nanopore sensor. Nanopore technology uses ionic currents that flow through small nanometer-sized pores within a membrane, enabling the detection of molecules that pass through the pore. This can be done multiple times for the same molecule, increasing the sequencing accuracy.
The approach has the potential to help researchers gain a clearer picture of what exists at the protein level within living organisms.
“This research is a foundational advance towards the holy grail of being able to determine the sequence of individual full-length proteins,” said Nivala, co-director of the Molecular Information Systems Lab (MISL).
The technique uses a two-step approach. First, an electrophoretic force pushes the target proteins through a CsgG protein nanopore. Then, a molecular motor called a ClpX unfoldase pulls and controls the translocation of the protein back through the nanopore sensor. Giving each protein multiple passes through the sensor helps eliminate the “noise associated with a single read,” Nivala explained. The team is then able to take the average of all the passes to get a more accurate sequencing readout as well as a detailed detection of any amino acid substitutions and post-translational modifications across the long protein strand.
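The benefit of rereading can be illustrated with a simple averaging sketch, assuming the per-pass signals have already been aligned to a common axis:

```python
# Sketch of why multiple passes help: averaging aligned current traces from the
# same molecule suppresses per-read noise. Alignment itself (warping each pass
# onto a common axis) is the hard part and is glossed over here.
import numpy as np

def consensus_trace(aligned_passes: list[np.ndarray]) -> np.ndarray:
    """aligned_passes: ionic-current traces from repeated reads of one molecule,
    already aligned to the same length."""
    stacked = np.stack(aligned_passes)
    return stacked.mean(axis=0)   # noise averages out; real signal features remain

# With N independent passes, random noise shrinks roughly by a factor of sqrt(N).
```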
This method differs from mass spectrometry, which does not look at each individual molecule, but takes the average of an ensemble of different proteins to characterize the sample — potentially losing out on information as each protein can have multiple variations within a cell, Nivala noted.
“One major advantage of nanopore technology is its ability to read individual molecules. However, analyzing these signals at the single-molecule level is challenging because of the variability in the signals, which persist to some extent even after applying normalization and alignment algorithms,” said co-lead author Daphne Kontogiorgos-Heintz, an Allen School Ph.D. student who works with Nivala in the MISL. “This is why I am so excited that we found a method to reread the same molecule multiple times.”
With a more detailed understanding of the protein sequences, this technology can help researchers develop medications that can target specific proteins, tackling cancer and neurological diseases like Alzheimer’s, Nivala explained.
“This will shed light into new diagnostics by having the ability to determine new biomarkers that might be associated with disease that currently we’re not able to read,” Nivala said. “It will also develop more opportunities to find new therapeutic targets, because we can find out which proteins could be manifesting the disease and be able to now target those specific variants.”
While the technology can help analyze natural biological proteins, it can also help read synthetic protein molecules. For example, synthetic protein molecules could be designed as data storage devices to record the molecular history of the cell, which would not be possible without the detailed readings from nanopore sensing, Nivala explained. The next step for this research would be working toward increasing the accuracy and resolution to achieve de novo sequencing of single molecule proteins using nanopores, which does not require a reference database.
Nivala and the team were able to conduct this research by adapting ONT technology for nanopore protein sequencing.
“This study highlights the remarkable versatility of the Oxford Nanopore sensing platform,” said Lakmal Jayasinghe, the company’s SVP of R&D Biologics. “Beyond its established use in sequencing DNA and RNA, the platform can now be adapted for novel applications such as protein sequencing. With its distinctive features including portability, affordability and real-time data analysis, researchers can delve into proteomics at an unprecedented level by performing sequencing of entire proteins using the nanopore platform. Exciting developments lie ahead for the field of proteomics with this groundbreaking advancement.”