Each year, the Qualcomm Innovation Fellowship recognizes exceptional Ph.D. students whose research proposals promote the company’s core values of innovation, execution and teamwork. With the goal of enabling “students to pursue their futuristic innovative ideas,” the winning teams receive a one-year fellowship as well as mentorship from Qualcomm engineers to help their projects succeed.
Two of this year’s winning teams from North America feature Allen School students. Andrew Alex and Megan Frisella received support for their proposed project “Productive Programming of Multi-Threaded Hardware Accelerators.” Fellow Allen School Ph.D. student Zixian Ma and Yushi Hu, a Ph.D. student in the University of Washington’s Department of Electrical & Computer Engineering (ECE), also earned a fellowship for their project “Learning Multi-modal Agents to Reason and Act for Complex Multi-modal Tasks.”
Productive Programming of Multi-Threaded Hardware Accelerators
The hardware landscape of modern systems-on-chips (SoCs), which combine a variety of digital signal processors (DSPs), graphics processing units (GPUs) and general-purpose cores of differing sizes, is constantly evolving. Because each new generation of hardware requires an optimized kernel library to ensure that machine learning and signal processing workloads run smoothly and efficiently, it can be difficult for performance engineers to keep up.
Performance engineers are turning to user-scheduled languages (USLs), an emerging class of programming languages designed for heterogeneous hardware. These languages separate the algorithm, which specifies the program’s functional behavior, from the schedule, which defines how that computation is carried out. Alex and Frisella aim to build a language system that extends Exo, a popular USL that can optimize high-performance computing kernels for new hardware accelerators but lacks support for asynchronous parallelism or concurrency. Their proposed language system can schedule programs to exploit asynchronous and concurrent SoC targets while ensuring that the program’s behavior is preserved.
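The algorithm-versus-schedule split can be illustrated with a small sketch. The example below is plain Python rather than Exo’s actual syntax, and the function names and tiling transformation are hypothetical; it only shows how the same computation can be rewritten under a different schedule while its observable behavior stays the same.

```python
# Illustrative sketch of the algorithm/schedule split in a user-scheduled
# language. This is plain Python for exposition only, not Exo's API.

import numpy as np

def scaled_add(out, a, b, alpha):
    """Algorithm: defines *what* is computed (out[i] = alpha * a[i] + b[i])."""
    for i in range(len(out)):
        out[i] = alpha * a[i] + b[i]

def scaled_add_tiled(out, a, b, alpha, tile=4):
    """Schedule: an equivalence-preserving rewrite of the same algorithm that
    processes the loop in fixed-size tiles, the kind of transformation a
    performance engineer might apply to target a specific accelerator."""
    n = len(out)
    for start in range(0, n, tile):
        end = min(start + tile, n)
        for i in range(start, end):  # inner tile loop
            out[i] = alpha * a[i] + b[i]

a = np.arange(8.0)
b = np.ones(8)
out1, out2 = np.empty(8), np.empty(8)
scaled_add(out1, a, b, 2.0)
scaled_add_tiled(out2, a, b, 2.0)
assert np.allclose(out1, out2)  # both schedules compute the same result
```

In a user-scheduled language, rewrites like this are expressed as explicit scheduling directives rather than hand-edited loops, which is what allows the system to check that the optimized version still matches the original specification.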
“The tools for producing the highly-performant code that is important for fields like machine learning and signal processing have not kept pace with the ever-expanding capabilities of the hardware that runs the code,” said Alex, who is advised by Allen School professor Gilbert Bernstein. “Our project aims to remedy this gap by enabling a programmer to express this highly-performant, concurrent code as a sequence of equivalence-preserving transformations of a sequential program.”
With such a system, performance engineers are no longer limited to the transformations an optimizing compiler chooses based on a cost model that may not reflect the newest features of the hardware it targets. Instead, engineers can apply their own domain- and hardware-specific knowledge to the problem without existing tools such as compilers getting in the way, helping them write code faster and with less effort.
“Extending the power of user-scheduling to asynchronous and concurrent SoCs will unlock productivity in programming emerging hardware,” said Frisella, who is co-advised by Bernstein and faculty colleague Stephanie Wang.
Learning Multi-modal Agents to Reason and Act for Complex Multi-modal Tasks
Real-world multi-modal foundation models can help with tasks ranging from answering simple visual questions about objects in daily life to solving more difficult problems such as travel planning. Although these state-of-the-art models answer generic, straightforward questions well, they struggle with complex questions and with generalizing to new tasks. For example, a user may take a picture of a panel showing different gas prices and ask the model how many gallons they can buy within a certain budget, but the model will have trouble answering.
To address these challenges, Ma and Hu propose to develop multi-modal agents that can explicitly reason about and act on these complex tasks using chains-of-thought-and-action. They aim to curate a new dataset with images from across various domains — such as daily life images, web screenshots and medical images — and pair it with a novel learning method.
“Our work aims to enhance open-source foundation multi-modal models’ capabilities to not only perform complex tasks through reasoning and actions but also do so in a more interpretable manner,” said Ma, who is co-advised by Allen School professor Ranjay Krishna and professor emeritus Daniel Weld.
With the large-scale dataset, Ma and Hu plan to train generalizable multi-modal agents using heterogeneous pretraining followed by domain-specific supervised finetuning and reinforcement learning. The researchers will build an architecture similar to that of heterogeneous pretrained transformers, which combine large amounts of data from multiple sources into one system, organized into stems, a trunk and heads, to teach a robot an array of tasks.
In their proposed system, each stem is a domain-specific vision encoder that maps images from its domain to visual features, the numerical representations of an image’s visual content. The shared trunk is a transformer encoder block that connects these domain-specific visual features to shared representations in the same dimension as the text embeddings. The shared head, a decoder-only language model, takes both the visual tokens from the shared encoder and the text tokens of the input query, and generates the next text tokens following the inputs.
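A rough sketch of that stem / trunk / head layout, written in PyTorch, is shown below. The module sizes, the simple convolutional stems and the use of a small transformer as a stand-in for the decoder-only language model are illustrative assumptions, not the researchers’ actual design.

```python
# Hypothetical sketch of the stem / trunk / head architecture described above.
# All names, sizes and encoder choices are assumptions for illustration only.

import torch
import torch.nn as nn

class MultiModalAgentSketch(nn.Module):
    def __init__(self, domains=("daily_life", "web", "medical"),
                 feat_dim=256, text_dim=512, vocab_size=32000):
        super().__init__()
        # Stems: one domain-specific vision encoder per image domain,
        # mapping raw pixels to visual features.
        self.stems = nn.ModuleDict({
            d: nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1),
                             nn.ReLU(),
                             nn.AdaptiveAvgPool2d(8),
                             nn.Flatten(),
                             nn.Linear(32 * 8 * 8, feat_dim))
            for d in domains
        })
        # Trunk: a shared transformer encoder that projects domain-specific
        # features into the same dimension as the text embeddings.
        self.project = nn.Linear(feat_dim, text_dim)
        trunk_layer = nn.TransformerEncoderLayer(d_model=text_dim, nhead=8,
                                                 batch_first=True)
        self.trunk = nn.TransformerEncoder(trunk_layer, num_layers=2)
        # Head: a small transformer standing in for a decoder-only language
        # model that consumes the shared visual tokens plus the query's text
        # tokens and predicts the next text tokens.
        self.text_embed = nn.Embedding(vocab_size, text_dim)
        head_layer = nn.TransformerEncoderLayer(d_model=text_dim, nhead=8,
                                                batch_first=True)
        self.head = nn.TransformerEncoder(head_layer, num_layers=2)
        self.lm_out = nn.Linear(text_dim, vocab_size)

    def forward(self, image, domain, text_tokens):
        visual = self.stems[domain](image)            # (B, feat_dim)
        visual = self.project(visual).unsqueeze(1)    # (B, 1, text_dim)
        visual = self.trunk(visual)                   # shared visual tokens
        text = self.text_embed(text_tokens)           # (B, T, text_dim)
        fused = torch.cat([visual, text], dim=1)      # visual + text tokens
        return self.lm_out(self.head(fused))          # next-token logits

model = MultiModalAgentSketch()
img = torch.randn(1, 3, 64, 64)
query = torch.randint(0, 32000, (1, 6))
logits = model(img, "web", query)  # shape (1, 7, vocab_size)
```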
“This research focuses on developing artificial intelligence that can seamlessly understand, reason and generate across vision, language, audio — mirroring the way people interact with the world. By unifying these diverse streams, we aim to move beyond passive chatbots toward truly helpful agents that collaborate with humans and complete real-world tasks,” said Hu, who is co-advised by Allen School professor Noah A. Smith and ECE professor Mari Ostendorf.
The Allen School-affiliated teams are among three UW winners of the Qualcomm Innovation Fellowship this year. They are joined by ECE Ph.D. students Marziyeh Rezaei and Pengyu Zeng, who earned a fellowship to pursue their research proposal titled “Ultra-low Power Coherent Front-haul Optical Links to enable multi-Tb/s Capacity for 6G Massive MIMOs and Edge AI Datacenters.”
Read more about this year’s Qualcomm Innovation Fellowship North America recipients.