Last month, the Computing Research Association (CRA) recognized a select group of undergraduate students from across North America who have made notable contributions to the field through research. This year’s cohort in the CRA Outstanding Undergraduate Researcher Awards included four Allen School undergraduates — awardee Haoquan Fang, finalist Hao Xu and honorable mention recipients Kaiyuan Liu and Lindsey Wei.
“At the Allen School, we have developed a sequence of seminars that introduce undergraduate students to research and give them the opportunity to work on a hands-on research project with a faculty or graduate student mentor,” said Allen School professor Leilani Battle, who co-chairs the undergraduate research committee alongside colleague Maya Cakmak.
“Through these research opportunities, students see a side of computer science that they may not encounter in their classes or internships, such as learning how to identify new research problems or how to draw connections between different areas of computer science,” Battle continued.
From developing policies that help robots learn, to introducing methods to better understand the training data behind large language models (LLMs), these nationally recognized students have demonstrated how to take what they learned in the classroom and make a real-world difference.
Haoquan Fang: Enhancing the reasoning capabilities of robots
Award winner Haoquan Fang aims to tackle one of the major challenges in today’s embodied AI models: how to equip robots with robust, generalizable and interpretable reasoning abilities. In his work, Fang bridges perception and action, paving the way for robots that can understand the world and act with purpose.
“I am broadly interested in robot learning. In particular, I focus on developing generalist robotic manipulation policies that leverage strong priors, by optimizing both the data curation and model architectures,” said Fang.
Fang proposed a new model that integrates memory into robotic manipulation. Alongside Allen School professor Ranjay Krishna, Fang spearheaded the development of SAM2Act, a multi-view robotic transformer-based policy that leverages the visual foundation model Segment Anything Model 2 (SAM2) to achieve state-of-the-art performance on existing robotic manipulation benchmarks. He then built on that architecture to introduce SAM2Act+, which extends SAM2Act with memory-based components. The resulting policy enables the agent to predict actions based on past spatial information, improving performance on sequential decision-making tasks. Fang and his collaborators published this work at the 42nd International Conference on Machine Learning (ICML 2025) and received the Best Paper Award at the RemembeRL Workshop at the 9th Annual Conference on Robot Learning (CoRL 2025). Last year, his senior thesis on SAM2Act earned a Best Senior Thesis Honorable Mention from the Allen School.
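The published SAM2Act+ architecture is considerably more involved, but the memory idea itself is simple to sketch. Below is a minimal, hypothetical illustration in PyTorch (not the actual SAM2Act+ code; all module names and dimensions are invented) of a policy head that attends over a buffer of past observation embeddings before predicting its next action:

```python
# Hypothetical sketch of a memory-conditioned policy head, in the spirit of
# SAM2Act+: the agent keeps embeddings of past observations and attends over
# them when predicting the next action. Not the published architecture.
import torch
import torch.nn as nn

class MemoryPolicy(nn.Module):
    def __init__(self, obs_dim: int = 256, action_dim: int = 7, memory_size: int = 16):
        super().__init__()
        self.memory_size = memory_size
        self.memory: list[torch.Tensor] = []           # past observation embeddings
        self.attn = nn.MultiheadAttention(obs_dim, num_heads=4, batch_first=True)
        self.action_head = nn.Linear(obs_dim, action_dim)

    def forward(self, obs_embedding: torch.Tensor) -> torch.Tensor:
        # obs_embedding: (batch, obs_dim) embedding of the current observation.
        query = obs_embedding.unsqueeze(1)             # (batch, 1, obs_dim)
        if self.memory:
            past = torch.stack(self.memory, dim=1)     # (batch, t, obs_dim)
            context, _ = self.attn(query, past, past)  # attend over past spatial info
            query = query + context                    # fuse memory into current state
        self.memory.append(obs_embedding.detach())
        self.memory = self.memory[-self.memory_size:]  # keep a bounded buffer
        return self.action_head(query.squeeze(1))      # (batch, action_dim)

policy = MemoryPolicy()
for _ in range(3):                                     # a short rollout
    action = policy(torch.randn(1, 256))
print(action.shape)                                    # torch.Size([1, 7])
```

Bounding the buffer lets the policy condition on recent spatial history, as the paragraph above describes, without the memory growing with episode length.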
Fang also co-led the development of MolmoAct, the first fully open action reasoning model for robotics. The model, designed by a team of University of Washington and Allen Institute for Artificial Intelligence (Ai2) researchers, enables robots to interpret instructions, sense their environment, generate spatial plans, and then execute those plans as goal-directed trajectories. Across various benchmarks, MolmoAct outperformed multiple competitive baselines, including NVIDIA’s GR00T N1.5. MolmoAct received the People’s Choice Award at the Allen School’s 2025 Research Showcase, as well as the Best Paper Award runner-up at the Rational Robots Workshop at CoRL 2025. The State of AI Report 2025 also highlighted MolmoAct for setting the standard for embodied reasoning, an approach later adopted by Google Gemini Robotics 1.5.
Hao Xu: Making internet-scale corpora searchable
Large language models’ behavior is shaped by their training data and tokenization. For Hao Xu, understanding the composition of these models’ training data is increasingly important as the “data scales beyond what is practical to inspect.” Today’s LLMs are trained on massive, internet-scale text datasets; however, it is difficult to analyze and understand the quality and content of these corpora.
“My research interests lie in natural language processing with a focus on large language models. My future work aims to develop more efficient model-data interactions that move beyond today’s brute-force training paradigm,” said Xu. “As a violinist, I also view music as a distinct form of language and am interested in studying how it can be modeled and learned using language modeling techniques.”
Xu’s primary research focuses on bridging this gap. Alongside Allen School professors Hannaneh Hajishirzi and Noah A. Smith and Ph.D. student Jiacheng Liu, Xu developed infini-gram mini, an efficient exact-match search engine designed to work on internet-scale corpora with minimal storage needs. The system makes several open-source corpora, such as Common Crawl, searchable, and it currently hosts the largest body of searchable text in the open-source community.
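infini-gram mini’s contribution is doing this at internet scale with a small storage footprint, which requires far more compact index structures than shown here; but the core operation, counting exact occurrences of an arbitrary string in a corpus, can be illustrated with a plain suffix array. A toy sketch, not the actual system:

```python
# Toy exact-match search: a suffix array sorts every suffix of the corpus,
# so all suffixes starting with a query form one contiguous range that can
# be found by binary search. infini-gram mini does this at internet scale
# with compressed indexes; this sketch only shows the idea.
import bisect

def build_suffix_array(text: str) -> list[int]:
    # O(n^2 log n) construction: fine for a toy, hopeless at internet scale.
    return sorted(range(len(text)), key=lambda i: text[i:])

def count_occurrences(text: str, sa: list[int], query: str) -> int:
    # Truncating each sorted suffix to the query length preserves order,
    # so standard bisect finds the range of exact matches.
    trunc = [text[i:i + len(query)] for i in sa]
    return bisect.bisect_right(trunc, query) - bisect.bisect_left(trunc, query)

corpus = "the cat sat on the mat. the cat ran."
sa = build_suffix_array(corpus)
print(count_occurrences(corpus, sa, "the cat"))  # 2
```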
Using infini-gram mini, Xu and her collaborators revealed significant and widespread contamination across standard LLM evaluation benchmarks, where the training data inadvertently contains the test data. Their results raise concerns about how researchers measure progress in artificial intelligence and have sparked new conversations about evaluation integrity and LLM dataset transparency. As lead author, Xu presented the research at last year’s Conference on Empirical Methods in Natural Language Processing (EMNLP 2025), where she and the team received the Best Paper Award.
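Given such an index, contamination checking becomes mechanical: flag any benchmark item whose text already appears verbatim in the training corpus. A minimal, hypothetical sketch, with naive substring search standing in for the real exact-match index:

```python
# Flag benchmark items whose text appears verbatim in the training corpus.
# The study used infini-gram mini's index; Python's `in` stands in here.
def contamination_rate(benchmark: list[str], corpus: str) -> float:
    leaked = [item for item in benchmark if item in corpus]
    return len(leaked) / len(benchmark)

corpus = "... the quick brown fox jumps over the lazy dog ..."
benchmark = ["the quick brown fox jumps over the lazy dog",  # leaked
             "a completely novel test question"]             # clean
print(contamination_rate(benchmark, corpus))  # 0.5
```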
Xu is also interested in understanding the fundamentals of LLMs, such as tokenization, which “shapes how the models interact with text.” She undertook the first systematic examination of how tokenization mechanisms prevalent in English fail in other languages with different morphology or writing systems. The paper presenting these findings, on which she is first author, is currently under review.
Kaiyuan Liu: Strengthening the reasoning capabilities of LLMs
Kaiyuan Liu aims to build reasoning-capable AI models. His background in competitive programming informs his research as he develops tests and benchmarks that measure LLMs’ proficiency in reasoning and self-correction.
“My research goal is to understand and improve the reasoning capabilities of large language models,” said Liu. “This goal emerges from two converging paths: years of competitive programming, which trained me to value algorithmic precision and creativity, and a broader curiosity about how intelligent systems — biological and artificial — learn, reason and cooperate.”
Writing competitive programming problems is time-consuming, requiring problem setters to specify many variables and constraints, including input distributions, edge cases and specific algorithmic targets, which makes it an ideal test of general LLM capabilities. Liu and his collaborators, including Allen School professor Natasha Jaques, developed AutoCode, a closed-loop, multi-role framework that automates the entire process of creating and evaluating competitive programming problems. AutoCode can detect with 91% accuracy whether a program is a valid solution to a given algorithmic problem. The framework has potential industry value, especially as more large companies attempt to use LLMs to write and submit code independently. The team will present their findings at the 14th International Conference on Learning Representations (ICLR 2026) in April.
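AutoCode’s closed-loop, multi-role pipeline is far richer than can be shown here. As a rough illustration of the validation step alone, the sketch below uses classic differential testing, one plausible way to check solution validity rather than AutoCode’s exact mechanism; all names are hypothetical:

```python
# Differential testing as a solution validator: run the candidate and a
# trusted reference on many generated inputs (including edge cases) and
# report any disagreement. Only this judging step is sketched.
import random
from typing import Callable

def validate(candidate: Callable[[list[int]], int],
             reference: Callable[[list[int]], int],
             gen_input: Callable[[], list[int]],
             trials: int = 1000) -> bool:
    for _ in range(trials):
        case = gen_input()
        if candidate(case) != reference(case):  # any mismatch invalidates
            return False
    return True

reference = max                                  # trusted solution
candidate = lambda xs: sorted(xs)[-1]            # solution under test
gen = lambda: [random.randint(-10**9, 10**9)
               for _ in range(random.randint(1, 50))]
print(validate(candidate, reference, gen))       # True: the two agree
```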
Liu also helped develop a set of benchmarks to better evaluate LLMs’ reasoning capabilities in competitive programming. LiveCodeBench Pro is composed of high-quality, live-sourced programming problems from sources such as Codeforces and the International Collegiate Programming Contest (ICPC); the problems vary in difficulty and are continuously updated to reduce the chance of data contamination. The researchers paired large-scale LLM studies with expert annotations and found that frontier models are proficient at solving implementation-oriented problems but struggle with complex algorithmic reasoning, nuanced problem-solving and edge cases, failing on some of the benchmark’s most difficult problems. LiveCodeBench Pro has already had industry impact: the benchmark was selected for the Gemini 3 launch evaluation. Liu and his collaborators presented LiveCodeBench Pro at the 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025) last December.
Outside of his research, Liu and his team were World Finalists at ICPC 2024. More recently, he coached the UW programming team Jiakrico, which won first place at the ICPC Pacific Northwest Regional competition last November.
Lindsey Wei: Building efficient LLM-powered data systems
Modern data systems power nearly every aspect of our digital world, yet growing data complexity and heterogeneity make it increasingly difficult for systems to consistently interpret and process data at scale. Lindsey Wei focuses on developing “intelligent and reliable data systems that reason about data semantics to make data-driven decision-making more accessible.”
“Large language models open up new opportunities for how systems understand and interact with data,” said Wei. “But integrating these capabilities into data systems in a systematic way remains challenging.”
One setting where these challenges arise is table understanding, which focuses on recovering missing semantic metadata from web tables and is crucial for data integration. Existing LLM-based methods have limitations, including hallucinations and a lack of domain-specific knowledge. To address this, Wei, alongside Allen School professor and director Magdalena Balazinska, developed RACOON, a framework that augments LLMs with facts retrieved from a knowledge graph through retrieval-augmented generation (RAG) to significantly improve zero-shot performance. Next, Wei aims to extend the system via RACOON+, further improving its accuracy and robustness by strengthening how models link to and reason over external knowledge.
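As a rough illustration of the retrieval-augmented pattern, consider annotating a table column with its semantic type: sample some cell values, look them up in a knowledge graph, and prepend the retrieved facts to the LLM prompt. The toy knowledge graph, prompt, and helper functions below are hypothetical stand-ins, not RACOON’s actual interfaces:

```python
# Retrieval-augmented column type annotation, RACOON-style: ground the LLM's
# prediction in facts retrieved from a knowledge graph instead of asking it
# to guess from cell values alone. The KG and prompt are toy stand-ins.
TOY_KG = {
    "Seattle":  ["Seattle is a city in Washington, United States."],
    "Portland": ["Portland is a city in Oregon, United States."],
}

def retrieve_facts(cells: list[str]) -> list[str]:
    return [fact for cell in cells for fact in TOY_KG.get(cell, [])]

def build_prompt(cells: list[str]) -> str:
    facts = "\n".join(retrieve_facts(cells))
    return (f"Known facts:\n{facts}\n\n"
            f"Column values: {', '.join(cells)}\n"
            "What is the semantic type of this column?")

prompt = build_prompt(["Seattle", "Portland"])
# answer = llm.complete(prompt)  # hypothetical LLM call; might return "city"
print(prompt)
```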
Inspired by how inference-time techniques such as RAG can unlock LLMs’ reasoning over structured data, Wei began exploring how to extend these reasoning capabilities to unstructured data processing — a longstanding challenge in data management. With a team of University of California, Berkeley and Google researchers, Wei developed MOAR (Multi-Objective Agentic Rewrites), a new optimizer for DocETL, an open-source system for LLM-powered unstructured data processing at scale. MOAR introduces a global search algorithm that explores a vast space of possible pipeline rewrites to identify those with the best accuracy–cost tradeoffs under a limited evaluation budget. In experiments across six real-world workloads, MOAR consistently discovered pipelines that were both more accurate and significantly cheaper than prior approaches. The team recently released a preprint of this work, highlighting the need to rethink how optimization is designed for LLM-powered data systems.
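At its core, the problem MOAR addresses can be phrased as a budgeted multi-objective search: given many candidate pipeline rewrites and a limited number of evaluations, find the candidates with the best accuracy–cost tradeoffs. The sketch below illustrates only that framing; the `evaluate` stub is synthetic, and random sampling stands in for MOAR’s actual global search algorithm:

```python
# Budgeted search for accuracy–cost Pareto-optimal pipeline rewrites, in the
# spirit of MOAR. evaluate() is a synthetic stand-in for running a candidate
# pipeline on sample documents and scoring it.
import random

def evaluate(rewrite: str) -> tuple[float, float]:
    random.seed(rewrite)  # deterministic toy scores per rewrite
    return random.uniform(0.5, 1.0), random.uniform(0.1, 5.0)  # (accuracy, cost)

def pareto_front(scored: dict[str, tuple[float, float]]) -> list[str]:
    # Keep rewrites not dominated by any other (better or equal on both
    # objectives, strictly better on at least one).
    front = []
    for name, (acc, cost) in scored.items():
        dominated = any(a >= acc and c <= cost and (a, c) != (acc, cost)
                        for a, c in scored.values())
        if not dominated:
            front.append(name)
    return front

candidates = [f"rewrite_{i}" for i in range(100)]  # space of pipeline rewrites
budget = 20                                        # evaluations we can afford
sampled = random.sample(candidates, budget)        # spend the budget
scored = {r: evaluate(r) for r in sampled}
print(pareto_front(scored))                        # best accuracy–cost tradeoffs
```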
In addition to designing LLM-powered data systems, Wei has also helped develop a graphical user interface for MaskSearch, a system that accelerates queries over databases of machine learning-generated image masks, leading to improved model debugging and analysis workflows.
Read more about the CRA Outstanding Undergraduate Researcher Awards here.