At the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Allen School researchers brought home multiple awards for their work that is moving natural language processing research forward. Their projects ranged from laying the foundation for how artificial intelligence systems understand and follow human instructions to exploring how large language models (LLMs) pull responses from their training data — and more.
TACL Test of Time Award: Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions
From robotics to voice assistants, today’s AI systems rely on their ability to understand human language and interact naturally with it.
In their 2013 paper “Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions,” Allen School professor Luke Zettlemoyer and then-student Yoav Artzi (Ph.D., ‘15), now a professor at Cornell Tech, laid the groundwork for this capability with an approach for teaching models to follow human instructions without the need for detailed manual explanations. For their lasting contributions to the field, the researchers received the Test of Time Award as part of the inaugural Transactions of the Association for Computational Linguistics (TACL) Paper Awards presented at ACL 2025.
“This was the first paper in a line of work on learning semantic parsers from easily gathered interactions with an external world, instead of requiring supervised training data,” Zettlemoyer said. “Although the form of the models has changed significantly over time, this style of learning is still relevant today as an early precursor to techniques such as Reinforcement Learning with Verifiable Rewards (RLVR).”
Zettlemoyer and Artzi developed the first comprehensive model that tackles common issues that arise with learning and understanding unrestricted natural language. For example, if you told a robot to follow the navigation instructions “move forward twice to the chair and at the corner, turn left to face the blue hall,” it would need to solve multiple subproblems to interpret the instructions correctly. It would have to resolve references to specific objects in the environment such as “the chair,” clarify words based on context, and also understand implicit requests like “at the corner” that provide goals without specific steps.
To address these challenges without the need for intensive engineering effort, the duo developed a grounded learning approach that can jointly reason about meaning and context, and then continue to learn from their interplay. It uses Combinatory Categorial Grammar (CCG), a framework that assigns words to syntactic categories such as noun or prepositional phrase and efficiently parses complex instructional language by mapping it to logical expressions. A weighted CCG then ranks the possible meanings for each instruction.
This joint model of meaning and context allows the system to keep learning from situated cues, such as the visible objects in the environment. For example, in the earlier set of navigation instructions, “the chair” can refer to multiple different objects such as chairs, barstools or even recliners. While the CCG framework would include a lexical item for each meaning of “the chair,” the execution of the task might fail depending on what objects are in the world. The system therefore learns by watching examples play out and checking whether the actions lead to successful outcomes, such as completing the task or reaching a destination.
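The idea of ranking candidate meanings with weights and learning only from whether execution succeeds can be illustrated with a toy sketch. This is not the paper's algorithm or code: the feature names, the `succeeded` check and the perceptron-style update below are all assumptions chosen for illustration.

```python
from collections import defaultdict

def score(weights, features):
    """Linear score of one candidate meaning under the current weights."""
    return sum(weights[name] * value for name, value in features.items())

def rank(weights, candidates):
    """Order candidate meanings from highest to lowest score."""
    return sorted(candidates, key=lambda c: score(weights, c["features"]), reverse=True)

def weak_update(weights, candidates, succeeded, lr=1.0):
    """Perceptron-style update driven only by execution outcomes.

    `succeeded` is a hypothetical check that says whether executing a
    candidate meaning in the world reached the goal. No labeled logical
    form is needed.
    """
    ranked = rank(weights, candidates)
    good = [c for c in ranked if succeeded(c["meaning"])]
    bad = [c for c in ranked if not succeeded(c["meaning"])]
    if not good or not bad:
        return  # nothing to contrast
    best_good, best_bad = good[0], bad[0]
    if score(weights, best_good["features"]) > score(weights, best_bad["features"]):
        return  # a successful meaning already outranks every failing one
    for name, value in best_good["features"].items():
        weights[name] += lr * value
    for name, value in best_bad["features"].items():
        weights[name] -= lr * value

# Three hypothetical lexical readings of "the chair".
candidates = [
    {"meaning": "chair", "features": {"chair->chair": 1.0}},
    {"meaning": "barstool", "features": {"chair->barstool": 1.0}},
    {"meaning": "recliner", "features": {"chair->recliner": 1.0}},
]
weights = defaultdict(float)
# In this toy world only a barstool is present, so only that reading succeeds.
weak_update(weights, candidates, succeeded=lambda meaning: meaning == "barstool")
print(rank(weights, candidates)[0]["meaning"])
```

After one update, the reading that led to a successful execution is ranked first, even though the learner never saw the correct logical form directly.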
The researchers tested their method using a benchmark navigational instructions dataset and found that their joint approach successfully completed 60% more instruction sets compared to the previous state-of-the-art methods.
Read the full paper, as well as a related Cornell Tech story.
Outstanding Paper Award: Byte Latent Transformer: Patches Scale Better Than Tokens
Allen School Ph.D. students Artidoro Pagnoni and Margaret Li earned an ACL Outstanding Paper Award for research done at Meta with their advisor Zettlemoyer, who is also the senior research director at Meta FAIR.
Alongside their collaborators, they introduced the Byte Latent Transformer (BLT), a new byte-level LLM architecture that is the first to match the performance of standard tokenization-based LLMs at scale. At the same time, BLT improves efficiency and increases robustness to noisy data.
Many existing LLMs are trained using tokenization, where raw text is broken down into more manageable tokens which then serve as the model’s vocabulary. This process was essential because training LLMs directly on bytes was cost-prohibitive at scale. However, these tokens can influence how strings are compressed and lead to issues such as domain sensitivity.
Instead, BLT groups bytes into dynamically sized patches, which serve as the primary units of computation. Patch boundaries are set based on the entropy of the next byte, allowing the system to allocate more model capacity where needed. For example, higher entropy indicates a more complex sequence, which can prompt a new, shorter patch. In the first floating-point operations (FLOP) controlled scaling study of byte-level models, the team found that BLT’s performance was on par with or superior to that of models such as Llama 3. With its efficiency and adaptability, the researchers position BLT as a promising alternative to the traditional token-based models available.
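Entropy-based patching can be sketched in a few lines. The sketch below is only an illustration of the idea and not BLT's implementation: it swaps in a simple bigram count model to estimate next-byte entropy, and the function names and threshold values are assumptions.

```python
import math
from collections import Counter, defaultdict

def fit_bigram(data: bytes):
    """Count which byte follows which (a stand-in for a learned byte model)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(data, data[1:]):
        counts[prev][nxt] += 1
    return counts

def next_byte_entropy(counts, prev: int) -> float:
    """Shannon entropy (in bits) of the next-byte distribution after `prev`."""
    dist = counts.get(prev)
    if not dist:
        return 8.0  # unseen context: assume maximum uncertainty over 256 values
    total = sum(dist.values())
    return -sum((n / total) * math.log2(n / total) for n in dist.values())

def entropy_patches(data: bytes, counts, threshold: float):
    """Start a new patch wherever the predicted next byte is hard to guess."""
    if not data:
        return []
    patches, start = [], 0
    for i in range(1, len(data)):
        if next_byte_entropy(counts, data[i - 1]) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches

text = b"aaaaaaaaaa the quick brown fox aaaaaaaaaa"
model = fit_bigram(text)
for patch in entropy_patches(text, model, threshold=1.0):
    print(patch)
```

Predictable runs end up in long patches, while less predictable regions are split into shorter ones, so more computation steps land where the data is harder to model.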
Additional authors include Ram Pasunuru, Pedro Rodriguez, John Nguyen, Benjamin Muller, Chunting Zhou, Lili Yu, Jason Weston, Gargi Ghosh, former Allen School postdoc Mike Lewis and Srinivasan Iyer (Ph.D., ‘19) at Meta, and Ari Holtzman (Ph.D., ‘23), now faculty at University of Chicago.
Read the full paper on BLT here.
Best Demo Paper: OLMoTrace
As LLMs become increasingly popular in higher-stakes scenarios, it is important to understand why they generate certain responses and where they get their answers from. Fully open language models such as OLMo have been trained on trillions of tokens that everyone can access, but current behavior tracing methods do not scale to this multi-trillion-token setting.
To address this, a team of researchers at the Allen School and the Allen Institute for AI (Ai2) introduced OLMoTrace, the first system that allows users in real time to explore how LLM outputs connect back to their training data. For the research’s innovation and practical application, the team received the ACL 2025 Best Demo Paper Award.
“Today’s large language models are so complex that we barely understand anything about how they generate the responses we see,” said lead author and Allen School Ph.D. student Jiacheng Liu. “OLMoTrace is powered by a technology I previously developed at UW, ‘infini-gram,’ with numerous system optimizations enabling us to deliver instant insights to how LLMs likely have learned certain phrases and sentences.”
The OLMoTrace inference pipeline works by scanning the LLM’s output and identifying long, unique and relevant text spans that appear verbatim in the model’s training data. For each span, the system retrieves up to 10 snippets from the training data that contain the span, prioritizing the most relevant documents. Finally, the system does some post-processing on the spans and document snippets, and presents them to the user through the chat interface. OLMoTrace is publicly available in the Ai2 model playground with the OLMo 2 family of models.
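The two core steps of this pipeline, finding verbatim spans and retrieving snippets, can be sketched on a tiny in-memory corpus. This is a naive illustration, not the OLMoTrace implementation: the real system relies on infini-gram to search trillions of tokens, whereas the sketch below uses plain substring search, keeps only the "long" criterion and ignores uniqueness and relevance ranking. All function names and parameters are hypothetical.

```python
def normalize(text: str) -> str:
    """Collapse whitespace so spans and documents are compared token by token."""
    return " ".join(text.split())

def find_verbatim_spans(output: str, corpus: list[str], min_tokens: int = 3) -> list[str]:
    """Greedily collect the longest token spans of `output` found verbatim in `corpus`."""
    docs = [f" {normalize(doc)} " for doc in corpus]  # padded to match whole tokens
    tokens = output.split()
    spans, i = [], 0
    while i < len(tokens):
        match_end = None
        # Try the longest span starting at token i first, then shrink it.
        for j in range(len(tokens), i + min_tokens - 1, -1):
            candidate = " ".join(tokens[i:j])
            if any(f" {candidate} " in doc for doc in docs):
                match_end = j
                break
        if match_end is None:
            i += 1
        else:
            spans.append(" ".join(tokens[i:match_end]))
            i = match_end
    return spans

def retrieve_snippets(span: str, corpus: list[str], limit: int = 10, context: int = 30) -> list[str]:
    """Return up to `limit` snippets of surrounding text that contain `span`."""
    snippets = []
    for doc in map(normalize, corpus):
        pos = doc.find(span)
        if pos != -1:
            snippets.append(doc[max(0, pos - context): pos + len(span) + context])
            if len(snippets) == limit:
                break
    return snippets

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "A journey of a thousand miles begins with a single step.",
]
for span in find_verbatim_spans("I think the quick brown fox jumps high", corpus):
    print(span, "->", retrieve_snippets(span, corpus))
```

A production system would replace the linear scan with an index built for this purpose, since checking every candidate span against every document does not scale beyond toy corpora.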
The researchers proposed multiple practical applications for OLMoTrace. For example, if the model generates a fact, users can look back to the training data to fact-check the statement. It can also reveal the potential source of seemingly creative and novel LLM-generated expressions. In addition, OLMoTrace can help debug erratic LLM behaviors, such as hallucinations or incorrect self-knowledge, which are crucial to address as LLMs become increasingly commonplace, Liu explained.
Additional authors include Allen School professors and Ai2 researchers Ali Farhadi, Hannaneh Hajishirzi, Pang Wei Koh and Noah A. Smith, along with Allen School Ph.D. students Arnavi Chheda-Kothary and Rock Yuren Pang. The team also includes Taylor Blanton, Yanai Elazar, Sewon Min (Ph.D., ‘24), Yen-Sung Chen, Huy Tran, Byron Bischoff, Eric Marsh, Michael Schmitz (B.S., ‘08), Cassidy Trier, Aaron Sarnat, Jenna James, Jon Borchardt (B.S., ‘01), Bailey Kuehl, Evie Cheng, Karen Farley, Sruthi Sreeram, Taira Anderson, David Albright, Carissa Schoenick, Luca Soldaini, Dirk Groeneveld, Sophie Lebrecht and Jesse Dodge of Ai2, along with former Allen School professor Yejin Choi, now a faculty member at Stanford University.
Read the full paper on OLMoTrace.
ACL Dissertation Award: Rethinking Data Use in Large Language Models
For her Allen School Ph.D. dissertation titled “Rethinking Data Use in Large Language Models,” Sewon Min, now a faculty member at the University of California, Berkeley, received the inaugural ACL Dissertation Award. In her work, Min tackled fundamental issues that current language models face, such as factuality and privacy, by introducing a new class of language models and alternative approaches for training such models.
This new class of models, called nonparametric language models, is able to identify and reason with relevant text from its datastore during inference. Compared to conventional models that have to remember every applicable detail from their training set, models with a datastore available at inference time have the potential to be more efficient and flexible.
Nonparametric language models can also help address the legal constraints that traditional models often face. Language models are commonly trained using all available online data, which can lead to concerns with copyright infringement and crediting data creators. Min developed a new approach where language models are trained solely on public domain data. Copyrighted or other high-risk data is then kept in a datastore that the model can only access during inference and which can be modified at any time.
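The practical consequence of keeping data in a datastore rather than in model weights can be shown with a minimal sketch. It is not Min's method: the `Datastore` class and its word-overlap retrieval are hypothetical stand-ins, meant only to show that a store consulted at inference time can be edited at any moment without retraining.

```python
class Datastore:
    """A toy inference-time datastore whose contents can change at any time."""

    def __init__(self):
        self._docs = {}

    def add(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def remove(self, doc_id: str) -> None:
        # For example, when a data owner asks for their text to be withdrawn.
        self._docs.pop(doc_id, None)

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        """Return ids of the k documents sharing the most words with the query."""
        query_words = set(query.lower().split())
        scored = []
        for doc_id, text in self._docs.items():
            overlap = len(query_words & set(text.lower().split()))
            if overlap:
                scored.append((overlap, doc_id))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for _, doc_id in scored[:k]]

store = Datastore()
store.add("novel", "the wizard raised his wand")
store.add("manual", "press the red button to reset")

print(store.retrieve("what did the wizard raise"))  # the novel is available
store.remove("novel")                               # the owner opts out
print(store.retrieve("what did the wizard raise"))  # the novel is no longer used
```

Because the withdrawn text was never baked into trained parameters in this setup, removing it from the store is enough to stop the system from drawing on it.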
In addition to receiving the ACL Dissertation Award, Min has also earned honorable mentions for the ACM Doctoral Dissertation Award from the Association for Computing Machinery and the Association for the Advancement of Artificial Intelligence Doctoral Dissertation Award.


