Allen School News

Lost in translation no more: IBM Fellowship winner Akari Asai asks — and answers — big questions in NLP to expand information access to all

Growing up in Japan, Akari Asai never imagined that she would one day pursue a Ph.D. at the Allen School focused on developing the next generation of natural language processing tools. Asai hadn’t taken a single computing class before her arrival at the University of Tokyo, where she enrolled in economics and business courses; her first foray into computer science would come thousands of miles from home, while studying abroad at the University of California, Berkeley. The experience would alter the trajectory of her academic career and put her on a path to solving problems on a global scale.

“I changed my major in the middle of my undergraduate studies, and I wished I had discovered computer science and opportunities for pursuing my career abroad earlier,” said Asai. “My own situation made me realize the importance of information access for everyone.”

That realization led Asai to pursue her Ph.D. at the University of Washington. There, working with Allen School professor Hannaneh Hajishirzi in the H2Lab, she develops next-generation AI algorithms that offer rich natural language comprehension through multilingual, multi-hop and interpretable reasoning.

“Akari is very insightful and cares deeply about the impact of her work,” observed Hajishirzi, who is also senior research manager in the Allen Institute for AI’s AllenNLP group. “She is bridging the gap between research and real-world applications by making NLP models more efficient, more effective, and more inclusive by extending their benefits to languages other than English that have been largely ignored.”

More than 7,100 languages are spoken in the world today. English is the most widely spoken, with nearly 1.5 billion speakers, yet the global population is nearing 8 billion, which means a significant share of the world’s people is excluded from the benefits of today’s powerful NLP models. Asai is trying to close this gap by enabling universal question answering systems that can read and retrieve information across multiple languages. For example, she and her collaborators introduced XOR-TyDi QA, the first large-scale annotated dataset for open-retrieval question answering across seven languages other than English. The task it targets, XOR QA (Cross-lingual Open Retrieval Question Answering), requires answering questions written in one language using content expressed in another.

Asai also contributed to CORA, the first unified multilingual retriever-generator framework that can answer questions across many languages, even without language-specific annotated data or knowledge sources. CORA, short for Cross-lingual Open-Retrieval Answer Generation, uses a dense passage retrieval algorithm to pull relevant passages from Wikipedia articles in any language; a multilingual autoregressive generation model then answers the question directly in the target language, with no translation step. The team also incorporated an iterative training method that automatically extends annotated data, previously available only in high-resource languages, to low-resource ones.
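To make the retrieval half of such a pipeline concrete, here is a minimal Python sketch of dense passage retrieval under stated assumptions. It is not CORA’s code: the passage IDs and three-dimensional vectors are made up by hand, standing in for embeddings that a trained multilingual encoder would produce, and the generator that writes the answer is omitted. What it shows is that relevance is simply an inner product between a question vector and each passage vector, so a passage can rank highly regardless of the language it is written in.

```python
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical passage embeddings, keyed by "language/article". A real dense
# retriever would compute these with a trained multilingual encoder so that
# related text lands close together regardless of language.
PASSAGES: Dict[str, Vector] = {
    "ja/Tokyo_Tower": [0.9, 0.1, 0.0],
    "en/Eiffel_Tower": [0.1, 0.9, 0.1],
    "ms/Menara_Kuala_Lumpur": [0.2, 0.2, 0.9],
}


def score(query: Vector, passage: Vector) -> float:
    """Relevance as the inner product of question and passage vectors."""
    return sum(q * p for q, p in zip(query, passage))


def retrieve(query: Vector, passages: Dict[str, Vector], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (passage_id, score) pairs."""
    ranked = sorted(
        ((pid, score(query, vec)) for pid, vec in passages.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical encoding of the English question "How tall is Tokyo Tower?"
    question_vec = [0.8, 0.2, 0.1]
    for pid, s in retrieve(question_vec, PASSAGES):
        print(f"{pid}: {s:.2f}")
```

In this toy setup, the English question ranks the Japanese-language Tokyo Tower passage first, which is the cross-lingual behavior the paragraph describes.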

“We demonstrated that CORA is capable of answering questions across 28 typologically different languages, achieving state-of-the-art results on 26 of them,” Asai explained. “Those results include languages that are more distant from English and for which there is limited training data, such as Hebrew and Malay.”

Language is not the only barrier Asai is working to overcome. The massive computational resources required to run the latest, greatest language models, which few groups can afford, also put these models out of reach for many. Asai is making strides on this problem, too, recently unveiling a new multi-task learning paradigm for tuning large-scale language models that is modular, interpretable and parameter-efficient. In a preprint, Asai and her collaborators described how ATTEMPT, or Attentional Mixture of Prompt Tuning, matches or exceeds the performance of full fine-tuning while updating less than one percent of the parameters that full fine-tuning updates.
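The idea of mixing prompts can be sketched in a few lines of Python. This is a simplified illustration, not the preprint’s implementation: it assumes each soft prompt is a short list of vectors, scores each frozen source prompt against the mean-pooled input with a plain dot product and softmax (a simplification chosen here in place of ATTEMPT’s learned attention), and adds the weighted source prompts to a task-specific target prompt. In prompt tuning the language model itself stays frozen and only prompt-side parameters are trained, which is where the parameter savings come from.

```python
import math
from typing import List, Tuple

Vector = List[float]
Prompt = List[Vector]  # one vector per soft-prompt token


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a list of equal-length vectors into a single vector."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_prompts(
    input_tokens: List[Vector],
    source_prompts: List[Prompt],
    target_prompt: Prompt,
) -> Tuple[Prompt, List[float]]:
    """Combine a target prompt with attention-weighted source prompts.

    Simplified stand-in: weights come from a dot product between the
    mean-pooled input and each mean-pooled source prompt. Assumes all
    prompts have the same number of tokens and the same dimension.
    """
    query = mean_pool(input_tokens)
    keys = [mean_pool(prompt) for prompt in source_prompts]
    weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])

    mixed: Prompt = []
    for i, target_vec in enumerate(target_prompt):
        vec = list(target_vec)
        for weight, prompt in zip(weights, source_prompts):
            vec = [v + weight * s for v, s in zip(vec, prompt[i])]
        mixed.append(vec)
    return mixed, weights


if __name__ == "__main__":
    # Two hypothetical frozen source prompts (2 tokens x 2 dims each).
    sources = [
        [[1.0, 0.0], [1.0, 0.0]],
        [[0.0, 1.0], [0.0, 1.0]],
    ]
    target = [[0.0, 0.0], [0.0, 0.0]]  # the trainable task-specific prompt
    inputs = [[2.0, 0.0], [2.0, 0.0]]  # an input resembling source prompt 0
    prompt, weights = mix_prompts(inputs, sources, target)
    print("attention weights:", [round(w, 3) for w in weights])
    print("instance prompt:", prompt)
```

Because the example input resembles the first source prompt, most of the attention weight goes to that prompt, so the combined prompt is steered toward what was learned on the related source task.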

Asai is also keenly interested in developing neuro-symbolic algorithms that can handle complex questions. One example is PathRetriever, a graph-based recurrent retrieval method that learns to retrieve reasoning paths over the Wikipedia graph to answer multi-hop open-domain questions at web scale. By pairing the retriever with a reading comprehension model, Asai and her colleagues enabled PathRetriever to find more accurate reasoning paths for complex questions than other methods. Some of her co-authors subsequently adapted the system to support complex queries over scientific publications related to COVID-19.
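Retrieving a reasoning path can be pictured as beam search over a hyperlink graph. The Python sketch below is illustrative only: the tiny graph, the question it is meant to answer (“In which country is the city where the Space Needle stands?”), and every probability in it are hypothetical, and a fixed hop count is used for simplicity. A learned model such as PathRetriever’s recurrent retriever would score each next hop from the question and the path so far; here a lookup table stands in for it.

```python
import math
from typing import Callable, Dict, List, Tuple

Path = List[str]

# Hypothetical hyperlink graph: article -> articles it links to.
LINKS: Dict[str, List[str]] = {
    "Space_Needle": ["Seattle", "Eiffel_Tower"],
    "Seattle": ["United_States", "Washington_(state)"],
    "Eiffel_Tower": ["Paris"],
    "Paris": ["France"],
}

# Hypothetical first-hop probabilities for the question
# "In which country is the city where the Space Needle stands?"
START_PROBS: Dict[str, float] = {"Space_Needle": 0.7, "Eiffel_Tower": 0.3}

# Hypothetical next-hop probabilities; a learned model would compute these
# from the question and the whole path so far.
STEP_PROBS: Dict[Tuple[str, str], float] = {
    ("Space_Needle", "Seattle"): 0.9,
    ("Space_Needle", "Eiffel_Tower"): 0.1,
    ("Seattle", "United_States"): 0.8,
    ("Seattle", "Washington_(state)"): 0.2,
    ("Eiffel_Tower", "Paris"): 0.9,
    ("Paris", "France"): 0.9,
}


def step_prob(path: Path, nxt: str) -> float:
    """Toy scorer: looks only at the last article, unlike a recurrent model."""
    return STEP_PROBS.get((path[-1], nxt), 0.01)


def beam_search(
    start_probs: Dict[str, float],
    links: Dict[str, List[str]],
    scorer: Callable[[Path, str], float],
    hops: int,
    beam_width: int,
) -> List[Tuple[Path, float]]:
    """Keep the beam_width best paths (by summed log-probability) at each hop."""
    beams = sorted(
        (([node], math.log(p)) for node, p in start_probs.items()),
        key=lambda b: b[1],
        reverse=True,
    )[:beam_width]
    for _ in range(hops - 1):
        candidates = [
            (path + [nxt], logp + math.log(scorer(path, nxt)))
            for path, logp in beams
            for nxt in links.get(path[-1], [])
            if nxt not in path  # avoid revisiting an article
        ]
        if not candidates:
            break
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams


if __name__ == "__main__":
    for path, logp in beam_search(START_PROBS, LINKS, step_prob, hops=3, beam_width=2):
        print(" -> ".join(path), f"(p = {math.exp(logp):.3f})")
```

With three hops and a beam width of 2, the top-ranked path is Space_Needle -> Seattle -> United_States, while the distractor path through Eiffel_Tower survives only in second place. Keeping several partial paths alive, rather than greedily committing to the best first article, is what lets a multi-hop retriever recover when an early hop looks ambiguous.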

Ultimately, Asai intends to integrate the various facets of her research into a general-purpose, lightweight retriever and neuro-symbolic generator capable of performing complex reasoning over diverse inputs while overcoming data scarcity. She earned a 2022 IBM Ph.D. Fellowship earlier this year to advance this work. Her ambition is to eliminate the disparity between the information “haves” and “have nots” by providing tools that empower anyone to quickly and easily find what they need online, in multiple languages as well as multiple domains.

“Despite rapid progress in NLP, there are still several major limitations that prevent too many people from enjoying the benefits of that progress,” she explained. “My long-term research goal is to develop AI agents that can interact with broad swaths of internet users to answer their questions, giving everyone equal access to information that might otherwise be limited to certain default audiences.”

Her commitment to promoting equal access extends beyond information retrieval to include the field of NLP itself; to that end, Asai is an enthusiastic mentor to students from underrepresented backgrounds.

“I’m excited to continue making progress on my own research interests,” said Asai, “but I hope to also inspire the next generation of researchers in AI.”

Way to go, Akari!

Published by kristin on October 20, 2022