As people increasingly rely on artificial intelligence to solve problems at a human level, that reliance has exposed difficulties in how language models learn from data. Often, the models memorize the peculiarities of a dataset rather than learning the underlying task they were developed to solve. The problem has more to do with data quality than size, meaning it cannot be corrected simply by making the dataset larger.
Enter Alisa Liu, a Ph.D. student who works with Yejin Choi and Noah Smith in the Allen School’s Natural Language Processing group. Liu seeks to overcome shortcomings in how datasets are constructed by developing new methods of human-machine collaboration to improve the reliability of resulting models. In developing this new framework, Liu also aims to root out social biases that are present within the datasets and therefore reproduced by these models.
“I hypothesize that there is great potential in leveraging language models in a controlled way to aid humans in the dataset creation process,” Liu said.
Liu’s interest in the importance of data was sparked during her time as an undergraduate at Northwestern University. There, Liu felt drawn to the possibilities that machine learning offered to harness the potential of data and develop productive tools. She soon discovered that applying AI to language, music and audio research often fails to produce the expected results, because the external and social knowledge needed to solve certain tasks cannot easily be encoded in a dataset. And even high-performing models were not always useful for end-user applications. This experience led Liu to ask how researchers can know whether their systems have learned what they were asked to learn, what types of prior knowledge researchers must encode in datasets, and how researchers can create meaningful tools for real people.
“I saw the importance and potential of AI that can reason about, be informed by, and serve the society in which it exists,” Liu explained.
In 2020, Liu began her graduate studies at the Allen School, where she is challenging previous modes of thinking in her field and incorporating human-centered design approaches to explore how AI can serve society. She earned a 2022 National Science Foundation (NSF) Graduate Research Fellowship to advance this work.
“Alisa’s recent work has really changed my thinking and that of many others in our group about the most impactful ways to use today’s language models,” said Smith, Amazon Professor of Machine Learning at the Allen School and senior director of NLP research at the Allen Institute for AI. “She brings so much creativity and independent thinking to our collaboration. It’s inspiring!”
In collaboration with AI2, Liu developed one of her projects, WANLI, which stands for “Worker and AI Collaboration for Natural Language Inference.” Liu was lead author of the paper, published in last year’s Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), that introduced a novel approach to constructing datasets using a combination of machine generation and human editing. To demonstrate, the researchers developed methods to automatically identify challenging reasoning patterns in existing data, then had GPT-3 generate new related examples that were subsequently edited by human crowdworkers. The results point toward rethinking natural language generation techniques as well as reenvisioning the role of humans in the process of dataset creation.
“Humans are very good at coming up with examples that are correct, but it is challenging to achieve sufficient diversity across examples by hand at scale,” said Liu. “WANLI offers the best of both worlds. It couples the generative strength of AI models with the evaluative strength of humans to build a large and diverse set of high-quality examples, and do it efficiently. The next step will be to apply our approach to problems bottlenecked by a lack of annotated datasets, especially for non-English languages.”
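The pipeline Liu describes — identifying challenging patterns in existing data, having a model generate new related examples, then having humans edit and label them — can be sketched roughly as follows. This is an illustrative toy, not the WANLI implementation: the function names, the confidence-based selection heuristic, and the data are all hypothetical stand-ins.

```python
# Illustrative sketch of a worker-and-AI dataset-creation loop (hypothetical,
# not the authors' code). Each stage below stands in for a step of the
# pipeline: select challenging seeds -> machine generation -> human editing.

def find_challenging_examples(dataset, top_k=2):
    """Stand-in for identifying challenging reasoning patterns: here we
    simply pick the examples the current model is least confident about."""
    return sorted(dataset, key=lambda ex: ex["confidence"])[:top_k]

def model_generate(seed, n=2):
    """Stand-in for prompting a large language model (e.g. GPT-3) with a
    seed example to produce new, related candidate examples."""
    return [
        {"premise": f'{seed["premise"]} (variant {i})',
         "hypothesis": seed["hypothesis"],
         "label": None}  # unlabeled until a human reviews it
        for i in range(n)
    ]

def human_edit(candidate):
    """Stand-in for crowdworker review: a human revises the text as
    needed and assigns a gold label."""
    candidate["label"] = "entailment"  # assigned by the annotator
    return candidate

def build_dataset(seed_dataset):
    """Run the full loop: hard seeds -> generated candidates -> edits."""
    new_examples = []
    for seed in find_challenging_examples(seed_dataset):
        for candidate in model_generate(seed):
            new_examples.append(human_edit(candidate))
    return new_examples

seed_data = [
    {"premise": "A dog runs.", "hypothesis": "An animal moves.", "confidence": 0.55},
    {"premise": "It is raining.", "hypothesis": "The ground is wet.", "confidence": 0.60},
    {"premise": "Two plus two is four.", "hypothesis": "Math works.", "confidence": 0.99},
]
augmented = build_dataset(seed_data)
print(len(augmented))  # 2 variants from each of the 2 hardest seeds -> 4
```

The division of labor mirrors Liu’s point above: the model supplies scale and diversity in generation, while humans supply correctness through editing and labeling.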
“Alisa’s research has been extremely well received by the research community, drawn a lot of interest and inspired thought-provoking discussions,” reflected Choi, Brett Helsel Career Development Professor at the Allen School and senior research director of Mosaic at AI2. “Her innovative work is already making an impact on the field.”
In addition to her ambitious research agenda, Liu places mentorship and service at the center of her endeavors at the Allen School. Notably, Liu mentors UW undergraduates who are interested in doing research in NLP. And having begun her Ph.D. remotely as the COVID pandemic surged, Liu found other ways to support her fellow students as co-chair of the Allen School’s CARE committee, which offers a peer support network to graduate students. She also helped coordinate the Allen School’s visit days program for prospective graduate students and helped organize its orientation for new graduate students arriving on campus.
“I chose to pursue a Ph.D. not just because I enjoy thinking about research problems,” said Liu, “but because I knew I would be in a good position to direct my work toward positive applications and to bring more diverse voices into the community.”