Allen School professor and alumna Leilani Battle (B.S., ‘11) is building a career out of building better ways to deal with data. Her research, which looks beyond conventional data management techniques to incorporate human behavior and preferences, enables analysts to spend more time engaging with the data they need and less time searching and waiting for it to load. In recognition of the transformational impact and future potential of Battle’s work, the IEEE Computer Society’s Technical Committee on Data Engineering recently honored her with its TCDE Rising Star Award for “contributions to interactive data-intensive systems for exploratory data analysis.”
Battle joined the Allen School faculty in the summer of 2020 after spending three years as a professor at the University of Maryland, College Park. The move was akin to a second homecoming; having earned her bachelor’s degree at the University of Washington, she later went on to complete a postdoc working with professor Jeffrey Heer in the Interactive Data Lab, of which she is now co-director, in between earning a Ph.D. from MIT and launching her faculty career. Shortly before her return to Seattle, Battle was named one of MIT Technology Review’s 35 Innovators Under 35; according to Heer, she is pushing the state of the art across multiple areas of computer science.
“Leilani is transforming how people explore and analyze data on a massive scale through her pursuit of a deeper understanding of users’ goals, strategies and behaviors, which she then leverages to develop novel systems and optimization methods,” said Heer. “She has shown a remarkable ability to bridge several subfields, from databases to visualization to human-computer interaction. By combining them in new and interesting ways to provide practical tools for scientists and analysts, she is advancing all three.”
Since the early days of her research career, which began when she was an undergraduate research assistant in the UW Database Group, Battle has been interested in how people interact with data and in ways to reduce friction in those interactions. One of her first and most influential contributions was ForeCache, a tool that reduces lag time, or latency, in interactive data visualization systems following an approach that Battle describes as “behavior-driven optimization.” The optimization is driven by ForeCache’s built-in prediction engine, which enables more efficient data exploration by anticipating and prefetching the results of the queries the user is most likely to issue next based on past interaction patterns. The system also adapts its predictions to actual usage over time to improve future performance. In their paper presented at the Association for Computing Machinery’s SIGMOD International Conference on Management of Data (SIGMOD 2016), Battle and her co-authors reported that ForeCache dramatically outperformed systems that do not rely on prefetching, improving latency by up to 430%, and achieved significant gains in both latency (88%) and accuracy (25%) over existing state-of-the-art prefetching techniques.
“Modeling user behaviors during interactive data exploration enables us to predict what interactions analysts will want to perform in our visualization interface,” Battle explained. “We can then use these models to preemptively execute database queries ahead of users as they explore. This helps analysts to focus more on their data and less on latency issues.”
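To make the idea concrete, here is a minimal sketch of behavior-driven prefetching in Python. It is a toy under stated assumptions, not ForeCache’s actual design: the real system models tiled, multi-resolution data and richer interaction signals, while this version learns only a first-order model of pan moves and prefetches a single tile ahead, synchronously. The names PrefetchingCache, fetch_tile and apply_move are illustrative, not ForeCache’s API.

```python
from collections import Counter, defaultdict

class PrefetchingCache:
    """Toy behavior-driven prefetcher, loosely inspired by ForeCache.

    Learns which move tends to follow which during exploration and
    fetches the most likely next data tile before it is requested.
    A real system would prefetch asynchronously in the background.
    """

    def __init__(self, fetch_tile):
        self.fetch_tile = fetch_tile              # expensive backend query
        self.cache = {}                           # tile_id -> data
        self.transitions = defaultdict(Counter)   # move -> Counter of next moves
        self.last_move = None

    def _record(self, move):
        # Update the first-order interaction model with the latest move.
        if self.last_move is not None:
            self.transitions[self.last_move][move] += 1
        self.last_move = move

    def _predict(self):
        # Most frequent move observed after the current one, if any.
        history = self.transitions.get(self.last_move)
        return history.most_common(1)[0][0] if history else None

    def request(self, tile_id, move, apply_move):
        """Serve a tile, then prefetch the predicted next one."""
        self._record(move)
        data = self.cache.pop(tile_id, None)
        if data is None:                          # cache miss: pay full latency
            data = self.fetch_tile(tile_id)
        predicted = self._predict()
        if predicted is not None:                 # warm the cache for next time
            next_tile = apply_move(tile_id, predicted)
            if next_tile not in self.cache:
                self.cache[next_tile] = self.fetch_tile(next_tile)
        return data

# Example: tiles are (x, y) viewport coordinates; moves pan the view.
MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}

def apply_move(tile, move):
    dx, dy = MOVES[move]
    return (tile[0] + dx, tile[1] + dy)

cache = PrefetchingCache(fetch_tile=lambda t: f"data for {t}")
tile = (0, 0)
for m in ["right", "right", "right"]:   # repeated pans teach the model
    tile = apply_move(tile, m)
    cache.request(tile, m, apply_move)  # the third request is a cache hit
```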
Battle was keen to understand just how influential those latency issues really are, and what other factors may be at play. By carefully studying how users interacted with visualization interfaces while performing different tasks, she and her collaborators have debunked some of the prevailing wisdom about which factors shape users’ experiences with such systems, and how.
For example, in a paper presented at EuroVis 2019, organized by the Eurographics Working Group on Data Visualization and the IEEE Visualization and Graphics Technical Committee, Battle and Heer examined the impact of latency on exploratory visual analysis performed in Tableau. They compared their findings against the existing literature, which reinforced the assumption that slower interfaces lead analysts to perform fewer interactions and gain fewer insights from their data. Battle and Heer discovered that, in practice, the issue is more nuanced: while latency does have an impact, its influence has been overstated relative to other factors, such as the difficulty of the task users are trying to perform and the pacing of interactions based on “think time.” In another paper, this one appearing at the IEEE Information Visualization conference (InfoVis 2019), Battle and a multi-institutional team of researchers analyzed the impact of latency on visual search through a series of studies on Amazon Mechanical Turk. They found that while latency is a statistically significant predictor of user behavior under some conditions, in other cases factors such as task type, task complexity and the total number of interactions performed render latency virtually meaningless.
“What this research showed is that latency can have a more subtle and gradual effect than previously believed,” said Battle. “It’s still a factor, but it’s not the only one. This is a useful insight for designing better evaluations that reflect how people interact with these systems in the real world.”
Battle also upended the presumption that optimization schemes for well-researched use cases, such as online analytical processing (OLAP), carry over to interactive scenarios. In a paper that appeared at SIGMOD 2020, she and her colleagues presented a novel benchmark for evaluating how well database systems support ad-hoc, real-time data exploration. Unlike other benchmarks for measuring database system performance, the new framework captures the cadence and flow of real users’ data explorations to more accurately reflect the dynamic nature of the associated queries.
“The patterns of queries we see issued to database systems are fundamentally different between the interactive and OLAP scenarios. The performance expectations are also different,” Battle noted. “Our paper provided the first empirical evidence of this mismatch, which had already been informally observed in the industry.”
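The difference is easy to see in miniature. The sketch below, a simplified illustration rather than the benchmark from the paper, replays a logged exploration session at the user’s original pacing and scores the system on how often it responds within an interactive latency budget; a traditional OLAP benchmark would instead issue a fixed query suite back to back and report aggregate throughput. The trace contents, the replay function and the 500 ms budget are all assumptions made for the example.

```python
import time

# Hypothetical trace of one exploration session: (think_time_s, query)
# pairs. In a trace-driven benchmark these come from logged user studies;
# the queries here are placeholders.
TRACE = [
    (0.0, "SELECT ... initial overview ..."),
    (1.2, "SELECT ... zoomed-in region ..."),
    (0.4, "SELECT ... adjusted filter ..."),
]

INTERACTIVE_BUDGET_S = 0.5  # common rule of thumb for fluid interaction

def replay(trace, run_query):
    """Replay a session at its original cadence; return the fraction of
    queries the system answered within the interactive latency budget."""
    within_budget = 0
    for think_time, query in trace:
        time.sleep(think_time)                # reproduce the user's pacing
        start = time.perf_counter()
        run_query(query)                      # e.g., a DB-API cursor.execute
        elapsed = time.perf_counter() - start
        if elapsed <= INTERACTIVE_BUDGET_S:
            within_budget += 1
    return within_budget / len(trace)

# Usage with any callable that runs a query on the system under test:
# score = replay(TRACE, run_query=lambda q: connection.execute(q))
```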
Some of Battle’s latest work has focused on another potential mismatch, this time between the users and the designers of automated visualization recommendation systems. One of the selling points of these systems is that they alleviate some of the decision-making burden for analysts by suggesting which variables to focus on; by pointing users to the most salient visualizations and insights, the system speeds up the exploration process. In a paper published at the ACM CHI Conference on Human Factors in Computing Systems (CHI 2021), Battle and her colleagues assessed how people actually use auto-generated visualizations in order to understand how users’ attitudes and expectations shape their results. While their findings partially backed up assumptions about users’ trust in algorithmically generated visualizations, the team pointed to a number of ways to make these systems more useful to different user profiles, including designing for different “foraging” patterns and accounting for the potential of biases about source and quality to influence the behavior of a subset of users.
Despite the proliferation of visualization recommendation systems, there was no rigorous way to evaluate their suitability for various tasks until last year’s VIS 2021 conference, where Battle and her co-authors earned a Best Paper Honorable Mention for presenting the first framework for fairly and rigorously comparing such systems. In work that appeared at CHI 2022 earlier this month, Battle and her collaborators explored user preferences in a specific domain, public health, to understand what that category of analysts values most from visualization recommendation systems. She recently received a CAREER Award from the National Science Foundation to build on this line of work by developing new tools for evaluating how well different systems help analysts meet their particular data exploration goals.
“By making it easier and faster for analysts to explore their data, recommendation systems can improve the accuracy and rigor of the insights they take away from these sessions,” said Battle. “But if we can’t formally compare them, then we have no idea whether new ones we build actually provide any benefits over the old ones. We also lack an empirical understanding of which systems are best suited to specific users’ goals. Both are essential for making data visualizations more relevant and more useful to people.”
Battle was formally honored by the TCDE community during the 38th IEEE International Conference on Data Engineering (ICDE 2022) held earlier this month in Kuala Lumpur, Malaysia.
Congratulations, Leilani!