Rules are vital for building a safe, well-functioning online community, and Reddit is no exception. For community moderators, however, it can be difficult to make data-driven decisions about which rules are best for their community.
A team of researchers in the Allen School’s Social Futures Lab and Behavioral Data Science Lab conducted the largest-to-date analysis of rules on Reddit, examining over 67,000 rules and their evolution across more than 5,000 communities over a period of five years — accounting for almost 70% of all content on the platform. This study is the first to connect Reddit rules to community outcomes. The team found that rules on who participates, on how content is formatted and tagged, and on commercial activities were the most strongly associated with community members speaking positively about how their community is governed.
“This was my first paper, and I am extremely grateful for it to be named best paper of ICWSM 2025,” said lead author Leon Liebmann (B.S., ‘25), now at the online privacy company Westbold. “The work was difficult at times, and my advisors and co-authors Allen School Ph.D. student Galen Weld and professor Tim Althoff provided me with the direction and methods I needed to get it done. These people shaped my time in the Allen School and gave me a love for research I’d love to revisit.”
To better understand the rules on Reddit, the team first had to map out which communities had what rules and when. The researchers developed a retrieval-augmented GPT-4o model to classify rules into different categories based on their target, tone and topic. They then assessed the rules based on how common they were and how they varied across different communities, and also collected timelines on how the communities’ rules changed over time. At the same time, the researchers used a classification pipeline to identify posts and comments discussing community governance.
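To make the classification step concrete, below is a minimal sketch of what an LLM-based rule classifier could look like, using the openai Python client. The category lists, prompt wording and model call are illustrative assumptions rather than the authors' retrieval-augmented pipeline.

```python
# Minimal sketch of LLM-based rule classification. The categories and
# prompt are illustrative placeholders, not the authors' exact
# retrieval-augmented pipeline.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = """Classify the following subreddit rule along three dimensions.
Target: one of [posts, comments, users, moderators]
Tone: one of [prescriptive, restrictive, neutral]
Topic: one of [content, formatting, spam, respect, commercial, other]
Respond as JSON with keys "target", "tone" and "topic".

Rule: {rule}"""

def classify_rule(rule_text: str) -> dict:
    """Ask the model for a structured target/tone/topic label."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": PROMPT.format(rule=rule_text)}],
        response_format={"type": "json_object"},  # force parseable output
    )
    return json.loads(response.choices[0].message.content)

print(classify_rule("Please tag all spoilers and format code in code blocks."))
```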
Taken together, these findings can help inform Reddit moderators and community leaders about what rules their communities should have. The researchers found that the most common rules across communities covered post content, spam and low-quality content, and respect for others — guidelines that platforms could use to create “starter packs” for new communities. They also found that how moderators word rules can influence how positively or negatively communities view their governance. For example, prescriptive rules, or those that describe what community members should do, are viewed more favorably than restrictive rules that focus on what community members should not do. By phrasing rules prescriptively, moderators can help communities maintain a positive view of their governance.
In addition to Liebmann, Weld and Althoff, Allen School professor Amy X. Zhang was a co-author on the paper.
“The industry will continue to need smart, creative software engineers who understand how to build and harness the latest tools — including AI.” Allen School leaders Magdalena Balazinska and Dan Grossman examine some of the myths surrounding computer science careers in the era of artificial intelligence. Photo by Yevhen Smyk, Vecteezy
There has been a lot of chatter lately, online and in various media outlets, about the supposed dwindling prospects for new computer science graduates in the artificial intelligence era. Recent layoffs in the technology sector have students, parents and educators worried that a degree in computing, once seen as a sure path to a fulfilling career, is no longer a reliable bet.
“The alarmist doom and gloom prevalent in the news is not consistent with the experiences of the vast majority of our graduates,” said Magdalena Balazinska, professor and director of the Allen School. “The industry will continue to need smart, creative software engineers who understand how to build and harness the latest tools — including AI. And the fact remains that a computer science degree is great preparation for a broad range of fields within and beyond technology, including the natural sciences, finance, medicine and law.”
In our own version of MythBusters, we asked Balazinska and professor Dan Grossman, vice director of the Allen School, to examine the myths and realities surrounding AI and the prospects for current and future Allen School majors. Their answers indicate that the rumored demise of software engineering as a career path has been greatly exaggerated — and that no matter what path computer science majors choose after graduation, they can use their education to change the world.
Let’s start with the question on everyone’s mind: What’s the job market like for computer science graduates these days?
Dan Grossman: Individuals’ mileage may vary depending on a range of factors, but what’s being reported in some media outlets doesn’t reflect what we’re seeing here at the University of Washington. More than 120 different companies hired this year’s Allen School graduates into software engineering roles. Amazon alone hired more than 100 graduates from the Allen School’s 2024-25 class. Google and Meta didn’t hire at that scale, but they each hired 20 graduates this year — more than last year. Microsoft hired more than two dozen graduates from this latest class. I expect these numbers to grow as we hear from more graduates.
So, while the job market is tighter now than it was a few years ago, the sky is not falling. It’s important to remember that the Allen School is one of the top computer science programs in the nation, with a cutting-edge curriculum that evolves alongside the industry while remaining grounded in the fundamentals of our field. Our graduates are highly sought after, so their experience in the job market doesn’t necessarily reflect the experience of others. That has always been the case, even before this latest handwringing over AI. A B.S. in CS is not a uniform credential. The Allen School has always produced highly competitive graduates.
Magdalena Balazinska: In addition to those who found employment after graduation, more than 100 of our recent graduates opted to continue their education by enrolling in a master’s or Ph.D. program, which, of course, also makes us immensely proud!
How is AI impacting software engineering jobs?
Magdalena Balazinska: “The industry will continue to need smart, creative software engineers who understand how to build and harness the latest tools — including AI.”
MB: There are two factors: (1) AI’s impact on the work of a software engineer, and (2) AI’s impact on the job market for software engineers. Regarding the latter, it’s not so much that AI is taking the jobs, but that companies are having to devote tremendous resources to the infrastructure behind AI, which is very expensive. Also, many companies over-hired during COVID, and now they’re doing a course-correction for the AI era. I look at this as more of a reset. There’s no question that AI is affecting many areas of computing, just as it’s affecting just about every other sector of the economy. Companies will continue to invest in the people who know how to build and leverage these and other tools.
To my first point: With AI, we should expect the work of a software engineer to change, but to change in a really exciting way! The task of coding, or the translation of a very precise design into software instructions, can largely be handled by AI. But that’s not the most exciting or challenging part of software engineering. Understanding the requirements, figuring out an appropriate design, and articulating it as a precise specification are the hard parts. Going forward, software engineers will spend more time imagining what systems to build and how to organize the implementation of those systems, and then let AI handle many of the details of converting those ideas into code.
DG: One of our former faculty colleagues, Oren Etzioni, said, “You won’t be replaced by an AI system, but you might be replaced by a person who uses AI better than you.” I think that’s the direction we’re headed. Not AI as a replacement for people, but as a differentiator. Here at the Allen School, one of our goals is to enable students to differentiate themselves in this rapidly evolving landscape. For example, we are introducing a course on AI-aided software development, which will teach students how to effectively harness these tools.
How has AI affected student interest in the Allen School?
DG: Student interest remains strong — we received roughly 7,000 applications for Fall 2025 first-year admission.
MB: That may sound like a daunting number. However, we were able to offer admission to 37% of the Washington high school students who applied. That’s not as high as we would like it to be, but it’s far higher than public perception. We achieve this by heavily favoring applicants from Washington. For Fall 2025, we offered admission to only 4% of applicants from outside Washington.
If AI can write code, why should students major in computer science?
MB: Because computer science is so much more than coding! Creating a new system or application, perhaps a system to help the elderly take care of their daily tasks and manage their paperwork or a new approach for doctors to perform long-distance tele-operations, isn’t just a matter of “writing code.” A software engineer begins by clearly understanding the requirements — what the system needs to provide. Then the software engineer will decompose the problem into pieces, understand how those pieces will fit together, and anticipate failures. What happens if there is a power or network failure, or someone tries to hack the system? This gets progressively more challenging with the complexity and scale of systems that software engineers build, typically on teams with many people working together. Coding is the relatively easier part.
DG: In that spirit, here in the Allen School, we do teach students how to code, but as a component of how to envision, design and build systems and applications that solve complex problems and touch people’s lives. The principles, precision and reasoning gained from reading and writing code are a necessary foundation that serves our students very well — including graduates who now use AI in industry. It is the software engineers with the deepest knowledge who will be most effective at using AI to write their code, because they will know how and where AI can go wrong and how to steer it toward producing a correct output.
MB: Engineers have always used tools, and their tools have always advanced with time and opened the door to innovation. Thanks to developments like modern coding libraries and languages, online resources like Stack Overflow and GitHub, automated testing, cloud computing and more, software engineers today are far more efficient and can develop applications more quickly than ever before. And that was before AI for coding had really taken off. And yet, there are more software engineers doing more interesting and important things than ever before!
How does the Allen School prepare students for a workplace — and a world — being transformed by AI?
Dan Grossman: “One of our goals is to enable students to differentiate themselves in this rapidly evolving landscape.”
MB: As a leader in AI research, the Allen School is ideally positioned to help students learn how to use AI, how to build AI, and how to move the field of AI forward to benefit humanity. We give students multiple opportunities to explore AI topics and tools as part of our curriculum. Dan mentioned our AI-assisted software development course, and many of our other courses allow for using AI assistance in well-prescribed ways. This enables students to focus on core course concepts, generate more complex projects, and so on. Gaining experience with any AI tool can give a sense of what the technology can help with — along with its limitations. That said, we will continue in some courses to expect students to build, design, test, and document software without AI assistance.
DG: Our courses sometimes use the same cutting-edge tools used in industry, and other times will provide a simpler setting for pedagogical purposes. Software engineering tools change rapidly, so we tend not to get into the weeds on any one particular tool but give students the confidence to pick up future tools. Importantly, we don’t just teach students how to build and use AI. We also help them to think critically about the ethics and societal impacts of these technologies, such as their potential to reinforce bias or be used as a surveillance tool, and ways to mitigate those impacts.
MB: Another advantage we have at the Allen School is that we are a leading research institution, and our faculty are among the foremost experts in the field. This gives us the ability to incorporate new concepts and techniques into our coursework quickly. We also have a program devoted to supporting undergraduates in engaging in hands-on research in our labs alongside those very same faculty and our amazing graduate students and postdocs. Many students choose to get involved in research during their undergraduate studies.
What if a student is interested in computing, but not AI?
DG: Great! There are many open challenges across computing, from systems development, to human-facing interactive software design, to hardware design, to data management, and many others. Even if you are not using or developing AI itself, building systems that can run AI efficiently is driving a lot of exciting work in the field these days. While the big breakthroughs that have been driving rapid change over the last few years are AI-centered, computing remains a broad field.
MB: A student can major in computer science and follow their passion wherever it takes them. A subset of students will choose to study AI and build the next AI technologies, but the vast majority will use AI as a tool while building systems for medicine, education, transportation, the environment, and other important purposes. Or they will build back-end infrastructure at global companies like Google, Amazon, or Microsoft, or tackle other challenges like those that Dan mentioned. The more we advance computing, the more we open new opportunities. I think that’s why the number of software engineers just keeps growing. There is always more to do. The job is never done.
What is your advice to current and aspiring computer science majors who worry about their career prospects with the rise of AI?
MB: First, if you think you want to be a computer scientist or a computer engineer, pursue that! If you choose a major that you are excited about, you will not mind spending hours deepening your knowledge and sharpening your skills, which will help you to become an expert and to enjoy your chosen profession even more. My advice to every student is to take a broad range of challenging courses. Learn how to use the current tools, with the understanding that the tools you use today will not be the ones you use tomorrow. This field moves fast, which is what makes it exciting.
When it’s time to start your job search, whether for an internship or a full-time job, apply broadly. Apply to large companies, small companies, companies in various sectors, non-profits, and so on. Many organizations need software engineers! And not all interesting technical jobs that use a computing degree have the title of software engineer. Pick the position where you will learn the most. It’s important to optimize for learning and for growth, especially early on in one’s career.
DG: I would also remind students that a UW education is not about vocational training; our goal is that students graduate with the knowledge and skills to succeed in their chosen career, yes, but also to be engaged citizens of the world. While you’re here, make the most of your education — take a range of challenging courses and put in the time to learn the material. After all, it is a multi-year investment on your part, and the faculty have invested a lot of time and effort into creating a challenging, coherent curriculum for you. Take the hardest classes that you think are also the most exciting ones, and then focus on learning as much as you can.
Any final thoughts?
DG: Don’t choose a major solely because it’s popular. Choose a major that you’re passionate about. If that’s computer science or computer engineering, we’d love to see you at the Allen School. If it’s something else, we’d still love to see you in some of our classes.
MB: Everyone can benefit from learning at least a little computer science, especially now in the AI era!
Back in 2011, a team of University of Washington and University of California San Diego researchers published a paper detailing how they could remotely hack into a pair of 2009 Chevy Impalas. By targeting a range of attack vectors including CD players, Bluetooth and cellular radio, the researchers were able to control multiple vehicle functions, from the windshield wipers to the brakes.
Since its publication, the team’s research has helped lead to new standards for motor vehicle security and put the brakes on automobile cyberattacks. For their lasting contributions, their paper titled “Comprehensive Experimental Analyses of Automotive Attack Surfaces” received the Test of Time Award at the 34th USENIX Security Symposium in Seattle earlier this month.
Franziska Roesner
“I was only a first-year graduate student when we started this project, and I had just switched my focus to security. It was such a privilege to be able to help out on such an important and impactful project, and to learn from all of the other members of the team about how to do this kind of research,” said co-author Franziska Roesner (Ph.D., ‘14), Brett Helsel Professor and co-director of the Security and Privacy Research Lab in the Allen School.
Modern automobiles are made up of independent computers called electronic control units (ECUs), typically connected through the Controller Area Network (CAN), that oversee different vehicle functions. In a previous paper, the team found that an attacker who physically connected to the car’s internal network could override critical safety systems. Building off of that work, the researchers analyzed the modern automobile’s external attack surface and found that an adversary could hack into a car from miles away.
The team identified three categories of components that were vulnerable to cyberattacks. First, an attacker could use an indirect physical channel, such as tools that connect to the OBD-II port, which can access all CAN buses in the car, or the media player. For example, the researchers compromised the car’s radio and then used a doctored CD to upload custom firmware. Second, the team found that an attacker able to place a wireless transmitter in proximity to the car’s receiver could gain access to the ECUs via Bluetooth or even remote keyless entry. Third, attackers do not have to be nearby to wreak havoc: using long-range communication channels such as cellular, it is possible to exploit vulnerabilities in how the car’s telematics unit uses the aqLink software modem to remotely control the vehicle.
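Part of what made these channels so dangerous is the CAN bus design itself: frames carry an arbitration ID and up to eight data bytes but no sender authentication, so any node that reaches the bus can transmit arbitrary messages. Below is a minimal sketch of passively observing CAN traffic with the python-can library; the channel name is an illustrative assumption.

```python
# Minimal sketch of observing traffic on a CAN bus with python-can.
# CAN frames have no sender authentication, which is why bus access
# (e.g., via the OBD-II port) proved so powerful in the UW/UCSD study.
# The channel name assumes SocketCAN on Linux and is illustrative.
import can

bus = can.interface.Bus(channel="can0", interface="socketcan")

for _ in range(10):
    msg = bus.recv(timeout=1.0)  # block up to 1 s for the next frame
    if msg is not None:
        print(f"id=0x{msg.arbitration_id:03x} data={msg.data.hex()}")

bus.shutdown()
```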
“More than 10 years ago, we saw that devices in our world were becoming incredibly computerized, and we wanted to understand what the risks might be if they continued to evolve without thought toward security and privacy,” said senior author Tadayoshi Kohno, who was then a professor at the Allen School, now faculty at Georgetown University, in a UW News release.
The impact of the team’s work can still be felt today. As a result of the research, car manufacturers including GM have hired entire security teams. The work has influenced the development of guidelines for original equipment manufacturers (OEMs) and led to the creation of the Electronic Systems Safety Research Division at the National Highway Traffic Safety Administration. As cars grow increasingly connected and autonomous, the insights from the UW and UCSD collaboration will continue to help the automotive industry guard against emerging threats.
“Beyond the practical impact of the work, that experience has also made for great stories to tell in the computer security courses I teach now — for example, the time that we accidentally set the car’s horn to a permanent ‘on’ state while experimenting outside the Allen Center,” Roesner said.
Joining Roesner and Kohno at UW at the time of the original paper were Karl Koscher (Ph.D. ‘14), now a postdoc at UCSD, and Alexei Czeskis (Ph.D., ‘13), currently at LinkedIn. The original University of California San Diego group was made up of UCSD faculty members Stefan Savage (Ph.D., ‘02) and Hovav Shacham; Stephen Checkoway (B.S., ‘05), now faculty at Oberlin College; Damon McCoy, faculty at New York University; Danny Anderson, who runs a software consulting company; and late researcher Brian Kantor.
The Allen School at the University of Washington is working with Ai2 and other partners on a new initiative to advance open AI for science and the science of AI, with support from the U.S. National Science Foundation and NVIDIA.
The University of Washington’s Paul G. Allen School of Computer Science & Engineering has teamed up with the Allen Institute for AI (Ai2) on a new project aimed at developing the first fully open set of artificial intelligence tools to accelerate scientific discovery and enhance the United States’ leadership in AI innovation. Today the U.S. National Science Foundation (NSF) and NVIDIA announced a combined investment of $152 million in this effort, including $75 million awarded through the NSF’s Mid-Scale Research Infrastructure program.
Ai2 will lead the Open Multimodal AI Infrastructure to Accelerate Science (OMAI) project. The principal investigator is Ai2 Senior Director of NLP Research Noah A. Smith, who is also Amazon Professor of Machine Learning at the Allen School. Smith’s faculty colleague Hanna Hajishirzi, Torode Family Professor at the Allen School, is co-principal investigator on behalf of UW and also Ai2’s senior director of AI.
“OMAI is a terrific opportunity to leverage the longstanding partnership between Ai2 and the Allen School, which has yielded some of the most exciting developments in building truly open AI models and trained some of the most promising young scientists working in AI today,” said Hajishirzi. “This is a pivotal moment for us to form the foundation for scientific discovery and innovation across a variety of domains — and also, importantly, advance the science of AI itself.”
Noah A. Smith (left) and Hanna Hajishirzi aim to leverage the partnership between Ai2 and the Allen School to benefit science and society.
The cost of building and maintaining today’s AI models is prohibitive for all but the most well-resourced companies, leaving researchers in academic and not-for-profit labs without ready access to these powerful tools and stifling scientific progress. The goal of the OMAI project is to build out this foundational infrastructure through the creation and evaluation of models trained on open-access scientific literature and informed by the needs of scientists across a range of disciplines. By openly releasing the model weights, training data, code and documentation, the team will provide researchers using its tools with an unprecedented level of transparency, reproducibility and accountability, instilling confidence in both the underlying models and their results.
The concept for OMAI was incubated in an ecosystem of open research and collaboration that the Allen School and Ai2 have built since the latter’s founding in 2014. That ecosystem has enabled dozens of UW students to collaborate with Ai2 on research projects, produced leading-edge open AI artifacts like the Open Language Model (OLMo) and Tulu, and developed tools like OLMoTrace to give anyone full visibility into models’ training data — all of which have helped fuel Seattle’s emergence as a hub of AI innovation.
Smith looks forward to leveraging that longstanding synergy to push technologies that will have a transformational impact on the American scientific enterprise — and even transform the conversation around AI itself.
“There’s been a reaction that seems to be widespread that AI is a thing that is happening to us, as if we are passively subject to this technology and don’t have agency,” Smith said. “But we do have agency. We get to define what the priorities should be for AI and to build tools that scientists will actually be able to use and trust. With OMAI, the UW will be a leader in this new paradigm and push AI in a more responsible direction that will benefit society in a multitude of ways.”
In addition to the UW, academic partners in the OMAI project include the University of Hawai’i at Hilo, the University of New Hampshire and the University of New Mexico.
OMAI represents a landmark NSF investment in the technology infrastructure needed to power AI-driven science — a development that Brian Stone, performing the duties of the agency’s director, described as a “game changer.”
“These investments are not just about enabling innovation; they are about securing U.S. global leadership in science and technology and tackling challenges once thought impossible,” Stone said.
Understanding how different parts of the brain communicate is like trying to follow conversations at a crowded party — many voices overlap, some speakers are far away and others might be hidden entirely. Neuroscientists face a similar challenge: even when they can record signals from multiple brain regions, it is difficult to figure out who is “talking” to whom and what is being said.
Matt Golub
In a recent paper published at the 2025 International Conference on Machine Learning (ICML), a team of researchers led by Allen School professor Matt Golub developed a new machine learning technique to cut through that noise and identify communication between brain regions. The technique, called Multi-Region Latent Factor Analysis via Dynamical Systems (MR-LFADS), uses multi-region neural activity data to decode how different parts of the brain talk to each other — even when some parts can’t be directly observed.
“The many regions within your brain are constantly talking to each other. This communication underlies everything our brains do for us, like sensing our environment, governing our thoughts, and moving our bodies,” said Golub, who directs the Systems Neuroscience & AI Lab (SNAIL) at the University of Washington. “In experiments, we can monitor neural activity within many different brain regions, but these data don’t directly reveal what each region is actually saying — or which other regions are listening. That’s the core challenge we sought to address in this work.”
Unlike existing approaches, MR-LFADS is able to automatically account for unobserved brain regions. For example, neuroscientists can use electrodes to simultaneously monitor the activity of large populations of individual neurons across multiple brain regions. However, this activity may be influenced by neurons and brain regions that are not being recorded, explained Belle Liu, UW Department of Neuroscience Ph.D. student and the study’s lead author.
“Imagine trying to understand a conversation when you’re not able to hear one of the key speakers. You’re only hearing part of the story,” Liu said.
To overcome this, the team devised a custom deep learning architecture to detect when a recorded region reflects an unobserved influence and to infer what the unobserved region was likely saying.
“We wanted to make sure the model can’t just pipe in any unobserved signal that you might need to explain the data,” said co-author and Allen School postdoc Jacob Sacks (Ph.D., ‘23). “Instead, we figured out how to encourage the model to infer input from unobserved sources only when it’s very much needed, because that information can’t be found anywhere else in the recorded neural activity.”
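The sketch below illustrates that idea in spirit rather than the actual MR-LFADS architecture: a recurrent latent-dynamics model explains observed activity, an extra channel infers a per-timestep external input, and a penalty on that input discourages the model from invoking unobserved sources unless the recorded activity cannot explain the data. All dimensions, weights and data are illustrative.

```python
# Toy sketch of penalized inferred inputs, illustrating the spirit of
# the approach rather than the actual MR-LFADS architecture: a recurrent
# latent-dynamics model explains observed firing rates, and an inferred
# per-timestep input is L2-penalized so it is used only when needed.
import torch
import torch.nn as nn

class PenalizedInputDynamics(nn.Module):
    def __init__(self, n_neurons: int, n_latent: int = 16):
        super().__init__()
        self.encoder = nn.GRU(n_neurons, n_latent, batch_first=True)
        self.infer_input = nn.Linear(n_latent, n_latent)  # inferred external input
        self.readout = nn.Linear(n_latent, n_neurons)     # latents -> firing rates

    def forward(self, spikes):                # spikes: (batch, time, neurons)
        latents, _ = self.encoder(spikes)
        inferred = self.infer_input(latents)  # candidate unobserved influence
        rates = self.readout(latents + inferred).exp()  # positive Poisson rates
        return rates, inferred

model = PenalizedInputDynamics(n_neurons=50)
spikes = torch.poisson(torch.ones(8, 100, 50))  # stand-in neural recordings
rates, inferred = model(spikes)

# Poisson reconstruction loss plus a penalty that discourages the model
# from "piping in" unobserved input unless the data demand it.
reconstruction = nn.functional.poisson_nll_loss(rates, spikes, log_input=False)
penalty = 1e-2 * inferred.pow(2).mean()
loss = reconstruction + penalty
loss.backward()
print(float(loss))
```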
The team tested MR-LFADS using both simulated brain networks and real brain data. First, they designed simulated multi-region brain activity that reflected complicated scenarios for studying brain communication, such as giving each region unique signals from both observed and unobserved sources. The model’s challenge is to recover those signals and disentangle their sources: whether they come from observed regions — and if so, which ones — or from unobserved regions. The researchers found that their model was able to infer this communication more accurately than existing approaches. When applied to real neural recordings, MR-LFADS could even predict how disrupting one brain region would impact another — effects that it had never seen before.
By helping neuroscientists better map brain activity, this model can help provide insights into treatments for various brain disorders and injuries. For example, different parts of the brain communicate in certain ways in healthy individuals, but “something about that communication gets out of whack in a diseased state,” explained Golub. Understanding when and how that communication breaks down might enable the design of therapies that intervene in just the right way and at just the right time.
“Models and techniques like these are desperately needed for basic neuroscience to understand how distributed circuits in the brain work,” Golub said. “Neuroscientists are rapidly improving our ability to monitor activity in the brain, and these experiments provide tremendous opportunities for computer scientists and engineers to model and understand the intricate flow of computation in the brain.”
The RoboFly (left) in comparison to the TinySense sensor (center) next to a pencil for scale.
Flying insect robots (FIRs) have the potential for use in search and rescue operations, environmental monitoring and even space missions due to their small size and low material cost. The challenge, however, is finding the minimum sensor suite and computation resources, or avionics, needed for the robot to maintain flight and control.
A team of researchers in the University of Washington’s Autonomous Insect Robotics (AIR) Lab developed TinySense, the lightest avionics system to date with the potential to enable FIR sensor autonomy. Smaller than a penny and less than half the size of the previous lightest avionics system, TinySense features a global shutter camera, a gyroscope and a pressure sensor to help the FIR estimate the variables needed to control hover — pitch angle, translational velocity and altitude. The team presented their research titled “TinySense: A Lighter Weight and More Power-efficient Avionics System for Flying Insect-scale Robots” at the 2025 IEEE International Conference on Robotics and Automation (ICRA), where they received the Best Student Paper Award.
“Despite huge progress towards flying insect robots like the UW’s RoboFly and Harvard’s RoboBee, none have yet been able to fly using only sensors carried onboard,” said co-lead author and Allen School undergraduate student Joshua Tran. “The TinySense is light and efficient enough to finally make this feat a possibility, and opens the door to many other tiny flying applications like the TinyQuad and Coincopter, gram-scale propeller drones also from our lab.”
The TinySense sensor is smaller than a penny.
TinySense builds on and improves previous FIR sensor suites from the AIR Lab to create an avionics system that is even better tailored in mass and energy consumption to an insect-scale robot. To reduce the system’s mass and power needs, the team first replaced the power-hungry laser rangefinder with a lighter and more efficient Bosch BMP390 pressure sensor. They then replaced the bulky optic flow sensor with a novel global shutter camera and a custom-written optic flow algorithm running on a 10 milligram microcontroller — small enough to fly onboard an FIR. TinySense weighs approximately 75 milligrams and consumes about 15 milliwatts of power in flight.
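The pressure sensor’s role can be illustrated directly: a barometer yields altitude through the standard-atmosphere formula, and hovering needs only relative altitude, so drift in the reference pressure matters less. A minimal sketch follows, with illustrative readings standing in for real BMP390 output.

```python
# Minimal sketch of barometric altitude estimation, the role the BMP390
# plays in TinySense. The pressure readings below are illustrative
# stand-ins for real sensor output.
SEA_LEVEL_PA = 101_325.0  # standard reference pressure (Pa)

def pressure_to_altitude(pressure_pa: float, p0: float = SEA_LEVEL_PA) -> float:
    """Convert absolute pressure to altitude via the barometric formula."""
    return 44_330.0 * (1.0 - (pressure_pa / p0) ** (1.0 / 5.255))

# Hover control needs altitude *changes*; near sea level, pressure drops
# by roughly 12 Pa per meter of height gained.
ground = pressure_to_altitude(101_300.0)  # reading at takeoff
hover = pressure_to_altitude(101_288.0)   # reading about 1 m up
print(f"estimated height above ground: {hover - ground:.2f} m")
```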
“The team made important contributions in a number of areas that hadn’t previously been addressed because nobody has been thinking deeply about how to make flight controllers really efficient and lightweight. They built a new ultra-light flex circuit, their own camera optics and then performed extensive validation on the full system they created,” said senior author Sawyer Fuller, UW Department of Mechanical Engineering professor and Allen School adjunct faculty member.
The team demonstrated the TinySense sensor suite onboard the Crazyflie, the smallest commercially available sensor-autonomous flying robot, and found that TinySense performed comparably to the industry-standard sensors on the Crazyflie. In future work, the team aims to integrate TinySense into RoboFly so that it can, for the first time, hover without needing external sensors.
(From left to right) Joshua Tran, Claire Li and Zhitao Yu earned a Best Student Paper Award for TinySense at ICRA.
“It was exciting to hear the interest in the TinySense project and its future integration with the RoboFly at ICRA,” said co-author and Allen School undergraduate student Claire Li.
For co-lead author and mechanical engineering Ph.D. student Zhitao Yu, working on TinySense also gave him the opportunity to help mentor the next generation of researchers.
“Mentoring Josh and Claire was a rewarding experience on this project,” said Yu. “It was great to see them grow into confident researchers and contribute meaningfully to such a challenging and impactful system.”
The cardinality estimation problem, or the challenge of accurately predicting the size of the output to a query without actually evaluating the query, is one of the oldest and most important problems in databases and data management. Cardinality estimation helps guide decisions on every aspect of query execution, from how much memory should be allocated for storing the query result to the number of servers needed to successfully process an expensive query. However, cardinality estimation is notoriously difficult; current methods can often have large errors, leading to poor decisions downstream.
Dan Suciu
A team of researchers led by Allen School professor Dan Suciu of the UW Database Group introduced a new pessimistic cardinality estimator called LpBound, which provides a guaranteed upper bound on the query output size. The method offers a strong theoretical guarantee: for any database that meets the given statistics, the query output size will always be below the bound set by LpBound. They presented their research titled “LpBound: Pessimistic Cardinality Estimation using Lp-Norms of Degree Sequences” at the 2025 ACM SIGMOD/PODS International Conference on Management of Data last month and received a Best Paper Award for their work.
“Cardinality estimation is difficult, because it needs to rely on a very small amount of information (statistics on the input data), and needs to produce an accurate estimate,” said senior author Suciu, who also holds the Microsoft Endowed Professorship in the Allen School. “The novel solution described in the paper estimates the cardinality of the output by using simple statistics on the input data, and applying Shannon inequalities from information theory. The method outperforms not only traditional cardinality estimators, but also novel estimators based on machine learning.”
The LpBound cardinality estimator provides several advantages over other learned estimators currently available, including FactorJoin, BayesCard and DeepDB. In addition to the guaranteed upper bounds, it has low estimation time and error as well as modest space requirements, making it useful for practical applications. LpBound also works for both cyclic and acyclic queries — meaning it can estimate cardinality in traditional SQL workloads, which are often acyclic, and in graph pattern matching or SPARQL queries, which are more likely to be cyclic. When integrated into the PostgreSQL query optimizer, the researchers found that LpBound’s estimates led to query plans as good as those made from true cardinalities, demonstrating its applicability to real-world database systems.
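The flavor of the approach shows up even on a two-table join: for a join of R(x,y) and S(y,z) on y, the output size is the sum over y of deg_R(y) times deg_S(y), which Hölder’s inequality bounds by the product of the Lp-norm of R’s degree sequence and the Lq-norm of S’s, for any 1/p + 1/q = 1. The toy sketch below computes several such bounds and takes the minimum; it illustrates the underlying inequality only and is not the LpBound implementation.

```python
# Toy illustration of pessimistic join-size bounds from Lp-norms of
# degree sequences (the underlying inequality, not LpBound itself).
# For R(x,y) JOIN S(y,z): |output| = sum_y deg_R(y) * deg_S(y), and
# Holder's inequality gives |output| <= ||deg_R||_p * ||deg_S||_q
# whenever 1/p + 1/q = 1.
from collections import Counter

R = [(x, y) for x in range(100) for y in range(x % 7)]  # toy relation R(x, y)
S = [(y, z) for y in range(7) for z in range(y * 3)]    # toy relation S(y, z)

deg_R = Counter(y for _, y in R)  # degree sequence of R on the join column
deg_S = Counter(y for y, _ in S)

def lp_norm(degrees, p):
    if p == float("inf"):
        return float(max(degrees))
    return sum(d ** p for d in degrees) ** (1.0 / p)

true_size = sum(deg_R[y] * deg_S[y] for y in deg_R)

# Every Holder pair (p, q) yields a guaranteed upper bound; take the min.
pairs = [(1, float("inf")), (2, 2), (float("inf"), 1)]
bounds = {(p, q): lp_norm(deg_R.values(), p) * lp_norm(deg_S.values(), q)
          for p, q in pairs}

print("true output size:", true_size)
for (p, q), bound in bounds.items():
    assert bound >= true_size  # pessimistic: never underestimates
    print(f"p={p}, q={q} -> bound {bound:.1f}")
print("best bound:", round(min(bounds.values()), 1))
```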
Magdalena Balazinska, professor and director of the Allen School, has been elected a member of the Washington State Academy of Sciences (WSAS) in recognition of her “contributions in data management for data science, big data systems, cloud computing, and image/video analytics and leadership in data science education.” The WSAS was established in 2005 as a source of independent, evidence-based scientific and technical advice for state policy makers, modeled after the National Academies of Sciences, Engineering, and Medicine. Balazinska, who was directly elected by her WSAS peers, is one of 36 members in the 2025 class.
“We are pleased to recognize the achievements of these world-renowned scientists, engineers, and innovators,” said WSAS President Allison Campbell. “And we are grateful for their willingness to contribute expertise from a wide range of fields and institutions to support the state in making informed choices in a time of growing complexity.”
One of Balazinska’s most influential achievements has been her foundational work on Borealis, a distributed stream processing engine that made large-scale, low-latency data processing more dynamic, flexible and fault tolerant for a variety of applications, from financial services and industrial processing, to network monitoring and wireless sensing. Borealis introduced the ability to quickly and easily modify queries at runtime in response to current conditions, correct query results to account for newly available data, and allocate resources and optimize performance across a variety of networks and devices. Earlier this year, Balazinska and her collaborators earned a Test of Time Award at the Conference on Innovative Data Systems Research (CIDR 2025) for their work on Borealis. They received a Test of Time Award in 2017 from the Association for Computing Machinery Special Interest Group on Management of Data (ACM SIGMOD) for a related paper expanding the system’s fault tolerant stream processing capabilities.
Balazinska also advanced the then-burgeoning field of “big data,” particularly for scientific applications. She co-led the design and development of Myria, a fast, flexible, open-source cloud-based service that enabled domain experts across various scientific fields to perform big data management and analytics. Myria was designed for efficiency and ease of use; it also functioned as a test-bed for Balazinska and her colleagues to explore new directions in data management research in response to real users’ needs. Her work on Myria and related projects earned Balazinska the inaugural VLDB Women in Database Research Award at the International Conference on Very Large Databases in 2016.
More recently, Balazinska has focused on data management for visually intensive applications such as video and augmented, virtual and mixed reality. For example, she and her collaborators developed VOCAL, or Video Organization and Compositional AnaLytics, to make it easier for users to organize and extract information from any video dataset. In the absence of a pretrained model, the system combines active learning with a clustering technique to reduce the manual effort involved in identifying and labeling features. It also supports compositional queries for analyzing the interaction of multiple objects over time, and it can self-enhance its own capabilities by using large language models (LLMs) to identify and generate missing functionality in response to user workloads.
Balazinska, who has served as director of the Allen School since 2020, holds the Bill & Melinda Gates Chair in Computer Science & Engineering at the University of Washington and is a senior data science fellow in the eScience Institute. She previously served as director of the eScience Institute and associate vice provost for data science at the UW, in addition to co-chairing the National Science Foundation’s Advisory Committee for Computer and Information Science and Engineering (CISE). Last year, Balazinska was appointed to Washington state’s Artificial Intelligence Task Force charged with developing recommendations on potential guidelines or legislation governing the use of AI systems. She currently co-chairs two task-force subcommittees focused on AI in education and workforce development and in health care and accessibility, respectively.
A total of 12 UW faculty members were elected as part of the incoming WSAS class, which also includes Allen School adjunct professor Julie Kientz, chair of the Department of Human-Centered Design & Engineering. Kientz was recognized for her research and leadership in human-computer interaction that “has advanced health and education technology, influenced policy, and shaped the HCI field through impactful scholarship, interdisciplinary collaboration, and inclusive, real-world technology design.” Balazinska, Kientz and their colleagues will be formally inducted at an event marking the Academy’s 20th anniversary in October.
Balazinska is the fourth Allen School faculty member to be elected to the WSAS; professors Anna Karlin and Ed Lazowska and professor emeritus Hank Levy previously joined following their elections to the National Academies of Science and/or Engineering.
Allen School professor Abhishek Gupta, who directs the Washington Embodied Intelligence and Robotics Development (WEIRD) Lab, is interested in developing ways to help robots learn new skills with minimal human help and engineering. Gupta joined the Allen School faculty in 2022, and already his research has shaped the future of robotics, earning him the IEEE Robotics and Automation Society (RAS) Early Academic Career Award.
“It is an honor to receive this award, which will support our group’s ongoing research into robot learning methods that are deployable and improvable in high-impact, human-centric environments,” Gupta said.
A robot uses the Cherrybot reinforcement learning system to acquire fine manipulation skills such as picking up a cherry.
Gupta’s research has focused on developing methods that make it practical for robots to improve safely and reliably through reinforcement learning. However, applying reinforcement learning to real-world robotics presents challenges, from safety, to reward specification, to efficiency; as such, its success has been limited to controlled settings or simulation. To address these challenges, Gupta established that robotic systems learning in the real world need to be able to determine the success measure from their own sensory input, and then reset the environment without human help so they can retry solving a task and learn from a small set of real-world interactions. He subsequently demonstrated what that process looks like via projects focused on dexterous, multi-fingered hands, fine manipulation tasks and teaching robots to grasp different objects through expert demonstrations.
His work was one of the first to propose solutions to the unavailability of automatic resets — one of the most fundamental, yet often overlooked, obstacles to implementing robot learning in the real world. He developed a formalism and set of benchmark tasks that help robots navigate a continual, non-episodic world without assuming access to an oracle reset mechanism. His later work recast the reset-free learning problem as a multi-task learning problem, in which a robot performing some tasks resets the environment for others. These systems and algorithms set the stage for the next generation of deployed systems that will not just remain static, but improve autonomously on the job through multi-task, reset-free data collection. Gupta has also built a range of reinforcement learning libraries and tooling to make real-world learning accessible to a broader range of developers.
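A toy sketch of that reset-free idea follows, with an illustrative one-dimensional task standing in for a real robot: when the forward task succeeds, its end state becomes the start state of a reset task, so practice continues without human intervention.

```python
# Toy sketch of reset-free learning as multi-task learning: finishing
# the "forward" task sets up the "reset" task and vice versa, so the
# robot keeps practicing without a human resetting the scene. The 1-D
# object position and stand-in policy are illustrative.
import random

position, start, goal = 0.0, 0.0, 1.0

def act(target: float) -> bool:
    """One noisy step of a stand-in policy; success is read from state."""
    global position
    position += 0.25 * (target - position) + random.gauss(0.0, 0.02)
    return abs(position - target) < 0.05

task = "forward"
for _ in range(200):
    if act(goal if task == "forward" else start):
        # Completing one task is exactly the setup for the other.
        task = "reset" if task == "forward" else "forward"

print("final position:", round(position, 3), "| current task:", task)
```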
Building off of that research, Gupta has been investigating how leveraging alternative sources of data such as generative models, simulation and videos can help scale up robotic learning. He and his collaborators developed GenAug, short for generative augmentation, one of the first systems to use a diffusion model to synthetically modify robotic images for improved generalization. By drawing on pre-trained generative models, GenAug tackles the shortage of in-domain robotics data.
Using the RialTo system, robots can learn new skills in “digital twin” simulation environments that they can then transfer to the real world. (Dennis Wise/University of Washington)
Gupta has also introduced a new method for robotic learning in simulation using a real-to-simulation-to-real approach. Using small amounts of real-world data, researchers can construct a simulation of the deployment area that the robot can interact with and learn from. Simulation has also helped in designing effective policies that, when deployed in the real world, help robots perform tasks with many variations and disturbances. For example, robots using this framework can efficiently put away dishes in a dish rack while accounting for different dish shapes and configurations. Through a set of algorithmic ideas, Gupta and his collaborators were able to directly transfer behaviors from simulation to reality, and then efficiently fine-tune those behaviors using small amounts of real-world experience. More recently, Gupta and his students have developed techniques for learning unified prediction and control models from raw video experience, allowing for the use of large internet-scale datasets in robot learning.
In addition to pioneering real-world reinforcement learning, Gupta has developed methods for unsupervised and self-supervised reinforcement learning. Combining the best of both model-based and model-free reinforcement learning, he introduced a simple and effective self-supervised technique that makes successor representations more practical with deep reinforcement learning methods. These representations help predict how likely a robot is to visit different states in the future. He and his collaborators were also among the first to develop a method that enables robots to learn useful skills without a reward function. This work has prompted a subcommunity of research on unsupervised reinforcement learning and skill discovery for both robotics and machine learning.
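Successor representations themselves are classical: M(s, s′) estimates the expected discounted number of future visits to state s′ from state s, and it obeys a temporal-difference-style update. The tabular sketch below shows only that underlying idea; Gupta’s contribution is making such representations practical with deep reinforcement learning, which this toy example does not attempt.

```python
# Tabular sketch of the classic successor representation (SR): M[s, s']
# estimates the expected discounted number of future visits to s' from
# s. This shows only the textbook TD-style update on a random walk;
# making SR practical with deep RL is the hard part.
import numpy as np

n_states, gamma, alpha = 5, 0.9, 0.1
M = np.zeros((n_states, n_states))
rng = np.random.default_rng(0)

s = 0
for _ in range(50_000):
    s_next = (s + rng.choice([-1, 1])) % n_states  # random-walk policy
    # TD update: M(s,.) <- M(s,.) + alpha * (1[s] + gamma * M(s',.) - M(s,.))
    M[s] += alpha * (np.eye(n_states)[s] + gamma * M[s_next] - M[s])
    s = s_next

# For a fixed policy, the rows converge to (I - gamma * P)^-1.
P = np.zeros((n_states, n_states))
for i in range(n_states):
    P[i, (i - 1) % n_states] = P[i, (i + 1) % n_states] = 0.5
print("learned:  ", np.round(M[0], 2))
print("analytic: ", np.round(np.linalg.inv(np.eye(n_states) - gamma * P)[0], 2))
```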
Abhishek Gupta (center) poses with members of his WEIRD Lab among their robots. (Dennis Wise/University of Washington)
“Abhishek’s work has been consistently creative, innovative and practical, making a significant impact on the current and future state of robotic reinforcement learning,” Allen School professor Dieter Fox said. “He’s been a wonderful person to collaborate with, and we are very excited to have him in the Allen School. Abhishek’s work is laying the foundation for the next generation of robot learning, and he is poised to become one of the key leaders in our field.”
Prior to earning this year’s RAS Early Academic Career Award, Gupta received a 2024 Amazon Science Hub Research Award and was named a 2023 Toyota Research Institute Young Faculty Investigator.
Allen School Ph.D. student Cheng-Yu Hsieh is interested in tackling one of the biggest challenges in today’s large-scale machine learning environment — how to make artificial intelligence development more accessible. Large foundation models trained on massive datasets have revolutionized AI; however, these scaling efforts are often out of reach for all but the most well-resourced companies, Hsieh explained. His research focuses on making both data and model scaling more efficient and affordable to help democratize AI development.
“I develop effective data curation techniques for training large foundation models, as well as efficient methods to deploy and adapt these models to various downstream applications. My research spans key stages of today’s artificial intelligence development pipeline,” said Hsieh, who works with professor Ranjay Krishna in the UW RAIVN Lab and affiliate professor Alex Ratner in the Data Science Lab.
For his contributions, Hsieh was awarded a 2024 Google Ph.D. Fellowship in machine intelligence. The fellowship recognizes outstanding graduate students from around the world representing the next generation of leaders with the potential to influence the future of technology through their research in computer science and related fields.
“This fellowship will support my research on making large-scale AI systems more efficient, accessible and adaptable. I’m excited to continue exploring how we can make AI technology more sustainable and inclusive,” Hsieh said.
Hsieh designs methods to help mitigate the high costs and other complexities associated with large-scale AI model development. For example, one of the major bottlenecks in today’s machine learning pipeline is manually labeling or curating large datasets, which can be labor intensive. Hsieh and his collaborators introduced Nemo, an end-to-end interactive system that guides users through creating informative datasets using weak supervision techniques in order to lower the barrier for building capable AI models in low-resource settings. Nemo was able to improve overall workflow efficiency by 20% on average compared to other weak supervision approaches, Hsieh found.
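The weak supervision idea underlying Nemo is to replace hand-labeling with cheap, noisy labeling functions whose votes are aggregated into training labels. The bare-bones sketch below uses simple majority voting; the heuristics are illustrative, and Nemo’s actual contribution, the interactive loop guiding users toward informative data and labeling functions, is omitted.

```python
# Bare-bones weak supervision sketch: noisy labeling functions vote on
# unlabeled text and a majority vote aggregates them. The heuristics
# are illustrative; Nemo's interactive guidance loop is not shown.
from collections import Counter

ABSTAIN, HAM, SPAM = None, 0, 1

def lf_has_link(text):        # cheap heuristic #1
    return SPAM if "http" in text else ABSTAIN

def lf_mentions_prize(text):  # cheap heuristic #2
    return SPAM if "win" in text.lower() else ABSTAIN

def lf_short_reply(text):     # cheap heuristic #3
    return HAM if len(text.split()) < 5 else ABSTAIN

LABELING_FUNCTIONS = [lf_has_link, lf_mentions_prize, lf_short_reply]

def weak_label(text):
    """Majority vote over the labeling functions that did not abstain."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

docs = ["Win a free phone at http://spam.example", "thanks, see you tomorrow"]
print([(doc, weak_label(doc)) for doc in docs])
```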
Some of his research projects have been put into practice and have already made a real-world impact. As part of a collaboration between UW and Google, Hsieh helped develop the distilling step-by-step method, which enables users to train smaller task-specific models using less training data than standard fine-tuning or distillation approaches require. With this method, a smaller 770M-parameter T5 model trained on only 80% of the data from a benchmark can outperform a much larger 540B-parameter PaLM model. The team launched the project on Google Vertex AI, the company’s generative AI development platform, and Google highlighted the research at the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). Hsieh’s research into model adaptation was also integrated into the Vertex platform, allowing users to adapt models to new applications without needing explicit training data.
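Schematically, distilling step-by-step trains the small model on two tasks at once, predicting the teacher’s label and generating the teacher’s rationale, and mixes the two losses. The sketch below illustrates that objective with a small T5 from Hugging Face; the prompt prefixes, mixing weight and example are illustrative assumptions rather than the released training code.

```python
# Schematic sketch of the distilling step-by-step objective: train a
# small model to produce both the label and the teacher's rationale,
# mixing the two seq2seq losses. The prefixes, weight and example are
# illustrative, not the exact released training code.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
lam = 0.5  # weight on the rationale-generation task

question = "Is 17 a prime number?"
label = "yes"                                              # from the teacher
rationale = "17 has no divisors other than 1 and itself."  # teacher rationale

def seq2seq_loss(prefix: str, target: str) -> torch.Tensor:
    inputs = tokenizer(prefix + question, return_tensors="pt")
    targets = tokenizer(target, return_tensors="pt").input_ids
    return model(**inputs, labels=targets).loss

# Two tasks on the same input, distinguished by prompt prefixes.
loss = (1 - lam) * seq2seq_loss("[label] ", label) + \
       lam * seq2seq_loss("[rationale] ", rationale)
loss.backward()
print(float(loss))
```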
“Cheng-Yu is a self-sufficient, diligent, effective and productive researcher,” said Krishna. “His recent papers propose solutions to a wide range of pertinent problems in natural language processing, efficient machine learning and retrieval augmented generation, and I have no doubt that he will continue to produce impactful research.”
In future research, as part of his goal to make data and model scaling more efficient and affordable, Hsieh is interested in developing new approaches for querying powerful, but oftentimes expensive, generative AI models to help create informative and controllable datasets for model training and alignment.
“This fellowship is both a recognition of the work I’ve done and an incredible encouragement to continue pushing my research direction in AI. I am very thankful to my advisors, mentors and collaborators who have supported me along the way,” Hsieh said. “I am excited to continue pursuing research with real-world impact in this fast-paced era of AI development.”