
One of the foundations of database theory is efficient query execution. On the theory side, researchers have tackled this problem by proving upper bounds on the size of query results, which characterize the fundamental hardness of query evaluation. Other researchers have taken a practical approach, honing query optimizers that automatically construct query evaluation algorithms based on various properties of the data. These two approaches have converged in the form of worst-case optimal join algorithms, which guarantee that a query runs in time proportional to the largest output it could produce on any dataset satisfying certain statistics about the queried data.
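To make that guarantee concrete, here is a standard textbook illustration, not taken from the paper itself: the triangle query over three binary relations, evaluated under cardinality constraints alone.

```latex
% The triangle query joins three binary relations:
%   Q(a,b,c) = R(a,b) \Join S(b,c) \Join T(a,c)
% If the only statistics available are the cardinality constraints
% |R|, |S|, |T| <= N, the AGM bound limits the output size to
\[
  |Q| \;\le\; \sqrt{|R| \cdot |S| \cdot |T|} \;\le\; N^{3/2},
\]
% so a worst-case optimal join evaluates Q in roughly N^{3/2} steps,
% while a traditional plan of pairwise joins can take on the order of
% N^2 steps on adversarial data.
```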
In a paper titled “Partition Constraints for Conjunctive Queries: Bounds and Worst-Case Optimal Joins,” Allen School Ph.D. student Kyle Deeds introduced a new statistic called partition constraints that can improve existing worst-case optimal join algorithms by capturing latent structure in the data's relations. The rigidity of the traditional constraints used in worst-case optimal join algorithms, such as degree constraints or cardinality constraints, can end up hindering query execution speed. Partition constraints extend the notion of degree constraints by allowing a relation to be divided into smaller, more manageable sub-relations, each carrying its own degree constraint.
Deeds and his collaborator Timo Camillo Merkl of TU Wien presented their research at the 28th International Conference on Database Theory (ICDT) in Barcelona, Spain, last month, where their work received both the Best Student Paper and Best Paper Awards.
“A core problem of database theory is query execution,” said Deeds, who is advised by professors Dan Suciu and Magdalena Balazinska in the University of Washington’s Database Group. “What this work did was present a new kind of statistic, called partition constraints, on that data that we can take into account when we are executing a query, essentially speeding up the whole process for data that has this nice structure.”
Consider, as an example, a database that records who has keycard access to which rooms at a university. Faculty and students may only need to enter a few rooms, such as lecture halls and offices, while security and cleaning staff may need to reach many more rooms, or perhaps all of them. It then makes sense to partition the keycard relation into two parts, one tracking the many people who hold only a few keys and one tracking the various room caretakers who hold many, so that the system can efficiently determine who has access to which rooms.
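A minimal sketch of that idea in Python, with entirely hypothetical data and a made-up threshold, shows how splitting one relation yields two fragments that each satisfy a tight constraint the whole relation does not:

```python
from collections import defaultdict

# Hypothetical keycard relation: (person, room) pairs.
access = [
    ("alice", "CSE101"), ("alice", "CSE203"),   # a student with two keys
    ("bob",   "CSE101"),                        # a professor with one key
    ("carol", "CSE101"), ("carol", "CSE203"),
    ("carol", "CSE301"), ("carol", "CSE403"),   # cleaning staff with many keys
]

THRESHOLD = 3  # hypothetical cutoff between "few" and "many" keys

keys_held = defaultdict(list)
for person, room in access:
    keys_held[person].append(room)

# "light" fragment: every person appears with fewer than THRESHOLD rooms,
# so it satisfies a low degree constraint on the person attribute.
light = [(p, r) for p, rooms in keys_held.items()
         if len(rooms) < THRESHOLD for r in rooms]

# "heavy" fragment: people with many keys; few such people exist,
# so the fragment satisfies a small bound on distinct persons.
heavy = [(p, r) for p, rooms in keys_held.items()
         if len(rooms) >= THRESHOLD for r in rooms]

print("light fragment:", light)
print("heavy fragment:", heavy)
```

A join algorithm can then exploit a different constraint on each fragment and combine the partial results, rather than being limited by the single worst-case degree of the whole relation.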
Deeds took inspiration for partition constraints from another field of computer science: graph theory, the study of networks. In graph theory, degeneracy measures how sparse a graph is. In a network with low degeneracy, researchers can design algorithms that limit the performance impact of highly connected vertices, making some graph algorithms more efficient. Applied to partitioning in databases, the same property can help maintain performance even when tables contain attributes with very high degrees. The researchers developed partition constraints as a declarative version of graph degeneracy for higher-arity data, that is, data with more attributes or fields.
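For readers unfamiliar with degeneracy, the short sketch below shows the standard way to compute it, repeatedly peeling off a minimum-degree vertex; the star-graph example illustrates how a graph can contain a very high-degree hub yet have degeneracy of only 1. This is textbook material, not code from the paper:

```python
def degeneracy(adj):
    """Compute graph degeneracy. adj maps each vertex to a set of neighbors."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    k = 0
    while adj:
        # Remove a vertex of minimum degree and record its degree at removal.
        v = min(adj, key=lambda u: len(adj[u]))
        k = max(k, len(adj[v]))
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return k

# A star graph: the hub has degree 4, but every subgraph contains a vertex
# of degree at most 1, so the degeneracy is 1.
star = {"hub": {"a", "b", "c", "d"},
        "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
print(degeneracy(star))  # prints 1
```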
“This paper is like a translation style work,” Deeds said. “The goal of this work was to translate ideas from graph theory over in a way that made sense to the database community. Once we found the right way to do the translation, things just clicked into place.”
These ideas started clicking into place when Deeds and Merkl met and began collaborating on their research.
“Kyle and Camillo first met at a previous instance of the same conference, at ICDT in 2023 in Greece. Since then they worked remotely, without any help or guidance from their respective advisors,” said Suciu, who holds the Microsoft Endowed Professorship in the Allen School. “They started from an existing, elegant concept in graph theory, called ‘degeneracy,’ and extended it to a practical method for processing relational databases. Their main idea is that, by carefully partitioning each relation into a small number of fragments, then computing a query separately on each fragment, one can significantly reduce the query’s execution time.”
For Deeds, partition constraints point to a new direction for database theory research. They help uncover useful latent structure within data, and understanding the correlations and properties a database exhibits can reveal further ways to optimize query execution.
“The same partitioning method that Kyle and Camillo developed has many other applications beyond query evaluation. For example, in one of my research projects we are developing improved techniques for cardinality estimation, and asked Kyle to join us to adapt his method for this problem,” Suciu said.
Beyond partition constraints, Deeds has been researching ways to optimize other kinds of programs. He introduced Galley, a system for declarative sparse tensor computing that lets users write efficient tensor programs without having to make complex algorithmic decisions.
Read the full paper on partition constraints.