Professor Zachary Tatlock, a member of the Allen School’s Programming Languages & Software Engineering (PLSE) group, has earned a CAREER Award from the National Science Foundation to advance the development of a practical verification framework and other methods for improving the reliability of distributed software systems that form the backbone of modern computing applications.
Billions of people around the world rely on distributed systems every day for critical services, including banking, healthcare, transportation, and more. Such systems are designed for optimum scalability and availability, so that when load spikes, machines crash, or networks misbehave, the system is able to compensate for those failures and continue servicing user requests. But these systems are not infallible in practice, and failures can have devastating impacts in human and economic terms — halting essential services and causing significant data loss. On a single day in the summer of 2015, a series of software failures halted trading on the New York Stock Exchange, grounded the entire mainland fleet of United Airlines, and knocked out the website of the The Wall Street Journal. Four years previously, a widespread failure in Amazon’s Elastic Compute Cloud (EC2), part of Amazon Web Services, brought down sites such as Foursquare and Reddit and affected the functionality of others, such as The New York Times. In all, more than 70 sites were affected by that outage.
Tatlock aims to reduce the likelihood and severity of such failures by applying a practical verification-based approach that makes it easier for programmers to implement reliable, high-performance distributed systems. Currently, the set of potential failures is so complex, and the rate of change in software so high, that it is infeasible to effectively test such systems against all scenarios. An alternative approach is to mathematically prove the system works correctly in all cases. But researchers typically only prove the correctness of high-level algorithms for simplified models of these systems, compelled by their complexity to ignore low-level implementation details. This can lead to mismatches between the simplified model and actual implementation which yield subtle errors that may result in large-scale failure. Furthermore, even the most painstakingly constructed proofs eventually become obsolete as the systems they are written for evolve to meet the increasing demand for scale and performance. Tatlock will address these shortcomings by designing compositional verification techniques for independently proving implementation correctness for applications and reliability for fault-tolerance components. This approach would enable programmers to verify the safety and reliability of distributed systems implementations when faced with a variety of network or machine failures — making them less likely in future to ground flights or grind financial markets to a halt.
The NSF’s Faculty Early Career Development Program recognizes and supports junior faculty who exemplify the role of teacher-scholar and demonstrate the potential to be lifelong leaders at the intersection of education and research. Tatlock is the 11th Allen School professor to earn a CAREER Award through the program in the past two years — an incredible success rate that is a testament to the high caliber of our young faculty. A total of 58 current or former Allen School faculty members have earned a CAREER Award or its predecessor, the Presidential/NSF Young Investigator Award.
Read Tatlock’s award abstract here.
Congratulations, Zach!