
UW CSE researchers tap their inner Indiana Jones to unearth the history of web tracking

[Image: UW CSE web tracking study diagrams]

Researchers in UW CSE’s Security & Privacy Research Lab turned archaeologists to deliver the first comprehensive study of third-party web tracking, using a new tool, TrackingExcavator, that detects and analyzes third-party tracking behavior. UW CSE Ph.D. student Adam Lerner presented the results of the study, which examines tracking on the most popular online destinations dating back to 1996, at the USENIX Security Symposium in Austin, Texas, last week.

“Third-party tracking started quite early in the history of the web,” Lerner noted in a UW News release. “People are becoming more concerned about the potential impact of third-party web tracking, but we lacked a comprehensive history of how trackers — and the types of information they collect — have evolved over time.”

The team, which in addition to Lerner includes CSE Ph.D. student Anna Kornfeld Simpson and CSE professors Franzi Roesner and Yoshi Kohno, set out to build that history by reconstructing tracking data for the top 500 websites using web pages archived in the Wayback Machine. The task was complicated by the fact that no one putting together those early websites anticipated that we would want or need to trace the evolution of third-party tracking decades later.

“Reconstructing tracking behavior from the Wayback Machine is difficult because it was designed to archive web content, not tracking techniques,” Kornfeld Simpson told UW News. “We had to develop techniques to extract tracking information from the archive. For example, we collected tracking cookies from archived HTTP headers and Javascript and then simulated the browser’s cookie storage behaviors to detect tracking behavior.”
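To make the idea of simulating cookie storage concrete, here is a minimal Python sketch. It is not TrackingExcavator's code and makes several simplifying assumptions: the input format (a list of page URL, request URL, and cookie names set by the response), the function names, and the example hosts are all hypothetical, and hosts are compared by exact hostname rather than by registrable domain as a real browser would. It keeps a toy cookie jar per third-party host and flags any host whose stored cookie would be sent back from two or more different sites.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase hostname of a URL (empty string if absent)."""
    return (urlsplit(url).hostname or "").lower()


def find_cross_site_trackers(events):
    """Flag third-party hosts that could recognize a user across sites.

    `events` is a hypothetical input format: an iterable of
    (page_url, request_url, cookie_names_set_by_response) tuples,
    in the order a browser would have made the requests.
    """
    cookie_jar = defaultdict(set)         # third-party host -> stored cookie names
    sites_with_cookie = defaultdict(set)  # third-party host -> sites where a cookie was in play

    for page_url, request_url, cookies_set in events:
        first_party = host_of(page_url)
        third_party = host_of(request_url)
        # Simplification: exact hostname match decides first vs. third party.
        if not third_party or third_party == first_party:
            continue
        # Simulate the browser storing any cookies the response sets.
        cookie_jar[third_party].update(cookies_set)
        # If the jar holds a cookie for this host, it was set on this
        # visit or would be sent back with this request.
        if cookie_jar[third_party]:
            sites_with_cookie[third_party].add(first_party)

    # A host seen with a cookie on two or more sites can link those visits.
    return {
        host: sorted(sites)
        for host, sites in sites_with_cookie.items()
        if len(sites) >= 2
    }


events = [
    ("http://news.example/", "http://ads.example/pixel.gif", ["uid"]),
    ("http://shop.example/", "http://ads.example/pixel.gif", []),
    ("http://shop.example/", "http://cdn.example/lib.js", []),
    ("http://news.example/", "http://news.example/style.css", ["session"]),
]
print(find_cross_site_trackers(events))
# prints {'ads.example': ['news.example', 'shop.example']}
```

In this toy run, ads.example sets a cookie on one site and is contacted again from another, so it is flagged. cdn.example never sets a cookie, and the first-party session cookie is ignored.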

They found that activity on popular websites by third-party trackers—such as advertisers, analytics engines and social media widgets—has increased four-fold over the past two decades. Tracking has also become more complex, evolving from simple cookies and pop-up windows to more sophisticated methods.

According to the news release,

“Today, the average top website has an average of at least four third-party trackers looking at user activity. The team stresses that these numbers are likely underestimates, since not all websites are fully archived.

“They also found that today individual trackers cover a much larger fraction of the web.…These findings are important to understanding the effects of tracking on privacy, since tracking users on more sites allows trackers to develop a more detailed and intimate picture of their behavior.

“This 20-year historical perspective paints a clear picture of how third-party tracking has evolved with the rise and fall of different techniques, advances in technology, and our increasing reliance on the web in our lives. In general, third parties are watching and collecting information. How we may feel about that remains to be seen.”

Read the complete UW News release here, and the research paper here. Learn more and gain access to the team’s data on the TrackingExcavator website here. Read coverage of the study in TechCrunch, USA Today, Fortune, and IEEE Spectrum, and watch video from NBC Today and KOMO News.

The project is the latest example of UW CSE’s leadership in web privacy research, including previous work by Roesner, Kohno, and then-CSE professor David Wetherall to analyze and classify web-tracker behavior and to empower users with tools such as ShareMeNot, which was subsequently incorporated into the Electronic Frontier Foundation’s Privacy Badger.