We live in a data-driven world. Many of the Web services, mobile apps, and third parties we interact with daily are collecting immense amounts of information about us – every location, click, search, email, document, and site we visit. And they are using all of this information for various purposes. Some of these uses might be beneficial for us (e.g., recommendations for new videos to watch or songs to listen to); other uses may not be. The problem is that we have limited visibility into how our data is being used, and hence we are vulnerable to potential abuses.
For example, did you know that credit companies might be adjusting loan offers based on your Facebook data? Or that certain travel companies used to adjust prices based on a user's profile and location? Or that some companies target ads at illness-related emails, and clicking on those ads can leak sensitive information to the advertiser?
Steve Lohr writes in the New York Times:
“‘The web today is a big black box,’ said Roxana Geambasu, an assistant professor of computer science at Columbia University [and a 2011 UW CSE Ph.D. alumna]. ‘What’s needed is transparency.’
“Ms. Geambasu, another assistant professor at Columbia, Augustin Chaintreau, and a team of graduate students, led by Mathias Lecuyer, have come up with a tool that addresses the data transparency challenge. It is called XRay, and they will present a paper and explain their early research results on Wednesday at the Usenix Security Symposium in San Diego …
“XRay is essentially a reverse-engineering machine that models the correlations made by web services. The group’s three initial efforts have tried to determine the kinds of ads shown to Gmail users based on the text in their email messages; the product recommendations Amazon shows users based on their wish lists and other data; and the video recommendations made by YouTube determined by the videos users have previously viewed.”
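The correlation idea described above can be sketched in a few lines. In rough terms, XRay populates a set of "shadow" accounts with different subsets of a user's inputs (e.g., emails), observes which outputs (e.g., ads) each account receives, and attributes an output to the input whose presence best predicts it. The sketch below is a simplified illustration of that differential-correlation idea; the function, data names, and scoring rule are hypothetical, not taken from the XRay codebase:

```python
def attribute(accounts, observations):
    """Attribute each output to the input that best predicts it.

    accounts: {account_id: set of inputs placed in that shadow account}
    observations: {account_id: set of outputs that account was shown}
    Returns {output: best-correlated input}.
    """
    inputs = set().union(*accounts.values())
    outputs = set().union(*observations.values())
    attribution = {}
    for out in outputs:
        best_input, best_score = None, -1.0
        for inp in inputs:
            have = [a for a in accounts if inp in accounts[a]]
            lack = [a for a in accounts if inp not in accounts[a]]
            # Fraction of accounts shown the output, with vs. without the input
            p_have = sum(out in observations[a] for a in have) / max(len(have), 1)
            p_lack = sum(out in observations[a] for a in lack) / max(len(lack), 1)
            score = p_have - p_lack
            if score > best_score:
                best_input, best_score = inp, score
        attribution[out] = best_input
    return attribution

# Toy example: four shadow accounts holding subsets of two emails.
accounts = {
    1: {"cancer-email"},
    2: {"travel-email"},
    3: {"cancer-email", "travel-email"},
    4: set(),
}
# A clinic ad appears only on accounts containing the cancer-related email.
observations = {1: {"clinic-ad"}, 2: set(), 3: {"clinic-ad"}, 4: set()}
print(attribute(accounts, observations))  # {'clinic-ad': 'cancer-email'}
```

In this toy run, the clinic ad is shown to every account holding the cancer-related email and to none of the others, so the differential score singles out that email as the likely trigger – the kind of input-to-output association the article describes XRay recovering for Gmail ads, Amazon recommendations, and YouTube suggestions.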