UW CSE’s Jeffrey Heer, Matt Mohebbi in NY Times on “Data Wrangling”

Steve Lohr writes in the New York Times:

“Technology revolutions come in measured, sometimes foot-dragging steps. The lab science and marketing enthusiasm tend to underestimate the bottlenecks to progress that must be overcome with hard work and practical engineering.

“The field known as ‘big data’ offers a contemporary case study. The catchphrase stands for the modern abundance of digital data from many sources … Its promise is smarter, data-driven decision-making in every field. That is why data scientist is the economy’s hot new job …

“Yet far too much handcrafted work … is still required …

“Several start-ups are trying to break through these big data bottlenecks by developing software to automate the gathering, cleaning and organizing of disparate data …

“‘It’s an absolute myth that you can send an algorithm over raw data and have insights pop up,’ said Jeffrey Heer, a professor of computer science at the University of Washington and a co-founder of Trifacta, a start-up based in San Francisco …

“Data formats are one challenge, but so is the ambiguity of human language. Iodine, a new health start-up, gives consumers information on drug side effects and interactions. Its lists, graphics and text descriptions are the result of combining the data from clinical research, government reports and online surveys of people’s experience with specific drugs …

“Data experts try to automate as many steps in the process as possible. ‘But practically, because of the diversity of data, you spend a lot of your time being a data janitor, before you can get to the cool, sexy things that got you into the field in the first place,’ said Matt Mohebbi, a data scientist and co-founder of Iodine [and a 2004 UW CSE alum].”

Read more here.