
UW and Microsoft researchers create system to store and retrieve digital images using DNA


The Molecular Information Systems Lab research team: Front (left to right): Bichlien Nguyen, Lee Organick, Hsing-Yeh Parker, Siena Dumas Ang, Chris Takahashi. Back (left to right): James Bornholt, Yuan-Jyue Chen, Georg Seelig, Randolph Lopez, Luis Ceze, Karin Strauss. Not pictured: Doug Carmean, Rob Carlson, Krittika d’Silva.

Researchers from the UW’s Molecular Information Systems Lab (MISL) have created one of the first systems that uses DNA molecules to store digital images, and successfully demonstrated the ability to retrieve the encoded images intact.

UW CSE professor Luis Ceze, joint CSE and EE professor Georg Seelig, CSE affiliate faculty members Doug Carmean and Karin Strauss of Microsoft Research, CSE Ph.D. student James Bornholt, and BioE Ph.D. student Randolph Lopez are the authors of an ASPLOS paper describing the effort to advance the state of the art in digital storage. Taking their cues from nature, the researchers aim to create a system that will be able to accommodate the growing volume of data being generated around the world — predicted to reach 44 trillion gigabytes by 2020.

From the UW media release:

“The team of computer scientists and electrical engineers has detailed one of the first complete systems to encode, store and retrieve digital data using DNA molecules, which can store information millions of times more compactly than current archival technologies.

“In one experiment…the team successfully encoded digital data from four image files into the nucleotide sequences of synthetic DNA snippets.


This smear of DNA stores 10,000 gigabytes of data

“More significantly, they were also able to reverse that process — retrieving the correct sequences from a larger pool of DNA and reconstructing the images without losing a single byte of information.”

According to Ceze, “Life has produced this fantastic molecule called DNA that efficiently stores all kinds of information about your genes and how a living system works — it’s very, very compact and very durable…We’re essentially repurposing it to store digital data — pictures, videos, documents — in a manageable way for hundreds or thousands of years.”

The MISL team is one of only two teams nationwide to have demonstrated “random access,” that is, the ability to retrieve the correct sequences of data from a large pool of random DNA molecules. The team did so by encoding the equivalent of street addresses in the DNA sequences and then employing Polymerase Chain Reaction (PCR), a technique commonly used in molecular biology, to identify and reorder the data. They also applied error correction techniques typically used in computer memory to the DNA to address errors in the encoding process.
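To make the “street address” idea concrete, here is a minimal software sketch of address-based retrieval. It is an illustration only and not the team’s actual method: the two-bits-per-base mapping, the one-byte chunk index, the example address sequences, and the function names (`bytes_to_dna`, `make_strands`, `random_access`) are all assumptions made for this example. A simple prefix match stands in for PCR, and error correction is not modeled.

```python
# Illustrative sketch only. The 2-bits-per-base mapping and the strand
# layout (address + index + data) are assumptions, not the paper's encoding.
BASES = "ACGT"


def bytes_to_dna(data):
    """Map each byte to four nucleotides, two bits per base."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq):
    """Inverse of bytes_to_dna: every four nucleotides become one byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def make_strands(address, payload, chunk_size=4):
    """Split a payload into short strands: address + 1-byte index + chunk.

    The one-byte index limits this sketch to 256 chunks per file.
    """
    strands = []
    for index, start in enumerate(range(0, len(payload), chunk_size)):
        chunk = payload[start:start + chunk_size]
        strands.append(address + bytes_to_dna(bytes([index]) + chunk))
    return strands


def random_access(pool, address):
    """Software stand-in for PCR: keep strands carrying the requested
    address, then reorder them by their index byte and join the chunks."""
    selected = [s[len(address):] for s in pool if s.startswith(address)]
    decoded = sorted((dna_to_bytes(s) for s in selected), key=lambda d: d[0])
    return b"".join(d[1:] for d in decoded)


if __name__ == "__main__":
    import random

    # Two hypothetical files, each tagged with its own made-up address.
    pool = make_strands("ACGTAC", b"hello DNA") + make_strands("TGCATG", b"other file")
    random.shuffle(pool)  # strands in a real pool have no inherent order
    print(random_access(pool, "ACGTAC"))
```

In this toy version the address is simply a prefix shared by every strand of a file, so one file can be pulled out of a mixed, unordered pool without decoding everything else.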


Luis Ceze and research scientist Lee Organick in the lab

“This is an example where we’re borrowing something from nature — DNA — to store information. But we’re using something we know from computers — how to correct memory errors — and applying that back to nature,” Ceze said.

Read the full UW media release here and check out our previous blog post here. The team presented its findings at the ASPLOS 2016 conference earlier this month; read the research paper here. Check out coverage of the project by Newsweek, Gizmodo, Discover Magazine, CNET, Motherboard, Crosscut, Geekwire and the Daily Mail.

Photos: Tara Brown Photography