#MemoriesInDNA Project aims to crowdsource 10,000 images to advance DNA-based data storage

What do you want to remember forever?

MISL researcher pipetting synthetic DNA

MISL researchers aim to collect 10,000 images to encode in synthetic DNA to develop new techniques for data storage, search, and retrieval.

That’s a question that researchers in the University of Washington’s Molecular Information Systems Laboratory (MISL) hope will inspire people around the world to submit original photos to the #MemoriesInDNA Project. The project (a partnership between the Allen School, the UW Department of Electrical Engineering, Microsoft, and Twist Bioscience) aims to build a robust dataset of 10,000 images to develop new capabilities for DNA-based data storage and processing.

The MISL launched in 2015 to develop synthetic DNA as an archival storage medium for digital data that is denser and more durable than existing technologies. Now, backed by a $6.3 million grant from the U.S. Defense Advanced Research Projects Agency (DARPA) as part of its Molecular Informatics program, MISL researchers plan to build upon their prior work. Using the trove of visual data that will be assembled as part of the #MemoriesInDNA Project, the team will explore new ways to process and search for data still encoded in DNA — without having to retrieve and convert the images back into their electronic form. It’s the next frontier in the evolution of DNA as a viable — and truly useful — solution for the world’s growing data storage needs.

“Let’s suppose you have a trillion images encoded in DNA and want to find all the photographs that have a red car in them,” Allen School professor Luis Ceze explained in a UW News release. “We want to be able to do that information processing in DNA directly — to search in a smart way and make the molecules themselves carry out that computer vision work.”

People around the world are invited to submit photos of people, places, and moments that they want to remember forever. Here is a sampling of images submitted via the upload site.

To achieve this “smart” search capability, Ceze and his colleagues will leverage the tendency of certain nucleotides that make up DNA molecules to bind themselves to others — adenine (A) to thymine (T), and cytosine (C) to guanine (G). As part of the encoding process, MISL researchers convert the digital data of an image — 0s and 1s — to the A, T, C, and G molecules that make up strands of DNA. To retrieve only those images they are interested in out of the thousands that make up the dataset, without having to convert them back to binary, the researchers plan to introduce a query containing complementary DNA that will cause only those that meet their search criteria to bind to it. The inclusion of magnetic nanoparticles in the query will enable them to pull out the images bound to it with the help of a magnet. The team will also employ machine learning techniques to enable the detailed mapping and encoding of all visual features that may be contained in an image to enable scientists to perform meaningful data processing.
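The encoding and query steps described above can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not the MISL team's actual method: the two-bits-per-base mapping, the exact-match binding rule, and the function names (`encode`, `decode`, `reverse_complement`, `binds`, `pull_down`) are all hypothetical and chosen for illustration. Real DNA storage schemes add error correction and avoid problematic sequences, and real hybridization tolerates partial matches.

```python
# Hypothetical mapping of 2 bits per base (an assumption for illustration;
# practical encodings add redundancy and avoid long runs of one base).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

# Base pairing named in the article: A binds T, C binds G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def encode(data: bytes) -> str:
    """Convert bytes to a strand of A/C/G/T, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): convert a strand back to the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def reverse_complement(strand: str) -> str:
    """Return the strand that would pair with `strand` base by base."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def binds(probe: str, strand: str) -> bool:
    """Toy binding rule: the probe binds only if the strand contains
    the probe's exact reverse complement somewhere along its length."""
    return reverse_complement(probe) in strand


def pull_down(probe: str, pool: list[str]) -> list[str]:
    """Model the magnetic pull-down: keep only strands the probe binds to."""
    return [strand for strand in pool if binds(probe, strand)]


# Each strand is a feature tag followed by image data (both hypothetical).
pool = [
    encode(b"car") + encode(b"img1"),
    encode(b"dog") + encode(b"img2"),
]
query = reverse_complement(encode(b"car"))  # probe complementary to the "car" tag
matches = pull_down(query, pool)
print(len(matches), decode(matches[0]))
```

In this sketch the query never converts the stored strands back to binary to decide which ones match; only the strands that are pulled out get decoded, which mirrors the selective retrieval the researchers describe.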

The MISL team has already set a world record for the amount of digital data stored in and successfully retrieved from DNA, from the hip (a video by the band OK Go) to the historic (the Universal Declaration of Human Rights in 100 languages). To develop a robust capability to search digital data within the DNA itself, however, the team needs a significantly larger volume and variety of images to work with. That’s where the #MemoriesInDNA social media campaign, also launched today, comes in.

“It’s your turn to show us what should be preserved in DNA forever,” Ceze said. “We want people to go out and take a picture of something that they want the world to remember — it’s a fun opportunity to send a message to future generations and help our research in the process.”

The team plans to eventually make this digital time capsule — stripped of any personally identifying information — available to researchers around the world.

Ana Mari Cauce and Paul Allen onstage

The Allen School’s contribution to the #MemoriesInDNA digital time capsule: UW President Ana Mari Cauce and Paul G. Allen celebrating the naming of the Paul G. Allen School of Computer Science & Engineering on March 9, 2017.

“It is thrilling to bring computer science and molecular biology together in this project,” said Microsoft senior researcher Karin Strauss, an affiliate associate professor at the Allen School. “There has been amazing progress recently in both areas and, when combined, they can be very powerful in tackling problems created by the massive amounts of data we’ve been generating.”

Other lead contributors to the project include Allen School and Electrical Engineering professor Georg Seelig and Microsoft partner architect Douglas Carmean. Twist Bioscience will supply the synthetic DNA for the project.

Snap a photo for science!

Anyone can contribute to the dataset by uploading an original photo via the website memoriesindna.com. Afterward, help the campaign go global and inspire others to participate by sharing your image on social media with the hashtag #MemoriesInDNA.

Read the UW News release here, a related post on the Twist Bioscience blog here, and the Wired article here.

January 24, 2018