A team of researchers in the Molecular Information Systems Lab (MISL), a collaboration between the University of Washington and Microsoft Research, worked with DNA synthesis company Twist Bioscience to encode two archival-quality audio recordings from the world-renowned Montreux Jazz Festival in nature’s perfect storage medium. The preservation of “Smoke on the Water” by Deep Purple and “Tutu” by Miles Davis represents the first time that DNA has been used for long-term archival storage, making the songs not only pieces of musical history but now pieces of scientific history as well. The project builds upon work by the MISL team to develop a next-generation digital storage system using DNA.
“Storing items from the Montreux Jazz Festival is a perfect way to show how fast DNA digital data storage is becoming real,” he said.
The team’s latest effort to illustrate the potential of a DNA-based storage system for digital data grew out of a partnership between the Claude Nobs Foundation (curator of the festival’s audio-visual collection) and the École Polytechnique Fédérale de Lausanne (EPFL) on the Montreux Jazz Digital Project, which aims to digitize, store, preserve and share the musical legacy of festival founder Claude Nobs. Whereas existing recordings in the collection may last a decade before they need to be replaced, a DNA-based archival storage system could preserve the same material for thousands of years.
The two songs preserved as a proof-of-concept by UW, Microsoft, and Twist amounted to 140 megabytes of data. According to Microsoft researcher and Allen School affiliate professor Karin Strauss, that represents barely a drop in the bucket when it comes to the potential storage capacity of DNA.
“The amount of DNA used to store these songs is much smaller than one grain of sand,” she noted. “Amazingly, storing the entire six petabyte Montreux Jazz Festival’s collection would result in DNA smaller than one grain of rice.”
Allen School Ph.D. student Lee Organick, MISL lab manager David Ward, and Microsoft researchers Siena Dumas and Yuan-Jyue Chen were part of the team that worked with Twist Bioscience to encode, decode, and analyze the DNA samples in which the iconic recordings were preserved. The team converted the audio files from binary code (0s and 1s) to the four nucleotide bases that make up a strand of DNA: A, C, G, and T (adenine, cytosine, guanine, and thymine). After the DNA was sequenced, the team decoded and read it back to confirm 100% accuracy.
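To make the binary-to-base conversion concrete, here is a minimal sketch in Python. It assumes a simple mapping of two bits per base (00 to A, 01 to C, 10 to G, 11 to T); that mapping and the function names `encode` and `decode` are illustrative assumptions, not the scheme the team used, which the article does not describe.

```python
# Illustrative only: a naive 2-bits-per-base mapping.
# The article does not specify the team's actual encoding scheme.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes to a string of nucleotide bases, 4 bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Convert a string of bases back to the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


payload = b"Smoke on the Water"
strand = encode(payload)

# Round trip: decoding the strand must reproduce the input exactly,
# mirroring the read-back check described in the article.
assert decode(strand) == payload
print(strand)
```

Under this assumed mapping each byte becomes four bases, so the 140 megabytes mentioned above would correspond to roughly 560 million bases before any redundancy is added.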
The decoded versions were played at a forum hosted by the ArtTech Foundation in Lausanne, Switzerland today. The DNA-based recordings represent part of UNESCO’s Memory of the World Register, which includes a collection of more than 5,000 hours of Montreux Jazz Festival concerts.
“The UNESCO archive provides the perfect use-case for testing our approach,” Ceze said. “Thanks to Twist and the Montreux Jazz Festival, our team had a unique opportunity to apply cutting-edge digital storage research to preserving a sliver of cultural heritage for posterity.”
Illustration: The lyrics of Deep Purple’s “Smoke on the Water” encoded into DNA. Each letter, space and punctuation mark is represented by a unique triplet of the four bases (A, T, G, C), the building blocks of DNA. For example, “smoke” becomes GACCGACGTCAGAGC. Credit: Martin Krzywinski, courtesy of Twist Bioscience.
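The triplet scheme in the illustration can be sketched in a few lines of Python. The five codebook entries below are inferred from the caption's “smoke” example by reading the strand three bases at a time, in order; the rest of the illustration's codebook is not given in the source, so this sketch only handles those five letters. The names `CODEBOOK`, `encode_text`, and `decode_text` are hypothetical.

```python
# Triplets inferred from the caption's example: "smoke" -> GACCGACGTCAGAGC.
# The full codebook used in the illustration is not given in the source.
CODEBOOK = {"s": "GAC", "m": "CGA", "o": "CGT", "k": "CAG", "e": "AGC"}
DECODEBOOK = {triplet: char for char, triplet in CODEBOOK.items()}


def encode_text(text: str) -> str:
    """Replace each character with its three-base code."""
    return "".join(CODEBOOK[char] for char in text)


def decode_text(strand: str) -> str:
    """Read the strand three bases at a time and look up each triplet."""
    return "".join(DECODEBOOK[strand[i:i + 3]] for i in range(0, len(strand), 3))


strand = encode_text("smoke")
assert strand == "GACCGACGTCAGAGC"
assert decode_text(strand) == "smoke"
```

Three positions with four possible bases each give 4 × 4 × 4 = 64 distinct triplets, which is enough to cover the letters, the space, and common punctuation marks.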