Scientists from the Data Science Institute at Columbia University and the New York Genome Center (NYGC) published research this week describing a new technique that uses DNA molecules to store digital information.
Deoxyribonucleic acid, better known by its acronym DNA, is the molecule around which all life revolves. In nature, DNA stores the information that defines an organism and its characteristics using four nucleotide bases: adenine (A), guanine (G), cytosine (C) and thymine (T).
In essence, DNA works much like a hard drive: instead of binary ones and zeros, it uses a four-letter (quaternary) code to store information about an organism's genes.
STORING DIGITAL FILES IN DNA MOLECULES
Earlier experiments had already shown that DNA can be built from scratch through DNA synthesis, stringing together a chosen sequence of bases, and that this makes it possible to store binary data in DNA.
The Columbia scientists refined the technique for converting digital data into molecular sequences and increased how much data DNA can hold.
The researchers successfully stored six files in DNA molecules:
♦ a full computer operating system (KolibriOS)
♦ a $50 Amazon gift card
♦ a computer virus
♦ an 1895 French film, “Arrival of a Train at La Ciotat”
♦ a Pioneer plaque
♦ a 1948 study by information theorist Claude Shannon
CONVERTING BINARY CODE INTO DNA SEQUENCES
The research, by Yaniv Erlich and Dina Zielinski, has been published in the journal Science.
The researchers first compressed the six files into a single archive. They then used an “erasure-correcting algorithm called fountain code” to randomly package the binary data into strings called “droplets”.
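In a fountain code of the Luby Transform type, each droplet is the XOR of a few randomly chosen data segments, and the random choices are derived from a seed that travels with the droplet. The Python sketch below illustrates that idea only; it is not the researchers' code. The segment length, the helper names (`split_into_segments`, `choose_segments`, `make_droplet`) and the uniform choice of 1 to 3 segments per droplet are assumptions made for brevity, since fountain codes normally draw that number from a soliton-type distribution.

```python
import random


def split_into_segments(data: bytes, seg_len: int) -> list[bytes]:
    """Split data into fixed-length segments, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % seg_len)
    return [padded[i:i + seg_len] for i in range(0, len(padded), seg_len)]


def choose_segments(seed: int, n_segments: int) -> list[int]:
    """Hypothetical scheme: the seed alone decides which segments are combined.

    Assumption: the number of segments (the "degree") is uniform over 1..3.
    """
    rng = random.Random(seed)
    degree = rng.randint(1, min(3, n_segments))
    return rng.sample(range(n_segments), degree)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_droplet(segments: list[bytes], seed: int) -> tuple[int, bytes]:
    """Return (seed, payload) where payload is the XOR of the chosen segments."""
    indices = choose_segments(seed, len(segments))
    payload = segments[indices[0]]
    for i in indices[1:]:
        payload = xor_bytes(payload, segments[i])
    return seed, payload


segments = split_into_segments(b"hello fountain codes!", 4)
droplets = [make_droplet(segments, seed) for seed in range(10)]
```

Because the seed determines which segments went into a droplet, a reader only needs each droplet's seed and payload, not the order in which the droplets were written.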
The researchers then “mapped the binary code of each droplet to the four DNA nucleotides: A, G, C and T.”
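Since four bases give four symbols, each base can carry two bits. The mapping step can be sketched as a fixed lookup table; the particular assignment below (00 to A, 01 to C, 10 to G, 11 to T) is an assumption for illustration, and any one-to-one table of the four bit pairs to the four bases would work the same way.

```python
# Assumed one-to-one table: each pair of bits becomes one base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode bytes as bases, two bits per base (four bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_dna: bases back to bits, then bits back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

For example, `bytes_to_dna(b"Hi")` returns `"CAGACGGC"`: the byte for "H" (01001000) becomes CAGA and the byte for "i" (01101001) becomes CGGC.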
In the end, the six archived files were converted into 72,000 DNA strands, each 200 nucleotides long.
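A quick back-of-the-envelope calculation puts these numbers in context. Assuming the ideal case of exactly 2 bits per base with no overhead (an assumption, not a figure from the study), 72,000 strands of 200 bases set an upper bound on how much data they could carry:

```python
# Figures from the article: 72,000 strands, each 200 bases long.
STRANDS = 72_000
BASES_PER_STRAND = 200

# Assumption: the theoretical ideal of 2 bits per base (four possible bases),
# ignoring all encoding overhead and biochemical constraints.
BITS_PER_BASE = 2

total_bases = STRANDS * BASES_PER_STRAND          # 14,400,000 bases
ceiling_bits = total_bases * BITS_PER_BASE        # 28,800,000 bits
ceiling_megabytes = ceiling_bits / 8 / 1_000_000  # 3.6 decimal megabytes

print(f"{total_bases:,} bases can hold at most {ceiling_megabytes} MB")
```

That ceiling of 3.6 MB is well above the roughly 2 MB archive; the gap reflects overhead such as the extra droplets a fountain code generates for redundancy.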
The resulting sequences, stored in a text file, were sent to a DNA synthesis laboratory in San Francisco, which synthesized them into actual DNA molecules.
The research team then sequenced the DNA and used custom software written in Python (available on GitHub) to read the sequences and reassemble the data. A video about DNA Fountain shows one of the researchers booting the operating system retrieved from the DNA molecules and then playing Minecraft.
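Reassembly is where a fountain code pays off: the decoder can rebuild the file from any large enough set of droplets, in any order. A common way to do this is a "peeling" decoder, sketched below in Python. It is a self-contained illustration, not the researchers' GitHub software, and it repeats a hypothetical seed-to-segments scheme (1 to 3 segments per droplet, chosen uniformly) so it can run on its own. It also skips steps a real pipeline needs, such as converting sequencing reads back into bits and rejecting corrupted reads.

```python
import random
from typing import Optional


def split_into_segments(data: bytes, seg_len: int) -> list[bytes]:
    """Split data into fixed-length segments, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % seg_len)
    return [padded[i:i + seg_len] for i in range(0, len(padded), seg_len)]


def choose_segments(seed: int, n_segments: int) -> list[int]:
    """Hypothetical scheme: the seed alone decides which segments are combined."""
    rng = random.Random(seed)
    degree = rng.randint(1, min(3, n_segments))
    return rng.sample(range(n_segments), degree)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_droplet(segments: list[bytes], seed: int) -> tuple[int, bytes]:
    """Build a droplet so the decoder below has input to work on."""
    indices = choose_segments(seed, len(segments))
    payload = segments[indices[0]]
    for i in indices[1:]:
        payload = xor_bytes(payload, segments[i])
    return seed, payload


def decode(droplets: list[tuple[int, bytes]], n_segments: int) -> Optional[bytes]:
    """Peeling decoder: return the padded data, or None if droplets are insufficient."""
    recovered: list[Optional[bytes]] = [None] * n_segments
    # Each pending entry is [set of unresolved segment indices, current payload].
    pending = [[set(choose_segments(seed, n_segments)), payload]
               for seed, payload in droplets]
    progress = True
    while progress:
        progress = False
        for entry in pending:
            indices, payload = entry
            # XOR out every segment that is already known.
            for i in list(indices):
                if recovered[i] is not None:
                    payload = xor_bytes(payload, recovered[i])
                    indices.discard(i)
            entry[1] = payload
            # A droplet covering exactly one unknown segment reveals it.
            if len(indices) == 1:
                recovered[indices.pop()] = payload
                progress = True
        # Drop droplets that no longer carry new information.
        pending = [entry for entry in pending if entry[0]]
    if any(segment is None for segment in recovered):
        return None
    return b"".join(recovered)
```

Because each droplet identifies its own segments through its seed, shuffling the droplets (as happens when DNA is sequenced) does not change the result.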
PERFECT DATA STORAGE MEDIUM
The researchers believe that DNA is the perfect storage medium because it is ultra-compact and can last for more than 1,000 years.
DNA data storage could therefore help large organizations archive enormous amounts of information in a form that can still be read a hundred years from now.
However, it costs $7,000 to synthesize 2 MB of data into DNA and another $2,000 to read it back.
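The quoted prices work out to $3,500 per megabyte to write and $1,000 per megabyte to read. The helper below extrapolates those figures; the linear scaling and the function names are assumptions for illustration, since real synthesis and sequencing prices depend on volume and technology.

```python
WRITE_COST_USD = 7_000  # quoted cost to write (synthesize) 2 MB
READ_COST_USD = 2_000   # quoted cost to read (sequence) it back
ARCHIVE_MB = 2


def cost_per_mb(total_usd: float, megabytes: float) -> float:
    """Average cost per megabyte."""
    return total_usd / megabytes


def estimate_round_trip_cost(megabytes: float) -> float:
    """Hypothetical linear extrapolation; real costs need not scale linearly."""
    per_mb = cost_per_mb(WRITE_COST_USD + READ_COST_USD, ARCHIVE_MB)
    return per_mb * megabytes


print(f"${estimate_round_trip_cost(1_000):,.0f} to write and read 1,000 MB")
```

Under that naive assumption, writing and reading 1,000 MB would cost about $4.5 million, which illustrates why cost reduction is a priority.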
The researchers also calculated that, in practice, DNA can hold at most about 1.8 bits of data per nucleotide. Four bases could in principle encode 2 bits each, but biochemical constraints, such as avoiding long runs of the same base, lower the usable capacity. Previous research achieved about 1.0 bit per nucleotide, while the Columbia team reached 1.6 bits, close to that limit.
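One reason the practical limit sits below 2 bits per nucleotide is that some sequences are hard to synthesize or sequence reliably, notably long runs of a single base and strands with unbalanced GC content, so an encoder has to avoid them. The sketch below shows such a screen; the thresholds (runs of at most 3 identical bases, 45 to 55% G or C) are illustrative assumptions. With a fountain code, a droplet that fails the screen can simply be discarded and replaced by one built from a different seed.

```python
from itertools import groupby

# Illustrative thresholds (assumptions, not values quoted in the article).
MAX_RUN = 3
GC_MIN, GC_MAX = 0.45, 0.55


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base, e.g. 'AAAA' -> 4."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def gc_fraction(seq: str) -> float:
    """Share of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def passes_screen(seq: str) -> bool:
    """Keep a strand only if it avoids long runs and has balanced GC content."""
    return longest_homopolymer(seq) <= MAX_RUN and GC_MIN <= gc_fraction(seq) <= GC_MAX
```

For example, `passes_screen("ACGTACGTAC")` is `True` (no repeats, 50% GC), while `passes_screen("AAAAGCGC")` is `False` because of the run of four A's.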
With further research, the scientists hope to push their method's storage density closer to that limit and to reduce the cost of writing and reading data in DNA.