Exploiting sparseness in de novo genome assembly

Chengxi Ye, Zhanshan Sam Ma, Charles H. Cannon, Mihai Pop, Douglas W. Yu

Research output: Contribution to journal › Article › peer-review

160 Citations (Scopus)

Abstract

Background:
The very large memory requirements for the construction of assembly graphs for de novo genome assembly limit current algorithms to super-computing environments.

Methods:
In this paper, we demonstrate that constructing a sparse assembly graph, which stores only a small fraction of the observed k-mers as nodes together with the links between these nodes, allows the de novo assembly of even moderately sized genomes (~500 M) on a typical laptop computer.
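
The sparse graph idea can be illustrated with a minimal Python sketch. This is not the SparseAssembler implementation: the function names, the sampling gap `g`, and the phase-anchoring rule are assumptions made for illustration, and the sketch ignores reverse complements, sequencing errors, and coverage counts. It keeps roughly one k-mer in every `g` as a node and records the bases that bridge each kept k-mer to the next one.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield (offset, k-mer) for every k-mer in the read."""
    for i in range(len(read) - k + 1):
        yield i, read[i:i + k]


def build_sparse_graph(reads, k, g):
    """Store about one k-mer in every g as a node, plus links between nodes.

    Illustrative assumption: sampling in each read is aligned to the first
    k-mer that is already a node, so overlapping reads reuse the same nodes.
    """
    nodes = set()
    links = defaultdict(set)  # node -> {(next node, bases bridging to it)}
    for read in reads:
        n = len(read) - k + 1  # number of k-mers in this read
        if n <= 0:
            continue
        # Offset of the first k-mer already stored; 0 if none is known yet.
        anchor = next((i for i, km in kmers(read, k) if km in nodes), 0)
        prev = None
        # Sample positions anchor % g, anchor % g + g, ... (includes anchor).
        for pos in range(anchor % g, n, g):
            km = read[pos:pos + k]
            nodes.add(km)
            if prev is not None:
                # Bases that extend the previous node up to this node.
                bridge = read[prev + k:pos + k]
                links[read[prev:prev + k]].add((km, bridge))
            prev = pos
    return nodes, links


nodes, links = build_sparse_graph(["ACGTTGCATG", "GTTGCATGAA"], k=4, g=3)
```

For these two overlapping toy reads, the sketch stores 3 of the 9 distinct 4-mers as nodes, and the second read adds no new nodes because its sampling is aligned to a k-mer already stored.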

Results:
We implement this sparse graph concept in a proof-of-principle software package, SparseAssembler, which uses a new sparse k-mer graph structure evolved from the de Bruijn graph. We test SparseAssembler on both simulated and real data, achieving ~90% memory savings while retaining high assembly accuracy, without sacrificing speed in comparison to existing de novo assemblers.

Original language: English
Article number: S1
Number of pages: 8
Journal: BMC Bioinformatics
Volume: 13
Issue number: Suppl 6
Publication status: Published - 19 Apr 2012
