Abstract
Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.
Original language | English |
---|---|
Pages (from-to) | 1063-1071 |
Number of pages | 9 |
Journal | Nature Methods |
Volume | 14 |
Issue number | 11 |
Early online date | 2 Oct 2017 |
DOIs | https://doi.org/10.1038/nmeth.4458 |
Publication status | Published - 1 Nov 2017 |
Cite this
Critical assessment of metagenome interpretation - A benchmark of metagenomics software. / Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter; Koslicki, David; Janssen, Stefan; Dröge, Johannes; Gregor, Ivan; Majda, Stephan; Fiedler, Jessika; Dahms, Eik; Bremges, Andreas; Fritz, Adrian; Garrido-Oter, Ruben; Jørgensen, Tue Sparholt; Shapiro, Nicole; Blood, Philip D.; Gurevich, Alexey; Bai, Yang; Turaev, Dmitrij; Demaere, Matthew Z.; Chikhi, Rayan; Nagarajan, Niranjan; Quince, Christopher; Meyer, Fernando; Balvočiutė, Monika; Hansen, Lars Hestbjerg; Sørensen, Søren J.; Chia, Burton K. H.; Denis, Bertrand; Froula, Jeff L.; Wang, Zhong; Egan, Robert; Don Kang, Dongwan; Cook, Jeffrey J.; Deltel, Charles; Beckstette, Michael; Lemaitre, Claire; Peterlongo, Pierre; Rizk, Guillaume; Lavenier, Dominique; Wu, Yu Wei; Singer, Steven W.; Jain, Chirag; Strous, Marc; Klingenberg, Heiner; Meinicke, Peter; Barton, Michael D.; Lingner, Thomas; Lin, Hsin Hung; Liao, Yu Chieh; Silva, Genivaldo Gueiros Z.; Cuevas, Daniel A.; Edwards, Robert A.; Saha, Surya; Piro, Vitor C.; Renard, Bernhard Y.; Pop, Mihai; Klenk, Hans Peter; Göker, Markus; Kyrpides, Nikos C.; Woyke, Tanja; Vorholt, Julia A.; Schulze-Lefert, Paul; Rubin, Edward M.; Darling, Aaron E.; Rattei, Thomas; McHardy, Alice C.
In: Nature Methods, Vol. 14, No. 11, 01.11.2017, p. 1063-1071.

Research output: Contribution to journal › Article › peer-review
TY - JOUR
T1 - Critical assessment of metagenome interpretation - A benchmark of metagenomics software
AU - Sczyrba, Alexander
AU - Hofmann, Peter
AU - Belmann, Peter
AU - Koslicki, David
AU - Janssen, Stefan
AU - Dröge, Johannes
AU - Gregor, Ivan
AU - Majda, Stephan
AU - Fiedler, Jessika
AU - Dahms, Eik
AU - Bremges, Andreas
AU - Fritz, Adrian
AU - Garrido-Oter, Ruben
AU - Jørgensen, Tue Sparholt
AU - Shapiro, Nicole
AU - Blood, Philip D.
AU - Gurevich, Alexey
AU - Bai, Yang
AU - Turaev, Dmitrij
AU - Demaere, Matthew Z.
AU - Chikhi, Rayan
AU - Nagarajan, Niranjan
AU - Quince, Christopher
AU - Meyer, Fernando
AU - Balvočiutė, Monika
AU - Hansen, Lars Hestbjerg
AU - Sørensen, Søren J.
AU - Chia, Burton K. H.
AU - Denis, Bertrand
AU - Froula, Jeff L.
AU - Wang, Zhong
AU - Egan, Robert
AU - Don Kang, Dongwan
AU - Cook, Jeffrey J.
AU - Deltel, Charles
AU - Beckstette, Michael
AU - Lemaitre, Claire
AU - Peterlongo, Pierre
AU - Rizk, Guillaume
AU - Lavenier, Dominique
AU - Wu, Yu Wei
AU - Singer, Steven W.
AU - Jain, Chirag
AU - Strous, Marc
AU - Klingenberg, Heiner
AU - Meinicke, Peter
AU - Barton, Michael D.
AU - Lingner, Thomas
AU - Lin, Hsin Hung
AU - Liao, Yu Chieh
AU - Silva, Genivaldo Gueiros Z.
AU - Cuevas, Daniel A.
AU - Edwards, Robert A.
AU - Saha, Surya
AU - Piro, Vitor C.
AU - Renard, Bernhard Y.
AU - Pop, Mihai
AU - Klenk, Hans Peter
AU - Göker, Markus
AU - Kyrpides, Nikos C.
AU - Woyke, Tanja
AU - Vorholt, Julia A.
AU - Schulze-Lefert, Paul
AU - Rubin, Edward M.
AU - Darling, Aaron E.
AU - Rattei, Thomas
AU - McHardy, Alice C.
N1 - Funding Information: We thank C. Della Beffa, J. Alneberg, D. Huson and P. Grupp for their input, and the Isaac Newton Institute for Mathematical Sciences for its hospitality during the MTG program (supported by UK Engineering and Physical Sciences Research Council (EPSRC) grant EP/K032208/1). Sequencing at the US Department of Energy Joint Genome Institute was supported under contract DE-AC02-05CH11231. R.G.O. was supported by the Cluster of Excellence on Plant Sciences program of the Deutsche Forschungsgemeinschaft; A.E.D. and M.Z.D., through the Australian Research Council’s Linkage Projects (LP150100912); J.A.V., by the European Research Council advanced grant (PhyMo); D.B., B.K.H.C. and N.N., by the Agency for Science, Technology and Research (A*STAR), Singapore; T.S.J., by the Lundbeck Foundation (project DK nr R44-A4384); L.H.H. by a VILLUM FONDEN Block Stipend on Mobilomics; and P.D.B. by the National Science Foundation (NSF, grant DBI-1458689). This work used the Bridges and Blacklight systems, supported by NSF awards ACI-1445606 and ACI-1041726, respectively, at the Pittsburgh Supercomputing Center (PSC), under the Extreme Science and Engineering Discovery Environment (XSEDE), supported by NSF grant OCI-1053575.
PY - 2017/11/1
Y1 - 2017/11/1
AB - Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.
UR - http://www.scopus.com/inward/record.url?scp=85030454987&partnerID=8YFLogxK
U2 - 10.1038/nmeth.4458
DO - 10.1038/nmeth.4458
M3 - Article
C2 - 28967888
AN - SCOPUS:85030454987
VL - 14
SP - 1063
EP - 1071
JO - Nature Methods
JF - Nature Methods
SN - 1548-7091
IS - 11
ER -