Benchmarking of viral haplotype reconstruction programmes: An overview of the capacities and limitations of currently available programmes

Melanie Schirmer, William T. Sloan, Christopher Quince

Research output: Contribution to journal › Article › peer-review

49 Citations (Scopus)

Abstract

Viral haplotype reconstruction from a set of observed reads is one of the most challenging problems in bioinformatics today. Next-generation sequencing technologies enable us to detect single-nucleotide polymorphisms (SNPs) of haplotypes, even if the haplotypes appear at low frequencies. However, there are two major problems. First, we need to distinguish real SNPs from sequencing errors. Second, we need to determine which SNPs occur on the same haplotype, which cannot be inferred from the reads if the distance between SNPs on a haplotype exceeds the read length. We conducted an independent benchmarking study that directly compares the currently available viral haplotype reconstruction programmes. We also present nine in silico data sets that we generated to reflect biologically plausible populations. For these data sets, we simulated 454 and Illumina reads and applied the programmes to test their capacity to reconstruct whole genomes and individual genes. We developed a novel statistical framework to demonstrate the strengths and limitations of the programmes. Our benchmarking demonstrated that all the programmes we tested performed poorly when sequence divergence was low and failed to recover haplotype populations with rare haplotypes.
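
The second problem in the abstract, that SNPs further apart than the read length cannot be placed on the same haplotype from the reads alone, can be illustrated with a minimal sketch. This is not taken from the paper or from any of the benchmarked programmes. The function names, SNP positions and read length are hypothetical, and the sketch assumes single-end reads of fixed length.

```python
def can_link_directly(pos_a, pos_b, read_length):
    """Return True if one read of `read_length` bases can cover both SNP positions.

    Positions are 1-based genome coordinates (an assumption of this sketch).
    """
    return abs(pos_a - pos_b) + 1 <= read_length


def unlinked_adjacent_pairs(snp_positions, read_length):
    """List adjacent SNP pairs that no single read can span.

    Each returned pair marks a gap where phasing cannot be read directly
    from the data and has to be inferred by other means.
    """
    ordered = sorted(snp_positions)
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if not can_link_directly(a, b, read_length)
    ]


# Hypothetical SNP positions and read length, chosen only for illustration.
snps = [120, 310, 1450, 1600]
print(unlinked_adjacent_pairs(snps, read_length=250))
```

With these illustrative values, only the pair at positions 310 and 1450 is too far apart for one read to cover, so that is the gap a reconstruction programme would have to bridge by inference.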

Original language: English
Article number: bbs081
Pages (from-to): 431-442
Number of pages: 12
Journal: Briefings in Bioinformatics
Volume: 15
Issue number: 3
Publication status: Published - May 2014

Keywords

  • Benchmarking
  • In silico data sets
  • Quasispecies
  • Statistics for validation
  • Viral haplotype reconstruction programmes
