Abstract
Background
Metatranscriptome sequence data can contain highly redundant sequences from diverse populations of microbes, so data reduction techniques are often applied before taxonomic and functional annotation. For metagenomic data, variable coverage and the presence of closely related organisms have been observed to produce fragmented assemblies containing chimeric contigs, which may reduce the accuracy of downstream analyses; some authors therefore advocate alternative data reduction techniques. However, it is unclear how such techniques affect the annotation of metatranscriptome data and thus the interpretation of the results.
Results
To investigate the effect of such techniques on the annotation of metatranscriptome data, we assessed two commonly employed methods: clustering and de novo assembly. To do this, we developed an approach to simulate 454 and Illumina metatranscriptome data sets with varying degrees of taxonomic diversity. For the Illumina simulations, we found that a two-step approach of assembly followed by clustering of contigs and unassembled sequences produced the most accurate reflection of the real protein domain content of the sample. For the 454 simulations, the combined annotation of contigs and unassembled reads produced the most accurate protein domain annotations.
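The two-step idea (assemble first, then cluster the contigs together with the reads that did not assemble) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `pool_and_cluster`, the toy sequences, and the clustering criterion (exact substring containment in a longer representative) are all assumptions made for illustration, and the assembly step is taken as already done. A real analysis would use a dedicated assembler and an identity-based clustering tool.

```python
def pool_and_cluster(contigs, unassembled_reads):
    """Pool contigs with unassembled reads, then greedily cluster the pool.

    Toy criterion (an assumption, not the paper's method): a sequence joins
    the first cluster whose representative contains it as an exact substring.
    Longer sequences are visited first so they become representatives.
    """
    pool = sorted(set(contigs) | set(unassembled_reads),
                  key=lambda s: (-len(s), s))
    clusters = {}  # representative sequence -> list of member sequences
    for seq in pool:
        for rep in clusters:
            if seq in rep:
                clusters[rep].append(seq)
                break
        else:
            # No existing representative contains this sequence.
            clusters[seq] = [seq]
    return clusters


# Hypothetical inputs: contigs from an assembler plus leftover reads.
contigs = ["ATGGCGTACGTTAGC", "TTGACCGGTA"]
unassembled = ["GCGTACG", "CCGGTA", "GGGTTTAAACCC"]

clusters = pool_and_cluster(contigs, unassembled)
representatives = list(clusters)  # sequences that would go on to annotation
```

In this sketch the read `GGGTTTAAACCC` matches no contig and therefore survives as its own representative, which mirrors the point that unassembled reads are carried forward to annotation rather than discarded.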
Conclusions
Based on these data, we recommend that assembly be attempted and that unassembled reads be included in the final annotation of metatranscriptome data, even from highly diverse environments, as the resulting annotations should more accurately reflect the transcriptional behaviour of the microbial population under investigation.
Original language | English |
---|---|
Article number | 901 |
Journal | BMC Genomics |
Volume | 15 |
DOIs | |
Publication status | Published - 15 Oct 2014 |
Keywords
- Metatranscriptomics
- Sequence processing
- Data reduction
- Clustering
- Assembly
Profiles
- Vincent Moulton
  - School of Computing Sciences - Professor in Computational Biology
  - Norwich Epidemiology Centre - Member
  - Computational Biology - Member
  Person: Research Group Member, Academic, Teaching & Research
- Simon Moxon
  - School of Biological Sciences - Associate Professor in Bioinformatics
  Person: Academic, Teaching & Research
Projects
- 1 Finished
- From the North Sea to the Arctic Ocean; The Impact of Temperature on Eukaryotic Phytoplankton.
  Mock, T., Moulton, V., Toseland, A. & Utting, R.
  Natural Environment Research Council
  29/07/13 → 28/07/16
  Project: Research