Quality Control in Metagenomics Data

Research output: Chapter in Book/Report/Conference proceeding > Chapter

3 Citations (Scopus)


Experiments involving metagenomics data are becoming increasingly commonplace. Processing such data requires a unique set of considerations, and quality control is critical to extracting pertinent insights. In this chapter, we outline considerations around study design and other confounding factors that often become apparent only at the point of data analysis.

We first outline some basic principles of quality control in metagenomics, including overall reproducibility and some good practices to follow. The general quality control of sequencing data is then outlined, and we introduce ways to process this data using bash scripts and pipelines developed in Snakemake (Python).

A significant part of quality control in metagenomics lies in analyzing the data so that you can spot relationships between variables and identify when they might be confounded. This chapter provides a walkthrough of analyzing some microbiome data (in the R statistical language) and demonstrates a few ways to identify overall differences and similarities in microbiome data. The chapter concludes with remarks on considering taxonomic results in the context of the study and on interrogating sequence alignments using the command line.
Original language: English
Title of host publication: Metagenomic Data Analysis
Editors: Suparna Mitra
Place of Publication: New York
Number of pages: 34
ISBN (Electronic): 978-1-0716-3072-3
ISBN (Print): 978-1-0716-3071-6
Publication status: Published - 1 Jun 2023

Publication series

Name: Methods in Molecular Biology


  • Metagenomics
  • Microbial bioinformatics contamination
  • Microbiome bacteria
  • Quality control data
  • Virus
