Abstract
We re-analyzed the data from a recent large-scale study that reported strong correlations between DNA signatures of microbial organisms and 33 different cancer types and that created machine-learning predictors with near-perfect accuracy at distinguishing among cancers. We found at least two fundamental flaws in the reported data and in the methods: (i) errors in the genome database and the associated computational methods led to millions of false-positive findings of bacterial reads across all samples, largely because most of the sequences identified as bacteria were instead human; and (ii) errors in the transformation of the raw data created an artificial signature, even for microbes with no reads detected, tagging each tumor type with a distinct signal that the machine-learning programs then used to create an apparently accurate classifier. Each of these problems invalidates the results, leading to the conclusion that the microbiome-based classifiers for identifying cancer presented in the study are entirely wrong. These flaws have subsequently affected more than a dozen additional published studies that used the same data and whose results are likely invalid as well.
| Original language | English |
|---|---|
| Article number | e01607-23 |
| Journal | mBIO |
| Volume | 14 |
| Issue number | 5 |
| DOIs | |
| Publication status | Published - 9 Oct 2023 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
-
SDG 3 Good Health and Well-being
Keywords
- bioinformatics
- cancer
- computational biology
- KEYWORDS microbiome
- metagenomics
Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver