Bayesian unsupervised clustering identifies clinically relevant osteosarcoma subtypes

Research output: Contribution to journal › Article › peer-review


Abstract

Identification of cancer subtypes is a critical step for developing precision medicine. Most cancer subtyping is based on the analysis of RNA sequencing (RNA-seq) data from patient cohorts using unsupervised machine learning methods such as hierarchical cluster analysis, but these computational approaches disregard the heterogeneous composition of individual cancer samples. Here, we used a more sophisticated unsupervised Bayesian model termed latent process decomposition (LPD), which handles individual cancer sample heterogeneity and deconvolutes the structure of transcriptome data to provide clinically relevant information. The work was performed on the pediatric tumor osteosarcoma, which is a prototypical model for a rare and heterogeneous cancer. The LPD model detected three osteosarcoma subtypes. The subtype with the poorest prognosis was validated using independent patient datasets. This new stratification framework will be important for more accurate diagnostic labeling, expediting precision medicine, and improving clinical trial success. Our results emphasize the importance of using more sophisticated machine learning approaches (and of teaching deep learning and artificial intelligence) for RNA-seq data analysis, which may assist drug targeting and clinical management.
Original language: English
Article number: bbae665
Journal: Briefings in Bioinformatics
Volume: 26
Issue number: 1
Early online date: 20 Dec 2024
Publication status: Published - Jan 2025

Keywords

  • heterogeneity
  • latent process decomposition
  • osteosarcoma
  • precision medicine
  • RNA-seq
