TY - JOUR
T1 - Interpreting machine learning models to investigate circadian regulation and facilitate exploration of clock function
AU - Gardiner, Laura-Jayne
AU - Rusholme-Pilcher, Rachel
AU - Colmer, Josh
AU - Rees, Hannah
AU - Crescente, Juan Manuel
AU - Carrieri, Anna Paola
AU - Duncan, Susan
AU - Pyzer-Knapp, Edward O.
AU - Krishna, Ritesh
AU - Hall, Anthony
PY - 2021/8/10
Y1 - 2021/8/10
N2 - The circadian clock is an important adaptation to life on Earth. Here, we use machine learning to predict complex, temporal, and circadian gene expression patterns in Arabidopsis. Most significantly, we classify circadian genes using DNA sequence features generated de novo from public, genomic resources, facilitating downstream application of our methods with no experimental work or prior knowledge needed. We use local model explanation that is transcript specific to rank DNA sequence features, providing a detailed profile of the potential circadian regulatory mechanisms for each transcript. Furthermore, we can discriminate the temporal phase of transcript expression using the local, explanation-derived, and ranked DNA sequence features, revealing hidden subclasses within the circadian class. Model interpretation/explanation provides the backbone of our methodological advances, giving insight into biological processes and experimental design. Next, we use model interpretation to optimize sampling strategies when we predict circadian transcripts using reduced numbers of transcriptomic timepoints. Finally, we predict the circadian time from a single, transcriptomic timepoint, deriving marker transcripts that are most impactful for accurate prediction; this could facilitate the identification of altered clock function from existing datasets.
AB - The circadian clock is an important adaptation to life on Earth. Here, we use machine learning to predict complex, temporal, and circadian gene expression patterns in Arabidopsis. Most significantly, we classify circadian genes using DNA sequence features generated de novo from public, genomic resources, facilitating downstream application of our methods with no experimental work or prior knowledge needed. We use local model explanation that is transcript specific to rank DNA sequence features, providing a detailed profile of the potential circadian regulatory mechanisms for each transcript. Furthermore, we can discriminate the temporal phase of transcript expression using the local, explanation-derived, and ranked DNA sequence features, revealing hidden subclasses within the circadian class. Model interpretation/explanation provides the backbone of our methodological advances, giving insight into biological processes and experimental design. Next, we use model interpretation to optimize sampling strategies when we predict circadian transcripts using reduced numbers of transcriptomic timepoints. Finally, we predict the circadian time from a single, transcriptomic timepoint, deriving marker transcripts that are most impactful for accurate prediction; this could facilitate the identification of altered clock function from existing datasets.
KW - Circadian
KW - Explainable AI
KW - Function
KW - Regulation
KW - Transcriptome
UR - http://www.scopus.com/inward/record.url?scp=85112485292&partnerID=8YFLogxK
U2 - 10.1073/pnas.2103070118
DO - 10.1073/pnas.2103070118
M3 - Article
SN - 0027-8424
VL - 118
JO - Proceedings of the National Academy of Sciences of the United States of America (PNAS)
JF - Proceedings of the National Academy of Sciences of the United States of America (PNAS)
IS - 32
M1 - e2103070118
ER -