TY - JOUR
T1 - Improving data archiving practices in ancient genomics
AU - Bergström, Anders
N1 - Data availability statement: The data analysed in this study was obtained from the following archive study accessions: PRJEB54831 (ENA) [8,94], PRJEB51180 (ENA) [9,95], HRA001777 (GSA) [10,96], PRJEB52849 (ENA) [11,97], PRJEB42656 (ENA) [12,98], PRJEB52230 (ENA) [13,99], SRP352154 (SRA) [14,100], SRP455000 (SRA) [15,101], PRJEB51440 (ENA) [16,102], PRJEB56773 (ENA) [17,103], PRJEB49291 (ENA) [18,104], PRJEB42781 (ENA) [19,105], PRJEB46734 (ENA) [20,106], SRP356017 (SRA) [21,107], PRJEB42269 (ENA) [22,108], PRJEB47891 (ENA) [23,109], PRJEB54899 (ENA) [24,110], PRJEB43715 (ENA) [25,111], PRJEB39134 (ENA) [26,112], PRJEB46162 (ENA) [27,113], PRJEB46875 (ENA) [28,114], PRJEB44430 (ENA) [29,115], PRJEB42199 (ENA) [30,116], PRJEB38555 (ENA) [31,117], PRJEB55327 (ENA) [32,118], PRJEB56213 (ENA) [33,119], PRJEB51862 (ENA) [34,120], PRJEB58698 (ENA) [35,121], PRJEB62503 (ENA) [36,122], PRJEB66319 (ENA) [37,123], PRJEB59008 (ENA) [38,124], PRJEB61818 (ENA) [39,125], PRJEB50368 (ENA) [40,126], PRJEB50857 (ENA) [41,127], HRA000451 (GSA) [42,128], HRA000411 (GSA) [43,129], PRJEB53475 (ENA) [44,130], PRJEB37782 (ENA) [45,131], SRP299553 (SRA) [46,132], PRJEB42372 (ENA) [47,133], PRJEB66422 (ENA) [48,134], PRJEB57364 (ENA) [49,135].
N1 - Code availability: No custom code was written for this paper.
PY - 2024/7/10
Y1 - 2024/7/10
N2 - Ancient DNA is producing a rich record of past genetic diversity in humans and other species. However, unless the primary data is appropriately archived, its long-term value will not be fully realised. I surveyed publicly archived data from 42 recent ancient genomics studies. Half of the studies archived incomplete datasets, preventing accurate replication and representing a loss of data of potential future use. No studies met all criteria that could be considered best practice. Based on these results, I make six recommendations for data producers: (1) archive all sequencing reads, not just those that aligned to a reference genome, (2) archive read alignments too, but as secondary analysis files, (3) provide correct experiment metadata on samples, libraries and sequencing runs, (4) provide informative sample metadata, (5) archive data from low-coverage and negative experiments, and (6) document archiving choices in papers, and peer review these. Given the reliance on destructive sampling of finite material, ancient genomics studies have a particularly strong responsibility to ensure the longevity and reusability of generated data.
UR - http://www.scopus.com/inward/record.url?scp=85198053395&partnerID=8YFLogxK
U2 - 10.1101/2023.05.15.540553
DO - 10.1101/2023.05.15.540553
M3 - Article
SN - 2052-4463
VL - 11
JO - Scientific Data
JF - Scientific Data
M1 - 754
ER -