TY - JOUR
T1 - Population structure, stratification, and introgression of human structural variation
AU - Almarri, Mohamed A.
AU - Bergström, Anders
AU - Prado-Martinez, Javier
AU - Yang, Fengtang
AU - Fu, Beiyuan
AU - Dunham, Alistair S.
AU - Chen, Yuan
AU - Hurles, Matthew E.
AU - Tyler-Smith, Chris
AU - Xue, Yali
N1 - Funding information: M.A.A., A.B., J.P.-M., A.S.D., Y.C., C.T.-S., and Y.X. were supported by Wellcome grant 098051. M.A.A. was supported by the Government of Dubai – Dubai Police GHQ. A.B. was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001595), the UK Medical Research Council (FC001595), and the Wellcome Trust (FC001595).
PY - 2020/07/09
Y1 - 2020/07/09
AB - Structural variants contribute substantially to genetic diversity and are important evolutionarily and medically, but they are still understudied. Here we present a comprehensive analysis of structural variation in the Human Genome Diversity panel, a high-coverage dataset of 911 samples from 54 diverse worldwide populations. We identify, in total, 126,018 variants, 78% of which were not identified in previous global sequencing projects. Some reach high frequency and are private to continental groups or even individual populations, including regionally restricted runaway duplications and putatively introgressed variants from archaic hominins. By de novo assembly of 25 genomes using linked-read sequencing, we discover 1,643 breakpoint-resolved unique insertions, in aggregate accounting for 1.9 Mb of sequence absent from the GRCh38 reference. Our results illustrate the limitation of a single human reference and the need for high-quality genomes from diverse populations to fully discover and understand human genetic variation.
U2 - 10.1016/j.cell.2020.05.024
DO - 10.1016/j.cell.2020.05.024
M3 - Article
VL - 182
SP - 189
EP - 199.e15
JO - Cell
JF - Cell
SN - 0092-8674
IS - 1
ER -