Integrating Linguistics, Social Structure, and Geography to Model Genetic Diversity within India
Molecular Biology and Evolution (IF 10.7), Pub Date: 2021-01-22, DOI: 10.1093/molbev/msaa321
Aritra Bose 1 , Daniel E Platt 1 , Laxmi Parida 1 , Petros Drineas 2 , Peristera Paschou 3

India represents an intricate tapestry of population substructure shaped by geography, language, culture, and social stratification. While geography closely correlates with genetic structure in other parts of the world, the strict endogamy imposed by the Indian caste system and the large number of spoken languages add further layers of complexity to understanding Indian population structure. To date, no study has attempted to model and evaluate how these factors have interacted to shape the patterns of genetic diversity within India. We merged all publicly available data from the Indian subcontinent into a dataset of 891 individuals from 90 well-defined groups. Bringing together geography, genetics, and demographic factors, we developed COGG (Correlation Optimization of Genetics and Geodemographics) to build a model that explains the observed population genetic substructure. We show that shared language and social structure have been the most powerful forces in creating paths of gene flow in the subcontinent. Furthermore, we identify the ethnic groups that best capture the diverse genetic substructure using a ridge leverage score statistic. Integrating data from India with a dataset of an additional 1,323 individuals from 50 Eurasian populations, we find that Indo-European and Dravidian speakers of India show shared genetic drift with Europeans, whereas the Tibeto-Burman-speaking tribal groups have maximum shared genetic drift with East Asians.
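
The abstract names a ridge leverage score statistic but does not spell out how it is computed. As a rough illustration only, the sketch below uses the textbook definition, in which row i of a data matrix A receives the score a_i^T (A^T A + lambda I)^{-1} a_i. The toy matrix, the choice of lambda, and the function name `ridge_leverage_scores` are assumptions made for this example; they are not taken from the paper, which may use a different variant (for instance a rank-dependent lambda).

```python
import numpy as np


def ridge_leverage_scores(A: np.ndarray, lam: float) -> np.ndarray:
    """Textbook ridge leverage scores: tau_i = a_i^T (A^T A + lam*I)^{-1} a_i.

    A   : (n, d) matrix, one row per group (hypothetical features per group).
    lam : ridge regularization parameter, lam > 0 (an assumed value here).
    """
    d = A.shape[1]
    gram = A.T @ A + lam * np.eye(d)
    # Solve the linear system rather than forming the inverse explicitly.
    solved = np.linalg.solve(gram, A.T)  # shape (d, n)
    # Row-wise quadratic form a_i^T (gram^{-1} a_i).
    return np.einsum("ij,ji->i", A, solved)


# Toy data: 6 hypothetical groups described by 3 features each.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))

scores = ridge_leverage_scores(A, lam=1.0)
# Groups sorted from most to least influential under this statistic.
ranking = np.argsort(scores)[::-1]
print(scores, ranking)
```

Rows with high scores are those that the remaining rows explain poorly, which is why a statistic of this kind can be used to pick out a small set of groups that capture most of the structure in the data.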

Updated: 2021-01-22