Integrating linguistics, social structure, and geography to model genetic diversity within India
bioRxiv - Genomics, Pub Date: 2020-08-10, DOI: 10.1101/164640
Aritra Bose, Daniel E. Platt, Laxmi Parida, Petros Drineas, Peristera Paschou

India represents an intricate tapestry of population substructure shaped by geography, language, culture, and social stratification. While geography closely correlates with genetic structure in other parts of the world, the strict endogamy imposed by the Indian caste system and the large number of spoken languages add further layers of complexity to understanding Indian population structure. To date, no study has attempted to model and evaluate how these factors have interacted to shape the patterns of genetic diversity within India. We merged all publicly available data from the Indian subcontinent into a data set of 891 individuals from 90 well-defined groups. Bringing together geography, genetics, and demographic factors, we developed COGG (Correlation Optimization of Genetics and Geodemographics) to build a model that explains the observed population genetic substructure. We show that shared language and social structure have been the most powerful forces in creating paths of gene flow in the subcontinent. Furthermore, we identify the ethnic groups that best capture the diverse genetic substructure highlighted by COGG. Integrating the data from India with an additional data set of 1,323 individuals from 50 populations, we find that Europeans show shared genetic drift with the Indo-European and Dravidian speakers of India, whereas East Asians have the maximum shared genetic drift with Tibeto-Burman-speaking tribal groups.

Updated: 2020-08-11