Factors influencing taxonomic unevenness in scientific research: a mixed-methods case study of non-human primate genomic sequence data generation
Royal Society Open Science (IF 2.9), Pub Date: 2020-09-30, DOI: 10.1098/rsos.201206
Margarita Hernandez 1 , Mary K. Shenk 1 , George H. Perry 1, 2, 3

Scholars have noted major disparities in the extent of scientific research conducted among taxonomic groups. Such trends may cascade if future scientists gravitate towards study species with more data and resources already available. As new technologies emerge, do research studies employing these technologies continue these disparities? Here, using non-human primates as a case study, we identified disparities in massively parallel genomic sequencing data and conducted interviews with scientists who produced these data to learn their motivations when selecting study species. We tested whether variables including publication history and conservation status were significantly correlated with publicly available sequence data in the NCBI Sequence Read Archive (SRA). Of the 179.6 terabases (Tb) of sequence data in SRA for 519 non-human primate species, 135 Tb (approx. 75%) were from only five species: rhesus macaques, olive baboons, green monkeys, chimpanzees and crab-eating macaques. The strongest predictors of the amount of genomic data were the total number of non-medical publications (linear regression; r² = 0.37; p = 6.15 × 10⁻¹²) and number of medical publications (r² = 0.27; p = 9.27 × 10⁻⁹). In a generalized linear model, the number of non-medical publications (p = 0.00064) and closer phylogenetic distance to humans (p = 0.024) were the most predictive of the amount of genomic sequence data. We interviewed 33 authors of genomic data-producing publications and analysed their responses using grounded theory. Consistent with our quantitative results, authors mentioned their choice of species was motivated by sample accessibility, prior published work and relevance to human medicine. Our mixed-methods approach helped identify and contextualize some of the driving factors behind species-uneven patterns of scientific research, which can now be considered by funding agencies, scientific societies and research teams aiming to align their broader goals with future data generation efforts.




Updated: 2020-09-30