PyDamage: automated ancient damage identification and estimation for contigs in ancient DNA de novo assembly
PeerJ (IF 2.3), Pub Date: 2021-07-27, DOI: 10.7717/peerj.11845
Maxime Borry 1, Alexander Hübner 1,2, Adam B Rohrlach 3,4, Christina Warinner 1,2,5

DNA de novo assembly can be used to reconstruct longer stretches of DNA (contigs), including genes and even genomes, from short DNA sequencing reads. Applying this technique to metagenomic data derived from archaeological remains, such as paleofeces and dental calculus, we can investigate past microbiome functional diversity that may be absent or underrepresented in the modern microbiome gene catalogue. However, compared to modern samples, ancient samples are often burdened with environmental contamination, resulting in metagenomic datasets that represent mixtures of ancient and modern DNA. The ability to rapidly and reliably establish the authenticity and integrity of ancient samples is essential for ancient DNA (aDNA) studies, and the ability to distinguish between ancient and modern sequences is particularly important for ancient microbiome studies. Characteristic patterns of aDNA damage, namely DNA fragmentation and cytosine deamination (observed as C-to-T transitions), are typically used to authenticate ancient samples and sequences. Existing tools for inspecting and filtering aDNA damage either compute it at the read level, which leads to high data loss and lower quality when used in combination with de novo assembly, or require manual inspection, which is impractical for ancient assemblies that typically contain tens to hundreds of thousands of contigs. To address these challenges, we designed PyDamage, a robust, automated approach for aDNA damage estimation and authentication of de novo assembled aDNA. PyDamage uses a likelihood-ratio-based approach to discriminate between truly ancient contigs and contigs originating from modern contamination. We test PyDamage on both simulated aDNA data and archaeological paleofeces, and we demonstrate its ability to reliably and automatically identify contigs bearing DNA damage characteristic of aDNA. Coupled with aDNA de novo assembly, PyDamage opens up new doors to explore functional diversity in ancient metagenomic datasets.
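
To make the idea of a likelihood-ratio test on per-contig damage concrete, here is a minimal Python sketch. It is not PyDamage's actual model or API: the function names, the input layout (per-position C-to-T counts measured from the 5' end of the reads aligned to one contig), and the simple two-rate alternative are all assumptions chosen for illustration. The null model assumes one C-to-T rate at every read position, as expected for modern DNA; the alternative allows a separate, typically elevated, rate at the terminal position(s), as expected after cytosine deamination.

```python
import math


def binom_loglik(k, n, p):
    """Binomial log-likelihood of k successes in n trials at rate p.

    The combinatorial term is omitted because it cancels in the ratio.
    """
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return k * math.log(p) + (n - k) * math.log(1 - p)


def damage_lrt(ct_counts, c_counts, n_terminal=1):
    """Hypothetical helper: likelihood-ratio test for terminal C-to-T excess.

    ct_counts[i]: C-to-T mismatches observed at read position i (from the 5' end)
    c_counts[i]:  reference cytosines covered at read position i
    n_terminal:   number of leading positions treated as "terminal" (assumption)

    Returns (statistic, p_value) for a chi-square test with 1 degree of freedom.
    """
    k_all, n_all = sum(ct_counts), sum(c_counts)
    k_term, n_term = sum(ct_counts[:n_terminal]), sum(c_counts[:n_terminal])
    k_int, n_int = k_all - k_term, n_all - n_term
    if n_term == 0 or n_int == 0:
        raise ValueError("need coverage at both terminal and interior positions")

    # Null model: a single substitution rate shared by all positions.
    ll_null = binom_loglik(k_all, n_all, k_all / n_all)
    # Alternative model: separate rates for terminal and interior positions.
    ll_alt = (binom_loglik(k_term, n_term, k_term / n_term)
              + binom_loglik(k_int, n_int, k_int / n_int))

    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Survival function of chi-square with 1 d.f.: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Made-up counts: strong 5' C-to-T excess versus a flat profile.
    print(damage_lrt([30, 12, 6, 3, 2], [100] * 5))
    print(damage_lrt([2, 2, 2, 2, 2], [100] * 5))
```

Because this test is two-sided, a practical filter would also require the terminal rate to exceed the interior rate, and with tens to hundreds of thousands of contigs the resulting p-values would need a multiple-testing correction before any contig is called ancient.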

Updated: 2021-07-27