Optimizing the molecular diagnosis of Covid-19 by combining RT-PCR and a pseudo-convolutional machine learning approach to characterize virus DNA sequences
bioRxiv - Bioinformatics, Pub Date: 2020-09-28, DOI: 10.1101/2020.06.02.129775
Juliana Carneiro Gomes, Aras Ismael Masood, Leandro Honorato de S. Silva, Janderson Ferreira, Agostinho A. F. Júnior, Allana Lais dos Santos Rocha, Letícia Castro, Nathália R. C. da Silva, Bruno J. T. Fernandes, Wellington Pinheiro dos Santos

The spread of the SARS-CoV-2 virus throughout the world has caused more than 250,000 deaths and over 4 million confirmed cases. The severity of Covid-19, the exponential rate at which the virus spreads, and the rapid exhaustion of public health resources make rapid, reliable diagnosis critical. RT-PCR with viral DNA identification is still the benchmark method for diagnosing Covid-19. In this work, we propose a new technique for representing DNA sequences: each sequence is divided into smaller, overlapping subsequences in a pseudo-convolutional approach, and these subsequences are represented by co-occurrence matrices. The technique analyzes the DNA sequences obtained by RT-PCR without requiring sequence alignment. With the proposed method, it is possible to identify virus sequences in a large database of 347,363 virus DNA sequences from 24 virus families and SARS-CoV-2. Experiments with all 24 virus families and SARS-CoV-2 (multi-class scenario) yielded a sensitivity of 0.822222 ± 0.05613 and a specificity of 0.99974 ± 0.00001 using Random Forests with 100 trees and 30% overlap. When we compared SARS-CoV-2 with virus families that cause similar symptoms, we obtained a sensitivity of 0.97059 ± 0.03387 and a specificity of 0.99187 ± 0.00046 with an MLP classifier and 30% overlap. In the real test scenario, in which SARS-CoV-2 is compared with Coronaviridae and healthy human DNA sequences, we obtained a sensitivity of 0.98824 ± 0.01198 and a specificity of 0.99860 ± 0.00020 with an MLP and 50% overlap. Therefore, the molecular diagnosis of Covid-19 can be optimized by combining RT-PCR with our pseudo-convolutional method to identify SARS-CoV-2 DNA sequences faster and with higher specificity and sensitivity.
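
As a rough illustration of the representation described in the abstract, the sketch below slides an overlapping window along a DNA sequence (the "pseudo-convolutional" split) and builds a 4×4 nucleotide co-occurrence matrix for each window. It is not the authors' implementation. The window length, the adjacent-pair definition of co-occurrence, skipping ambiguous bases, and summing the window matrices into one 16-value frequency vector are all our assumptions, and the function names are hypothetical.

```python
from itertools import product

# Assumption: only the four unambiguous bases are counted; others (e.g. N) are skipped.
BASES = "ACGT"
INDEX = {base: i for i, base in enumerate(BASES)}


def split_with_overlap(sequence, window, overlap):
    """Cut `sequence` into windows of length `window`, sliding like a 1-D
    convolution; consecutive windows share about `overlap` (e.g. 0.3 or 0.5)
    of their length. A few trailing bases may be left uncovered when the
    remaining length is not a multiple of the stride."""
    if window <= 0 or not 0.0 <= overlap < 1.0:
        raise ValueError("window must be > 0 and overlap must be in [0, 1)")
    stride = max(1, round(window * (1.0 - overlap)))
    last_start = max(len(sequence) - window, 0)
    return [sequence[s:s + window] for s in range(0, last_start + 1, stride)]


def cooccurrence_matrix(subsequence):
    """4x4 counts: entry [i][j] is how often BASES[j] directly follows BASES[i]."""
    matrix = [[0] * 4 for _ in range(4)]
    for a, b in zip(subsequence, subsequence[1:]):
        if a in INDEX and b in INDEX:
            matrix[INDEX[a]][INDEX[b]] += 1
    return matrix


def sequence_features(sequence, window=10, overlap=0.3):
    """Hypothetical aggregation: sum the per-window matrices, normalize them to
    relative frequencies, and flatten row by row into 16 features, so
    sequences of different lengths yield vectors of the same size."""
    totals = [[0] * 4 for _ in range(4)]
    for chunk in split_with_overlap(sequence.upper(), window, overlap):
        counts = cooccurrence_matrix(chunk)
        for i, j in product(range(4), repeat=2):
            totals[i][j] += counts[i][j]
    grand_total = sum(map(sum, totals))
    return [
        totals[i][j] / grand_total if grand_total else 0.0
        for i, j in product(range(4), repeat=2)
    ]


if __name__ == "__main__":
    print(split_with_overlap("ACGTACGTAC", window=4, overlap=0.5))
    # ['ACGT', 'GTAC', 'ACGT', 'GTAC']
    print(len(sequence_features("ACGTACGTAC", window=4, overlap=0.5)))  # 16
```

Vectors like these could then be passed to an off-the-shelf classifier, for example scikit-learn's `RandomForestClassifier(n_estimators=100)` or `MLPClassifier`. The `overlap` argument here plays the role of the 30% and 50% overlap settings reported in the abstract. Because overlapping windows count the shared region more than once, the overlap fraction changes the weighting of the features and not just the number of windows.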

Updated: 2020-09-29