Assessing the Impact of Data Preprocessing on Analyzing Next Generation Sequencing Data, Frontiers in Bioengineering and Biotechnology



Frontiers in Bioengineering and Biotechnology ( IF 5.7 ) Pub Date : 2020-07-30 , DOI: 10.3389/fbioe.2020.00817
Binsheng He ₁, Rongrong Zhu ₂, Huandong Yang ₃, Qingqing Lu ₄, Weiwei Wang ₄, Lei Song ₄, Xue Sun ₄, Guandong Zhang ₄, Shijun Li ₅, Jialiang Yang ₁,₄, Geng Tian ₄, Pingping Bing ₁, Jidong Lang ₄


Data quality control and preprocessing are often the first step in processing next-generation sequencing (NGS) data of tumors. They not only help evaluate the quality of sequencing data but also yield high-quality data for downstream analysis. However, when we compared analysis results from data preprocessed with Cutadapt, FastP, and Trimmomatic against results from the raw sequencing data, we found that mutation detection frequencies fluctuated and differed, and that human leukocyte antigen (HLA) typing was directly affected, yielding erroneous results. We believe our research demonstrates the impact of data preprocessing steps on downstream analysis results. We hope it will promote the development or optimization of better data preprocessing methods, so that downstream analysis can be more accurate.


Updated: 2020-07-30
