A systematic evaluation of bioinformatics tools for identification of long noncoding RNAs
RNA (IF 4.2), Pub Date: 2020-10-14, DOI: 10.1261/rna.074724.120
You Duan (1, 2), Wanting Zhang (1, 3), Yingyin Cheng (1, 3), Mijuan Shi (1, 3), Xiao-Qin Xia (1, 2, 3)

High-throughput RNA sequencing has unveiled the complexity of the transcriptome and significantly increased the number of recorded long noncoding RNAs (lncRNAs), which have been reported to participate in a variety of biological processes. Identification of lncRNAs is a key step in lncRNA analysis, and a number of bioinformatics tools have been developed for this purpose in recent years. While these tools allow us to identify lncRNAs more efficiently and accurately, they may produce inconsistent results, which makes choosing among them confusing. We compared the performance of 41 analysis models based on 14 software packages across different datasets, including high-quality and low-quality data from 33 species. In addition, we explored the computational efficiency, robustness, and joint prediction of the models. As practical guidance, we summarized key points for lncRNA identification under different situations. In this investigation, no single model outperformed the others under all test conditions. The performance of a model depended to a great extent on the source of the transcripts and the quality of the assemblies. As general references, FEELnc_all_cl, CPC, and CPAT_mouse work well in most species, while COME, CNCI, and lncScore are good choices for model organisms. Since these tools are sensitive to different factors, such as the species involved and the quality of the assembly, researchers must carefully select the appropriate tool based on the actual data. Alternatively, our tests suggest that joint prediction could perform better than any single model if appropriate models are chosen. All scripts/data used in this research can be accessed at http://bioinfo.ihb.ac.cn/elit.
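The abstract does not spell out how joint prediction is performed. As a minimal sketch, assuming each tool yields one label per transcript ("lncRNA" or "coding"), a simple majority vote can combine several models, and sensitivity, specificity, and accuracy against a labeled reference set can score either a single model or the combination. The function names, label strings, tie-breaking rule, and tool names (tool_A, tool_B, tool_C) below are hypothetical; the paper's own joint-prediction scheme and evaluation metrics may differ.

```python
from __future__ import annotations

from collections import Counter

LNC, CODING = "lncRNA", "coding"  # hypothetical label strings


def joint_predict(predictions: dict[str, dict[str, str]]) -> dict[str, str]:
    """Combine per-tool calls by simple majority vote (hypothetical scheme).

    `predictions` maps a tool name to {transcript_id: label}. A transcript is
    called lncRNA only if strictly more tools voted lncRNA than coding; ties
    default to coding.
    """
    transcript_ids: set[str] = set()
    for calls in predictions.values():
        transcript_ids.update(calls)

    joint: dict[str, str] = {}
    for tid in sorted(transcript_ids):
        # Count only the tools that actually returned a call for this transcript.
        votes = Counter(calls[tid] for calls in predictions.values() if tid in calls)
        joint[tid] = LNC if votes[LNC] > votes[CODING] else CODING
    return joint


def evaluate(predicted: dict[str, str], truth: dict[str, str]) -> dict[str, float]:
    """Sensitivity, specificity, and accuracy with lncRNA as the positive class.

    Transcripts in `truth` that have no call in `predicted` are skipped.
    """
    tp = tn = fp = fn = 0
    for tid, actual in truth.items():
        call = predicted.get(tid)
        if call is None:
            continue
        if actual == LNC and call == LNC:
            tp += 1
        elif actual == LNC:
            fn += 1
        elif call == LNC:
            fp += 1
        else:
            tn += 1
    total = tp + tn + fp + fn
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


if __name__ == "__main__":
    # Toy, made-up calls from three hypothetical tools on three transcripts.
    calls = {
        "tool_A": {"t1": LNC, "t2": CODING, "t3": LNC},
        "tool_B": {"t1": LNC, "t2": LNC, "t3": CODING},
        "tool_C": {"t1": CODING, "t2": CODING, "t3": LNC},
    }
    truth = {"t1": LNC, "t2": CODING, "t3": CODING}
    joint = joint_predict(calls)
    print(joint)
    print(evaluate(joint, truth))
```

Majority voting is only one way to combine models; a stricter rule (for example, requiring every selected tool to agree before calling a transcript lncRNA) trades sensitivity for specificity, which is why the choice of which models to combine matters.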

Updated: 2020-10-14