CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning.
BMC Bioinformatics (IF 2.9) Pub Date: 2020-06-01, DOI: 10.1186/s12859-020-3531-9
Ali Haisam Muhammad Rafid 1, 2, Md Toufikuzzaman 1, Mohammad Saifur Rahman 1, M Sohel Rahman 1

The latest work on CRISPR genome editing tools mainly employs deep learning techniques. However, deep learning models lack explainability and are harder to reproduce. We were motivated to build an accurate genome editing tool using sequence-based features and traditional machine learning that can compete with deep learning models. In this paper, we present CRISPRpred(SEQ), a method for sgRNA on-target activity prediction that relies only on traditional machine learning techniques and hand-crafted features extracted from sgRNA sequences. We compare the results of CRISPRpred(SEQ) with those of DeepCRISPR, the current state of the art, which uses a deep learning pipeline. Despite using only traditional machine learning methods, CRISPRpred(SEQ) convincingly outperforms DeepCRISPR on three of the four cell lines in the benchmark dataset (improvements of 2.174%, 6.905% and 8.119% on those three cell lines). We believe that, with further exploration, better features can be designed using only the sgRNA sequences, leading to a method based solely on traditional machine learning algorithms that fully outperforms the deep learning models.
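To make "hand-crafted features extracted from sgRNA sequences" concrete, the following is a minimal sketch of two common kinds of sequence-derived features: position-independent k-mer counts and position-specific nucleotide indicators. It is an illustration only, not the feature set described in the paper. The function names (`kmer_counts`, `position_one_hot`, `extract_features`) and the choice of `max_k=2` are assumptions made for this example.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def kmer_counts(seq, k):
    """Position-independent features: occurrences of every possible k-mer."""
    counts = {"".join(p): 0 for p in product(NUCLEOTIDES, repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows with ambiguous bases (e.g. N) are skipped
            counts[kmer] += 1
    return counts


def position_one_hot(seq):
    """Position-specific features: one 0/1 indicator per (position, nucleotide)."""
    return {
        f"pos{i}_{n}": int(base == n)
        for i, base in enumerate(seq)
        for n in NUCLEOTIDES
    }


def extract_features(seq, max_k=2):
    """Combine both feature families into one flat dict (hypothetical helper)."""
    seq = seq.upper()
    features = position_one_hot(seq)
    for k in range(1, max_k + 1):
        for kmer, count in kmer_counts(seq, k).items():
            features[f"count_{kmer}"] = count
    return features
```

A feature dictionary of this kind can be turned into a fixed-length numeric vector (for example, by sorting the keys) and passed to any traditional classifier or regressor. Which model and which features the authors actually used is not stated in the abstract above.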

Updated: 2020-06-01