Synthesis Success Calculator: Predicting the Rapid Synthesis of DNA Fragments with Machine Learning.
ACS Synthetic Biology (IF 3.7) Pub Date: 2020-06-19, DOI: 10.1021/acssynbio.9b00460
Sean M. Halper, Ayaan Hossain, Howard M. Salis

The synthesis and assembly of long DNA fragments has greatly accelerated synthetic biology and biotechnology research. However, long turnaround times or synthesis failures create unpredictable bottlenecks in the design–build–test–learn cycle. We developed a machine learning model, called the Synthesis Success Calculator, to predict whether a long DNA fragment can be readily synthesized with a short turnaround time. The model also identifies the sequence determinants associated with the synthesis outcome. We trained a random forest classifier using biophysical features and a compiled data set of 1076 DNA fragment sequences to achieve high predictive performance (F1 score of 0.928 on 251 unseen sequences). Feature importance analysis revealed that repetitive DNA sequences were the most important contributor to synthesis failures. We then applied the Synthesis Success Calculator across large sequence data sets and found that 84.9% of the Escherichia coli MG1655 genome, but only 34.4% of sampled plasmids in NCBI, could be readily synthesized. Overall, the Synthesis Success Calculator can be applied on its own to prevent synthesis failures or embedded within optimization algorithms to design large genetic systems that can be rapidly synthesized and assembled.
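To make two ideas from the abstract concrete, here is a minimal Python sketch of how the reported F1 score is defined, together with one toy way to quantify sequence repetitiveness. This is not the authors' feature set or pipeline: the function names `repeated_kmer_fraction` and `f1_score` and the default k-mer length `k=8` are assumptions chosen only for illustration.

```python
from collections import Counter


def repeated_kmer_fraction(seq: str, k: int = 8) -> float:
    """Toy repeat feature (an assumption, not the paper's feature):
    fraction of k-mer positions whose k-mer occurs more than once."""
    seq = seq.upper()
    n = len(seq) - k + 1  # number of k-mer start positions
    if n <= 0:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    # Sum the occurrences of every k-mer that appears at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / n


def f1_score(y_true, y_pred) -> float:
    """F1 = harmonic mean of precision and recall for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a classifier such as the one described, a feature like this would be one column among many, and the F1 score would be computed on held-out sequences that the model never saw during training.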

Updated: 2020-07-17