In silico, in vitro, and in vivo machine learning in synthetic biology and metabolic engineering
Introduction
The past few years have seen growing interest in using machine learning for chemistry and biology, and synthetic biology and metabolic engineering are no exception to this trend [1]. This review covers three main families of techniques used when engineering biological systems. In section 2, we present an overview of supervised and semisupervised machine learning techniques, with examples drawn from the search for promiscuous enzyme activities. In section 3, we discuss active learning (AL) and reinforcement learning (RL) methods, which are generally based on supervised learning, with training sets acquired on the fly in an iterative process. These methods are particularly amenable to the design-build-test-learn cycle of synthetic biology. Examples are provided in the context of predicting enzymatic activities, optimizing metabolic pathways, and performing retro-biosynthesis. Engineering information-processing devices in living systems is a long-standing venture of synthetic biology. Yet the problem of engineering devices that perform the basic operations found in machine learning remains largely unexplored. Section 4 presents attempts to construct in vitro and in vivo perceptrons, which are the basic units of all artificial neural networks.
Section snippets
Supervised and semisupervised learning
Supervised learning is one of the main machine learning methods used in biology, particularly in bioinformatics, where it has been extensively developed [2]. Focusing on biosynthesis, and to name a few applications, supervised learning makes it possible to predict enzyme activities [∗3, 4, ∗5, 6], to propose protein structures [7], to engineer sequences (DNA, RNA, protein) [8, 9, 10, 11], to complete the metabolome [12], to optimize culture conditions [13], and to perform more unexpected tasks like
Active learning and reinforcement learning
AL is a special case of supervised machine learning in which a learner (any learning algorithm mentioned in the previous section) can interactively query an oracle (a human, a robot, a computer simulation) to request labels for new data points [21]. The process is iterative, and the training set is acquired and grows on the fly. Because the learner chooses the examples to be labeled, the number of examples can be made lower than the number required in normal supervised learning while maintaining
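As an illustration of the query loop described in this section, the following minimal Python sketch runs pool-based active learning with uncertainty sampling on a toy one-dimensional problem. Everything in it is an assumption made for illustration and is not taken from the reviewed studies: the oracle is a stand-in for an experiment with a hidden activity threshold, the candidate pool is an evenly spaced grid, and the function names are hypothetical.

```python
def oracle(x, hidden_threshold=0.62):
    """Hypothetical stand-in for an experiment: label 1 above a hidden threshold."""
    return 1 if x > hidden_threshold else 0


def estimate_boundary(labeled):
    """Learner: midpoint between the largest 0-labeled and smallest 1-labeled point."""
    lo = max((x for x, y in labeled if y == 0), default=0.0)
    hi = min((x for x, y in labeled if y == 1), default=1.0)
    return (lo + hi) / 2.0


def active_learning(pool, n_queries):
    """Iteratively query the oracle on the candidate the learner is least sure about."""
    pool = list(pool)
    labeled = []  # training set, acquired on the fly
    for _ in range(n_queries):
        if not pool:
            break
        boundary = estimate_boundary(labeled)
        # Uncertainty sampling: pick the unlabeled candidate closest to the
        # current decision boundary, where the prediction is least certain.
        query = min(pool, key=lambda x: abs(x - boundary))
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return estimate_boundary(labeled), labeled


# Assumed toy pool: 1001 evenly spaced candidates in [0, 1].
pool = [i / 1000 for i in range(1001)]
boundary, labeled = active_learning(pool, n_queries=12)
print(f"queried {len(labeled)} of {len(pool)} candidates, boundary ~ {boundary:.3f}")
```

In this toy setting, 12 labels out of 1001 candidates are enough to place the estimated boundary within 0.01 of the hidden threshold, whereas labeling candidates at random would typically need far more queries for the same precision.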
In vitro and in vivo learning
In all the applications we have seen so far, learning is performed in silico. In this section, we are interested in performing learning in vitro or in vivo; the main challenge is therefore to construct molecular devices that process information in the same way as the basic building blocks of machine learning programs. Two main goals motivate this innovative learning approach. The first, rather theoretical, is to probe the extent to which cellular networks can be engineered to learn. The second, more
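For reference, the in silico operation that such molecular devices would need to reproduce can be sketched in a few lines of Python. This is a generic software perceptron (weighted sum, threshold, and the classic error-driven update rule) trained on a logical AND gate. It is not the molecular implementation discussed in this section; the training data, learning rate, and function names are assumptions chosen for illustration.

```python
def predict(weights, bias, inputs):
    """Perceptron output: 1 if the weighted sum plus bias is positive, else 0."""
    activation = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, lr=0.1):
    """Classic perceptron rule: shift weights by lr * error * input after each sample."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Assumed toy task: a two-input AND gate, which is linearly separable.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
outputs = [predict(weights, bias, inputs) for inputs, _ in and_gate]
print(outputs)
```

A molecular counterpart would have to realize the same three ingredients: tunable weights, a summation of weighted inputs, and a thresholded output.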
Conclusion and perspectives
The use of machine learning in biology will continue to grow. In fact, a search on bioRxiv with the keywords ‘deep learning’ returns about 450 articles deposited each month over the last year, and that number nearly doubled between March 2020 (370) and March 2021 (682). However, the number of published articles that actually prompt the design of experiments and lead to new experimental findings is much smaller. That number will undoubtedly increase as machine learning techniques are being interfaced with
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
J-L.F. would like to acknowledge funding provided by the ANR funding agency, grant numbers ANR-15-CE21-0008, ANR-17-CE07-0046, and ANR-18-CE44-0015. L.F. is supported by INRAE's MICA department and INRAE's metaprogram BIOLPREDICT.
References (46)
[1] et al. Opportunities at the intersection of synthetic biology, machine learning, and automation. ACS Synth Biol (Jul. 2019)
[2] Machine learning in bioinformatics. Briefings Bioinf (Mar. 2006)
[3] et al. Genome scale enzyme–metabolite and drug–target interaction predictions using the signature molecular descriptor. Bioinformatics (2008)
[4] DEEPre: sequence-based enzyme EC number prediction by deep learning. Bioinformatics (Mar. 2018)
[5] et al. Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers. Proc Natl Acad Sci Unit States Am (Jul. 2019)
[6] et al. DEEPred: automated protein function prediction with multi-task feed-forward deep neural networks. Sci Rep (Dec. 2019)
[7] Improved protein structure prediction using potentials from deep learning. Nature (Jan. 2020)
[8] et al. Synthetic promoter design in Escherichia coli based on generative adversarial network. Bioinformatics (Feb. 2019)
[9] Sequence-to-function deep learning frameworks for engineered riboregulators. Nat Commun (Dec. 2020)
[10] et al. A deep learning approach to programmable RNA switches. Nat Commun (Dec. 2020)
[11] Computational protein design with deep learning neural networks. Sci Rep
[12] Machine learning predicts the yeast metabolome from the quantitative proteome of kinase knockouts. Cell Syst
[13] The artificial neural network approach based on uniform design to optimize the fed-batch fermentation condition: application to the production of iturin A. Microb Cell Factories
[14] Deep learning to predict the lab-of-origin of engineered DNA. Nat Commun
[15] LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone. BMC Bioinf
[16] Semisupervised Gaussian process for automated enzyme search. ACS Synth Biol
[17] Predicting protein-protein interactions using signature products. Bioinformatics
[18] Analysis of multiple compound–protein interactions reveals novel bioactive molecules. Mol Syst Biol
[19] Molecular signatures-based prediction of enzyme promiscuity. Bioinformatics
[20] Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat Methods
[21] Active learning with statistical models. J Artif Intell Res
[22] Predicting novel substrates for enzymes with minimal experimental effort with active learning. Metab Eng
[23] Engineering cellular metabolism. Cell
Cited by (23)
Machine learning in bioprocess development: from promise to practice
2023, Trends in Biotechnology
Citation excerpt: Additionally, full process data is often only available for production runs within specification since a predictably failing run would mean high loss of resources. This results in an imbalance of datasets available for ML [141,142]. Hence, knowledge transfer models seem realistic only as a company-internal project because of very likely nondisclosure of corresponding, valuable data.
Machine learning in fermentative biohydrogen production: Advantages, challenges, and applications
2023, Bioresource Technology
Citation excerpt: Therefore, for process comprehension, design and control, monitoring, and prediction, domain experts are increasingly adopting data-driven techniques (Gunther et al., 2009; Martagan et al., 2018; Mercier et al., 2013; Teixeira et al., 2009). According to Faulon and Faure (2021), synthetic biology and metabolic engineering are two fields where supervised learning, reinforcement, active learning, and in vitro or in vivo learning are researched and applied. In biosynthesis, RL organizes and designs sequences, predicts biological sequence activities, and optimizes culture conditions.
Iterative design of training data to control intricate enzymatic reaction networks
2024, Nature Communications
Enhancing chemical synthesis: a two-stage deep neural network for predicting feasible reaction conditions
2024, Journal of Cheminformatics
Bioprocessing 4.0 in biomanufacturing: paving the way for sustainable bioeconomy
2024, Systems Microbiology and Biomanufacturing
Machine Learning for Biological Design
2024, Methods in Molecular Biology