In silico, in vitro, and in vivo machine learning in synthetic biology and metabolic engineering

doi:10.1016/j.cbpa.2021.06.002

Current Opinion in Chemical Biology

Volume 65, December 2021, Pages 85-92

https://doi.org/10.1016/j.cbpa.2021.06.002

Abstract

The main learning methods used in synthetic biology and metabolic engineering, and reviewed in this study, are supervised learning, reinforcement and active learning, and in vitro or in vivo learning.

In the context of biosynthesis, supervised machine learning is being exploited to predict biological sequence activities, predict structures and engineer sequences, and optimize culture conditions.

Active and reinforcement learning methods use training sets acquired through an iterative process generally involving experimental measurements. They are applied to design, engineer, and optimize metabolic pathways and bioprocesses.

The nascent but promising developments with in vitro and in vivo learning comprise molecular circuits performing simple tasks such as pattern recognition and classification.

Introduction

We have seen in the past few years a growing interest in using machine learning for chemistry and biology, and synthetic biology and metabolic engineering are no exception to this trend [1]. This study reviews three main techniques used when engineering biological systems. In section 2, we present an overview of supervised and semisupervised machine learning techniques, providing examples on searching for promiscuous enzyme activities. In section 3, we discuss active learning (AL) and reinforcement learning (RL) methods, which are generally based on supervised learning, with training sets acquired on the fly in an iterative process. These methods are particularly amenable to the design-build-test-learn synthetic biology cycle. Examples are provided in the context of predicting enzymatic activities, optimizing metabolic pathways, and performing retro-biosynthesis. Engineering information processing devices in living systems is a long-standing venture of synthetic biology. Yet, the problem of engineering devices that perform basic operations found in machine learning remains largely unexplored. Section 4 presents attempts to construct in vitro and in vivo perceptrons, which are the basic units of all artificial neural networks.

Section snippets

Supervised and semisupervised learning

Supervised learning is one of the main machine learning methods used in biology, in particular in bioinformatics, where it has been extensively developed [2]. Focusing on biosynthesis, and to name a few applications, supervised learning enables one to predict enzyme activities [∗3, 4, ∗5, 6], to propose protein structures [7], to engineer sequences (DNA, RNA, protein) [8, 9, 10, 11], to complete the metabolome [12], to optimize culture conditions [13], and to perform more unexpected tasks like
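As a minimal sketch of what predicting a sequence activity by supervised learning can look like, the Python example below trains a nearest-centroid classifier on 2-mer counts. The sequences, the "active"/"inactive" labels, and all function names are invented for illustration and are not taken from the reviewed studies, which rely on far richer descriptors and models.

```python
from collections import Counter


def kmer_counts(sequence, k=2):
    """Count overlapping k-mers; this is the (assumed) feature representation."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def centroid(count_list):
    """Average the k-mer counts of all training sequences sharing a label."""
    total = Counter()
    for counts in count_list:
        total.update(counts)
    return {kmer: value / len(count_list) for kmer, value in total.items()}


def squared_distance(counts, center):
    """Squared Euclidean distance between a sequence and a class centroid."""
    keys = set(counts) | set(center)
    return sum((counts.get(key, 0) - center.get(key, 0.0)) ** 2 for key in keys)


def train(labeled_sequences, k=2):
    """Supervised step: build one centroid per label from labeled examples."""
    by_label = {}
    for sequence, label in labeled_sequences:
        by_label.setdefault(label, []).append(kmer_counts(sequence, k))
    return {label: centroid(items) for label, items in by_label.items()}


def predict(model, sequence, k=2):
    """Assign the label whose centroid is closest to the query sequence."""
    counts = kmer_counts(sequence, k)
    return min(model, key=lambda label: squared_distance(counts, model[label]))


# Hypothetical toy training set (labels would come from experiments).
training_set = [
    ("GCGCGCGC", "active"),
    ("GGCCGGCC", "active"),
    ("ATATATAT", "inactive"),
    ("AATTAATT", "inactive"),
]
model = train(training_set)
print(predict(model, "GCGGCCGC"))
print(predict(model, "ATTAATAT"))
```

The point of the sketch is only the workflow: labeled examples go in, a model is fitted, and unlabeled sequences are then scored by that model.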

Active learning and reinforcement learning

AL is a special case of supervised machine learning in which a learner (any learning algorithm mentioned in the previous section) can interactively query an oracle (a human, a robot, or a computer simulation) to request labels for new data points [21]. The process is iterative, and the training set is acquired and grows on the fly. Because the learner chooses the examples to be labeled, the number of examples can be made lower than the number required in normal supervised learning while maintaining
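As an illustration of the query loop described in this section, the sketch below implements pool-based active learning with uncertainty sampling in Python. Everything in it is an assumption made for the example: the oracle is a simulated measurement with a hidden activity threshold of 0.62, the learner is a one-dimensional threshold classifier, and the pool is random. A real oracle would be an experiment or a simulation, and a real learner any of the supervised models discussed earlier.

```python
import random


def oracle(x, hidden_threshold=0.62):
    """Hypothetical oracle standing in for an experiment: 1 means active."""
    return int(x > hidden_threshold)


def fit_threshold(labeled):
    """Learner: midpoint between the largest negative and smallest positive."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    low = max(negatives) if negatives else 0.0
    high = min(positives) if positives else 1.0
    return (low + high) / 2.0


def active_learning(pool, n_queries):
    """Iteratively query the unlabeled point closest to the decision boundary."""
    unlabeled = list(pool)
    labeled = []
    for _ in range(n_queries):
        if not unlabeled:
            break
        threshold = fit_threshold(labeled)
        # Uncertainty sampling: the least certain point is nearest the boundary.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))  # training set grows on the fly
    return fit_threshold(labeled), labeled


rng = random.Random(0)
pool = [rng.random() for _ in range(1000)]
estimate, labeled = active_learning(pool, n_queries=15)
print(f"labels used: {len(labeled)} of {len(pool)}, estimated threshold: {estimate:.3f}")
```

In this toy setting the learner locates the boundary after labeling 15 of the 1000 candidates, which is the sense in which choosing the queries reduces the number of labeled examples needed.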

In vitro and in vivo learning

In all the applications we have seen so far, learning is performed in silico. In this section, we are interested in performing learning in vitro or in vivo; the main challenge is therefore to be able to construct molecular devices that process information in the same way as the basic blocks of machine learning programs. Two main goals motivate this innovative learning approach. The first, rather theoretical, is to probe to what extent cellular networks can be engineered to learn. The second, more
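For readers unfamiliar with the basic block that such molecular devices aim to reproduce, the following is a minimal in silico sketch of a perceptron trained with the classic perceptron learning rule. It is written in Python, the AND-gate data set and all names are chosen for illustration only, and it says nothing about how weights or thresholds are realized in a molecular implementation.

```python
def predict(weights, bias, inputs):
    """Weighted sum of the inputs followed by a step threshold."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=10, learning_rate=1.0):
    """Perceptron rule: shift weights toward the target on every mistake."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Illustrative, linearly separable task: output 1 only when both inputs are on.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_gate]
print(predictions)  # → [0, 0, 0, 1]
```

The two operations in the sketch, a weighted sum and a thresholded output, are what a perceptron computes; the update step is the learning part.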

Conclusion and perspectives

The use of machine learning in biology will continue to grow. In fact, a search on bioRxiv with the keywords ‘deep learning’ returns about 450 articles deposited each month for the last year, and that number nearly doubled between March 2020 (370) and March 2021 (682). However, the number of published articles actually prompting design of experiments and new experimental findings is much smaller. That number will undoubtedly increase as machine learning techniques are being interfaced with

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

J-L.F. would like to acknowledge funding provided by the ANR funding agency, grant numbers ANR-15-CE21-0008, ANR-17-CE07-0046, and ANR-18-CE44-0015. L.F. is supported by INRAE's MICA department and INRAE's metaprogram BIOLPREDICT.

References (46)

[1] P. Carbonell et al. Opportunities at the intersection of synthetic biology, machine learning, and automation. ACS Synth Biol (Jul. 2019)
[2] P. Larranaga. Machine learning in bioinformatics. Briefings Bioinf (Mar. 2006)
[3] J.-L. Faulon et al. Genome scale enzyme–metabolite and drug–target interaction predictions using the signature molecular descriptor. Bioinformatics (2008)
[4] Y. Li. DEEPre: sequence-based enzyme EC number prediction by deep learning. Bioinformatics (Mar. 2018)
[5] J.Y. Ryu et al. Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers. Proc Natl Acad Sci USA (Jul. 2019)
[6] A. Sureyya Rifaioglu et al. DEEPred: automated protein function prediction with multi-task feed-forward deep neural networks. Sci Rep (Dec. 2019)
[7] A.W. Senior. Improved protein structure prediction using potentials from deep learning. Nature (Jan. 2020)
[8] Y. Wang et al. Synthetic promoter design in Escherichia coli based on generative adversarial network. Bioinformatics (Feb. 2019)
[9] J.A. Valeri. Sequence-to-function deep learning frameworks for engineered riboregulators. Nat Commun (Dec. 2020)
[10] N.M. Angenent-Mari et al. A deep learning approach to programmable RNA switches. Nat Commun (Dec. 2020)
[11] J. Wang et al. Computational protein design with deep learning neural networks. Sci Rep (Dec. 2018)
[12] A. Zelezniak. Machine learning predicts the yeast metabolome from the quantitative proteome of kinase knockouts. Cell Syst (Sep. 2018)
[13] W. Peng. The artificial neural network approach based on uniform design to optimize the fed-batch fermentation condition: application to the production of iturin A. Microb Cell Factories (Apr. 2014)
[14] A.A.K. Nielsen et al. Deep learning to predict the lab-of-origin of engineered DNA. Nat Commun (Aug. 2018)
[15] P. Chen et al. LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone. BMC Bioinf (2014)
[16] J. Mellor et al. Semisupervised Gaussian process for automated enzyme search. ACS Synth Biol (2016)
[17] S. Martin et al. Predicting protein-protein interactions using signature products. Bioinformatics (Jan. 2005)
[18] H. Yabuuchi. Analysis of multiple compound–protein interactions reveals novel bioactive molecules. Mol Syst Biol (Mar. 2011)
[19] P. Carbonell et al. Molecular signatures-based prediction of enzyme promiscuity. Bioinformatics (Aug. 2010)
[20] L. Käll et al. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat Methods (Nov. 2007)
[21] D.A. Cohn et al. Active learning with statistical models. J Artif Intell Res (Mar. 1996)
[22] D.A. Pertusi et al. Predicting novel substrates for enzymes with minimal experimental effort with active learning. Metab Eng (Nov. 2017)
[23] J. Nielsen et al. Engineering cellular metabolism. Cell (Mar. 2016)


^a: www.jfaulon.com.
