Abstract
In bacteria, transcription rates are regulated by the interactions between RNA polymerase, sigma factor, and promoter DNA sequences. However, it remains unclear how non-canonical sequence motifs collectively control transcription rates. Here, we combined massively parallel assays, biophysics, and machine learning to develop a 346-parameter model that predicts site-specific transcription initiation rates for any σ70 promoter sequence, validated across 17,396 bacterial promoters with diverse sequences. We applied the model to predict genetic context effects, design σ70 promoters with desired transcription rates, and identify undesired promoters inside engineered genetic systems. The model provides a biophysical basis for understanding gene regulation in natural genetic systems and for achieving precise transcriptional control when engineering synthetic genetic systems.
One-Sentence Summary
A 346-parameter model predicted DNA’s interactions with the RNA polymerase initiation complex, enabling accurate transcription rate predictions and automated promoter design in bacterial genetic systems.
Competing Interest Statement
HMS is a founder of De Novo DNA. TL and AH declare no competing interests.