Statistical modelling of bacterial promoter sequences for regulatory motif discovery with the help of transcriptome data: application to Listeria monocytogenes
Journal of The Royal Society Interface (IF 3.7), Pub Date: 2020-10-01, DOI: 10.1098/rsif.2020.0600
Ibrahim Sultan, Vincent Fromion, Sophie Schbath, Pierre Nicolas

Automatic de novo identification of the main regulons of a bacterium from genome and transcriptome data remains a challenge. To address this task, we propose a statistical model that can use information on the exact positions of transcription start sites and on condition-dependent expression profiles. The central idea of this model is to improve the probabilistic representation of promoter DNA sequences by incorporating covariates that summarize expression profiles (e.g. coordinates in projection spaces or hierarchical clustering trees). A dedicated trans-dimensional Markov chain Monte Carlo algorithm adjusts the width and palindromic properties of the corresponding position-weight matrices and the number of parameters describing the exact position relative to the transcription start site, and it chooses the expression covariates relevant to each motif. All parameters are estimated simultaneously, for many motifs and many expression covariates. The method is applied to a dataset of transcription start sites and expression profiles available for Listeria monocytogenes. The results validate the approach and provide a new global view of the transcription regulatory network of this important pathogen. Remarkably, a previously unreported motif is found in the promoter regions of ribosomal protein genes, suggesting a role in the regulation of growth.
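The abstract describes motifs as position-weight matrices (PWMs) whose width, palindromic symmetry and position relative to the transcription start site (TSS) are model parameters. As a rough illustration of those ingredients only, the Python sketch below builds a toy PWM, symmetrizes it into a reverse-complement palindrome, and scans the sequence upstream of a TSS for the best-scoring site. It is not the authors' method: there is no trans-dimensional MCMC and no expression covariates, the uniform 0.25 background and the column averaging used to impose palindromy are simplifying assumptions, and every function name (`pwm_from_consensus`, `make_palindromic`, `log_odds`, `best_site`) is hypothetical.

```python
import math

# Hypothetical sketch, NOT the authors' model: a PWM is a list of columns,
# each column a dict mapping a nucleotide to its probability.
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pwm_from_consensus(consensus, p=0.85):
    """Toy PWM: the consensus base gets probability p, the other three share 1 - p."""
    q = (1.0 - p) / 3.0
    return [{b: (p if b == c else q) for b in BASES} for c in consensus]


def make_palindromic(pwm):
    """Average each column with the reverse complement of its mirror column.

    The result satisfies out[j][b] == out[w-1-j][complement(b)], i.e. the motif
    reads the same on both strands, and every column still sums to 1.
    """
    w = len(pwm)
    return [
        {b: 0.5 * (pwm[j][b] + pwm[w - 1 - j][COMPLEMENT[b]]) for b in BASES}
        for j in range(w)
    ]


def log_odds(pwm, site, background=0.25):
    """Log-likelihood ratio of a site under the PWM versus a uniform background."""
    return sum(math.log(pwm[j][b] / background) for j, b in enumerate(site))


def best_site(pwm, upstream):
    """Scan the sequence just upstream of a TSS for the best-scoring window.

    Assumes `upstream` ends immediately before the TSS, so a window starting at
    index i begins at position i - len(upstream) relative to the TSS (negative
    values are upstream). Returns (score, start_index, position_relative_to_tss).
    """
    w = len(pwm)
    if len(upstream) < w:
        raise ValueError("sequence shorter than motif")
    score, start = max(
        (log_odds(pwm, upstream[i:i + w]), i) for i in range(len(upstream) - w + 1)
    )
    return score, start, start - len(upstream)


# Usage: a toy palindromic motif (GAATTC) starting 26 bp upstream of the TSS.
pwm = make_palindromic(pwm_from_consensus("GAATTC"))
score, start, rel = best_site(pwm, "C" * 10 + "GAATTC" + "C" * 20)
print(start, rel)  # 10 -26
```

Averaging a column with its mirrored complement keeps every column normalized, which makes it a convenient way to impose strand symmetry in a sketch; a model that samples palindromic and non-palindromic variants could instead parameterize only half of the matrix.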

Updated: 2020-10-01