motifDiverge: a model for assessing the statistical significance of gene regulatory motif divergence between two DNA sequences
Statistics and Its Interface (IF 0.3), Pub Date: 2015-01-01, DOI: 10.4310/sii.2015.v8.n4.a6
Dennis Kostka, Tara Friedrich, Alisha K Holloway, Katherine S Pollard

Next-generation sequencing technology enables the identification of thousands of gene regulatory sequences in many cell types and organisms. We consider the problem of testing if two such sequences differ in their number of binding site motifs for a given transcription factor (TF) protein. Binding site motifs impart regulatory function by providing TFs the opportunity to bind to genomic elements and thereby affect the expression of nearby genes. Evolutionary changes to such functional DNA are hypothesized to be major contributors to phenotypic diversity within and between species; but despite the importance of TF motifs for gene expression, no method exists to test for motif loss or gain. Assuming that motif counts are Binomially distributed, and allowing for dependencies between motif instances in evolutionarily related sequences, we derive the probability mass function of the difference in motif counts between two nucleotide sequences. We provide a method to numerically estimate this distribution from genomic data and show through simulations that our estimator is accurate. Finally, we introduce the R package motifDiverge that implements our methodology and illustrate its application to gene regulatory enhancers identified by a mouse developmental time course experiment. While this study was motivated by analysis of regulatory motifs, our results can be applied to any problem involving two correlated Bernoulli trials.
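
The closing remark about two correlated Bernoulli trials can be made concrete with a small numerical sketch. The Python code below is not the motifDiverge package and does not reproduce the paper's derivation; it assumes a simplified setting with n independent and identically distributed paired positions, where each pair (X, Y) is Bernoulli with marginal probabilities px and py and correlation rho. Under that assumption, each pair contributes +1, 0, or -1 to the count difference, so the probability mass function of the difference follows from repeated convolution. All function and parameter names are hypothetical.

```python
import numpy as np


def joint_from_marginals(px, py, rho):
    """Joint probabilities (p00, p01, p10, p11) of a correlated Bernoulli pair.

    px, py are P(X=1) and P(Y=1); rho is the Pearson correlation of X and Y.
    Raises ValueError if rho cannot be attained with these marginals.
    """
    p11 = px * py + rho * np.sqrt(px * (1 - px) * py * (1 - py))
    p10 = px - p11  # X = 1, Y = 0
    p01 = py - p11  # X = 0, Y = 1
    p00 = 1.0 - p11 - p10 - p01
    if min(p00, p01, p10, p11) < -1e-12:
        raise ValueError("rho is not feasible for these marginals")
    return p00, p01, p10, p11


def difference_pmf(n, px, py, rho):
    """pmf of D = sum_i (X_i - Y_i) over n i.i.d. correlated pairs (assumed model).

    Returns (support, pmf), where support runs from -n to n.
    """
    p00, p01, p10, p11 = joint_from_marginals(px, py, rho)
    # One pair changes the difference by -1, 0, or +1.
    step = np.array([p01, p00 + p11, p10])
    pmf = np.array([1.0])
    for _ in range(n):
        pmf = np.convolve(pmf, step)
    support = np.arange(-n, n + 1)
    return support, pmf


def upper_tail(d_obs, support, pmf):
    """P(D >= d_obs) under the assumed model."""
    return float(pmf[support >= d_obs].sum())


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    support, pmf = difference_pmf(n=200, px=0.04, py=0.04, rho=0.5)
    print("P(D >= 5) =", upper_tail(5, support, pmf))
```

Positive correlation between the two sequences concentrates the difference around its mean, so ignoring it would overstate the variance of the difference; this is one reason the dependence matters when judging motif gain or loss.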

Updated: 2015-01-01