Motif discovery in physiological datasets
ACM Transactions on Knowledge Discovery from Data (IF 3.6), Pub Date: 2010-01-12, DOI: 10.1145/1644873.1644875
Zeeshan Syed, Collin Stultz, Manolis Kellis, Piotr Indyk, John Guttag

In this article, we propose a methodology for identifying predictive physiological patterns in the absence of prior knowledge. We use the principle of conservation to identify activity that consistently precedes an outcome in patients, and describe a two-stage process that allows us to efficiently search for such patterns in large datasets. This involves first transforming continuous physiological signals from patients into symbolic sequences, and then searching for patterns in these reduced representations that are strongly associated with an outcome. Our strategy of identifying conserved activity in symbolic data that is unlikely to have occurred purely by chance is analogous to the discovery of regulatory motifs in genomic datasets. We build upon existing work in this area, generalizing the notion of a regulatory motif and enhancing current techniques to operate robustly on non-genomic data. We also address two significant considerations associated with motif discovery in general: computational efficiency, and robustness in the presence of degeneracy and noise. To deal with these issues, we introduce the concept of active regions and new subset-based techniques such as a two-layer Gibbs sampling algorithm. Together, these extensions yield a framework for information inference in which precursors are identified as approximately conserved activity, of arbitrary complexity, that precedes multiple occurrences of an event. We evaluated our solution on a population of patients who experienced sudden cardiac death and attempted to discover electrocardiographic activity that may be associated with the endpoint of death. To assess the predictive patterns discovered, we compared likelihood scores for motifs in the sudden death population against control populations of normal individuals and of patients with non-fatal supraventricular arrhythmias.
Our results suggest that predictive motif discovery may be able to identify clinically relevant information even in the absence of significant prior knowledge.
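The two-stage process described in the abstract (symbolize continuous signals, then search the symbolic sequences for conserved windows) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: equal-width amplitude binning is a hypothetical stand-in for the paper's symbolization step, and the classic single-layer Gibbs motif sampler (one motif occurrence per sequence) stands in for the paper's two-layer sampler with active regions, which is not reproduced here. All function names (`symbolize`, `background`, `build_profile`, `window_score`, `best_score`, `gibbs_motif`) and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def symbolize(values, alphabet="abcd"):
    """Map a continuous signal to symbols by equal-width amplitude bins.

    Assumption: a hypothetical stand-in for the paper's symbolization step.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(alphabet) or 1.0  # avoid division by zero on flat input
    return "".join(
        alphabet[min(int((v - lo) / width), len(alphabet) - 1)] for v in values
    )


def background(seqs, alphabet):
    """Symbol frequencies over all sequences, with add-one smoothing."""
    counts = Counter("".join(seqs))
    total = sum(counts[a] for a in alphabet)
    return {a: (counts[a] + 1) / (total + len(alphabet)) for a in alphabet}


def build_profile(seqs, starts, k, alphabet, skip=None, pseudo=1.0):
    """Position-specific symbol probabilities of the current motif windows,
    optionally leaving out sequence `skip` (as the Gibbs update requires)."""
    counts = [Counter() for _ in range(k)]
    n = 0
    for i, (s, p) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        n += 1
        for j, ch in enumerate(s[p:p + k]):
            counts[j][ch] += 1
    denom = n + pseudo * len(alphabet)
    return [{a: (c[a] + pseudo) / denom for a in alphabet} for c in counts]


def window_score(window, profile, bg):
    """Log-likelihood ratio of a window under the motif profile vs. background."""
    return sum(math.log(profile[j][ch] / bg[ch]) for j, ch in enumerate(window))


def best_score(seq, profile, bg, k):
    """Score of the best-matching length-k window in one sequence."""
    return max(window_score(seq[p:p + k], profile, bg) for p in range(len(seq) - k + 1))


def gibbs_motif(seqs, k, alphabet, iters=500, seed=0):
    """Single-layer Gibbs sampler for one length-k motif, one site per sequence.

    Assumption: the classic sampler, not the paper's two-layer variant.
    """
    rng = random.Random(seed)
    bg = background(seqs, alphabet)
    starts = [rng.randrange(len(s) - k + 1) for s in seqs]
    for _ in range(iters):
        i = rng.randrange(len(seqs))  # hold one sequence out
        profile = build_profile(seqs, starts, k, alphabet, skip=i)
        # Resample its motif start in proportion to each window's likelihood ratio.
        weights = [
            math.exp(window_score(seqs[i][p:p + k], profile, bg))
            for p in range(len(seqs[i]) - k + 1)
        ]
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts, build_profile(seqs, starts, k, alphabet), bg


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical toy data: random symbol strings with "dcba" planted in each.
    seqs = []
    for _ in range(8):
        s = "".join(rng.choice("abcd") for _ in range(30))
        pos = rng.randrange(len(s) - 4 + 1)
        seqs.append(s[:pos] + "dcba" + s[pos + 4:])
    starts, profile, bg = gibbs_motif(seqs, 4, "abcd")
    print([s[p:p + 4] for s, p in zip(seqs, starts)])
```

To loosely mirror the evaluation described above, one could learn a profile on the case population, compute `best_score` for every sequence in the case and control populations, and compare the two score distributions. How the paper itself defines and compares its motif likelihood scores is not specified in this abstract, so this comparison is only an illustrative assumption.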

Updated: 2010-01-12