Inference and Learning in a Latent Variable Model for Beta Distributed Interval Data
Entropy (IF 2.1), Pub Date: 2021-04-29, DOI: 10.3390/e23050552
Hamid Mousavi 1, Mareike Buhl 2, Enrico Guiraud 1,3, Jakob Drefs 1, Jörg Lücke 1

Latent Variable Models (LVMs) are well-established tools for a range of data processing tasks. Applications exploit the ability of LVMs to identify latent data structure in order to improve data (e.g., through denoising) or to estimate the relation between latent causes and measurements in medical data. In the latter case, LVMs in the form of noisy-OR Bayes nets represent the standard approach to relate binary latents (which represent diseases) to binary observables (which represent symptoms). Bayes nets with a binary representation for symptoms may be perceived as a coarse approximation, however. In practice, real disease symptoms can range from absent through mild and intermediate to very severe. Therefore, using disease/symptom relations as motivation, we ask here how standard noisy-OR Bayes nets can be generalized to incorporate continuous observables, e.g., variables that model symptom severity in an interval from healthy to pathological. This transition from binary to interval data poses a number of challenges, including a transition from a Bernoulli to a Beta distribution to model symptom statistics. While noisy-OR-like approaches are constrained to model how causes determine the observables' mean values, the use of Beta distributions additionally provides (and also requires) that the causes determine the observables' variances. To meet the challenges emerging when generalizing from Bernoulli to Beta distributed observables, we investigate a novel LVM that uses a maximum non-linearity to model how the latents determine means and variances of the observables. Given the model and the goal of likelihood maximization, we then leverage recent theoretical results to derive an Expectation Maximization (EM) algorithm for the proposed LVM. We further show how variational EM can be used to efficiently scale the approach to large networks. Experimental results finally illustrate the efficacy of the proposed model using both synthetic and real data sets. Importantly, we show that the model produces reliable results in estimating causes, using proofs of concept and first tests based on real medical data and on images.
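
The contrast drawn in the abstract between a noisy-OR observable and a Beta distributed observable can be illustrated with a small sketch. The noisy-OR formula and the Beta mean/variance formulas below are standard. Everything else is an assumption made for illustration only: the abstract does not state the paper's parameterization, so the function names, the per-latent shape parameters `a` and `b`, the `background` parameters used when no latent is active, and the rule "the active latent with the largest Beta mean sets both shape parameters" are hypothetical stand-ins for the maximum non-linearity, not the authors' model.

```python
import math
import random


def noisy_or_prob(s, w):
    """Standard noisy-OR for one binary observable:
    P(y = 1 | s) = 1 - prod_h (1 - w[h] * s[h])."""
    return 1.0 - math.prod(1.0 - w_h * s_h for w_h, s_h in zip(w, s))


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    total = a + b
    return a / total, (a * b) / (total ** 2 * (total + 1.0))


def select_beta_params(s, a, b, background=(1.0, 9.0)):
    """ASSUMED max-type rule (not taken from the paper): among the active
    latents, the one with the largest Beta mean determines both shape
    parameters, and hence both the mean and the variance of the observable.
    If no latent is active, hypothetical background parameters are used."""
    active = [h for h, s_h in enumerate(s) if s_h]
    if not active:
        return background
    winner = max(active, key=lambda h: a[h] / (a[h] + b[h]))
    return a[winner], b[winner]


def sample_observable(s, a, b, rng):
    """Draw one interval-valued observable in [0, 1] given binary latents s."""
    a_sel, b_sel = select_beta_params(s, a, b)
    return rng.betavariate(a_sel, b_sel)


if __name__ == "__main__":
    rng = random.Random(0)
    s = [1, 0, 1]                      # binary latents (e.g., diseases)
    a = [2.0, 8.0, 1.0]                # hypothetical per-latent shape parameters
    b = [2.0, 2.0, 9.0]
    a_sel, b_sel = select_beta_params(s, a, b)
    print("selected Beta parameters:", a_sel, b_sel)
    print("implied mean and variance:", beta_mean_var(a_sel, b_sel))
    print("sampled severity:", sample_observable(s, a, b, rng))
    print("noisy-OR activation probability:", noisy_or_prob(s, [0.5, 0.3, 0.2]))
```

In the noisy-OR case the latents only set a single probability, which is the mean of a Bernoulli variable. In the Beta sketch the selected pair of shape parameters fixes the mean and the variance together, which is the additional degree of freedom the abstract refers to.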

Updated: 2021-04-29