Variational auto-encoder based Bayesian Poisson tensor factorization for sparse and imbalanced count data
Data Mining and Knowledge Discovery ( IF 4.8 ) Pub Date : 2020-12-10 , DOI: 10.1007/s10618-020-00723-7
Yuan Jin , Ming Liu , Yunfeng Li , Ruohua Xu , Lan Du , Longxiang Gao , Yong Xiang

Non-negative tensor factorization models enable predictive analysis on count data. Among them, Bayesian Poisson–Gamma models can derive full posterior distributions of latent factors and are less sensitive to sparse count data. However, current inference methods for these Bayesian models adopt restricted update rules for the posterior parameters. They also fail to share update information, which would help them cope with data sparsity. Moreover, these models have no component that handles the imbalance in count data values. In this paper, we propose a novel variational auto-encoder framework called VAE-BPTF which addresses the above issues. It uses multi-layer perceptron networks to encode and share complex update information. The encoded information is then reweighted per data instance to penalize common data values before being aggregated to compute the posterior parameters for the latent factors. Under synthetic data evaluation, VAE-BPTF tended to recover the right number of latent factors and posterior parameter values. It also outperformed current models in both reconstruction error and latent factor (semantic) coherence across five real-world datasets. Furthermore, the latent factors inferred by VAE-BPTF are perceived to be meaningful and coherent under a qualitative analysis.
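To make the modelling idea concrete, here is a minimal sketch of a Poisson likelihood over the observed entries of a sparse count tensor, with a per-instance weight that down-weights common count values. It is not the authors' implementation: the CP-style rate (a sum over latent factors of per-mode products), the inverse-frequency weighting, and all function names are assumptions made for illustration, and the multi-layer perceptron encoders that VAE-BPTF uses to produce posterior parameters are left out entirely.

```python
import math
from collections import Counter

import numpy as np


def poisson_cp_rate(factors, index):
    """Poisson rate for one tensor entry under an assumed CP-style factorization.

    factors: list of non-negative arrays, one per mode, each of shape (dim_m, K).
    index:   tuple with one coordinate per mode.
    """
    rows = [f[i] for f, i in zip(factors, index)]  # one length-K row per mode
    return float(np.prod(rows, axis=0).sum())      # sum_k prod_m factors[m][i_m, k]


def inverse_frequency_weights(counts):
    """Hypothetical reweighting: rarer count values receive larger weights."""
    freq = Counter(counts)
    raw = np.array([1.0 / freq[c] for c in counts])
    return raw * len(counts) / raw.sum()  # normalise so the weights average to 1


def weighted_poisson_loglik(factors, indices, counts, weights):
    """Weighted Poisson log-likelihood over the observed (non-missing) entries."""
    total = 0.0
    for idx, y, w in zip(indices, counts, weights):
        lam = poisson_cp_rate(factors, idx)
        total += w * (y * math.log(lam) - lam - math.lgamma(y + 1))
    return total


rng = np.random.default_rng(0)
# Gamma draws keep every factor entry strictly positive (illustrative values only).
factors = [rng.gamma(shape=1.0, scale=1.0, size=(d, 3)) for d in (4, 5, 6)]
indices = [(0, 1, 2), (1, 0, 3), (2, 4, 5), (3, 2, 1)]
counts = [1, 1, 1, 7]  # imbalanced: the value 1 dominates
weights = inverse_frequency_weights(counts)
loglik = weighted_poisson_loglik(factors, indices, counts, weights)
```

With these toy counts, the three entries equal to 1 share a smaller weight than the single entry equal to 7, so the rare value contributes more to the objective. The abstract does not state which weighting scheme VAE-BPTF uses, so treat this one as a stand-in.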




Updated: 2020-12-10