Abstract
Non-negative tensor factorization models enable predictive analysis on count data. Among them, Bayesian Poisson–Gamma models can derive full posterior distributions of latent factors and are less sensitive to sparse count data. However, current inference methods for these Bayesian models adopt restricted update rules for the posterior parameters. They also fail to share update information across the model, which would help them cope with data sparsity. Moreover, these models have no component that handles the imbalance in count data values. In this paper, we propose a novel variational auto-encoder framework called VAE-BPTF which addresses the above issues. It uses multi-layer perceptron networks to encode and share complex update information. The encoded information is then reweighted per data instance to penalize common data values before being aggregated to compute the posterior parameters for the latent factors. In evaluations on synthetic data, VAE-BPTF tended to recover the right number of latent factors and posterior parameter values. It also outperformed current models in both reconstruction errors and latent factor (semantic) coherence across five real-world datasets. Furthermore, the latent factors inferred by VAE-BPTF were judged to be meaningful and coherent in a qualitative analysis.
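As a rough illustration of the Poisson–Gamma model family mentioned above, the following sketch samples a small three-mode count tensor whose rate is a sum over latent components of products of Gamma-distributed factors, in the spirit of the setup described by Schein et al. (2015). The function name, the hyperparameters `a` and `b`, and the restriction to three modes are assumptions made for illustration; this is not VAE-BPTF itself, only the kind of generative model it performs inference for.

```python
import numpy as np

def sample_poisson_gamma_cp(dims, rank, a=0.1, b=1.0, seed=0):
    """Sample a 3-mode count tensor from a Poisson-Gamma CP model (illustrative)."""
    rng = np.random.default_rng(seed)
    # One non-negative factor matrix per mode, entries drawn from Gamma(shape=a, rate=b).
    factors = [rng.gamma(shape=a, scale=1.0 / b, size=(dim, rank)) for dim in dims]
    # Poisson rate for cell (i, j, l): sum over k of the product of the three factors.
    rate = np.einsum("ik,jk,lk->ijl", *factors)
    # Observed counts are conditionally independent Poisson draws.
    counts = rng.poisson(rate)
    return factors, rate, counts

factors, rate, counts = sample_poisson_gamma_cp((4, 5, 6), rank=3, seed=1)
print(counts.shape, float((counts == 0).mean()))
```

With a small Gamma shape such as the assumed `a=0.1`, most rates are close to zero, so the sampled tensor is sparse and dominated by zeros and small counts, which is the regime the abstract refers to.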
Notes
For a more comprehensive review on this subject, we refer readers to Zhang et al. (2019).
For a more detailed mathematical description, we refer readers to Gopalan et al. (2015).
For simplicity, we omitted the activation functions and the bias terms in between.
The combinations include either softplus or sigmoid for \(\text {h}(\cdot )\), and either of those two or ReLU for \(\text {q}(\cdot )\).
The symbol \({\mathbb {1}}\) denotes a matrix of the same size as \({\varvec{Y}}\) and contains all ones.
For more details about the exact formulas of TE\((\epsilon _{uk};\alpha _{uk})\) and \(\text {R}\big (\text {log}(\frac{\epsilon _{uk}}{\alpha _{uk}}), \text {log}(\alpha _{uk})\big )\), and their derivation, we refer readers to the supplementary materials of Jankowiak and Obermeyer (2018).
We found that a small positive mean for the latter Normal distribution could stabilize the algorithm right after the initialization compared to a zero mean.
The prediction targets in this case are the ratings from the Amazon Prime video review dataset.
The embedding was done based on the entity IDs.
Three zero values per non-zero value.
We show only the LLs of VAE-BPTF and BPTF as the other Poisson-based models are significantly inferior to them in this aspect.
The NPMI scoring uses a large Wikipedia dump hosted by Palmetto: http://palmetto.aksw.org.
The Game data and the Amazon rating data are not text data, and thus NPMI is not applicable.
We used the cmdscale function in R that implements the classical multi-dimensional scaling.
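For readers unfamiliar with the classical multi-dimensional scaling mentioned in the last note, the textbook procedure (Cox and Cox 2000) can be sketched in a few lines of NumPy: double-center the squared distance matrix, then embed with the leading eigenvectors. This is a minimal sketch, not the R `cmdscale` implementation; the function name `classical_mds` and its arguments are hypothetical.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Embed points in n_components dimensions from a symmetric distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-centering turns squared distances into a Gram (inner-product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by round-off or non-Euclidean input.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Usage: recover a 2-D configuration (up to rotation/reflection) from distances.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = classical_mds(dist, n_components=2)
```

The returned coordinates are only determined up to rotation and reflection, so plots produced this way should be compared by relative positions rather than by axis values.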
References
Ahn HJ (2008) A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem. Inf Sci 178(1):37–51
Aletras N, Stevenson M (2013) Evaluating topic coherence using distributional semantics. In: Proceedings of the 10th international conference on computational semantics (IWCS 2013)–Long Papers, pp 13–22
Buntine WL, Mishra S (2014) Experiments with non-parametric topic models. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 881–890
Chi EC, Kolda TG (2012) On tensors, sparsity, and nonnegative factorizations. SIAM J Matrix Anal Appl 33(4):1272–1299
Cox TF, Cox MA (2000) Multidimensional scaling. Chapman and Hall/CRC, Boca Raton
Deng Z, Navarathna R, Carr P, Mandt S, Yue Y, Matthews I, Mori G (2017) Factorized variational autoencoders for modeling audience reactions to movies. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2577–2586
Figurnov M, Mohamed S, Mnih A (2018) Implicit reparameterization gradients. In: Advances in neural information processing systems, pp 441–452
Friedlander MP, Hatz K (2008) Computing non-negative tensor factorizations. Optim Methods Softw 23(4):631–647
Gopalan P, Hofman JM, Blei DM (2015) Scalable recommendation with hierarchical poisson factorization. In: Proceedings of the 31st conference on uncertainty in artificial intelligence, AUAI Press, UAI’15, pp 326–335
He X, Liao L, Zhang H, Nie L, Hu X, Chua TS (2017) Neural collaborative filtering. In: Proceedings of the 26th international conference on world wide web, pp 173–182
He X, Du X, Wang X, Tian F, Tang J, Chua TS (2018) Outer product-based neural collaborative filtering. In: Proceedings of the 27th international joint conference on artificial intelligence, AAAI Press, IJCAI’18, pp 2227–2233
Hidasi B, Karatzoglou A, Baltrunas L, Tikk D (2016) Session-based recommendations with recurrent neural networks. In: Proceedings of the 4th international conference on learning representations, ICLR 2016, San Juan, Puerto Rico, May 2–4, 2016, conference track proceedings
Hinrich JL, Nielsen SFV, Madsen KH, Mørup M (2018) Variational Bayesian partially observed non-negative tensor factorization. In: 2018 IEEE 28th international workshop on machine learning for signal processing (MLSP), pp 1–6. https://doi.org/10.1109/MLSP.2018.8516924
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Hu C, Rai P, Chen C, Harding M, Carin L (2015) Scalable Bayesian non-negative tensor factorization for massive count data. In: Machine learning and knowledge discovery in databases. Springer International Publishing, pp 53–70
Hu Y, Koren Y, Volinsky C (2008) Collaborative filtering for implicit feedback datasets. In: 2008 Eighth IEEE international conference on data mining, IEEE, pp 263–272
Jankowiak M, Obermeyer F (2018) Pathwise derivatives beyond the reparameterization trick. In: International conference on machine learning, pp 2240–2249
Kim D, Park C, Oh J, Lee S, Yu H (2016) Convolutional matrix factorization for document context-aware recommendation. In: Proceedings of the 10th ACM conference on recommender systems, ACM, RecSys ’16, pp 233–240
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Kingma DP, Welling M (2014) Auto-encoding variational bayes. In: Proceedings of the 2nd international conference on learning representations (ICLR)
Knowles DA (2015) Stochastic gradient variational bayes for gamma approximating distributions. arXiv preprint arXiv:1509.01631
Kolda TG, Bader BW (2009) Tensor decompositions and applications. SIAM Rev 51(3):455–500
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems 25, Curran Associates, Inc., pp 1097–1105. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Li S, Kawale J, Fu Y (2015) Deep collaborative filtering via marginalized denoising auto-encoder. In: Proceedings of the 24th ACM international on conference on information and knowledge management, ACM, CIKM ’15, pp 811–820
Liu H, Li Y, Tsang M, Liu Y (2019) Costco: a neural tensor completion model for sparse tensors. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining, Association for Computing Machinery, pp 324–334
Rashid AM, Karypis G, Riedl J (2008) Learning preferences of new users in recommender systems: an information theoretic approach. ACM SIGKDD Explor Newsl 10(2):90–100
Schein A, Paisley J, Blei DM, Wallach H (2015) Bayesian Poisson tensor factorization for inferring multilateral relations from sparse dyadic event counts. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 1045–1054
Schein A, Zhou M, Blei DM, Wallach H (2016) Bayesian Poisson tucker decomposition for learning the structure of international relations. In: Proceedings of the 33rd international conference on machine learning, volume 48, JMLR.org, ICML’16, pp 2810–2819. http://dl.acm.org/citation.cfm?id=3045390.3045686
Schmidt MN, Mohamed S (2009) Probabilistic non-negative tensor factorization using Markov chain Monte Carlo. In: 2009 17th European signal processing conference, IEEE, pp 1918–1922
Sedhain S, Menon AK, Sanner S, Xie L (2015) Autorec: autoencoders meet collaborative filtering. In: Proceedings of the 24th international conference on world wide web, ACM, WWW ’15 Companion, pp 111–112
Shashua A, Hazan T (2005) Non-negative tensor factorization with applications to statistics and computer vision. In: Proceedings of the 22nd international conference on machine learning, ACM, pp 792–799
Welling M, Weber M (2001) Positive tensor factorization. Pattern Recogn Lett 22(12):1255–1261
Wu X, Shi B, Dong Y, Huang C, Chawla NV (2019) Neural tensor factorization for temporal interaction learning. In: Proceedings of the twelfth ACM international conference on web search and data mining, Association for Computing Machinery, New York, NY, USA, WSDM ’19, pp 537–545. https://doi.org/10.1145/3289600.3290998
Xue HJ, Dai X, Zhang J, Huang S, Chen J (2017) Deep matrix factorization models for recommender systems. In: Proceedings of the 26th international joint conference on artificial intelligence, IJCAI-17, pp 3203–3209
Yu Y, Zhang L, Wang C, Gao R, Zhao W, Jiang J (2019) Neural personalized ranking via Poisson factor model for item recommendation. Complexity
Zhang S, Yao L, Sun A, Tay Y (2019) Deep learning based recommender system: a survey and new perspectives. ACM Comput Surv 52(1):5:1–5:38
Zhou M, Hannah L, Dunson D, Carin L (2012) Beta-negative binomial process and poisson factor analysis. In: Proceedings of the 15th international conference on artificial intelligence and statistics, PMLR, Proceedings of Machine Learning Research, vol 22, pp 1462–1471
Additional information
Responsible editor: Sriraam Natarajan.
Cite this article
Jin, Y., Liu, M., Li, Y. et al. Variational auto-encoder based Bayesian Poisson tensor factorization for sparse and imbalanced count data. Data Min Knowl Disc 35, 505–532 (2021). https://doi.org/10.1007/s10618-020-00723-7
DOI: https://doi.org/10.1007/s10618-020-00723-7