
Variational auto-encoder based Bayesian Poisson tensor factorization for sparse and imbalanced count data

Published in: Data Mining and Knowledge Discovery

Abstract

Non-negative tensor factorization models enable predictive analysis on count data. Among them, Bayesian Poisson–Gamma models can derive full posterior distributions of latent factors and are less sensitive to sparse count data. However, current inference methods for these Bayesian models adopt restricted update rules for the posterior parameters. They also fail to share update information to better cope with data sparsity. Moreover, these models lack a component that handles the imbalance in count data values. In this paper, we propose a novel variational auto-encoder framework, called VAE-BPTF, that addresses these issues. It uses multi-layer perceptron networks to encode and share complex update information. The encoded information is then reweighted per data instance to penalize common data values before being aggregated to compute the posterior parameters of the latent factors. In evaluations on synthetic data, VAE-BPTF tended to recover the correct number of latent factors and posterior parameter values. It also outperformed current models in both reconstruction error and latent factor (semantic) coherence across five real-world datasets. Furthermore, the latent factors inferred by VAE-BPTF were perceived as meaningful and coherent in a qualitative analysis.
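The generative core shared by the Poisson tensor-factorization models discussed in the abstract can be sketched as follows. This is an illustrative sketch only: the tensor sizes, the number of latent factors K, and the Gamma draws are arbitrary choices, not the paper's settings.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration: a small 3-mode count tensor
# with K latent factors per mode.
U, V, W, K = 4, 5, 6, 3

# Non-negative factor matrices; in the Bayesian model these would be
# Gamma-distributed latent factors.
A = rng.gamma(1.0, 1.0, size=(U, K))
B = rng.gamma(1.0, 1.0, size=(V, K))
C = rng.gamma(1.0, 1.0, size=(W, K))

# CP-style Poisson rate: lam[u, v, w] = sum_k A[u, k] * B[v, k] * C[w, k]
lam = np.einsum('uk,vk,wk->uvw', A, B, C)

# Draw a synthetic count tensor and score it under the Poisson likelihood:
# log p(Y) = sum_{uvw} [ Y * log(lam) - lam - log(Y!) ]
Y = rng.poisson(lam)
log_fact = np.array([lgamma(y + 1.0) for y in Y.ravel()]).reshape(Y.shape)
log_lik = np.sum(Y * np.log(lam) - lam - log_fact)
```

Inference then amounts to estimating (posteriors over) A, B, and C from an observed Y; the paper's contribution is how those posterior parameters are computed.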

Notes

  1. For a more comprehensive review on this subject, we refer readers to Zhang et al. (2019).

  2. For a more detailed mathematical description, we refer readers to Gopalan et al. (2015).

  3. For simplicity, we omitted the activation functions and the bias terms in between.

  4. For simplicity, we omitted the prior shape \(\alpha \) and rate \(\beta \) in Eq. 10 and in Fig. 3. They are not directly used to compute \(\alpha _{uk}\) and \(\beta _{uk}\) in Eqs. 8 and 9. Instead, they are leveraged by the KL regularization in Eq. 5.
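The KL regularization mentioned here has a closed form between two Gamma distributions. Below is a hedged sketch under the shape/rate parameterization (the exact form of the paper's Eq. 5 is not reproduced on this page); the central-difference digamma is a lightweight stdlib stand-in for the special function.

```python
from math import lgamma, log

def digamma(x, h=1e-5):
    # Central-difference approximation of psi(x) = d/dx log Gamma(x);
    # adequate for a sketch (error on the order of h**2).
    return (lgamma(x + h) - lgamma(x - h)) / (2.0 * h)

def kl_gamma(a_q, b_q, a_p, b_p):
    """KL( Gamma(a_q, rate=b_q) || Gamma(a_p, rate=b_p) ) in closed form."""
    return ((a_q - a_p) * digamma(a_q)
            - lgamma(a_q) + lgamma(a_p)
            + a_p * (log(b_q) - log(b_p))
            + a_q * (b_p - b_q) / b_q)
```

The term vanishes when the posterior matches the prior and is positive otherwise, which is what lets the prior shape and rate regularize the encoded posterior parameters.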

  5. The combinations include either softplus or sigmoid for \(\text {h}(\cdot )\), and either of these or ReLU for \(\text {q}(\cdot )\).

  6. The symbol \({\mathbb {1}}\) denotes an all-ones matrix of the same size as \({\varvec{Y}}\).

  7. For more details about the exact formulas of TE\((\epsilon _{uk};\alpha _{uk})\) and \(\text {R}\big (\text {log}(\frac{\epsilon _{uk}}{\alpha _{uk}}), \text {log}(\alpha _{uk})\big )\), and their derivation, we refer readers to the supplementary materials of Jankowiak and Obermeyer (2018).

  8. We found that, compared to a zero mean, a small positive mean for the latter Normal distribution stabilizes the algorithm immediately after initialization.

  9. https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/index.html.

  10. https://www.kaggle.com/benhamner/nips-papers.

  11. http://www.shandesitong.com/.

  12. http://jmcauley.ucsd.edu/data/amazon.

  13. The prediction targets in this case are the ratings from the Amazon Prime video review dataset.

  14. https://github.com/aschein/bptf/blob/master/code/bptf.py.

  15. https://github.com/ch237/BayesPoissonFactor/blob/master/PTF_OnlineGibbs.m.

  16. https://www.tensortoolbox.org/cp_apr_doc.html.

  17. The embedding was done based on the entity IDs.

  18. Three zero values per non-zero value.

  19. We show only the log-likelihoods (LLs) of VAE-BPTF and BPTF, as the other Poisson-based models are significantly inferior in this respect.

  20. The NPMI scoring uses a large Wikipedia dump hosted by Palmetto: http://palmetto.aksw.org.
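NPMI itself is a simple function of word and word-pair probabilities estimated from the reference corpus; a minimal sketch, where the smoothing constant `eps` is an illustrative choice:

```python
from math import log

def npmi(p_i, p_j, p_ij, eps=1e-12):
    """Normalised pointwise mutual information for one word pair.

    p_i and p_j are marginal word probabilities and p_ij their joint
    (co-occurrence) probability, estimated from a reference corpus such
    as the Wikipedia dump above.  Returns a score in [-1, 1]: 1 for
    perfect co-occurrence, 0 for independence, -1 as p_ij -> 0.
    """
    pmi = log((p_ij + eps) / (p_i * p_j))
    return pmi / -log(p_ij + eps)
```

A topic's coherence score is then typically the mean NPMI over pairs of its top words.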

  21. The Game data and the Amazon rating data are not text data, and thus NPMI is not applicable.

  22. We used the cmdscale function in R that implements the classical multi-dimensional scaling.
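Classical multi-dimensional scaling, as implemented by R's cmdscale, can be sketched in a few lines; this illustrative version double-centres the squared distance matrix and embeds with the top eigenvectors:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS from a pairwise distance matrix D.

    Mirrors what R's cmdscale computes: double-centre the squared
    distances, then embed with the k largest eigenvector directions.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centring matrix
    B = -0.5 * J @ (D ** 2) @ J           # Gram matrix of centred points
    w, V = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]         # indices of the k largest
    w_top = np.clip(w[idx], 0.0, None)    # guard tiny negative eigenvalues
    return V[:, idx] * np.sqrt(w_top)
```

For inputs that are genuinely Euclidean distances, the embedding reproduces them exactly (up to rotation and sign).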

References

  • Ahn HJ (2008) A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem. Inf Sci 178(1):37–51


  • Aletras N, Stevenson M (2013) Evaluating topic coherence using distributional semantics. In: Proceedings of the 10th international conference on computational semantics (IWCS 2013)–Long Papers, pp 13–22

  • Buntine WL, Mishra S (2014) Experiments with non-parametric topic models. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 881–890

  • Chi EC, Kolda TG (2012) On tensors, sparsity, and nonnegative factorizations. SIAM J Matrix Anal Appl 33(4):1272–1299


  • Cox TF, Cox MA (2000) Multidimensional scaling. Chapman and Hall/CRC, Boca Raton


  • Deng Z, Navarathna R, Carr P, Mandt S, Yue Y, Matthews I, Mori G (2017) Factorized variational autoencoders for modeling audience reactions to movies. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2577–2586

  • Figurnov M, Mohamed S, Mnih A (2018) Implicit reparameterization gradients. In: Advances in neural information processing systems, pp 441–452

  • Friedlander MP, Hatz K (2008) Computing non-negative tensor factorizations. Optim Methods Softw 23(4):631–647


  • Gopalan P, Hofman JM, Blei DM (2015) Scalable recommendation with hierarchical Poisson factorization. In: Proceedings of the 31st conference on uncertainty in artificial intelligence, AUAI Press, UAI’15, pp 326–335

  • He X, Liao L, Zhang H, Nie L, Hu X, Chua TS (2017) Neural collaborative filtering. In: Proceedings of the 26th international conference on world wide web, pp 173–182

  • He X, Du X, Wang X, Tian F, Tang J, Chua TS (2018) Outer product-based neural collaborative filtering. In: Proceedings of the 27th international joint conference on artificial intelligence, AAAI Press, IJCAI’18, pp 2227–2233

  • Hidasi B, Karatzoglou A, Baltrunas L, Tikk D (2016) Session-based recommendations with recurrent neural networks. In: Proceedings of the 4th international conference on learning representations, ICLR 2016, San Juan, Puerto Rico, May 2–4, 2016, conference track proceedings

  • Hinrich JL, Nielsen SFV, Madsen KH, Mørup M (2018) Variational Bayesian partially observed non-negative tensor factorization. In: 2018 IEEE 28th international workshop on machine learning for signal processing (MLSP), pp 1–6. https://doi.org/10.1109/MLSP.2018.8516924

  • Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780


  • Hu C, Rai P, Chen C, Harding M, Carin L (2015) Scalable Bayesian non-negative tensor factorization for massive count data. In: Machine learning and knowledge discovery in databases. Springer International Publishing, pp 53–70

  • Hu Y, Koren Y, Volinsky C (2008) Collaborative filtering for implicit feedback datasets. In: 2008 Eighth IEEE international conference on data mining, IEEE, pp 263–272

  • Jankowiak M, Obermeyer F (2018) Pathwise derivatives beyond the reparameterization trick. In: International conference on machine learning, pp 2240–2249

  • Kim D, Park C, Oh J, Lee S, Yu H (2016) Convolutional matrix factorization for document context-aware recommendation. In: Proceedings of the 10th ACM conference on recommender systems, ACM, RecSys ’16, pp 233–240

  • Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980

  • Kingma DP, Welling M (2014) Auto-encoding variational Bayes. In: Proceedings of the 2nd international conference on learning representations (ICLR)

  • Knowles DA (2015) Stochastic gradient variational Bayes for gamma approximating distributions. arXiv preprint arXiv:1509.01631

  • Kolda TG, Bader BW (2009) Tensor decompositions and applications. SIAM Rev 51(3):455–500


  • Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems 25, Curran Associates, Inc., pp 1097–1105. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

  • Li S, Kawale J, Fu Y (2015) Deep collaborative filtering via marginalized denoising auto-encoder. In: Proceedings of the 24th ACM international on conference on information and knowledge management, ACM, CIKM ’15, pp 811–820

  • Liu H, Li Y, Tsang M, Liu Y (2019) Costco: a neural tensor completion model for sparse tensors. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining, Association for Computing Machinery, pp 324–334

  • Rashid AM, Karypis G, Riedl J (2008) Learning preferences of new users in recommender systems: an information theoretic approach. ACM SIGKDD Explor Newsl 10(2):90–100


  • Schein A, Paisley J, Blei DM, Wallach H (2015) Bayesian Poisson tensor factorization for inferring multilateral relations from sparse dyadic event counts. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 1045–1054

  • Schein A, Zhou M, Blei DM, Wallach H (2016) Bayesian Poisson tucker decomposition for learning the structure of international relations. In: Proceedings of the 33rd international conference on international conference on machine learning—volume 48, JMLR.org, ICML’16, pp 2810–2819. http://dl.acm.org/citation.cfm?id=3045390.3045686

  • Schmidt MN, Mohamed S (2009) Probabilistic non-negative tensor factorization using Markov chain Monte Carlo. In: 2009 17th European signal processing conference, IEEE, pp 1918–1922

  • Sedhain S, Menon AK, Sanner S, Xie L (2015) Autorec: autoencoders meet collaborative filtering. In: Proceedings of the 24th international conference on world wide web, ACM, WWW ’15 Companion, pp 111–112

  • Shashua A, Hazan T (2005) Non-negative tensor factorization with applications to statistics and computer vision. In: Proceedings of the 22nd international conference on machine learning, ACM, pp 792–799

  • Welling M, Weber M (2001) Positive tensor factorization. Pattern Recogn Lett 22(12):1255–1261


  • Wu X, Shi B, Dong Y, Huang C, Chawla NV (2019) Neural tensor factorization for temporal interaction learning. In: Proceedings of the twelfth ACM international conference on web search and data mining, Association for Computing Machinery, New York, NY, USA, WSDM ’19, pp 537–545. https://doi.org/10.1145/3289600.3290998

  • Xue HJ, Dai X, Zhang J, Huang S, Chen J (2017) Deep matrix factorization models for recommender systems. In: Proceedings of the 26th international joint conference on artificial intelligence, IJCAI-17, pp 3203–3209

  • Yu Y, Zhang L, Wang C, Gao R, Zhao W, Jiang J (2019) Neural personalized ranking via Poisson factor model for item recommendation. Complexity

  • Zhang S, Yao L, Sun A, Tay Y (2019) Deep learning based recommender system: a survey and new perspectives. ACM Comput Surv 52(1):5:1–5:38


  • Zhou M, Hannah L, Dunson D, Carin L (2012) Beta-negative binomial process and Poisson factor analysis. In: Proceedings of the 15th international conference on artificial intelligence and statistics, PMLR, Proceedings of Machine Learning Research, vol 22, pp 1462–1471


Author information


Corresponding author

Correspondence to Yuan Jin.

Additional information

Responsible editor: Sriraam Natarajan.



About this article


Cite this article

Jin, Y., Liu, M., Li, Y. et al. Variational auto-encoder based Bayesian Poisson tensor factorization for sparse and imbalanced count data. Data Min Knowl Disc 35, 505–532 (2021). https://doi.org/10.1007/s10618-020-00723-7

