Modeling low- and high-order feature interactions with FM and self-attention network

Applied Intelligence

Abstract

Click-through rate (CTR) prediction is a long-standing and widely studied problem. In many online applications, such as online advertising and product recommendation, even a small increase in CTR can bring large returns. However, CTR prediction faces several challenges. The large number of users and items, together with the differing feature-space sizes of different data types, leads to high-dimensional and sparse input, and constructing high-order feature interactions by hand relies heavily on expert knowledge and is very time-consuming. In this paper, we build a novel model for CTR prediction called the multi-order interactive features aware factorization machine (MoFM). To effectively capture both low-order and high-order interactive features, MoFM integrates three different types of prediction models: logistic regression (LR) models the original features, a factorization machine (FM) models the second-order interactive features, and a multi-head self-attention network with residual connections automatically identifies high-value high-order feature combinations. The model also includes an embedding layer that embeds the different data types in a unified way, which addresses the diversity, sparsity, and high dimensionality of the features. Since feature engineering is not required, the model can be learned end to end. Experiments on three public datasets show the superiority of the proposed model over state-of-the-art models and also verify the flexibility and scalability of the model structure.
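To make the three-part structure described in the abstract more concrete, the following is a minimal NumPy sketch of a forward pass: a shared embedding lookup, an LR term on the original features, an FM term over pairwise embedding interactions, and one multi-head self-attention layer with a residual connection. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not say how the three components are combined, so summing their logits before a sigmoid is an assumption, as are the single attention layer, the ReLU after the residual connection, and every name used here (`predict_ctr`, `init_params`, `wq`, `wres`, and so on).

```python
import numpy as np


def fm_second_order(emb):
    """Sum of pairwise dot products between field embeddings, shape (fields, dim).

    Uses the standard FM identity 0.5 * ((sum_i v_i)^2 - sum_i v_i^2),
    summed over the embedding dimension.
    """
    square_of_sum = emb.sum(axis=0) ** 2
    sum_of_square = (emb ** 2).sum(axis=0)
    return 0.5 * float((square_of_sum - sum_of_square).sum())


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_residual(emb, wq, wk, wv, wres):
    """One multi-head self-attention layer over fields with a residual connection.

    wq, wk, wv have shape (heads, dim, head_dim); wres has shape
    (dim, heads * head_dim) and projects the input for the residual path.
    """
    q = np.einsum("fd,hdk->hfk", emb, wq)
    k = np.einsum("fd,hdk->hfk", emb, wk)
    v = np.einsum("fd,hdk->hfk", emb, wv)
    scores = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1]))  # (h, f, f)
    heads = scores @ v                                                 # (h, f, k)
    concat = heads.transpose(1, 0, 2).reshape(emb.shape[0], -1)        # (f, h * k)
    return np.maximum(concat + emb @ wres, 0.0)  # ReLU is an assumption


def init_params(rng, num_features, dim, heads, head_dim, num_fields, scale=0.1):
    """Hypothetical random initialisation, only so the sketch can run."""
    return {
        "bias": 0.0,
        "linear": rng.normal(0.0, scale, num_features),
        "embedding": rng.normal(0.0, scale, (num_features, dim)),
        "wq": rng.normal(0.0, scale, (heads, dim, head_dim)),
        "wk": rng.normal(0.0, scale, (heads, dim, head_dim)),
        "wv": rng.normal(0.0, scale, (heads, dim, head_dim)),
        "wres": rng.normal(0.0, scale, (dim, heads * head_dim)),
        "w_out": rng.normal(0.0, scale, num_fields * heads * head_dim),
    }


def predict_ctr(field_ids, params):
    """field_ids holds one global feature index per field."""
    emb = params["embedding"][field_ids]                           # unified embedding
    lr_logit = params["bias"] + params["linear"][field_ids].sum()  # original features
    fm_logit = fm_second_order(emb)                                # 2nd-order terms
    att = self_attention_residual(
        emb, params["wq"], params["wk"], params["wv"], params["wres"]
    )
    att_logit = float(att.reshape(-1) @ params["w_out"])           # high-order terms
    # Assumption: the three logits are summed and passed through a sigmoid.
    return 1.0 / (1.0 + np.exp(-(lr_logit + fm_logit + att_logit)))


rng = np.random.default_rng(0)
params = init_params(rng, num_features=50, dim=8, heads=2, head_dim=4, num_fields=3)
p = predict_ctr(np.array([3, 17, 42]), params)
```

Because every field passes through the same embedding table, the LR, FM, and attention parts all consume the same sparse indices, which is what makes end-to-end training without manual feature engineering possible in a design of this kind.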

Author information

Corresponding author

Correspondence to Cairong Yan.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yan, C., Chen, Y., Wan, Y. et al. Modeling low- and high-order feature interactions with FM and self-attention network. Appl Intell 51, 3189–3201 (2021). https://doi.org/10.1007/s10489-020-01951-6

  • DOI: https://doi.org/10.1007/s10489-020-01951-6
