Abstract
Click-through rate (CTR) prediction is a long-standing and widely studied problem. In many online applications, such as online advertising and product recommendation, even a small increase in CTR can bring large returns. However, CTR prediction faces several challenges. The large number of users and items, together with feature spaces whose sizes differ across data types, leads to high-dimensional and sparse input, and constructing high-order feature interactions by hand relies heavily on expert knowledge and is very time-consuming. In this paper, we build a novel model called the multi-order interactive features aware factorization machine (MoFM) for CTR prediction. To capture both low-order and high-order interactive features effectively, three different types of prediction models are integrated: logistic regression (LR) and a factorization machine (FM) model the original features and the 2-order interactive features, respectively, and a multi-head self-attention network with residual connections automatically identifies high-value high-order feature combinations. The model also includes an embedding layer that embeds the different data types in a unified way, which avoids the diversity, sparsity, and high dimensionality of the raw features. Since feature engineering is not required, the model can be learned end to end. Experiments on three public datasets show the superiority of the proposed model over state-of-the-art models and also verify the flexibility and scalability of the model structure.
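As an illustration only, the three components named in the abstract can be sketched in plain Python. This is not the authors' implementation. The function names, the dimensions, the random weights, the ReLU after the residual connection, and the choice to sum the three parts into one logit before a sigmoid are all assumptions made for this sketch; the abstract does not specify them.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W @ v.
    return [dot(row, v) for row in W]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

def fm_second_order(E):
    # Sum of <e_i, e_j> over all field pairs i < j, computed per dimension
    # as 0.5 * ((sum of e)^2 - sum of e^2).
    out = 0.0
    for k in range(len(E[0])):
        s = sum(e[k] for e in E)
        out += 0.5 * (s * s - sum(e[k] ** 2 for e in E))
    return out

def attention_head(E, Wq, Wk, Wv):
    # Scaled dot-product self-attention across fields for one head.
    Q = [matvec(Wq, e) for e in E]
    K = [matvec(Wk, e) for e in E]
    V = [matvec(Wv, e) for e in E]
    scale = math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        a = softmax([dot(q, k) / scale for k in K])
        out.append([sum(a[j] * V[j][c] for j in range(len(V)))
                    for c in range(len(V[0]))])
    return out

def interacting_layer(E, heads, W_res):
    # Concatenate head outputs, add a projected residual, apply ReLU
    # (the ReLU and the projection W_res are assumptions of this sketch).
    head_outs = [attention_head(E, *h) for h in heads]
    result = []
    for i, e in enumerate(E):
        concat = [x for ho in head_outs for x in ho[i]]
        res = matvec(W_res, e)
        result.append([max(0.0, c + r) for c, r in zip(concat, res)])
    return result

def predict(E, w_first, bias, heads, W_res, w_out):
    lr = bias + sum(w_first)                 # first-order (LR) term
    fm = fm_second_order(E)                  # 2-order (FM) term
    H = interacting_layer(E, heads, W_res)   # high-order (self-attention) term
    flat = [x for row in H for x in row]
    logit = lr + fm + dot(w_out, flat)       # assumed: simple sum of the parts
    return 1.0 / (1.0 + math.exp(-logit))

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy sizes: 4 fields, embedding size 6, 2 heads of size 3.
rng = random.Random(0)
n_fields, d, d_head, n_heads = 4, 6, 3, 2
E = rand_matrix(rng, n_fields, d)            # embeddings of the active features
w_first = [rng.uniform(-0.5, 0.5) for _ in range(n_fields)]
heads = [tuple(rand_matrix(rng, d_head, d) for _ in range(3))
         for _ in range(n_heads)]
W_res = rand_matrix(rng, d_head * n_heads, d)
w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_fields * d_head * n_heads)]
p = predict(E, w_first, 0.0, heads, W_res, w_out)
print(p)  # a probability strictly between 0 and 1
```

Here `E` stands in for the output of the embedding layer: one dense vector per field, whatever the original data type. All three parts read the same embeddings or the same active features, which is what would allow such a model to be trained end to end without hand-crafted cross features.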
About this article
Cite this article
Yan, C., Chen, Y., Wan, Y. et al. Modeling low- and high-order feature interactions with FM and self-attention network. Appl Intell 51, 3189–3201 (2021). https://doi.org/10.1007/s10489-020-01951-6