Modeling low- and high-order feature interactions with FM and self-attention network, Applied Intelligence



Applied Intelligence ( IF 5.3 ) Pub Date : 2020-11-09 , DOI: 10.1007/s10489-020-01951-6
Cairong Yan , Yizhou Chen , Yongquan Wan , Pengwei Wang

Click-through rate (CTR) prediction is a long-standing and widely studied problem. In many online applications, such as online advertising and product recommendation, a small increase in CTR brings great returns. However, CTR prediction faces several challenges. The large number of users and items, together with the differing feature-space sizes of different data types, leads to high-dimensional and sparse input, and constructing high-order feature interactions by hand relies heavily on expert knowledge and is very time-consuming. In this paper, we build a novel model called multi-order interactive features aware factorization machine (MoFM) for CTR prediction. To effectively capture both low-order and high-order interactive features, three different types of prediction models are integrated: logistic regression (LR) and a factorization machine (FM) model the original features and the 2-order interactive features respectively, and a multi-head self-attention network with residual connections automatically identifies high-value high-order feature combinations. The model also contains an embedding layer that applies a unified embedding to different data types, which mitigates the diversity, sparsity, and high dimensionality of the features. Because feature engineering is not required, the model can be learned end to end. Experiments on three public datasets show the superiority of the proposed model over state-of-the-art models, and the flexibility and scalability of the model structure are also verified.
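As a rough illustration of the two low-order components named in the abstract, the sketch below computes an LR-style first-order score and the standard FM 2-order interaction term, then passes their sum through a sigmoid. It is not the authors' implementation: the function names, the plain summation of the two scores, and the omission of the embedding layer and the self-attention branch are all assumptions made for brevity. The abstract does not say how MoFM combines its three components.

```python
import math

def lr_term(w0, w, x):
    """First-order score: w0 + sum_i w_i * x_i."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x))

def fm_pairwise_naive(V, x):
    """2-order FM term by explicit enumeration:
    sum over i < j of <v_i, v_j> * x_i * x_j, costing O(n^2 * k)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total

def fm_pairwise_fast(V, x):
    """Same quantity in O(n * k), using the standard FM identity
    0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]."""
    n, k = len(x), len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        total += s * s - s_sq
    return 0.5 * total

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_ctr(w0, w, V, x):
    """Hypothetical combination: sigmoid of (first-order + 2-order) scores.
    The high-order self-attention branch is deliberately left out."""
    return sigmoid(lr_term(w0, w, x) + fm_pairwise_fast(V, x))
```

Here `x` is a feature vector of length n, `w` holds the n first-order weights, and `V` is an n-by-k list of latent factor vectors. The fast form matters for CTR data because the input is high-dimensional and sparse, so enumerating every feature pair quickly becomes impractical.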


Updated: 2020-11-09
