MGAT: Multimodal Graph Attention Network for Recommendation, Information Processing & Management


Information Processing & Management (IF 8.6), Pub Date: 2020-05-12, DOI: 10.1016/j.ipm.2020.102277
Zhulin Tao, Yinwei Wei, Xiang Wang, Xiangnan He, Xianglin Huang, Tat-Seng Chua

Graph neural networks (GNNs) have shown great potential for personalized recommendation. The core idea is to reorganize interaction data as a user-item bipartite graph and exploit high-order connectivity among user and item nodes to enrich their representations. While achieving great success, most existing works consider interaction graphs based only on ID information, forgoing item contents from multiple modalities (e.g., visual, acoustic, and textual features of micro-video items). Distinguishing personal interests across modalities at a granular level was not explored until the recently proposed MMGCN (Wei et al., 2019). However, MMGCN simply employs GNNs on parallel interaction graphs and treats information propagated from all neighbors equally, failing to capture user preference adaptively. Hence, the obtained representations might preserve redundant, even noisy information, leading to non-robustness and suboptimal performance. In this work, we investigate how to adopt GNNs on multimodal interaction graphs to adaptively capture user preference on different modalities and to offer in-depth analysis of why an item is suitable for a user. To this end, we propose a new Multimodal Graph Attention Network, MGAT for short, which disentangles personal interests at the granularity of modality. In particular, built upon multimodal interaction graphs, MGAT conducts information propagation within individual graphs, while leveraging a gated attention mechanism to identify the varying importance of different modalities to user preference. As such, it is able to capture more complex interaction patterns hidden in user behaviors and provide more accurate recommendations. Empirical results on two micro-video recommendation datasets, Tiktok and MovieLens, show that MGAT exhibits substantial improvements over state-of-the-art baselines such as NGCF (Wang, He, et al., 2019) and MMGCN (Wei et al., 2019). Further analysis in a case study illustrates how MGAT generates attentive information flow over multimodal interaction graphs.
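
To make the idea of per-modality propagation with gated attention more concrete, the following is a minimal NumPy sketch. It is not the authors' implementation: the function names, the dot-product attention, the sigmoid gate parameterized by a single vector, and the summation used to fuse modalities are all assumptions chosen for illustration, and the inputs are random toy data.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def propagate_user(user_vec, item_vecs, gate_w):
    """One propagation step for one user inside a single modality graph.

    user_vec:  (d,)   user embedding in this modality (hypothetical)
    item_vecs: (n, d) features of the n items the user interacted with
    gate_w:    (d,)   gate parameters (an assumed, simplified form)
    """
    attn = softmax(item_vecs @ user_vec)   # neighbor weights, sum to 1
    gate = sigmoid(item_vecs @ gate_w)     # per-neighbor gate in (0, 1)
    weights = attn * gate                  # gated attention weights
    message = (weights[:, None] * item_vecs).sum(axis=0)
    return user_vec + message, weights

def user_representation(users, items, gates):
    """Propagate within each modality graph, then fuse by summation (assumed)."""
    parts, weights = [], {}
    for m in users:
        vec, w = propagate_user(users[m], items[m], gates[m])
        parts.append(vec)
        weights[m] = w
    return np.sum(parts, axis=0), weights

# Toy data: one user, three interacted items, embedding size 4.
rng = np.random.default_rng(0)
modalities = ["visual", "acoustic", "textual"]
users = {m: rng.normal(size=4) for m in modalities}
items = {m: rng.normal(size=(3, 4)) for m in modalities}
gates = {m: rng.normal(size=4) for m in modalities}

rep, weights = user_representation(users, items, gates)
for m in modalities:
    print(m, np.round(weights[m], 3))
```

Because the gate scales each neighbor's attention weight, the printed weights per modality sum to less than 1; inspecting them per modality is one simple way to see which items and modalities contribute most to a user's representation in this toy setting.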


Updated: 2020-05-12
