Recommending Themes for Ad Creative Design via Visual-Linguistic Representations
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2020-01-20, DOI: arxiv-2001.07194
Yichao Zhou, Shaunak Mishra, Manisha Verma, Narayan Bhamidipati, and Wei Wang

The online advertising industry has a perennial need to refresh ad creatives, i.e., the images and text used to entice online users towards a brand. Such refreshes are required to reduce the likelihood of ad fatigue among online users and to incorporate insights from other successful campaigns in related product categories. Given a brand, coming up with themes for a new ad is a painstaking and time-consuming process for creative strategists. Strategists typically draw inspiration from the images and text used in past ad campaigns, as well as from world knowledge about the brands. To automatically infer ad themes from such multimodal sources of information in past ad campaigns, we propose a theme (keyphrase) recommender system for ad creative strategists. The theme recommender is based on aggregating results from a visual question answering (VQA) task, which ingests the following: (i) ad images, (ii) text associated with the ads as well as Wikipedia pages on the brands in the ads, and (iii) questions around the ad. We leverage transformer-based cross-modality encoders to train visual-linguistic representations for our VQA task. We study two formulations of the VQA task, one as classification and one as ranking; via experiments on a public dataset, we show that cross-modal representations lead to significantly better classification accuracy and ranking precision-recall metrics than separate image and text representations. In addition, the use of multimodal information shows a significant lift over using only textual or visual information.
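
The abstract does not spell out how VQA results are aggregated into theme recommendations, so the following is only a minimal sketch of one plausible reading: sum the scores of predicted keyphrases across a brand's past ads and return the highest-scoring ones. The function name `recommend_themes`, the `(keyphrase, score)` input format, and the sum-of-scores rule are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input format: for each ad, the VQA model's answers as
# (keyphrase, score) pairs. The paper's actual output format may differ.
AdPredictions = List[Tuple[str, float]]


def recommend_themes(
    ads_by_brand: Dict[str, List[AdPredictions]],
    brand: str,
    top_k: int = 3,
) -> List[str]:
    """Aggregate per-ad keyphrase scores for a brand and return the top themes.

    Assumption: aggregation is a plain sum of scores across ads. Ties are
    broken alphabetically so the result is deterministic.
    """
    totals: Dict[str, float] = defaultdict(float)
    for predictions in ads_by_brand.get(brand, []):
        for phrase, score in predictions:
            # Normalize case so "Safety" and "safety" count as one theme.
            totals[phrase.lower()] += score
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [phrase for phrase, _ in ranked[:top_k]]
```

Summing scores favors keyphrases that recur across many of a brand's ads; averaging or vote counting would be equally simple alternatives under the same assumed input format.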

Updated: 2020-08-05