aiTPR: Attribute Interaction-Tensor Product Representation for Image Caption
Neural Processing Letters (IF 3.1), Pub Date: 2021-02-17, DOI: 10.1007/s11063-021-10438-5
Chiranjib Sur

Region-level visual features improve the generative capability of feature-based captioning models. However, they lack proper interaction-based attentional perception, so the resulting sentences can be biased, uncorrelated with the image, or misinformative. In this work, we propose Attribute Interaction-Tensor Product Representation (aiTPR), a convenient way of gathering more information through orthogonal combination, learning interactions as physical entities (tensors), and thereby improving captions. In previous works, features are summed into undefined feature spaces; TPR instead keeps the combinations well-defined, and orthogonality helps define familiar spaces. We introduce a new concept layer that defines objects and their interactions, which can play a crucial role in determining different descriptions. The interaction components contribute heavily to caption quality, and the model outperforms various previous works in this domain on the MSCOCO dataset. For the first time, we combine regional image features with an abstracted interaction-likelihood embedding for image captioning.
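The abstract does not spell out how aiTPR binds features, so the following is only a minimal sketch of a generic tensor product representation, not the paper's model. It assumes each "filler" vector (for example, a region feature) is bound to a "role" vector by an outer product, and that the bound pairs are summed into one tensor. The names `bind` and `unbind`, the role labels, and the toy numbers are all hypothetical. It illustrates why orthogonality matters: with orthonormal roles, each filler can be recovered from the sum without interference.

```python
# Generic tensor product representation (TPR) sketch in pure Python.
# Illustrative only: this is NOT the aiTPR architecture from the paper.

def outer(u, v):
    """Outer product of two vectors, returned as a list of rows."""
    return [[a * b for b in v] for a in u]

def bind(fillers, roles):
    """Sum the outer products filler_i (x) role_i into a single matrix."""
    dim_f, dim_r = len(fillers[0]), len(roles[0])
    tensor = [[0.0] * dim_r for _ in range(dim_f)]
    for filler, role in zip(fillers, roles):
        product = outer(filler, role)
        for i in range(dim_f):
            for j in range(dim_r):
                tensor[i][j] += product[i][j]
    return tensor

def unbind(tensor, role):
    """Multiply the tensor by a role vector to read out its filler.

    The readout is exact only when the role vectors are orthonormal.
    """
    return [sum(t * r for t, r in zip(row, role)) for row in tensor]

# Hypothetical toy data: two orthonormal roles (say, "object" and
# "interaction") and two 3-dimensional fillers.
roles = [[1.0, 0.0], [0.0, 1.0]]
fillers = [[0.2, 0.5, 0.1], [0.7, 0.0, 0.3]]

tensor = bind(fillers, roles)
recovered = [unbind(tensor, role) for role in roles]
```

By contrast, simply adding the two fillers together would give a single vector from which neither could be separated again, which is one way to read the abstract's remark about features adding up to undefined feature spaces.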




Updated: 2021-02-18