A study of untrained models for multimodal information retrieval
Information Retrieval Journal (IF 1.7), Pub Date: 2017-11-03, DOI: 10.1007/s10791-017-9322-x
Melanie Imhof , Martin Braschler

Operational multimodal information retrieval systems have to deal with increasingly complex document collections and queries that are composed of a large set of textual and non-textual modalities, such as ratings, prices, timestamps, and geographical coordinates. The resulting combinatorial explosion of modality combinations makes it intractable to treat each modality individually and to obtain suitable training data. Consequently, instead of finding and training new models for each individual modality or combination of modalities, it is crucial to establish unified models and to fuse their outputs in a robust way. Since the most popular weighting schemes for textual retrieval have in the past generalized well to many retrieval tasks, we demonstrate how they can be adapted for use with non-textual modalities, which is a first step towards finding such a unified model. We show that the popular BM25 weighting scheme is suitable for multimodal IR systems and analyze the underlying assumptions of the BM25 formula with respect to merging modalities under the so-called raw-score merging hypothesis, which requires no training. We establish a multimodal baseline for two multimodal test collections, show how modalities differ in their contribution to relevance, and illustrate the difficulty of treating modalities with overlapping information. Our experiments demonstrate that our untrained multimodal baseline achieves significantly higher retrieval effectiveness than using just the textual modality on the Social Book Search 2016 collection, and that it lies in the range of a trained multimodal approach that uses the optimal linear combination of the modality scores.
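
The core idea of the abstract, scoring each modality with a BM25-style function and fusing the resulting raw scores without any training, can be sketched as follows. This is a minimal illustration only: the function names, the saturation mapping used for the rating modality, and the parameter values are assumptions made for exposition, not the authors' exact formulation.

```python
# Illustrative sketch (not the authors' exact method): raw-score merging of
# per-modality scores with no training. The textual modality is scored with
# standard BM25; a non-textual modality (here: a book rating) is passed
# through a BM25-like saturation function so its score lives on a bounded,
# comparable scale. All names and parameters below are assumptions.
import math
from collections import Counter

K1, B = 1.2, 0.75  # common BM25 defaults

def bm25_text_score(query_terms, doc_terms, df, n_docs, avg_len):
    """Standard BM25 score of a tokenized document for a tokenized query."""
    tf = Counter(doc_terms)
    score = 0.0
    for t in query_terms:
        if t not in tf or t not in df:
            continue
        idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
        norm = tf[t] * (K1 + 1) / (tf[t] + K1 * (1 - B + B * len(doc_terms) / avg_len))
        score += idf * norm
    return score

def saturated_value_score(value, k=1.2):
    """BM25-style saturation of a non-negative modality value (e.g. a rating),
    bounding its contribution the way term frequency is bounded in BM25."""
    return value * (k + 1) / (value + k)

def raw_score_merge(scores):
    """Raw-score merging hypothesis: simply sum the per-modality scores,
    with no learned weights and hence no need for training data."""
    return sum(scores)

# Tiny usage example with two documents, one textual and one rating modality.
docs = [
    {"text": "science fiction space opera".split(), "rating": 4.5},
    {"text": "historical fiction drama".split(),    "rating": 2.0},
]
query = "space fiction".split()
df = Counter(t for d in docs for t in set(d["text"]))          # document frequencies
avg_len = sum(len(d["text"]) for d in docs) / len(docs)        # average document length

for d in docs:
    merged = raw_score_merge([
        bm25_text_score(query, d["text"], df, len(docs), avg_len),
        saturated_value_score(d["rating"]),
    ])
    print(round(merged, 3))
```

Because raw-score merging simply sums the per-modality scores, the only requirement is that each modality is scored on a comparable, bounded scale, which is what the BM25-style saturation is meant to ensure; a trained alternative would instead learn a weight per modality for a linear combination of the scores.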

Updated: 2017-11-03