Mitigating the Position Bias of Transformer Models in Passage Re-Ranking,arXiv - CS - Information Retrieval


arXiv - CS - Information Retrieval. Pub Date: 2021-01-18, DOI: arxiv-2101.06980
Sebastian Hofstätter, Aldo Lipani, Sophia Althammer, Markus Zlabinger, Allan Hanbury

Supervised machine learning models and their evaluation strongly depend on the quality of the underlying dataset. When we search for a relevant piece of information, it may appear anywhere in a given passage. However, we observe a bias in the position of the correct answer in the text in two popular Question Answering datasets used for passage re-ranking. The excessive favoring of earlier positions inside passages is an unwanted artefact. It leads three common Transformer-based re-ranking models to ignore relevant parts in unseen passages. More concerningly, because the evaluation set is taken from the same biased distribution, models that overfit to that bias overestimate their true effectiveness. In this work, we analyze position bias on datasets, the contextualized representations, and their effect on retrieval results. We propose a debiasing method for retrieval datasets. Our results show that a model trained on a position-biased dataset exhibits a significant decrease in re-ranking effectiveness when evaluated on a debiased dataset. We demonstrate that by mitigating the position bias, Transformer-based re-ranking models are equally effective on a biased and a debiased dataset, as well as more effective in a transfer-learning setting between two differently biased datasets.
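The kind of position bias the abstract describes can be illustrated with a small measurement. The following is a minimal sketch and not the authors' analysis or debiasing method: it assumes each example is a (passage, answer) string pair, locates the answer by exact substring match, and counts how often answers start in each equal-width slice of their passage. The function names, the bin count, and the toy data are all hypothetical.

```python
from collections import Counter


def relative_answer_position(passage: str, answer: str):
    """Return the answer's start offset as a fraction of passage length.

    Returns None when the passage is empty or the answer is not found.
    Exact substring matching is an assumption made for this sketch.
    """
    start = passage.find(answer)
    if start < 0 or not passage:
        return None
    return start / len(passage)


def position_histogram(pairs, num_bins=4):
    """Count answers starting in each equal-width slice of their passage."""
    counts = Counter()
    for passage, answer in pairs:
        rel = relative_answer_position(passage, answer)
        if rel is None:
            continue  # skip examples whose answer cannot be located
        # Clamp so a position at the very end still falls in the last bin.
        counts[min(int(rel * num_bins), num_bins - 1)] += 1
    return [counts[b] for b in range(num_bins)]


# Hypothetical toy data, not taken from any of the datasets in the paper.
toy_pairs = [
    ("Paris is the capital of France and lies on the Seine.", "Paris"),
    ("Berlin is the capital of Germany.", "Berlin"),
    ("The river that flows through London is the Thames.", "Thames"),
]

histogram = position_histogram(toy_pairs)
print(histogram)
```

A histogram that is heavily weighted toward the first bins would be one sign of the early-position bias discussed above. A real analysis would need token-level positions and the datasets' own answer annotations.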


Updated: 2021-01-19
