Distribution Distance Regularized Sequence Representation for Text Matching in Asymmetrical Domains
IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 4.1), Pub Date: 2022-02-01, DOI: 10.1109/taslp.2022.3145289
Weijie Yu¹, Chen Xu², Jun Xu², Liang Pang³, Ji-Rong Wen²

Projecting the input text pair into a common semantic space, where the matching function can be readily learned, is an essential step in asymmetrical text matching. In practice, it is often observed that, as the model is trained, the feature vectors of the asymmetrical texts tend to become gradually indistinguishable in the semantic space. However, this phenomenon has been overlooked in existing studies. As a result, the feature vectors are constructed without any regularization, which inevitably hinders the learning of the downstream matching function. In this paper, we first exploit this phenomenon and propose DDR-Match, a novel matching framework tailored for asymmetrical text matching. Specifically, DDR-Match uses a distribution distance-based regularizer to accelerate the fusion, in the semantic space, of the sequence representations that correspond to the different domains. We then provide three instances of DDR-Match and compare them. DDR-Match is compatible with existing text matching methods because it incorporates them as its underlying matching model; four popular text matching methods are used in the paper. Extensive experiments on five publicly available benchmarks showed that DDR-Match consistently outperformed its underlying methods.
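The idea of a distribution distance-based regularizer can be illustrated with a small sketch. The abstract does not say which distance measures its three instances use or how the regularizer is weighted. Purely as an assumption, the sketch below uses the squared maximum mean discrepancy (MMD) with a Gaussian kernel as the distance between the two domains' representation sets (for example, pooled vectors of queries and of documents). It adds this distance to a matching loss with a hypothetical weight `lam`. All function names (`gaussian_kernel`, `mmd_squared`, `regularized_loss`) are illustrative and are not taken from the authors' code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def _mean_kernel(a: List[Vector], b: List[Vector], sigma: float) -> float:
    """Average kernel value over all pairs (u, v) with u in a and v in b."""
    total = sum(gaussian_kernel(u, v, sigma) for u in a for v in b)
    return total / (len(a) * len(b))


def mmd_squared(xs: List[Vector], ys: List[Vector], sigma: float = 1.0) -> float:
    """Biased empirical estimate of squared MMD between two sets of vectors.

    Assumption for illustration: this stands in for the (unspecified)
    distribution distance used by DDR-Match. Smaller values mean the two
    domains' representations are closer in distribution.
    """
    return (_mean_kernel(xs, xs, sigma)
            + _mean_kernel(ys, ys, sigma)
            - 2.0 * _mean_kernel(xs, ys, sigma))


def regularized_loss(matching_loss: float,
                     domain_a: List[Vector],
                     domain_b: List[Vector],
                     lam: float = 0.1,
                     sigma: float = 1.0) -> float:
    """Hypothetical objective: matching loss + lam * distribution distance."""
    return matching_loss + lam * mmd_squared(domain_a, domain_b, sigma)


if __name__ == "__main__":
    queries = [[0.0, 0.0], [0.1, 0.0]]
    docs_far = [[3.0, 3.0], [3.1, 3.0]]    # representations far from queries
    docs_near = [[0.05, 0.0], [0.0, 0.05]]  # representations close to queries
    print(mmd_squared(queries, docs_far))   # large distance
    print(mmd_squared(queries, docs_near))  # small distance
```

Minimizing such a term pulls the two domains' representation distributions together, which matches the abstract's description of accelerating their fusion. A real implementation would compute the term on mini-batches inside an autodiff framework so that it can be back-propagated into the underlying matching model.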

Updated: 2022-02-01