Sentence Semantic Matching Based on 3D CNN for Human–Robot Language Interaction
ACM Transactions on Internet Technology (IF 5.3), Pub Date: 2021-07-16, DOI: 10.1145/3450520
Wenpeng Lu 1 , Rui Yu 1 , Shoujin Wang 2 , Can Wang 3 , Ping Jian 4 , Heyan Huang 4

The development of cognitive robotics brings an attractive scenario in which humans and robots cooperate to accomplish specific tasks. To facilitate this scenario, cognitive robots are expected to interact with humans in natural language, which depends on natural language understanding (NLU) technologies. As one core task in NLU, sentence semantic matching (SSM) arises in a wide range of interaction scenarios. Recently, deep learning-based methods for SSM have become predominant due to their outstanding performance. However, each sentence consists of a sequence of words and is usually viewed as one-dimensional (1D) text, which restricts existing neural models to 1D sequential networks. A few studies have attempted to explore the potential of 2D or 3D neural models for text representation. However, these approaches struggle to capture the complex features in texts, so the performance improvement they achieve is quite limited. To tackle this challenge, we devise a novel 3D CNN-based SSM (3DSSM) method for human–robot language interaction. Specifically, a dedicated architecture called the feature cube network is first designed to transform a 1D sentence into a multi-dimensional representation called the semantic feature cube. Then, a 3D CNN module learns a semantic representation of the semantic feature cube by capturing both the local features embedded in word representations and the sequential information among successive words in a sentence. Given a pair of sentences, their representations are concatenated and fed into another 3D CNN, which captures the interactive features between them and generates the final matching representation. Finally, the semantic matching degree is computed with the sigmoid function, taking the learned matching representation as input.
Extensive experiments on two real-world datasets demonstrate that 3DSSM achieves comparable or even better performance than state-of-the-art competing methods.
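
The pipeline described in the abstract (feature cube, per-sentence 3D CNN, concatenation, interaction 3D CNN, sigmoid) can be illustrated with a toy forward pass. This is only a rough sketch under loud assumptions, not the authors' implementation: the abstract does not say how the feature cube is built, so the sketch assumes it is a stack of linear projections of the word-embedding matrix. The weights are random rather than trained, both sentences are assumed to be padded to the same length, global mean pooling is an assumed reduction step, and every name (`feature_cube`, `conv3d_relu`, `matching_score`, `rand_tensor`) is hypothetical.

```python
import math
import random


def rand_tensor(rng, *shape):
    """Nested lists of the given shape filled with uniform values in [-1, 1]."""
    if len(shape) == 1:
        return [rng.uniform(-1.0, 1.0) for _ in range(shape[0])]
    return [rand_tensor(rng, *shape[1:]) for _ in range(shape[0])]


def feature_cube(embeddings, projections):
    """ASSUMPTION: build a (layers x words x dim) cube by stacking one linear
    projection of the (words x dim) embedding matrix per layer."""
    return [[[sum(x * p for x, p in zip(word, col)) for col in proj]
             for word in embeddings]
            for proj in projections]


def conv3d_relu(volume, kernel):
    """Single-channel 3D cross-correlation (no padding, stride 1) plus ReLU."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for d in range(depth - kd + 1):
        plane = []
        for h in range(height - kh + 1):
            row = []
            for w in range(width - kw + 1):
                total = 0.0
                for a in range(kd):
                    for b in range(kh):
                        for c in range(kw):
                            total += volume[d + a][h + b][w + c] * kernel[a][b][c]
                row.append(max(0.0, total))
            plane.append(row)
        out.append(plane)
    return out


def matching_score(emb_a, emb_b, projections, k_sent, k_pair, weight, bias):
    """Encode both sentences, concatenate, convolve again, then apply sigmoid."""
    rep_a = conv3d_relu(feature_cube(emb_a, projections), k_sent)
    rep_b = conv3d_relu(feature_cube(emb_b, projections), k_sent)
    pair = rep_a + rep_b  # concatenate the two representations along depth
    interaction = conv3d_relu(pair, k_pair)
    values = [v for plane in interaction for row in plane for v in row]
    pooled = sum(values) / len(values)  # ASSUMPTION: global mean pooling
    return 1.0 / (1.0 + math.exp(-(weight * pooled + bias)))


rng = random.Random(0)
WORDS, DIM, LAYERS = 5, 4, 3  # toy sizes chosen only for illustration
emb_a = rand_tensor(rng, WORDS, DIM)  # stand-in word embeddings, sentence A
emb_b = rand_tensor(rng, WORDS, DIM)  # stand-in word embeddings, sentence B
projections = rand_tensor(rng, LAYERS, DIM, DIM)
k_sent = rand_tensor(rng, 2, 2, 2)  # kernel shared by both sentence encoders
k_pair = rand_tensor(rng, 2, 2, 2)  # kernel of the interaction 3D CNN
score = matching_score(emb_a, emb_b, projections, k_sent, k_pair,
                       weight=1.0, bias=0.0)
print(f"matching score: {score:.3f}")
```

Because each kernel slides over the word axis and the embedding axis at once, a single filter sees both neighbouring words and neighbouring embedding features, which is the intuition the abstract gives for using 3D convolutions. In a trained model the projections, kernels, `weight`, and `bias` would be learned from labelled sentence pairs.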

Updated: 2021-07-16