Structurally Comparative Hinge Loss for Dependency-Based Neural Text Representation
ACM Transactions on Asian and Low-Resource Language Information Processing (IF 2) Pub Date: 2020-05-22, DOI: 10.1145/3387633
Kexin Wang, Yu Zhou, Jiajun Zhang, Shaonan Wang, Chengqing Zong

Dependency-based graph convolutional networks (DepGCNs) have proven helpful for text representation in many natural language tasks. Almost all previous models are trained with the cross-entropy (CE) loss, which directly maximizes the posterior likelihood. However, the CE loss does not properly account for the contribution of the dependency structure: the performance gain from structure information can be small because the model fails to learn to rely on it. To address this challenge, we propose the novel structurally comparative hinge (SCH) loss function for DepGCNs. The SCH loss aims to enlarge the margin that structural representations gain over non-structural ones. From the perspective of information theory, this is equivalent to increasing the conditional mutual information between the model decision and the structure information given the text. Our experimental results on both English and Chinese datasets show that substituting the SCH loss for the CE loss improves performance on various tasks, for both induced structures and structures from an external parser, without additional learnable parameters. Furthermore, the extent to which certain types of examples rely on the dependency structure can be measured directly by the learned margin, which yields better interpretability. In addition, through detailed analysis, we show that this structure margin correlates positively with both task performance and the structure induction of DepGCNs, and that the SCH loss helps the model focus more on the shortest dependency path between entities. We achieve new state-of-the-art results on the TACRED, IMDB, and Zh. Literature datasets, even compared with ensemble and BERT baselines.
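To make the objective concrete, here is a minimal sketch of the SCH idea as the abstract describes it: a hinge loss that pushes the gold-label likelihood of the structure-aware model above that of a structure-ablated counterpart by a margin. It is written in PyTorch as an illustration only; the function name sch_loss, the argument names, and the default margin value are assumptions, not the authors' released implementation.

    import torch
    import torch.nn.functional as F

    def sch_loss(logits_struct, logits_plain, labels, margin=1.0):
        # Gold-label log-probabilities under the structural and the
        # structure-ablated views of the same input (illustrative sketch).
        logp_struct = F.log_softmax(logits_struct, dim=-1)
        logp_plain = F.log_softmax(logits_plain, dim=-1)
        gold_struct = logp_struct.gather(1, labels.unsqueeze(1)).squeeze(1)
        gold_plain = logp_plain.gather(1, labels.unsqueeze(1)).squeeze(1)
        # Hinge: penalize examples whose structural advantage over the
        # ablated model falls short of the target margin.
        return torch.clamp(margin - (gold_struct - gold_plain), min=0).mean()

In such a setup, logits_struct would come from the DepGCN run over the dependency graph and logits_plain from the same encoder with the dependency edges removed; the per-example difference gold_struct - gold_plain is then the learned margin that, per the abstract, can be read off to gauge how much an example relies on the structure.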

Updated: 2020-05-22