Grounding Natural Language Instructions: Can Large Language Models Capture Spatial Information?
arXiv - CS - Computation and Language Pub Date : 2021-09-17 , DOI: arxiv-2109.08634
Julia Rozanova, Deborah Ferreira, Krishna Dubba, Weiwei Cheng, Dell Zhang, Andre Freitas

Models designed for intelligent process automation are required to be capable of grounding user interface elements. This task of interface element grounding is centred on linking instructions in natural language to their target referents. Even though BERT and similar pre-trained language models have excelled in several NLP tasks, their use has not been widely explored for the UI grounding domain. This work concentrates on testing and probing the grounding abilities of three different transformer-based models: BERT, RoBERTa and LayoutLM. Our primary focus is on these models' spatial reasoning skills, given their importance in this domain. We observe that LayoutLM has a promising advantage for applications in this domain, even though it was originally created for a different purpose (representing scanned documents): the learned spatial features appear to be transferable to the UI grounding setting, and in particular the model demonstrates the ability to discriminate between target directions in natural language instructions.
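To make the grounding task concrete, the sketch below is an illustrative toy baseline (not the paper's method, and not a use of any of the probed models): each UI element is given a hypothetical name and a bounding box, and a directional phrase in the instruction is resolved relative to a named anchor element. This is the kind of spatial relation the paper probes the transformer models for.

```python
def center(box):
    """Center point of an (x0, y0, x1, y1) bounding box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def ground(instruction, elements, anchor_name):
    """Pick the element lying in the stated direction from a named anchor.

    `elements` maps element names to bounding boxes. Screen coordinates grow
    rightward and downward, so "above" means a smaller y-coordinate.
    """
    ax, ay = center(elements[anchor_name])
    if "left of" in instruction:
        candidates = {n: b for n, b in elements.items() if center(b)[0] < ax}
        key = lambda n: ax - center(candidates[n])[0]
    elif "right of" in instruction:
        candidates = {n: b for n, b in elements.items() if center(b)[0] > ax}
        key = lambda n: center(candidates[n])[0] - ax
    elif "above" in instruction:
        candidates = {n: b for n, b in elements.items() if center(b)[1] < ay}
        key = lambda n: ay - center(candidates[n])[1]
    else:  # treat anything else as "below"
        candidates = {n: b for n, b in elements.items() if center(b)[1] > ay}
        key = lambda n: center(candidates[n])[1] - ay
    return min(candidates, key=key)  # nearest candidate in that direction

# Hypothetical screen layout for illustration.
elements = {
    "search_box": (200, 10, 400, 40),
    "ok_button": (100, 10, 180, 40),
    "cancel_button": (420, 10, 500, 40),
    "help_link": (200, 60, 400, 90),
}
print(ground("click the button left of the search box", elements, "search_box"))
# -> ok_button
```

A neural grounding model replaces the hand-written direction rules with features learned from text (and, in LayoutLM's case, from 2D position embeddings over exactly this kind of bounding-box input), which is why the paper's probes target whether such directional distinctions survive in the learned representations.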

Updated: 2021-09-20