TableQnA: Answering List Intent Queries With Web Tables
arXiv - CS - Information Retrieval. Pub Date: 2020-01-10, DOI: arxiv-2001.04828
Kaushik Chakrabarti, Zhimin Chen, Siamak Shakeri, Guihong Cao, Surajit Chaudhuri

The web contains a vast corpus of HTML tables. They can be used to provide direct answers to many web queries. We focus on answering two classes of queries with those tables: those seeking lists of entities (e.g., 'cities in california') and those seeking superlative entities (e.g., 'largest city in california'). The main challenge is to achieve high precision with significant coverage. Existing approaches train machine learning models to select the answer from the candidates; they rely on textual match features between the query and the content of the table along with features capturing table quality/importance. These features alone are inadequate for achieving the above goals. Our main insight is that we can improve precision by (i) first extracting intent (structured information) from the query for the above query classes and (ii) then performing structure-aware matching (instead of just textual matching) between the extracted intent and the candidates to select the answer. We model (i) as a sequence tagging task. We leverage state-of-the-art deep neural network models with word embeddings. The model requires large-scale training data, which is expensive to obtain via manual labeling; we therefore develop a novel method to automatically generate the training data. For (ii), we develop novel features to compute structure-aware match and train a machine learning model. Our experiments on real-life web search queries show that (i) our intent extractor for list and superlative intent queries has significantly higher precision and coverage compared with baseline approaches and (ii) our table answer selector significantly outperforms the state-of-the-art baseline approach. This technology has been used in production by Microsoft's Bing search engine since 2016.
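The two-step pipeline described above can be illustrated with a small Python sketch. This is a minimal, hypothetical illustration, not the authors' system: the BIO tag set (ENT, CON, SUP), the `Intent` and `Table` classes, the `SUPERLATIVE_COLUMNS` lookup and the feature names are all assumptions. The tags are supplied by hand in place of the deep sequence tagger, and a few hand-written features plus an argmax stand in for the learned answer selector.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Step (i): intent extraction as sequence tagging.
# Hypothetical tag set (an assumption for illustration):
#   ENT = entity type, CON = constraint, SUP = superlative word, O = other.
# A trained tagger (not shown) would predict one BIO tag per token.


@dataclass
class Intent:
    entity_type: str = ""
    constraints: list[str] = field(default_factory=list)
    superlative: str | None = None


def decode_tags(tokens: list[str], tags: list[str]) -> Intent:
    """Group BIO-tagged tokens into spans and map each span to an intent slot."""
    spans: list[tuple[str, list[str]]] = []
    for tok, tag in zip(tokens, tags):
        if tag == "O":
            continue
        prefix, label = tag.split("-", 1)
        # Start a new span on B-, or on an I- that does not continue the previous label.
        if prefix == "B" or not spans or spans[-1][0] != label:
            spans.append((label, [tok]))
        else:
            spans[-1][1].append(tok)

    intent = Intent()
    for label, toks in spans:
        text = " ".join(toks)
        if label == "ENT":
            intent.entity_type = text
        elif label == "CON":
            intent.constraints.append(text)
        elif label == "SUP":
            intent.superlative = text
    return intent


# Step (ii): structure-aware matching between the intent and a candidate table.


@dataclass
class Table:
    caption: str
    headers: list[str]
    rows: list[list[str]]


# Hypothetical lookup from a superlative word to the column it ranks by.
SUPERLATIVE_COLUMNS = {"largest": "population", "tallest": "height"}


def match_features(intent: Intent, table: Table) -> dict[str, float]:
    """Toy features that check where each intent slot matches in the table."""
    headers = [h.lower() for h in table.headers]
    caption = table.caption.lower()
    constraints = intent.constraints
    sup_col = SUPERLATIVE_COLUMNS.get(intent.superlative or "")
    return {
        # Does some column header name the requested entity type?
        "entity_column": float(intent.entity_type.lower() in headers),
        # Fraction of constraints that appear in the table caption.
        "constraints_in_caption": (
            sum(c.lower() in caption for c in constraints) / len(constraints)
            if constraints else 1.0
        ),
        # For superlative intents, is there a column to rank by?
        "ranking_column": float(intent.superlative is None or sup_col in headers),
    }


def superlative_answer(intent: Intent, table: Table) -> str | None:
    """Return the entity with the highest value in the ranking column, if possible."""
    headers = [h.lower() for h in table.headers]
    sup_col = SUPERLATIVE_COLUMNS.get(intent.superlative or "")
    ent_col = intent.entity_type.lower()
    if sup_col not in headers or ent_col not in headers or not table.rows:
        return None
    ent_i, val_i = headers.index(ent_col), headers.index(sup_col)
    best = max(table.rows, key=lambda row: float(row[val_i].replace(",", "")))
    return best[ent_i]


if __name__ == "__main__":
    query_intent = decode_tags(["largest", "city", "in", "california"],
                               ["B-SUP", "B-ENT", "O", "B-CON"])
    # Toy candidate table with rounded, illustrative values.
    candidate = Table(
        caption="Largest cities in California",
        headers=["City", "County", "Population"],
        rows=[["Los Angeles", "Los Angeles", "3,900,000"],
              ["San Diego", "San Diego", "1,400,000"],
              ["San Jose", "Santa Clara", "1,000,000"]],
    )
    print(query_intent)
    print(match_features(query_intent, candidate))
    print(superlative_answer(query_intent, candidate))  # → Los Angeles
```

The design choice the sketch tries to convey is that each intent slot is checked against a specific part of the table (the entity type against a column header, constraints against the caption, the superlative against a rankable numeric column), which is what separates structure-aware matching from a plain textual-overlap score.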

Updated: 2020-01-15