Semantic Matching in Search
Foundations and Trends in Information Retrieval (IF 10.4), Pub Date: 2014-06-11, DOI: 10.1561/1500000035
Hang Li, Jun Xu

Relevance is the most important factor in assuring users' satisfaction with search, and the success of a search engine depends heavily on its performance on relevance. It has been observed that most dissatisfaction cases in relevance are due to term mismatch between queries and documents (e.g., the query "NY times" does not match well with a document containing only "New York Times"), because term matching, i.e., the bag-of-words approach, still functions as the main mechanism of modern search engines. It is therefore no exaggeration to say that mismatch between query and document poses the most critical challenge in search. Ideally, a query and a document should match each other whenever they are topically relevant. Researchers have recently expended significant effort to address this problem. The major approach is semantic matching, i.e., performing deeper understanding of queries and documents to represent their meanings, and then performing better matching between the enriched query and document representations. With the availability of large amounts of log data and advanced machine learning techniques, this has become more feasible, and significant progress has been made recently. This survey gives a systematic and detailed introduction to newly developed machine learning technologies for query-document matching (semantic matching) in search, particularly web search. It focuses on the fundamental problems, as well as the state-of-the-art solutions, of query-document matching in five aspects: form, phrase, word sense, topic, and structure. The ideas and solutions explained may motivate industrial practitioners to turn the research results into products. The methods introduced and the discussions made may also stimulate academic researchers to find new research directions and approaches.
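The term-mismatch problem can be seen in a few lines of code. The sketch below is a minimal illustration, not a method taken from the survey: it scores the query "NY times" against the document "New York Times" with bag-of-words cosine similarity, then repeats the comparison after a form-level rewrite of the query. The abbreviation table `ABBREVIATIONS` and the helper functions are hypothetical; a real system would learn such rewrites from data such as search logs rather than hard-code them.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Lower-case, whitespace-tokenize, and count terms."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical abbreviation table standing in for rewrites mined from logs.
ABBREVIATIONS = {"ny": "new york"}


def expand(text):
    """Rewrite known abbreviations into their full forms before matching."""
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in text.lower().split())


query = "NY times"
document = "New York Times"

raw = cosine(bag_of_words(query), bag_of_words(document))
expanded = cosine(bag_of_words(expand(query)), bag_of_words(document))
print(f"raw: {raw:.3f}, expanded: {expanded:.3f}")  # raw: 0.408, expanded: 1.000
```

Only the shared term "times" contributes to the raw score (1/√6 ≈ 0.408); once "ny" is rewritten to "new york", all three document terms match and the score reaches 1.0.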
Matching between query and document is not limited to search: similar problems arise in question answering, online advertising, cross-language information retrieval, machine translation, recommender systems, link prediction, image annotation, drug design, and other applications, all instances of the general task of matching objects from two different spaces. The technologies introduced can be generalized into more general machine learning techniques, referred to in this survey as learning to match.
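One common way to formalize matching between objects from two different spaces is a latent-space model: each side gets its own linear projection into a shared space, and a pair is scored by the inner product there, f(x, y) = ⟨L_x x, L_y y⟩ = xᵀ L_xᵀ L_y y, which is a bilinear model. The sketch below assumes this formulation for illustration only and does not reproduce a specific method from the survey; the feature layouts and the matrices `L_x` and `L_y` are hypothetical hand-set values standing in for parameters that would normally be learned from labelled or click-through query-document pairs.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def match_score(x, y, L_x, L_y):
    """Latent-space matching score f(x, y) = <L_x x, L_y y>."""
    return dot(mat_vec(L_x, x), mat_vec(L_y, y))


# Hypothetical query features:    [ny, times]
# Hypothetical document features: [new, york, times]
# Hypothetical 2-d latent space:  [new-york concept, times concept]
L_x = [[1.0, 0.0],
       [0.0, 1.0]]          # query side: each term already maps to one concept
L_y = [[0.5, 0.5, 0.0],
       [0.0, 0.0, 1.0]]     # document side: "new" and "york" jointly form one concept

print(match_score([1.0, 1.0], [1.0, 1.0, 1.0], L_x, L_y))  # "NY times" vs "New York Times" -> 2.0
print(match_score([1.0, 1.0], [0.0, 0.0, 1.0], L_x, L_y))  # "NY times" vs "Times"          -> 1.0
print(match_score([1.0, 0.0], [1.0, 1.0, 0.0], L_x, L_y))  # "NY" vs "New York"             -> 1.0
```

The query "NY" and the document "New York" share no term, yet they score 1.0 because both project onto the same latent dimension, which is the kind of gap that lexical matching alone cannot close.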




Updated: 2014-06-11