Interactive example-based finding of text items
Expert Systems with Applications (IF 8.5), Pub Date: 2020-04-06, DOI: 10.1016/j.eswa.2020.113403
Eric Medvet, Alberto Bartoli, Andrea De Lorenzo, Fabiano Tarlao

We consider the problem of identifying, within a given document, all text items that follow a pattern specified by a user. In particular, we focus on scenarios in which the task must be completed very quickly and the user is unable to specify the exact pattern of interest. The key use case is the interactive exploration of documents in search of snippets that cannot be captured by Boolean, word-based search expressions. We propose an interactive framework in which the user provides examples of the items of interest; the system identifies items similar to those examples and progressively refines the similarity criterion by submitting selected queries to the user, in an active learning fashion. The requirement that the search execute very quickly places severe constraints on the algorithms the system can use, both for identifying the items and for constructing the queries. We propose and experimentally assess in detail a number of design options for the components of the learning machinery. The results demonstrate that our approach achieves effectiveness close to that of state-of-the-art approaches based on regular expressions, while requiring an execution time that is orders of magnitude shorter.
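To make the interaction loop described in the abstract concrete, the following is a minimal Python sketch of example-based finding with active-learning queries. It is not the authors' method: the character-class `shape` signature, the bigram-overlap `similarity`, the nearest-neighbour `margin` rule, the `min_sim` threshold, and the `oracle` callback (standing in for the user answering a query) are all assumptions chosen for illustration. The sketch also assumes at least one seed example is given.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set


def shape(token: str) -> str:
    """Coarse character-class signature: digits -> 'd', letters -> 'a', rest kept."""
    return "".join("d" if ch.isdigit() else "a" if ch.isalpha() else ch
                   for ch in token)


def _bigrams(token: str) -> Counter:
    """Multiset of adjacent character pairs of the padded shape signature."""
    padded = f"^{shape(token)}$"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def similarity(a: str, b: str) -> float:
    """Dice-style overlap of shape bigrams, in [0, 1] (illustrative choice)."""
    ca, cb = _bigrams(a), _bigrams(b)
    overlap = sum((ca & cb).values())
    return 2 * overlap / (sum(ca.values()) + sum(cb.values()))


def margin(token: str, positives: Set[str], negatives: Set[str],
           min_sim: float) -> float:
    """Positive if the token is closer to a positive example than to any
    negative example and to the min_sim floor; near zero means uncertain."""
    pos = max(similarity(token, p) for p in positives)
    neg = max((similarity(token, n) for n in negatives), default=0.0)
    return pos - max(neg, min_sim)


def find_items(document: str, seeds: Iterable[str],
               oracle: Callable[[str], bool],
               budget: int = 3, min_sim: float = 0.8) -> List[str]:
    """Return candidate items judged similar to the seeds, after asking the
    oracle (the user) about up to `budget` of the most uncertain candidates."""
    # Candidates are whitespace-separated tokens, de-duplicated in order.
    tokens = (t.strip(".,;:()") for t in document.split())
    candidates = [c for c in dict.fromkeys(tokens) if c]
    positives, negatives = set(seeds), set()

    for _ in range(budget):
        unlabelled = [c for c in candidates
                      if c not in positives and c not in negatives]
        if not unlabelled:
            break
        # Uncertainty sampling: query the candidate nearest the boundary.
        query = min(unlabelled,
                    key=lambda c: abs(margin(c, positives, negatives, min_sim)))
        (positives if oracle(query) else negatives).add(query)

    return [c for c in candidates
            if c in positives
            or (c not in negatives
                and margin(c, positives, negatives, min_sim) > 0)]
```

With a single date such as `2020-04-06` as the seed, look-alike tokens such as a phone number start out above the threshold. The answers to the queries turn them into negative examples, which tightens the criterion without the user ever writing a pattern. Each round costs only a pass over the candidates, in keeping with the abstract's emphasis on fast execution.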




Updated: 2020-04-06