Could spatial features help the matching of textual data?
Intelligent Data Analysis (IF 1.7), Pub Date: 2020-09-30, DOI: 10.3233/ida-194749
Jacques Fize 1, 2 , Mathieu Roche 1, 2 , Maguelonne Teisseire 2

Textual data is increasingly available through different media (social networks, company data, data catalogues, etc.). New information extraction methods are needed since these new resources are highly heterogeneous. In this article, we propose a text matching process based on spatial features and assessed on heterogeneous textual data. Besides being compatible with heterogeneous data, it comprises two contributions: first, spatial information is extracted for comparison purposes and stored in a dedicated spatial textual representation (STR); second, two transformations are applied to the STR to improve the spatial similarity estimation. This article outlines the proposed approach with new contributions: (i) a new geocoding method using general co-occurrences between entities, (ii) a thorough evaluation, and (iii) an in-depth discussion. The results obtained on two corpora demonstrate that good spatial matches (≈ 80% precision on major criteria) can be obtained between the most similar STRs, with further enhancement achieved via STR transformation.

Updated: 2020-10-04