Database Principles and Challenges in Text Analysis
ACM SIGMOD Record (IF 0.9), Pub Date: 2021-08-31, DOI: 10.1145/3484622.3484624
Johannes Doleschal, Benny Kimelfeld, Wim Martens

A common conceptual view of text analysis is that of a two-step process, where we first extract relations from text documents and then apply a relational query over the result. Hence, text analysis shares technical challenges with, and can draw ideas from, relational databases. A framework that formally instantiates this connection is that of the document spanners. In this article, we review recent advances in various research efforts that adapt fundamental database concepts to text analysis through the lens of document spanners. Among others, we discuss aspects of query evaluation, aggregate queries, provenance, and distributed query planning.
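The two-step view described in the abstract can be illustrated with a minimal sketch. It is not taken from the article: the `Span` type, the `extract` and `text` helpers, the sample document, and the pattern are all hypothetical, and Python's `re.finditer` is only a rough stand-in for a document spanner, since it returns non-overlapping matches rather than every possible assignment of spans to variables.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    """A half-open interval [start, end) of character positions in a document."""
    start: int
    end: int


def extract(pattern: str, doc: str) -> list[dict[str, Span]]:
    """Step 1: map a document to a relation whose cells are spans.

    Each named group in the pattern plays the role of a capture variable.
    Assumption: non-overlapping matches are enough for this illustration.
    """
    return [
        {var: Span(*m.span(var)) for var in m.groupdict()}
        for m in re.finditer(pattern, doc)
    ]


def text(doc: str, span: Span) -> str:
    """Return the substring of the document that a span points to."""
    return doc[span.start:span.end]


# Hypothetical input document and extraction pattern.
doc = "Alice: alice@example.org; Bob: bob@example.com"
pattern = r"(?P<name>[A-Z][a-z]+): (?P<email>[\w.]+@[\w.]+)"

# Step 1: extract a relation with attributes "name" and "email".
relation = extract(pattern, doc)

# Step 2: apply a relational query over the extracted relation:
# select rows whose email ends in ".org", then project onto "name".
org_names = [
    text(doc, row["name"])
    for row in relation
    if text(doc, row["email"]).endswith(".org")
]
```

Here `relation` holds one row per match, with spans rather than strings as values, and `org_names` is the result of the selection and projection in step two. Keeping spans instead of substrings preserves the position of each extracted value in the document.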

Updated: 2021-08-31