Technical Perspective
ACM SIGMOD Record (IF 0.9), Pub Date: 2020-09-04, DOI: 10.1145/3422648.3422654
Benny Kimelfeld

The challenge of extracting structured information from text, or sequential data in general, is prevalent across a multitude of data-science domains. This challenge, known as Information Extraction (IE), is instantiated in core components of text analytics, and a plethora of IE paradigms have been developed over the past decades. Rules and rule systems have consistently been key components in such paradigms, yet their roles have varied and evolved over time. Analytics engines such as IBM's SystemT use IE rules for materializing relations inside relational query languages. Machine-learning classifiers and probabilistic graphical models (e.g., Conditional Random Fields) use rules for feature generation. Rules also serve as weak constraints in Markov Logic Networks (and extensions such as DeepDive), and as generators of noisy training data in the state-of-the-art Snorkel system.

Updated: 2020-09-04