A Multi-Domain Layered Approach in Development of Industrial Ontology to Support Domain Identification for Unstructured Text
IEEE Transactions on Industrial Informatics (IF 12.3), Pub Date: 2018-09-01, DOI: 10.1109/tii.2018.2835567
Rajbabu Kumaravel, Sudha Selvaraj, C. Mala

Driven by the digital revolution and growing competition in recent decades, almost all organizations and industries intend to develop solutions that extract information from unstructured documents. These documents comprise information related to multiple divergent domains, and therefore a multidomain knowledge base is needed. Since recent research suggests ontology as the predominant model, the paper proposes a unified ontology modeling approach with multiple layers and divergent domains to support information processing from unstructured documents. The model is built by integrating relevant domains to facilitate cross-domain queries. Further, because the features of unstructured documents span multiple domains, domain identification must be performed prior to any information processing. Hence, the proposed ontology model is used to identify the domain. The proposed ontology is developed for the thermal power plant industry, and domain identification is demonstrated with an example. A statistical similarity index is proposed to associate divergent, volatile features of unstructured text with ontology knowledge for domain identification. The outcome of the proposal is evaluated using the proposed similarity index. A subsequent study to extract information from the classified content, with the support of directed acyclic graph relationships, is in progress. The merit of the proposal is that it can be used across multiple stages of information processing, each with a distinct purpose.
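
The abstract does not define the proposed similarity index, so the following is only an illustrative sketch of the general idea of scoring a text against per-domain ontology vocabularies. It assumes a plain Jaccard overlap as a stand-in for the paper's index; the domain names, the term lists, and the function names (`tokenize`, `jaccard`, `identify_domain`) are all hypothetical and not taken from the paper.

```python
from typing import Dict, Set, Tuple

# Hypothetical per-domain vocabularies (assumption). A real system would
# read these terms from the layers of the ontology, not from a literal.
DOMAIN_TERMS: Dict[str, Set[str]] = {
    "boiler": {"boiler", "steam", "drum", "furnace", "superheater"},
    "turbine": {"turbine", "rotor", "blade", "steam", "condenser"},
    "electrical": {"generator", "transformer", "voltage", "breaker"},
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphabetic tokens."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    return set(cleaned.split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def identify_domain(
    text: str, domains: Dict[str, Set[str]] = DOMAIN_TERMS
) -> Tuple[str, float]:
    """Return the best-scoring domain and its score for the given text.

    Ties are resolved in favour of the domain listed first.
    """
    tokens = tokenize(text)
    scores = {name: jaccard(tokens, terms) for name, terms in domains.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    sample = "The turbine rotor blade showed wear near the condenser."
    print(identify_domain(sample))
```

A set-based overlap is used here only because it is simple to inspect; a weighted or frequency-based index, which the word "statistical" in the abstract may imply, would replace `jaccard` without changing the surrounding structure.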

Updated: 2018-09-01