HerCulB: content-based information extraction and retrieval for cultural heritage of the Balkans
The Electronic Library (IF 1.675), Pub Date: 2020-10-30, DOI: 10.1108/el-03-2020-0052
Ivana Tanasijević, Gordana Pavlović-Lažetić

This paper presents a methodology for automatically annotating a multimedia collection of intangible cultural heritage, mostly in the form of interviews. The assigned annotations provide a way to search the collection.

Annotation is based on automatic metadata extraction: named entities and topics are extracted from textual descriptions using a rule-based approach supported by vocabulary resources, a compiled domain-specific classification scheme and domain-oriented corpus analysis.

Applied to the cultural heritage of the Balkans, the proposed methodology achieves an F-measure of 0.87 for named entity annotation and 0.90 for topic annotation. The overall methodology encapsulates domain-specific and language-specific knowledge in collections of finite state transducers and allows further improvements.

Although cultural heritage plays a significant role in the development of the identity of a group or an individual, it is a domain that has not yet been fully explored for many languages. The proposed methodology can be used to incorporate natural language processing techniques into digital libraries of cultural heritage.
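
For readers unfamiliar with the F-measure cited in the abstract, the following is a minimal sketch of how such a score is usually computed from annotation counts. It assumes the balanced variant (beta = 1, the harmonic mean of precision and recall). The abstract does not state which variant or matching criterion the authors used, and the function name and counts below are hypothetical illustrations, not figures from the paper.

```python
def f_measure(true_positives: int, false_positives: int,
              false_negatives: int, beta: float = 1.0) -> float:
    """Return the F-beta score for a set of predicted annotations.

    beta = 1.0 gives the balanced F-measure (an assumption here; the
    abstract does not say which variant was used).
    """
    if true_positives == 0:
        # No correct annotations: precision and recall are both zero.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical counts chosen only for illustration:
# 87 correct annotations, 13 spurious ones, 13 missed ones.
score = f_measure(true_positives=87, false_positives=13, false_negatives=13)
print(round(score, 2))
```

With these illustrative counts, precision and recall are both 0.87, so the balanced F-measure is also 0.87.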

Last updated: 2020-10-30