The role of news title for linking during preservation process in digital archives
Library Hi Tech (IF 1.623), Pub Date: 2020-11-10, DOI: 10.1108/lht-07-2020-0157
Muzammil Khan, Sarwar Shah Khan, Arshad Ahmad, Arif Ur Rahman

Purpose

The World Wide Web has become an essential platform for news publication and, in the past few years, one of the primary sources of information dissemination. Electronic media, i.e., television channels, magazines and newspapers, have started publishing news online. This online information is prone to disappearing because of its short life-span, so it is imperative to archive it for the long term and for future generations. This paper presents a content-based similarity measure, based on the headings of news articles, for linking digital news stories published in various newspapers during the preservation process, which helps to ensure future accessibility.

Design/methodology/approach

To evaluate the accuracy and assess the effectiveness of the proposed measure for linking news articles in the Digital News Story Archive (DNSA), we adopted both system-centric and user-centric (human judgment) evaluation over different datasets of news articles.

Findings

The proposed similarity measure is evaluated on datasets of different sizes, and the results are compared against both a user-centric technique, i.e., expert judgment, and system-centric techniques, i.e., the cosine similarity measure, the extended Jaccard measure and the common ratio measure for stories (CRMS). The comparison gives a broader view of the measure's behaviour and supports generalizing it to different categories of news articles. Multiple experiments were conducted, and the findings showed that the measure gives viable results for national and international news and the best results for linking sports news articles during preservation based on headings.
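
As a rough illustration of two of the system-centric baselines named above, the following Python sketch computes cosine similarity and the extended Jaccard (Tanimoto) measure over term-frequency vectors built from two headlines. It is not the authors' implementation and does not reproduce CRMS, whose definition is not given in this abstract. The tokenization (lowercase word tokens, no stop-word removal or stemming), the function names and the two sample headlines are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def term_vector(headline: str) -> Counter:
    """Count lowercase word tokens in a headline (assumed tokenization)."""
    return Counter(re.findall(r"[a-z0-9]+", headline.lower()))


def dot(a: Counter, b: Counter) -> int:
    """Dot product of two sparse term-frequency vectors."""
    return sum(a[t] * b[t] for t in a.keys() & b.keys())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between the two term vectors."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def extended_jaccard(a: Counter, b: Counter) -> float:
    """Extended Jaccard (Tanimoto): a.b / (|a|^2 + |b|^2 - a.b)."""
    shared = dot(a, b)
    denom = dot(a, a) + dot(b, b) - shared
    return shared / denom if denom else 0.0


# Invented sample headlines, not taken from the paper's datasets.
h1 = term_vector("Pakistan beat India in cricket final")
h2 = term_vector("India lose cricket final to Pakistan")

cos_score = cosine_similarity(h1, h2)
jac_score = extended_jaccard(h1, h2)
print(f"cosine={cos_score:.3f} extended_jaccard={jac_score:.3f}")
```

The two sample headlines each contain six distinct terms and share four of them, so the cosine score is about 0.67 and the extended Jaccard score is 0.5. Because headings are so short, a single shared or missing term moves either score noticeably.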

Originality/value

The DNSA preserves a huge number of news articles from multiple news sources, and new articles have to be linked with this vast collection, which motivates an efficient linking mechanism that has only a few terms to manipulate. The CRMS is modified to deal with the headings of news articles as part of the digital news stories preservation framework, and the modified measure is comprehensively analysed.




Updated: 2020-11-10