Unbiased evaluation of ranking metrics reveals consistent performance in science and technology citation data
arXiv - CS - Digital Libraries. Pub Date: 2020-01-15, DOI: arxiv-2001.05414
Shuqi Xu, Manuel Sebastian Mariani, Linyuan Lü, Matúš Medo
Despite the increasing use of citation-based metrics for research evaluation
purposes, we do not know yet which metrics best deliver on their promise to
gauge the significance of a scientific paper or a patent. We assess 17
network-based metrics by their ability to identify milestone papers and patents
in three large citation datasets. We find that traditional
information-retrieval evaluation metrics are strongly affected by the interplay
between the age distribution of the milestone items and age biases of the
evaluated metrics. Outcomes of these metrics are therefore not representative
of the metrics' ranking ability. We argue in favor of a modified evaluation
procedure that explicitly penalizes biased metrics and allows us to reveal
metrics' performance patterns that are consistent across the datasets. PageRank
and LeaderRank turn out to be the best-performing ranking metrics when their
age bias is suppressed by a simple transformation of the scores that they
produce, whereas other popular metrics, including citation count, HITS and
Collective Influence, produce significantly worse ranking results.
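To make the idea of suppressing age bias concrete, here is a minimal sketch in Python. The abstract does not say which transformation is applied, so this sketch assumes one plausible choice: each item's score is replaced by its z-score relative to a window of items of similar age. The function names, the window size, the damping factor, and the toy citation graph are all illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, pstdev


def pagerank(n, edges, damping=0.85, iters=100):
    """Power-iteration PageRank; edges are (citing, cited) pairs.

    The damping value is an illustrative default, not taken from the paper.
    """
    out = [[] for _ in range(n)]
    for src, dst in edges:
        out[src].append(dst)
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Items that cite nothing spread their score uniformly over all items.
        dangling = sum(scores[i] for i in range(n) if not out[i])
        new = [(1.0 - damping) / n + damping * dangling / n] * n
        for src in range(n):
            for dst in out[src]:
                new[dst] += damping * scores[src] / len(out[src])
        scores = new
    return scores


def rescale_by_age(scores, window):
    """Z-score each item against the `window` items closest to it in age.

    Assumes items are indexed in publication order (0 = oldest).
    This rescaling is an assumed stand-in for the paper's transformation.
    """
    n = len(scores)
    rescaled = []
    for i in range(n):
        # Centre the window on item i, clamped to the ends of the list.
        lo = max(0, min(i - window // 2, n - window))
        peers = scores[lo:lo + window]
        mu, sigma = mean(peers), pstdev(peers)
        rescaled.append((scores[i] - mu) / sigma if sigma > 0 else 0.0)
    return rescaled


# Toy citation graph: newer items (higher index) cite older ones.
toy_edges = [(1, 0), (2, 0), (2, 1), (3, 1), (4, 2), (4, 3)]
raw = pagerank(5, toy_edges)
adjusted = rescale_by_age(raw, window=3)
```

In this toy graph the raw scores favor the oldest items, which have had the most time to collect citations, while the rescaled scores compare each item only with its age peers. That is the kind of age bias the abstract says must be removed before PageRank and LeaderRank come out on top.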
Updated: 2020-07-10