Special section on mining knowledge from scientific data
Expert Systems ( IF 3.3 ) Pub Date : 2021-05-27 , DOI: 10.1111/exsy.12710
Tanmoy Chakraborty, Sumit Bhatia, Cornelia Caragea

The past two decades have witnessed rapid growth in scientific publications across all areas of research. Easier access to published literature (open access, arXiv preprints, etc.), along with recent developments in computational methods, has provided researchers with a productive platform to study vast amounts of scholarly data. Scholarly data mining has thus made it possible to do “research about research!” It plays a vital role in scientometrics, bibliometrics, webometrics, and altmetrics, which require applying sophisticated algorithms to curate scholarly data and derive useful insights from it. Moreover, the knowledge extracted from scientific data can support several decision-making processes, such as setting policy for fund disbursement, identifying research gaps in a department and recruiting faculty to fill them, and anticipating emerging research areas. On the other hand, the increasing use of these metrics as a measure of research quality, in determining university rankings, and in decision making (tenure and recruitment decisions) has also given rise to objectionable practices that artificially boost these measures (self-citations, citation cliques, etc.). Given that, is it always right to treat these metrics as a reliable proxy of research quality? How should decision makers and policymakers use these metrics while accounting for such malpractices?
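To see how easily such measures can be gamed, consider the h-index (the largest h such that h of an author's papers each have at least h citations). The sketch below is purely illustrative, with made-up citation counts; it shows how adding a single self-citation to each paper can lift the index without any new research.

```python
def h_index(citation_counts):
    """h-index: the largest h such that h papers have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for i, c in enumerate(counts, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# A hypothetical author with six papers.
citations = [5, 4, 4, 3, 3, 1]
print(h_index(citations))                      # 3

# One self-citation added to each paper inflates the index.
inflated = [c + 1 for c in citations]
print(h_index(inflated))                       # 4
```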

This special section aims to bring together the latest groundbreaking research on knowledge extraction and on deriving insights from scientific data. Of special interest is the role these metrics play in policy and decision making – both positive and negative. We welcomed theoretical and empirical research, as well as case studies, leading to the development of novel algorithms, tools, techniques, metrics, decisions, and measurements related to scholarly data. A total of 12 papers were submitted to this special section. All submitted papers underwent a rigorous review process: each paper was reviewed by at least two reviewers and went through at least two rounds of revisions. The contributions of the accepted papers are briefly summarized below.

Outlier detection is a major research agenda in data mining. In scientometric research, however, such outliers can indicate malpractice in scientific publishing, manifesting as issues such as citation cartels and citation stacking. Chakraborty et al. (2020) defined a diverse feature set that can identify such cases of extreme outliers and explain them. They also showed the effect of such outlier behaviour on bibliographic metrics such as the h-index and the impact factor.
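As an illustration only (this is not the feature set of Chakraborty et al.), one simple way to flag extreme outliers in a citation-derived feature, such as a hypothetical venue's self-citation rate, is Tukey's IQR fences:

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lo or v > hi]

# Made-up self-citation rates for ten venues; one is extreme.
rates = [0.04, 0.05, 0.06, 0.05, 0.07, 0.04, 0.06, 0.05, 0.48, 0.06]
print(flag_outliers(rates))  # [8]
```

In practice a full feature set combines many such signals; the fence rule is just the simplest univariate screen.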

Madisetty et al. (2020) proposed a tool to extract inline mathematical expressions from scientific articles. This is a major problem in scientific document processing, as mathematical expressions often act as a bottleneck: their cryptic symbols cannot be extracted by a standard parser. The authors proposed two models: the first uses a conditional random field with hand-crafted features, and the second uses a bidirectional LSTM. This work can contribute to a real-world tool or act as a plugin for a scientific document parser.
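Whether the tagger is a CRF or a BiLSTM, token-level sequence labelling of this kind is commonly encoded with BIO tags and then decoded into expression spans. The decoding step might look like the following sketch (the tag names and the example sentence are our own illustration, not taken from the paper):

```python
def extract_spans(tokens, tags):
    """Recover contiguous MATH spans from BIO tags produced by a token tagger."""
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B-MATH":                  # a new expression starts
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif tag == "I-MATH" and current:    # the expression continues
            current.append(tok)
        else:                                # outside any expression
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = ["Let", "x", "=", "y", "+", "1", "denote", "the", "shift", "."]
tags   = ["O", "B-MATH", "I-MATH", "I-MATH", "I-MATH", "I-MATH", "O", "O", "O", "O"]
print(extract_spans(tokens, tags))  # ['x = y + 1']
```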

Document classification has always been an important problem in text processing. In scientific data mining, document classification is needed to categorize scientific papers by topics, keywords, etc. Masmoudi et al. (2020) proposed a novel hierarchical document classification approach that uses limited labelled data. They employed the co-training paradigm to exploit content and bibliographic-coupling information as two distinct views of the papers, and made use of a large amount of unlabelled data during co-training.
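A minimal sketch of such a co-training loop is shown below, with synthetic data and a nearest-centroid learner standing in for the actual classifiers, features, and hierarchy used by Masmoudi et al. Each view pseudo-labels the unlabelled papers it is most confident about, growing the shared labelled pool round by round:

```python
import random
random.seed(0)

def make_view(label):
    # Three noisy features whose mean shifts with the class label.
    return [label + random.gauss(0, 0.5) for _ in range(3)]

n = 120
labels = [i % 2 for i in range(n)]            # ground truth (for evaluation only)
view_a = [make_view(c) for c in labels]       # stand-in for content features
view_b = [make_view(c) for c in labels]       # stand-in for bibliographic coupling

known = {i: labels[i] for i in range(10)}     # only a handful of labelled papers

def fit(view):
    """Nearest-centroid model: mean feature vector per class over known labels."""
    cents = {}
    for c in (0, 1):
        pts = [view[i] for i in known if known[i] == c]
        cents[c] = [sum(col) / len(pts) for col in zip(*pts)]
    return cents

def predict(cents, x):
    d = {c: sum((u - v) ** 2 for u, v in zip(x, cents[c])) for c in cents}
    best = min(d, key=d.get)
    margin = max(d.values()) - d[best]        # crude confidence score
    return best, margin

for _ in range(8):                            # co-training rounds
    cents_a, cents_b = fit(view_a), fit(view_b)
    pool = [i for i in range(n) if i not in known]
    if not pool:
        break
    # Each view pseudo-labels the examples it is most confident about.
    for cents, view in ((cents_a, view_a), (cents_b, view_b)):
        ranked = sorted(pool, key=lambda i: predict(cents, view[i])[1])
        for i in ranked[-5:]:
            known[i] = predict(cents, view[i])[0]

cents_a = fit(view_a)
acc = sum(predict(cents_a, view_a[i])[0] == labels[i] for i in range(n)) / n
print(f"accuracy with 10 seed labels after co-training: {acc:.2f}")
```

The key design point, mirrored from the paper's setting, is that the two views are trained on a shared, growing label set but rank their own confident examples independently.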

We hope that this special section will provide insights, analysis, and understanding about “scientific research” and help in building the foundation for future research and development in scientometrics.

We sincerely thank the Editor-in-Chief of this journal, Dr Jon G. Hall, for accepting our request to organize this special section with the Expert Systems journal. We would also like to thank the entire editorial team of the journal, the authors who submitted their valuable research to this special section, and the reviewers for their timely evaluations and comments.


