Standing on shoulders or feet? An extended study on the usage of the MSR data papers
Empirical Software Engineering (IF 3.5), Pub Date: 2020-07-18, DOI: 10.1007/s10664-020-09834-7
Zoe Kotti, Konstantinos Kravvaritis, Konstantina Dritsa, Diomidis Spinellis

The establishment of the Mining Software Repositories (MSR) data showcase conference track has encouraged researchers to provide data sets as a basis for further empirical studies. The objective of this study is to examine the usage of data papers published in the MSR proceedings in terms of use frequency, users, and use purpose. Data track papers were collected from the MSR data showcase track and through the manual inspection of older MSR proceedings. The use of data papers was established through manual citation searching followed by reading the citing studies and dividing them into strong and weak citations. Unlike weak citations, strong citations actually use the data set of a data paper. Data papers were then manually clustered based on their content, whereas their strong citations were classified by hand according to the knowledge areas of the Guide to the Software Engineering Body of Knowledge. A survey of 108 authors and users of data papers provided further insights regarding motivation and effort in data paper production, encouraging and discouraging factors in data set use, and desired future directions for data papers. We found that 65% of the data papers have been used in other studies, with a long-tail distribution in the number of strong citations. Weak citations to data papers usually refer to them as an example. In total, MSR data papers are cited less than other MSR papers. A considerable number of the strong citations stem from the teams that authored the data papers. Publications providing Version Control System (VCS) primary and derived data are the most frequent data papers and the most often strongly cited ones. Enhanced developer data papers are the least common ones, and the second least frequently strongly cited. Data paper authors tend to gather data in the context of other research. Users of data sets appreciate high data quality and are discouraged by lack of replicability of data set construction. Data related to machine learning or derived from the manufacturing sector are two suggestions of the respondents for future data papers. Overall, data papers have provided the foundation for a significant number of studies, but there is room for improvement in their utilization. This can be done by setting a higher bar for their publication, by encouraging their use, by promoting open science initiatives, and by providing incentives for the enrichment of existing data collections.

Updated: 2020-07-18