CodeLabeller: A Web-based Code Annotation Tool for Java Design Patterns and Summaries


arXiv - CS - Software Engineering. Pub Date: 2021-06-14, DOI: arxiv-2106.07513
Norman Chen, Najam Nazar, Chun Yong Chong

The appropriate use of design patterns in code is a vital indicator of good software quality in object-oriented software applications. There exist tools to detect design pattern usage in Java source files, whose detection mechanisms have been honed through supervised machine learning techniques that require large datasets of labelled files. However, manually labelling these files leads to issues such as tedium if the team of labellers is small, and conflicting opinions between labellers if it is large. Thus, we present CodeLabeller, a web-based tool which aims to provide a more efficient approach to labelling Java source files at scale by improving the data collection process throughout, and to improve the reliability of responses by requiring each labeller to attach a confidence rating to each of their responses. We test CodeLabeller by constructing a corpus of over a thousand source files obtained from a large collection of open-source Java projects, and labelling each Java source file with its respective design patterns (if any) and summary. This paper discusses the motivation behind the creation of CodeLabeller, a demonstration of the tool and its UI, its implementation, its benefits and, lastly, some ideas for future improvements. A demo version of CodeLabeller can be found at: https://codelabeller.org.


Updated: 2021-06-15
