On the generation, structure, and semantics of grammar patterns in source code identifiers, Journal of Systems and Software


On the generation, structure, and semantics of grammar patterns in source code identifiers
Journal of Systems and Software (IF 3.7), Pub Date: 2020-12-01, DOI: 10.1016/j.jss.2020.110740
Christian D. Newman, Reem S. AlSuhaibani, Michael J. Decker, Anthony Peruma, Dishant Kaushik, Mohamed Wiem Mkaouer, Emily Hill

Identifiers make up a majority of the text in code. They are one of the most basic mediums through which developers describe the code they create and understand the code that others create. Therefore, understanding the patterns latent in identifier naming practices and how accurately we are able to automatically model these patterns is vital if researchers are to support developers and automated analysis approaches in comprehending and creating identifiers correctly and optimally. This paper investigates identifiers by studying sequences of part-of-speech annotations, referred to as grammar patterns. This work advances our understanding of these patterns and our ability to model them by 1) establishing common naming patterns in different types of identifiers, such as class and attribute names; 2) analyzing how different patterns influence comprehension; and 3) studying the accuracy of state-of-the-art techniques for part-of-speech annotations, which are vital in automatically modeling identifier naming patterns, in order to establish their limits and paths toward improvement. To do this, we manually annotate a dataset of 1,335 identifiers from 20 open-source systems and use this dataset to study naming patterns, semantics, and tagger accuracy.
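
To make the notion of a grammar pattern concrete, the following is a minimal sketch of how an identifier could be split into words and mapped to a sequence of part-of-speech tags. It is not the authors' tooling or tag set: the `TOY_LEXICON` table, the tag labels (`V`, `N`, `ADJ`, `UNK`), and the helper names `split_identifier` and `grammar_pattern` are all assumptions made for illustration. The study itself relies on manual annotation and on existing part-of-speech taggers rather than a lookup table.

```python
import re
from typing import List

# Hypothetical, hand-written lexicon for illustration only. A real study
# would use manual annotation or a trained part-of-speech tagger.
TOY_LEXICON = {
    "get": "V",
    "is": "V",
    "user": "N",
    "name": "N",
    "buffer": "N",
    "size": "N",
    "max": "ADJ",
    "empty": "ADJ",
}


def split_identifier(identifier: str) -> List[str]:
    """Split a camelCase or snake_case identifier into lowercase words."""
    words: List[str] = []
    for part in identifier.split("_"):
        # Acronym runs (e.g. "HTTP"), capitalized or lowercase words, digits.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def grammar_pattern(identifier: str) -> str:
    """Return the tag sequence for an identifier; unknown words get UNK."""
    return " ".join(TOY_LEXICON.get(w, "UNK") for w in split_identifier(identifier))


if __name__ == "__main__":
    for name in ["getUserName", "max_buffer_size", "isEmpty"]:
        print(name, "->", grammar_pattern(name))
```

Under this toy lexicon, `getUserName` maps to the pattern `V N N`. A lookup table cannot resolve words whose part of speech depends on context, which is one reason tagger accuracy on identifiers is worth studying.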


Updated: 2020-12-01
