Automatic and Accurate Expansion of Abbreviations in Parameters, IEEE Transactions on Software Engineering



Automatic and Accurate Expansion of Abbreviations in Parameters
IEEE Transactions on Software Engineering (IF 6.5), Pub Date: 2020-07-01, DOI: 10.1109/tse.2018.2868762
Yanjie Jiang, Hui Liu, Jiaqi Zhu, Lu Zhang

Abbreviations are widely used in identifiers. However, they have a severe negative impact on program comprehension and on IR-based software maintenance activities, e.g., concept location, software clustering, and recovery of traceability links. Consequently, a number of efficient approaches have been proposed to expand abbreviations in identifiers. Most such approaches rely heavily on dictionaries and rarely exploit the specific, fine-grained context of identifiers. As a result, they are less accurate in expanding abbreviations (especially short ones) that may match multiple dictionary words. To this end, in this paper we propose an automatic approach that improves the accuracy of abbreviation expansion by exploiting this specific, fine-grained context. It focuses on a special but common category of abbreviations (abbreviations in parameter names), and thus it can exploit the type of the enclosing parameter as well as the corresponding formal (or actual) parameter name. A recent empirical study on parameters suggests that actual parameters are often lexically similar to their corresponding formal parameters. Consequently, it is likely that an abbreviation in a formal parameter can find its full terms in the corresponding actual parameter, and vice versa. Based on this assumption, a series of heuristics is proposed to look for full terms in the corresponding actual (or formal) parameter names. To the best of our knowledge, we are the first to expand abbreviations by exploiting the lexical similarity between actual and formal parameters. We also search for full terms in the data type of the enclosing parameter. Only if all such heuristics fail does the approach turn to traditional abbreviation dictionaries. We evaluate the proposed approach on seven well-known open-source projects. Evaluation results suggest that when only parameter abbreviations are involved, the proposed approach can improve precision from 26 to 95 percent and recall from 26 to 65 percent compared against the state-of-the-art general-purpose approach. Consequently, the proposed approach could be employed as a useful supplement to existing approaches to expand parameter abbreviations.
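
The lookup order described in the abstract (counterpart parameter name first, then the parameter's data type, then a dictionary) can be illustrated with a minimal Python sketch. The abstract does not spell out the individual heuristics, so the matching rules below (same first letter plus subsequence match, and an acronym match over the split terms) are assumptions chosen for illustration, and the names `split_identifier`, `is_abbreviation`, `find_in_name`, and `expand` are hypothetical rather than taken from the paper.

```python
import re


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def is_abbreviation(abbr, term):
    """Assumed rule: abbr is shorter than term, shares its first letter,
    and its characters appear in term in the same order."""
    if not abbr or len(abbr) >= len(term) or abbr[0] != term[0]:
        return False
    remaining = iter(term)
    return all(ch in remaining for ch in abbr)


def find_in_name(abbr, name):
    """Look for a full term for abbr among the terms of one identifier."""
    terms = split_identifier(name)
    # Assumed acronym rule: "sb" matches the initials of "StringBuilder".
    if len(terms) > 1 and "".join(t[0] for t in terms) == abbr:
        return " ".join(terms)
    for term in terms:
        if is_abbreviation(abbr, term):
            return term
    return None


def expand(abbr, counterpart_name, type_name, dictionary):
    """Try the counterpart parameter name, then the data type,
    and fall back to the dictionary only if both fail."""
    abbr = abbr.lower()
    for source in (counterpart_name, type_name):
        full = find_in_name(abbr, source)
        if full is not None:
            return full
    return dictionary.get(abbr)


# Formal parameter "idx" is called with the actual parameter "startIndex".
print(expand("idx", "startIndex", "int", {}))
# Nothing useful in the counterpart name, so the type is consulted.
print(expand("sb", "buffer", "StringBuilder", {}))
# Both context sources fail, so the dictionary is used as a last resort.
print(expand("ctx", "c", "Object", {"ctx": "context"}))
```

Consulting the dictionary last reflects the abstract's point that short abbreviations often match several dictionary words, whereas the counterpart name and the type offer only a handful of candidate terms.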

Updated: 2020-07-01
