Prioritizing code documentation effort: Can we do it simpler but better?
Information and Software Technology (IF 3.8), Pub Date: 2021-07-13, DOI: 10.1016/j.infsof.2021.106686
Shiran Liu, Zhaoqiang Guo, Yanhui Li, Hongmin Lu, Lin Chen, Lei Xu, Yuming Zhou, Baowen Xu

Context

Due to time or economic pressures, developers are often unable to write documentation for all modules in a project. Recently, a supervised artificial neural network (ANN) approach was proposed to prioritize documentation effort “to ensure that sections of code important to program comprehension are thoroughly explained”.

Objective

However, a supervised approach needs labeled training data to train the prediction model, and such data may not be easy to obtain in practice. Furthermore, it is unclear whether the ANN approach generalizes, as it was evaluated only on several small data sets collected from API libraries.

Method

In this paper, we propose an unsupervised approach based on an improved PageRank to prioritize documentation effort. This approach identifies “important” modules based only on the dependence relationships between modules in a project. As a result, the PageRank approach does not need any training data to build the prediction model.
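To make the idea concrete, the following is a minimal sketch of ranking modules by standard PageRank over a dependency graph. It is not the improved variant proposed in the paper, whose details the abstract does not give. The function names `pagerank` and `prioritize`, the damping factor of 0.85, and the example module names are all illustrative assumptions.

```python
from typing import Dict, List


def pagerank(deps: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Standard PageRank by power iteration (not the paper's improved variant).

    `deps` maps each module to the modules it depends on, so an edge
    A -> B means "A uses B" and passes part of A's score to B.
    """
    # Collect every module that appears as a source or as a target.
    nodes = sorted(set(deps) | {t for ts in deps.values() for t in ts})
    n = len(nodes)
    if n == 0:
        return {}
    rank = {m: 1.0 / n for m in nodes}
    for _ in range(max_iter):
        # Score held by modules with no outgoing edges is spread evenly.
        dangling = sum(rank[m] for m in nodes if not deps.get(m))
        new = {m: (1.0 - damping) / n + damping * dangling / n for m in nodes}
        for src in nodes:
            targets = deps.get(src, [])
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[m] - rank[m]) for m in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def prioritize(deps: Dict[str, List[str]]) -> List[str]:
    """Return modules ordered from highest to lowest PageRank score."""
    scores = pagerank(deps)
    return sorted(scores, key=lambda m: (-scores[m], m))


# Hypothetical project: module names are made up for illustration.
example_deps = {
    "App": ["Service", "Util"],
    "Service": ["Repo", "Util"],
    "Repo": ["Util"],
    "Util": [],
}
ranking = prioritize(example_deps)
```

In this toy graph the module that every other module depends on ends up at the head of `ranking`, which matches the intuition that heavily used modules are the ones to document first. No labeled data is involved at any step; the only input is the dependency graph.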

Results

To evaluate the effectiveness of the PageRank approach, we use six additional large data sets collected from two larger libraries and four applications to conduct the experiment. The experimental results show that the PageRank approach is superior to the state-of-the-art ANN approach.

Conclusion

Given its simplicity and effectiveness, we advocate that the PageRank approach be used as an easy-to-implement baseline in future research on documentation effort prioritization, and that any newly proposed approach be compared with it to demonstrate its effectiveness.



Updated: 2021-07-19