Abstract
The problem of code generation from textual program descriptions has long been viewed as a grand challenge in software engineering. In recent years, many deep-learning-based approaches have been proposed to generate a sequence of code tokens from a textual program description. However, the existing approaches ignore the global relationships among API methods, which are important for understanding API usage. In this paper, we propose to model the dependencies among API methods as an API dependency graph (ADG) and incorporate the graph embedding into a sequence-to-sequence (Seq2Seq) model. In addition to the existing encoder-decoder structure, a new module named “embedder” is introduced. In this way, the decoder can utilize both global structural dependencies and the textual program description to predict the target code. We conduct extensive code generation experiments on three public datasets and in two programming languages (Python and Java). Our proposed approach, called ADG-Seq2Seq, yields significant improvements over existing state-of-the-art methods and maintains its performance as the length of the target code increases. Extensive ablation tests show that the proposed ADG embedding is effective and outperforms the baselines.
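To make the encoder-embedder-decoder idea concrete, the following is a minimal, illustrative sketch of a single decoding step that combines an encoded description with a graph-derived embedding. This is not the authors' implementation: the class and variable names, the GRU cells, the mean-pooled context used as a stand-in for attention, and all dimensions are hypothetical assumptions.

```python
import torch
import torch.nn as nn

class ADGSeq2SeqSketch(nn.Module):
    """Illustrative sketch only; not the paper's implementation."""
    def __init__(self, nl_vocab, code_vocab, hidden=256):
        super().__init__()
        self.nl_embed = nn.Embedding(nl_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)  # encodes the description
        # "Embedder": a lookup of precomputed ADG node embeddings carrying the
        # global structural dependencies among API methods (hypothetical form).
        self.adg_embed = nn.Embedding(code_vocab, hidden)
        self.decoder_cell = nn.GRUCell(2 * hidden, hidden)
        self.out = nn.Linear(hidden, code_vocab)

    def decode_step(self, desc_ids, prev_token, state):
        enc_out, _ = self.encoder(self.nl_embed(desc_ids))   # (B, T, H)
        context = enc_out.mean(dim=1)                         # stand-in for attention
        graph_vec = self.adg_embed(prev_token)                # structural signal from the ADG
        state = self.decoder_cell(torch.cat([context, graph_vec], dim=-1), state)
        return self.out(state), state                         # logits over code tokens
```

In the full model, the mean-pooled context would be replaced by attention over the encoder states, and the ADG embeddings would be produced by the learned embedder rather than looked up from a static table.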
Notes
A textual program description can be a natural language description of requirements or a structured specification.
Specifically, we use Spoon (http://spoon.gforge.inria.fr/) and Javassist (http://www.javassist.org/) to analyze Java programs.
An instance of the parameter type is provided by its predecessor method.
The pooling we used followed the max-pooling method of GraphSAGE (Hamilton et al. 2017); a minimal sketch of this aggregation step is given after these notes.
In the experimental results, we concatenate the name of each dataset with a code-length level to denote a split dataset. For example, HS-30 consists of samples whose code length is between 0 and 30, while HS-60 contains samples whose code length is between 50 and 60.
The Inspur Company is a leading cloud computing and big data service provider in China. More information is available at https://en.inspur.com/en/about_inspur/index.html.
Most of the replication results are consistent with those reported in the original papers, and a few (the BLEU scores for ASN and SNM) are slightly lower (the difference is about 0.8% in BLEU). Considering the stochastic nature of deep learning, these differences are acceptable. We choose to present the original results for each method in the paper, which helps maintain consistency with the results reported by the related studies (Yin and Neubig 2017; Rabinovich et al. 2017; Sun et al. 2019; Sun et al. 2020).
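As mentioned in the note on pooling above, the aggregation follows GraphSAGE max pooling. Below is a minimal, self-contained PyTorch illustration of that aggregation step; the module, its names, and the dimensions are our assumptions, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaxPoolAggregator(nn.Module):
    """GraphSAGE-style max-pooling aggregation (Hamilton et al. 2017); sketch only."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.pool_fc = nn.Linear(in_dim, out_dim)              # per-neighbour transform
        self.update_fc = nn.Linear(in_dim + out_dim, out_dim)  # combine node and neighbourhood

    def forward(self, node_feat, neigh_feats):
        # node_feat: (in_dim,); neigh_feats: (num_neighbours, in_dim)
        h = F.relu(self.pool_fc(neigh_feats))                   # transform each neighbour
        pooled = h.max(dim=0).values                            # element-wise max over neighbours
        return F.relu(self.update_fc(torch.cat([node_feat, pooled], dim=-1)))

# Example: aggregate three neighbours of an API node with 128-dimensional features.
agg = MaxPoolAggregator(128, 128)
node_embedding = agg(torch.randn(128), torch.randn(3, 128))     # -> tensor of shape (128,)
```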
References
Aho AV, Ganapathi M, Tjiang SWK (1989) Code generation using tree matching and dynamic programming. ACM Trans Program Lang Syst (TOPLAS) 11(4):491–516
Allamanis M, Brockschmidt M, Khademi M (2017) Learning to represent programs with graphs. arXiv:1711.00740
Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: Proceedings of the 3rd international conference on learning representations
Brockschmidt M, Allamanis M, Gaunt AL, Polozov O (2018) Generative code modeling with graphs. arXiv:1805.08490
Cho K, van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1724–1734
Costa-jussà MR, Fonollosa JAR (2016) Character-based neural machine translation. In: Proceedings of the 54th annual meeting of the association for computational linguistics (Volume 2: Short Papers), pp 357–361
Dong L, Lapata M (2016) Language to logical form with neural attention. In: Proceedings of the 54th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp 33–43
Gehring J, Auli M, Grangier D, Yarats D, Dauphin YN (2017) Convolutional sequence to sequence learning. In: Proceedings of the 34th international conference on machine learning-Volume 70. JMLR.org, pp 1243–1252
Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp 249–256
Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp 855–864
Gu X, Zhang H, Kim S (2019) CodeKernel: A graph kernel based approach to the selection of API usage examples. In: 2019 34th IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 590–601
Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. In: Advances in neural information processing systems, pp 1024–1034
Hayati SA, Olivier R, Avvaru P, Yin P, Tomasic A, Neubig G (2018) Retrieval-based neural code generation. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp 925–930
Hu X, Li G, Xia X, Lo D, Jin Z (2018) Deep code comment generation. In: Proceedings of the 26th conference on program comprehension, pp 200–210
Hu X, Li G, Xia X, Lo D, Jin Z (2018) Summarizing source code with transferred API knowledge. In: Proceedings of the twenty-seventh international joint conference on artificial intelligence (IJCAI-18)
Hu X, Li G, Xia X, Lo D, Jin Z (2020) Deep code comment generation with hybrid lexical and syntactical information. Empir Softw Eng 25(3):2179–2217
Huang P-Y, Liu F, Shiang S-R, Oh J, Dyer C (2016) Attention-based multimodal neural machine translation. In: Proceedings of the first conference on machine translation (Volume 2: Shared Task Papers), pp 639–645
Isozaki H, Hirao T, Duh K, Sudoh K, Tsukada H (2010) Automatic evaluation of translation quality for distant language pairs. In: Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics
Kalchbrenner N, Danihelka I, Graves A (2015) Grid long short-term memory. arXiv:1507.01526
Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv:1412.6980
Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv:1609.02907
Li Y, Gu C, Dullien T, Vinyals O, Kohli P (2019) Graph matching networks for learning the similarity of graph structured objects. In: International conference on machine learning, pp 3835–3845
Li Y, Tarlow D, Brockschmidt M, Zemel R (2015) Gated graph sequence neural networks. arXiv:1511.05493
Lin C-Y, Cao G, Gao J, Nie J-Y (2006) An information-theoretic approach to automatic evaluation of summaries. In: Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics. Association for Computational Linguistics, pp 463–470
Ling W, Blunsom P, Grefenstette E, Hermann KM, Kočiskỳ T, Wang F, Senior A (2016) Latent predictor networks for code generation. In: Proceedings of the 54th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp 599–609
Liu Z, Xia X, Treude C, Lo D, Li S (2019) Automatic generation of pull request descriptions. In: 2019 34th IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 176–188
Luong M-T, Le QV, Sutskever I, Vinyals O, Kaiser L (2015) Multi-task sequence to sequence learning. arXiv:1511.06114
Luong M-T, Pham H, Manning CD (2015) Effective approaches to attention-based neural machine translation. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp 1412–1421
Mnih V, Heess N, Graves A, Kavukcuoglu K (2014) Recurrent models of visual attention. In: Proceedings of the 27th international conference on neural information processing systems - Volume 2, NIPS’14. MIT Press, Cambridge, pp 2204–2212
Mou L, Li G, Zhang L, Wang T, Jin Z (2016) Convolutional neural networks over tree structures for programming language processing. In: Thirtieth AAAI conference on artificial intelligence
Mou L, Men R, Li G, Zhang L, Jin Z (2015) On end-to-end program generation from user intention by deep neural networks. arXiv:1510.07211
Murali V, Qi L, Chaudhuri S, Jermaine C (2017) Neural sketch learning for conditional program generation. arXiv:1703.05698
Neubig G (2015) lamtram: A toolkit for language and translation modeling using neural networks
Papineni K, Roukos S, Ward T, Zhu W-J (2002) Bleu: A method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, pp 311–318
Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 701–710
Phan AV, Nguyen ML, Bui LT (2017) Convolutional neural networks over control flow graphs for software defect prediction. In: 2017 IEEE 29th international conference on tools with artificial intelligence (ICTAI). IEEE
Quirk C, Mooney R, Galley M (2015) Language to code: Learning semantic parsers for if-this-then-that recipes. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (Volume 1: Long Papers), pp 878–888
Rabinovich M, Stern M, Klein D (2017) Abstract syntax networks for code generation and semantic parsing. In: Proceedings of the 55th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp 1139–1149
Satter A, Sakib K (2017) A similarity-based method retrieval technique to improve effectiveness in code search. In: Companion to the first international conference on the art, science and engineering of programming, pp 1–3
Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2008) The graph neural network model. IEEE Trans Neural Netw 20(1):61–80
Shiv V, Quirk C (2019) Novel positional encodings to enable tree-based transformers. In: Advances in neural information processing systems, pp 12081–12091
Sim J, Wright CC (2005) The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther 85(3):257–268
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: A simple way to prevent neural networks from overfitting. J Machine Learn Res 15(1):1929–1958
Sun Z, Zhu Q, Mou L, Xiong Y, Li G, Zhang L (2019) A grammar-based structural CNN decoder for code generation. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 7055–7062
Sun Z, Zhu Q, Xiong Y, Sun Y, Mou L, Zhang L (2020) TreeGen: A tree-based transformer architecture for code generation. In: Proceedings of the thirty-fourth AAAI conference on artificial intelligence, vol 34, pp 8984–8991
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Advances in neural information processing systems, pp 3104–3112
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Proceedings of the 31st international conference on neural information processing systems. Curran Associates Inc, New York, pp 6000–6010
Vedantam R, Zitnick CL, Parikh D (2015) CIDEr: Consensus-based image description evaluation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4566–4575
Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y (2017) Graph attention networks. arXiv:1710.10903
Wan Y, Shu J, Sui Y, Xu G, Zhao Z, Wu J, Yu P (2019) Multi-modal attention network learning for semantic source code retrieval. In: 2019 34th IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 13–25
Wang W, Li G, Ma B, Xia X, Jin Z (2020) Detecting code clones with graph neural network and flow-augmented abstract syntax tree. In: 2020 IEEE 27th international conference on software analysis, evolution and reengineering (SANER). IEEE
Wang K, Singh R, Su Z (2017) Dynamic neural program embedding for program repair. arXiv:1711.07163
Wei B, Li G, Xia X, Fu Z, Jin Z (2019) Code generation as a dual task of code summarization. In: Advances in neural information processing systems, pp 6559–6569
Williams RJ, Zipser D (1989) A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1(2):270–280
Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Zemel R, Bengio Y (2015) Show, attend and tell: Neural image caption generation with visual attention. In: International conference on machine learning, pp 2048–2057
Yin P, Neubig G (2017) A syntactic neural model for general-purpose code generation. In: Proceedings of the 55th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp 440–450
Zhang J, Wang M, Liu Q, Zhou J (2017) Incorporating word reordering knowledge into attention-based neural machine translation. In: Proceedings of the 55th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp 1524–1534
Zhang J, Wang X, Zhang H, Sun H, Liu X (2020) Retrieval-based neural source code summarization. In: Proceedings of the 42nd international conference on software engineering. IEEE
Zhang J, Wang X, Zhang H, Sun H, Wang K, Liu X (2019) A novel neural source code representation based on abstract syntax tree. In: 2019 IEEE/ACM 41st international conference on software engineering (ICSE). IEEE
Zhou K, Dong Y, Lee WS, Hooi B, Xu H, Feng J (2020) Effective training strategies for deep graph neural networks. arXiv:2006.07107
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61602286 and 61976127, in part by the Shandong Key Research and Development Program under Grant 2018GGX101003, and in part by the Shandong Province Higher Educational Science and Technology Program under Grant J16LN09.
Author information
Contributions
Chen Lyu conceived and designed the study. Ruyun Wang and Hanwen Zhang performed the experiments. Chen Lyu and Hanwen Zhang wrote the paper. Hongyu Zhang encouraged Chen Lyu to investigate the API dependency graph and supervised the findings of this work. Hongyu Zhang and Songlin Hu reviewed and edited the manuscript. All authors discussed the results and contributed to the final manuscript.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
Communicated by: Martin Monperrus
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: More Details about the Experimental Data and Settings Used in Comparisons
In Section 7.1, we describe our experimental data. For models with existing experimental results, we use the results presented in the original paper (Footnote 10). For those without, we retrain the model with the default parameters provided in the original paper and calculate the experimental results under different metrics.
More specifically, we compared six models: the attention-based Seq2Seq model (Neubig 2015) and the Transformer model (Vaswani et al. 2017) were used as baselines; and SNM (Yin and Neubig 2017), ASN (Rabinovich et al. 2017), GB-CNN (Sun et al. 2019), and TreeGen (Sun et al. 2020) were used as the state-of-the-art (SOTA) methods.
All evaluation results of the baselines (the attention-based Seq2Seq model and the Transformer model) were calculated by training the models on HS, MTG and E-JDT datasets (see Table 4 and Fig. 9), and on the split datasets (see Figs. 12 and 13).
In the original publications of the four SOTA methods (SNM, ASN, GB-CNN, and TreeGen) (Yin and Neubig 2017; Rabinovich et al. 2017; Sun et al. 2019; Sun et al. 2020), HS was chosen as one of the experimental datasets, and it is also used in our experiments. The BLEU and Acc scores on HS in Table 4 and Fig. 9 were taken directly from the respective papers (see Table 4 and Fig. 9a–b). Since these papers do not report scores other than Acc and BLEU, we had to retrain the models to obtain the remaining metric scores (see Table 4, HS: F1-RIBES), using the same hyperparameter values as those described in the respective papers. As the authors of SNM provided their trained model, we calculated the scores on HS by using that trained model.
For datasets MTG, E-JDT, and the split datasets, since the papers describing the SOTA methods did not provide any evaluation scores, we retrained their models based on the hyperparameters provided in the original papers, and calculated scores for all metrics on these datasets (see Table 4 and Fig. 9i–x, Figs. 12 and 13).
In detail, the experimental data and settings for each model are as follows; a minimal sketch of the shared training setup (initialization, dropout, optimizer) appears after the list:
1. All evaluation results of the attention-based Seq2Seq model and the Transformer model were calculated by us. The attention-based Seq2Seq model is our basic model. The Transformer is a well-known NMT model in the field of NLP, and the relevant code is available at https://github.com/jadore801120/attention-is-all-you-need-pytorch.
2. The SNM model was proposed by Yin and Neubig (2017), and its code is available at https://github.com/pcyin/NL2code. We use the Acc and BLEU values reported in the paper and compute the other metrics on HS using the trained model provided by the authors. For MTG and E-JDT, all metric values were obtained by retraining the model. The size of all embeddings is 128, except for node type embeddings, whose size is 64. The dimensions of the RNN states and hidden layers are 256 and 50, respectively.
3. The ASN model was proposed by Rabinovich et al. (2017). The paper does not provide the source code of the implementation, but given the study's impact, a PyTorch re-implementation by other researchers is available at https://github.com/xiye17/torchASN. The Acc and BLEU scores were copied from the cited paper; the other scores on HS and all scores on MTG and E-JDT were computed by us by retraining the model. For each experiment, all feedforward and LSTM hidden dimensions were set to the same value, selected from {50, 75, 100, 125, 150} for all datasets. The dimensionality of the encoder inputs was set to 100 in all cases. We applied dropout to the non-recurrent connections of the vertical and horizontal LSTMs, selecting the noise ratio from {0.2, 0.3, 0.4, 0.5}. All parameters were randomly initialized using Glorot initialization.
4. GB-CNN is a CNN-based model proposed by Sun et al. (2019). We copied the StrAcc and BLEU figures for HS. GB-CNN is also an open-source model, available at https://github.com/zysszy/GrammarCNN. We retrained the model with the default settings to evaluate the other metrics on all three datasets. For the input descriptions, we replaced all punctuation with spaces and lowercased all letters. For the neural network, we set the number of CNN layers L to 21, where the bottom layer has no skip connections. The layers of different CNN modules were set to the same dimension, chosen by validation from {128, 192, 256} for each predictor network. We applied dropout (drop rate = 0.5) and an l2 penalty to regularize the fully connected layers. The network was trained with the Adam optimizer using default hyperparameters.
5. TreeGen, proposed by Sun et al. (2020), is a Transformer-based model. Its implementation is available at https://github.com/zysszy/TreeGen. The Acc and BLEU numbers were copied from the cited paper, and the other evaluation results were obtained by training the model on the three datasets. For the neural networks, we set the number of NL reader layers Nd = 6, and N1 = N2 = 5 for the AST reader and the decoder. The size of all embeddings was 256. The hidden sizes were all set to 256, except the first layer of each fully connected sublayer, which had 1024 dimensions. We applied dropout after each layer (including attention layers, the gating mechanism's layers, convolutional layers, and fully connected layers), with a drop rate of 0.15. The model was optimized by Adafactor with default parameters.
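To make the recurring replication settings above concrete (Glorot initialization, dropout on fully connected layers, and Adam with an l2 penalty), the sketch below shows how such a setup is typically written in PyTorch. The toy two-layer network and the weight-decay coefficient are hypothetical placeholders; only the named settings come from the descriptions above.

```python
import torch
import torch.nn as nn

def glorot_init(module):
    """Glorot (Xavier) initialization, as described for the ASN replication."""
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

# Stand-in network; the real models (SNM, ASN, GB-CNN, TreeGen) are far larger.
model = nn.Sequential(
    nn.Linear(100, 256),   # encoder input dimension of 100 (ASN setting)
    nn.ReLU(),
    nn.Dropout(p=0.5),     # dropout on fully connected layers (GB-CNN setting)
    nn.Linear(256, 256),
)
model.apply(glorot_init)

# Adam with default hyperparameters; weight_decay stands in for the l2 penalty
# (the coefficient 1e-5 is a hypothetical choice, not taken from the papers).
optimizer = torch.optim.Adam(model.parameters(), weight_decay=1e-5)
```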
About this article
Cite this article
Lyu, C., Wang, R., Zhang, H. et al. Embedding API dependency graph for neural code generation. Empir Software Eng 26, 61 (2021). https://doi.org/10.1007/s10664-021-09968-2