Embedding API dependency graph for neural code generation

Abstract

The problem of generating code from textual program descriptions has long been viewed as a grand challenge in software engineering. In recent years, many deep learning based approaches have been proposed; they generate a code sequence from a textual program description. However, existing approaches ignore the global relationships among API methods, which are important for understanding API usage. In this paper, we propose to model the dependencies among API methods as an API dependency graph (ADG) and to incorporate the graph embedding into a sequence-to-sequence (Seq2Seq) model. In addition to the existing encoder-decoder structure, a new module named “embedder” is introduced. In this way, the decoder can utilize both global structural dependencies and the textual program description to predict the target code. We conduct extensive code generation experiments on three public datasets and in two programming languages (Python and Java). Our proposed approach, called ADG-Seq2Seq, yields significant improvements over existing state-of-the-art methods and maintains its performance as the length of the target code increases. Extensive ablation tests show that the proposed ADG embedding is effective and outperforms the baselines.
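
To make this architecture concrete, the sketch below shows the high-level idea in PyTorch-style code: an encoder for the textual description, an “embedder” that supplies embeddings of API dependency graph (ADG) nodes, and a decoder that attends over both. All class names, layer choices, and sizes here are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn


class ADGSeq2SeqSketch(nn.Module):
    """Illustrative sketch: encoder + ADG-node 'embedder' + attentive decoder."""

    def __init__(self, vocab_size, code_vocab_size, num_api_nodes, hidden=256):
        super().__init__()
        # Encoder: turns the textual program description into hidden states.
        self.src_embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        # "Embedder": here a plain learnable table of ADG node embeddings;
        # in the paper these embeddings come from embedding the API dependency graph.
        self.adg_embed = nn.Embedding(num_api_nodes, hidden)
        # Decoder: predicts code tokens conditioned on both sources.
        self.tgt_embed = nn.Embedding(code_vocab_size, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.out = nn.Linear(2 * hidden, code_vocab_size)

    def forward(self, src_ids, tgt_ids, api_node_ids):
        enc_states, enc_last = self.encoder(self.src_embed(src_ids))
        dec_states, _ = self.decoder(self.tgt_embed(tgt_ids), enc_last)
        # Attend jointly over textual context and ADG (graph) context.
        memory = torch.cat([enc_states, self.adg_embed(api_node_ids)], dim=1)
        ctx, _ = self.attn(dec_states, memory, memory)
        return self.out(torch.cat([dec_states, ctx], dim=-1))


# Toy usage: batch of 2 descriptions, 20 target code tokens, 8 ADG nodes.
model = ADGSeq2SeqSketch(vocab_size=1000, code_vocab_size=1200, num_api_nodes=500)
logits = model(torch.randint(0, 1000, (2, 12)),
               torch.randint(0, 1200, (2, 20)),
               torch.randint(0, 500, (2, 8)))
print(logits.shape)  # torch.Size([2, 20, 1200])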

Notes

  1. A textual program description can be a natural language description of requirements or a structured specification.

  2. Specifically, we use Spoon (http://spoon.gforge.inria.fr/) and Javassist (http://www.javassist.org/) to analyze Java programs.

  3. An instance of the parameter type is provided by its predecessor method.

  4. github.com/danielyule/hearthbreaker/

  5. github.com/magefree/mage/

  6. The pooling we used follows the max-pooling method of GraphSAGE (Hamilton et al. 2017); a minimal illustration is given after these notes.

  7. As presented in the experimental results, we append the code length level to the name of a dataset to denote a split dataset. For example, HS-30 contains the HS samples whose code length is between 0 and 30, while HS-60 contains those whose code length is between 50 and 60.

  8. The Inspur Company is a leading cloud computing and big data service provider in China. More information is available at https://en.inspur.com/en/about_inspur/index.html.

  9. https://github.com/RuYunW/ADG-Seq2Seq/tree/master/code%20classification

  10. Most of the replication results are consistent with those reported in the original papers, and a few (the BLEU scores for ASN and SNM) are slightly lower (by about 0.8% BLEU). Considering the stochastic nature of deep learning training, these differences are acceptable. We present the original results for each method in the paper, which maintains consistency with the results reported in the related studies (Yin and Neubig 2017; Rabinovich et al. 2017; Sun et al. 2019; Sun et al. 2020).
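
The following minimal sketch illustrates the GraphSAGE-style max-pooling aggregation referred to in Note 6. It assumes a PyTorch setting; the class name and dimensions are illustrative, and this is not the authors' implementation.

import torch
import torch.nn as nn


class MaxPoolAggregator(nn.Module):
    """Each neighbour embedding is transformed by a fully connected layer,
    then an element-wise maximum is taken over the neighbourhood."""

    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, neighbor_embs):            # shape: (num_neighbors, dim)
        transformed = torch.relu(self.fc(neighbor_embs))
        return transformed.max(dim=0).values     # element-wise max -> shape: (dim,)


# Toy usage: aggregate the embeddings of 5 neighbouring ADG nodes.
agg = MaxPoolAggregator(dim=128)
pooled = agg(torch.randn(5, 128))
print(pooled.shape)  # torch.Size([128])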

References

  • Aho AV, Ganapathi M, Tjiang SWK (1989) Code generation using tree matching and dynamic programming. ACM Trans Program Lang Syst (TOPLAS) 11(4):491–516

  • Allamanis M, Brockschmidt M, Khademi M (2017) Learning to represent programs with graphs. arXiv:1711.00740

  • Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: Proceedings of the 3rd international conference on learning representations

  • Brockschmidt M, Allamanis M, Gaunt AL, Polozov O (2018) Generative code modeling with graphs. arXiv:1805.08490

  • Cho K, van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using rnn encoder–decoder for statistical machine translation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1724–1734

  • Costa-jussà MR, Fonollosa JAR (2016) Character-based neural machine translation. In: Proceedings of the 54th annual meeting of the association for computational linguistics (Volume 2: Short Papers), pp 357–361

  • Dong L, Lapata M (2016) Language to logical form with neural attention. In: Proceedings of the 54th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp 33–43

  • Gehring J, Auli M, Grangier D, Yarats D, Dauphin YN (2017) Convolutional sequence to sequence learning. In: Proceedings of the 34th international conference on machine learning-Volume 70. JMLR.org, pp 1243–1252

  • Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp 249–256

  • Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp 855–864

  • Gu X, Zhang H, Kim S (2019) Codekernel: A graph kernel based approach to the selection of api usage examples. In: 2019 34th IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 590–601

  • Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. In: Advances in neural information processing systems, pp 1024–1034

  • Hayati SA, Olivier R, Avvaru P, Yin P, Tomasic A, Neubig G (2018) Retrieval-based neural code generation. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp 925–930

  • Hu X, Li G, Xia X, Lo D, Jin Z (2018) Deep code comment generation. In: Proceedings of the 26th conference on program comprehension, pp 200–210

  • Hu X, Li G, Xia X, Lo D, Jin Z (2018) Summarizing source code with transferred API knowledge. In: Proceedings of the twenty-seventh international joint conference on artificial intelligence (IJCAI-18)

  • Hu X, Li G, Xia X, Lo D, Jin Z (2020) Deep code comment generation with hybrid lexical and syntactical information. Empir Softw Eng 25(3):2179–2217

  • Huang P-Y, Liu F, Shiang S-R, Oh J, Dyer C (2016) Attention-based multimodal neural machine translation. In: Proceedings of the first conference on machine translation: Volume 2, Shared Task Papers, pp 639–645

  • Isozaki H, Hirao T, Duh K, Sudoh K, Tsukada H (2010) Automatic evaluation of translation quality for distant language pairs. In: Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics

  • Kalchbrenner N, Danihelka I, Graves A (2015) Grid long short-term memory. arXiv:1507.01526

  • Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv:1412.6980

  • Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv:1609.02907

  • Li Y, Gu C, Dullien T, Vinyals O, Kohli P (2019) Graph matching networks for learning the similarity of graph structured objects. In: International conference on machine learning, pp 3835–3845

  • Li Y, Tarlow D, Brockschmidt M, Zemel R (2015) Gated graph sequence neural networks. arXiv:1511.05493

  • Lin C-Y, Cao G, Gao J, Nie J-Y (2006) An information-theoretic approach to automatic evaluation of summaries. In: Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics. Association for Computational Linguistics, pp 463–470

  • Ling W, Blunsom P, Grefenstette E, Hermann KM, Kočiský T, Wang F, Senior A (2016) Latent predictor networks for code generation. In: Proceedings of the 54th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp 599–609

  • Liu Z, Xia X, Treude C, Lo D, Li S (2019) Automatic generation of pull request descriptions. In: 2019 34th IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 176–188

  • Luong M-T, Le QV, Sutskever I, Vinyals O, Kaiser L (2015) Multi-task sequence to sequence learning. arXiv:1511.06114

  • Luong M-T, Pham H, Manning CD (2015) Effective approaches to attention-based neural machine translation. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp 1412–1421

  • Mnih V, Heess N, Graves A, Kavukcuoglu K (2014) Recurrent models of visual attention. In: Proceedings of the 27th international conference on neural information processing systems - Volume 2, NIPS’14. MIT Press, Cambridge, pp 2204–2212

  • Mou L, Li G, Zhang L, Wang T, Jin Z (2016) Convolutional neural networks over tree structures for programming language processing. In: Thirtieth AAAI conference on artificial intelligence

  • Mou L, Men R, Li G, Zhang L, Jin Z (2015) On end-to-end program generation from user intention by deep neural networks. arXiv:1510.07211

  • Murali V, Qi L, Chaudhuri S, Jermaine C (2017) Neural sketch learning for conditional program generation. arXiv:1703.05698

  • Neubig G (2015) lamtram: A toolkit for language and translation modeling using neural networks

  • Papineni K, Roukos S, Ward T, Zhu W-J (2002) BLEU: A method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, pp 311–318

  • Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 701–710

  • Phan AV, Nguyen ML, Bui LT (2017) Convolutional neural networks over control flow graphs for software defect prediction. In: IEEE 29th international conference on tools with artificial intelligence (ICTAI). IEEE

  • Quirk C, Mooney R, Galley M (2015) Language to code: Learning semantic parsers for if-this-then-that recipes. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (Volume 1: Long Papers), pp 878–888

  • Rabinovich M, Stern M, Klein D (2017) Abstract syntax networks for code generation and semantic parsing. In: Proceedings of the 55th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp 1139–1149

  • Satter A, Kazi S (2017) A similarity-based method retrieval technique to improve effectiveness in code search. In: Companion to the first international conference on the art, science and engineering of programming, pp 1–3

  • Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2008) The graph neural network model. IEEE Trans Neural Netw 20(1):61–80

  • Shiv V, Quirk C (2019) Novel positional encodings to enable tree-based transformers. In: Advances in neural information processing systems, pp 12081–12091

  • Sim J, Wright CC (2005) The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther 85(3):257–268

  • Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: A simple way to prevent neural networks from overfitting. J Machine Learn Res 15(1):1929–1958

  • Sun Z, Zhu Q, Mou L, Xiong Y, Li G, Zhang L (2019) A grammar-based structural CNN decoder for code generation. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 7055–7062

  • Sun Z, Zhu Q, Xiong Y, Sun Y, Mou L, Zhang L (2020) Treegen: A tree-based transformer architecture for code generation. In: AAAI 2020 : The Thirty-fourth AAAI conference on artificial intelligence, vol 34, pp 8984–8991

  • Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Advances in neural information processing systems, pp 3104–3112

  • Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Proceedings of the 31st international conference on neural information processing systems. Curran Associates Inc, New York, pp 6000–6010

  • Vedantam R, Zitnick CL, Parikh D (2015) CIDEr: Consensus-based image description evaluation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4566–4575

  • Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y (2017) Graph attention networks. arXiv:1710.10903

  • Wan Y, Shu J, Sui Y, Xu G, Zhao Z, Wu J, Yu P (2019) Multi-modal attention network learning for semantic source code retrieval. In: 2019 34th IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 13–25

  • Wang W, Li G, Ma B, Xia X, Jin Z (2020) Detecting code clones with graph neural network and flow-augmented abstract syntax tree. In: IEEE 27th international conference on software analysis, evolution and reengineering (SANER). IEEE

  • Wang K, Singh R, Su Z (2017) Dynamic neural program embedding for program repair. arXiv:1711.07163

  • Wei B, Li G, Xia X, Fu Z, Jin Z (2019) Code generation as a dual task of code summarization. In: Advances in neural information processing systems, pp 6559–6569

  • Williams RJ, Zipser D (1989) A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1(2):270–280

  • Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Zemel R, Bengio Y (2015) Show, attend and tell: Neural image caption generation with visual attention. In: International conference on machine learning, pp 2048–2057

  • Yin P, Neubig G (2017) A syntactic neural model for general-purpose code generation. In: Proceedings of the 55th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp 440–450

  • Zhang J, Wang M, Liu Q, Zhou J (2017) Incorporating word reordering knowledge into attention-based neural machine translation. In: Proceedings of the 55th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp 1524–1534

  • Zhang J, Wang X, Zhang H, Sun H, Liu X (2020) Retrieval-based neural source code summarization. In: Proceedings of the 42nd international conference on software engineering. IEEE

  • Zhang J, Wang X, Zhang H, Sun H, Wang K, Liu X (2019) A novel neural source code representation based on abstract syntax tree. In: IEEE/ACM 41st international conference on software engineering (ICSE). IEEE

  • Zhou K, Dong Y, Lee WS, Hooi B, Xu H, Feng J (2020) Effective training strategies for deep graph neural networks. arXiv:2006.07107

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61602286 and 61976127, in part by the Shandong Key Research and Development Program under Grant 2018GGX101003, and in part by the Shandong Province Higher Educational Science and Technology Program under Grant J16LN09.

Author information

Contributions

Chen Lyu conceived and designed the study. Ruyun Wang and Hanwen Zhang performed the experiments. Chen Lyu and Hanwen Zhang wrote the paper. Hongyu Zhang encouraged Chen Lyu to investigate the API dependency graph and supervised the findings of this work. Hongyu Zhang and Songlin Hu reviewed and edited the manuscript. All authors discussed the results and contributed to the final manuscript.

Corresponding author

Correspondence to Chen Lyu.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Communicated by: Martin Monperrus

Appendix A: More Details about the Experimental Data and Settings Used in Comparisons

In Section 7.1, we describe our experimental data. For models with published experimental results, we use the results reported in the original papers (see Note 10). For models without such results, we retrain them with the default parameters provided in the original papers and compute the results under the different metrics.

More specifically, we compared six models: the attention-based Seq2Seq model (Neubig 2015) and the Transformer model (Vaswani et al. 2017) served as baselines, while SNM (Yin and Neubig 2017), ASN (Rabinovich et al. 2017), GB-CNN (Sun et al. 2019), and TreeGen (Sun et al. 2020) served as the state-of-the-art (SOTA) methods.

All evaluation results of the baselines (the attention-based Seq2Seq model and the Transformer model) were obtained by training the models on the HS, MTG, and E-JDT datasets (see Table 4 and Fig. 9) and on the split datasets (see Figs. 12 and 13).

In the original publications of the four SOTA methods (SNM, ASN, GB-CNN, and TreeGen) (Yin and Neubig 2017; Rabinovich et al. 2017; Sun et al. 2019; Sun et al. 2020), HS was chosen as one of the experimental datasets, and we also use it in our experiments. The BLEU and Acc scores on HS in Table 4 and Fig. 9 were taken directly from the respective papers (see Table 4 and Fig. 9a–b). Since those papers report only Acc and BLEU, we retrained the models to obtain the remaining metric scores (see the F1–RIBES columns for HS in Table 4), using the same hyperparameter values as described in the respective papers. As the authors of SNM provide their trained model, we computed the HS scores for SNM with that model.

For the MTG and E-JDT datasets, as well as the split datasets, the papers describing the SOTA methods do not provide any evaluation scores, so we retrained their models with the hyperparameters provided in the original papers and calculated scores for all metrics on these datasets (see Table 4 and Fig. 9i–x, Figs. 12 and 13).

In detail, the experimental data and settings for each model are as follows (a code-style summary of the retraining hyperparameters appears after the list):

  1. All evaluation results of the attention-based Seq2Seq model and the Transformer model were calculated by us. The attention-based Seq2Seq model is our basic model. The Transformer is a well-known NMT model in NLP; the implementation we used is available at https://github.com/jadore801120/attention-is-all-you-need-pytorch.

  2. The SNM model was proposed by Yin and Neubig (2017), and its code is available at https://github.com/pcyin/NL2code. We use the Acc and BLEU values from the paper and compute the other metrics on HS with the trained model the authors provide. For MTG and E-JDT, all metric values are obtained by retraining the model. All embeddings have size 128, except node type embeddings, which have size 64. The dimensions of the RNN states and hidden layers are 256 and 50, respectively.

  3. The ASN model was proposed by Rabinovich et al. (2017). The original paper does not provide source code, but because the study has been influential, a PyTorch implementation of ASN by other researchers is available at https://github.com/xiye17/torchASN, which we used. The Acc and BLEU scores were copied from the cited paper; the other scores on HS and all scores on MTG and E-JDT were computed by us by retraining the model. For each experiment, all feedforward and LSTM hidden dimensions were set to the same value, selected from {50, 75, 100, 125, 150} for all datasets. The dimensionality of the encoder inputs was set to 100 in all cases. We applied dropout to the non-recurrent connections of the vertical and horizontal LSTMs, selecting the noise ratio from {0.2, 0.3, 0.4, 0.5}. All parameters were randomly initialized using Glorot initialization.

  4. GB-CNN is a CNN-based model proposed by Sun et al. (2019); it is open source and available at https://github.com/zysszy/GrammarCNN. We copied the StrAcc and BLEU figures for HS and retrained the model with the default settings to evaluate the other metrics on all three datasets. For the input descriptions, we replaced all punctuation marks with a space and lowercased all letters. For the neural network, we set the number of CNN layers L to 21, where the bottom layer has no skip connections. The layers of the different CNN modules were set to the same dimension, chosen by validation from {128, 192, 256} for each predictor network. We applied dropout (drop rate = 0.5) and an l2 penalty to regularize the fully connected layers. The network was trained with the Adam optimizer using its default hyperparameters.

  5. TreeGen, proposed by Sun et al. (2020), is a Transformer-based model; its implementation is available at https://github.com/zysszy/TreeGen. The Acc and BLEU numbers were copied from the cited paper, and the other evaluation results were obtained by training the model on the three datasets. For the neural network, we set the number of NL reader layers Nd = 6 and N1 = N2 = 5 for the AST reader and the decoder. The size of all embeddings was 256. The hidden sizes were all set to 256, except for the fully connected layers and the first layer, which had 1024 dimensions. We applied dropout after each layer (including attention layers, the gating mechanism's layers, convolutional layers, and fully connected layers) with a drop rate of 0.15. The model was optimized with Adafactor using its default parameters.
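
As referenced before the list, the retraining hyperparameters described in items 2 to 5 can be summarized in code form as follows. This is an illustrative summary only, not a configuration file from any of the cited repositories; the key names are ours.

# Illustrative summary (not a configuration file shipped with any of these tools)
# of the retraining hyperparameters listed in items 2-5 above.
RETRAIN_CONFIGS = {
    "SNM": {
        "embedding_size": 128,                # all embeddings except node types
        "node_type_embedding_size": 64,
        "rnn_state_size": 256,
        "hidden_layer_size": 50,
    },
    "ASN": {
        "hidden_size_grid": [50, 75, 100, 125, 150],  # same value for FF and LSTM layers
        "encoder_input_size": 100,
        "dropout_grid": [0.2, 0.3, 0.4, 0.5],         # non-recurrent LSTM connections
        "initialization": "Glorot",
    },
    "GB-CNN": {
        "cnn_layers": 21,                     # bottom layer has no skip connections
        "layer_size_grid": [128, 192, 256],   # chosen by validation per predictor network
        "dropout": 0.5,
        "regularization": "l2",
        "optimizer": "Adam",
    },
    "TreeGen": {
        "nl_reader_layers": 6,                # Nd
        "ast_reader_layers": 5,               # N1
        "decoder_layers": 5,                  # N2
        "embedding_size": 256,
        "hidden_size": 256,                   # 1024 for the fully connected layers
        "dropout": 0.15,
        "optimizer": "Adafactor",
    },
}

if __name__ == "__main__":
    for name, cfg in RETRAIN_CONFIGS.items():
        print(name, cfg)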

Cite this article

Lyu, C., Wang, R., Zhang, H. et al. Embedding API dependency graph for neural code generation. Empir Software Eng 26, 61 (2021). https://doi.org/10.1007/s10664-021-09968-2
