Abstract
Developers often search the web for code examples relevant to their programming tasks. Unfortunately, they face three major problems. First, they frequently need to read and analyse multiple search engine results to obtain a satisfactory solution. Second, the search is impaired by a lexical gap between the query (task description) and the information associated with the solution (e.g., code example). Third, the retrieved solution may not be comprehensible, i.e., the code segment might lack a succinct explanation. To address these three problems, we propose CROKAGE (CrowdKnowledge Answer Generator), a tool that takes the description of a programming task (the query) as input and delivers a comprehensible solution for the task. Our solutions contain not only relevant code examples but also succinct explanations of them written by human developers. We model the search for code examples as an Information Retrieval (IR) problem. We first leverage the crowd knowledge stored in Stack Overflow to retrieve candidate answers for a programming task. For this, we use a fine-tuned IR technique, chosen after comparing the performance of 11 IR techniques. Then we use a multi-factor relevance mechanism to mitigate the lexical gap problem and select the top quality answers related to the task. Finally, we perform natural language processing on the top quality answers and, unlike earlier studies, deliver comprehensible solutions containing both code examples and code explanations. We evaluate and compare our approach against ten baselines, including the state of the art. We show that CROKAGE outperforms the ten baselines in suggesting relevant solutions for 902 programming tasks (i.e., queries) in three popular programming languages: Java, Python and PHP.
Furthermore, we use 24 programming tasks (queries) to evaluate our solutions with 29 developers and confirm that CROKAGE outperforms the state-of-the-art tool in terms of the relevance of the suggested code examples, the benefit of the code explanations and the overall solution quality (code + explanation).
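The multi-factor relevance mechanism described in the abstract can be pictured as merging several per-answer scores into a single ranking. The following is a minimal sketch under loudly stated assumptions: the factor names (`lexical`, `semantic`, `api_overlap`), their weights, the min-max normalization and the weighted sum are all hypothetical illustration choices, not the factors or the combination formula actually used by CROKAGE.

```python
from typing import Dict, List, Tuple

# Hypothetical factor names and weights, for illustration only;
# they are NOT the factors or weights used by CROKAGE.
EXAMPLE_WEIGHTS = {"lexical": 0.4, "semantic": 0.4, "api_overlap": 0.2}


def min_max(values: List[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant factor contributes 0 for every answer."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_answers(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Combine per-factor scores of candidate answers and return the top_k answers."""
    ids = list(candidates)
    combined = {answer_id: 0.0 for answer_id in ids}
    for factor, weight in weights.items():
        # Normalize each factor across the candidates so that no factor
        # dominates the sum merely because of its numeric scale.
        normalized = min_max([candidates[a][factor] for a in ids])
        for answer_id, value in zip(ids, normalized):
            combined[answer_id] += weight * value
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Toy raw scores for three hypothetical candidate answers.
candidates = {
    "a1": {"lexical": 12.0, "semantic": 0.61, "api_overlap": 2.0},
    "a2": {"lexical": 8.5, "semantic": 0.74, "api_overlap": 0.0},
    "a3": {"lexical": 3.0, "semantic": 0.20, "api_overlap": 1.0},
}
print(rank_answers(candidates, EXAMPLE_WEIGHTS, top_k=2))
```

Normalization is used here only so that factors measured on different scales can be added; tuned or learned weights would be an equally plausible design.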
Notes
https://data.stackexchange.com/stackoverflow/query, accessed in July 2019
https://archive.org/details/stackexchange - dump published in March 2019
The complete list of words is available at https://bit.ly/2Hjv0tW
Although this limitation was explicitly stated on the CROKAGE web page (i.e., http://isel.ufu.br:9000/), a significant number of non-Java queries were found
CROKAGE search requires the query to have a minimum of one character and a maximum of 70 characters to run
We adopt the list provided by Stanford: https://bit.ly/1Nt4eMh
Although we use a semi-automatic process to filter out queries not related to Java, several such queries still remained
We append “site:stackoverflow.com” to the query
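The query restriction in the note above can be sketched as follows. The paper does not state how the web search engines were queried, so the GET endpoint passed as `base_url` and the `q` parameter name are assumptions for illustration; the function names are hypothetical.

```python
from urllib.parse import urlencode


def site_restricted_query(task: str) -> str:
    """Restrict a web search to Stack Overflow by appending the site: operator."""
    return f"{task} site:stackoverflow.com"


def search_url(base_url: str, task: str) -> str:
    """Build a GET search URL; base_url and the 'q' parameter are assumptions."""
    # urlencode escapes spaces as '+' and ':' as '%3A'.
    return base_url + "?" + urlencode({"q": site_restricted_query(task)})


print(search_url("https://www.google.com/search", "read file in java"))
# prints https://www.google.com/search?q=read+file+in+java+site%3Astackoverflow.com
```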
Here we test the IR techniques using their default parameters. In the case of BM25, the default parameters are k1 = 1.2 and b = 0.75
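To make the two BM25 defaults in the note above concrete (the term-frequency saturation parameter, commonly written k1, and the length-normalization parameter b), a minimal BM25 scorer over pre-tokenized text is sketched below. It is an illustration, not the Lucene implementation used in the study: it uses the classic (k1 + 1) numerator and a Lucene-style IDF, while Lucene's own similarity differs in details such as approximate encoding of document lengths, so absolute scores should not be expected to match.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str], docs: List[List[str]], k1: float = 1.2, b: float = 0.75
) -> List[float]:
    """Okapi BM25 score of every tokenized document for a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalization: b controls how strongly long documents are penalized.
        norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in query:  # a repeated query term contributes once per occurrence
            if tf[term] == 0:
                continue
            # Lucene-style IDF, which stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 controls how quickly repeated occurrences of a term saturate.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [["read", "file", "java"], ["write", "file"], ["parse", "json", "string"]]
print(bm25_scores(["read", "file"], docs))
```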
The simplest form of language model, which disregards all conditioning context and estimates each term independently
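A unigram language model of the kind described in the note above can be sketched as query-likelihood scoring: because each term is estimated independently, the probability of the query is a product of per-term probabilities, i.e. a sum of logarithms. The sketch assumes Dirichlet smoothing (one of the methods studied by Zhai and Lafferty 2004) with mu = 2000, a commonly used default; both the smoothing choice and mu are assumptions here, not values taken from this paper, and skipping terms unseen in the whole collection is a simplification of the sketch.

```python
import math
from collections import Counter
from typing import List


def query_log_likelihood(
    query: List[str], doc: List[str], collection: List[List[str]], mu: float = 2000.0
) -> float:
    """log P(query | doc) under a unigram model with Dirichlet smoothing."""
    coll_tf = Counter(term for d in collection for term in d)
    coll_len = sum(coll_tf.values())
    doc_tf = Counter(doc)
    log_p = 0.0
    for term in query:
        p_coll = coll_tf[term] / coll_len
        if p_coll == 0.0:
            # Term never seen in the collection: skipped to avoid log(0)
            # (an assumption of this sketch).
            continue
        # Unigram assumption: each term is scored independently of its context.
        p_term = (doc_tf[term] + mu * p_coll) / (len(doc) + mu)
        log_p += math.log(p_term)
    return log_p


collection = [["read", "file", "java"], ["write", "file"], ["parse", "json"]]
for doc in collection:
    print(doc, query_log_likelihood(["read", "file"], doc, collection))
```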
Although the title of the Q&A pair alone could represent the query intent, our output is the answer; thus, we concatenate the title and the body text of the answer to match the query against the answers of the candidate pairs
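The note above can be illustrated with a small sketch of how the matched text might be assembled for one candidate Q&A pair. It assumes that the answer body is stored as HTML (as it is in the Stack Overflow data dump) and that text is lowercased and split into alphanumeric tokens; the class and function names are hypothetical, and the paper's own preprocessing may differ.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and drops the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


@dataclass
class QAPair:
    question_title: str    # title of the Stack Overflow question
    answer_body_html: str  # body of the answer, stored as HTML


def document_tokens(pair: QAPair) -> List[str]:
    """Tokens used to match a query: question title + answer body text."""
    text = pair.question_title + " " + html_to_text(pair.answer_body_html)
    # Simple lowercase alphanumeric tokenization (an assumption of this sketch).
    return re.findall(r"[a-z0-9]+", text.lower())


pair = QAPair("How to read a file in Java?", "<p>Use <code>Files.readAllLines</code>.</p>")
print(document_tokens(pair))
```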
Our general conclusions are not statistically confirmed for PHP, although they are supported by the four adopted metrics.
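One common way to check such a difference statistically is a paired Wilcoxon signed-rank test with an effect size (Wilcoxon 1945; Fritz et al. 2012). The pure-Python sketch below assumes one metric value per query for each of two approaches, uses the normal approximation without tie correction, and reports r = |z| / sqrt(n), with n the number of non-zero pairs; the exact test configuration and the choice of n used in the paper are not given in this note, so treat these as assumptions.

```python
import math
from typing import List, Tuple


def wilcoxon_signed_rank(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Paired Wilcoxon signed-rank test (normal approximation, no tie correction).

    Returns (z, two-sided p-value, effect size r = |z| / sqrt(n)).
    """
    # Pairs with a zero difference carry no sign and are discarded.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    # Sum of ranks of positive differences, compared with its null mean and std.
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    r = abs(z) / math.sqrt(n)
    return z, p, r


# Toy per-query scores for two approaches (made-up values, not results from the paper).
approach_a = [0.52, 0.60, 0.45, 0.61, 0.58, 0.66, 0.49, 0.70]
approach_b = [0.40, 0.55, 0.30, 0.62, 0.48, 0.51, 0.47, 0.50]
print(wilcoxon_signed_rank(approach_a, approach_b))
```

With few queries the normal approximation is rough; an exact test (e.g., `scipy.stats.wilcoxon`) would be preferable in practice.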
Except for BIKER, whose behaviour we do not change
References
Ahasanuzzaman M, Asaduzzaman M, Roy CK, Schneider KA (2016) Mining duplicate questions in Stack Overflow. In: Proceeding MSR, pp 402–412
An L, Mlouki O, Khomh F, Antoniol G (2017) Stack Overflow: a code laundering platform? In: Proceeding SANER, pp 283–293
Apache (2020) Lucene, http://lucene.apache.org/
Baeza-Yates R, Ribeiro-Neto B, et al. (1999) Modern information retrieval, vol 463. ACM Press, New York
Bajracharya S, Ossher J, Lopes C (2010) Searching API usage examples in code repositories with Sourcerer API search. In: Workshop on search-driven development, pp 5–8
BeginnersBook (2020) BeginnersBook, http://beginnersbook.com
Bojanowski P, Grave E, Joulin A, Mikolov T (2017) Enriching word vectors with subword information. TACL 5:135–146
Campbell BA, Treude C (2017) NLP2code: Code snippet content assist via natural language tasks. In: Proceeding ICSME, pp 628–632
Campos EC, Souza LBLD, Maia MA (2014) Nuggets miner: assisting developers by harnessing the Stack Overflow crowd knowledge and the Github traceability. In: Proceeding CBSoft-Tool Session
Campos EC, de Souza LB, Maia MA (2016) Searching crowd knowledge to recommend solutions for API usage tasks. J Softw Evol Process 28 (10):863–892
Chatterjee P, Gause B, Hedinger H, Pollock L (2017) Extracting code segments and their descriptions from research articles. In: Proceeding MSR, pp 91–101
Chen C, Xing Z, Liu Y, Ong KLX (2019) Mining likely analogical APIs across third-party libraries via large-scale unsupervised API semantics embedding. IEEE Trans Softw Eng (TSE)
Ciborowska A, Kraft NA, Damevski K (2018) Detecting and characterizing developer behavior following opportunistic reuse of code snippets from the web. In: Proceeding MSR, pp 94–97
Corbin J, Strauss A (1990) Basics of qualitative research: techniques and procedures for developing grounded theory. Sage Publications
De Souza LBL, Campos EC, Maia MA (2014) Ranking crowd knowledge to assist software development. In: Proceeding Intl. Conf. on Program Comprehension, pp 72–82
Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding, arXiv:1810.04805
Diamantopoulos T, Symeonidis AL (2015) Employing source code information to improve question-answering in Stack Overflow. In: Proceeding MSR, pp 454–457
Facebook Inc (2020) Word representations in fastText, https://fasttext.cc/docs/en/unsupervised-tutorial.html
Fang H, Zhai C (2005) An exploration of axiomatic approaches to information retrieval. In: Proceeding SIGIR ACM, pp 480–487
Fielding RT, Taylor RN (2002) Principled design of the modern web architecture. ACM Trans Internet Technol (TOIT) 2(2):115–150
Fritz CO, Morris PE, Richler JJ (2012) Effect size estimates: current use, calculations, and interpretation. J Exp Psychol Gen (JEPG) 141(1):2–18
Fu W, Menzies T (2017) Easy over hard: A case study on deep learning. In: Proceeding ESEC/FSE, pp 49–60
Google Inc (2020) Google search engine, http://google.com
Gu X, Zhang H, Zhang D, Kim S (2016) Deep API learning. In: Proceeding FSE, pp 631–642
Gu X, Zhang H, Kim S (2018) Deep code search. In: Proceeding ICSE, pp 933–944
Gvero T, Kuncak V (2015) Interactive synthesis using free-form queries. In: Proceeding ICSE, pp 689–692
Hill E, Rao S, Kak A (2012) On the use of stemming for concern location and bug localization in Java. In: Proceeding SCAM, pp 184–193
Hoogeveen D, Bennett A, Li Y, Verspoor KM, Baldwin T (2018) Detecting misflagged duplicate questions in community question-answering archives. In: Proceeding ICWSM, pp 112–120
Hu X, Li G, Xia X, Lo D, Jin Z (2018) Deep code comment generation. In: Proceeding ICPC, pp 200–210
Huang Q, Xia X, Xing Z, Lo D, Wang X (2018) API method recommendation without worrying about the task-API knowledge gap. In: Proceeding ASE, pp 293–304
Java2s (2020) Java2s, http://java2s.com
Jsoup (2020) Java HTML parser, http://jsoup.org
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174
Li Z, Wang T, Zhang Y, Zhan Y, Yin G (2016) Query reformulation by leveraging crowd wisdom for scenario-based software search. In: Proceedings of the 8th asia-pacific symposium on internetware ACM, pp 36–44
Lv F, Zhang H, Lou J-G, Wang S, Zhang D, Zhao J (2015) CodeHow: effective code search based on API understanding and extended Boolean model (E). In: Proceeding ASE, pp 260–270
McMillan C, Grechanik M, Poshyvanyk D, Xie Q, Fu C (2011) Portfolio: finding relevant functions and their usage. In: Proceeding ICSE, pp 111–120
Microsoft Inc (2020) Bing search engine, http://bing.com
Mihalcea R, Corley C, Strapparava C, et al. (2006) Corpus-based and knowledge-based measures of text semantic similarity. In: Proceeding AAAI, vol 6, pp 775–780
Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013a) Distributed representations of words and phrases and their compositionality. In: Proceeding NIPS, pp 3111–3119
Mikolov T, Chen K, Corrado G, Dean J (2013b) Efficient estimation of word representations in vector space, arXiv:1301.3781
Nasehi SM, Sillito J, Maurer F, Burns C (2012) What makes a good code example? A study of programming Q&A in StackOverflow. In: Proceeding ICSM IEEE, pp 25–34
Nguyen T, Rigby PC, Nguyen AT, Karanfil M, Nguyen TN (2016) T2API: synthesizing API code usage templates from English texts with statistical translation. In: Proceeding FSE, pp 1013–1017
Nie L, Jiang H, Ren Z, Sun Z, Li X (2016) Query expansion based on crowd knowledge for code search. IEEE Trans Serv Comput 9(5):771–783
Pagliardini M, Gupta P, Jaggi M (2017) Unsupervised learning of sentence embeddings using compositional n-gram features, arXiv:1703.02507
Ponzanelli L, Bacchelli A, Lanza M (2013a) Seahawk: Stack Overflow in the IDE. In: International conference on software engineering (ICSE), pp 1295–1298
Ponzanelli L, Bacchelli A, Lanza M (2013b) Leveraging crowd knowledge for software comprehension and development. In: Proceeding CSMR, pp 57–66
Ponzanelli L, Bavota G, Di Penta M, Oliveto R, Lanza M (2014a) Mining Stack Overflow to turn the IDE into a self-confident programming prompter. In: Proceeding MSR, pp 102–111
Ponzanelli L, Bavota G, Di Penta M, Oliveto R, Lanza M (2014b) Prompter: A self-confident recommender system. In: Proceeding ICSME. IEEE, pp 577–580
Raghothaman M, Wei Y, Hamadi Y (2016) SWIM: Synthesizing what I mean-code search and idiomatic snippet synthesis. In: Proceeding ICSE, pp 357–367
Ragkhitwetsagul C, Krinke J, Paixao M, Bianco G, Oliveto R (2018) Toxic code snippets on Stack Overflow, arXiv:1806.07659
Rahman MM, Roy CK (2017) STRICT: Information retrieval based search term identification for concept location. In: Proceeding SANER, pp 79–90
Rahman MM, Roy CK (2018) Effective reformulation of query for code search using crowdsourced knowledge and extra-large data analytics. In: Proceedings ICSME, pp 473–484
Rahman MM, Roy CK, Keivanloo I (2015) Recommending insightful comments for source code using crowdsourced knowledge. In: Proceeding SCAM, pp 81–90
Rahman MM, Roy CK, Lo D (2016) RACK: Automatic API recommendation using crowdsourced knowledge. In: Proceeding SANER, pp 349–359
Rahman MM, Roy CK, Lo D (2017) RACK: code search in the IDE using crowdsourced knowledge. In: Proceeding ICSE, pp 51–54
Robertson SE, Walker S (1994) Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In: Proceeding ACM SIGIR, pp 232–241
Saryada W (2020) Kodejava, http://kodejava.org
Silva RFG, Paixao KVR, Maia MA (2018) Duplicate question detection in Stack Overflow: a reproducibility study. In: Proceeding SANER, pp 572–581
Silva RF, Roy CK, Rahman MM, Schneider KA, Paixao K, de Almeida Maia M (2019) Recommending comprehensive solutions for programming tasks by mining crowd knowledge. In: Proceedings of the 27th international conference on program comprehension, IEEE Press, pp 358–368
Stack Exchange Inc (2020) Stack Overflow search engine, http://stackoverflow.com
Van Nguyen T, Nguyen AT, Phan HD, Nguyen TD, Nguyen TN (2017) Combining word2vec with revised vector space model for better code retrieval. In: Proceeding ICSE IEEE Press, pp 183–185
Wang Y, Feng Y, Martins R, Kaushik A, Dillig I, Reiss SP (2016) Hunter: next-generation code reuse for Java. In: Proceeding FSE, pp 1028–1032
Wang S, Lo D, Jiang L (2014) Active code search: incorporating user feedback to improve code search relevance. In: Proceedings of the 29th ACM/IEEE international conference on Automated software engineering. ACM, pp 677–682
Wang X, Peng Y, Zhang B (2018) Comment generation for source code: state of the art, challenges and opportunities, arXiv:1802.02971
Wilcoxon F (1945) Individual comparisons by ranking methods. Biomet Bull 1(6):80–83
Wong E, Yang J, Tan L (2013) AutoComment: mining question and answer sites for automatic comment generation. In: Proceeding ASE, pp 562–567
Wong E, Liu T, Tan L (2015) CloCom: mining existing source code for automatic comment generation. In: Proceeding SANER, pp 380–389
Xu B, Ye D, Xing Z, Xia X, Chen G, Li S (2016) Predicting semantically linkable knowledge in developer online forums via convolutional neural network. In: Proceeding ASE, pp 51–62
Xu B, Xing Z, Xia X, Lo D (2017) AnswerBot: automated generation of answer summary to developers' technical questions. In: Proceeding ASE, pp 706–716
Xu B, Shirani A, Lo D, Alipour MA (2018) Prediction of relatedness in Stack Overflow: deep learning vs. SVM: a reproducibility study. In: Proceeding ESEM ACM, p 21
Yang D, Martins P, Saini V, Lopes C (2017) Stack Overflow in GitHub: any snippets there? In: Proceeding MSR, pp 280–290
Ye X, Shen H, Ma X, Bunescu R, Liu C (2016) From word embeddings to document similarities for improved information retrieval in software engineering. In: Proceeding ICSE, pp 404–415
Yin P, Deng B, Chen E, Vasilescu B, Neubig G (2018) Learning to mine aligned code and natural language pairs from Stack Overflow. In: Proceeding MSR. ACM, pp 476–486
Zagalsky A, Barzilay O, Yehudai A (2012) Example overflow: using social media for code recommendation. In: Proceeding RSSE, pp 38–42
Zhai C, Lafferty J (2004) A study of smoothing methods for language models applied to information retrieval. TOIS 22(2):179–214
Zhang Y, Lo D, Xia X, Sun J-L (2015) Multi-factor duplicate question detection in Stack Overflow. JCST 30(5):981–997
Zhang WE, Sheng QZ, Lau JH, Abebe E (2017a) Detecting duplicate posts in programming Q&A communities via latent semantics and association rules. In: Proceeding WWW, pp 1221–1229
Zhang WE, Sheng QZ, Shu Y, Nguyen VK (2017b) Feature analysis for duplicate detection in programming Q&A communities. In: Proceeding ADMA. Springer, New York, pp 623–638
Acknowledgments
We thank the authors of BIKER for sharing their tool. This research is supported in part by a Canada First Research Excellence Fund (CFREF) grant coordinated by the Global Institute for Food Security (GIFS). We also thank the Brazilian funding agencies CAPES, CNPq and FAPEMIG for supporting this research. Last but not least, we thank the participants who took part in the qualitative evaluation of this work.
Additional information
Communicated by: Tim Menzies
About this article
Cite this article
da Silva, R.F.G., Roy, C.K., Rahman, M.M. et al. CROKAGE: effective solution recommendation for programming tasks by leveraging crowd knowledge. Empir Software Eng 25, 4707–4758 (2020). https://doi.org/10.1007/s10664-020-09863-2
DOI: https://doi.org/10.1007/s10664-020-09863-2