Enhance code search via reformulating queries with evolving contexts

Automated Software Engineering

Abstract

To improve code search, many query expansion (QE) approaches use APIs or crowd knowledge to expand a query. However, these approaches can sometimes hurt retrieval performance, because they cannot distinguish relevant terms from irrelevant ones among a large set of candidate expansion terms and may therefore expand a query with irrelevant terms. In this paper, we propose QREC, a query reformulation approach with evolving contexts, which refer to the new, deleted, and dependent terms that arise during code evolution. By treating new terms as relevant and deleted terms as irrelevant, QREC can reformulate a query with appropriate expansion terms. The experimental results show that QREC outperforms the state-of-the-art QE approaches (e.g., CodeHow and QECK) by 9–11% and improves the precision of the code search algorithms IR, Portfolio and VF by up to 37–45%.
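The core idea in the abstract (terms added during code evolution count as relevant, deleted terms as irrelevant) can be illustrated with a minimal Python sketch. This is not QREC's actual pipeline, which the abstract does not detail: the function names `tokenize`, `evolving_context`, and `reformulate`, the simple +1/-1 scoring, and the toy change history are all assumptions made for illustration, and dependent terms are not modeled.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens, breaking camelCase identifiers."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", text)
    return [p.lower() for p in parts]


def evolving_context(old_code, new_code):
    """Return (added_terms, deleted_terms) between two versions of a snippet."""
    old_terms = set(tokenize(old_code))
    new_terms = set(tokenize(new_code))
    return new_terms - old_terms, old_terms - new_terms


def reformulate(query, changes, top_k=2):
    """Expand a query with terms that code changes tend to add, not delete.

    `changes` is a list of (old_code, new_code) pairs. Scoring is a
    hypothetical +1 per addition and -1 per deletion.
    """
    query_terms = tokenize(query)
    scores = Counter()
    for old_code, new_code in changes:
        # Only consider changes whose new version mentions a query term.
        if not set(query_terms) & set(tokenize(new_code)):
            continue
        added, deleted = evolving_context(old_code, new_code)
        for term in added:
            scores[term] += 1  # new term: treated as relevant
        for term in deleted:
            scores[term] -= 1  # deleted term: treated as irrelevant
    candidates = [
        (term, score)
        for term, score in scores.items()
        if score > 0 and term not in query_terms
    ]
    # Highest score first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda ts: (-ts[1], ts[0]))
    return query_terms + [term for term, _ in candidates[:top_k]]


# Toy change history (invented for this example).
changes = [
    ("reader = new FileReader(path)",
     "reader = new BufferedReader(new FileReader(path))"),
    ("stream = new FileInputStream(path)",
     "reader = new BufferedReader(new FileReader(path))"),
]
print(reformulate("read file", changes))
# -> ['read', 'file', 'buffered', 'reader']
```

In this toy history, `stream` and `input` receive negative scores because they were deleted, so they are never proposed as expansion terms, while `buffered` and `reader` are added to the query.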

Notes

  1. http://stackoverflow.com/.

  2. The fully qualified name (FQN), declaration, instantiation, and the signatures of methods invoked and fields accessed, etc.

  3. https://bit.ly/2VTQy2W.

  4. https://github.com/eclipse/eclipse.jdt.core.

  5. https://bit.ly/2VYOlTZ.

  6. https://bitbucket.org/sealuzh/tools-changedistiller/overview.

  7. https://code.google.com/archive/p/crystalsaf/.

  8. http://www.dofactory.com/sql/left-outer-join.

  9. https://bitbucket.org/sealuzh/tools-changedistiller/overview.

  10. https://bit.ly/2GoQjXc.

  11. https://github.com/deeplearning4j.

  12. https://bit.ly/2Uw4crB.

  13. https://github.com/googlesamples?language=java.

  14. https://bit.ly/2VkonNB.

  15. http://www.eclipse.org/documentation/.

  16. http://archive.org/download/stackexchange.

  17. https://bit.ly/2BgGyI9.

  18. https://bit.ly/2Gy3q9Q.

  19. https://bit.ly/2UQgaRZ.

  20. https://bit.ly/2GpE8tb.

  21. https://bit.ly/2ILs1cL.

  22. https://bit.ly/2UKaTLw.

  23. https://bit.ly/2ILs1cL.

References

  • Carpineto, C., Romano, G.: A survey of automatic query expansion in information retrieval. ACM Comput. Surv. 44(1), 1 (2012)

  • Chaparro, O., Florez, J.M., Marcus, A.: Using observed behavior to reformulate queries during text retrieval-based bug localization. In: IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE (2017)

  • Fischer, G., Henninger, S., Redmiles, D.: Cognitive tools for locating and comprehending software objects for reuse. In: Proceedings of the 13th International Conference on Software Engineering, pp. 318–328 (1991)

  • Fluri, B., Würsch, M., Pinzger, M., Gall, H.C.: Change distilling—tree differencing for fine-grained source code change extraction. IEEE Trans. Softw. Eng. 33(11), 725–743 (2007)

  • Haiduc, S., Bavota, G., Marcus, A., Oliveto, R., De Andrea, L., Menzies, T.: Automatic query reformulations for text retrieval in software engineering. In: Proceedings of the 35th International Conference on Software Engineering (ICSE), pp. 842–851 (2013)

  • Howard, M.J., Gupta, S., Pollock, L., Vijay-Shanker, K.: Automatically mining software-based, semantically-similar words from comment code mappings. In: Proceedings of the 10th Working Conference on Mining Software Repositories, pp. 377–386 (2013)

  • Keivanloo, I., Rilling, J., Zou, Y.: Spotting working code examples. In: Proceedings of the 36th International Conference on Software Engineering, pp. 664–675 (2014)

  • Lemos, O., Bajracharya, S., Ossher, J., Morla, R., Masiero, P., Baldi, P., Lopes, C.: CodeGenie: using test-cases to search and reuse source code. In: Proceedings of the Twenty-Second IEEE/ACM International Conference on Automated Software Engineering, pp. 525–526 (2007)

  • Lv, F., Zhang, H., Lou, J.-G., Wang, S., Zhang, D., Zhao, J.: CodeHow: effective code search based on API understanding and extended boolean model (E). In: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 260–270 (2015)

  • Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)

  • McMillan, C., Grechanik, M., Poshyvanyk, D., Fu, C., Xie, Q.: Exemplar: a source code search engine for finding highly relevant applications. IEEE Trans. Softw. Eng. 38(5), 1069–1087 (2012)

  • McMillan, C., Poshyvanyk, D., Grechanik, M., Xie, Q., Fu, C.: Portfolio: searching for relevant functions and their usages in millions of lines of code. ACM Trans. Softw. Eng. Methodol. 22(4), 1–30 (2013)

  • Nguyen, A.T., Hilton, M., Codoban, M., Nguyen, H.A., Mast, L., Rademacher, E., Nguyen, T.N., Dig, D.: API code recommendation using statistical learning from fine-grained changes. In: Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 511–522 (2016)

  • Nie, L., Jiang, H., Ren, Z., Sun, Z., Li, X.: Query expansion based on crowd knowledge for code search. IEEE Trans. Serv. Comput. 9(5), 771–783 (2016)

  • Proksch, S., Amann, S., Nadi, S., Mezini, M.: Evaluating the evaluations of code recommender systems: a reality check. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, Singapore, pp. 111–121 (2016)

  • Sadowski, C., Stolee, K.T., Elbaum, S.: How developers search for code: a case study. In: Proceedings of the 10th Joint Meeting on Foundations of Software Engineering (2015)

  • Salton, G., Fox, E.A., Wu, H.: Extended boolean information retrieval. Commun. ACM 26, 1022–1036 (1983)

  • Sim, S.E., Clarke, C.L.A., Holt, R.C.: Archetypal source code searches: a survey of software developers and maintainers. In: Proceedings of the International Workshop on Program Comprehension (IWPC ’98), pp. 180–187. IEEE (1998)

  • Sridhara, G., Hill, E., Pollock, L.L., Vijay-Shanker, K.: Identifying word relations in software: a comparative study of semantic similarity tools. In: Proceedings 16th IEEE International Conference on Program Comprehension (ICPC 08), pp. 123–132 (2008)

  • Stolee, K.T., Elbaum, S., Dobos, D.: Solving the search for source code. ACM Trans. Softw. Eng. Methodol. (TOSEM) 23(3), 26 (2014)

  • Sun, X., Liu, X., Hu, J., Zhu, J.: Empirical studies on the NLP techniques for source code data preprocessing. In: Proceedings of the 3rd International Workshop on Evidential Assessment of Software Technologies, pp. 32–39 (2014)

  • Tian, Y., Lo, D., Lawall, J.: SEWordSim: software-specific word similarity database. In: Companion Proceedings of the 36th International Conference on Software Engineering. ACM (2014)

  • Xu, B., Lin, H., Lin, Y.: Assessment of learning to rank methods for query expansion. J. Assoc. Inf. Sci. Technol. (JASIST) 67(6), 1345–1357 (2016)

  • Ye, X., Shen, H., Ma, X., Bunescu, R.C., Liu, C.: From word embeddings to document similarities for improved information retrieval in software engineering. In: Proceedings of the 38th International Conference on Software Engineering, pp. 404–415 (2016)

  • Youm, K.C., Ahn, J., Lee, E.: Improved bug localization based on code change histories and bug reports. Inf. Softw. Technol. 82, 177–192 (2017)

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61902162, 61762049, 61872272, 61877031, 61802350, 61862033, 61772246, 61562042 and 61672470).

Author information

Corresponding author

Correspondence to Qing Huang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Huang, Q., Wu, G. Enhance code search via reformulating queries with evolving contexts. Autom Softw Eng 26, 705–732 (2019). https://doi.org/10.1007/s10515-019-00263-5

  • DOI: https://doi.org/10.1007/s10515-019-00263-5