research-article

Technical Q&A Site Answer Recommendation via Question Boosting

Authors:
Zhipeng Gao, Monash University, Australia
Xin Xia, Monash University, Australia (ORCID 0000-0002-6302-3256)
David Lo, Singapore Management University, Singapore
John Grundy, Monash University, Australia

ACM Transactions on Software Engineering and Methodology, Volume 30, Issue 1, Article No. 11, pp. 1–34. https://doi.org/10.1145/3412845

Published: 31 December 2020

Abstract

Software developers rely heavily on online question-and-answer platforms to solve their technical problems. However, a major problem with these technical Q&A sites is “answer hungriness”: a large number of questions remain unanswered or unresolved, and users must either wait a long time or painstakingly go through the provided answers, which vary in quality. To alleviate this time-consuming problem, we propose DeepAns, a novel neural network–based approach that identifies the most relevant answer among a set of answer candidates. Our approach follows a three-stage process: question boosting, label establishment, and answer recommendation. Given a post, we first generate a clarifying question as a way of question boosting. We then automatically establish positive, neutral⁺, neutral⁻, and negative training samples via label establishment. For answer recommendation, we sort the answer candidates by the matching scores calculated by our neural network–based model. To evaluate the performance of our proposed model, we conducted a large-scale evaluation on four datasets collected from real-world technical Q&A sites (Ask Ubuntu, Super User, Stack Overflow Python, and Stack Overflow Java). Our experimental results show that our approach significantly outperforms several state-of-the-art baselines in automatic evaluation. We also conducted a user study with 50 solved/unanswered/unresolved questions. The user-study results demonstrate that our approach is effective in addressing the answer-hungry problem by recommending the most relevant answers from historical archives.
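
The answer-recommendation stage described in the abstract, sorting answer candidates by a matching score, can be illustrated with a minimal sketch. This is not the DeepAns model: the function names (jaccard_score, rank_answers) are hypothetical, the neural matching score is replaced by a simple token-overlap (Jaccard) placeholder, and appending the clarifying question to the original question is an assumed way of "boosting" it that the abstract does not specify.

```python
from typing import Callable, List, Tuple


def jaccard_score(question: str, answer: str) -> float:
    """Placeholder matching score: token-set Jaccard overlap.

    Assumption: this stands in for the neural matching model, which the
    abstract does not describe in detail.
    """
    q_tokens = set(question.lower().split())
    a_tokens = set(answer.lower().split())
    if not q_tokens or not a_tokens:
        return 0.0
    return len(q_tokens & a_tokens) / len(q_tokens | a_tokens)


def rank_answers(
    question: str,
    clarifying_question: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float] = jaccard_score,
) -> List[Tuple[str, float]]:
    """Score each candidate against the boosted question, highest first.

    Assumption: "boosting" is modelled here as plain concatenation of the
    question and its clarifying question.
    """
    boosted = f"{question} {clarifying_question}".strip()
    scored = [(answer, score_fn(boosted, answer)) for answer in candidates]
    # sorted() is stable, so candidates with equal scores keep input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranked = rank_answers(
        question="how to install pip on ubuntu",
        clarifying_question="which ubuntu version are you using",
        candidates=[
            "restart your router",
            "run sudo apt install python3-pip on ubuntu",
        ],
    )
    for answer, score in ranked:
        print(f"{score:.3f}  {answer}")
```

Passing the scorer as score_fn keeps the ranking step independent of how the score is computed, so a trained model's scoring function could be substituted without changing the sorting logic.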

Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Maintaining software
      2. Software evolution

Published in
ACM Transactions on Software Engineering and Methodology Volume 30, Issue 1
Continuous Special Section: AI and SE
January 2021
444 pages
ISSN:1049-331X
EISSN:1557-7392
DOI:10.1145/3446626
Editor:
Mauro Pezzè
Università della Svizzera italiana and Università di Milano-Bicocca, Switzerland
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 31 December 2020
- Revised: 1 July 2020
- Accepted: 1 July 2020
- Received: 1 December 2019
Author Tags
CQA
deep neural network
question answering
question boosting
sequence-to-sequence
weakly supervised learning
Qualifiers
- research-article
- Research
- Refereed