Abstract
Software developers rely heavily on online question-and-answer platforms for help with technical problems. However, a major problem with these technical Q&A sites is “answer hungriness”: a large number of questions remain unanswered or unresolved, and users must either wait a long time or painstakingly sift through provided answers of varying quality. To alleviate this time-consuming problem, we propose DEEPANS, a novel neural network–based approach that identifies the most relevant answer among a set of answer candidates. Our approach follows a three-stage process: question boosting, label establishment, and answer recommendation. Given a post, we first generate a clarifying question as a way of question boosting. We then automatically establish positive, neutral+, neutral-, and negative training samples via label establishment. For answer recommendation, we sort answer candidates by the matching scores calculated by our neural network–based model. To evaluate our model, we conducted a large-scale evaluation on four datasets collected from real-world technical Q&A sites (i.e., Ask Ubuntu, Super User, Stack Overflow Python, and Stack Overflow Java). Our experimental results show that our approach significantly outperforms several state-of-the-art baselines in automatic evaluation. We also conducted a user study with 50 solved/unanswered/unresolved questions. The user-study results demonstrate that our approach is effective in solving the answer-hungry problem by recommending the most relevant answers from historical archives.
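The answer-recommendation stage described above, which sorts candidates by a matching score, can be illustrated with a minimal sketch. The abstract does not specify the neural matching model, so the scorer here is a hypothetical token-overlap (Jaccard) stand-in, and the names `jaccard_score` and `recommend_answers` are assumptions made for illustration only. Question boosting and label establishment are not modeled.

```python
from typing import Callable, List, Tuple


def jaccard_score(question: str, answer: str) -> float:
    """Hypothetical stand-in scorer: token-overlap (Jaccard) similarity.

    This is NOT the neural matching model from the paper; it only gives
    the ranking step something to sort by.
    """
    q_tokens = set(question.lower().split())
    a_tokens = set(answer.lower().split())
    if not q_tokens or not a_tokens:
        return 0.0
    return len(q_tokens & a_tokens) / len(q_tokens | a_tokens)


def recommend_answers(
    question: str,
    candidates: List[str],
    score: Callable[[str, str], float] = jaccard_score,
) -> List[Tuple[str, float]]:
    """Return (answer, score) pairs sorted by descending matching score."""
    scored = [(answer, score(question, answer)) for answer in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data, invented for this example.
    question = "how to install python package with pip"
    candidates = [
        "restart the computer and try again",
        "use pip install to install the python package",
        "python is a programming language",
    ]
    for answer, s in recommend_answers(question, candidates):
        print(f"{s:.2f}  {answer}")
```

Because the scorer is passed in as a parameter, a learned matching model could in principle be substituted for the stand-in without changing the ranking logic.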
Index Terms
- Technical Q&A Site Answer Recommendation via Question Boosting