Question retrieval using combined queries in community question answering

Journal of Intelligent Information Systems

Abstract

Community question answering (cQA) has emerged as a popular web service that lets users ask and answer questions and browse historical question-answer (QA) pairs. As an alternative to general web search, cQA retrieval has several advantages. First, users can pose a query as natural language sentences instead of a set of keywords, and can therefore state the required information more clearly and comprehensively. Second, the system returns several possible answers instead of a long list of ranked documents, which makes it easier to locate the desired answer. Question retrieval from a cQA archive, an essential function of cQA retrieval services, aims to retrieve historical QA pairs relevant to the query question. In this study, combined queries (built from combined inverted and nextword indexes) are proposed for question retrieval in cQA. The performance of the method is investigated in two scenarios: (a) when only the questions from QA pairs are used as documents, and (b) when QA pairs are used as documents. In the proposed method, combined indexes are first created for both queries and documents; different information retrieval (IR) models are then used to retrieve relevant questions from the cQA archive. Evaluation on a public Yahoo! Answers dataset shows that using combined queries improves retrieval precision and ranking effectiveness for all three IR models (vector space model, Okapi model, and language model). Notably, when QA pairs are used as documents, combined indexes significantly increase the retrieval and ranking effectiveness of these cQA retrieval models.
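
The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It builds a positional inverted index and a nextword index (for each word, the words that follow it and where), then ranks questions with a simple vector space model over single words plus adjacent word pairs. The tokenizer, the TF-IDF weighting, the way word pairs are merged into the query, and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each word to {doc_id: [positions where the word occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def build_nextword_index(docs):
    """Map each word to {next word: {doc_id: [positions of the first word]}}."""
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for pos, (first, nxt) in enumerate(zip(tokens, tokens[1:])):
            index[first][nxt][doc_id].append(pos)
    return index


def combined_terms(text):
    """Single words plus adjacent word pairs, treated as one bag of terms."""
    tokens = tokenize(text)
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def rank(query, docs):
    """Rank doc ids by cosine similarity of TF-IDF vectors over combined terms."""
    doc_counts = {d: Counter(combined_terms(t)) for d, t in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents each combined term appears.
    df = Counter(term for counts in doc_counts.values() for term in counts)

    def weights(counts):
        # Log-scaled term frequency times a smoothed inverse document frequency.
        return {
            t: (1 + math.log(c)) * math.log(1 + n_docs / df[t])
            for t, c in counts.items()
            if t in df
        }

    q = weights(Counter(combined_terms(query)))
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    scores = {}
    for doc_id, counts in doc_counts.items():
        w = weights(counts)
        dot = sum(q[t] * w[t] for t in q if t in w)
        norm = q_norm * math.sqrt(sum(v * v for v in w.values()))
        scores[doc_id] = dot / norm if norm else 0.0
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical archive: question text only, as in scenario (a).
    archive = {
        "q1": "How do I install Python on Windows",
        "q2": "How do I learn Python",
        "q3": "Best Windows laptop for students",
    }
    print(rank("install python on windows", archive))
```

To approximate scenario (b), the same functions could be applied to documents formed by concatenating each question with its answer. Swapping the cosine scoring for an Okapi or language-model scorer would leave the indexing functions unchanged.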

Author information

Corresponding author

Correspondence to Abdul Majid.

Cite this article

Khushhal, S., Majid, A., Abbas, S.A. et al. Question retrieval using combined queries in community question answering. J Intell Inf Syst 55, 307–327 (2020). https://doi.org/10.1007/s10844-020-00612-x

  • DOI: https://doi.org/10.1007/s10844-020-00612-x