Abstract
Query Expansion (QE) is widely applied to improve the retrieval performance of ad-hoc search, using different techniques and several data sources to find expansion terms. In the Information Retrieval literature, selecting expansion terms remains a challenging task that relies on the extraction of term relationships. In this paper, we propose a new query expansion model based on learning to rank. The main idea is that, given a query and the set of its related association rules (ARs), our model ranks these ARs according to their relevance score with respect to the query and then selects the most suitable ones to be used in the QE process. Experiments are conducted on three test collections, namely CLEF2003, TREC-Robust and TREC-Microblog, which include long, hard and short queries. Results show that retrieval performance can be significantly improved when the AR ranking method is used, compared to other state-of-the-art expansion models, especially for hard and long queries.
Notes
By analogy to the itemset terminology used in data mining for a set of items.
By analogy to the itemset terminology used in data mining.
In this paper, we denote by |X| the cardinality of the set X.
Also referred to as preference learning in the literature.
This conclusion is consistent with the results obtained with the precision measures P@5, P@10 and NDCG@5.
A topic is considered difficult when the median of the average precision scores of all participants for that topic is below a given threshold (i.e. half of the systems score lower than the threshold), but there exists at least one high outlier. In this context, the most useful metric is geometric mean average precision (GMAP), which uses the geometric mean instead of the arithmetic mean when averaging precision values.
It includes word vectors for a vocabulary of 3 million words and phrases that they trained on roughly 100 billion words from a Google News dataset. The vector length is 300 features.
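To illustrate the note on difficult topics, the following minimal sketch contrasts the arithmetic mean (MAP) with the geometric mean (GMAP) of per-topic average precision. The function names, the sample scores and the small floor value `eps` are assumptions made for illustration and are not taken from the paper.

```python
import math


def mean_average_precision(ap_scores):
    """Arithmetic mean of per-topic average precision values (MAP)."""
    return sum(ap_scores) / len(ap_scores)


def geometric_mean_average_precision(ap_scores, eps=1e-5):
    """Geometric mean of per-topic average precision values (GMAP).

    eps is an assumed floor that keeps log() defined when a topic
    has an average precision of zero.
    """
    log_sum = sum(math.log(max(ap, eps)) for ap in ap_scores)
    return math.exp(log_sum / len(ap_scores))


# Hypothetical per-topic scores: two easy topics and one hard topic.
ap_scores = [0.60, 0.50, 0.01]

map_score = mean_average_precision(ap_scores)
gmap_score = geometric_mean_average_precision(ap_scores)

print(f"MAP  = {map_score:.4f}")
print(f"GMAP = {gmap_score:.4f}")
```

Because the geometric mean multiplies the scores, a single poorly served topic pulls GMAP down much more than it pulls MAP down, which is why GMAP is preferred when the focus is on hard queries.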
Cite this article
Bouziri, A., Latiri, C. & Gaussier, E. LTR-expand: query expansion model based on learning to rank association rules. J Intell Inf Syst 55, 261–286 (2020). https://doi.org/10.1007/s10844-020-00596-8
DOI: https://doi.org/10.1007/s10844-020-00596-8