Abstract
In this paper, we develop an interactive hierarchical topic search system. Topic names are generated mainly with an N-gram statistical language model, and the hierarchical (parent-child) relationships between topics are constructed mainly with the concept of mathematical sets. The set-based formulation not only lets the system build a topic hierarchy tree quickly, but also lets different users apply different binary operations to produce different interactive search results. Overall, this study has three advantages. First, the generated topic names are presented in a hierarchical form rather than a flat form. Second, interactive search is achieved without storing users' search and click histories, so our approach avoids both personal privacy concerns and large storage requirements. Finally, the concept of mathematical sets allows topic trees to be generated in linear time and lets users apply all possible binary set operations to meet various interactive search needs.
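The abstract names three mechanisms: N-gram candidate topic names, a topic tree derived from set containment, and user-chosen binary set operations over topics. The Python sketch below illustrates how these ideas can fit together; it is not the paper's implementation. Each topic is represented by the set of search-result snippet IDs that contain its label, a topic is placed under the smallest topic whose set strictly contains its own, and "and", "or" and "not" map to set intersection, union and difference. Everything else is an assumption of this sketch and hypothetical: the function names, whitespace tokenization, the `max_n`, `min_support` and `stop_words` parameters, the rule that keeps the longest label when several n-grams cover identical snippets, and the sample snippets. The hierarchy step here is quadratic in the number of topics, so it does not reproduce the linear-time construction the paper reports.

```python
import operator
from collections import defaultdict

# Hypothetical binary operations a user could apply to two topics' result sets.
SET_OPS = {"and": operator.and_, "or": operator.or_, "not": operator.sub}


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams of length n."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_topics(snippets, max_n=2, min_support=2, stop_words=frozenset()):
    """Map each word n-gram (n = 1..max_n) to the frozenset of snippet IDs containing it.

    N-grams that start or end with a stop word, or occur in fewer than
    min_support snippets, are discarded. Tokenization is plain whitespace splitting.
    """
    postings = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        tokens = text.lower().split()
        for n in range(1, max_n + 1):
            for gram in word_ngrams(tokens, n):
                words = gram.split()
                if words[0] in stop_words or words[-1] in stop_words:
                    continue
                postings[gram].add(doc_id)
    return {g: frozenset(ids) for g, ids in postings.items() if len(ids) >= min_support}


def merge_duplicates(topics):
    """Keep one label per distinct snippet set, preferring the longest n-gram."""
    best = {}
    for label, ids in topics.items():
        key = (len(label.split()), label)
        if ids not in best or key > (len(best[ids].split()), best[ids]):
            best[ids] = label
    return {label: ids for ids, label in best.items()}


def build_hierarchy(topics):
    """Attach each topic to the smallest topic whose snippet set strictly contains it.

    Topics are visited largest set first, so every proper superset of a topic
    has already been visited. This is O(k^2) in the number of topics k.
    """
    order = sorted(topics, key=lambda t: (-len(topics[t]), t))
    parent = {}
    for i, topic in enumerate(order):
        supersets = [p for p in order[:i] if topics[topic] < topics[p]]
        parent[topic] = min(supersets, key=lambda p: (len(topics[p]), p)) if supersets else None
    return parent


def combine(topics, a, b, op):
    """Apply a binary set operation to the snippet sets of topics a and b."""
    return sorted(SET_OPS[op](topics[a], topics[b]))


def print_tree(parent, topics, node=None, depth=0):
    """Print the topic tree with each topic's snippet IDs, children in label order."""
    for child in sorted(t for t, p in parent.items() if p == node):
        print("  " * depth + f"{child} {sorted(topics[child])}")
        print_tree(parent, topics, child, depth + 1)


if __name__ == "__main__":
    snippets = [
        "apple iphone 15 price",
        "apple iphone 15 review",
        "apple iphone 14 review",
        "apple watch price",
        "apple pie recipe",
    ]
    topics = merge_duplicates(candidate_topics(snippets))
    print_tree(build_hierarchy(topics), topics)
    print(combine(topics, "apple iphone", "review", "not"))  # iPhone snippets that are not reviews
```

On the sample snippets the script prints a three-level tree: `apple [0, 1, 2, 3, 4]` at the root, with `apple iphone [0, 1, 2]` (itself parent of `iphone 15 [0, 1]` and `review [1, 2]`) and `price [0, 3]` beneath it, followed by `[0]` for the "not" query. No search or click history is stored; the tree and every combination are computed from the current result set alone, in the spirit of the privacy argument above.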
Acknowledgements
We would like to thank the anonymous reviewers for their constructive comments, which have helped us improve the paper in several ways. This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grants MOST 108-2410-H-259-048-MY3 and 107-2410-H-259-016.
Cite this article
Chen, LC. Interactive Topic Search System Based on Topic Cluster Technology. Inf Syst Front 23, 1227–1243 (2021). https://doi.org/10.1007/s10796-020-10021-8