Interactive Topic Search System Based on Topic Cluster Technology

Information Systems Frontiers

Abstract

In this paper, we develop an interactive hierarchical topic search system. The system generates topic names mainly with an N-gram statistical language model and builds the hierarchical tree relationships between topics mainly from the concept of mathematical sets. The set-based formulation not only helps the system construct a topic hierarchy tree quickly, but also lets different users apply different binary operations to obtain different interactive search results. The approach has three advantages. First, the generated topic names are presented in a hierarchical rather than a flat form. Second, interactive search is achieved without storing users' search and click histories, so the approach avoids personal privacy concerns and large storage requirements. Finally, the set-based formulation allows topic trees to be generated in linear time and allows users to run all possible binary operations to meet various interactive search needs.
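
To make the set-based idea concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes that each candidate topic is a word N-gram, that a topic is represented by the set of snippet ids containing it, and that a topic's parent is the smallest topic whose set strictly contains it. The function names (`ngram_doc_sets`, `parent_of`, `combine`) and the sample snippets are hypothetical, and the naive superset scan here is quadratic, so it does not reproduce the linear-time construction mentioned in the abstract.

```python
from collections import defaultdict


def ngram_doc_sets(snippets, max_n=2):
    """Map each word n-gram (1..max_n words) to the set of snippet ids containing it."""
    doc_sets = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        words = text.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                doc_sets[" ".join(words[i:i + n])].add(doc_id)
    return dict(doc_sets)


def parent_of(topic, doc_sets):
    """Return the smallest strict-superset topic, or None if the topic is a root.

    Assumption for illustration: ties are broken alphabetically.
    """
    docs = doc_sets[topic]
    candidates = [t for t, d in doc_sets.items() if docs < d]  # strict superset test
    if not candidates:
        return None
    return min(candidates, key=lambda t: (len(doc_sets[t]), t))


def combine(doc_sets, a, b, op):
    """Apply a binary set operation to the snippet sets of topics a and b."""
    ops = {
        "and": set.intersection,
        "or": set.union,
        "not": set.difference,
        "xor": set.symmetric_difference,
    }
    return ops[op](doc_sets[a], doc_sets[b])


# Hypothetical snippets standing in for search results.
snippets = [
    "jaguar car dealer",
    "jaguar car review",
    "jaguar animal habitat",
]
doc_sets = ngram_doc_sets(snippets)

print(parent_of("car", doc_sets))                 # topic whose snippets strictly contain those of "car"
print(combine(doc_sets, "jaguar", "car", "not"))  # snippets about "jaguar" but not "car"
```

Because every topic is just a set of snippet ids, an interactive refinement such as "jaguar but not car" needs no stored user history: it is computed on the fly from the current result set.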

Acknowledgements

We would like to thank the anonymous reviewers for their constructive comments, which helped us improve the paper in several ways. This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grants MOST 108-2410-H-259-048-MY3 and 107-2410-H-259-016.

Author information

Correspondence to Lin-Chih Chen.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Chen, LC. Interactive Topic Search System Based on Topic Cluster Technology. Inf Syst Front 23, 1227–1243 (2021). https://doi.org/10.1007/s10796-020-10021-8

  • DOI: https://doi.org/10.1007/s10796-020-10021-8