Abstract
Over the past decades, a number of studies have proposed applying topic modeling to research trend analysis. However, most previous studies have relied primarily on document publication year and have not incorporated the impact of articles into trend analysis. Unlike previous trend analyses based on topic modeling, we incorporate citation count, which can be viewed as a measure of an article's impact, to shed new light on the understanding of research trends. To this end, we propose the generalized Dirichlet multinomial regression (g-DMR) topic model, which improves the DMR topic model by replacing the linear inner product in the topic priors, \(\exp\left(\boldsymbol{x}_{d}\cdot \boldsymbol{\lambda}_{t}\right)\), with a more general form based on a topic distribution function (TDF), \(\exp\left(f\left(\boldsymbol{x}_{d}\right)\right)+\varepsilon\). We use multidimensional Legendre polynomials as the TDF to capture publication year and the number of citations per publication simultaneously. In the DMR model, metadata can affect the document-topic distribution only monotonically, and continuous values such as publication year and citation count need to be discretized, so it is difficult to observe the dynamic change of each topic. The g-DMR model, by contrast, can handle various orthogonal continuous variables with polynomials of arbitrary order, so it can show more dynamic topic trends. Two major experiments show that, for trend analysis in the field of Library and Information Science in general and Text Mining in particular, the proposed model is better suited than DMR to generating topics that take citation impact into account.
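The contrast between the two priors described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the coefficient values, the value of the constant ε, and the min-max scaling of metadata onto [-1, 1] are all assumptions made for illustration. The only library call relied on is NumPy's `legval2d`, which evaluates a two-dimensional Legendre series.

```python
import numpy as np
from numpy.polynomial import legendre


def scale(value, low, high):
    """Map a raw metadata value from [low, high] onto [-1, 1].

    Legendre polynomials are orthogonal on [-1, 1]; this min-max
    scaling is an assumption, not taken from the paper.
    """
    return 2.0 * (value - low) / (high - low) - 1.0


def dmr_prior(x, lam):
    """DMR-style prior for one topic: exp(x . lambda_t)."""
    return np.exp(np.dot(x, lam))


def gdmr_prior(x, coef, eps=1e-10):
    """g-DMR-style prior for one topic: exp(f(x)) + eps.

    Here f is a two-dimensional Legendre series,
    f(x) = sum_ij coef[i, j] * P_i(x[0]) * P_j(x[1]),
    where x[0] and x[1] are already scaled to [-1, 1].
    The default eps is a hypothetical value.
    """
    f = legendre.legval2d(x[0], x[1], coef)
    return np.exp(f) + eps


# Hypothetical coefficients: only the P_2(year) term is non-zero,
# so the prior rises at both ends of the year range and dips in
# the middle, a shape a single linear term cannot produce.
coef = np.zeros((3, 2))
coef[2, 0] = 1.0

for year in (2000, 2010, 2020):
    x = np.array([scale(year, 2000, 2020), 0.0])  # citation axis fixed at 0
    print(year, gdmr_prior(x, coef), dmr_prior(x, np.array([1.0, 0.0])))
```

With a single linear weight, the DMR-style prior can only increase or decrease along the year axis, whereas the second-order Legendre term lets the g-DMR-style prior change direction within the range.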
Acknowledgements
This work was supported by the National Research Foundation of Korea Grant funded by the Korean Government (NRF-2018S1A3A2075114).
About this article
Cite this article
Lee, M., Song, M. Incorporating citation impact into analysis of research trends. Scientometrics 124, 1191–1224 (2020). https://doi.org/10.1007/s11192-020-03508-3
DOI: https://doi.org/10.1007/s11192-020-03508-3