An overview of the history of Science of Science in China based on the use of bibliographic and citation data: a new method of analysis based on clustering with feature maximization and contrast graphs

Scientometrics

Abstract

In the first part of this paper, we discuss the historical context of Science of Science, both in China and worldwide. In the second part, we use an unsupervised combination of GNG clustering with feature maximization metrics and associated contrast graphs to analyse the contents of selected academic journal papers in Science of Science in China and to construct an overall map of the structure of the research topics over the last 40 years. Furthermore, we highlight how the topics have evolved through analysis of publication dates, and we use author information to clarify the topics’ content. The results have been reviewed and approved by three leading experts in this field. They show that Chinese Science of Science has gradually matured over the last 40 years, evolving from the general nature of the discipline itself to related disciplines and their potential interactions, from qualitative analysis to quantitative and visual analysis, and from general research on the social function of science to more specific studies of its economic and strategic functions. Consequently, the proposed method can be used without supervision, without parameters, and without help from any external knowledge to obtain clear and precise insights into the development of a scientific domain. Finally, domain experts compared the output of the topic extraction part of the method (clustering + feature maximization) with the output of the well-known LDA approach; this comparison highlights the clear superiority of the proposed approach.



Notes

  1. It is worth noting that very recent scientific contributions on Science of Science seem to indicate that the research domain is broadening its scope again and returning to Bernal’s original paradigm. Examples of this are the works by Zeng et al. (2017), published in Physics Reports and Science by the System Science Research Team of Beijing Normal University in China; Fortunato et al. (2018), a collaboration between authors from Indiana University (USA) and Leiden University (The Netherlands); and recent high-quality papers published by the Complex Networks Research Team of Northeastern University (USA) (Huang et al. 2012; Wang and Barabási 2013; Shen and Barabási 2014; Sinatra et al. 2017).

  2. http://cluster.ischool.drexel.edu/~cchen/citespace/download/.

  3. See Acknowledgment section for more precise information on experts.

  4. Because the CNKI database was queried in Chinese, we used two query terms: "Science of Science" is rendered in two different ways in that language, "科学学" and "科学的科学".

  5. Chinese Social Sciences Citation Index.

  6. For articles prior to 1997 that did not contain a summary or keywords we only used the information in the title.

  7. http://ictclas.nlpir.org/.

  8. The frequency threshold of 6 was found empirically. It means the description space can be significantly reduced while allowing for accurate clustering (the quality of which was estimated both by the experts and by our quality measures presented in section "Clustering and optimal model detection"). No documents were deleted by this process.

  9. In this article, the features represent the words extracted from the title, abstract and keywords of the articles, the weights of the features are the adjusted frequency information associated with them and the unsupervised classification (clustering) is based on the GNG algorithm.

  10. A feature with negative values can be separated into 2 different positive sub-features without loss of information. The first represents the positive part of the original feature and the second its negative part.

  11. Our measures behave similarly whether they are applied to classes or to clusters.

  12. Such as the Dunn index (Dunn 1974), the Davies-Bouldin index (Davies and Bouldin 1979), the Silhouette index (Rousseeuw 1987), the Caliński-Harabasz index (Caliński and Harabasz 1974) or the Xie-Beni index (Xie and Beni 1991).

  13. In Eq. 7, labels represent categories or clusters to which data are associated.

  14. In many cases, each data item can have several external labels of the same kind. For example, a research paper can have several different authors.

  15. In this case the opinion of our domain experts themselves (see "Acknowledgment" section for expert descriptions).

  16. Yang Xiaolin. Thirty years of science and Technology Policy Research—Wu Mingyu's oral autobiography [M]. Hunan Education Press 2015.

  17. From 2009, five training seminars on knowledge mapping were held at WISELAB (Dalian University of Technology), disseminating the methods and the thinking behind them widely in China. This approach also led to the present paper, which uses a specific mapping tool to highlight the structure and the evolution of the Science of Science domain in China.

  18. See section "Data collection and preprocessing" for more details.

  19. https://radimrehurek.com/gensim/.
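The decomposition described in note 10 can be illustrated with a minimal sketch. It assumes the data are held as a NumPy matrix with one row per document and one column per feature; the function names `split_signed_features` and `merge_signed_features` are hypothetical and are not taken from the paper. Each signed feature is replaced by two non-negative sub-features, one holding its positive part and one holding the magnitude of its negative part, and the original values can be recovered exactly by subtraction.

```python
import numpy as np


def split_signed_features(X):
    """Replace each signed feature by two non-negative sub-features.

    The first half of the output columns holds the positive parts,
    the second half holds the magnitudes of the negative parts.
    """
    X = np.asarray(X, dtype=float)
    positive_part = np.clip(X, 0, None)   # max(x, 0)
    negative_part = np.clip(-X, 0, None)  # max(-x, 0)
    return np.hstack([positive_part, negative_part])


def merge_signed_features(S):
    """Recover the original signed features from the split representation."""
    S = np.asarray(S, dtype=float)
    d = S.shape[1] // 2
    return S[:, :d] - S[:, d:]


if __name__ == "__main__":
    X = np.array([[1.5, -2.0],
                  [0.0, 3.0]])
    S = split_signed_features(X)
    print(S)                         # all entries are non-negative
    print(merge_signed_features(S))  # equals X, so no information is lost
```

The round trip through `merge_signed_features` is what "without loss of information" means here: the split representation has twice as many columns, but every original value can be reconstructed.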

References

  • Attik, M., Lamirel, J.-C., & Al Shehabi, S. (2006). Clustering analysis for data with multiple labels. In Proceedings of IASTED international conference on databases and applications (DBA), Innsbruck, Austria.

  • Bernal, J. (1939). The social function of science. London: George Routledge & Sons Ltd.

  • Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993–1022.

  • Caliński, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics-Theory and Methods, 3(1), 1–27.

  • Chen, Y., & Liu, Z. (2005). The rise of mapping knowledge domain. Studies in Science of Science, 23(2), 149–154. (in Chinese).

  • Chen, Y., Zhang, L., & Liu, Z. (2017). The prelude of the science of science in the world: The third Copernican revolution initiated in Poland. Studies in Science of Science, 35(1), 4–10. (in Chinese).

  • Cuxac, P., & Lamirel, J.-C. (2013). Analysis of evolutions and interactions between science fields: the cooperation between feature selection and graph representation. In 14th COLLNET meeting.

  • Davies, D. L., & Bouldin, D. W. (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2, 224–227.

  • Dunn, J. C. (1974). Well-separated clusters and optimal fuzzy partitions. Journal of Cybernetics, 4(1), 95–104.

  • Etzkowitz, H., & Leydesdorff, L. (Eds.) (1997). Universities and the global knowledge economy: A triple helix of university-industry-government relations. New York: Pinter.

  • Fortunato, S., Bergstrom, C. T., Börner, K., Evans, J. A., Helbing, D., Milojević, S., et al. (2018). Science of science. Science. https://doi.org/10.1126/science.aao0185.

  • Fritzke, B. (1995). A growing neural gas network learns topologies. In Advances in neural information processing systems (pp. 625–632).

  • He, Y., Chen, Y., Cui, Y., & Liu, Z. (2017). Research approaches and prospects in the subject science of science based on the analysis of Bernal Price. Studies in Science of Science, 35(8), 1121–1129. (in Chinese).

  • Hessen, B. M. (1931). The social and economic roots of Newton’s principia. London: Science at the Cross Roads.

  • Hoffman, M. D., Blei, D. M., & Bach, F. (2010). Online learning for Latent Dirichlet allocation. In 24th conference on neural information processing systems (NIPS), pp. 856–864.

  • Huang, J., Cheng, X. Q., Shen, H. W., et al. (2012). Exploring social influence via posterior effect of word-of-mouth recommendations. In ACM international conference on web search and data mining (pp. 573–582). ACM.

  • Kassab, R., & Lamirel, J.-C. (2008). Feature-based cluster validation for high-dimensional data. In Proceedings of the 26th IASTED international conference on artificial intelligence and applications (pp. 232–239). ACTA Press.

  • Kobourov, S. G. (2012). Spring embedders and force directed graph drawing algorithms. arXiv preprint arXiv:1201.3011.

  • Lamirel, J.-C., Cuxac, P., Chivukula, A. S., & Hajlaoui, K. (2015). Optimizing text classification through efficient feature selection based on quality metric. Journal of Intelligent Information Systems, 45(3), 379–396. https://doi.org/10.1007/s10844-014-0317-4.

  • Lamirel, J.-C., Dugué, N., & Cuxac, P. (2016). New efficient clustering quality indexes. In 2016 International joint conference on neural networks (IJCNN), pp. 3649–3657. IEEE.

  • Lamirel, J.-C., Mall, R., Cuxac, P., & Safi, G. (2011). Variations to incremental growing neural gas algorithm based on label maximization. In The 2011 international joint conference on neural networks (IJCNN), pp. 956–965. IEEE.

  • Liu, Z. (2017). Feng Zhijun’s puzzle: What is the core theory of the science of science? Studies in Science of Science, 35(5), 655–660. (in Chinese).

  • Liu, Z., Chen, Y., & Zhu, X. (2013). D.J. Price’s contribution to theory of the science of science. Studies in Science of Science, 31(12), 1762–1772. (in Chinese).

  • MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (vol. 1, pp. 281–297). Oakland, CA, USA.

  • Pu, G., & Di, R. (1998). The cognitive turn of sociology of science. Journal of Dialectics of Nature, 5, 29–34. (in Chinese).

  • Qian, W., & Li, X. (2012). J.D. Bernal and China. Science & Culture Review, 16–32. (in Chinese).

  • Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65.

  • Shen, H. W., & Barabási, A. L. (2014). Collective credit allocation in science. Proceedings of the National Academy of Sciences of the United States of America, 111(34), 12325.

  • Sinatra, R., Wang, D., Deville, P., et al. (2017). Quantifying the evolution of individual scientific impact. Science, 354(6312), aaf5239.

  • Tsien, H. (1979). Science of science, studies in science and technology system. Marx’s Philosophy Philosophical Researches, 1, 20–27. (in Chinese).

  • Tsien, H. (1980). On the establishment and development of Marxist science of science for the foundation of research management. Research Management, (1), 3–8.

  • Wang, D., & Barabási, A. L. (2013). Quantifying long-term scientific impact. Science, 342(6154), 127–132.

  • Xie, X. L., & Beni, G. (1991). A validity measure for fuzzy clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(8), 841–847.

  • Zeng, A., Shen, Z., Zhou, J., Wu, J., Fan, Y., Wang, Y., et al. (2017). The science of science: From the perspective of complex systems. Physics Reports, 714–715, 1–73. https://doi.org/10.1016/j.physrep.2017.10.001.

  • Zhao, H., & Jiang, G. (1983). Great facts, great subjects. Science of Science and S&T Management, 3(3). (in Chinese).

  • Zhao, H., & Jiang, G. (1988). Hessen episode and the origin of science of science. Studies in Science of Science, 6(1), 14–23. (in Chinese).

Acknowledgements

Two of the authors of this paper and an additional renowned researcher served as the domain experts for this work. Their help proved invaluable in analysing the results of the proposed methodology. Professor Liu Zeyuan is one of the pioneers of Science of Science in China and one of the most important contributors to the field. He is one of the founders of the scientific societies in Science of Science in China and also the founder of WISELAB at Dalian University of Technology, a laboratory that is itself a pioneer in the quantitative study of Science of Science in China. Professor Liming Liang is one of the pioneering researchers in Scientometrics in China. She is the Chinese researcher with the largest international academic reputation in that domain. Professor Chen Yue is the current director of WISELAB. Under his influence, this laboratory developed the use of modern methods of science analysis, especially cartographic methods. Today, this laboratory is considered one of the three main Chinese laboratories in the field of Science of Science. We also wish to warmly thank Richard Dickinson for the careful proofreading of our paper.

Author information

Corresponding author

Correspondence to Jean-Charles Lamirel.

Additional information

Professor Liu Zeyuan, an author of this paper, passed away on February 8th, 2020, during the review process.

About this article

Cite this article

Lamirel, JC., Chen, Y., Cuxac, P. et al. An overview of the history of Science of Science in China based on the use of bibliographic and citation data: a new method of analysis based on clustering with feature maximization and contrast graphs. Scientometrics 125, 2971–2999 (2020). https://doi.org/10.1007/s11192-020-03503-8

  • DOI: https://doi.org/10.1007/s11192-020-03503-8
