Scientometrics for management of science: collaboration and knowledge structures and complexities in an interdisciplinary research project

Scientometrics

Abstract

Scientific research has been facing the problem of increasing size and complexity. Hence, scientometric methodology is expected to be used for the management of science, given the need for strategic and organizational management of basic and academic scientific research. The present study focuses on collaboration and knowledge structures in an interdisciplinary research project and proposes an evaluation framework that assists data-driven decision-making in scientific research by measuring the complexity of these structures. Co-author and co-word networks were constructed from the bibliographic information of publications produced by an interdisciplinary research project, and cross-sectional and longitudinal analyses of complexity were conducted by calculating graph entropy. Observation of these networks revealed different structural features and modes of time evolution. In the cross-sectional analysis, correlations between the amounts of change in the two complexities suggested that the collaboration structure may induce changes in the knowledge structure. The longitudinal analysis showed that the increment in complexity decreased over time, reflecting the transition of the observed project's strategy from promoting interdisciplinary research to integrating research outcomes to realize its vision. The proposed framework will allow real-time and evidence-based management practices to be implemented in scientific research.

Notes

  1. The term “management of science” is used as a contrast to technology management, an established discipline that deals with strategic and organizational management theories for research and development in industry.

  2. For example, citation analysis is a method for impact assessment based on networks representing the citation structure of academic articles. However, such analysis is not suitable for real-time evaluation because there is a time lag before a paper is cited.

  3. https://coins.kawasaki-net.ne.jp/en/index.html [last visited on 12 Aug 2020].

  4. https://www.jst.go.jp/tt/EN/platform/coi.html [last visited on 12 Aug 2020].

  5. Note that the micro, meso, and macro levels are relative concepts. In the present study, the structure of the project as a whole is defined as the macro-structure. However, the research programs funding the project, as well as science and technology innovation policies, correspond to structures at an even more macro level.

  6. This information was obtained from internal meetings, symposia, and personal communication with researchers and staff working at the core organization (iCONM) through the author’s participation as a member of the COINS.

  7. To measure system complexity in more detail, one should consider not only the objective complexity that depends on system size, such as the graph entropy introduced in the present study, but also the subjective complexity that reflects the disparity (or distance) from a reference model (Efatmaneshnik and Ryan 2016).

  8. The present study is intended to measure structural complexity that reflects the strength of ties between network components (i.e., authors or words). For example, the complexity can differ in magnitude even when the number of edges is the same, because the tie between researchers who have collaborated on many co-authored papers over a long period differs in strength from the tie between a student and a supervisor who appear together on only a single co-authored paper.

  9. For the creation of interdisciplinary project outcomes, the research leader referred to the necessity of a mindset in which individual scientists voluntarily set research agendas to realize the vision of the project.

  10. In management practice, it has been emphasized that evidence-based knowledge alone is not the sole basis for decision-making (Lengnick-Hall and Griffith 2011). Desirable decision-making requires balanced and appropriate use of the knowledge accessible at the time.

  11. Not all of them are familiar with the individual research topics of state-of-the-art science. Because it is desirable for managers also to understand such research topics, scientists themselves often play the role of project managers in scientific research. Meanwhile, recent research and development projects increasingly emphasize transdisciplinary research, and the management of scientific research involving various stakeholders by scientists alone has already reached its limits.

  12. This problem does not occur in a co-author network because the authors are determined by the publications.

  13. We assigned a subtheme to each publication to determine the meso-level attributes of all authors, because the internal documents do not cover all authors. In addition, some members belonged to different subthemes depending on the year; such members were considered to be included in a subtheme if they belonged to it in any year.

References

  • Anzai, T., Kusama, R., Kodama, H., & Sengoku, S. (2012). Holistic observation and monitoring of the impact of interdisciplinary academic research projects: An empirical assessment in Japan. Technovation, 32(6), 345–357.

  • Ávila-Robinson, A., & Sengoku, S. (2017a). Multilevel exploration of the realities of interdisciplinary research centers for the management of knowledge integration. Technovation, 62–63, 22–41.

  • Ávila-Robinson, A., & Sengoku, S. (2017b). Tracing the knowledge-building dynamics in new stem cell technologies through techno-scientific networks. Scientometrics, 112(3), 1691–1720.

  • Barre, R. (2019). Les indicateurs sont morts, vive les indicateurs! Towards a political economy of S&T indicators: A critical overview of the past 35 years. Research Evaluation, 28(1), 2–6.

  • Börner, K., Contractor, N., Falk-Krzesinski, H. J., Fiore, S. M., Hall, K. L., Keyton, J., Spring, B., Stokols, D., Trochim, W., & Uzzi, B. (2010). A multi-level systems perspective for the science of team science. Science Translational Medicine, 2(49), 49cm24.

  • Efatmaneshnik, M., & Ryan, M. J. (2016). A general framework for measuring system complexity. Complexity, 21(S1), 533–546.

  • Etzkowitz, H. (1998). The norms of entrepreneurial science: Cognitive effects of the new university-industry linkages. Research Policy, 27(8), 823–833.

  • Etzkowitz, H., & Leydesdorff, L. (2000). The dynamics of innovation: From national systems and “Mode 2” to a Triple Helix of university–industry–government relations. Research Policy, 29, 109–123.

  • Fortunato, S., Bergstrom, C. T., Börner, K., Evans, J. A., Helbing, D., Milojević, S., Petersen, A. M., Radicchi, F., Sinatra, R., Uzzi, B., Vespignani, A., Waltman, L., Wang, D., & Barabási, A. L. (2018). Science of science. Science, 359(6379).

  • Fruchterman, T. M. J., & Reingold, E. M. (1991). Graph drawing by force-directed placement. Software: Practice and Experience, 21, 1129–1164.

  • Gomez-Pilar, J., Poza, J., Bachiller, A., Gómez, C., Núñez, P., Lubeiro, A., Molina, V., & Hornero, R. (2018). Quantification of graph complexity based on the edge weight distribution balance: Application to brain networks. International Journal of Neural Systems, 28(1), 1–19.

  • Henderson, R. M., & Clark, K. B. (1990). Architectural innovation: The reconfiguration of existing product technologies and the failure of established firms. Administrative Science Quarterly, 35(1), 9–30.

  • Kapsali, M. (2011). Systems thinking in innovation project management: A match that works. International Journal of Project Management, 29(4), 396–407.

  • Kapsali, M. (2013). Equifinality in project management exploring causal complexity in projects. Systems Research and Behavioral Science, 30(1), 2–14.

  • Kell, D. B., & Oliver, S. G. (2004). Here is the evidence, now what is the hypothesis? The complementary roles of inductive and hypothesis-driven science in the post-genomic era. BioEssays, 26(1), 99–105.

  • Kodama, H., Watatani, K., & Sengoku, S. (2013). Competency-based assessment of academic interdisciplinary research and implication to university management. Research Evaluation, 22(2), 93–104.

  • Lengnick-Hall, C. A., & Griffith, R. J. (2011). Evidence-based versus tinkerable knowledge as strategic assets: A new perspective on the interplay between innovation and application. Journal of Engineering and Technology Management, 28(3), 147–167.

  • Leydesdorff, L. (2003). The mutual information of university-industry-government relations: An indicator of the Triple Helix dynamics. Scientometrics, 58(2), 445–467.

  • Mackenzie, K. D. (1966). The information theoretic entropy function as a total expected participation index for communication network experiments. Psychometrika, 31(2), 249–254.

  • Ràfols, I. (2019). S&T indicators in the wild: Contextualization and participation for responsible metrics. Research Evaluation, 28(1), 7–22.

  • Rousseau, D. M. (2006). Is there such a thing as “Evidence-based Management”? Academy of Management Review, 31(2), 256–269.

  • Sen, B., Chu, S. H., & Parhi, K. K. (2019). Ranking regions, edges and classifying tasks in functional brain graphs by sub-graph entropy. Scientific Reports, 9(1), 1–20.

  • Sen, B., Bernstein, G. A., Mueller, B. A., Cullen, K. R., & Parhi, K. K. (2020). Sub-graph entropy based network approaches for classifying adolescent obsessive-compulsive disorder from resting-state functional MRI. NeuroImage: Clinical, 26, 102208.

  • Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379–423.

  • Silva, F. N., Rodrigues, F. A., Oliveira, O. N., Jr., & Costa, L. D. (2013). Quantifying the interdisciplinarity of scientific journals and fields. Journal of Informetrics, 7(2), 469–477.

  • Stirling, A. (2007). A general framework for analysing diversity in science, technology and society. Journal of the Royal Society Interface, 4(15), 707–719.

  • Stokols, D., Hall, K. L., Taylor, B. K., & Moser, R. P. (2008a). The science of team science. Overview of the field and introduction to the supplement. American Journal of Preventive Medicine, 35(2), S77–S89.

  • Stokols, D., Misra, S., Moser, R. P., Hall, K. L., & Taylor, B. K. (2008b). The ecology of team science. Understanding contextual influences on transdisciplinary collaboration. American Journal of Preventive Medicine, 35(2), S96–S115.

  • Toutanova, K., & Manning, C. (2000). Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In: Proceedings of the 2000 Joint SIGDAT Conference EMNLP/VLC, pp. 63–71.

  • Van Helden, P. (2013). Data-driven hypotheses. EMBO Reports, 14(2), 104.

  • Wagner, C. S., Roessner, J. D., Bobb, K., Klein, J. T., Boyack, K. W., Keyton, J., Rafols, I., & Börner, K. (2011). Approaches to understanding and measuring interdisciplinary scientific research (IDR): A review of the literature. Journal of Informetrics, 5(1), 14–26.

  • Wang, B., & Wang, Z. (2018). Heterogeneity evaluation of China’s provincial energy technology based on large-scale technical text data mining. Journal of Cleaner Production, 202, 946–958.

  • Wang, B., Liu, S., Ding, K., Liu, Z., & Xu, J. (2014). Identifying technological topics and institution-topic distribution probability for patent competitive intelligence analysis: A case study in LTE technology. Scientometrics, 101(1), 685–704.

  • Wang, Q., & Schneider, J. W. (2020). Consistency and validity of interdisciplinarity measures. Quantitative Science Studies, 1(1), 239–263.

Funding

This study was funded by the Center of Innovation Program from Japan Science and Technology Agency (JST, Grant No. JPMJCE1305).

Author information

Corresponding author

Correspondence to Shintaro Sengoku.

Electronic supplementary material

The electronic supplementary material comprises the following files:

Supplementary file1 (MP4 117 kb)

Supplementary file2 (MP4 183 kb)

Appendix

Several processing steps are required to convert raw bibliographic data downloaded from the Web of Science into a form suitable for analysis. This appendix describes the detailed procedures used to construct the following datasets:

  • An aggregated dataset of bibliographic information

  • A text corpus of titles and abstracts by publication

  • Panel data of the summarized lists of institutions and publications by author

  • Panel data of the adjacency matrices of co-author networks

  • Panel data of the adjacency matrices of co-word networks

In constructing these datasets, we mainly used pandas, a Python library for data manipulation and analysis. In addition, we describe how network objects were generated from the resulting adjacency matrices and the parameters used to draw the network diagrams in Gephi.

Preprocessing of collected data: converting BibTeX files into JSON format

To analyze the complexity of the collaboration and knowledge structures, we used BibTeX files downloaded from the Web of Science database. Because raw BibTeX files are not convenient to handle in Python, they were converted into JSON format. The bibliographic information was extracted from the raw BibTeX files using BibtexParser, a Python library for parsing such files, and files that BibtexParser could not open were excluded from the dataset. Finally, the bibliographic data extracted from 281 publications were aggregated into a dataset containing the following information for each publication: document type, list of authors, publication year, title, abstract text, and so on.
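The conversion step can be sketched with the Python standard library alone. The sketch assumes each BibTeX file has already been parsed into a list of field dictionaries, which is how BibtexParser's version-1 API exposes entries (`BibDatabase.entries`, with fields such as `author` and `title` plus an `ENTRYTYPE` key); a file that failed to parse is represented by `None`. The helper names and the chosen output fields are illustrative, not the authors' code.

```python
import json
import re


def clean(value: str) -> str:
    """Collapse line breaks and repeated spaces left over from BibTeX formatting."""
    return re.sub(r"\s+", " ", value).strip()


def to_record(entry: dict) -> dict:
    """Turn one parsed BibTeX entry (a dict of field strings) into a JSON-ready record."""
    raw_authors = clean(entry.get("author", ""))
    year = clean(entry.get("year", ""))
    return {
        "entry_type": entry.get("ENTRYTYPE", ""),  # e.g. "article"
        # BibTeX separates authors with " and "; empty pieces are dropped.
        "authors": [a for a in raw_authors.split(" and ") if a],
        "year": int(year) if year.isdigit() else None,
        "title": clean(entry.get("title", "")),
        "abstract": clean(entry.get("abstract", "")),
    }


def aggregate(parsed_files: list) -> list:
    """Flatten the entries of all parsed files; None stands for a file that failed to parse."""
    records = []
    for entries in parsed_files:
        if entries is None:  # unreadable file: excluded from the dataset
            continue
        records.extend(to_record(e) for e in entries)
    return records


if __name__ == "__main__":
    parsed = [
        [{"ENTRYTYPE": "article", "author": "Doe, J. and\n  Roe, R.",
          "year": "2019", "title": "An  example title", "abstract": "Example text."}],
        None,  # e.g. a file the parser could not open
    ]
    print(json.dumps(aggregate(parsed), indent=2))
```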

How to build a text corpus

Building a co-word network requires a corpus of key phrases extracted from the titles and abstracts of publications. In the present study, this corpus, which contains lists of key phrases extracted per sentence from each title and abstract, was built using the Natural Language Toolkit (NLTK), a Python library for natural language processing. The procedure to build the text corpus was as follows:

  1. The text of the title and abstract of each publication was extracted from the bibliographic dataset.

  2. To facilitate key phrase extraction, the text was split into sentences and preprocessed.

  3. The sentences were tokenized, and symbolic strings were removed from some of the resulting tokens.

  4. Part-of-speech (POS) tagging was performed on the tokens of each tokenized sentence.

  5. Rule-based key phrase extraction was applied to the tokenized sentences, and the words making up the extracted key phrases were lemmatized.

  6. Common words were removed from the extracted key phrases as stop words.

To ensure the reproducibility of the analysis, that is, to eliminate subjective judgment as far as possible, regular expressions were employed in the above processes. Applying these processes to the bibliographic dataset yielded a corpus of key phrases.

First, the text of the title and abstract was split into sentences using NLTK's “sent_tokenize” function, because the co-occurrence of words and phrases is defined at the sentence level. To eliminate the influence of words and phrases that are not relevant to the research content, the following descriptions were then removed from each sentence:

  • Descriptions about copyright or publisher, such as “(C) 2015 Elsevier,” “Published by Elsevier,” and so on.

  • Descriptions about statistical analysis, such as “(p < 0.05)” and “(N = 30).”

  • Abbreviations such as “(PEG)” in “poly(ethylene glycol) (PEG).”
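A minimal sketch of this removal step with Python's `re` module follows. The three patterns are hypothetical approximations built from the quoted examples; the authors' actual regular expressions are not given, so real abstracts may need broader patterns.

```python
import re

# Hypothetical patterns for the three kinds of description listed above.
# Copyright/publisher notice: everything from "(C) 2015" or "Published by X" onward.
COPYRIGHT = re.compile(r"(?:\(C\)\s*\d{4}|Published by\s+\w+).*", re.IGNORECASE)
# Statistical notation in parentheses, e.g. "(p < 0.05)" or "(N = 30)".
STATISTICS = re.compile(r"\s*\(\s*[pn]\s*[<>=]\s*[\d.]+\s*\)", re.IGNORECASE)
# Parenthesized run starting and ending with capitals, e.g. "(PEG)". Case-sensitive,
# so a chemical name such as "poly(ethylene glycol)" is left alone.
ABBREVIATION = re.compile(r"\s*\([A-Z][A-Za-z0-9-]*[A-Z]s?\)")


def strip_noise(sentence: str) -> str:
    """Remove copyright notices, statistical notation, and abbreviations from a sentence."""
    for pattern in (COPYRIGHT, STATISTICS, ABBREVIATION):
        sentence = pattern.sub("", sentence)
    return re.sub(r"\s{2,}", " ", sentence).strip()  # tidy leftover spaces


print(strip_noise("Poly(ethylene glycol) (PEG) micelles were stable (p < 0.05)."))
# Poly(ethylene glycol) micelles were stable.
```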

Next, the sentences from which these descriptions had been removed were tokenized using NLTK's “word_tokenize” function. Some tokens contained symbols such as “/” or “-” that would create noise in the analysis, so the following processing was applied:

  • Tokens joined by “/” were split.

  • Tokens joined by “-” were split.

  • A leading “-” was removed from tokens.

  • A trailing “/” was removed from tokens.
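The four token rules can be sketched as a small function over a token list such as `word_tokenize` output. Applying the leading and trailing removals before splitting, and discarding empty pieces, are assumptions about details the text leaves open.

```python
import re


def split_symbol_tokens(tokens: list) -> list:
    """Apply the "/" and "-" rules above to a list of tokens."""
    result = []
    for token in tokens:
        token = token.lstrip("-").rstrip("/")  # drop leading "-" and trailing "/"
        # Split on inner "/" or "-"; empty pieces (e.g. from "a--b") are discarded.
        result.extend(piece for piece in re.split(r"[/-]", token) if piece)
    return result


print(split_symbol_tokens(["pH-responsive", "drug/gene", "-based", "and/"]))
# ['pH', 'responsive', 'drug', 'gene', 'based', 'and']
```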

After tokenization, the tokens (or words) were POS-tagged with the Stanford POS Tagger (Toutanova & Manning, 2000) through its NLTK wrapper. In a pilot study, the Stanford POS Tagger performed almost correctly; however, several problems were also found: some symbols were tagged as nouns, and some nouns were tagged as foreign words. To avoid these problems, the following processing was applied after POS tagging:

  • Symbols such as “:” and “%” were assigned a separate POS tag (SYMBOL).

  • Brackets such as “(” and “)” were removed.

  • Words tagged as foreign words were retagged as nouns.
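These corrections can be sketched as a pass over `(token, tag)` pairs. The sketch assumes Penn Treebank tags, which the Stanford tagger's English models use (`FW` marks a foreign word, `NN` a singular noun), and treats any token without letters or digits as a symbolic string; that symbol test and the bracket set are illustrative assumptions.

```python
import re

BRACKETS = {"(", ")", "[", "]", "{", "}"}
# Assumed test for a "symbolic string": a token containing no letters or digits.
SYMBOL_RE = re.compile(r"[^A-Za-z0-9]+")


def fix_tags(tagged):
    """Post-process (token, tag) pairs from a Penn Treebank-style POS tagger."""
    fixed = []
    for token, tag in tagged:
        if token in BRACKETS:
            continue  # brackets are removed
        if SYMBOL_RE.fullmatch(token):
            tag = "SYMBOL"  # e.g. ":" or "%" that the tagger labeled as a noun
        elif tag == "FW":
            tag = "NN"  # foreign word -> noun
        fixed.append((token, tag))
    return fixed


print(fix_tags([("yield", "NN"), ("(", "-LRB-"), ("%", "NN"),
                ("in", "IN"), ("vivo", "FW"), (")", "-RRB-")]))
# [('yield', 'NN'), ('%', 'SYMBOL'), ('in', 'IN'), ('vivo', 'NN')]
```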

A rule-based key phrase extraction method was adopted: key phrases were chunked by matching regular-expression patterns over the sequence of POS tags in a sentence, using NLTK's “RegexpParser”. The rules for key phrase extraction were as follows:

  1. Gerund + noun + noun + …

  2. Past participle + adjective + adjective + … + noun + noun + …

  3. Adjective + gerund + noun

  4. Noun + gerund + noun

  5. Noun + past participle + noun

  6. Adjective + adjective + … + noun + noun + …

  7. Noun + noun + …

  8. Nouns
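The authors chunked with NLTK's “RegexpParser”; the dependency-free sketch below swaps in a different technique with a similar effect. Each Penn Treebank tag is mapped to a one-letter class, and the eight rules are matched as alternatives of a single Python regular expression over the resulting string. Reading “…” as “one or more”, merging rules (7) and (8), and resolving overlaps by trying alternatives left to right are assumptions, so the output can differ from the authors' grammar.

```python
import re


def tag_class(tag: str) -> str:
    """Map a Penn Treebank tag to the one-letter class used in PHRASE_RE."""
    if tag == "VBG":
        return "G"  # gerund
    if tag == "VBN":
        return "P"  # past participle
    if tag.startswith("JJ"):
        return "A"  # adjective (JJ, JJR, JJS)
    if tag.startswith("NN"):
        return "N"  # noun (NN, NNS, NNP, NNPS)
    return "O"  # any other tag ends a phrase


# Rules (1)-(6) in order, with (7) and (8) merged into "N+". At each position the
# first alternative that matches wins, so a rule listed earlier takes priority.
PHRASE_RE = re.compile(r"GN+|PA+N+|AGN|NGN|NPN|A+N+|N+")


def extract_key_phrases(tagged):
    """Return the key phrases in one POS-tagged sentence, given as (token, tag) pairs."""
    classes = "".join(tag_class(tag) for _, tag in tagged)
    return [
        " ".join(token for token, _ in tagged[m.start():m.end()])
        for m in PHRASE_RE.finditer(classes)
    ]


sentence = [("targeted", "VBN"), ("polymeric", "JJ"), ("micelle", "NN"), ("as", "IN"),
            ("a", "DT"), ("cancer", "NN"), ("imaging", "VBG"), ("probe", "NN"),
            ("in", "IN"), ("mouse", "NN"), ("model", "NN")]
print(extract_key_phrases(sentence))
# ['targeted polymeric micelle', 'cancer imaging probe', 'mouse model']
```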

Although the above rules follow previous studies (Wang & Wang, 2018; Wang et al., 2014), they are not strictly identical to them. The words making up the key phrases extracted by these rules were lemmatized using WordNetLemmatizer, a lemmatizer available in NLTK, and converted to lowercase to eliminate case sensitivity.

Finally, common words, such as “study” and “cell,” were treated as stop words and removed from the extracted key phrases. A word or phrase was treated as a stop word if it appeared in WordNet, a lexical database that can easily be accessed from NLTK.

How to build panel data of author information

Building a co-author network requires information about the node attributes (i.e., information about the authors themselves) as well as the edges (i.e., connections between authors). Panel data containing the following information for each author were built from the aggregated dataset of bibliographic information:

  • Author’s name

  • Counts of the author’s affiliations

  • The main affiliation of the author

  • A list of publications by the author

  • Number of publications by the author

  • Author’s ORCID

  • Author’s Researcher-ID

The affiliation with the highest count was selected as the author’s main affiliation.
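Building the author panel can be sketched with `collections.Counter`. The publication schema (an `id` plus per-author `name`, `affiliation`, and optional `orcid`) is a hypothetical stand-in for the aggregated dataset, and Researcher-ID is omitted for brevity; `most_common(1)` picks the affiliation with the highest count, breaking ties by first occurrence.

```python
from collections import Counter, defaultdict


def build_author_panel(publications):
    """Summarize affiliations, publications, and ORCID per author name (illustrative schema)."""
    affiliations = defaultdict(Counter)
    pubs = defaultdict(list)
    orcids = {}
    for pub in publications:
        for author in pub["authors"]:
            name = author["name"]
            affiliations[name][author["affiliation"]] += 1
            pubs[name].append(pub["id"])
            orcids[name] = orcids.get(name) or author.get("orcid")  # keep the first ORCID seen
    return {
        name: {
            "affiliation_counts": dict(affiliations[name]),
            # Main affiliation = most frequent; Counter breaks ties by first occurrence.
            "main_affiliation": affiliations[name].most_common(1)[0][0],
            "publications": pubs[name],
            "n_publications": len(pubs[name]),
            "orcid": orcids[name],
        }
        for name in pubs
    }


if __name__ == "__main__":
    sample = [
        {"id": "p1", "authors": [{"name": "Doe, J.", "affiliation": "Univ A"},
                                 {"name": "Roe, R.", "affiliation": "Inst B"}]},
        {"id": "p2", "authors": [{"name": "Doe, J.", "affiliation": "Inst B"}]},
        {"id": "p3", "authors": [{"name": "Doe, J.", "affiliation": "Univ A"}]},
    ]
    print(build_author_panel(sample)["Doe, J."]["main_affiliation"])  # Univ A
```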

After the panel data of author information had been built, author names were collated and the author information was updated, because some names were duplicated. When an author had an identifier such as an ORCID, names were collated based on that identifier; otherwise, they were collated manually.
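The identifier-based part of the collation can be sketched as a merge keyed on ORCID; names without an identifier pass through unchanged, standing in for the manual step. The row schema, the `identifiers` mapping, and the placeholder identifier values are hypothetical.

```python
from collections import Counter


def collate_by_identifier(panel, identifiers):
    """Merge author rows whose name variants share an identifier such as an ORCID.

    `panel` maps a name to {"affiliation_counts": {...}, "publications": [...]};
    `identifiers` maps a name to its identifier. Names without an identifier are
    kept as they are (in the study, these were collated manually).
    """
    canonical = {}  # identifier -> first name variant seen with it
    merged = {}
    for name, row in panel.items():
        ident = identifiers.get(name)
        target = canonical.setdefault(ident, name) if ident else name
        entry = merged.setdefault(target, {"affiliation_counts": Counter(), "publications": []})
        entry["affiliation_counts"].update(row["affiliation_counts"])  # add counts
        entry["publications"].extend(row["publications"])
    for entry in merged.values():
        entry["publications"] = list(dict.fromkeys(entry["publications"]))  # drop repeats, keep order
        entry["n_publications"] = len(entry["publications"])
        top = entry["affiliation_counts"].most_common(1)
        entry["main_affiliation"] = top[0][0] if top else None
        entry["affiliation_counts"] = dict(entry["affiliation_counts"])
    return merged


if __name__ == "__main__":
    rows = {
        "Doe, John": {"affiliation_counts": {"Univ A": 1}, "publications": ["p1"]},
        "Doe, J.": {"affiliation_counts": {"Univ A": 1, "Inst B": 1}, "publications": ["p2", "p3"]},
        "Roe, R.": {"affiliation_counts": {"Inst B": 1}, "publications": ["p1"]},
    }
    ids = {"Doe, John": "ORCID-X", "Doe, J.": "ORCID-X"}  # placeholder identifiers
    print(sorted(collate_by_identifier(rows, ids)))  # ['Doe, John', 'Roe, R.']
```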

How to build panel data of adjacency matrices of co-author networks

The procedure to build the panel data for the adjacency matrices of the co-author network was as follows:

  1. All authors, which correspond to the nodes of the co-author network, were listed from the aggregated bibliographic dataset.

  2. To build the panel data of adjacency matrices, the bibliographic data of publications were extracted from the dataset year by year.

  3. For each publication in a year's collection, the list of authors was obtained, and edge weights were computed for all pairs of authors in the list.

  4. The edge weights of the co-author network as a whole were calculated by summing the per-publication edge weights over all publications.

  5. The edge weights were normalized by dividing by the number of publications so that they sum to 1.

The unit of analysis for the co-author network was set to one publication because co-authorship is defined per publication. Each unit of analysis forms a complete graph, and a complete graph with N vertices has N (N − 1)/2 edges, so each pair of authors in a publication with N authors receives a weight of 2/[N (N − 1)], the inverse of the number of edges. The edge weight between two authors in the co-author network is the sum of these contributions over all publications.
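The per-publication weighting can be sketched for one year's collection as follows; applying it to each year yields the panel of adjacency matrices. Two assumptions go beyond the text: authors are deduplicated within a paper, and single-author papers, which have no edges, are left out of the publication count so that the weights still sum to 1. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_weights(author_lists):
    """Normalized co-author edge weights for one year's publications (a sketch).

    A publication with N >= 2 distinct authors is a complete graph whose
    N(N - 1)/2 edges each receive 1 / (N(N - 1)/2) = 2 / (N(N - 1)).
    """
    weights = defaultdict(float)
    n_units = 0
    for authors in author_lists:
        unique = sorted(set(authors))  # sorting gives each unordered pair a single key
        n = len(unique)
        if n < 2:
            continue  # assumption: single-author papers add no edges and are not counted
        for pair in combinations(unique, 2):
            weights[pair] += 2.0 / (n * (n - 1))
        n_units += 1
    # Each counted publication contributes a total weight of 1, so dividing by
    # the number of counted publications makes all edge weights sum to 1.
    return {pair: w / n_units for pair, w in weights.items()} if n_units else {}


if __name__ == "__main__":
    year_2019 = [["A", "B"], ["A", "B", "C"], ["D"]]
    print(coauthor_weights(year_2019))
```

In this example the A/B pair receives (1 + 1/3)/2 = 2/3, and the A/C and B/C pairs receive 1/6 each, so a pair that co-authors repeatedly, particularly in small teams, accumulates more weight.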

How to build the panel data of adjacency matrices of co-word networks

The procedure to build the panel data for the adjacency matrices of the co-word network was as follows:

  1. All key phrases, which correspond to the nodes of the co-word network, were listed from the text corpus.

  2. To build the panel data of adjacency matrices, the key phrases of publications were extracted from the corpus year by year.

  3. For each sentence in a year's collection, the list of key phrases was obtained, and edge weights were computed for all pairs of key phrases in the list.

  4. The edge weights of the co-word network as a whole were calculated by summing the per-sentence edge weights over all sentences.

  5. The edge weights were normalized by dividing by the number of sentences so that they sum to 1.

The unit of analysis for the co-word network was set to one sentence because the co-occurrence of the words or phrases is defined per sentence. Similar to the co-author network, the edge weights in the co-word network were calculated as the sum of the inverse of the number of edges in the unit of analysis.
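This appendix does not restate how graph entropy is computed from these weights. One reading consistent with weights that sum to 1, offered here only as an assumption, is the Shannon entropy (Shannon 1948) of the edge-weight distribution, H = −Σ w log w. The sketch builds co-word weights with the per-sentence weighting described above and then evaluates that entropy; the base-2 logarithm, the function names, and the handling of sentences with fewer than two key phrases are also assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations


def coword_weights(sentences):
    """Normalized co-word edge weights from per-sentence key phrase lists."""
    weights = defaultdict(float)
    n_units = 0
    for phrases in sentences:
        unique = sorted(set(phrases))
        n = len(unique)
        if n < 2:
            continue  # assumption: sentences with < 2 key phrases add no edges and are not counted
        for pair in combinations(unique, 2):
            # 1 / (number of edges in this sentence's complete graph)
            weights[pair] += 2.0 / (n * (n - 1))
        n_units += 1
    return {pair: w / n_units for pair, w in weights.items()} if n_units else {}


def edge_weight_entropy(weights, base=2.0):
    """Shannon entropy of edge weights that sum to 1 (an assumed reading of 'graph entropy')."""
    return -sum(w * math.log(w, base) for w in weights.values() if w > 0)


if __name__ == "__main__":
    sentences = [["drug delivery", "micelle"], ["drug delivery", "micelle", "tumor"]]
    w = coword_weights(sentences)
    print(w)
    print(edge_weight_entropy(w))
```

Under this reading, H reaches its maximum, log2 of the number of edges, when all weights are equal and falls as weight concentrates on a few strong ties, so two networks with the same number of edges can differ in complexity when their tie strengths differ, in line with Note 8.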

Generation of network objects

In the present study, we used NetworkX, a Python library for network analysis, to handle the co-author and co-word networks. The adjacency matrices of the co-author and co-word networks were converted into network objects using the “from_pandas_adjacency” function of NetworkX. An advantage of network objects is that attributes can be assigned to nodes, such as the sector of an author’s affiliation in a co-author network.

In the cross-sectional analysis, the co-author and co-word meso-networks and micro-networks were extracted from these macro-networks using the “subgraph” function of NetworkX. Given a list of nodes that is a subset of the nodes of the original network, this function generates a network object of the corresponding sub-network. The node subsets (i.e., authors and words) used to extract the co-author/co-word meso- and micro-networks are defined by listing the authors of, and the words in the titles and abstracts of, the publications published by the members of a subtheme or by an individual, respectively.

To build the list of meso-level nodes, an article was considered to belong to a subtheme when we could confirm from the internal documents of the COINS that one of its authors belongs to that subtheme (Note 13). The authors of, and words in, the articles belonging to a subtheme form the node subset for extracting that subtheme's co-author/co-word meso-networks. For example, if an author belonging to subtheme 1 and an author belonging to subtheme 2 publish a co-authored paper, the paper is considered to belong to both subthemes 1 and 2; thus, these authors appear in the co-author meso-networks of both subthemes, and the words in the paper's title and abstract also appear in the co-word meso-networks of both subthemes.

Similarly, at the micro level, the authors of, and words in, the publications by an individual form the node subset for extracting that individual's co-author/co-word micro-networks from the macro-networks. For example, the co-authors of an individual's publications appear in that individual's co-author micro-network, and the words in the titles and abstracts of those publications appear in the individual's co-word micro-network.
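NetworkX's “subgraph” returns the sub-network induced by a node list. The standard-library sketch below mimics that on a plain edge-weight dictionary and derives the node lists as described: a publication belongs to a subtheme if any of its authors is a member, and the micro level applies the same rule with a single author. The publication schema, the `phrases` field for co-word nodes, and the function names are hypothetical; unlike a NetworkX subgraph, the dictionary form drops isolated nodes.

```python
def node_subset(publications, members, field="authors"):
    """Collect `field` values (authors or key phrases) of every publication
    that has at least one author in `members`."""
    members = set(members)
    nodes = set()
    for pub in publications:
        if members.intersection(pub["authors"]):
            nodes.update(pub[field])
    return nodes


def induced_subgraph(weights, nodes):
    """Keep only edges whose two endpoints are both in `nodes` (the induced subgraph)."""
    keep = set(nodes)
    return {(u, v): w for (u, v), w in weights.items() if u in keep and v in keep}


if __name__ == "__main__":
    publications = [
        {"authors": ["A", "C"], "phrases": ["micelle", "tumor"]},  # A in subtheme 1, C in subtheme 2
        {"authors": ["C", "D"], "phrases": ["gene", "tumor"]},
    ]
    macro = {("A", "C"): 0.5, ("C", "D"): 0.5}
    subthemes = {"subtheme 1": {"A"}, "subtheme 2": {"C"}}
    for label, members in subthemes.items():
        print(label, induced_subgraph(macro, node_subset(publications, members)))
    # Micro level: the same functions with a single author as the member set.
    print("author D", induced_subgraph(macro, node_subset(publications, {"D"})))
```

Because the shared paper has authors in both subthemes, the A/C edge appears in both meso-networks, matching the example in the text.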

Parameters for drawing network diagram by Gephi

The parameters of the Fruchterman-Reingold algorithm used to draw the co-author and co-word networks are shown in Table 2. The “Speed” parameter was changed manually.

Table 2 Parameters of Fruchterman-Reingold algorithm

About this article

Cite this article

Miyashita, S., Sengoku, S. Scientometrics for management of science: collaboration and knowledge structures and complexities in an interdisciplinary research project. Scientometrics 126, 7419–7444 (2021). https://doi.org/10.1007/s11192-021-04080-0

  • DOI: https://doi.org/10.1007/s11192-021-04080-0
