Bi-layer network analytics: A methodology for characterizing emerging general-purpose technologies

https://doi.org/10.1016/j.joi.2021.101202

Highlights

  • A methodology of bi-layer network analytics for characterizing emerging general-purpose technologies.

  • Three technological characteristics describing a technology's generality and emergence.

  • Topological indicators quantifying technological characteristics through topological features.

  • An approach of link prediction with a weighted index of resource allocation.

  • Emerging general-purpose technologies in information science.

Abstract

Despite the tremendous contributions bibliometrics has made to profiling technological landscapes and identifying emerging topics, reliable methods for predicting potential technological changes are still elusive. To fill this gap, we propose a methodology based on bi-layer network analytics that characterizes emerging general-purpose technologies. The framework incorporates three novel indicators that quantify a technology's technical potential and social impacts, not just in one specific technological area but in a wide range of domains. Missing links in the network are extrapolated through a refined link prediction method, and a weighted resource allocation index ranks both current technologies and their predicted evolutions to reveal candidate innovations for further empirical and/or expert analysis. A case study on information science incorporating quantitative and qualitative validations demonstrates the methodology to be feasible and reliable. Researchers and policymakers in information science and bibliometrics should find valuable decision support from the empirical insights presented.

Introduction

Theoretical definitions of general-purpose technologies (GPTs) can be traced back to Paul David, who coined the idea from his observations of the widespread impact of electric dynamos on the productivity of the United States (US) during the 1920s (David, 1989). In the decade to follow, a well-recognized conception of what constitutes a GPT took hold – “pervasiveness, inherent potential for technical improvements, and innovational complementarities” (Bresnahan & Trajtenberg, 1995) – and measuring the many and various aspects of GPTs became a topic of increasing interest for many economists. One indicator in particular, generality, has been widely applied (Hall & Trajtenberg, 2004), and its standard calculations consider patent citations and their technological classes. In the 2000s, however, the attention of the science, technology, and innovation (ST&I) community turned to the unique features of nanotechnologies, triggering discussions on the notion of emerging GPTs (EGPTs), i.e., emerging technologies equipped with the features of GPTs (Graham & Iacopetta, 2009; Youtie et al., 2008). Although much research has been undertaken to define what classifies a technology as emergent, characterizing the “general purpose” component of an EGPT has proven to be far more difficult. Of the few studies specific to EGPTs, all assume emergence before testing constructs like generality, e.g., Schultz and Joutz (2010). Conversely, of the studies that primarily explore emergence, Rotolo et al.’s (2015) systematic review defined five attributes of emerging technologies, theoretically guiding the development of further measurements. Such continued interest from the ST&I community underscores the potential significance of identifying EGPTs: foreseeing candidate EGPTs may help policymakers and technology managers take pre-emptive action in strategic planning and R&D management (e.g., funding/investment allocation) and prepare for future global competition.
This crucial role and value of EGPTs motivates our research. However, to the best of our knowledge, very few studies have attempted to directly measure and classify emergence and generality at the same time. This, coupled with the lack of a quantitative measure for generality that does not solely depend on patents and citations, inspired us to establish a cohesive system of quantitative measures for identifying EGPTs that detects both emergence and generality.
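
For readers unfamiliar with the patent-based generality indicator mentioned above, it is commonly computed in the patent literature as one minus a Herfindahl concentration index over the technological classes of the patents that cite a given patent. The short Python sketch below illustrates that common form only; the function name and the toy class labels are assumptions made for illustration and are not taken from this paper.

```python
from collections import Counter


def generality(citing_classes):
    """Return 1 minus the Herfindahl concentration of citing classes.

    citing_classes: one technological class label per forward citation
    (hypothetical input format chosen for this sketch).
    """
    counts = Counter(citing_classes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("generality is undefined without citations")
    # Share of citations coming from each class, squared and summed.
    concentration = sum((n / total) ** 2 for n in counts.values())
    return 1.0 - concentration


# Toy example: four citations spread over three classes.
spread = generality(["H01", "H01", "G06", "C07"])
# Toy example: all citations from a single class.
narrow = generality(["H01", "H01"])
```

A value near zero means the citations are concentrated in one class, while higher values mean the patent is cited across many classes, which is the sense of "generality" that this measure captures.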

Sharing a close interest with ST&I studies, bibliometrics is well recognized as a tool for supporting technology analysis and assessment. For example, it has been used to profile various technological areas (Chakraborty et al., 2015; Guo et al., 2010), identify emerging topics in science and technology (Glänzel & Thijs, 2012; Small et al., 2014), and track the pathways of technological change (Hou et al., 2018; Zhang et al., 2016; Zhou et al., 2014). More recently, the use of advanced data analytic technologies, such as topic models, streaming data analytics, and machine learning, has massively increased the amount of data traditional bibliometric methods can process (Ding & Chen, 2014; Klavans & Boyack, 2017). These technologies have also brought the ability to reveal hidden relationships (Zhang et al., 2018; Zhang et al., 2017b) and visualize complicated technological portfolios and innovation networks in highly interpretable ways (Börner et al., 2012; Suominen & Toivanen, 2016).

From a technical point of view, even though network analytics has long been a mainstay of social science (Borgatti et al., 2009), it was only introduced to bibliometric studies in the late 2000s. Originally used as a method for investigating research collaborations and the interactions between disciplines through bibliographic couplings (Yan et al., 2009; Yang et al., 2010), it has subsequently been combined with citation analysis to identify emerging topics and evaluate research impacts (Takeda & Kajikawa, 2009; Yan, 2015). Network analytics has also been explored for its ability to predict emerging technologies (Érdi et al., 2013) and to reveal hidden technological opportunities (Park & Yoon, 2018).

Yet, even with these techniques, developing a bibliometric model to identify EGPTs is still highly challenging. First, bibliometric models emphasize the use of historical data and have a natural connection with citation statistics, and thus are well suited to measuring generality. However, rapid developments in natural language processing in recent years are relaxing the field's dependence on patent archives and tempering this philosophy of past as prologue, which may reveal insights directly from the semantics of the subject matter. Second, despite keen interest and many pilot studies on measuring and forecasting technical emergence (Carley et al., 2018), current bibliometric models are still falling short of truly “characterizing the potential of what is detected to be emerging” (Rotolo et al., 2015). Balancing generality with emergence to comprehensively characterize EGPTs further increases this challenge. Third, the bibliometric community is recognizing the benefits of link prediction as a way of identifying the likely technologies of tomorrow, but applying those methods to bibliographical information is not yet seamless. For example, theoretically mapping the key attributes of emerging technologies to the topological indicators of a bibliometric network can be problematic. Similarly, integrating heterogeneous bibliographical information into a single network so as to discover social impacts in addition to technological transitions remains problematic.

Aiming to address these concerns, we propose a methodology based on bi-layer network analytics to quantitatively identify EGPTs. The methodology begins with the construction of a co-term network (the first layer) and a co-authorship network (the second layer). The two layers are then integrated into a bi-layer network that reflects both the substance of the technologies (i.e., terms) and the social entities (i.e., authors) engaged in their associated R&D. Typically, a traditional bibliometric network only reflects one indicator, e.g., term co-occurrence or co-authorship. The proposed bi-layer network charts both, offering a bibliometric solution that not only reveals the impact of key technologies, but also the authors and collaborative networks that are advancing these technologies. Integrating all this information into one analysis provides a novel perspective from which to draw comprehensive new insights.
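
To make the two layers and their integration concrete, the following Python sketch counts the three kinds of co-occurrence that such a bi-layer network would carry: term with term, author with author, and author with term. It is a minimal illustration under stated assumptions, not the authors' implementation; the record format, the function name `build_bilayer`, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: each article lists its authors and extracted terms.
records = [
    {"authors": ["A", "B"], "terms": ["link prediction", "citation analysis"]},
    {"authors": ["B", "C"], "terms": ["link prediction", "topic model"]},
    {"authors": ["A"], "terms": ["topic model", "citation analysis", "link prediction"]},
]


def build_bilayer(records):
    """Count co-term, co-authorship, and author-term co-occurrences."""
    co_term, co_author, author_term = Counter(), Counter(), Counter()
    for rec in records:
        # Sort so that each undirected pair has one canonical key.
        terms = sorted(set(rec["terms"]))
        authors = sorted(set(rec["authors"]))
        for pair in combinations(terms, 2):      # first layer: co-term links
            co_term[pair] += 1
        for pair in combinations(authors, 2):    # second layer: co-authorship links
            co_author[pair] += 1
        for author in authors:                   # cross-layer: author-term links
            for term in terms:
                author_term[(author, term)] += 1
    return co_term, co_author, author_term


co_term, co_author, author_term = build_bilayer(records)
```

The three counters can then be treated as the weighted edge lists of a single graph whose nodes are terms and authors, which is the sense in which the two layers are integrated here.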

To fully leverage this perspective, we adapted the five attributes of emerging technologies defined by Rotolo et al. (2015) into three new indicators capable of quantifying the topological structures in a bi-layer network, namely fundamentality, speciality, and sociality. Interestingly, among Rotolo et al.’s quintet of attributes, prominent impact is of particular interest to our endeavors. From their literature review, Rotolo et al. (2015) found that most scholars conceive of prominent impact as a force “exerted on the entire socio-economic system” – a concept, they add, that “comes very close to that of ‘general-purpose technologies’”. Discontented with the sweeping nature of this definition for the purposes of defining emergence, the authors proposed a more utilitarian version which acknowledges that an emerging technology's impact may be limited to one or a few domains. Thus, the intriguing argument was made that if we can measure prominent impact, we can measure generality as well. In part, this notion inspired the tripartite design of the above indicators.

With the network constructed and the topological structures measured, candidate future innovations are identified with a refined link prediction algorithm, using a weighted index of resource allocation. The algorithm considers links both within each network layer, i.e., co-term and co-authorship links, and between layers, i.e., author-term links. Whether or not a link is predicted is based on the weighted index, which is an amalgamation of frequency statistics, including term co-occurrence, co-authorships, and author-term co-occurrence. Ultimately, the differences between the current network and the predicted network are the key to forecasting technological changes and, of course, which technologies are most likely to be EGPTs in the near future.
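
This excerpt does not give the exact form of the refined weighted index, so the Python sketch below shows only a generic weighted resource allocation score of the kind used in the link prediction literature: each common neighbor z of two unlinked nodes x and y contributes the sum of the two connecting weights divided by the total weight (strength) of z. The function name, the edge format, and the toy weights are assumptions for illustration.

```python
from collections import defaultdict


def weighted_resource_allocation(edges, x, y):
    """Score the candidate link (x, y) on an undirected weighted graph.

    edges maps (u, v) tuples to co-occurrence weights (hypothetical format).
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    score = 0.0
    # Each common neighbor passes on resource in proportion to the weights
    # linking it to x and y, diluted by its own total strength.
    for z in set(adj[x]) & set(adj[y]):
        strength = sum(adj[z].values())
        score += (adj[x][z] + adj[z][y]) / strength
    return score


# Toy graph mixing author nodes (a*) and term nodes (t*).
edges = {
    ("a1", "t1"): 2,
    ("a1", "t2"): 1,
    ("t1", "t3"): 3,
    ("t2", "t3"): 1,
    ("t1", "t2"): 1,
    ("a1", "a2"): 1,
}
score = weighted_resource_allocation(edges, "a1", "t3")
```

Ranking all currently unlinked pairs by such a score and keeping the highest-scoring ones gives one simple way to form a predicted network for comparison with the current one.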

A case study on 17,445 articles published in 15 journals and conference proceedings on information science between 1 Jan 1996 and 31 Dec 2018 demonstrates the feasibility and reliability of the method. Additionally, the empirical insights derived from the study should provide decision support to researchers and policymakers in information science disciplines.

The rest of this paper is organized as follows: Section 2 reviews previous studies on bibliometrics for analyzing emerging technologies, network analytics with bibliometric indicators, and theoretical discussion on characterizing EGPTs from a bibliometric perspective. In Section 3, we outline the research framework of the study and introduce the proposed methodology. Section 4 follows, presenting the data, results, validation measurements, and empirical insights derived from the case study. The article concludes in Section 5 with a discussion on the technical and practical implications of our findings, the limitations of the study, and possible future directions of research.

Section snippets

Literature review and theoretical background

As a tool for analyzing emerging technologies, bibliometrics has attracted common interest from the bibliometrics and ST&I communities. Further, the rising enthusiasm for social network analysis is solidifying the merit of bibliometrics in both breadth and depth. Therefore, what follows is a review of how bibliometrics has been used to analyze emerging technologies and an overview of network analytics and its bibliometric indicators. Furthermore, we discuss the theories and concepts of GPTs and

Methodology

An overview of the EGPT framework is given in Fig. 2. As illustrated, the methodology involves three key phases: data and pre-processing, bi-layer network analytics, and validation measurements.

Results

Our chosen discipline for this case study is information science. The choice to analyze one discipline may seem unusual given that we are, at least in part, testing the methodology's efficacy at predicting technologies that span a broad range of disciplines. Our reasoning here is that information science has, for some time, been a spearhead for cross-disciplinary research. Information science connects fundamental studies, such as mathematics, physics, and computer science, with the real-world

Discussion and conclusions

In this paper, we presented a methodology for characterizing EGPTs based on bi-layer network analytics. We defined three indicators to quantify the impact of EGPTs and applied a refined link prediction approach based on weighted resource allocation to reveal emergence. A comparison between the ranked terms in the current network and the predicted network reveals candidate EGPTs for further analysis by experts and/or an empirical review of the literature. We incorporated both types of analyses

Author contributions

Yi Zhang: Conceived and designed the analysis, Collected the data, Contributed data or analysis tools, Performed the analysis, Wrote the paper. Mengjia Wu: Conceived and designed the analysis, Collected the data, Contributed data or analysis tools, Performed the analysis, Wrote the paper. Wen Miao: Conceived and designed the analysis, Collected the data, Contributed data or analysis tools, Wrote the paper. Lu Huang: Conceived and designed the analysis, Wrote the paper. Jie Lu: Conceived and

Acknowledgments

An early version of this work was published in the Proceedings of the 2019 International Conference of the International Society for Scientometrics and Informetrics (Zhang et al., 2019).

This work was supported by the Australian Research Council under Discovery Early Career Researcher Award DE190100994 and the National Natural Science Foundation of China under Grant 71774013.

References (75)

  • E.Y. Li et al.

    Co-authorship networks and research impact: A social capital perspective

    Research Policy

    (2013)
  • X. Liu et al.

    Co-authorship networks in the digital library research community

    Information Processing & Management

    (2005)
  • M.E. Newman

    A measure of betweenness centrality based on random walks

    Social Networks

    (2005)
  • T. Opsahl et al.

    Node centrality in weighted networks: Generalizing degree and shortest paths

    Social Networks

    (2010)
  • I. Park et al.

    Technological opportunity discovery for technological convergence based on the prediction of technology knowledge flow in a citation network

    Journal of Informetrics

    (2018)
  • S. Petralia

    Mapping general purpose technologies with patent data

    Research Policy

    (2020)
  • A.L. Porter et al.

    Technology opportunities analysis

    Technological Forecasting and Social Change

    (1995)
  • D. Rotolo et al.

    What is an emerging technology?

    Research Policy

    (2015)
  • H. Small et al.

    Identifying emerging topics in science and technology

    Research Policy

    (2014)
  • E. Yan et al.

    Predicting and recommending collaborations: An author-, institution-, and country-level analysis

    Journal of Informetrics

    (2014)
  • Y. Zhang et al.

    Discovering and forecasting interactions in big data research: A learning-enhanced bibliometric study

    Technological Forecasting and Social Change

    (2019)
  • Y. Zhang et al.

    Does deep learning help topic extraction? A kernel k-means clustering method with word embedding

    Journal of Informetrics

    (2018)
  • Y. Zhang et al.

    “Term clumping” for technical intelligence: A case study on dye-sensitized solar cells

    Technological Forecasting and Social Change

    (2014)
  • Y. Zhang et al.

    Topic analysis and forecasting for science, technology and innovation: Methodology and a case study focusing on big data research

    Technological Forecasting and Social Change

    (2016)
  • J. Allan

    Topic detection and tracking: Event-based information organization

    (2002)
  • S. Basu et al.

    Information and communications technology as a general-purpose technology: Evidence from US industry data

    German Economic Review

    (2007)
  • S.P. Borgatti et al.

    Network analysis in the social sciences

    Science

    (2009)
  • K. Börner et al.

    Design and update of a classification system: The UCSD map of science

    PLoS One

    (2012)
  • S.F. Carley et al.

    An indicator of technical emergence

    Scientometrics

    (2018)
  • T. Chakraborty et al.

    On the categorization of scientific citation profiles in computer science

    Communications of the ACM

    (2015)
  • David, P. A. (1989). Computer and dynamo: The modern productivity paradox in a not-too distant...
  • W. Ding et al.

    Dynamic topic detection and tracking: A comparison of HDP, C-word, and cocitation methods

    Journal of the Association for Information Science and Technology

    (2014)
  • P. Érdi et al.

    Prediction of emerging technologies based on analysis of the US patent citation network

    Scientometrics

    (2013)
  • M. Girvan et al.

    Community structure in social and biological networks

    Proceedings of the National Academy of Sciences

    (2002)
  • W. Glänzel et al.

    Using "core documents" for detecting and labelling new emerging topics

    Scientometrics

    (2012)
  • S.J. Graham et al.

    Nanotechnology and the emergence of a general purpose technology

    Annals of Economics and Statistics

    (2009)
  • H. Grupp

    The concept of entropy in scientometrics and innovation research: an indicator for institutional involvement in scientific and technological developments

    Scientometrics

    (1990)