Introduction

The research and development (R&D) of a pharmaceutical product is time-consuming from the start of research to obtaining approval, and the probability of success with a candidate compound is extremely low. The probability of success from discovery and pre-clinical initiation of a candidate compound to launch is approximately 0.1 %, the cost of development to launch one product is in the hundreds of millions of dollars, and R&D requires more than ten years. Recently, the pharmaceutical industry’s research costs have increased at a rate higher than the average for all industries.

In the pharmaceutical industry, leading companies undergo shifts in the processes of R&D, production, and sales (Comanor and Scherer 2013; Scherer 2010; Munos 2009). The R&D process is called the drug pipeline, starting with new drug discovery. The development of new drugs requires pre-clinical testing, three stages of clinical trials, and approval in each country.

The knowledge flow in the drug pipeline is defined as the shift of license to deal with drug candidates. Osabe and Jibu (2013a, b, 2014a, b, c, d, e, f) pointed out that the knowledge flow is critical in academic collaboration for the development of new drugs and is measured as the shift of leading companies observed in the drug pipeline data. However, the knowledge flow dynamics between the pharmaceutical companies in the drug pipeline remain a concern (Narayana et al. 2014; Mazzola et al. 2015).

The shift of license, i.e., the knowledge flow, in drug development is a complex phenomenon arising from the relationships between pharmaceutical companies, government institutions, and educational institutions. Multilayer networks (MLNs) are an important concept for understanding complex systems because many real-world systems are constructed from various relations between components. For example, firms are connected by supply chains, ownership, concurrent board memberships, and co-application of patents (Aoyama et al. 2010). Further examples of MLNs include transportation networks (Kurant and Thiran 2006; Zou et al. 2010), climatic systems (Donges et al. 2011), economic markets (Yang et al. 2009), and energy-supply networks (Buldyrev et al. 2010). The statistical mechanics of single networks has been extended to MLNs (Bianconi 2013).

Therefore, our primary research goal is to understand the characteristics of the flow and localization of knowledge during drug development. Here, localization refers to knowledge accumulation within the same company, observed as incoming links and self-loops in the drug pipeline network. We analyze the drug pipeline data based on the MLN constructed with the drug pipeline network layer, global supply-chain network layer, and the global ownership network layer to achieve this research goal.

This paper is organized as follows. Section 2 describes the datasets used in this paper. We specifically explain the detailed characteristics of the drug pipeline data and drug pipelines. Section 3 explains and investigates the methods of how to construct the MLN. The analysis and results are explained in Sect. 4, showing a statistical test to verify the significance of the overlapping nodes and edges between a pair of layers. Finally, Sect. 5 is devoted to conclusions and discussion. Appendix 1 summarizes the analysis of a single-layer network, and Table 1 depicts a list of abbreviations used in this paper. We use ISO codes for countries in most of the figures and tables in this paper.

Table 1 List of abbreviations

Data

Drug pipeline data

We constructed a drug database by extracting all drug data from the Cortellis Competitive Intelligence database provided by Clarivate Analytics (Cortellis) on December 11, 2013. The Cortellis Competitive Intelligence database includes pipeline data on drug candidates from the R&D stage to the clinical trial stages, and deals reports from approximately 7000 pharmaceutical companies. The drug development cycle consists of the statuses discovery, clinical trial, pre-registration, registered, and launched. The clinical trial status is further divided into three phases: Phase I (examinations for drug safety conducted with healthy adults), Phase II (examinations for safety and effectiveness of a drug compared with existing drugs, conducted with a few patients), and Phase III (examinations for safety and effectiveness of a drug compared with existing drugs, conducted with many patients). Deals reports include originator companies (i.e., the licenser) and active companies (i.e., the licensee). We indexed the country and type of these companies, because various types of business entities deal with drug pipelines. Small- and medium-sized business firms are classified as private companies, large-sized businesses as public companies, universities as educational institutions, and government bodies as government institutions. The extracted drugs amount to 38,295, excluding those marked as “no development reported” in the highest status, 18 months behind the extraction date. Notably, the drug pipeline data are a snapshot and do not contain historical information regarding the drug development cycle. For example, a Phase III record whose licensee differs from the licenser shows that the license of the drug candidate shifted between them at some point up to Phase III, but it cannot show when the shift happened.

The drug pipeline starts with discovering a new drug candidate, which requires pre-clinical testing, three stages of clinical trials, and approval to launch. Generally, venture firms or educational institutions discover drug seeds and grant the license to a licensee, for example, a leading firm, which advances the development to launch while incurring considerable R&D expenses. These characteristics vary from country to country. Drug pipeline data help us to understand the knowledge flow in the drug development cycle as the shift of license to deal with drug candidates, i.e., the flow from the licenser to the licensee.

Figure 1 shows the number of drug pipelines at the launched status by country, where we counted the licenses of the launched drugs by country. Furthermore, we compared the licensee with the licenser in drug pipelines to measure the self-sufficiency of individual countries and companies. For example, for the United States (US), “own country” in the legend means that the drug pipelines were started in the US, “other country” means all countries except the US, “own company” means that the US company acting as the licensee started the drug pipeline itself, and “other company” indicates all companies except for the licensee. The US has launched the largest number of drug pipelines. We found that most of the top 20 countries have discovered and launched more than half of their drug pipelines domestically, and the drug pipelines in China (CHN) tend to be developed within the country more than those in other countries. We anticipate that the licensee of a drug pipeline frequently changes because of R&D expenses. To characterize the changes from the viewpoint of the type of business entity, we compared the licenser or licensee type, namely, government institution, educational institution, private company, and public company.

Fig. 1

The number of drug pipelines at the launched status by country. We show the top 20 countries with self-sufficiency rates by “own country” and “own company”. “Own country” in the legend represents the number of drug pipelines discovered and launched by each country. In contrast, “other country” represents the number of drug pipelines at launched status in each country that were discovered by a different country. Similarly, “own company” in the legend represents the number of drug pipelines discovered and launched by the same company in each country. “Other company” represents the number of drug pipelines launched in each country that were discovered by a company different from the current owner

Figure 2 shows the number of drug pipelines for each status for the top four countries in Fig. 1 and the ratio of components regarding business entity and self-sufficiency. “Own country” in the legend represents the number of drug pipelines discovered by each country. In contrast, “other country” represents the number of drug pipelines at each status in each country that were discovered by a different country. Similarly, “own company” in the legend represents the number of drug pipelines discovered by each country’s own company. “Other company” represents the number of drug pipelines at each status in each country that were discovered by a company different from the current owner. As shown in Fig. 2a, the US has approximately 8,000 drug discovery seeds, which is more than eight times the number in any of the other countries. As expected, government and educational institutions tend to release the license at an early stage of the R&D process, and the license acquisition rate of public companies, for example, leading firms, increases as the process approaches the launched status. Furthermore, only in Japan (JPN) is the number of drug pipelines at the discovery status smaller than that at the launched status. Because a lack of drug discovery seeds today will lead to a decline of the country’s pharmaceutical industry in the future, countries such as JPN must obtain drug pipelines from other countries.

Fig. 2

The number of drug pipelines for each status, and the fractions of licenser and licensee types. We show the top four countries in Fig. 1: a US, b JPN, c CHN, and d GBR. “Own country” in the legend represents the number of drug pipelines discovered by each country. In contrast, “other country” represents the number of drug pipelines at a given status in each country that were discovered by a different country. Similarly, “own company” in the legend represents the number of drug pipelines discovered by each country’s own company. “Other company” represents the number of drug pipelines at a given status in each country that were discovered by a company different from the current owner

Figure 3 shows the knowledge flows in the drug pipelines between the licenser and licensee at the country level per pipeline status. Here we focus on the top 10 countries in Fig. 1 and aggregate the pre-registration and registered statuses because only a few pipelines are available at these statuses. To investigate the flows between different business entities, we ignore the drug pipelines whose license has not shifted. The diagonal elements correspond to the knowledge flows between companies within the same country. Knowledge in the drug pipelines flows from firms in the US to the other countries; furthermore, it seems to flow not only in this direction but also in the opposite direction.

Fig. 3

Knowledge flows of drug pipelines between licenser and licensee at the country level per pipeline status, focusing on the top 10 countries in Fig. 1. Each element represents the number of drug pipelines between countries, ignoring the drug pipelines whose license has not shifted. The diagonal elements correspond to the number of shifts of licenses between companies within the same country

Supply chain and ownership data

The global supply chain and ownership data for the year 2017 were constructed by collecting company data from the Standard & Poor’s (S&P) Capital IQ platform (CapitalIQ). The Capital IQ dataset covers more than 500,000 companies, including all listed companies in the world, with information on business relations in 217 countries and 159 industrial sectors defined by the S&P. As node information, the data include company ID, company name, country, location of the company, company type, and primary industry. Industrial classification is based on the Global Industry Classification Standard, developed by Morgan Stanley Capital International and the S&P. As edge information, the supply chain data include the business relationships between suppliers and customers. The business relations that fall under the supplier side are supplier, creditor, franchiser, licenser, landlord, lessor, auditor, transfer agent, investor relations firm, and vendor, but most are supplier and creditor. Here, a supplier is a company providing products or services, whereas a creditor is a private, public, or institutional entity that makes funds available for others to borrow. Notably, the links in the dataset are dominated by supply chain business relationships. Therefore, the characteristics of the dataset reflect the global supply chain network.

The ownership data include a list of shareholding companies and individuals for each company as the link information. The list comprises the top 100 owner companies and individuals with the ownership ratio data. The link’s weight is the ownership ratio held by the owner, between 0% and 100%. Listed firms dominate the ownership data for each country. Therefore, the coverage of the ownership data is narrower than that of the supply chain data (Table 2).

Table 2 The numbers of nodes, edges, and self-loops in supply chain and ownership data

Notably, the collection years for the two datasets, namely, the drug pipeline data and the company’s supply chain and ownership data, are different because of data availability. The former data are for 2013, whereas the latter are for 2017. The four-year difference in data on the rapidly changing pharmaceutical industry might be a weakness. In this paper, we clarify the relationship among the knowledge flows, the flow of goods, and the capital flow in the global pharmaceutical industry by analyzing the data as an MLN. To our knowledge, this study is the first such attempt, and we therefore consider it sufficiently significant despite this limitation.

Methodology

This study’s primary purpose is to reveal the dynamics of propagation and the localization of knowledge in the drug development cycle by analyzing the drug pipeline and supply chain and ownership data. For this research, we constructed an MLN, simultaneously representing three types of relationships between companies and institutions regarding the knowledge flow in the drug pipeline. Before we move to the MLN analysis, we explain the definition of the networks of each layer.

Drug pipeline network

We represent the drug pipeline data acquired from Clarivate Analytics as a drug pipeline network \(G_1\) in Fig. 4. In the graph \(G_1=(V_1,E_1)\), \(V_1\) is a node set constructed from companies, government institutions, and educational institutions listed in the drug pipeline data, and \(E_1\) is an edge set constructed from licenser–licensee relations on drug pipelines. We denote a directed edge as \(i\rightarrow j\) in the drug pipeline network, where node i is the licenser and node j is the licensee of the drug candidate in the drug pipeline data. As mentioned in Sect. 2, nodes are various types of business entities categorized as government institution, educational institution, private company, and public company. One drug pipeline does not always define exactly one edge in the drug pipeline network, because the originator of a drug candidate may share the license with two companies. We denote the adjacency matrix of \(G_{1}\) as \(A^{[1]}=\left( a_{ij} ^{[1]}\right)\), where the element \(a_{{ij}}^{{[1]}}\) corresponds to the number of edges from node i to j. The superscript represents the layer index. The in- and out-degrees of node i are defined by \(k^{[1]} _{\text {in},i} =\sum _j a_{ji} ^{[1]}\) and \(k^{[1]} _{\text {out},i} = \sum _j a_{ij} ^{[1]}\). In the graph \(G_1\), \(k^{[1]} _{\text {in},i}\) and \(k^{[1]} _{\text {out},i}\) represent the total number of drug pipelines that company i is dealing with and the total number of drug pipelines that company i has discovered, respectively. Moreover, the drug pipeline data contain cases in which the licensee of the drug pipeline is the same as the licenser. In other words, the graph \(G_1\) contains self-loops, and the number of self-loops of node i is computed as \(\ell ^{[1]} _i= a_{ii} ^{[1]}\). To distinguish shifts of license between different entities from self-loops, we denote the number of edges of node i from/to different nodes as the in-/out-degree with self-loops removed,

$$\begin{aligned} m^{[1]} _{\text {in},i}=k^{[1]} _{\text {in},i} - \ell ^{[1]} _i \quad \text {and}\quad m^{[1]} _{\text {out},i}=k^{[1]} _{\text {out},i} - \ell ^{[1]} _i~. \end{aligned}$$
(1)

Thus, \(m^{[1]} _{\text {in},i}\) and \(m^{[1]} _{\text {out},i}\) represent the number of licenses of drug candidates that company i has obtained from others and the number that company i has transferred to others, respectively. Therefore, in the graph \(G_1\), the edges \((i\ne j)\) are knowledge flows in the drug pipelines, whereas the self-loops \((i=j)\) represent the localization of knowledge in one company.
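The quantities in Eq. (1) can be illustrated with a minimal sketch, assuming the drug pipeline layer is available as a list of (licenser, licensee) pairs. The function name `degree_summary` and the toy data are hypothetical and are not taken from the paper's dataset; the sketch only follows the edge convention \(i\rightarrow j\) (licenser to licensee) stated above.

```python
from collections import Counter

def degree_summary(edges):
    """Return k_in, k_out, self-loops, m_in and m_out for every node.

    `edges` is a list of (licenser, licensee) pairs. Repeated pairs are
    counted as parallel edges, matching a_ij = number of edges from i to j.
    """
    k_in, k_out, loops = Counter(), Counter(), Counter()
    for i, j in edges:
        k_out[i] += 1          # i is the licenser of this pipeline
        k_in[j] += 1           # j is the licensee of this pipeline
        if i == j:
            loops[i] += 1      # licence stayed within the same entity
    nodes = set(k_in) | set(k_out)
    return {
        v: {
            "k_in": k_in[v],
            "k_out": k_out[v],
            "loops": loops[v],
            "m_in": k_in[v] - loops[v],    # Eq. (1), incoming from others
            "m_out": k_out[v] - loops[v],  # Eq. (1), outgoing to others
        }
        for v in nodes
    }

# Hypothetical toy data: A keeps one drug in-house, licenses two to B,
# and receives one from C.
toy_edges = [("A", "A"), ("A", "B"), ("A", "B"), ("C", "A")]
summary = degree_summary(toy_edges)
```

In this toy example, node A has one self-loop, one incoming edge from another node, and two outgoing edges to another node, so the self-loop is counted in both \(k_{\text{in}}\) and \(k_{\text{out}}\) but in neither \(m_{\text{in}}\) nor \(m_{\text{out}}\).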

Furthermore, the drug pipeline data are a snapshot recorded at a particular development stage. Thus, we use the status of the drug development cycle as an edge attribute, \(E_1 = \bigcup _{p}E_1 ^{p}\), where \(E_1 ^p\) represents the drug pipelines at the status \(p\in \{\)Discovery, Phase I–III Clinical, Pre-registration, Registered, Launched\(\}\). Accordingly, we add the status index p to the definitions of characteristics such as the in-degree of company i, \(k^{[1,p]} _{\text {in},i}\). For example, a launched drug without a transfer of license is represented as a self-loop with the launched attribute in the network, and \(\ell ^{[1,p]} _i\) at \(p=\{\text {Launched}\}\) represents the number of launched drugs that company i both discovered and launched. Although the data do not allow us to determine when the license of a drug candidate was transferred, we can observe the tendency of knowledge flows in the drug pipeline between business entities in the pharmaceutical area.

Supply chain network

The supplier–customer relationship is the most important relationship at the company level in the real economy. We represent the supply chain data acquired from the S&P Capital IQ dataset as a supply-chain network \(G_2\) in Fig. 4. In the graph \(G_2=(V_2,E_2)\), \(V_2\) is a node set constructed from companies listed in the supply chain data, and \(E_2\) is an edge set constructed from their supplier–customer relations. The supply chain network is an unweighted directed network representing the supply chain business relationships. We denote a directed edge as \(i\rightarrow j\) when company i is a supplier to company j. The in- and out-degrees of node i are defined by \(k^{[2]} _{\text {in},i} =\sum _j a_{ji} ^{[2]}\) and \(k^{[2]} _{\text {out},i} = \sum _j a_{ij} ^{[2]}\). Moreover, we denote the number of edges of node i from/to different nodes as the in-/out-degree with self-loops removed, \(m^{[2]} _{\text {in},i}=k^{[2]} _{\text {in},i} - \ell ^{[2]} _i\) and \(m^{[2]}_{\text {out},i}=k^{[2]} _{\text {out},i} - \ell ^{[2]} _i\). In the graph \(G_2\), \(m^{[2]} _{\text {in},i}\) and \(m^{[2]} _{\text {out},i}\) represent the number of suppliers and customers of company i, respectively.

Ownership network

We define the ownership network \(G_3\) in Fig. 4 based on the S&P Capital IQ dataset. In the graph \(G_3=(V_3, E_3)\), \(V_3\) is a node set constructed from companies listed in the ownership data, and \(E_3\) is an edge set constructed from their ownership relations. We denote an edge from company i to j when company j has a stake in company i. The in- and out-degrees of node i are defined by \(k^{[3]} _{\text {in},i} =\sum _j a_{ji} ^{[3]}\) and \(k^{[3]} _{\text {out},i} = \sum _j a_{ij} ^{[3]}\). Moreover, we denote the number of edges of node i from/to different nodes as the in-/out-degree with self-loops removed, \(m^{[3]} _{\text {in},i}=k^{[3]} _{\text {in},i} - \ell ^{[3]} _i\) and \(m^{[3]}_{\text {out},i}=k^{[3]} _{\text {out},i} - \ell ^{[3]} _i\). In the graph \(G_3\), \(m^{[3]} _{\text {in},i}\) and \(m^{[3]} _{\text {out},i}\) represent the number of companies in which company i holds shares and the number of shareholders of company i, respectively. Furthermore, the number of owners of each company is limited to 100 in the S&P Capital IQ dataset. Although the S&P Capital IQ dataset includes the ownership ratio, we define the ownership network as an unweighted directed network to increase the number of duplicated nodes between layers. Note that the ownership network in this paper represents the dependency flow, which is in the opposite direction to that typically used, because we assume that a firm’s knowledge tends to flow to its owners.

Combining datasets

To construct the MLN in Fig. 4, we must combine the drug pipeline data and the company’s supply chain and ownership data. Because the company’s name in the two datasets is not always the same, we performed name identification between the two datasets by using information about the country and industry after removing abbreviations such as Inc and Corp. When multiple candidates arose, we checked them manually.
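The name identification step can be sketched as follows. This is only an illustration under stated assumptions: the suffix list beyond “Inc” and “Corp”, the record layout (ID, name, country), and the function names are hypothetical, and the sketch matches on country only, whereas the procedure described above also uses industry information.

```python
import re

# Assumed suffix list for illustration; the text only names "Inc" and "Corp".
LEGAL_SUFFIXES = {"inc", "corp", "ltd", "co", "llc", "plc"}

def normalize_name(name):
    """Lower-case, strip punctuation, and drop trailing legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def match_companies(pipeline_rows, capital_rows):
    """Match records on (normalized name, country).

    Returns (matched, ambiguous): unique matches, and records with several
    candidates that would be left for manual checking.
    """
    index = {}
    for cid, name, country in capital_rows:
        index.setdefault((normalize_name(name), country), []).append(cid)
    matched, ambiguous = {}, {}
    for pid, name, country in pipeline_rows:
        candidates = index.get((normalize_name(name), country), [])
        if len(candidates) == 1:
            matched[pid] = candidates[0]
        elif len(candidates) > 1:
            ambiguous[pid] = candidates
    return matched, ambiguous

# Hypothetical records: (id, company name, ISO country code).
pipeline_rows = [("p1", "Acme Pharma, Inc.", "US"), ("p2", "Beta Bio Corp", "JP")]
capital_rows = [
    ("c1", "ACME PHARMA INC", "US"),
    ("c2", "Beta Bio", "JP"),
    ("c3", "Beta Bio Corp.", "JP"),
]
matched, ambiguous = match_companies(pipeline_rows, capital_rows)
```

Keeping ambiguous candidates separate, rather than picking one automatically, mirrors the manual check described above.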

MLN representation

In this subsection, we generalize the definitions of our networks using an MLN framework. Generally, an MLN is defined as a pair \({\mathcal {M}}=({\mathcal {G}},{\mathcal {C}})\), where \({\mathcal {G}}=\{G_{\alpha }; \alpha \in \{1,\cdots , M \} \}\) is a family of graphs \(G_{\alpha }=(V_{\alpha },E_{\alpha })\), \(V_{\alpha }\) is the set of nodes of layer \(G_{\alpha }\), and M is the number of layers. \({\mathcal {C}}=\{E_{\alpha \beta }\subseteq V_{\alpha }\times V_{\beta }; \alpha ,\beta \in \{1,\cdots , M \}, \alpha \ne \beta \}\) is the set of interconnections between the nodes of different layers \(G_{\alpha }\) and \(G_{\beta }\) with \(\alpha \ne \beta\). We define an MLN (\(M=3\)), composed of companies and institutions as nodes and three types of interactions between them, by using the drug pipeline, supply chain, and ownership data. Because each layer of our MLN is defined by a type of interaction, there are overlapping nodes but no interconnections between the nodes of different layers: \({\mathcal {C}}=\emptyset\). The graphs \(G_{\alpha }\) for each layer are defined as

  • \(G_{1}\): Drug pipeline network

  • \(G_{2}\): Supply-chain network

  • \(G_{3}\): Ownership network

where the definitions for each layer are explained in previous subsections. Figure 4 displays the conceptual representation of MLN.

The adjacency matrix of each layer \(G_{\alpha }\) is denoted by \(A^{[\alpha ]}=\left( a_{ij} ^{[\alpha ]}\right)\), where the element \(a_{ij} ^{[\alpha ]}\) corresponds to the number of edges from node i to j in the \(\alpha\)-th layer. The in- and out-degrees of a node i of the MLN are defined as vectors:

$$\begin{aligned} \varvec{k}_{\text {in},i} = \left( k^{[1]} _{\text {in},i}, ~~k^{[2]} _{\text {in},i}, ~~k^{[3]} _{\text {in},i}\right) ~~~\text {and}~~~~\varvec{k}_{\text {out},i} = \left( k^{[1]} _{\text {out},i}, ~~k^{[2]} _{\text {out},i}, ~~k^{[3]} _{\text {out},i}\right) , \end{aligned}$$
(2)

where \(k^{[\alpha ]} _{\text {in},i}\) and \(k^{[\alpha ]} _{\text {out},i}\) are the in- and out-degrees of node i in the \(\alpha\)-th layer, i.e., \(k^{[\alpha ]} _{\text {in},i} =\sum _j a_{ji} ^{[\alpha ]}\) and \(k^{[\alpha ]} _{\text {out},i} = \sum _j a_{ij} ^{[\alpha ]}\). Similarly, we denote the number of self-loops of node i in the \(\alpha\)-th layer as \(\ell ^{[\alpha ]} _i= a_{ii} ^{[\alpha ]}\). In this paper, we must treat self-loops as flows as well, because self-loops in the drug pipeline network \(G_1\) correspond to the accumulation of knowledge. Thus, we count the number of flows in each layer separately for the edges between two different nodes and for the self-loops. The number of edges of node i from/to different nodes is denoted as

$$\begin{aligned} m^{[\alpha ]} _{\text {in},i}=k^{[\alpha ]} _{\text {in},i} - \ell ^{[\alpha ]} _i ~~~~~~\text {and}~~~~~~~m^{[\alpha ]} _{\text {out},i}=k^{[\alpha ]} _{\text {out},i} - \ell ^{[\alpha ]} _i~, \end{aligned}$$
(3)

and their total number is

$$\begin{aligned} M_{\alpha }=\frac{1}{2}\sum _{i,j=1,i\ne j} ^{N_{\alpha }} a_{ij} ^{[\alpha ]}~~~~~~\text {and}~~~~~~~L_{\alpha }=\sum _i^{N_{\alpha }} a_{ii} ^{[\alpha ]}~. \end{aligned}$$
(4)

where \(N_{\alpha }\) is the number of nodes in \(\alpha\)-th layer. Notably, we added the status layer index p to the definitions of the first layer characteristics such as in-degree of the i-th node \(k^{[1,p]} _{\text {in},i}\) and the total number of edges \(M^{p}_1\).

Fig. 4

Overview of the MLN representation. We used the drug pipeline, supply chain, and ownership data to define the MLN (\(M=3\)), which is composed of firms and institutions as nodes. Then, we construct MLN for knowledge flow based on the edge attribution about the status of the drug pipeline

Bow tie structure

As we showed in the edge-level analysis in Sect. 2, the knowledge flow in the drug pipeline seems to circulate at the country level; however, whether the flows connect at the company level is unclear. Generally, the giant weakly connected component (GWCC) of a directed network can be decomposed into the giant strongly connected component (GSCC), which is the largest strongly connected component (SCC) in the GWCC, and its upstream and downstream portions (IN and OUT). This is known as the bow tie decomposition, originally introduced for the Web (Broder et al. 2000). This decomposition could help us understand the hierarchical and circular flows of the networks from a macroscopic perspective.
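As an illustration of the decomposition, the following is a minimal sketch for small directed edge lists. It is not the implementation used in the paper: the function names are hypothetical, self-loops are ignored for connectivity, nodes of the GWCC that belong to neither GSCC, IN, nor OUT (for example, tendrils) are grouped under an assumed label `OTHER`, and the SCC search uses a simple reachability intersection that would be too slow for networks of the size analyzed here.

```python
from collections import defaultdict

def _reach(start_nodes, adj):
    """Return all nodes reachable from start_nodes by following adj."""
    seen, stack = set(start_nodes), list(start_nodes)
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def bow_tie(edges):
    """Split the GWCC of a directed edge list into GSCC, IN, OUT and OTHER."""
    fwd, bwd, und = defaultdict(set), defaultdict(set), defaultdict(set)
    nodes = set()
    for i, j in edges:
        nodes.update((i, j))
        if i != j:  # self-loops do not affect connectivity
            fwd[i].add(j)
            bwd[j].add(i)
            und[i].add(j)
            und[j].add(i)
    # Largest weakly connected component (edge direction ignored).
    remaining, gwcc = set(nodes), set()
    while remaining:
        comp = _reach([next(iter(remaining))], und)
        remaining -= comp
        if len(comp) > len(gwcc):
            gwcc = comp
    # Largest SCC inside the GWCC: the SCC of s is the set of nodes that
    # are both reachable from s and able to reach s.
    remaining, gscc = set(gwcc), set()
    while remaining:
        s = next(iter(remaining))
        scc = _reach([s], fwd) & _reach([s], bwd)
        remaining -= scc
        if len(scc) > len(gscc):
            gscc = scc
    out_part = _reach(gscc, fwd) - gscc   # downstream of the GSCC
    in_part = _reach(gscc, bwd) - gscc    # upstream of the GSCC
    other = gwcc - gscc - in_part - out_part
    return {"GSCC": gscc, "IN": in_part, "OUT": out_part, "OTHER": other}

# Hypothetical toy network: a 3-cycle with one upstream node, one
# downstream node, one tendril, and a separate small component.
toy = [("a", "b"), ("b", "c"), ("c", "a"),
       ("x", "a"), ("c", "y"), ("x", "t"), ("p", "q")]
parts = bow_tie(toy)
```

In the toy network, the cycle a, b, c forms the GSCC, x lies upstream, y lies downstream, and t hangs off the IN part without reaching the GSCC.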

Community structure

Besides the macroscopic structure measured as a bow-tie structure, community detection is a powerful tool for explaining the structural properties of densely connected networks. We compared the structural properties between layers by using node attributes such as country and primary industry. Although the detailed analysis of the community structure of each network is given in 6.2, we explain here the method used to extract the community structures of a network.

To find communities in the GWCC of the layers, we use the map equation method (Rosvall and Bergstrom 2008), known as Infomap, which is one of the best performing community detection methods (Lancichinetti and Fortunato 2009). The map equation method is a flow-based and information-theoretic approach that searches for a module partition \({\mathcal {M}}\), dividing n nodes into m communities, that minimizes the length of the code describing a random walk on the network. The average single-step description length is defined as

$$\begin{aligned} L({\mathcal {M}})=q_{\curvearrowleft }H({\mathcal {Q}})+\sum ^{m}_{i=1}p_{i\circlearrowright }H({\mathcal {P}}_i)~. \end{aligned}$$
(5)

The first term arises from the movements of the random walker across modules, where \(q_{\curvearrowleft }\) is the probability that the random walker switches communities, and \(H({\mathcal {Q}})\) depicts the average description length of the community index codewords given by the Shannon entropy. The second term arises from the intra-community movement of the random walker, where the weight \(p_{i\circlearrowright }\) represents the fraction of the movements within the community, and \(H({\mathcal {P}}_i)\) represents the entropy of the intra-community movement. Furthermore, this method has been extended to a hierarchical map equation Rosvall and Bergstrom (2011) that decomposes a network into communities and sub-communities.
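To make Eq. (5) concrete, the sketch below evaluates \(L({\mathcal {M}})\) for a given partition. It rests on simplifying assumptions that differ from the directed networks in this paper: the graph is undirected and unweighted, so the stationary visit rate of a node is its degree divided by twice the number of edges and no teleportation is needed. The function name and the toy graph are hypothetical, and the sketch only scores a partition; it does not perform the Infomap search.

```python
from math import log2
from collections import defaultdict

def _plogp(p):
    """p * log2(p), with the convention 0 * log 0 = 0."""
    return p * log2(p) if p > 0 else 0.0

def map_equation(edges, module_of):
    """Two-level description length L(M) in bits (undirected, unweighted)."""
    degree = defaultdict(int)
    exit_links = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if module_of[u] != module_of[v]:
            exit_links[module_of[u]] += 1
            exit_links[module_of[v]] += 1
    total = sum(degree.values())              # twice the number of edges
    members = defaultdict(list)
    for v in degree:
        members[module_of[v]].append(v)
    # q_i: probability per step of leaving module i.
    q_exit = {m: exit_links[m] / total for m in members}
    q = sum(q_exit.values())
    # First term: q * H(Q) = q log q - sum_i q_i log q_i.
    index_term = _plogp(q) - sum(_plogp(qi) for qi in q_exit.values())
    # Second term: sum_i p_i * H(P_i), expanded in the same way.
    module_term = 0.0
    for m, nodes in members.items():
        p_nodes = [degree[v] / total for v in nodes]
        p_circ = q_exit[m] + sum(p_nodes)
        module_term += (_plogp(p_circ) - _plogp(q_exit[m])
                        - sum(_plogp(p) for p in p_nodes))
    return index_term + module_term

# Hypothetical toy graph: two triangles joined by a single edge.
toy_graph = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_modules = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_module = {v: "A" for v in range(1, 7)}
length_two = map_equation(toy_graph, two_modules)
length_one = map_equation(toy_graph, one_module)
```

For this toy graph the partition into the two triangles yields a shorter description than placing all nodes in one module, which is the sense in which the method prefers partitions that trap the random walker.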

We detect the hierarchical communities by using the multi-coding Infomap method, and we use the “Level” index to represent the hierarchy of communities; communities at the 2nd level represent sub-communities at the 1st level. To characterize the hierarchical communities, we use node attributions with country, company type, and bow tie component for the drug pipeline layer, and country, primary industry, and bow tie component for the supply-chain and ownership layer, respectively.

Interlayer degree correlations

It is reasonable to assume that a hub node in the drug pipeline network could also be a hub in the supply chain or ownership network. To verify this assumption, we investigate the degree correlations for each node between different layers. However, we must note that the edges in the drug pipeline layer \(E_1 ^{p}\) are characterized by the status \(p\in \{\)Discovery, Phase I–III Clinical, Pre-registration, Registered, Launched\(\}\), and the drug pipelines that have not reached the launched stage could disappear during the drug development cycle. Therefore, we only focus on the launched case here. Furthermore, the number of incoming edges of each company in the drug pipeline layer equals the number of drugs with which it is dealing. From this perspective, we divide the edges on which we focus in the drug pipeline layer into two types:

  • Self-loops, \(\ell _i^{[1,p]}\) at \(p=\{\text {Launched}\}\), corresponding to the number of launched drugs that the company i discovered as the licenser and has launched. In this paper, we define that \(\ell _i^{[1,p]}\) at \(p=\{\text {Launched}\}\) represents the closed innovation.

  • In-degrees with self-loops removed, \(m_{\text {in},i}^{[1,p]}\) at \(p=\{\text {Launched}\}\), corresponding to the number of launched drugs for which company i was not the original licenser but which it now holds as the licensee. In this paper, we define that \(m_{\text {in},i}^{[1,p]}\) at \(p=\{\text {Launched}\}\) represents the open innovation.

Node and edge overlap

Although the dynamics of knowledge flows in the drug pipeline between companies remain unclear, we can raise a possible hypothesis for the edge-level similarity. When drug pipelines are developed in an overly closed situation, such that pharmaceutical companies do not do business with others in the same industry, no edge-level similarity to the supply-chain layer appears. However, alliances between pharmaceutical companies, i.e., open innovation in the pharmaceutical industry, could share not only markets but also drug pipelines, which is assumed to appear as edge-level similarity to the supply-chain layer. Furthermore, if a drug pipeline tends to be transferred to the owner of its licenser, we might observe edge-level similarities to the ownership network. Although knowledge in the pharmaceutical industry might flow along with the flow of control, it is challenging to observe this in our MLN when mergers and acquisitions (M&A) cause it. To confirm the above hypothesis, we compute the overlap of nodes and edges as

$$\begin{aligned} O(X_{\alpha },X_{\beta })=\frac{\left| X_{\alpha }\cap X_{\beta }\right| }{\left| X_{\alpha }\cup X_{\beta }\right| }~, \end{aligned}$$
(6)

where \(X_{\alpha }\) is a set of nodes/edges at the \(\alpha\)-th layer. We measure the overlap by the fraction of nodes/edges appearing in both layers over the aggregate number of nodes/edges of the two layers. We ignore the multiple edges and self-loops in the drug pipeline layer.
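Equation (6) is the Jaccard index of two sets, which can be sketched as follows. The function names and the toy layers are hypothetical; the helper `simple_edges` reflects the statement above that multiple edges and self-loops in the drug pipeline layer are ignored.

```python
def overlap(x_alpha, x_beta):
    """Eq. (6): size of the intersection divided by size of the union."""
    a, b = set(x_alpha), set(x_beta)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def simple_edges(edges):
    """Drop self-loops and collapse multiple edges into one directed edge."""
    return {(i, j) for i, j in edges if i != j}

def node_set(edges):
    """Nodes that appear as an endpoint of at least one edge."""
    return {v for edge in edges for v in edge}

# Hypothetical toy layers.
pipeline_layer = [("A", "B"), ("A", "B"), ("A", "A"), ("C", "A")]
supply_layer = [("A", "B"), ("B", "D")]

edge_overlap = overlap(simple_edges(pipeline_layer), simple_edges(supply_layer))
node_overlap = overlap(node_set(pipeline_layer), node_set(supply_layer))
```

Because the edges are directed pairs, an edge from A to B in one layer overlaps only with an edge from A to B in the other layer, not with an edge from B to A.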

To evaluate the statistical significance of the edge overlap more precisely, we use a statistical test and compute the probabilities (p values) that the expected number of overlapping edges is larger than the observed value. Here, the null hypothesis is that there is no edge overlap between the two layers. First, we assume that the number x of overlapping edges with the \(\alpha\)-th layer obeys the binomial distribution, \(x\sim \text {B}(n,p(\beta |\alpha ))\), where \(n=\left| E_{\alpha }\cup E_{\beta }\right|\). We define the conditional probability that an edge connects two nodes in both the knowledge flow layer (\(\beta =1\)) and the \(\alpha\)-th layer as

$$\begin{aligned} p(\beta =1|\alpha )=p_{1}\times p_{\alpha }=1\times \frac{\left<{\bar{k}}^{[\alpha ]}\right>}{\left| V_{\alpha }\right| }~, \end{aligned}$$
(7)

where \(\langle {\bar{k}}^{[\alpha ]}\rangle\) is half of the average total degree of the \(\alpha\)-th layer. Therefore, the last term on the right-hand side of this equation corresponds to the probability of finding an edge between a randomly selected pair of nodes. Here, the drug pipeline layer (\(\beta =1\)) is treated as an independent variable with respect to the \(\alpha\)-th layer, so that \(p_{1}=1\) by definition.
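The test can be sketched as an upper-tail binomial probability. This is one reading of Eq. (7) under stated assumptions: half of the average total (in plus out) degree equals \(|E_{\alpha }|/|V_{\alpha }|\), so the edge probability becomes \(|E_{\alpha }|/|V_{\alpha }|^2\). The function names and the numbers are hypothetical, and the direct summation is only suitable for small n; for edge counts of the size used in this paper, a log-space or library routine would be needed.

```python
from math import comb

def edge_probability(num_edges, num_nodes):
    """Eq. (7) with p_1 = 1: (|E| / |V|) / |V|.

    |E| / |V| is half of the average total (in + out) degree.
    """
    return (num_edges / num_nodes) / num_nodes

def binomial_upper_tail(x_obs, n, p):
    """P(X >= x_obs) for X ~ B(n, p), by direct summation (small n only)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(x_obs, n + 1)
    )

# Hypothetical numbers: a layer with 10 edges on 5 nodes, a union of
# 12 edges across the two layers, and 9 observed overlapping edges.
p_edge = edge_probability(num_edges=10, num_nodes=5)
p_value = binomial_upper_tail(x_obs=9, n=12, p=p_edge)
```

A small p value indicates that an overlap at least as large as the observed one would rarely arise if edges were placed between randomly selected node pairs with probability `p_edge`.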

Results

In this section, we first explain the results of the single-layer analysis and then present the multilayer analysis based on those results.

Single-layer analysis

In preparation for the MLN analysis, we analyzed the bow-tie structure and the community structure of every single layer: the drug pipeline network, the supply chain network, and the ownership network. The bow-tie analysis suggests that the knowledge flows in drug pipelines have macroscopic characteristics similar to those of the supply chain network. To compare the structural properties of the layers, we conducted community detection using the Infomap method and characterized each community in terms of node attributes such as country and primary industry. For a detailed discussion of the single-layer analysis, see Appendix 1. As discussed in Sect. 2, the knowledge flow between the licenser and licensee circulates at the country level and at the global level because of the US companies. We observed a large SCC in the bow-tie structure at the company level in the drug pipeline network, which corresponds to a large-scale circular flow between pharmaceutical companies.
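
For readers unfamiliar with the bow-tie decomposition, the following is a minimal sketch that splits a directed network into the largest SCC, the IN component (nodes that can reach the SCC), the OUT component (nodes reachable from the SCC), and the remainder. It is a simple, non-optimized illustration on a hypothetical toy graph, not the implementation used for the analysis in Appendix 1; a linear-time SCC algorithm would be preferable for large networks.

```python
from collections import defaultdict


def reachable(adj, sources):
    """All nodes reachable from `sources` along `adj` (iterative DFS)."""
    seen, stack = set(sources), list(sources)
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def bow_tie(edges):
    """Return (largest SCC, IN, OUT, remaining nodes) of a directed graph."""
    fwd, bwd, nodes = defaultdict(set), defaultdict(set), set()
    for u, v in edges:
        fwd[u].add(v)
        bwd[v].add(u)
        nodes.update((u, v))
    # The SCC of a node is the set it can reach that can also reach it.
    scc, unassigned = set(), set(nodes)
    while unassigned:
        seed = next(iter(unassigned))
        comp = reachable(fwd, [seed]) & reachable(bwd, [seed])
        unassigned -= comp
        if len(comp) > len(scc):
            scc = comp
    out_part = reachable(fwd, scc) - scc
    in_part = reachable(bwd, scc) - scc
    rest = nodes - scc - in_part - out_part
    return scc, in_part, out_part, rest


# Hypothetical toy network: A -> B -> C -> A forms the circular flow.
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"),
             ("D", "A"), ("C", "E"), ("F", "E")]
scc, in_part, out_part, rest = bow_tie(toy_edges)
print(sorted(scc), sorted(in_part), sorted(out_part), sorted(rest))
```

In the toy network, the cycle among A, B, and C plays the role of the circular knowledge flow, D feeds into it, E receives from it, and F is connected to neither side of the flow.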

Table 3 The number of nodes, edges, and self-loops before combining Coritellis and Capital IQ datasets
Table 4 The number of edges and self-loops for each pipeline status in the drug pipeline layer

Interlayer degree correlations

We constructed the MLN to investigate the knowledge flow between companies in drug pipelines using the drug pipeline, supply-chain, and ownership data. Table 3 summarizes the number of nodes, edges, and self-loops for each layer of the MLN, where we list the number of the cases successfully combined by the Coritellis and Capital IQ datasets in parentheses. Table 4 shows the knowledge flows in drug pipelines for each status of the drug development cycle.

We show the relations between the number of launched drugs and the number of supplier–customer (ownership) links in the upper (lower) part of Fig. 5. Note that the number of drugs corresponds to the edges attributed to \(p=\{\text {Launched}\}\) at the drug pipeline layer, and Fig. 5 does not contain companies not dealing with launched drugs. This is the reason that the number of data points in Fig. 5 is less than the number of nodes in Table 3.

As defined in Sect. 3, \(\ell _{\text {in},i}^{[1,p]}\) and \(m_{\text {in},i}^{[1,p]}\) at \(p=\{\text {Launched}\}\) represent closed and open innovation, respectively. Thus, we can investigate the relation between layers from Fig. 5a and c for closed innovation, and from Fig. 5b and d for open innovation. We also calculated the Pearson correlation coefficients, as denoted in these figures. Note that the out-degree in the ownership layer has an upper limit of approximately 100 because of the threshold in data collection. This is why we do not compute the Pearson correlation coefficients relating to the out-degree in the ownership layer. Notably, companies with larger \(m_{\text {in},i}^{[1,p]}\) at \(p=\{\text {Launched}\}\) have a larger number of supplier–customer or ownership relations. As a result, we find weak positive interlayer degree correlations between the drug pipeline layer and both the supply chain and ownership layers. In particular, we observed a stronger correlation for open innovation, measured by \(m_{\text {in},i}^{[1,p]}\) at \(p=\{\text {Launched}\}\) in Fig. 5b and d, than for closed innovation in Fig. 5a and c. Therefore, we could expect that both open and closed innovation are useful for some companies with a larger number of supplier–customer or ownership relations in the pharmaceutical industry, and that this tendency appears more markedly in open innovation.
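
An interlayer degree correlation of this kind can be sketched as the Pearson coefficient between two per-company vectors, one from each layer. The numbers below are hypothetical placeholders, and computing the coefficient on raw counts (rather than, say, log-transformed counts) is an assumption of this illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-company values, aligned by company:
# in-degree of launched pipelines vs. number of supplier-customer links.
m_in_launched = [1, 2, 4, 8, 3]
supply_links = [5, 3, 20, 40, 9]

r = pearson(m_in_launched, supply_links)
print(r)
```

Only companies present in both layers can be paired in this way, which is consistent with the note above that companies without launched drugs do not appear in Fig. 5.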

Fig. 5

Interlayer degree correlations. The upper (lower) two figures illustrate the relations between the number of drug pipelines and the number of supplier–customer (ownership) links. The left (right) vertical axis corresponds to the number of self-loops (in-degree) at the drug pipeline layer’s launched state. Because the out-degree in the ownership layer has an upper limit of approximately 100 owing to the data collection threshold, we do not compute the corresponding Pearson correlation coefficients

Node and edge overlap

Table 5 Overlap of nodes and edges between the layers, as measured by the fraction of nodes/edges appearing in both layers over the aggregate number of nodes/edges of the two layers

Table 5 shows the node and edge overlap between each pair of layers. We measure the overlap of nodes and edges by the fraction of nodes/edges appearing in both layers over the aggregate number of nodes/edges of the two layers. The proportion of node/edge overlap is lower for the ownership layer, which is more remarkable than the GWCC case.

Table 5 shows that the overlap between the knowledge flow layer and the supply chain layer is larger than that of the other combinations of layers in both node overlap and edge overlap. Now, the conditional probabilities \(p(1|\alpha )\) are given as

$$\begin{aligned} p(1|\alpha ) = {\left\{ \begin{array}{ll} 2.33 \times 10^{-3} &{} (\alpha =2) \\ 1.14 \times 10^{-2} &{} (\alpha =3) \end{array}\right. }\, {\rm where}\, \langle{\bar{k}}^{{[}\alpha {]}}\rangle = {\left\{ \begin{array}{ll} 2.68 &{} (\alpha =2) \\ 1.01 &{} (\alpha =3) \end{array}\right. }, \end{aligned}$$
(8)

for the GWCC of the supply chain \((\alpha =2)\) and ownership \((\alpha =3)\) networks. Figure 6 compares the actual number of edge overlaps with the above binomial distributions, where Fig. 6a shows the supply chain and Fig. 6b the ownership network. Consequently, the p value is approximately equal to zero (one) for the supply-chain (ownership) layer. Therefore, the null hypothesis is rejected (not rejected) for the supply chain (ownership) layer, indicating that the observed edge overlaps between the knowledge flow and the supply chain (ownership) layers are (are not) statistically significant.

Therefore, the pharmaceutical industry’s knowledge flows are related to the flow of products associated with the supply chain, rather than to the dependency based on the ownership network. This suggests that pharmaceutical companies use open innovation to share drug pipelines and to expand their business markets along the supply chain. Although we do not use ownership data that include M&A, overlap between the knowledge flow of the drug pipeline and the ownership network is rare. Future work, therefore, should compare the traditional M&A strategy of pharmaceutical companies with their sustainable growth.

The similarity between the knowledge flows of drug pipelines and supply-chain networks is also observed at the edge level, i.e., the licenser (licensee) of a drug pipeline tends to be the supplier (customer), and a hub of the knowledge flows of drug pipelines tends to be a hub in the supply chain as well. Our results suggest a strong connection between open innovation in the pharmaceutical industry and firms’ activities regarding the supply chain. Generally, the pharmaceutical industry promotes drug discovery and clinical trials using artificial intelligence (AI) methods. For example, large pharmaceutical companies, such as Pfizer and Novartis, deployed AI systems by IBM for drug discovery and clinical trials (Pharmatutor.org; Fleming 2018). This collaboration suggests a growing need for relations between drug pipelines and supply chains in other industries. Our findings, therefore, agree with the reported situation of pharmaceutical innovation.

Fig. 6

Comparison with binomial distribution. The solid lines represent binomial distributions \(\mathrm {B}(n,p_{\alpha })\), where \(p_{\alpha }\) is the conditional probability that we can choose the overlapped edge between the drug pipeline layer and the \(\alpha\)-th layer from n edges, for a supply chain and b ownership. The dashed lines represent the observed values listed in Table 5, that is, 480 for \(\alpha =2\) and 22 for \(\alpha =3\)

Implications

Open innovation is a lucrative field within industries relying on innovation. Specifically, the pharmaceutical industry is representative of knowledge-intensive industries, and open innovation should significantly boost pharmaceutical collaborations and reduce the cost of long-term R&D processes or of M&A in closed innovation.

Consequently, the development of a new drug and its trials requires high investment, but with a low probability of success. Private companies, such as small and medium enterprises and ventures, end up funding innovations because of high investments. Furthermore, public companies, such as big pharmaceutical companies, are always looking for new available targets. Thus, to boost pharmaceutical innovation in the context of globalization and IT innovation, it is crucial to understand the dynamics of innovation with regard to the various interactions in the firm-level economy.

To reveal the actual situation of pharmaceutical innovation, we proposed an MLN framework using the drug pipeline, global supply chain, and ownership data. Furthermore, using the proposed framework, we investigated the characteristics of the knowledge flows between the licenser and licensee, defined by the drug pipeline data. Several seeds of the drug pipeline were discovered in the US and provided to other countries. Larger firms, such as public companies having an individual market in the supply chain, can hold many drug pipelines as licensees, causing pharmaceutical knowledge to flow across borders. The macroscopic structure of pharmaceutical knowledge flows tells us about the characteristics of each country. Because US firms provide drug pipeline seeds, they compose the upstream part of the entire network of the knowledge flow of the drug pipeline. Although firms in JPN do not have many drug seeds, they deal with the second-largest number of launched drug pipelines. Several firms in JPN are interdependent with others in drug development. From the macroscopic structure, we can confirm that most of the firms in JPN are located in the largest SCC. By contrast, the firms in CHN, which have a high self-sufficiency rate of drug pipelines, are not involved in the circular flow of drug pipeline knowledge with others. Furthermore, the tendency toward closed innovation in CHN is seen from a closed community of pharmaceutical firms in CHN in the global supply chain network.

Finally, an essential question for future studies is how interventions such as policy making could boost pharmaceutical innovation. Our study provides a framework for future studies to assess the characteristics of pharmaceutical innovation with regard to various firm-level activities by using the MLN representation. Although our results are encouraging, further investigation is required. The traditional M&A strategy of pharmaceutical companies could be evaluated based on their sustainable growth by combining historical M&A data with our networks. Furthermore, our approach can be used to investigate how risks such as the patent cliff in the pharmaceutical industry will spread to other sectors in the future. We hope that future studies could reduce unnecessary risk.

Conclusion

Drug development is time-consuming from the start of research to obtaining approval, and the probability of success with a candidate compound is extremely low. We evaluated the characteristics of the flow and localization of knowledge during drug development in the global pharmaceutical industry.

We analyzed the MLN constructed with the drug pipeline network layer, the global supply chain network layer, and the global ownership network layer. First, we focused on the flows in the individual layers of the MLN, namely licenses (as knowledge), goods, and capital. We identified the bow-tie and community structures of each network layer. The obtained bow-tie structure suggested that the knowledge flows in drug pipelines have macroscopic characteristics similar to those of the supply chain network, characterized by a larger SCC size. The communities obtained in the three layers were characterized by country, category of the company, and bow-tie component.

After analyzing individual layers, we studied the knowledge flows in the MLN, primarily whether the knowledge flows are related to the flow of goods in the supply chain or the flow of capital in the ownership. Based on the results obtained from the MLN analysis, we conducted a statistical test. We verified the significance of the overlapping of edges between drug pipeline and supply-chain layers. Our results suggest a strong connection between open innovation in the pharmaceutical industry and its economic activities in the supply chain.