1 Introduction

Technological change is an evolutionary process coupled to patterns of industrial dynamics and organizational transformation. Industry structure and evolution are shaped by firms persistently searching for, and adopting, new ideas and new organizational forms in their struggle for survival and for industrial leadership (Dosi and Nelson 2010). A few seminal contributions have emphasized the key role of scientific and technological paradigms in shaping the evolution of industries (see Nelson and Winter 1982; Dosi 1982; Dosi 1988). Within this tradition, since the Eighties, networks of innovators have been studied in the literature as a key feature of industry structure and evolution (see Pisano 1991; Galambos and Sturchio 1998; Arora and Gambardella 1994; Powell et al. 1996; Orsenigo et al. 1997; Henderson et al. 1999). In fact, a division of innovative labour through collaborative agreements between small firms generating new ideas and large companies specialized in development and commercialization diffused across different technological fields and industries (see Grayson 1960; Arora and Gambardella 1990; Arrow 1993; Arora et al. 2009), sustaining a collective distributed effort to reach a balance between the exploration of multiple research trajectories and their exploitation (March 1991).

In particular, the seminal contribution of Orsenigo et al. (1997, 2001) established a direct connection between the structure and evolution of knowledge bases and the structure and evolution of collaborative R&D in pharmaceutical innovation, studied as a science-intensive industry (see Gambardella et al. 1995). Patterns of scientific and technological change contributed to transforming some of the fundamental sources of competitive advantage of pharmaceutical firms. Over time, companies progressively redesigned their competencies and roles in drug discovery and development, within a dense network of contractual relations. Orsenigo et al. (2001) and Riccaboni and Pammolli (2002) showed how technological trajectories within the industry shaped the emergence and structural evolution of an extensive network of R&D contractual agreements between firms.

Starting from the mid Seventies, the biotechnology revolution (Orsenigo 1989; Galambos and Sturchio 1996; Nightingale and Martin 2004; McKelvey et al. 2004) and the molecularization of physiology, pathology and pharmacology sustained a transition from an industry dominated by a stable core of large firms founded in the nineteenth century to a novel division of labor between large firms and a new generation of innovative start-up companies, which were vulnerable because they lacked the financial resources that would have been necessary to operate as integrated pharmaceutical companies (Arrow 1993; Pammolli 1996; Galambos and Sewell 1997). Orsenigo et al. (2001) found that, being grounded in novel scientific and technological knowledge bases, new research hypotheses and opportunities tended to originate, under conditions of uncertainty, from cohorts of new specialized small firms active in specific domains. Over time, new Originators entered the industry to exploit new research hypotheses, and a complex collaborative network arose to match the exploration of new research hypotheses with development capabilities. In the early stages of network evolution, established pharmaceutical companies were able to exploit their absorptive capacity (Cohen and Levinthal 1990) and their first mover advantage in clinical development and commercialization. Essentially, they succeeded in integrating most of the research hypotheses generated by the new entrants, who lacked the financial resources to develop in-house the promising new research opportunities they were able to generate (see Arrow 1993; Grayson 1960; Cohen and Levinthal 1990; Arora et al. 2009; Pammolli 1996). In this phase, innovative activities continued to be dominated by a “quasi-random” search process (Malerba et al. 2016), and the entry of new companies did not imply a significant alteration of market structure as measured by market shares and firm turnover (see McKelvey et al. 2004). Both the final market and the R&D network continued to be dominated by a relatively small group of incumbent companies, which succeeded as “serial innovators” (Schumpeter 1934; Dosi et al. 1995; Malerba and Orsenigo 1996; 1997), with a correspondingly remarkable stability of industry structure (Sutton 2001; Bottazzi et al. 2001; Pammolli and Riccaboni 2004; Buldyrev et al. 2020).

However, over time the R&D network experienced radical changes in the roles of different actors. First, new general purpose research technologies in combinatorial chemistry and high-throughput screening enabled the emergence of a new generation of start-up companies, which were able to sell their services and platforms both to large established pharmaceutical companies and to new biotech firms. Second, some of the new entrants succeeded in combining research collaborations with large companies and the development of their own successful projects and products. The companies that succeeded eventually started to become integrated pharmaceutical companies, thanks to the high specificity of their research hypotheses and to the reduction in the cost of testing them (see Pammolli 1996; Riccaboni and Moliterni 2009; Pammolli et al. 2020). Third, some of the large established companies succeeded in developing, also through collaborations and acquisitions, their own capabilities in the new fields opened by the molecular and genomic revolution, while the rate of entry of new specialized Originators started to decrease. In this paper, we investigate how the evolution of the R&D collaboration network in pharmaceuticals has been shaped by a variety of underlying technological and relational regimes, which have co-evolved, complementing each other.
In particular, we investigate whether the relational mechanisms described by Orsenigo et al. (2001) are still at work in the pharmaceutical sector today, or whether new technological regimes have shaped the structure of the collaborative network. We document a promiscuity of roles between firms acting as Originators and as Developers after 2000: Originators have begun to act also as Developers; some of the established Developers have incorporated the new research technologies and have started to act also as Originators of new projects and collaborative alliances; and, analogously, biotech firms have started to act also as Developers of R&D projects originated by both large and small firms. In our analysis, we combine an exploratory topological inspection of the graph representing the contractual relations between firms at different points in time with a parsimonious stochastic representation of network formation (see Buldyrev et al. 2020). First, drawing on Simon (1962), we analyze the transition between relational regimes without introducing any assumption on the underlying dynamical laws governing the evolution of the network (Riccaboni 2000; Pammolli and Riccaboni 2003), relying instead on topological inspection of the network to infer some of their key features. Second, we complement our exploratory topological analysis of the network with a parsimonious stochastic framework of network formation, which, once again, builds upon the seminal contribution of Herbert A. Simon (Simon 1955; Ijiri and Simon 1977; Barabási and Albert 1999; Buldyrev et al. 2020). We show that, in the early stages of the evolution of the network of collaborative alliances, older established companies acting as Developers were able to capture new opportunities to collaborate in proportion to the number of collaborations they already had in place (the so-called preferential attachment, or rich-get-richer, phenomenon).
In a second stage of network evolution, the emergence of Originators that sold their technologies and services to both large and small firms produced the departures from the previous configuration first documented by Orsenigo et al. (2001), with a dramatic change in the underlying relational regimes. Accordingly, the formerly dominant preferential attachment mechanism started to coexist with a second regime, in which new links were established irrespective of Developers’ connectivity. Finally, we document how the subsequent evolution of the network continued to be shaped by the coexistence of these basic relational regimes and, increasingly, by changes in the roles of Originators and Developers.

The paper is organized as follows. First, we extend the analysis of Orsenigo et al. (2001) to study the growth of the network over time. We extract relevant information on the evolution of the network, which allows us to characterize the roles of different types of actors in terms of their structural positions in the network (see Simon 1962). Second, based on the results of our topological investigation, we introduce a simple network formation model to account for the empirically observed patterns. In Section 2, we describe the dataset used for the analysis and the methods employed to describe the topological evolution of the collaboration network. In Section 3, we discuss our empirical results, while Section 4 introduces a network formation framework that we propose to reproduce the key results observed in Section 3. Section 5 concludes.

2 Data and Methodology

The source of data we use is Recap, as in Orsenigo et al. (2001) and Riccaboni and Pammolli (2002), with a significant extension of the time frame, providing here the final long-term reconstruction of industry alliances, covering more than thirty years. Recap collects data on biotechnology agreements from three primary sources: company press releases, SEC filings, and company presentations made at investment conferences and other public meetings. Alliances are formed by multiple institutional actors, including firms, universities and government laboratories. Recap allows us to distinguish among different types of agreements, including acquisitions, joint ventures, licensing deals, co-development agreements, and manufacturing and marketing agreements. Our focus here is on interfirm R&D agreements, including co-development, R&D collaboration, development, technology licensing, and other unspecified research contracts. Our dataset encompasses 4,153 alliances involving 2,811 firms, from 1981 to 2012.

As in Orsenigo et al. (2001), we represent the network as a graph \(G(V,E)\), where the vertices \(V\) are the organizations and the edges \(E\) are the R&D collaborations between them. The resulting network is a directed graph, in which each link \(e\) is oriented and connects an Originator \(o\) with a Developer \(d\). The graph \(G\) is then represented by its adjacency matrix \(A(G) = [a_{od}]\), in which the entry \(a_{od}\) is equal to 1 if a collaborative agreement links Originator \(o\) with Developer \(d\), and 0 otherwise. To analyze the evolution of the overall network, we decompose it into temporal sub-graphs and inspect the connections in place at different points in time \(t\). Each sub-graph is denoted by \(G_t(V,E)\). We aggregate contractual agreements on a yearly basis, obtaining 31 sub-networks, which represent the R&D agreements signed year by year from 1981 to 2012.

We first compute some preliminary statistics such as the number of nodes (i.e. the number of firms present on the network) and the number of links (i.e. the number of collaborations). Subsequently, we investigate how relational roles have shaped the growth and the structure of the network.
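The yearly decomposition and the preliminary counts can be sketched as follows. This is a minimal illustration, assuming each agreement is available as a hypothetical `(year, originator, developer)` record; the function names and toy firms are ours and do not reflect the Recap schema.

```python
from collections import defaultdict

def yearly_subgraphs(deals):
    """Group (year, originator, developer) records into yearly edge sets G_t."""
    graphs = defaultdict(set)
    for year, originator, developer in deals:
        # Directed link o -> d; repeated deals within a year collapse to a_od = 1.
        graphs[year].add((originator, developer))
    return dict(graphs)

def node_and_link_counts(graphs):
    """Return {year: (number of nodes, number of links)} for each yearly sub-graph."""
    counts = {}
    for year, edges in sorted(graphs.items()):
        nodes = {firm for edge in edges for firm in edge}
        counts[year] = (len(nodes), len(edges))
    return counts

# Toy records, not drawn from Recap.
deals = [
    (1981, "BiotechA", "PharmaX"),
    (1981, "BiotechB", "PharmaX"),
    (1982, "BiotechA", "PharmaY"),
    (1982, "BiotechA", "PharmaY"),  # duplicate record in the same year
    (1982, "BiotechC", "PharmaX"),
]
print(node_and_link_counts(yearly_subgraphs(deals)))  # {1981: (3, 2), 1982: (4, 2)}
```

Using sets keeps the adjacency binary, as in the definition of \(a_{od}\); keeping a counter instead would preserve the number of deals per pair.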

Discontinuities in relational ordering criteria within the network are studied by measuring variations in the degree of decomposability of the system (see Simon 1962) through a simple graph-theoretical indicator derived from the Dulmage-Mendelsohn (DM) decomposition (Harary 1967; Dulmage and Mendelsohn 1958). This amounts to rearranging the rows and columns of the adjacency matrix to generate a set of blocks, or submatrices, which under stable relational rules (i.e., a stable ordering criterion) display a populated principal diagonal and zeros off the diagonal. Suppose that the adjacency matrix has a maximum transversal of length \(l\). This corresponds to a one-to-one matching in the network between \(l\) Originators and \(l\) Developers, which can be used to construct what Pothen and Fan (1990) call the coarse decomposition of the adjacency matrix. To obtain this form, we start by placing any unmatched rows first. Next, we look for the columns that can be reached from any of these rows via an alternating path. These \(n_1\) columns are placed first. The first block of the block triangular form contains these columns and the rows that are unmatched or have a transversal entry in one of these columns. The rows of this block can have no entry in a column outside the block, because such an entry would have led to an alternating path from an unmatched row, and the column would then by definition belong to the first block. Similarly, any unmatched columns are placed last, and the \(m_1\) rows that can be reached from any of these columns via an alternating path are also placed last. The final block of the block triangular form contains these rows and the columns that are unmatched or have a transversal entry in one of these rows.

The size and number of these blocks convey information on the decomposability of the system. To this end, following Orsenigo et al. (2001), we apply the Dulmage-Mendelsohn (DM) decomposition (Harary 1967; Dulmage and Mendelsohn 1958) to the adjacency matrix, which partitions Originators and Developers into the following classes: i) \(o_1 = o_M(d_U)\); ii) \(o_2 = o_{MM}\); iii) \(o_3 = o_U \cup o_M(o_U)\); iv) \(d_1 = d_U \cup d_M(d_U)\); v) \(d_2 = d_{MM}\); vi) \(d_3 = d_M(o_U)\). The subscripts \(M\) and \(U\) denote matched and unmatched nodes, respectively; for instance, \(o_M(d_U)\) denotes matched Originators reachable through alternating paths from unmatched Developers, while \(o_{MM}\) and \(d_{MM}\) denote matched nodes not reachable from any unmatched node. Originators classified in \(o_1\) (i) or Developers in \(d_3\) (vi) cannot be assigned an unambiguous relational role within the network, i.e., they play a transversal role and prevent the adjacency matrix from being nearly decomposable, reflecting the transition from a given configuration to a new one, where multiple ordering principles of nodes and relations coexist.
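A minimal sketch of the coarse decomposition described above follows. It assumes rows are Originators and columns Developers, nodes are integer indices, and \(o_M(d_U)\) is read as "matched Originators reachable by alternating paths from unmatched Developers"; the function names are hypothetical. A maximum matching is found with Kuhn's augmenting-path algorithm, and alternating-path searches from unmatched nodes then yield the six classes, of which \(o_1\) and \(d_3\) are the transversal ones.

```python
from collections import deque

def maximum_matching(n_orig, n_dev, edges):
    """Kuhn's augmenting-path algorithm. Rows are Originators, columns Developers."""
    adj = [[] for _ in range(n_orig)]
    for o, d in edges:
        adj[o].append(d)
    match_o = [-1] * n_orig  # Developer matched to each Originator (-1 = unmatched)
    match_d = [-1] * n_dev   # Originator matched to each Developer (-1 = unmatched)

    def augment(o, seen):
        for d in adj[o]:
            if d not in seen:
                seen.add(d)
                if match_d[d] == -1 or augment(match_d[d], seen):
                    match_o[o], match_d[d] = d, o
                    return True
        return False

    for o in range(n_orig):
        augment(o, set())
    return adj, match_o, match_d

def reach(starts, forward, back_match):
    """Alternating-path search: any link forward, then the matching link back.
    With a maximum matching, every node reached forward is matched."""
    start_side, other_side = set(starts), set()
    queue = deque(starts)
    while queue:
        u = queue.popleft()
        for v in forward[u]:
            if v not in other_side:
                other_side.add(v)
                w = back_match[v]
                if w not in start_side:
                    start_side.add(w)
                    queue.append(w)
    return start_side, other_side

def coarse_dm(n_orig, n_dev, edges):
    """Six coarse DM classes; o1 and d3 are the transversal ones."""
    adj, match_o, match_d = maximum_matching(n_orig, n_dev, edges)
    radj = [[] for _ in range(n_dev)]
    for o, d in edges:
        radj[d].append(o)
    o_unmatched = [o for o in range(n_orig) if match_o[o] == -1]
    d_unmatched = [d for d in range(n_dev) if match_d[d] == -1]
    o3, d3 = reach(o_unmatched, adj, match_d)   # o_U ∪ o_M(o_U), d_M(o_U)
    d1, o1 = reach(d_unmatched, radj, match_o)  # d_U ∪ d_M(d_U), o_M(d_U)
    o2 = set(range(n_orig)) - o1 - o3           # o_MM
    d2 = set(range(n_dev)) - d1 - d3            # d_MM
    return {"o1": o1, "o2": o2, "o3": o3, "d1": d1, "d2": d2, "d3": d3}

classes = coarse_dm(3, 2, [(0, 0), (1, 0), (2, 1)])
print(sorted(classes["o3"]), sorted(classes["d3"]))  # [0, 1] [0]
```

The recursive matching routine is meant for small examples; for networks with thousands of firms, a compiled routine such as SciPy's `scipy.sparse.csgraph.maximum_bipartite_matching` could supply the matching instead.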

To gain deeper insight into how the relational roles played by different actors shaped the structure of the collaboration network, we measure the relative importance, or centrality, of each node by counting its connections. The in-degree \(k_{in}^{i}\) of a vertex i is the number of links from other vertices to i, while the out-degree \(k_{out}^{i}\) of a vertex i is the number of links directed from i to other vertices. In our network, the out-degree \(k_{out}^{i}\) can be interpreted as the number of ideas/projects licensed out by the corresponding institution, whereas the in-degree \(k_{in}^{i}\) is the number of projects that company i gets access to. As a first benchmark, we test whether the empirical distributions of \(k_{in}\) and \(k_{out}\) are consistent with an underlying process of preferential attachment, which would generate power-law distributions.
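Computing both degrees from an edge list is straightforward. The sketch below assumes the network is stored as a hypothetical list of `(originator, developer)` pairs, so that a firm active in both roles simply has nonzero in- and out-degree.

```python
from collections import Counter

def in_out_degrees(edges):
    """k_out: projects a firm licenses out (as Originator); k_in: projects it accesses (as Developer)."""
    edges = set(edges)  # distinct (originator, developer) pairs, i.e. a_od = 1
    k_out = Counter(o for o, _ in edges)
    k_in = Counter(d for _, d in edges)
    return k_in, k_out

# Toy network: firm X develops two projects and originates one.
edges = [("A", "X"), ("B", "X"), ("A", "Y"), ("X", "C")]
k_in, k_out = in_out_degrees(edges)
print(k_in["X"], k_out["A"], k_out["X"])  # 2 2 1
```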

We therefore estimate the cumulative probability distributions \(P_{>}(k_{in})\) and \(P_{>}(k_{out})\), i.e., the fraction of nodes with in-degree and out-degree greater than or equal to \(k_{in}\) and \(k_{out}\), respectively. In particular, we test the hypothesis that the distribution of the number of collaborations follows a power law:

$$ \begin{aligned} P_{>}(k_{in}) &\propto \left( k_{in}\right)^{1-\gamma_{in}},\\ P_{>}(k_{out}) &\propto \left( k_{out}\right)^{1-\gamma_{out}}. \end{aligned} $$

This corresponds, for large values of \(k_{in}\) and \(k_{out}\), to probability densities \(P(k_{in}) \propto \left(k_{in}\right)^{-\gamma_{in}}\) and \(P(k_{out}) \propto \left(k_{out}\right)^{-\gamma_{out}}\) of finding a firm that has access to exactly \(k_{in}\) projects or has licensed out exactly \(k_{out}\) projects. Since the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution (i.e., the part of the distribution representing large but rare events) and by the difficulty of identifying the range over which power-law behavior holds, our estimation strategy follows Clauset et al. (2009). Moreover, we adopt the likelihood ratio test of Virkar et al. (2014), comparing the power-law hypothesis against four alternative distributions: the exponential, the log-normal, the stretched exponential (Weibull) and the power law with exponential cutoff. Generally speaking, given a pair of parametric models A and B for which we can compute the likelihood of the data, the model with the larger likelihood provides the better fit. Using the ratio of the two likelihoods, we are thus able to discriminate between alternative distributions.
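The estimation steps above can be sketched as follows, as a simplified illustration rather than the exact procedure used here. It relies on the approximate discrete maximum-likelihood estimator of Clauset et al. (2009), \(\hat\gamma \simeq 1 + n\left[\sum_i \ln\big(k_i/(k_{min}-1/2)\big)\right]^{-1}\), which they report to be accurate when \(k_{min}\) is not too small, and selects \(k_{min}\) by minimizing the Kolmogorov-Smirnov distance between the empirical and fitted \(P_{>}(k)\). The threshold `min_tail` and the synthetic sample are our assumptions. An exact discrete fit and likelihood-ratio comparisons against alternative distributions are available, for instance, in the third-party `powerlaw` Python package (`Fit` and `distribution_compare`).

```python
import math
import random
from bisect import bisect_left

def ccdf(values):
    """Empirical P_>(k): fraction of observations >= k, for each observed k."""
    xs = sorted(values)
    n = len(xs)
    return {k: (n - bisect_left(xs, k)) / n for k in sorted(set(xs))}

def fit_gamma(tail, k_min):
    """Approximate discrete MLE of the PDF exponent gamma (Clauset et al. 2009)."""
    return 1.0 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)

def ks_distance(tail, k_min, gamma):
    """Largest gap between empirical and approximate model P_>(k) ~ k^(1 - gamma)."""
    return max(
        abs(p - ((k - 0.5) / (k_min - 0.5)) ** (1.0 - gamma))
        for k, p in ccdf(tail).items()
    )

def fit_power_law(degrees, min_tail=50):
    """Pick k_min minimizing the KS distance; min_tail is a hypothetical safeguard."""
    data = [k for k in degrees if k >= 1]  # the estimator needs k_min >= 1
    best = None
    for k_min in sorted(set(data)):
        tail = [k for k in data if k >= k_min]
        if len(tail) < min_tail:  # tails only shrink as k_min grows
            break
        gamma = fit_gamma(tail, k_min)
        gap = ks_distance(tail, k_min, gamma)
        if best is None or gap < best[2]:
            best = (k_min, gamma, gap)
    return best  # (k_min, gamma, KS distance), or None if no tail is large enough

# Synthetic degrees with gamma = 2.5 and k_min = 5, drawn with the
# approximate discrete sampler described by Clauset et al. (2009).
rng = random.Random(0)
sample = [math.floor(4.5 * (1.0 - rng.random()) ** (-1.0 / 1.5) + 0.5) for _ in range(3000)]
print(fit_power_law(sample))
```

Plotting `ccdf` on log-log axes gives the empirical counterpart of \(P_{>}(k)\), whose tail slope should approach \(1-\gamma\).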

Finally, we study the homophily of the network. In a nutshell, homophily is the tendency of nodes to connect to similar nodes. In network theory, this concept is traditionally measured by means of the assortativity index. A network in which hubs (nodes with many connections) tend to link to each other while avoiding small-degree nodes, and small-degree nodes tend to connect to other small-degree nodes, is said to be assortative. Conversely, a network in which hubs avoid each other, linking instead to small-degree nodes, is called disassortative. Assortative/disassortative mixing can be computed for the overall network by looking at the correlation between a node's (in-/out-) degree and the average (in-/out-) degree of its nearest neighbours. This measure sheds light on the way Originators and Developers connect to each other over time: if the measure is negative, Originators (Developers) tend to connect with Developers (Originators), thus avoiding connections with firms that play similar roles. When the assortativity measure tends to zero, the connectivity of nodes has no influence on the likelihood that two parties will enter into a new collaborative agreement.

3 Empirical results

3.1 The growth of the network

From the beginning of the Eighties to the first decade of the new millennium, the size of the R&D network increased significantly, although non-monotonically, as reported in Fig. 1. In particular, starting approximately in 2006, we observe a decreasing trend. When each institution is classified as a biotech company, pharmaceutical firm, university or research hospital, the lower panel of Fig. 1 shows the dominant role of biotech and pharmaceutical companies in the network.

Fig. 1

Number of nodes and their roles over time. The upper panels report the number of Developers (left) and the number of Originators (right). Both quantities display an increasing trend, reflecting the role of new entrants in the expansion of the network. From 2006 to 2009, the number of newcomers decreased, in correspondence with the global financial crisis, before the trend reversed in the last few years. The lower panel shows that, while the numbers of both biotech and pharma companies grew from 1990 to 2000, after the beginning of the new millennium the growth of the network was driven mostly by biotech companies.

The growth of the number of nodes (institutions) in the network sustained an increase in the number of links (see Fig. 2). Two phases emerge. Up to the mid Nineties, the number of links increased, peaking between 1995 and 1997. Afterwards, the number of links grows exponentially, with a contraction during the 2007-2009 global financial crisis.

Fig. 2

Temporal evolution of the number of agreements. The figure shows the number of links representing R&D agreements in the network. The number of contractual agreements increased substantially over time, with a steeper slope during the Nineties, reflecting the emergence of new research trajectories in the industry, in terms of both new research hypotheses and new research technologies. In particular, after 1998, the number of links started to grow exponentially.

These descriptive results are in accordance with the stylized facts discovered by Orsenigo et al. (2001). The growth of the network was shaped by the arrival of new technological opportunities, while the non-monotonic increase of both the number of nodes and the number of links reflected the influence of multiple waves of technological opportunities, in fields such as recombinant DNA, monoclonal antibodies, genomics, screening devices and combinatorial chemistry.

To investigate the evolving patterns of cumulative advantage in the network, we produce a graphical representation of the aggregated adjacency matrix and plot the corresponding level curves. First, we create an empty adjacency matrix whose size is given by the number of distinct firms in the dataset. Second, we sort firms (both Originators and Developers) according to the first time they are recorded in the dataset, so that the upper-left corner of the matrix represents the oldest firms. Rows represent Developers, while columns are associated with Originators. Third, at each time stamp t, the links present in the dataset are added to the adjacency matrix, so that if the same link, for instance from the x-th Originator to the y-th Developer, is recorded multiple times in the dataset, the corresponding entry (row y, column x) contains the number of times firms x and y are connected in the dataset. Colors thus represent the cumulative number of agreements between pairs of nodes, with darker colors associated with a higher number of deals. Figure 3 shows the presence of a hierarchical structure in the network, resulting from the emergence of a core of firms (the earlier entrants) connected to an expanding, turbulent fringe of specialized entrants, which entered the industry with specific research hypotheses on mechanisms and targets for therapeutic intervention. Indeed, Fig. 3 shows the existence of a first mover advantage for the early entrants acting as Developers, especially up to 1998 (black rectangle). Since the Nineties, the network has become sparser, and the number of agreements per node has decreased. The emergence of general purpose technologies changed the structure of the network, introducing a relational regime that competed with the original preferential attachment scheme.
New entrants, now working on new technological platforms and acting as Originators, signed agreements with a variety of other firms, irrespective of their age and size (dark points outside the rectangle).
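The construction of the cumulative matrix can be sketched in plain Python as below, assuming hypothetical `(year, originator, developer)` records. Ties in entry year are broken alphabetically, which is our assumption since the ordering within a year is not specified. The resulting matrix could then be drawn as level curves with a plotting library's contour function.

```python
def cumulative_matrix(deals, up_to_year):
    """Cumulative deal counts (rows: Developers, columns: Originators) up to a given year.
    Firms are ordered by first appearance, so early entrants sit in the upper-left corner."""
    first_seen = {}
    for year, o, d in sorted(deals):
        for firm in (o, d):
            first_seen.setdefault(firm, year)
    order = sorted(first_seen, key=lambda f: (first_seen[f], f))  # ties broken by name
    index = {firm: i for i, firm in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for year, o, d in deals:
        if year <= up_to_year:
            matrix[index[d]][index[o]] += 1  # repeated deals accumulate
    return order, matrix

# Toy records: A licenses twice to X in 1981, B licenses to X in 1985, C to A in 1990.
deals = [(1981, "A", "X"), (1981, "A", "X"), (1985, "B", "X"), (1990, "C", "A")]
order, m = cumulative_matrix(deals, 1989)
print(order, m[1])  # ['A', 'X', 'B', 'C'] [2, 0, 1, 0]
```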

Fig. 3

Cumulative number of collaborative agreements classified according to nodes’ year of entry. The figure shows the evolution of the network formation process through a graphical representation of the adjacency matrix by means of level curves. Nodes are classified according to their year of entry into the network, and colors represent the number of deals between pairs of nodes; darker colors correspond to higher numbers of deals. The black rectangle in the upper-left corner captures the existence of a first mover advantage for early entrants, while dark points outside the rectangle reveal the perturbation of this regime induced by the development of general purpose research technologies.

3.2 Network perturbation and roles’ transitions

To inspect the relational roles in the network, Fig. 4 shows the results of the Dulmage-Mendelsohn decomposition of the network in three distinct periods: 1981-1990, 1991-2000 and 2001-2012. The upper-right rectangle of each sub-plot of Fig. 4 (red dotted lines) encompasses the links between Originators and Developers that cannot be assigned an unambiguous relational role within the network.

Fig. 4

Dulmage-Mendelsohn decomposition of the R&D network. The figure shows the result of the Dulmage-Mendelsohn decomposition performed on the collaboration agreement network for three distinct periods: 1981-1990, 1991-2000 and 2001-2012. The upper-right rectangle (red dotted lines) encompasses the links between Originators and Developers that play a transversal relational role within the network, i.e., they do not play an unambiguous role. The title of each panel reports the percentage of links over the total number of links (i.e., the density D).

As shown in Fig. 4, this portion of the network became progressively more populated over time, meaning that the number of firms present in all the intersections among minimum vertex covers, and thus attracting most of the agreements, increased over time. The dramatic growth of the overall degree of interdependence within the pharmaceutical network suggests that the leading Developers have established multiple collaborations with a wide variety of Originators. Indeed, up to the early Nineties, the transversal component of the network was almost absent, since Originators and Developers played unambiguous roles in a network that was growing through preferential attachment. In other words, large Developers attracted new Originators in proportion to their connectivity. In the Nineties, new firms bringing what Orsenigo et al. (2001) defined as transversal research platforms and technologies entered the industry, generating entirely different relational patterns. In particular, we show that transversal technologies led Originators to connect with Developers independently of their size and age. This was a radical change, since the new screening, bioinformatics and genomic platforms contributed to the emergence of a new division of labor, in which small specialized research firms had a chance to act as Developers and as potentially integrated players, by combining the discovery of new research opportunities with faster trials, enabled by the emergence of new biomarkers. Since the early 2000s, more and more nodes have populated the transversal component of the sub-graph, indicating an increased promiscuity of roles and a much less defined distinction between Originators and Developers.
As a result, the industry has become less polarized: small firms have started to catch up and grow as integrated companies, while larger firms seemed unable to develop internally all the projects they originate and started to act also as Originators.

From data inspection, we notice that the firms playing a transversal role as Developers are mainly large, R&D-intensive pharmaceutical companies such as MSD, AstraZeneca, Novartis and Novo Nordisk. This group of firms entered the network early on, gaining access to recombinant DNA technologies and monoclonal antibodies during the Eighties and moving, in more recent years, to bioinformatics and gene expression technologies. The same companies that, from the Nineties onwards, behave as transversal Developers were already part of the core of the network in the early years. Moreover, within this group, large pharma companies and universities also acted as Originators, with an increasing role of universities in the development of biomarker technologies for cancer identification. Transversal Originators are mainly smaller companies involved in platform technologies, such as new drug delivery systems, combinatorial chemistry, genomics and bioinformatics. Interestingly, projects were originated both by companies like MSD and Pfizer and by smaller, younger firms. The corresponding research projects were mainly in cancer and the central nervous system, while the contracts were signed in the discovery and pre-clinical phases of development.

Figure 5 shows the temporal evolution of different types of organizations inside the transversal component of the collaborative network for 1981-1990, 1991-2000 and 2001-2012. Biotech firms acted as the major Originators during the Nineties, thanks to the arrival of new general purpose technologies (GPTs). Over time, some pharmaceutical companies have acted as transversal Developers, thanks to their ability to incorporate GPTs. Indeed, after incorporating GPTs, Developers of different sizes have started to generate projects that are then developed by other firms, somewhat irrespective of their initial roles as Originators or Developers. Interestingly, universities have played a prominent role as transversal Originators in the last period.

Fig. 5

Transversal organizations by type. The figure shows the types of organizations in the transversal network component across three sub-samples: 1981-1990, 1991-2000 and 2001-2012.

Figure 6 presents the cumulative distribution functions and the best fit for the in-degree (top left panel) and for the out-degree (top right panel), computed on the aggregate network. The exponents differ slightly between the two degree measures: we find \(\gamma_{in} = 2.7\) and \(\gamma_{out} = 2.5\). It is also worth noting that few empirical phenomena obey power laws for all values of a generic variable \(x\). More often, the power law applies only for values greater than or equal to a minimum \(x_{min}\), so that only the tail of the distribution follows a power law (see Clauset et al. 2009; Pecora and Spelta 2015). We find that this minimum is 9 for \(k_{in}\) and 5 for \(k_{out}\). This means that the degree distribution follows a power law for companies that develop at least 9 projects or originate at least 5 projects.

Fig. 6

In- and out-degree cumulative distributions with power-law fit and PDF exponent dynamics. The upper panels show the in-degree (left) and out-degree (right) CDF computed on the aggregate network, together with the power-law fitting line. The insets report the slope of the fitting line with its confidence bounds and the minimum value above which the in- and out-degree follow a power-law distribution. The lower panel reports the power-law exponent of the in- and out-degree PDFs over time, with dashed lines separating the three sub-periods 1981-1990, 1991-2000 and 2001-2012. The exponent is relatively stable, taking its highest values in the first part of the sample and decreasing during the Nineties. After that phase, it settles at approximately 1.3 for both Originators and Developers. This reflects changes in the network configuration, in which the presence of hubs tends to decrease over time.

The lower panel of Fig. 6 shows that the power-law exponent of both the in-degree and the out-degree distributions is relatively stable. The highest values of the exponent are in the first period, reflecting a first mover advantage for the established companies. During the Nineties, the exponent decreased, suggesting a more balanced structure. Afterwards, it fell to approximately 1.3 for both Originators and Developers, reflecting the onset of a third regime in which promiscuity of roles between Originators and Developers dominates.

The increasing promiscuity of roles between Originators and Developers is also visible in Fig. 7, which reports, on the left y-axis, the average absolute difference between nodes’ in- and out-degree and, on the right y-axis, the number of reciprocated links. These two quantities display opposite dynamics: the average absolute difference between nodes’ in- and out-degree decreases, whereas the number of reciprocated links increases. These results suggest that, initially, companies specialized either upstream, in the early phases of the R&D process, or downstream, in the development of candidate drugs. From 2000 onwards, by contrast, a clear distinction between Originators and Developers no longer holds, since many players perform both roles.
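Both indicators can be computed for each yearly sub-graph as in the sketch below, which assumes a hypothetical list of `(originator, developer)` pairs and counts each reciprocated pair of firms once (counting both directed links would simply double the figure).

```python
from collections import Counter

def role_specialization(edges):
    """Mean |k_in - k_out| over active nodes and number of reciprocated firm pairs."""
    edges = set(edges)
    k_out = Counter(o for o, _ in edges)
    k_in = Counter(d for _, d in edges)
    nodes = set(k_out) | set(k_in)
    mean_abs_diff = sum(abs(k_in[v] - k_out[v]) for v in nodes) / len(nodes)
    # A pair is reciprocated when both o -> d and d -> o exist; o < d counts it once.
    reciprocated = sum(1 for o, d in edges if o < d and (d, o) in edges)
    return mean_abs_diff, reciprocated

# Toy network: A and X license to each other; B is a pure Originator.
edges = [("A", "X"), ("X", "A"), ("B", "X")]
print(role_specialization(edges))  # (0.6666666666666666, 1)
```

Perfectly specialized firms push the first indicator up and the second toward zero; firms acting in both roles do the opposite.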

Fig. 7

Degree difference and reciprocated links. The figure shows the average absolute difference between nodes’ in- and out-degree (left y-axis) and the number of reciprocated links (right y-axis) over time. The average absolute difference between nodes’ in- and out-degree decreases, whereas the number of reciprocated links increases. These results emphasize how, during the first part of the evolution of the network, companies played unambiguous roles as either Originators or Developers, while in more recent years a clear distinction no longer holds and firms can perform both roles at the same time.

The assortativity coefficient can provide further insights into the main drivers of the evolution of the network. In directed networks, the assortativity coefficient is the correlation between the degrees of the nodes at the two ends of each link. It can be computed in four ways: out-degree/in-degree, in-degree/out-degree, out-degree/out-degree and in-degree/in-degree. These combinations are used to investigate whether Originators tend to connect to Developers (and vice versa) and when different types of collaboration take place over time.

As Fig. 8 shows, there is no correlation between Originators and Developers, meaning that a firm of the first type connects to a partner of the second type irrespective of size. More interestingly, firms of the same type show a negative correlation (disassortative mixing) in the decade 1981-1991. This suggests that, in that period, Originators link only with Developers and not with other Originators (and vice versa). This pattern holds until the end of the Nineties. The anticorrelation then starts to weaken, because of the introduction of transversal technologies, and stabilizes around -0.2, meaning that in the most recent years firms play both roles: Originators of research ideas and Developers of new technological opportunities.

All in all, the negative value of the assortativity measure means that, up to the Nineties, important Originators (Developers) connect with companies active as Developers (Originators), thus avoiding connections with firms that play the same role in the industry. Conversely, from the end of the Nineties, this measure moved closer to zero, reflecting the fact that connections started to be established irrespective of the roles firms previously played within the network.
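Each of the four combinations is a Pearson correlation computed over links, between a degree of the source node and a degree of the target node. The sketch below follows the same source/target convention as the `x` and `y` arguments of networkx's `degree_assortativity_coefficient`, but it is not the authors' code. It rests on its own assumptions: duplicate agreements are collapsed, self-loops are dropped and links are unweighted. Applied window by window, it would give series comparable to Fig. 8. Names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]  # directed link (source, target)


def _pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; NaN when either variable has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def directed_assortativity(edges: Iterable[Edge]) -> Dict[str, float]:
    """Assortativity for the four (source degree, target degree) combinations.

    Key "out-in" correlates the source's out-degree with the target's
    in-degree over all links; the other three keys work the same way.
    """
    links = sorted({(u, v) for u, v in edges if u != v})
    if not links:
        raise ValueError("need at least one link")
    degree = {
        "out": Counter(u for u, _ in links),
        "in": Counter(v for _, v in links),
    }
    result = {}
    for src in ("out", "in"):
        for dst in ("out", "in"):
            xs = [degree[src][u] for u, _ in links]
            ys = [degree[dst][v] for _, v in links]
            result[f"{src}-{dst}"] = _pearson(xs, ys)
    return result


print(directed_assortativity([("A", "B"), ("A", "C"), ("D", "B")]))
```

In this toy example the "out-in" value is -0.5. The other three combinations are NaN because the nodes are perfectly specialized, so one of the degree vectors has zero variance.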

Fig. 8

Assortativity coefficient. The figure shows the value of the assortativity coefficient for the four possible combinations of (in-/out-)degree and average nearest-neighbour (in-/out-)degree. Dashed lines mark the three sub-periods into which we split the sample (1981-1990, 1991-2000 and 2001-2012). The assortativity values of the Developers-Originators combination show an uncorrelated pattern of collaboration. The same measure for the Originators-Originators and Developers-Developers combinations shows a disassortative behavior that weakens over time, stabilizing around -0.2 from 2001 onwards.

4 Technological patterns and network formation

In this Section, building on our investigation of the main relational drivers of network formation and evolution, we introduce a simple stochastic model of network formation and test its predictions against the topological properties outlined above. Our network formation model takes into account the relational regimes observed in the data. It reproduces polarization through preferential attachment and describes role transitions in the collaborative network through a fitness function.

In the model, the growth of the network is fuelled by the arrival of new firms with new research ideas/projects. New companies establish agreements with existing firms according to two distinct rules. First, firms can be attracted by established nodes with many connections/projects, with which they sign contracts based on specific research hypotheses. Alternatively, the selection of partners may be shaped by the general-purpose nature of technologies, allowing new entrants to connect with developing companies irrespective of their previous connectivity (see Bianconi and Barabási 2001).

In a nutshell, we introduce a simple framework based on the following three steps:

  i) We start at \(t_0\) with a fully connected network composed of three nodes;

  ii) At each time step, a new node \(i\) with \(m = 1\) link and fitness \(\eta_i\) is added to the network;

  iii) The probability that a link from a new node connects it to node \(j\) (already in the network) is proportional to the product of node \(j\)’s degree \(k_j\) and its fitness \(\eta_j\). In formulas: \(\Pi_j = \frac{\eta_j k_j}{\sum_z \eta_z k_z}\).
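Steps i)-iii) above can be sketched as a short simulation. The sketch rests on our own assumptions. The network is treated as undirected. The fitness is passed in as a hypothetical function of a node's current degree and of the maximum degree, recomputed at every step, which is how Eq. (1) defines it below. Drawing a time-invariant fitness at entry would be an equally valid reading of step ii). The constant default reduces the rule to plain Barabási-Albert preferential attachment.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: fitness(k_j, k_max) -> eta_j.
Fitness = Callable[[int, int], float]


def grow_network(n_nodes: int, fitness: Fitness = lambda k, k_max: 1.0,
                 seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Simulate steps i)-iii): fitness-weighted preferential attachment, m = 1."""
    if n_nodes < 3:
        raise ValueError("the seed network already has three nodes")
    rng = random.Random(seed)
    # Step i): fully connected seed network of three nodes.
    edges = [(0, 1), (0, 2), (1, 2)]
    degree = [2, 2, 2]
    for new in range(3, n_nodes):
        k_max = max(degree)
        # Step iii): Pi_j proportional to eta_j * k_j for every existing node j.
        weights = [fitness(k, k_max) * k for k in degree]
        target = rng.choices(range(len(degree)), weights=weights, k=1)[0]
        # Step ii): the new node enters with exactly one link.
        edges.append((new, target))
        degree[target] += 1
        degree.append(1)
    return edges


links = grow_network(500, seed=42)
```

Because each entrant brings a single link, the simulated network grows as a tree attached to the initial triangle. Polarization therefore comes entirely from how the weights \(\eta_j k_j\) are distributed across existing nodes.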

The dependence of \(\Pi_j\) on \(k_j\) captures the fact that higher-degree nodes, i.e. firms with a high number of links, have stronger attractive power/visibility, hence new entrants are more likely to connect to them. The dependence of \(\Pi_j\) on \(\eta_j\) captures the fact that, between two nodes with the same degree, the one with higher fitness is selected with a higher probability. This means that even a relatively young firm, with initially only a few links, can acquire links rapidly thanks to its higher fitness.

These two features map onto the observed technological and relational regimes. During the first part of the evolution of the network, its growth seems to have been driven mostly by first mover advantage and preferential attachment. This can be modeled within the Simon-Barabási-Albert framework (see Barabási and Albert 1999), where a node’s growth rate is determined mostly by its degree. In the second regime, general purpose research tools and techniques induced the emergence of collaborations established irrespective of Developers’ previous connectivity, while a few nodes were able to establish links at relatively high rates.

We split the data sample into two sub-samples: the first ranging from 1981 to 1998 and the second from 1998 to 2012. For both periods, we retained only the nodes with at least three connections (collaborations) at the end of the period. For these nodes we calculated the Pearson correlation coefficient between their degree and the cumulated growth of their degree in that period. Results suggest that, during the preferential attachment relational regime, the fitness of each node, measured as the growth rate of its collaborations, is virtually independent of its initial size (correlation of -0.003). This is consistent with a process of pure proportional growth and preferential attachment (Kong et al. 2008). During the second relational regime, the correlation between the growth rate of firms’ collaborations and their initial degree is negative (-0.04), one order of magnitude larger in absolute value than in the cumulative/preferential attachment regime. In other words, small firms tend to attract collaborations more rapidly and frequently than firms with higher degrees. In this regime, the fitness measure is inversely related to the degree of each node, highlighting a technological discontinuity.
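The correlation exercise above can be sketched as follows. The text does not specify how the cumulated growth is measured. The sketch therefore assumes a logarithmic growth rate \(\ln(k_{\text{end}}/k_{\text{start}})\) and excludes firms with zero degree at the start of the period, whose growth rate would be undefined. The names and the dictionary-based input are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def _pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; NaN when either variable has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def size_growth_correlation(k_start: Dict[str, int], k_end: Dict[str, int],
                            min_final_degree: int = 3) -> float:
    """Correlation between initial degree and (assumed) log degree growth."""
    pairs: List[Tuple[float, float]] = []
    for firm, k1 in k_end.items():
        k0 = k_start.get(firm, 0)
        # Keep firms with >= min_final_degree links at the end of the period;
        # assumption: drop firms absent at the start (growth undefined).
        if k1 >= min_final_degree and k0 > 0:
            pairs.append((k0, math.log(k1 / k0)))
    if len(pairs) < 2:
        raise ValueError("need at least two firms")
    xs, ys = zip(*pairs)
    return _pearson(list(xs), list(ys))


start = {"a": 1, "b": 2, "c": 4, "e": 1}
end = {"a": 4, "b": 4, "c": 4, "d": 5, "e": 2}
print(size_growth_correlation(start, end))
```

A value near zero corresponds to the proportional-growth regime described above, and a negative value means that smaller firms grow faster.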

To take into account the changing relationships between the growth rate of each node and its degree during the two regimes, we propose a fitness measure:

$$ \eta_{j} = (1-\alpha)\frac{\max(k)}{k_{j}} + \alpha k_{j} \tag{1} $$

When the parameter α is equal to 0.5, the fitness becomes equal for all the nodes, and the Bianconi and Barabási (2001) model converges to the Barabási and Albert (1999) model, where only the degree dynamics matters for attachment. When α < 0.5, the fitness of low-degree nodes becomes higher than that of high-degree nodes, leading smaller nodes to establish a relatively higher number of connections.
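Substituting Eq. (1) into \(\Pi_j\) gives the attachment weight \(\eta_j k_j = (1-\alpha)\max(k) + \alpha k_j^2\). The first term is the same for every node and the second favors high-degree nodes, so lowering α makes attachment closer to uniform. The sketch below evaluates Eq. (1) exactly as printed and turns it into attachment probabilities. The function names and the degree vector are hypothetical. Taken literally, at α = 0.5 the printed formula assigns the same fitness to nodes with \(k_j = 1\) and \(k_j = \max(k)\) but lower values to intermediate degrees, and the code makes this easy to inspect.

```python
from typing import List


def eq1_fitness(k_j: int, k_max: int, alpha: float) -> float:
    """Fitness of Eq. (1): eta_j = (1 - alpha) * max(k) / k_j + alpha * k_j."""
    if k_j < 1:
        raise ValueError("degrees must be positive")
    return (1 - alpha) * k_max / k_j + alpha * k_j


def attachment_probabilities(degrees: List[int], alpha: float) -> List[float]:
    """Pi_j = eta_j * k_j / sum_z eta_z * k_z, with eta_j from Eq. (1)."""
    k_max = max(degrees)
    weights = [eq1_fitness(k, k_max, alpha) * k for k in degrees]
    total = sum(weights)
    return [w / total for w in weights]


degrees = [1, 2, 10]  # hypothetical: two small firms and one hub
for alpha in (0.5, 0.1):
    print(alpha, [round(p, 3) for p in attachment_probabilities(degrees, alpha)])
```

With these degrees, the hub receives about 81% of the attachment probability at α = 0.5 (55/67.5) but only about 51% at α = 0.1 (19/37.5). This mirrors the shift from a hub-dominated to a less polarized regime.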

Results are shown in Figs. 9 and 10. Figure 9 shows the circular networks and the Dulmage-Mendelsohn (DM) decomposition of the adjacency matrices produced by the model in the two regimes. Figure 10 reports the degree distribution and the power law fit in the two cases.

Fig. 9

Circular network topology and DM decomposition for the two relational regimes. The figure shows the circular network produced by the model (upper panels) together with the DM decomposition of its adjacency matrix (lower panels). The left panels refer to the first regime, in which preferential attachment prevails: the model produces a hub-like network with a transversal component populated by few links. The right panels reproduce the main feature of the second regime, where hubs become less dominant and the transversal component becomes more populated (see black squares).

Fig. 10

Power law fit for the two relational regimes. The left panel shows the degree distribution and the corresponding power law fit in the period 1982-1998, along with the assortativity value. The right panel shows the degree distribution and the corresponding power law fit in the period 1998-2012, along with the assortativity value.

The left panels refer to the first regime, in which preferential attachment prevails, while the right panels are associated with the second regime. In the first case, when α = 0.5, a hub-like structure emerges. Its power law degree distribution has a coefficient similar to the one observed in the data (γ = −1.854, see Fig. 6). Moreover, as empirically observed, the DM decomposition of the adjacency matrix reveals the presence of few transversal links (see Fig. 9). When α = 0.1, by contrast, the model reproduces the statistical features of the second regime: the network becomes less polarized, and more and more nodes populate the transversal sub-graph. In this case too, the power law coefficient γ = −1.457 is similar to the one observed in the period 1998-2011. Finally, the assortativity coefficient produced by the model is, at least qualitatively, similar to that of the real-world network. It is negative, and lower in the cumulative regime than in the mixed regime that started in the Nineties, when low-degree nodes grew faster than high-degree nodes.

This simple model controls only for the intrinsic heterogeneity of node connections, and thus for first-order properties of the network. It is nonetheless able to generate higher-order topological features that are observable in the empirical data, such as the increasing size of the transversal block of the adjacency matrix or the increasing assortativity. In other words, the observed topological complexity of the evolving R&D network can be explained relatively simply in terms of the observed heterogeneity of its nodes.

5 Concluding discussion

In this paper, we have outlined some key features of the long-term evolution of the network of collaborative agreements in pharmaceutical R&D over the last thirty years. Our main goal was to show how explorative topological inspections can be used to detect the emergence of technological and relational discontinuities in R&D networks, driven by the underlying dynamics of scientific and technological regimes. In particular, we have built on the seminal contribution of Orsenigo et al. (1997, 2001) to unravel how scientific and technological evolution has influenced the relationships between large and small firms, re-shaping the influence of patterns of local interaction and the overall structure of the network.

While the network has grown in time in terms of number of nodes and links, connectivity patterns have evolved from a regime in which Originators connected to Developers according to a preferential attachment scheme, reflecting first mover advantages in a hub-like network. The network has moved to a configuration in which there is no longer a clear-cut distinction between companies acting as Originators and companies acting as Developers. The introduction of General Purpose Research Technologies in molecular biology, combinatorial chemistry and genomics has increased the speed and the efficacy of drug discovery processes and has contributed to the proliferation of specific research ideas. In recent years, a well-defined specialization of firms as Originators or Developers of new technological opportunities has vanished. Biotech firms have shifted from being mostly Originators to playing a leading role also as Developers. At the same time, the number and specificity of novel plausible targets and indications generated by new research trajectories have induced large pharmaceutical companies to act as Originators of new projects that are then developed by other companies.

The contribution of this paper is twofold. First, we contribute to the literature showing how network analysis can support the identification of major discontinuities in the evolution of the division of innovative labor. In particular, we document the importance of role transitions in shaping the evolution of the underlying relational regimes originally detected by Orsenigo et al. (1997, 2001). Second, we have shown how explorative topological analysis can be complemented by a simple and general stochastic approach. This approach can be applied to investigate and compare the evolution of innovation networks across multiple industries and technological regimes, in the spirit of Malerba (2004) and Buldyrev et al. (2020).