Introduction, motivation and research questions

In many areas of the world, policy-makers are designing intentional policies for creating and developing “world-class” and elite universities (Altbach & Salmi, 2011; Deem et al., 2008; Salmi, 2009; Shin, 2009a; Yang & Welch, 2012). While the specific literature in this field is far from reaching an overall consensus on a unique definition of what a world-class or elite university is (Huisman, 2008), Salmi (2009) identified three key elements that are now accepted by most scholars working on this topic: (1) concentration of talent, (2) abundant resources and (3) favourable governance. These dimensions, albeit conceptually clear, are difficult to operationalize, and attempts to develop indicators for “measuring” the extent to which a university can be defined “elite” are actually very rare. A common shortcut, adopted by scholars and practitioners alike, is the use of rankings published by various authoritative entities, such as Quacquarelli Symonds (QS), Times Higher Education (THE), the Financial Times, and the Shanghai Jiao Tong University’s Academic Ranking of World Universities (ARWU). Given the prestige associated with the number of elite universities in a country, many governments have decided to launch specific initiatives for concentrating resources in a few leading universities or for stimulating the growth of top-level academic institutions. Notable examples of such policies are the so-called “excellence initiatives” promoted in Germany (Kehm & Pasternack, 2009), Russia (Yudkevich et al., 2016), China (Zhang et al., 2013) and France (Boudard & Westerheijden, 2017).

In the context of such developments in the higher education (HE) sector, top-class universities in the various countries have strong incentives to search continuously for improvements in their operations and results. Indeed, academic rankings place emphasis on performance indicators that measure the ability to reach high levels of teaching, research and knowledge transfer (Buela-Casal et al., 2007; Johnes, 2018; Lukman et al., 2010). As a consequence, university managers allocate their resources to the most productive practices and initiatives, such as incentives for publications or the creation of attractive courses. Each institution starts regarding the others as “competitors” in the same international higher education arena (Grewal et al., 2008). Globalization is also a force pressing universities to compare themselves with potential competitors from all over the world (Stromquist, 2007).

This “battle” for elite and world-class status (Hazelkorn, 2015), therefore, does not come without costs. Institutions must invest substantial human and financial resources to improve their performance in the dimensions valued by the agencies that compile rankings, such as the number of graduates, their employability, and research outputs like grants, publications and citations. The innovative, high-quality activities required to pursue these objectives are likely to generate unintended consequences, for example an “excess of spending” on some activities that is not reflected in the volume and quality of outputs. In this perspective, it is useful to analyse the efficiency of elite universities, i.e. their ability to produce their high volumes of output at the minimum cost or, conversely, to maximize the output they produce with the available resources. Such a viewpoint focused on the use of resources is a new one, which goes beyond the pure analysis of outputs and provides more economic insight into the way in which elite institutions employ their resources.

In this paper, we aim at comparing the efficiency of a sample of Chinese and European elite universities. The rationale for choosing these two areas stems from their interest in developing and maintaining groups of ‘elite’ universities competing in global rankings, which are still dominated by institutions in the USA (Aghion et al., 2010). The Chinese higher education system has recently concentrated on the world-class universities and first-class disciplines (“Double First-Class”) initiative, which aims to build a number of elite universities and disciplines by 2050 and to further strengthen the quality of Chinese higher education. Over more than half a century of development, Europe has built a world-class modern higher education system and many elite universities, which have greatly promoted the rapid progress of science and technology worldwide. Comparing the efficiency of universities across countries and continents is a particularly important task at this historical moment. For example, Wolszczak-Derlacz (2018) analyses how the productivity of US and European research universities evolved from 2000 to 2010, finding that the two groups experienced different patterns of efficiency (increasing in Europe, declining in the USA). The analysis presented in this paper conducts a similar, although innovative, exercise for a sample of elite universities in Europe and China for the period 2011–2015. Specifically, we answer the following questions: (1) Are Chinese elite universities more efficient than their European counterparts, or is it the other way round? (2) How have the efficiency and productivity of European and Chinese elite universities evolved in the period under scrutiny? (3) Which are the main components of productivity that determine the relative efficiency of Chinese and European elite universities, both within and across the two areas?

This paper innovates on the current literature in three main ways. First, it introduces the necessity of comparing elite universities not only on the ground of their performance (the traditional perspective) but also on the basis of their efficiency—i.e. the productivity rate at which they produce their level of output. Second, whilst comparative studies among different European countries already exist, a comparison of universities in Europe and China has not yet been pursued in the literature; this paper thus represents the first attempt at comparing European and Chinese universities within the broader framework of the “battle” for world-class and elite status. Third, the comparison of the productivity of the selected universities is based on a meta-frontier approach, considering various components of performance—namely technical efficiency, intertemporal change and the comparison between the two groups of units.

The remainder of the paper is organized as follows. Section 2 contains a literature review on the three main topics that inform our study: cross-country comparisons of universities’ efficiency, academic rankings and elite universities. Section 3 illustrates the methodological approach and describes the dataset we built for the empirical exercise. Section 4 reports the empirical results. Section 5 discusses the results, along with some concluding remarks.

Literature review

Studies on education efficiency in a cross-national comparative perspective have attracted much attention from researchers and scholars in recent years. These studies differ in their level of analysis (institutional or country) and in the type of education considered (secondary schools or tertiary institutions), choices that to a large extent depend on the available large-scale databases (Aparicio et al., 2018).

For the purpose of cross-national assessment of the relative performance of tertiary education, some recent studies have concentrated on university efficiency and productivity across several nations, particularly in Europe. For example, Bonaccorsi et al. (2007) compared the efficiency of higher education institutions in six OECD nations, while Wolszczak-Derlacz and Parteka (2011) studied the performance of seven OECD countries. Using two-stage Data Envelopment Analysis (DEA) with an output-orientated constant returns to scale approach, their findings showed that the inefficiency of institutions could be attributed to determinants such as scale economies, the number of departments and funding. Instead of comparing many countries in a common framework of analysis, Agasisti and Johnes (2009) looked into the efficiency of HEIs in a pair of countries, Italy and the United Kingdom, using the Malmquist index based on the DEA variable returns to scale method. The findings revealed that Italian universities improved their technical efficiency scores whereas English institutions obtained stable scores over the period 2002/03–2004/05. The authors argued that comparing only two nations allows more qualitative information to be explored in terms of policies, and that using a short panel ensures that any trends identified in the data are recent. In this stream, Agasisti and Perez-Esparrells (2010) analysed the efficiency of Italian and Spanish state universities in a cross-country comparison using Malmquist productivity indexes based on the DEA variable returns to scale approach for the academic year 2004/05. They found that improvement in Italian HEIs was due to technological change, whereas the change in the performance of their Spanish counterparts came from pure technical efficiency; regional effects were also found to be determinants of inefficiency in these universities. By the same token, Agasisti and Wolszczak-Derlacz (2015) compared the efficiency of universities in Italy and Poland using the Malmquist productivity index based on the conventional output-orientated, variable returns to scale DEA method for a long panel covering 2001–2011. The authors used the bootstrap truncated regression procedure of Simar and Wilson (2007) to test the influence of external factors on the estimated DEA scores for each of the Italian and Polish samples. Their results indicated a great difference in the performance of universities in the two nations, and that the efficiency frontier improved more in Italy than in Poland. Most recently, Agasisti and Gralka (2019) investigated the transient and persistent efficiency of 70 Italian and 76 German universities using stochastic frontier analysis (SFA) for the period 2001–2011. The findings showed that Italian universities are more efficient than their German counterparts when persistent efficiency over a long period is captured.

As can be observed, the above studies comparing universities between two or more nations estimated efficiency against country-specific frontiers, meaning that they measured the efficiency of universities separately and compared them on the basis of shared characteristics. Alternatively, some frameworks assume a common international frontier, without considering any structural difference between the higher education systems involved. One might ask whether the frontier technology gap between two or more groups of universities from different countries can be compared when all of them are placed within a general meta-frontier framework. That is indeed an interesting question, whose novelty for the sector lies in using advanced methods recently developed in the literature. It provides an avenue for relaxing the assumption of a common production technology across the institutions of interest—something that we pursue in this paper.

Whilst comparative cross-country studies on the efficiency and productivity of universities are well established in Europe, this is not the case in China, where numerous studies on the efficiency of Chinese universities have focused only on China, without comparison with other countries. Apart from this, such studies have used methods of analysis similar to those of comparative studies on European universities, such as DEA, network DEA, the Malmquist productivity index and SFA—see, for example: Johnes and Li (2008), Ng and Li (2009), Yaisawarng and Ng (2014), Hu et al. (2017), Yang et al. (2018), Wu et al. (2020). This leads us to ask whether a comparative study should be conducted between European and Chinese universities to investigate whether any gap in their performance exists, given that China’s national reforms and investment initiatives in higher education aim at raising the research standards of key universities for the 21st century as “elite universities” (Zong & Zhang, 2019).

The academic literature has already debated what an elite university is and how such institutions can be defined and characterized. Elite universities are identified by “the presence of highly qualified faculty, talented students, abundant resources, autonomy, and favourable governance” (Abramo & D’Angelo, 2014, p. 1271). By the same token, Palfreyman and Tapper (2009) assert that the elite university is academically excellent with respect to educational qualification, teaching and research. However, elite universities are not necessarily world-class; rather, their competitive advantage in the quality of education and research, resulting in social and market value, can potentially lead to world-class status (Abramo & D’Angelo, 2014). The growth of elite universities should be accompanied by a favourable policy environment and university autonomy in the national context (Aghion et al., 2010). Otherwise, elite institutions will inevitably be absent from that nation, leading to a brain drain of highly qualified staff and students towards nations with world-class universities (Auranen & Nieminen, 2010; Abramo & D’Angelo, 2014).

Investment in and development of domestic elite universities are considered a pathway towards world-class status via the international ranking systems. The efforts to become world-class are certainly helpful in enhancing academic outputs such as research projects and publications (Altbach, 2004; Shin, 2009a; Song, 2018). International ranking systems and “elite” or “world-class” status serve as a guideline for national governments and universities in forming appropriate higher education policies and strategies aimed at moving university performance forward. Deem et al. (2008) observed that European universities have taken league tables such as the Times Higher Education World University Ranking (THE) and the Shanghai Jiao Tong University’s Academic Ranking of World Universities (ARWU) as a central target of their reform process, and thus suggested increasing higher education investment to more than 2% of GDP per annum in the future. Together with this, the Bologna process, which aimed to achieve some degree of harmonization of higher education across European countries, has been criticized for not fully addressing the wider matters of globalization and internationalization (Kwiek, 2004). On top of this, Kwiek (2005) argued that the Bologna process and other European Union higher education reforms may be less successful because of the lack of funding of public universities and the varied status of universities with respect to other public services. International rankings, such as THE, ARWU, the QS World University Ranking and the Webometrics Ranking, have also influenced higher education in Asian nations. For instance, in Hong Kong, university leaders are concerned about their institutions’ positions in international league tables, while academic staff face pressure to publish their work in highly ranked journals (Chan, 2007; Deem et al., 2008). Similarly, academics in Taiwan are concerned with international publication venues for promotion and research evaluation (Chen & Lo, 2007). In Japan, after benchmarking against world university rankings was launched, the government invested additional resources to promote internationalisation through international collaboration and exchanges (Furushiro, 2006; Yonezawa, 2006). By the same token, the Chinese government has implemented several higher education projects to enhance the international competitiveness of Chinese universities (Deem et al., 2008). The largest of these is Project 211, started in 1995, which aims to enhance research standards in China’s elite universities, which in turn receive considerably increased funding; in 2018, 116 institutions were listed in Project 211 (Kumar & Thakur, 2019). These projects aim to enable Chinese institutions, as national elite universities, to become world-class universities in the future. A second project was established in 2015 to develop 42 Chinese universities into world-class universities by 2050. In addition, 39 universities are listed in a more selective group, the so-called Project 985. Finally, as part of Project 985, nine universities have been formalized into China’s C9 League to foster better students and share resources (Deem et al., 2008; Kumar & Thakur, 2019).

In today’s dynamic world, albeit controversial, international ranking systems are widely accepted as benchmarks through which universities around the world perceive themselves and are perceived by others. Governments of various nations have started to consider international rankings as a guideline for forming higher education policies and strategies aimed at getting their nations’ universities into the top lists. However, how a nation’s elite universities perform on the way to world-class rankings, given their available input resources and academic outputs, has been debated only scantly; this is a gap in knowledge that should be explored to assist elite institutions in achieving best practice in the quality of their education and training.

Methodology and data

Statistical methods employed for the empirical analyses

Traditional performance evaluation techniques commonly assume that decision-making units (DMUs, that is, elite universities in our context) are situated in a homogeneous environment, so that a single production function or reference set can be determined to assess the performance of the different units. In reality, however, the evaluated DMUs usually vary in their individual characteristics and environment, and heterogeneity characterizes the landscape of evaluated organizations. For instance, universities in different regions, countries or continents face different evaluation and incentive mechanisms, different funding and personnel allocation modes, and other differing features of the economic and social environment. This exposes a drawback of traditional approaches: they may lose their practicability when the evaluated DMUs are not homogeneous. Drawing on the idea of constructing separate production frontiers and a common meta-frontier for different groups of individuals in Battese and Rao (2002), this paper uses a non-parametric meta-frontier technique (O’Donnell et al., 2008; Oh & Lee, 2010) and incorporates the Malmquist productivity index (Malmquist, 1953; Caves et al., 1982; Fare et al., 1994) to investigate the different performance of Chinese and European universities. Sickles and Zelenyuk (2019, p. 98) note that “the concept of efficiency and the concept of productivity are different concepts that can explain the same thing, performance”. Following this, we use both terms, productivity and efficiency, to refer to the performance of universities, bearing in mind that productivity variation can arise from different sources, such as differences in production technology, differences in the scale of operation, differences in operating efficiency, and differences in the operating environment in which production occurs (Fried et al., 2008; Coelli et al., 2005). Thus, to make inferences about efficiency, the effects on productivity caused by differences in the operating environment and other exogenous factors should be removed (Oum et al., 1999).

Assume there are \(n\) DMUs to be evaluated, and each \({\text{DMU}}_{j}\;(j = 1,2, \ldots ,n)\) generates \(S\) outputs \((s = 1,2, \ldots ,S)\) using \(M\) inputs \((m = 1,2, \ldots ,M)\) in period \(t\;(t = 1,2, \ldots ,T)\). An observation can be denoted as \(\left( x^{t}, y^{t} \right) \in \mathbb{R}_{+}^{M} \times \mathbb{R}_{+}^{S}\). We further suppose that there are \(k = 1,2, \ldots ,K\) separate subgroups within the whole set of observations, and that the members of each subgroup share an identical technology. In this paper, the two separate groups are the European and the Chinese universities.

In the framework of meta-frontier analysis, three types of fundamental technology sets or production possibility sets can be introduced, namely (1) contemporaneous technology set, (2) intertemporal technology set and (3) global technology set.

The boundary (or frontier) of a contemporaneous technology set is composed of a group of best-performing DMUs, which serve as the benchmark among DMUs within the same group during the same period. In practice, it allows estimating the most efficient universities within each of the two groups (European vs Chinese). Specifically, the contemporaneous technology set is defined as a reference set for all observations within group k in time t, and it is denoted as:

$$T_{k}^{t} = \left\{ \left( x^{t}, y^{t} \right) \mid x^{t} \ \text{can produce} \ y^{t} \right\}, \quad t = 1,2, \ldots ,T$$
(1)

Hence, observations in the same group k are easily classified into T contemporaneous technology sets according to different periods. Following the measurement of technical efficiency in Farrell (1957) and the formulation of output distance function in Shephard (1970), the efficiency of an observation inside the contemporaneous technology set can be measured as:

$$D^{t} \left( x^{t}, y^{t} \right) = \inf \left\{ \theta \mid \left( x^{t}, y^{t}/\theta \right) \in T_{k}^{t}, \ \theta > 0 \right\}, \quad t = 1,2, \ldots ,T$$
(2)

where \(\theta\) measures the ratio of the actual output to the maximum output in the feasible region, so that \(D^{t} \left( x^{t}, y^{t} \right)\) represents the technical efficiency of the currently assessed DMU relative to the frontier of time t.

The intertemporal technology set removes the barrier caused by different periods and constructs a reference set for the observations of group k over the whole observed time span. We denote the intertemporal technology set of observations in group k as:

$$T_{k} = \left\{ \left( x^{t}, y^{t} \right) \mid x^{t} \ \text{can produce} \ y^{t}, \ \forall t \right\}, \quad k = 1,2, \ldots ,K$$
(3)

Accordingly, there are K separate intertemporal technology sets covering all observations. The corresponding output distance function on the intertemporal technology set is defined as:

$$D^{k} \left( x^{t}, y^{t} \right) = \inf \left\{ \theta \mid \left( x^{t}, y^{t}/\theta \right) \in T_{k}, \ \theta > 0 \right\}, \quad k = 1,2, \ldots ,K$$
(4)

The global technology set is established to envelop all observations over all investigated time periods and all subgroups, so it can be deemed the convex hull of the union of all intertemporal technology sets. Concretely, the global technology set is defined as:

$$T^{G} = {\text{conv}}\left\{ T_{1} \cup T_{2} \cup \cdots \cup T_{K} \right\}$$
(5)

Therefore, the corresponding output distance function on the global technology set is denoted as:

$$D^{G} \left( x^{t}, y^{t} \right) = \inf \left\{ \theta \mid \left( x^{t}, y^{t}/\theta \right) \in T^{G}, \ \theta > 0 \right\}$$
(6)

To simplify the formulae hereinafter, the variables inside the output distance functions are expressed in terms of their periods, e.g., \(D^{t} \left( t \right)\), \(D^{k} \left( t \right)\) and \(D^{G} \left( t \right)\). Concretely, we use output-orientated DEA models under the assumption of variable returns to scale to compute the three distance functions, as illustrated in Appendix 1.
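For concreteness, the sketch below shows how such an output distance function could be computed as an output-orientated VRS DEA linear program. The exact models used in the paper are those reported in Appendix 1; the function name and data layout here are our own illustrative choices. Only the reference set (X, Y) changes across the three cases: group k in period t (contemporaneous), group k over all periods (intertemporal), and all groups over all periods (global).

```python
import numpy as np
from scipy.optimize import linprog

def output_distance(x0, y0, X, Y):
    """Shephard output distance function D(x0, y0) = 1/phi, where phi solves
    max phi  s.t.  X'lambda <= x0,  Y'lambda >= phi * y0,  sum(lambda) = 1 (VRS).
    x0: (M,) inputs and y0: (S,) outputs of the evaluated DMU (NumPy arrays);
    X: (n, M) inputs and Y: (n, S) outputs of the n reference DMUs."""
    n, M = X.shape
    S = Y.shape[1]
    # Decision variables: [phi, lambda_1, ..., lambda_n]; linprog minimizes, so use -phi.
    c = np.concatenate(([-1.0], np.zeros(n)))
    # Input constraints: sum_j lambda_j * x_jm <= x0_m
    A_in = np.hstack([np.zeros((M, 1)), X.T])
    # Output constraints: phi * y0_s - sum_j lambda_j * y_js <= 0
    A_out = np.hstack([y0.reshape(-1, 1), -Y.T])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.concatenate([x0, np.zeros(S)])
    # VRS convexity constraint: sum_j lambda_j = 1
    A_eq = np.concatenate(([0.0], np.ones(n))).reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (n + 1))
    if not res.success:          # infeasible programs can occur under VRS
        return None
    return 1.0 / (-res.fun)      # D <= 1 for observations enveloped by the frontier
```

Evaluating a university against its own group and year yields \(D^{t} \left( t \right)\); pooling all years of its group yields \(D^{k} \left( t \right)\); pooling both groups and all years yields \(D^{G} \left( t \right)\).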

Following most previous research, when considering a certain technology set, the output-enhancing approaches are defined as follows: (1) for observations located inside the frontier of the contemporaneous technology set, the aim is to reduce the distance between the actual point and its projection onto the frontier, a distance created by technical inefficiency; (2) for observations located on the frontier of a certain contemporaneous technology set, an effective way forward is to improve the technology itself, so that the frontier extends further in the direction of the intertemporal frontier; (3) for the observations acting as benchmark points on the intertemporal frontier, the goal is to strengthen their leading position among groups through improved techniques.

Following similar ideas of output (productivity) improvement decomposition, Oh and Lee (2010) define the meta-frontier Malmquist productivity index (MMPI) and further decompose it into three distinct sources, as follows:

$$\begin{aligned} M\left( {t,t + 1} \right) & = \frac{{D^{G} \left( {t + 1} \right)}}{{D^{G} \left( t \right)}} \\ & = \frac{{D^{t + 1} \left( {t + 1} \right)}}{{D^{t} \left( t \right)}} \times \frac{{D^{k} \left( {t + 1} \right)/D^{t + 1} \left( {t + 1} \right)}}{{D^{k} \left( t \right)/D^{t} \left( t \right)}} \times \frac{{D^{G} \left( {t + 1} \right)/D^{k} \left( {t + 1} \right)}}{{D^{G} \left( t \right)/D^{k} \left( t \right)}} \\ & = \frac{{{\text{TE}}^{t + 1} }}{{{\text{TE}}^{t} }} \times \frac{{{\text{BPG}}^{t + 1} }}{{{\text{BPG}}^{t} }} \times \frac{{{\text{TGR}}^{t + 1} }}{{{\text{TGR}}^{t} }} \\ & = {\text{EC}} \times {\text{BPC}} \times {\text{TGC}} \\ \end{aligned}$$
(7)

where \({\text{TE}}^{l} \left( {l = t,t + 1} \right)\) represents the technical efficiency within a certain period, so EC measures the efficiency change between two adjacent periods. A result of \({\text{EC}} > \left( { = , < } \right)1\) can be read as a shortening (constancy, widening) of the relative distance between an observation and its contemporaneous frontier. In addition, \({\text{BPG}}^{l} \left( {l = t,t + 1} \right)\) signifies the best practice gap between the contemporaneous frontier and its corresponding intertemporal frontier (and thus the structural differences between the production technologies of European vs Chinese universities), while BPC denotes the change in the best practice gap between the two periods: \({\text{BPC}} > \left( { = , < } \right)1\) accounts for technical progress (stabilization, regress) between the two periods. Finally, \({\text{TGR}}^{l} \left( {l = t,t + 1} \right)\) stands for the technology gap ratio between the technology of the group and the global technology, which reflects the technical leadership of a group of DMUs. Hence, \({\text{TGC}} > \left( { = , < } \right)1\) refers to an advance (constancy, retrogression) in the level of intra-group technology, i.e. in how the technological position of each group evolves over time relative to the global frontier.
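As a minimal numerical illustration of Eq. (7), once the six distance values of a university are available (contemporaneous, intertemporal and global, in periods t and t + 1, e.g. from the hypothetical output_distance sketch above), the decomposition reduces to a few ratios; the variable names below are our own:

```python
def mmpi_decomposition(Dt_t, Dt1_t1, Dk_t, Dk_t1, DG_t, DG_t1):
    """Decompose the meta-frontier Malmquist productivity index of Eq. (7).
    Dt_t  = D^t(t),  Dt1_t1 = D^{t+1}(t+1)   (contemporaneous frontiers)
    Dk_t  = D^k(t),  Dk_t1  = D^k(t+1)       (intertemporal frontier)
    DG_t  = D^G(t),  DG_t1  = D^G(t+1)       (global frontier)"""
    EC = Dt1_t1 / Dt_t                        # efficiency change
    BPC = (Dk_t1 / Dt1_t1) / (Dk_t / Dt_t)    # best practice gap change
    TGC = (DG_t1 / Dk_t1) / (DG_t / Dk_t)     # technology gap change
    MMPI = EC * BPC * TGC                     # equals D^G(t+1) / D^G(t)
    return MMPI, EC, BPC, TGC
```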

Following most previous research on higher education efficiency (Abbott & Doucouliagos, 2003; Casu & Thanassoulis, 2006; Ruiz et al., 2015), the variable returns to scale assumption (Banker et al., 1984) is employed in the output distance function solution procedure, and the MMPI is constructed accordingly hereinafter. We also conduct a non-parametric returns-to-scale test, in order to verify the rationality of this hypothesis, in Sect. 4.1.

Variables and data of the assessed universities

The universities investigated in this paper were selected on the basis of international university rankings, which are known to play a crucial role in discussions about the role and position of universities in the world’s higher education. Different ranking systems have been created to assess the performance of universities around the world (Anowar et al., 2015; Gnolek et al., 2014). The widely recognized ranking systems—once accepted—are influential for a variety of audiences, specifically: international students seeking the right institution for their higher studies; early-career researchers looking to carry out scholarly activities and secure funding; institutions themselves aiming to develop constructive competition; and employers choosing an appropriate workforce for their business development (Luque-Martínez & Faraoni, 2019; Souto-Otero & Enders, 2017). A review by Anowar et al. (2015) identified the four most widely accepted ranking systems as THE, ARWU, QS and the Webometrics Ranking. Anowar et al. (2015, p. 559) showed that “(…) none of these ranking systems can provide satisfactory evaluation in terms of their construct validity and other parameters related to disputation. Overall observation of these four ranking systems reflects the fact that generic challenges include adjustment for institutional size, differences between average and extreme, defining the institutions, measurement of time frame, credit allocation, excellency factors as well as adjustment for scientific fields”. Since every ranking system uses heterogeneous measures to obtain ranking scores for each participating university, it is hard to say which ranking system is more reliable. In addition, Altbach (2015) noted that teaching quality and the impact of education on students have not been explored in the ranking systems. According to Shattock (2017), the THE ranking includes some teaching-related data through a significant weighting for reputation; however, because the ranking is mainly based on research performance, teaching performance receives minimal attention.

Despite the fact that ranking systems are widely criticized for their questionable methods and concepts, they have a powerful influence on how ranked institutions perceive themselves and are perceived by others (Altbach, 2015; Shattock, 2017). For example, in a survey by Hazelkorn (2011, p. 86), approximately 70% of institutions wanted to be in the top 10% nationally and in the top 25% internationally. Leaders of institutions are often appointed on the grounds of a commitment to improve the institution’s ranking (Shattock, 2017). It is clear that no ranking system is perfect, but rankings serve as a crucial guideline for institutions seeking to enhance their academic performance. In a nutshell, while studies seeking a better method for international university ranking are still ongoing, some widely accepted ranking systems, namely THE, ARWU, Webometrics and QS, are still used for different purposes in higher education research. On a more institutional ground, they are also commonly employed to identify those universities that, in specific countries or areas, are considered the elite institutions competing for global positioning.

In such a context, the choice of the group of universities to be included in the study is crucial for the outcome of the productivity analysis of ‘elite’ Chinese and European universities. Several authoritative university rankings provide reference material for selecting a reasonable sample. Among the three most widely accepted international university rankings, QS places great emphasis on subjective reputation indicators, with weights approaching 50%. ARWU focuses heavily on scientific research indicators, which leaves universities focusing on undergraduate education with relatively low rankings. THE uses more indicators in its ranking system than the other two, making its ranking more balanced and providing more quantitative information usable for efficiency analyses. Meanwhile, the weight of literature and economic indicators in THE is much higher than in QS (Olcay & Bulu, 2017). The THE indicator system is also in line with the Chinese ambition to improve research quality and the treatment of researchers, and to catch up with advanced Western higher education. Therefore, we employ THE records for 2011–2019 to select the ‘elite’ Chinese and European universities in this paper. Specifically, the annual rankings are used to determine the average ranking of each listed university.

After eliminating the universities that appeared in the THE rankings fewer than seven times during 2011–2019, we choose the 50 universities in European Union (EU) countries with the highest average rankings as the EU sample. Since the construction and development of Chinese universities started later than that of their European counterparts, only seven Chinese universities were listed in the THE rankings more than seven times during 2011–2019. However, seven Chinese sample universities are far from adequate for comparing the performance of Chinese and European universities. Therefore, we first searched the THE rankings for 2011–2019 and gathered the 74 listed Chinese universities as candidates. Several of these candidates are not directly subordinate to the Ministry of Education (MoE) of China, and we cannot collect comparable data for them due to the incomplete release of Chinese university data. Ultimately, we choose the top 40 universities by average ranking among the candidates directly under the administration of the MoE. These sample universities all belong to Project 211, a group of high-level research universities in China, and can therefore be regarded as representative of elite universities in China. A detailed list of sample universities is reported in Table 8 (in Appendix 2). The Chinese and European universities are naturally divided into two groups because of the obvious differences in educational patterns and evaluation mechanisms between them.
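The selection rule just described can be summarized in a short sketch. This is a simplified illustration assuming a long-format table of THE records, with hypothetical column names; the additional filter on MoE subordination applied to the Chinese candidates is omitted:

```python
import pandas as pd

def select_sample(rankings: pd.DataFrame, min_appearances: int = 7, top_n: int = 50):
    """rankings: one row per (university, year), with columns
    ['university', 'year', 'rank'] covering the THE editions 2011-2019."""
    # Keep only universities appearing often enough across editions.
    counts = rankings.groupby("university")["year"].nunique()
    eligible = counts[counts >= min_appearances].index
    # Rank the eligible universities by their average THE position.
    avg_rank = (rankings[rankings["university"].isin(eligible)]
                .groupby("university")["rank"].mean()
                .sort_values())
    return avg_rank.head(top_n)   # best (lowest) average rankings first
```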

Data on Chinese universities primarily come from the Compilation of Science and Technology Statistics of Higher Education Institutions (CSTS-HEI) edited by the Chinese Ministry of Education, the Graduate Employment Quality Reports (GEQRs) published by the sample universities, and the InCites database of Clarivate Analytics. Data on European universities are mainly extracted from InCites and from the European Tertiary Education Register (ETER), the first comprehensive database registering data on European higher education institutions (HEIs), built through an initiative of the European Commission.

We explored the available databases for relevant indicators present in both data sources, guided by the selection of inputs and outputs used in the empirical analyses of the existing academic literature. We ended up selecting two inputs and three outputs. Table 1 reports the names, units and data sources of these variables. The selection of these variables draws on the existing literature, considers the availability of data, and takes into account the functions and operational characteristics of colleges and universities. On the input side, both manpower and capital inputs are included. On the output side, the main outputs of both teaching and scientific research activities are considered.

Table 1 Information about our selected input and output variables

Total current expenditure reflects the total cost of research and development (R&D), teaching, and other operational activities of each sample university. Concretely, “total current expenditure at purchasing power parity” is the exact indicator that we pick from the ETER database. For the Chinese sample universities, this variable is derived as the sum of R&D expenditure, government funding and enterprise funding. Therein, the R&D fund refers to the funds for scientific and educational expenses allocated from the higher authorities, mainly used to carry out fundamental operational activities. The government fund refers to the research funds provided by government departments to support universities’ participation in scientific research activities. The enterprise fund refers to the research funds obtained from enterprises and institutions outside the university and government departments, also used to support scientific research activities in universities. To eliminate the influence of inflation and exchange rate movements across years, we use the average exchange rate of RMB against the Euro to convert the Chinese “total current expenditure” data after summing the three subitems (R&D fund, government fund and enterprise fund).

Academic staff refers to the number of staff (headcount) engaged in academic activities in each selected university. The European data are directly derived from the ETER database, while the Chinese data are derived as the sum of teaching, research and R&D personnel. Therein, teaching and research staff stands for the personnel engaged in teaching and research activities, while R&D staff refers to personnel engaged in research and development, application of results, and services. Unfortunately, no detailed breakdown by personnel title, grade, etc. is available, so we rely upon simple headcounts.

A methodological, economic note is needed here. The two variables selected as inputs are no doubt correlated, but we decided to include both as a more complete and reliable measure of HEIs’ resources. Indeed, we want to take explicitly into account that labour intensity (as measured by the number of academic staff) can differ from capital intensity (as measured by expenditures). This is particularly true where the price of labour (i.e. salary) is heterogeneous across institutions. This is certainly the situation we can observe within both Europe and China, as well as across the two areas.

The student variable follows the International Standard Classification of Education (ISCED) and counts the enrolled students at ISCED levels 6–8, which corresponds to the total number of students taught or trained at the undergraduate, master and Ph.D. stages in China. From 2013 onwards, GEQRs became the main source revealing the number of students of most Chinese universities, but no uniform publication announces this figure before 2013. Therefore, the data on the number of students come from two sources: the first is the official enrollment plans released by these universities, which contain data on aggregate enrolments; the second is the GEQRs, which unveil the annual number of graduates of our sample universities. The academic literature in the field debates whether students should be considered as inputs (with graduates, correspondingly, as outputs) or as outputs. We opted for the second view, considering that the resources of universities (staff, teaching hours, etc.) are defined on the basis of the number of students, even when students drop out and do not graduate. In this perspective, we consider the number of students a better proxy of the volume of teaching activities actually realized by universities—instead of focusing narrowly on the number of graduates.

Publications measures the number of papers covered by the Web of Science (WoS) database, and citations denotes the category normalized citation impact (CNCI) of a university. CNCI is a bibliometric indicator that removes the influence of different staff sizes and subject differences. Data on the bibliometric indicators are collected from the InCites database, one of the products of Clarivate Analytics. To prepare the retrieval, we set “Retrieval type” to “Organization”, choose “Time period” as “2011–2015”, fill in the name of each university, choose the “Research output” as “document type” (including article and review), and choose “Schema” as “ESI” (including SCI and SSCI). We can then obtain the data on “Web of Science documents” and “Category Normalized Citation Impact” for each university. Therefore, we are sure that the publication and citation data come from the same source and are fully comparable.

The model representing the productive process of the universities is kept quite simple to allow the widest possible comparison between universities in very different contexts. The simplicity of the model ensures the generalizability of the production process described, so that it can be used to compare the efficiency of the different organizations (universities) in realizing their core activities, namely teaching and research.

The data on the EU sample were collected from the ETER database on April 1, 2019; at that date, the latest available data referred to the year 2015. Considering the temporal interval in which both Chinese and European data are available, our investigation period is limited to 2011–2015. Besides, the two different data sources on students in China overlap in 2013, so cross-year calculations and comparisons involving Chinese data are hereinafter divided into 2011–2013 and 2013–2015. It is worth noting that the monetary data of China are converted to Euro using the average exchange rate of each year from the National Bureau of Statistics of China, as shown in Table 2. To eliminate the effect of Euro inflation across years, we also employ the average consumer price index (CPI) for EU countries from the OECD database to deflate the data. The CPI data, with 2011 as the base year, are displayed in Table 2.
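As a simple numerical illustration of the two adjustments just described, a nominal RMB figure is first converted at the annual average exchange rate and then deflated to 2011 prices; the function below is our own sketch, with the rates and indices taken from Table 2:

```python
def to_constant_2011_euro(value_rmb: float, rmb_per_eur: float, cpi_eu: float) -> float:
    """Convert a nominal RMB amount to constant-2011 Euros.
    rmb_per_eur: annual average RMB/EUR exchange rate for that year (Table 2);
    cpi_eu:      average EU consumer price index for that year, with 2011 = 100."""
    value_eur = value_rmb / rmb_per_eur    # currency conversion at the annual average rate
    return value_eur / (cpi_eu / 100.0)    # deflation to 2011 prices
```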

Table 2 Exchange rate and CPI during 2011–2015

Generally, dissimilar distribution patterns of the original data across groups can reflect differences in the operating strategies of universities. Therefore, we provide annual descriptive statistics of the Chinese and European universities to compare the two groups, as given in Table 9 (in Appendix 3; moreover, Appendix 3 reports average data by country and year). Furthermore, the annual average values of all variables are used to depict their changing trends, as shown in the line charts of Fig. 1. In particular, we break the student series in Fig. 1c to display the different sources of data for the group of Chinese universities.

Fig. 1 Line charts of variables over the investigated periods

The changing tendencies of the variables in Fig. 1 reveal several main differences between Chinese and European elite universities. Firstly, the amount of total current expenditure devoted to European elite universities far exceeds that of Chinese universities, also because of their larger average size (see the number of students later in this section). On top of such a large amount of input, they maintain an annual expenditure growth rate of 6.5%; the annual growth rate in China is 17.4%, albeit from a relatively small expenditure base. Secondly, there is a significant difference in the distribution of academic staff between Chinese and European universities, as Chinese universities tend to hire far more academic staff than European ones. Furthermore, academic staff numbers in European universities show a general trend of steady rise, while in Chinese universities they remain relatively constant. Thirdly, in terms of the number of papers published and citation quality, China is attempting to catch up with European levels at a relatively high speed from a weak starting base. When considering publications, both Chinese and European universities grow fast over the examined period, with average annual growth rates of 18.1% and 5.5%, respectively. However, with regard to citation quality, Chinese universities keep growing steadily but remain at low levels, while European universities maintain slow growth at a higher level after reaching the second-highest peak in 2012. Due to the interruption of the data sources in China, the exact trend of students trained in Chinese universities is hard to judge, but the number of students in European universities increases constantly at an annual rate of 3.9%.

In general, from the perspective of input and output values, top European universities are significantly ahead of top Chinese universities. To identify ways in which the two groups of universities can further improve their education performance, we use a meta-frontier approach hereinafter to derive efficiency and productivity, allowing a fairer and more robust comparison of their performance.

Results of the empirical analysis

Results from the baseline model

To test which returns-to-scale hypothesis our data satisfy, we employ the non-parametric returns-to-scale test (Simar & Wilson, 2002; Tran & Dollery, 2021) under the null hypothesis of constant returns to scale. We tested the data from the Chinese and European universities separately, and both tests rejected the null hypothesis with p-values below 0.05. Therefore, we employ models based on the assumption of variable returns to scale hereinafter.

Furthermore, since the non-parametric DEA technique we adopt is greatly affected by outliers, it is necessary to check for outliers before conducting the calculations and analysis (Clermont & Schaefer, 2019). In view of the two different groups of universities in our study, we use the super-efficiency DEA model to calculate the super-efficiency of Chinese and European universities separately. The results show that only the 33rd European university and the 2nd Chinese university present the infeasibility problem, in 2011 and 2012; no such problems appear for other universities or in other years. Therefore, to include as many universities as possible, we retain all the universities in the following analysis.
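For reference, the super-efficiency screen just mentioned can be implemented by re-using the output distance program of Sect. 3.1 while excluding the evaluated unit from the reference set; the sketch below builds on the hypothetical output_distance helper introduced earlier (the paper's exact super-efficiency formulation is not detailed, so this is an assumption). Under VRS the excluded-unit program can indeed be infeasible, which is what is flagged for the two universities above:

```python
import numpy as np

def super_efficiency(j, X, Y):
    """Output-orientated VRS super-efficiency of DMU j: evaluate (X[j], Y[j])
    against a reference set from which DMU j itself has been removed.
    Returns None when the program is infeasible (possible under VRS)."""
    mask = np.arange(X.shape[0]) != j
    return output_distance(X[j], Y[j], X[mask], Y[mask])
```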

For the purpose of assessing overall performance and differences between the separate groups, the results of the meta-frontier productivity index and its decompositions are presented in detail in the Appendix (Table 11). Although the problem of infeasibility may occur with the VRS-based model, no such cases occur in the calculations of this paper.

Considering the Chinese and European university systems as a whole, the MMPI grows at an annual average rate of 5.68%. However, the values of the productivity index reveal a fluctuating and downward tendency, declining from 1.0958 to 0.9959, an indication that overall productivity growth within the Chinese and European higher education systems is slowing down. The values of EC (technical efficiency) share a similar pattern with the MMPI, and their decline is also dramatic, descending from 1.0482 to 0.9837. BPC (technical change over time) is the only decomposed source that stays above unity, fluctuating between 1.0211 and 1.0538; the average BPC therefore ranks highest and acts as the most influential factor shaping the overall performance of the universities under scrutiny. The impact of individual technology leadership, denoted as TGC, keeps fluctuating during this period, varying slightly around unity within the range 0.9814–1.0184. Detailed computational results are reported in Table 3, with the corresponding line chart in Fig. 2. As anticipated, the overall productivity and its sources are affected by the discontinuity of the student indicator for Chinese universities, so we use a dotted line to connect the two points 2012–2013 and 2013–2014.

Table 3 Overall productivity and its decomposition, all universities together
Fig. 2 Line chart of the overall productivity and its decomposition

Building on the assessment of the overall condition of the Chinese and European tertiary education systems, we further conduct an intragroup examination of the 50 European sample universities. Numerical results are exhibited in Table 4, and the corresponding trend chart is depicted in Fig. 3. The data sources for the European group are continuous, so Fig. 3 is presented without dotted lines. The MMPI in the European group first falls and then rises, with a minimum value of 0.9994 in 2012–2013. On average, the MMPI increases by 4.51% per year, which is below the average of the full sample. Among the sources of the MMPI, EC (technical efficiency) rises at an average rate of 0.85% per year, while BPC (technical improvement over time) increases by 2.05% annually; EC and BPC alternate with each other over the period. TGC (the intergroup technology gap) behaves most consistently with the MMPI in this group, with its score falling from 1.0637 to 0.9884 during 2011–2013 and rebounding to 1.0267 by 2014–2015.

Table 4 Productivity and its decompositions within the European group
Fig. 3 Line chart of the productivity and its decompositions within the European group

Fig. 4 Line chart of the productivity and its decompositions within the Chinese group

When considering the Chinese group, we derive the computational scores and list them in Table 5; the corresponding line chart is shown in Fig. 4 (recall that dotted lines represent the discontinuity in the data about students). The average growth of the MMPI reaches 7.15%, definitely higher than that reported for the European universities. However, the indicator also shows a sharp downward trend, reaching 0.9219 in 2014–2015. Among the decomposed components, EC (technical efficiency) emerges as the key factor, with an average annual growth rate of 6.11%, even though it shows a gradually descending tendency. BPC (technical improvement over time) also performs strongly, increasing by 5.29% per year. These figures illustrate that individual efficiency is rising and that the gap between individual universities and the group frontier is narrowing. Nevertheless, TGC shows a negative performance, revealing that the group technology is losing ground relative to the global frontier in the period under analysis.

Table 5 Productivity and its decompositions within the Chinese group

After analyzing the different performance of the MMPI and its decompositions in China and Europe, it is necessary to examine the TGR results of each group, that is, the technology gap ratio between each group frontier and the global frontier. Even though we have obtained TGC scores, which provide information about changes in leadership, we still need the TGR to identify which group plays the leading role in technical efficiency during the examined period. Therefore, we examine the TGR results of the two groups and depict them in kernel density graphs, as shown in Fig. 5, to recognize differences between groups. TGR scores in the EU group show a concentrated distribution in the range 0.6–1, which indicates that EU universities hold an absolute technology-leading position. With regard to the CN group, the concentrated distribution interval of TGR scores has a larger span, from 0.2 to 1, and a lower average value. This finding must be interpreted as showing that there are fewer internationally leading universities in China than in Europe, and that a persistent technology gap exists between the two groups.

Fig. 5 Kernel density plots of TGR (technology gap ratio) in Chinese and European universities

Robustness checks

The results presented in Sect. 4.1 could be hampered by two main drawbacks of the specification employed for the universities’ production function. On one side, differences in the quality of publications (i.e. the variable measuring output) can distort the picture of efficiency, to the extent that quality is higher in the group of Chinese versus European elite universities (or vice versa). Given the elite status of the institutions, this eventuality seems remote, but it cannot be completely ruled out. On the other side, the differences in salaries for academic staff between Chinese and European universities can affect expenditures (i.e. the input variable), again distorting the measurement of pure technical efficiency. With these two threats in mind, this section reports the results of two robustness tests that we performed to check the validity of the main findings when different specifications of the production function are allowed.

Specifically, we re-run all the empirical analyses, modifying the set of variables employed in the efficiency measurement:

  • Robustness test #1 We use two inputs (total current expenditure and academic staff) and three outputs: students, high-quality publications, defined as papers published in Q1 and Q2 journals—as classified by Web of Science—and citations of high-quality publications;

  • Robustness test #2 We use only one input (academic staff) and three outputs, as in the baseline model (students, publications, citations).

The results of these robustness tests reveal some interesting patterns, which are worth discussing.

When considering only high-quality publications (Table 6), the estimated productivity growth (MMPI) over the period is lower than that reported by the baseline model (2.11% instead of 5.68%). This is expected, as the number of high-quality publications is (by definition) lower than that of total publications. Also, the productivity increase seems driven by the best practice gap change (BPC) and not by the efficiency variations within groups (EC), the latter being virtually equal to 1.00. The improvement in the frontiers of the two groups is even slightly declining (0.99), a result in line with that emerging from the baseline model. The pattern is very similar across the groups of EU and CN elite institutions; the only differences are that the productivity increase is a little higher for the former (2.39%) than for the latter (1.83%), while Chinese institutions seem to obtain an improvement in technical efficiency of about 1.14%.

Table 6 Overall productivity and its decomposition, robustness test # 1

When considering robustness test #2, where expenditures are not included (and all publications and citations are counted, not only high-quality ones), the productivity increase again appears somewhat lower than the one reported in the baseline model (3.51% vs 5.68%), although higher than the one associated with robustness check #1 (2.11%)—see Table 7. The technical efficiency (EC) and the groups’ specific frontiers (TGC) contribute little—albeit positively—to total productivity, and the major influence is again exerted by the best practice gap change (BPC). No significant differences are detected between the two groups of elite institutions, which actually behave in a very similar manner (see Panels B and C of Table 7).

Table 7 Overall productivity and its decomposition, robustness test # 2

Taken together, the results obtained through the two robustness tests seem to confirm the validity of the main findings from the baseline model. Nevertheless, when relaxing some assumptions about the uniformity of the universities’ production functions across groups (namely, comparable quality of all publications and similar prices of inputs), the estimated productivity increase decreases in magnitude. We can then conclude that the baseline estimate of productivity growth (5.68%) is an upper bound of the true value, whilst the estimate based only on high-quality publications is a lower bound (2.11%).

Discussion of results and concluding remarks

This paper uses an innovative meta-frontier Malmquist Productivity Index (MMPI) to measure the productivity and efficiency of elite universities in China and Europe within an overarching framework, considering the period 2011–2015. Answering the research questions of this study, our empirical findings show that the universities in the sample witnessed a productivity growth of 5.68% on average in the period under analysis. However, this growth fluctuated over the surveyed period and followed a decreasing trend. The decomposed components of the productivity index, namely (1) efficiency change, (2) best practice change and (3) technology gap change, increased on average by 3.19%, 3.49% and 0.21%, respectively. The best practice change, capturing technical progress among universities during the surveyed period, acted as the main driver of the rise in overall productivity. These findings suggest that both Chinese and European universities attempted to keep increasing productivity to reach the standards of elite universities. Albeit having started its investment in world-class university projects later, the Chinese HE system’s success can be widely recognised in the productivity increase during 2011/12–2014/15; such growth in the productivity of Chinese universities was not attainable during the previous post-reform period (1998–2002), as indicated in the work of Ng and Li (2009). On the other hand, European elite universities exceed their Chinese counterparts in investments in human and financial resources, as well as in the quantity and quality of their academic outputs.

Looking more closely at the performance of European universities relative to their own frontier, the findings reveal that their average productivity growth is approximately 4.51%. Compared with the overall sample, European universities exhibit less technical progress in their productivity change. Their higher initial efficiency level can explain why the progress made over the period under analysis is lower in magnitude. A breakthrough can be seen in the performance of Chinese universities, whose productivity growth of 7.15% is driven mainly by efficiency change and intergroup technological catch-up. This result is slightly lower than the 10.3% found by Song et al. (2019), who investigated the productivity growth of 58 Chinese universities over the period 2009/10–2015/16. Thus, the study confirms that Chinese universities have been markedly improving their academic operations. Moreover, the increase in productivity suggests that Chinese higher education policies are on the right track in supporting and enhancing projects aimed at bringing as many Chinese universities as possible into the elite group as quickly as possible. This may serve as a role model for other Asian nations that wish to bring their tertiary institutions up to world-class standards. However, with respect to their own frontier, Chinese universities still lag behind the current technological leaders. This lag indicates that Chinese universities still have room to improve their performance through a more efficient use of their resources; specifically, by increasing research output (publications) more than proportionally relative to resource investments.
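The notion of "lagging behind the technological leaders" can be made precise through the technology gap ratio commonly used in meta-frontier analyses (a sketch in standard notation, which may differ from the symbols adopted in this paper's methods section):

\[
\mathrm{TGR}_{i,t} \;=\; \frac{\mathrm{TE}^{\text{meta}}_{i,t}}{\mathrm{TE}^{\text{group}}_{i,t}} \;\le\; 1,
\]

where a value below 1 indicates that a university, even if efficient with respect to its own group frontier, still operates below the global best practice defined by the meta-frontier.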

An intriguing question would be to investigate the underlying causes of the efficiency and productivity patterns of elite universities synthesized above. Several explanations and acting forces could be explored, including for example (1) the specific effect of targeted policies enacted for elite universities in China, (2) the lack of proper incentives for universities in Europe (Aghion et al., 2010) and (3) the potential cost disease effect in European universities, which has already been highlighted in other HE contexts, such as the USA (Archibald & Feldman, 2008). The (causal) explanation of the phenomena documented here is beyond the scope of the present paper. Future research could build on the descriptive findings reported in our paper, following the avenues and hypotheses mentioned above, as well as others.

Meanwhile, the current paper contributes to the higher education literature by investigating and comparing the productivity index of European and Chinese elite universities, an empirical exercise that has not been done before. Our empirical findings provide useful information to educational leaders and policy makers seeking appropriate solutions to move their nations' higher education institutions forward. Although European universities have longer experience with elite university standards and hold a strong position in the higher education ranking system, they still face specific challenges, as shown in recent studies (Zhu & Zayim-Kurtay, 2018). For example, European HE institutions suffer from a deficit of interdisciplinary research and a lack of national and international collaboration with industry and other tertiary institutions (European Commission, 2005b, 2008; LERU, 2006), as well as from other issues such as a declining number of young people, inequality of access for students of low socio-economic status, and inadequate competitive capacity in research and innovation (Eurydice, 2000; European Commission, 2005a; Van Vught, 2011). Following the suggestions of Aghion et al. (2010), further improvements in productivity could be generated by greater university autonomy and by extrinsic incentives towards academic excellence (for example, higher shares of competition-based public funding); these interventions might be especially effective for the group of elite HE institutions. Similar to their European counterparts, Chinese institutions inevitably face challenges on the way to world-class status. In particular, the management system should become more transparent to enable Chinese universities to achieve elite and world-class status (Luo, 2013; Ngok & Guo, 2008). In addition, personnel reform and global competition are influential factors for the elite status of Chinese universities and should also be addressed (Song, 2018; Yang, 2000). More generally, the approach of keeping efficiency high by containing input costs (as evidenced in the descriptive statistics in this paper) might not be the proper one when substantially improving performance becomes the key objective; the latter strategy implies attracting the best academic talent (e.g. through higher remuneration) and developing top-class facilities and laboratories.

Despite the clear results presented in the paper, future studies can extend our findings by investigating whether environmental factors affect the productivity index. In addition, more outputs may be added to the model, where appropriate, to capture different patterns in universities' production processes, especially qualitative outputs such as a university's contribution to the community and outreach (Wolszczak-Derlacz, 2017).

The time dimension is also worth considering. The analysis presented in this work covers a short period of time, just four years, so new studies will be necessary to evaluate the long-run dynamics of efficiency and productivity of elite universities. Moreover, the period considered here includes years of the Great Recession (especially 2011 and 2012). This particular economic conjuncture could have affected the production processes of the universities analysed in this work. For example, we are aware that governments and institutions reacted very differently to the specific problem of assuring adequate funding to universities during that period. In the case of China, the employment situation of college graduates became more severe because of the crisis, and demand in the domestic market for higher education talent declined. To address these adverse effects, the government adopted various policy measures: the state invested additional funds in higher education and established relevant laws and regulations to improve the employment situation. It also created additional opportunities to attract international educational resources and to expand multi-regional cooperation and exchanges. In the case of Europe, the European University Association (EUA) reports that some governments (such as the French one) invested more resources with a counter-cyclical aim, while others did not, resulting in a substantial decline in the resources available to universities. These elements call for further research that looks at these factors more closely, comparing the situation with the one that consolidated after the financial crisis. From a methodological perspective, although we used a panel structure to track changes in the productivity of the two groups of universities, a longer span of data would be preferable to estimate the productivity index more robustly and thus assess more accurately the sustainable development of universities over time.

The results presented in this paper have strong internal validity, but caution is required when extrapolating policy implications to non-elite universities, especially poorly performing institutions. Indeed, it might be the case that this latter group is actually more efficient than elite universities, i.e. more able to make the most of the resources available. Moreover, the distributions of universities' efficiency within Europe and China, as well as across the two areas, might differ from those observed in the elite HE segment. As a straightforward research direction, future analyses should directly compare performance and efficiency in the second and third tiers of HEIs; such an extension would shed more light on the overall quality of the HE systems in the two parts of the globe.

A final note relates to the radical challenge posed by the emergence of COVID-19 during 2020. Given their different performance trajectories, Chinese and European universities may have been affected quite heterogeneously by the pandemic, for example because of their differing ability to maintain operations, on-campus lectures, international enrolments, etc. The investigation of the impact of COVID-19 on universities' efficiency from an international perspective will surely become a central topic in empirical HE studies in the coming years.