1 Introduction

A problem that is common to a number of fields, including engineering design, is that of aggregating multiple ordinal rankings of a set of alternatives into a collective ranking. This problem may concern the early-design stage, in which m experts (or decision-making agents: D1 to Dm) formulate their individual rankings of n design alternatives (or objects: O1 to On) (Fu et al. 2010; Frey et al. 2009; Hoyle and Chen 2011; Keeney 2009). In the simplest case, these rankings are complete, i.e.:

  1. (i)

    each expert is able to rank all the alternatives of interest, without omitting any of them;

  2. (ii)

    each ranking can be decomposed into paired-comparison relationships of strict preference (e.g., O1 ≻ O2 or O1 ≺ O2) and/or indifference (e.g., O1 ~ O2).

The objective of the problem is to aggregate the expert ordinal rankings into a collective one, which is supposed to reflect them as much as possible, even in the presence of diverging preferences (Weingart et al. 2005; See and Lewis 2006). For this reason, the collective ranking is often referred to as a social, consensus or compromise ranking (Cook 2006; Herrera-Viedma et al. 2014; Franceschini et al. 2015, 2016).

Returning to the context of the early-design stage, design alternatives are often not very well defined and there are doubts about how to prioritize them (Weingart et al. 2005; Kaldate et al. 2006; McComb et al. 2017). Although there is substantial agreement on the design criteria, the selection of design alternatives is generally driven by the different personal experience of designers (Dwarakanath and Wallace 1995). Thus arises the need to aggregate preference rankings of design alternatives that reflect the opinions of individual experts, using appropriate aggregation models (Fishburn 1973b; Franssen 2005; Cook 2006; Hazelrigg 1999; Frey et al. 2010; Katsikopoulos 2009; Ladha et al. 2003; Reich 2010; Nurmi 2012).

Alongside this, a passionate debate on the effects of Arrow's impossibility theorem in engineering design is still going on (Arrow 2012; Reich 2010; Hazelrigg 1996, 1999, 2010; Scott and Antonsson 1999; Franssen 2005; Yeo et al. 2004; McComb et al. 2017). In short, this theorem establishes that no generic aggregation model can provide a collective ranking that always satisfies several desirable properties, also known as fairness criteria, i.e., unrestricted domain, non-dictatorship, independence of irrelevant alternatives (IIA), weak monotonicity, and Pareto efficiency (Arrow 2012; Fishburn 1973a; Nisan et al. 2007; Saari 2011; Saari and Sieberg 2004; Franssen 2005; Jacobs et al. 2014).

For a given set of m expert rankings concerning n alternatives, different aggregation models may obviously lead to different collective rankings (Saari 2011; McComb et al. 2017). Identifying the model that best reflects the m rankings is not easy, also because it may change from case to case. Some researchers showed the effectiveness of specific aggregation models, even though they cannot always satisfy all of Arrow's fairness criteria (Dym, Wood and Scott 2002). Yet Arrow's theorem does not close the door on the possibility of comparing different aggregation models and identifying the best one(s) on the basis of certain tests. For example, several authors attempted to measure the coherence (or consistency) between the expert rankings and the collective one (Chiclana 2002; Franceschini and Maisano 2015, 2017; Franceschini and Garcia-Lapresta 2019). Other authors hypothesized a relationship between the so-called implicit agreement of the expert rankings and the Arrow’s fairness (McComb et al. 2017). Moreover, Katsikopoulos (2009) expressed the need for greater clarity in the discussion of engineering design methods to support decision making.

In general, the choice of the best aggregation model may depend on: (1) the specific objective(s) of the expert group and/or (2) the rationale of the test used (Dong et al. 2004; Li et al. 2007; Paulus et al. 2011; Cagan and Vogel 2012; Franceschini et al. 2019; Franceschini and Maisano 2019b).

The aim of this article is to compare four relatively popular aggregation models—i.e., the so-called Best of the best, Best two, Best three, and Borda count model—trying to answer the research question: “Which is the model producing the collective ranking that best reflects the expert rankings?”. The comparison will be performed by measuring the coherence of the models, through a recent test based on the so-called Kendall's coefficient of concordance (W) (Kendall 1962; Legendre 2010). This test quantitatively evaluates the coherence of the collective ranking provided by any aggregation model, for a specific ranking-aggregation problem.

A previous study (Franceschini and Maisano 2019a) illustrated the test in general terms, regardless of the characteristics of the specific aggregation models. This work significantly extends the previous one, investigating the characteristics of the aggregation models that are most likely to achieve higher coherence, according to the above test. The new investigation generalizes earlier results, including a mathematical optimization of a specific coherence indicator. Thanks to the outcomes of this study, engineering-design management will have extra support for choosing the most “promising” aggregation models.

The remainder of this article is organized into three sections. Section 2 illustrates a case study that will accompany the description of the proposed methodology. Section 3 is divided into two parts: the first part formalizes the concept of coherence of the collective ranking with respect to the expert rankings, recalling the coherence test proposed in (Franceschini and Maisano 2019a); the second part analyzes the test itself thoroughly, showing its close link with the Borda count model. Section 4 provides a discussion of the practical implications and limitations of this research for the engineering-design field, summarizing original contributions and suggestions for future research. Further details are contained in the Appendix section.

2 Case study

This section contains an application example that will be used to illustrate the proposed methodology. An important hi-tech company—which is kept anonymous for reasons of confidentiality—operates predominantly in the sector of video projectors. Recent advances in imaging technology have led the company to increasingly invest in the development of hand-held projectors, also known as pocket projectors, mobile projectors, pico-projectors or mini beamers (see Fig. 1) (Borisov et al. 2018).

Fig. 1 Example of pocket projector, i.e., a small hardware device designed to project content from a smartphone, camera, tablet, notebook or memory device onto a wall or other flat surface

Four design concepts of pocket projectors (O1 to O4, i.e., objects) have been generated by a team of ten engineering designers (i.e., the experts of the problem: D1 to D10), during the conceptual design phase (see also the description in Fig. 2):

  • (O1) stand-alone projector;

  • (O2) USB projector;

  • (O3) media-player projector;

  • (O4) embedded-type projector.

Fig. 2 Schematic representation and short description of four alternative design concepts of pocket projectors

The objective is to evaluate the aforementioned design concepts in terms of user friendliness, i.e., a measure of the ease of use of a pocket projector. Some of the factors that can positively influence this attribute are: (i) quick set-up time, (ii) intuitive controls, and (iii) good user interface.

Given the great difficulty in bringing together all the experts and making them interact to reach shared decisions, management leaned towards a different solution: a collective ranking of the four design concepts can be obtained by merging the individual rankings formulated by the ten engineering designers (Table 1 shows these rankings).

Table 1 Ordinal rankings of four design concepts (i.e., O1 to O4) formulated by ten engineering designers (i.e., D1 to D10)

Before focusing on the possible aggregation models, let us take a step back and deal with the evaluation of the experts’ degree of concordance (Franceschini and Maisano 2019a). The scientific literature includes an important indicator for evaluating the overall association among more than two rankings, i.e., the so-called Kendall’s coefficient of concordance, which is defined as (Kendall and Smith 1939; Kendall 1962; Fishburn 1973b; Legendre 2005, 2010):

$${W}^{\left(m\right)}=\frac{12(\sum_{i=1}^{n}{R}_{i}^{2})-3{m}^{2}n{\left(n+1\right)}^{2}}{{m}^{2}n\left({n}^{2}-1\right)-m(\sum_{j=1}^{m}{T}_{j})},$$
(1)

where: \({R}_{i}=\sum_{j=1}^{m}{r}_{ij}\) is the sum of the rank positions for the i-th object, rij being the rank position of the object Oi according to the j-th expert; n is the total number of objects; m is the total number of ordinal rankings; \({T}_{j}=\sum_{i=1}^{{g}_{j}}\left({t}_{i}^{3}-{t}_{i}\right), \forall j=1, \dots ,m\), where \({t}_{i}\) is the number of objects in the i-th group of ties (a group being a set of tied objects) and \({g}_{j}\) is the number of groups of ties in the ranking by the j-th expert. If there are no ties in the j-th ranking, then \({T}_{j}=0\).

Regarding the rank positions (rij) of the tied objects, a convention is adopted whereby they should be the average rank positions that each set of tied objects would occupy if a strict dominance relationship could be expressed (Gibbons and Chakraborti 2010). This convention guarantees that—for a certain j-th ranking and regardless of the presence of ties—the sum of the objects’ rank positions is an invariant equal to:

$$\sum_{i=1}^{n}{r}_{ij}=\frac{n\cdot \left(n+1\right)}{2}.$$
(2)
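This convention is easy to reproduce in software. As a quick illustration (a sketch of ours), SciPy's rankdata applies average ranks to ties by default, and the rank sum of Eq. (2) is preserved:

```python
from scipy.stats import rankdata

# Ranking O1 ~ O2 (tied) ahead of O3, then O4: the tied objects share
# the average of the positions they would occupy (1 and 2)
print(rankdata([1, 1, 3, 4]))        # [1.5 1.5 3.  4. ]
print(rankdata([1, 1, 3, 4]).sum())  # 10.0 = n*(n+1)/2 for n = 4
```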

In terms of range, \({W}^{\left(m\right)}\in \left[0,1\right]\). \({W}^{\left(m\right)}=0\) indicates the absence of concordance, while \({W}^{\left(m\right)}=1\) indicates complete concordance (or unanimity). The superscript “(m)” was added by the authors to underline that the coefficient of concordance is applied to the m expert rankings and to distinguish it from another indicator—referred to as W(m+1)—which will be applied to m + 1 rankings.

Returning to the problem in Table 1, which does not include any ranking with ties (i.e., \({T}_{j}=0\) for all j), the formula in Eq. 1 can be applied, obtaining \({W}^{\left(m\right)}=0.004=0.4\%\). This result denotes a very low degree of concordance among the experts.
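For reproducibility, Eq. (1) can be computed with the following minimal Python sketch (ours; the function name kendall_w is an assumption, and the rank matrix encodes the rank positions rij of Table 2, introduced below):

```python
import numpy as np

def kendall_w(ranks):
    """Kendall's coefficient of concordance, Eq. (1).
    ranks: (n, m) array, where ranks[i, j] is the rank position r_ij of
    object O_i in the ranking of expert D_j (average ranks for ties)."""
    n, m = ranks.shape
    R = ranks.sum(axis=1)  # R_i: sum of rank positions per object
    T = []
    for j in range(m):     # tie correction T_j = sum(t_i^3 - t_i)
        _, counts = np.unique(ranks[:, j], return_counts=True)
        T.append(sum(t**3 - t for t in counts))
    num = 12 * (R**2).sum() - 3 * m**2 * n * (n + 1)**2
    den = m**2 * n * (n**2 - 1) - m * sum(T)
    return num / den

# Rank matrix (objects x experts) reconstructed from Table 2 / Eq. (3)
ranks = np.array([[1, 1, 1, 1, 1, 4, 4, 4, 4, 4],    # O1
                  [2, 2, 4, 4, 4, 2, 2, 2, 2, 2],    # O2
                  [3, 3, 2, 3, 3, 1, 1, 3, 3, 3],    # O3
                  [4, 4, 3, 2, 2, 3, 3, 1, 1, 1]])   # O4
print(kendall_w(ranks))  # 0.004
```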

Table 2 shows the calculation of the rank positions (rij) of the four objects, for each of the ten expert rankings in Table 1.

Table 2 Rank positions (rij) for the four design concepts (O1, O2, O3 and O4), deduced from the expert rankings in Table 1

Inspired by different design strategies, the team of engineering designers decides to consider four popular aggregation models from the scientific literature (Saari 2011; McComb et al. 2017; Franceschini and Maisano 2019a). A brief description of these models follows (a computational sketch of the four scoring rules is given after the list):

  1. (i)

    Best of the best model (BoB or standard plurality vote). For each ranking, the most preferred design concept obtains one point. According to the data in Table 2, the resulting collective ranking is \(O_{1} \succ O_{4} \succ O_{3} \succ O_{2}\) and the “winning” design concept is O1. Table 3(i) contains the intermediate calculations.

    For example, this model is used to designate the winner of important competitions, such as the “Red Dot Award”, awarded by eminent design associations/centres to the best design concept of the year (see https://www.red-dot.org, last accessed in September 2020).

  2. (ii)

    Best two model (BTW or vote for two). For each ranking, the two most preferred design concepts obtain one point each. According to the data in Table 2, the resulting collective ranking is \(O_{2} \succ O_{1} \sim O_{4} \succ O_{3}\) and the “winning” design concept is O2. Table 3(ii) contains the intermediate calculations.

    For example, this model is used for municipal elections of City Commissioner in some major U.S. cities (Boyd and Markman 1983).

  3. (iii)

    Best three model (BTH or vote for three). For each ranking, the three most preferred design concepts obtain one point each (i.e., this is equivalent to neglecting the worst design concept). The resulting collective ranking is \(O_{3} \succ O_{4} \succ O_{2} \succ O_{1}\) and the “winning” design concept is O3. Table 3(iii) contains the intermediate calculations. Whilst this model is less common than the above models, it is occasionally used for municipal elections in several city councils (Stark 2008).

  4. (iv)

    Borda count model (BC). For each expert ranking, the first design concept accumulates one point, the second two points, and so on (Borda 1781). According to this model, the cumulative scores of the four design concepts are calculated as:

    $$\begin{gathered} {\text{BC}}(O_{1} ) = 1 + 1 + 1 + 1 + 1 + 4 + 4 + 4 + 4 + 4 = 25 \hfill \\ {\text{BC}(O}_{{2}} ) = 2 + 2 + 4 + 4 + 4 + 2 + 2 + 2 + 2 + 2 = 26 \hfill \\ {\text{BC}(O}_{{3}} {)} = 3 + 3 + 2 + 3 + 3 + 1 + 1 + 3 + 3 + 3 = 25 \hfill \\ {\text{BC}}(O_{4} ) = 4 + 4 + 3 + 2 + 2 + 3 + 3 + 1 + 1 + 1 = 24 \hfill \\ \end{gathered}$$
    (3)
Table 3 Scores related to the design concepts (i.e., O1 to O4) and corresponding collective ranking, produced by each of the four aggregation models (i.e., BoB, BTW, BTH and BC)

where \({\mathrm{ BC}(O}_{1})\), \({\mathrm{BC}(O}_{2})\), \({\mathrm{BC}(O}_{3})\) and \({\mathrm{BC}(O}_{4})\) are the so-called Borda counts related to the four design concepts. Of course, the degree of preference of the i-th design concept decreases as the corresponding BC(Oi) increases. In this specific case, the collective ranking is \(O_{4} \succ O_{1} \sim O_{3} \succ O_{2}\) and the most preferred alternative is O4.

In addition to engineering design (Dym et al. 2002; McComb et al. 2017), the BC model is also used for: (1) political elections in several countries, (2) internal elections in some professional and technical societies (e.g., board of governors in the International Society for Cryobiology, board of directors in the X.Org Foundation, research area committees in the U.S. Wheat and Barley Scab Initiative, etc.), and (3) a variety of other contexts (e.g., world champion of “Public Speaking” contest by Toastmasters International, “RoboCup” autonomous robot soccer competition at the University of Bremen in Germany, ranking of NCAA college teams, etc.) (Emerson 2013).
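The four scoring rules above can be condensed into a few lines of Python (a sketch of ours, continuing the snippet in Sect. 2 and assuming untied expert rankings, as in Table 2):

```python
def top_k_scores(ranks, k):
    """Vote-for-k family (BoB: k = 1, BTW: k = 2, BTH: k = 3): each expert
    gives one point to each of the k most preferred objects; a higher
    score denotes a more preferred object (untied expert rankings assumed)."""
    return (ranks <= k).sum(axis=1)

def borda_counts(ranks):
    """Borda counts as used here: sum of rank positions, so a LOWER
    count denotes a more preferred object."""
    return ranks.sum(axis=1)

print(top_k_scores(ranks, 1))  # BoB: [5 0 2 3]     -> O1 wins
print(top_k_scores(ranks, 2))  # BTW: [5 7 3 5]     -> O2 wins
print(top_k_scores(ranks, 3))  # BTH: [5 7 10 8]    -> O3 wins
print(borda_counts(ranks))     # BC:  [25 26 25 24] -> O4 wins
```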

Reflecting different design strategies, the four aggregation models produce four different collective rankings in this case (see the overview in Table 3). Even more surprisingly, the best pocket-projector design concept (i.e., the object at the top of each collective ranking) is different for each of the four aggregation models. Although this plurality of results may at first glance confuse the reader, it is in some measure justified by the low degree of concordance of the expert rankings (i.e., W(m) = 0.004, as seen before). Additionally, this plurality of results raises the question: “Which is the model producing the collective ranking that best reflects the expert rankings?”. To answer this question, a test can be used to measure the coherence between (1) the expert rankings and (2) the collective ranking obtained through each model.

3 Testing and maximizing the coherence

This section is divided into two parts: the first one recalls the concept of coherence and the so-called W(m+1) test, while the second one analytically studies the maximization of the coherence itself.

3.1 The W(m+1) test

The basic idea of the \({W}^{\left(m+1\right)}\) test, recently proposed by the authors (Franceschini and Maisano 2019a), is to analyse the level of coherence between the expert rankings and the collective ranking resulting from the application of the (k-th) aggregation model. The test is based on the construction of an indicator, denominated \({W}_{k}^{\left(m+1\right)}\), which is nothing more than Kendall's coefficient of concordance (see Eq. 1), applied to the (m + 1) rankings consisting of:

  • The m expert rankings, involved in an engineering-design decision problem;

  • The collective ranking obtained by applying the (k-th) aggregation model to the previous m rankings. The collective ranking is actually treated as an additional (m + 1)-th ranking.

The formula of the indicator \({W}_{k}^{\left(m+1\right)}\) follows:

$${W}_{k}^{\left(m+1\right)}=\frac{12\left[\sum_{i=1}^{n}{\left({R}_{i}+{r}_{i}\right)}^{2}\right]-3{\left(m+1\right)}^{2}n{\left(n+1\right)}^{2}}{{\left(m+1\right)}^{2}n\left({n}^{2}-1\right)-\left(m+1\right)(\sum_{j=1}^{m}{T}_{j})-\left(m+1\right){T}_{m+1}},$$
(4)

where ri is the rank position of the i-th object in the collective ranking; obviously \({r}_{i}\in \left[1,n\right]\). In case of tied objects, the same convention described in Sect. 2 is adopted.

Going back to the case study, the indicator \({W}_{k}^{\left(m+1\right)}\) can be determined by applying the formula in Eq. (4) to the ten rankings in Table 1 plus the collective ranking resulting from the application of each aggregation model. Table 4 reports the resulting \({W}_{k}^{\left(m+1\right)}\) values; the subscript "k" denotes a generic aggregation model, k: BoB, BTW, BTH, BC. For this specific problem, the BC model is the one with the highest coherence (\({W}_{\mathrm{BC}}^{\left(m+1\right)}\approx 2.09\mathrm{\%}\)).
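Computationally, \({W}_{k}^{\left(m+1\right)}\) reduces to applying Kendall's W to the expert rank matrix extended with the collective ranking as an extra column (a sketch of ours, reusing the kendall_w function and ranks matrix from Sect. 2):

```python
def coherence_test(ranks, collective):
    """W^(m+1), Eq. (4): Kendall's W over the m expert rankings plus the
    collective ranking, treated as an additional (m+1)-th column."""
    return kendall_w(np.column_stack([ranks, collective]))

# BC collective ranking O4 > O1 ~ O3 > O2, with average ranks for the tie
r_bc = np.array([2.5, 4.0, 2.5, 1.0])
print(coherence_test(ranks, r_bc))  # ~0.021, i.e., about 2%
```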

Table 4 W(m), \({W}_{k}^{\left(m+1\right)}\) and \({b}_{k}^{\left(m\right)}\) values related to the m rankings in Table 1, when applying the four different aggregation models (k: BoB, BTW, BTH, BC)

In this specific case, the condition \({W}_{k}^{\left(m+1\right)}\ge {W}^{\left(m\right)}\) holds for each k-th aggregation model, depicting a certain coherence (or positive coherence) between the corresponding collective ranking and the m rankings. An opposite result (i.e., \({{W}_{k}^{\left(m+1\right)}<W}^{\left(m\right)}\)) would depict incoherence (or negative coherence). Even though the latter situation is in some ways paradoxical, it can occur when a collective ranking is somehow conflicting with the m rankings (Franceschini and Maisano 2019a).

To quantitatively measure the degree of coherence of an aggregation model, the following synthetic indicator can be used (Franceschini and Maisano 2019a):

$${b}_{k}^{\left(m\right)}=\frac{{W}_{k}^{\left(m+1\right)}}{{W}^{(m)}}.$$
(5)

For a given set of alternative aggregation models, the most coherent one can be considered to be that which maximizes \({b}_{k}^{\left(m\right)}\); in formal terms, the model for which:

$${b}_{*}^{(m)}=\underset{k}{\mathrm{max}}\left[{b}_{k}^{\left(m\right)}\right].$$
(6)

The last column of Table 4 reports the \({b}_{k}^{\left(m\right)}\) values related to the four aggregation models of interest. It is worth noting that this indicator allows a quick and practical quantitative comparison.
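Continuing the sketch, \({b}_{k}^{\left(m\right)}\) can be computed for all four models at once (the rank vectors below are our encoding of the collective rankings in Table 3, with average ranks for ties):

```python
models = {  # collective rank vectors (O1, O2, O3, O4) from Table 3
    "BoB": np.array([1.0, 4.0, 3.0, 2.0]),
    "BTW": np.array([2.5, 1.0, 4.0, 2.5]),
    "BTH": np.array([4.0, 3.0, 1.0, 2.0]),
    "BC":  np.array([2.5, 4.0, 2.5, 1.0]),
}
w_m = kendall_w(ranks)                  # W^(m) = 0.004
for name, r in models.items():
    b = coherence_test(ranks, r) / w_m  # Eq. (5)
    print(f"{name}: b = {b:.2f}")       # BC gives the largest b (~5.2)
```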

3.2 Maximization of \({{\varvec{b}}}_{{\varvec{k}}}^{\left({\varvec{m}}\right)}\)

Let us now focus on the synthetic indicator \({b}_{k}^{\left(m\right)}\). Substituting Eqs. (1) and (4) into Eq. (5), \({b}_{k}^{\left(m\right)}\) can be expressed as:

$$b_{k}^{{\left( m \right)}} = \frac{{W_{k}^{{\left( {m + 1} \right)}} }}{{W^{{\left( m \right)}} }} = \frac{{\frac{{12\left[ {\mathop \sum \nolimits_{{i = 1}}^{n} \left( {R_{i} + r_{i} } \right)^{2} } \right] - 3\left( {m + 1} \right)^{2} n\left( {n + 1} \right)^{2} }}{{\left( {m + 1} \right)^{2} n\left( {n^{2} - 1} \right) - \left( {m + 1} \right)(\mathop \sum \nolimits_{{j = 1}}^{m} T_{j} ) - \left( {m + 1} \right)T_{{m + 1}} }}}}{{\frac{{12(\mathop \sum \nolimits_{{i = 1}}^{n} R_{i}^{2} ) - 3m^{2} n\left( {n + 1} \right)^{2} }}{{m^{2} n\left( {n^{2} - 1} \right) - m(\mathop \sum \nolimits_{{j = 1}}^{m} T_{j} )}}}}$$
(7)

The previous expression is deliberately general, as it contemplates the possibility of:

  • ties—i.e., relationships of indifference (“~”) and not only of strict preference (“≻” or “≺”)—among the objects within the m expert rankings;

  • ties among the objects within the collective ranking, or (m + 1)-th ranking.

By grouping some terms, Eq. (7) can be reformulated in a more compact form as follows:

$$b_{k}^{\left( m \right)} = \frac{{N_{1} + 24\left[ {\mathop \sum \nolimits_{i = 1}^{n} \left( {R_{i} r_{i} } \right)} \right] + 12\left( {\mathop \sum \nolimits_{i = 1}^{n} r_{i}^{2} } \right)}}{{D_{1} - W^{\left( m \right)} \left( {m + 1} \right)T_{m + 1} }}$$
(8)

It can be seen that the indicator \(b_{k}^{\left( m \right)}\) includes four types of contributions:

  1. (i)

    \(N_{1} = 12\left( {\mathop \sum \nolimits_{i = 1}^{n} R_{i}^{2} } \right) - 3\left( {m + 1} \right)^{2} n\left( {n + 1} \right)^{2}\) and \({ }D_{1} = W^{\left( m \right)} \left\{ {\left( {m + 1} \right)^{2} n\left( {n^{2} - 1} \right) - \left( {m + 1} \right)(\mathop \sum \nolimits_{j = 1}^{m} T_{j} )} \right\}\), concerning the m rankings (and therefore the rij and Ri values, also known as experts’ preference profile) and the parameters related to the “size” of the problem (i.e., n and m);

  2. (ii)

    \(24\left[ {\mathop \sum \nolimits_{i = 1}^{n} \left( {R_{i} r_{i} } \right)} \right]\), concerning a mixture of the experts’ preference profile (through the \(R_{i}\) values) and the ranks of the collective ranking (through the \(r_{i}\) values);

  3. (iii)

    \(12\left( {\mathop \sum \nolimits_{i = 1}^{n} r_{i}^{2} } \right)\), concerning the ri values of the collective ranking;

  4. (iv)

    \(W^{\left( m \right)} \left( {m + 1} \right)T_{m + 1}\), concerning a mixture of the experts’ preference profile (through \(W^{\left( m \right)} \left( {m + 1} \right)\)) and the Tm+1 value related to the collective ranking.

Note that the first contribution (i) is not related to the results of the collective ranking. Instead, the remaining three contributions—(ii), (iii) and (iv)—are all related to the collective ranking. In line with the research question behind this study, let us try to identify the aggregation model that produces the most coherent collective ranking, through the maximization of \(b_{k}^{\left( m \right)}\), operating on the three terms concerning the aggregation model: (ii) \(\mathop \sum \nolimits_{i = 1}^{n} \left( {R_{i} r_{i} } \right)\), (iii) \(\mathop \sum \nolimits_{i = 1}^{n} r_{i}^{2}\) and (iv) Tm+1.

Additionally, note that the analytic maximization of Eq. (8) as a function of the \(r_{i}\) values is relatively complex for two reasons: (i) the \(r_{i}\) values are variables defined on a discrete domain; (ii) the \(r_{i}\) values are explicit in some terms (i.e., \(24\left[ {\mathop \sum \nolimits_{i = 1}^{n} \left( {R_{i} r_{i} } \right)} \right]\) and \(12\left( {\mathop \sum \nolimits_{i = 1}^{n} r_{i}^{2} } \right)\)) and implicit in others (i.e., \(W^{\left( m \right)} \left( {m + 1} \right)T_{m + 1}\), where possible ties in the \(r_{i}\) collective ranks affect the \(T_{m + 1}\) term).

In the following subsections, the major terms of Eq. (8) will be analysed separately, although they are closely related. A more rigorous, though laborious, alternative could be to perform a numerical maximization, e.g., through Monte Carlo simulations.
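For small n, such a numerical maximization can even be made exhaustive. The sketch below (ours, continuing the previous snippets) enumerates all possible collective rank vectors—i.e., all weak orders on the n objects, encoded with the average-rank convention—and retains the one maximizing \({b}_{k}^{\left(m\right)}\); anticipating Sect. 3.3, for the case study the maximizer is an untied ranking that breaks the BC tie between O1 and O3.

```python
from itertools import product

def all_rank_vectors(n):
    """All weak orders of n objects, encoded as average-rank vectors
    ('dense' level assignments); exhaustive, hence small n only."""
    for levels in product(range(n), repeat=n):
        used = sorted(set(levels))
        if used != list(range(len(used))):
            continue                      # levels must be 0..k-1, no gaps
        r, offset = np.empty(n), 1
        for lev in used:                  # level -> average rank position
            idx = [i for i in range(n) if levels[i] == lev]
            r[idx] = offset + (len(idx) - 1) / 2
            offset += len(idx)
        yield r

best = max(all_rank_vectors(4), key=lambda r: coherence_test(ranks, r) / w_m)
print(best)  # [2. 4. 3. 1.], slightly more coherent than the tied BC ranking
```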

3.2.1 Analysis of \(\sum_{{\varvec{i}}=1}^{{\varvec{n}}}\left({{\varvec{R}}}_{{\varvec{i}}}{{\varvec{r}}}_{{\varvec{i}}}\right)\)

This term can be interpreted as a scalar product between two vectors: R = (R1, …, Rn) and r = (r1, …, rn). In general, the scalar product of two vectors (r and R) with predetermined moduli is maximized when these vectors are completely aligned, i.e., when direct proportionality between the corresponding components occurs: ri ∝ Ri. With reference to the problem of interest, this perfect alignment can hardly be achieved in practice, since the ri components are rank positions ∈ [1, n], with constant sum equal to n⋅(n + 1)/2. Compatibly with this constraint, it can be demonstrated that the Borda count model provides a collective ranking that maximizes the term \(\sum_{i=1}^{n}\left({R}_{i}{r}_{i}\right)\) (see the proof in Appendix A.1).
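This property can be verified exhaustively for the case study (a check of ours; with n = 4, there are only 24 untied rank vectors to scan):

```python
from itertools import permutations

R = ranks.sum(axis=1)  # Borda sums from Table 2: (25, 26, 25, 24)
best_r = max(permutations(range(1, 5)), key=lambda r: np.dot(R, r))
print(best_r)  # (2, 4, 3, 1): r_i aligned with R_i, i.e., the BC ordering
               # (O1 and O3 are interchangeable here, since R_1 = R_3)
```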

3.2.2 Analysis of \(\sum_{{\varvec{i}}=1}^{{\varvec{n}}}{{\varvec{r}}}_{{\varvec{i}}}^{2}={\left|{\varvec{r}}\right|}^{2}\)

We note that this term corresponds to the squared modulus of the collective-rank vector r = (r1, …, rn). It can be demonstrated that r has the maximum-possible modulus when its components are a permutation of the natural numbers between 1 and n, in the absence of ties (see the proof in Appendix A.2). Precisely, the maximum-possible value of the term of interest is n⋅(n + 1)⋅(2⋅n + 1)/6 (Gibbons and Chakraborti 2010). In the presence of ties, however, this term tends to decrease. The most disadvantageous case would be an overall tie of all the alternatives (i.e., \({r}_{1}={r}_{2}=\dots ={r}_{n}=\frac{n+1}{2}\)), with a consequent value of the term of \(n\cdot {\left(\frac{n+1}{2}\right)}^{2}\); therefore:

$$\sum_{i=1}^{n}{r}_{i}^{2}\in \left[{\left(\frac{n+1}{2}\right)}^{2}\cdot n, \frac{n\cdot \left(n+1\right)\cdot \left(2\cdot n+1\right)}{6}\right].$$
(9)

3.2.3 Analysis of \({{\varvec{T}}}_{{\varvec{m}}+1}\)

This term is maximized in the case of an overall tie of all the alternatives in the collective ranking, i.e., \({r}_{1}={r}_{2}=\dots ={r}_{n}\), in which case it takes the value \({n}^{3}-n\) (see Sect. 2). In the case of no ties, it is obviously equal to zero. Therefore, the range of \({T}_{m+1}\) is:

$${T}_{m+1}\in \left[0,{n}^{3}-n\right]$$
(10)
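Both ranges are easy to verify numerically for a given n (a small check of ours, with n = 4):

```python
n = 4
untied = np.arange(1, n + 1)         # strict ranking: ranks 1..n
all_tied = np.full(n, (n + 1) / 2)   # overall tie: every rank = (n+1)/2
print((untied**2).sum())    # 30 = n(n+1)(2n+1)/6, upper bound in Eq. (9)
print((all_tied**2).sum())  # 25.0 = n((n+1)/2)^2, lower bound in Eq. (9)
print(n**3 - n)             # 60, upper bound of T_{m+1} in Eq. (10)
```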

3.3 Close link between \({{\varvec{b}}}_{{\varvec{k}}}^{\left({\varvec{m}}\right)}\) and the Borda count model

In light of the previous analyses, it is possible to outline two potential situations, as described in the following subsections.

3.3.1 Absence of ties in the collective ranking

In the absence of ties in the collective ranking, the two terms \(\sum_{i=1}^{n}{r}_{i}^{2}\) and \({T}_{m+1}\) become constants, i.e., respectively \(\sum_{i=1}^{n}{r}_{i}^{2}=\frac{n\cdot \left(n+1\right)\cdot \left(2\cdot n+1\right)}{6}\) and Tm+1 = 0. It can be proven that the BC model is the one maximizing the term \(\sum_{i=1}^{n}\left({R}_{i}{r}_{i}\right)\) (see the demonstration in Appendix A.1); given the constancy of the other two terms, the BC model will also maximize \({b}_{k}^{\left(m\right)}\) in this situation.

To put this result into perspective, it may be appropriate to consider the general expression of \({W}^{(m)}\) (in Eq. 1) more closely. It can be observed that the \({R}_{i}\) values (e.g., those reported at the bottom of Table 2) coincide with the Borda scores assigned to the individual objects of interest (i.e., \({\mathrm{BC}(O}_{\mathrm{i}})\), \(\forall i=1, \dots ,n\), as exemplified in Eq. 3) (Cook and Seiford 1982). In this situation, the expression of \({b}_{k}^{\left(m\right)}\) in Eq. (7) can be reformulated as:

$${b}_{k}^{\left(m\right)}=\frac{{W}_{k}^{\left(m+1\right)}}{{W}^{(m)}}=\frac{\frac{12\left\{\left[\sum_{i=1}^{n}{{\mathrm{BC}(O}_{i})}^{2}\right]+ 2\cdot \left\{\sum_{i=1}^{n}\left[{\mathrm{BC}(O}_{i}\right)\cdot {r}_{i}]\right\}+\frac{n\cdot \left(n+1\right)\cdot \left(2\cdot n+1\right)}{6}\right\}-3{\cdot (m+1)}^{2}\cdot n\cdot {(n+1)}^{2}}{{\left(m+1\right)}^{2}n\left({n}^{2}-1\right)-\left(m+1\right)(\sum_{j=1}^{m}{T}_{j})}}{\frac{12\cdot \left[\sum_{i=1}^{n}{{\mathrm{BC}(O}_{i})}^{2}\right]-3{m}^{2}n{\left(n+1\right)}^{2}}{{m}^{2}n\left({n}^{2}-1\right)-m(\sum_{j=1}^{m}{T}_{j})}}.$$
(11)

Pooling the terms that do not depend on the collective ranking (i.e., all except for the ri terms), the following compact expression can be obtained:

$${b}_{k}^{\left(m\right)}=a+b\cdot \sum_{i=1}^{n}[{\mathrm{BC}(O}_{i})\cdot {r}_{i}],$$
(12)

where a and b are two terms that—for a given problem—can be treated as constants, as they depend exclusively on n, m, \({R}_{i}={\mathrm{BC}(O}_{i})\) (\(\forall i=1, \dots ,n\)), and \({T}_{j}\) (\(\forall j=1, \dots ,m\)). Equation (12) highlights the close link between the collective ranks (\({r}_{i}\)) and the Borda scores related to the m expert rankings: in the absence of ties in the collective ranking, the BC model maximizes the term \(\sum_{i=1}^{n}[{\mathrm{BC}(O}_{i})\cdot {r}_{i}]\), and therefore also \({b}_{k}^{\left(m\right)}\), determining the maximum alignment (or projection) of the two vectors r = (r1, …, rn) and \({\varvec{B}}{\varvec{C}}=[{\mathrm{BC}(O}_{1}), {\mathrm{BC}(O}_{2}), \dots , {\mathrm{BC}(O}_{n})]\) (cf. Sect. 3.2.1 and Appendix A.1).
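The affine relation in Eq. (12) can also be checked numerically, scanning all untied collective rankings (a sketch of ours, continuing the previous snippets; the least-squares fit is exact here because the relation itself is exact):

```python
from itertools import permutations

pts = [(np.dot(R, r), coherence_test(ranks, np.array(r, dtype=float)) / w_m)
       for r in permutations(range(1, 5))]    # untied collective rankings only
x, y = np.array(pts).T
slope, intercept = np.polyfit(x, y, 1)        # fit y = intercept + slope * x
print(np.allclose(y, intercept + slope * x))  # True: b_k is affine in sum(BC*r)
```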

3.3.2 Presence of ties in the collective ranking

The presence of ties in the collective ranking can affect the maximization of \({b}_{k}^{\left(m\right)}\) in a somewhat unpredictable way: although it contributes to reducing the term \(\sum_{i=1}^{n}{r}_{i}^{2}\), it also contributes to increasing the term \(\left(m+1\right){T}_{m+1}\) (cf. Eq. 8). Thus, the overall effect on \({b}_{k}^{\left(m\right)}\) is not easily predictable and should be considered on a case-by-case basis; this also emerges from the additional asymptotic analysis of \({b}_{k}^{\left(m\right)}\) contained in Appendix A.4.

Reversing the perspective, in the presence of ties, the maximization of \(\sum_{i=1}^{n}[{\mathrm{BC}(O}_{i})\cdot {r}_{i}]\) by the BC model does not guarantee the maximization of \({b}_{k}^{\left(m\right)}\) (see the example in Appendix A.3).

4 Discussion

Leaving aside the mathematical issues, this section focuses on (i) the practical implications and limitations of this research for the engineering-design field, and (ii) original contributions and ideas for future research. These topics are covered in the following two sub-sections respectively.

4.1 Implications and limitations for engineering design

In early-design stages, initial decisions often have to be made when information is incomplete and many goals are contradictory, leading to situations of conflict between (co-)designers. Managing the conflict that emerges from multi-design interaction is therefore a critical element of collaborative design (Grebici et al. 2006). According to some authors, conflict itself is the process through which ideas are validated and developed: “the engine of design” (Brown 2013).

Considering the problem of interest, the engineering-design conflict takes shape in the (discordant) object rankings formulated by the individual designers; for example, this conflict is quite evident from the m rankings in the case study (see Table 1). The collective ranking represents a way to resolve this conflict, and the aggregation model therefore represents a sort of conflict-management tool. However, the plurality of aggregation models makes their selection non-trivial: any aggregation model is by definition imperfect and may provide more or less sound results, depending on the specific ranking-aggregation problem (Arrow 2012). In this research, the coherence between the collective ranking and the corresponding m rankings was considered as a selection criterion; in fact, the indicator \({b}_{k}^{\left(m\right)}\) makes it possible to identify the most coherent aggregation model(s) in different practical situations.

In cases where a certain conflict between the collective ranking and the expert rankings is observed, decision makers can deepen the analysis, identifying those expert rankings that represent the main sources of incoherence. A possible in-depth analysis could be based on the calculation of the Spearman's rank correlation coefficient (ρ) between the collective ranking and each of the expert rankings. Intuitively, the Spearman correlation between two rankings will be high (i.e., tending towards +1) when objects have similar rank positions in the two rankings, and low (i.e., tending towards −1) when objects have dissimilar (or even opposed) rank positions. Of course, the rankings that produce the highest incoherence are those with negative ρ values.

By way of example, let us return to the case study; Table 5 reports the ρ values between the collective ranking related to the application of the BC model (in Table 3) and the corresponding expert rankings (in Table 1). The expert rankings most in contrast with the collective ranking are those by experts D1 and D2 (both with ρ values of −0.632), followed by those by experts D6 and D7 (both with ρ values of −0.316). It could be interesting for the engineering-design management to identify the reasons for the misalignment of these experts (e.g., different views or poor understanding of some design concepts, errors in the ranking formulation, etc.).

Table 5 Spearman’s ρ correlation coefficients with relevant p-values for the BC-model collective ranking (in Table 3) and each of the ten expert rankings in Table 1
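A sketch of this in-depth analysis (ours, based on scipy.stats.spearmanr and on the rank matrix and BC collective ranking defined earlier):

```python
from scipy.stats import spearmanr

for j in range(ranks.shape[1]):  # one rho per expert ranking
    rho, p = spearmanr(ranks[:, j], r_bc)
    print(f"D{j + 1}: rho = {rho:+.3f}, p = {p:.3f}")
# D1, D2 -> rho = -0.632; D6, D7 -> rho = -0.316 (cf. Table 5)
```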

Within the jungle of possible aggregation models in the scientific literature, this research has considered exclusively aggregation models characterized by simplicity and ease of understanding. It is well known that simple and intuitive models are more easily “digested” and implemented by management than obscure and complicated ones (Franceschini et al. 2019). To quote Leonardo da Vinci, “Simplicity is the ultimate sophistication”. In line with this, our analysis was limited to four simple, intuitive and popular aggregation models, showing that the traditional BC model generally provides very coherent results. The authors believe that this is a relevant indication for engineering-design management when selecting the most “promising” aggregation models for a certain ranking-aggregation problem.

The proposed study has several limitations, summarised in the following points:

  • Since the present analysis makes extensive use of the W(m+1) test, it “inherits” the limitations associated with it (Franceschini and Maisano 2019a): (i) the test does not consider the (possible) uncertainty in expert rankings, and (ii) the test allows only an ex post (i.e., case-by-case) analysis of the impact of aggregation models.

  • The study revealed that the BC model often provides the best coherence. Over and above the merits of the BC model, this result is also due to the structural characteristics of the W(m+1) test; in fact, being based on the Kendall’s coefficient of concordance, this test is somehow related to the BC model (cf. Sect. 3.3) (Cook and Seiford 1978, 1982). This aspect in some ways limits the proposed coherence analysis: measuring coherence through another indicator would not necessarily lead to the same results. As an example, one could use Cronbach’s alpha (\({\alpha }_{{\varvec{C}}k}\)), applied respectively to the ranks related to the expert rankings and the collective ranking. A new indicator, similar to \({b}_{k}^{\left(m\right)}\), could then be defined as:

    $${c}_{k}^{\left(m\right)}=\frac{{\alpha }_{{\varvec{C}}k}^{\left(m+1\right)}}{{\alpha }_{{\varvec{C}}}^{\left(m\right)}},$$
    (13)

    where \({\alpha }_{{\varvec{C}}}^{\left(m\right)}\) and \({\alpha }_{{\varvec{C}}k}^{\left(m+1\right)}\) are respectively the Cronbach’s alpha related to the m expert rankings and to the (m + 1) rankings obtained when adding the collective ranking produced by the (k-th) aggregation model (cf. Sect. 3.1); a computational sketch of this alternative indicator is given after this list.

    However, the choice fell on W for several reasons: (i) it is specific to judgments expressed in the form of rankings and not in other forms, such as on cardinal scales (Hammond et al. 2015); (ii) the distributional properties of W are well known (Kendall 1962; Gibbons and Chakraborti 2010); (iii) it is intuitive and relatively easy to implement (Franceschini and Maisano 2019a).

  • The proposed analysis considers only complete expert rankings, in which all objects are ranked through strict preference and/or indifference relationships. Nevertheless, some practical contexts may make it difficult to formulate complete rankings, e.g., problems with many alternatives, where experts may face practical impediments or lack the concentration required to formulate complete rankings. In such cases, experts may prefer to formulate incomplete rankings, which include only the most/least relevant objects and/or deliberately exclude some others (Franceschini and Maisano 2019b, 2020).
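A minimal sketch of the hypothetical indicator \({c}_{k}^{\left(m\right)}\) in Eq. (13) follows (ours; it assumes the standard variance-based definition of Cronbach's alpha, applied to the rank matrix with rankings as items; note that alpha may turn negative for strongly discordant rankings, so its interpretation requires care):

```python
def cronbach_alpha(X):
    """Cronbach's alpha for an (n_cases, k_items) matrix; here the
    'items' are rankings (columns) and the 'cases' are objects (rows)."""
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Eq. (13) for the BC model of the case study, reusing ranks and r_bc
c_k = cronbach_alpha(np.column_stack([ranks, r_bc])) / cronbach_alpha(ranks)
print(c_k)
```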

4.2 Original contributions and ideas for future research

This paper analysed the coherence of alternative aggregation models, trying to answer the research question: “Which is the model producing the collective ranking that best reflects the expert rankings?”. The coherence between the m-expert rankings and the collective ranking (which is obtained through a certain aggregation model) was assessed using the W(m+1) test and the corresponding synthetic indicator \({b}_{k}^{\left(m\right)}\) (Franceschini and Maisano 2019a).

It was found that the BC model offers, with some exceptions, the best coherence. More precisely, when no ties appear among the objects of the collective ranking, it was analytically shown that the BC model maximizes both the indicators W(m+1) and \({b}_{k}^{\left(m\right)}\). When ties are admitted, instead, the BC model’s collective ranking is not necessarily the best one, although it is generally close to it (cf. Appendix A.3).

The above result confirms the versatility and practicality of the BC model, which—in spite of some inevitable imperfections (Dym et al. 2002; Arrow 2012)—remains intuitive, easy to implement, computationally light and coherent in its results.

Regarding the future, we plan to explore in greater depth the gap between (i) the collective ranking resulting from the BC model and (ii) the one maximizing coherence, based on a large number of tests and experimental simulations.