Introduction

The journal impact factor (JIF) was introduced by Garfield (1955), partly based on earlier work by Gross and Gross (1927), and was initially intended to assist librarians in deciding which journals to purchase for their institutions (Garfield 2006). Although its calculation is simply an arithmetic mean (Pang 2019), dividing the number of citations a journal receives in a given year (numerator) by the number of papers that journal published in the two preceding years (denominator), the JIF is probably the most controversial bibliometric indicator. The main reason is its widespread use as a proxy of research quality or research performance in the assessment of individual authors, departments or academic institutions (Adler et al. 2009; Seglen 1989); see also Opthof (1997) and Opthof and Wilde (2009) for further discussion of the use of citation data to evaluate research. As pointed out by several colleagues, and echoed in the concerns of its inventor [“used inappropriately as surrogates in evaluation exercises”, Garfield (1996)], such usage is mostly inappropriate [e.g. Simons (2008), McKiernan et al. (2019), Casadevall and Fang (2014)] because the JIF of a journal does not predict the citedness of individual articles published in that journal [e.g. Adler et al. (2009), Opthof (1997), Opthof et al. (2004)]. Several initiatives have been launched to discuss the potentially harmful effects of such practices and to provide recommendations for the sensible and responsible use of metrics such as the JIF, most prominently the ‘San Francisco Declaration on Research Assessment’ (DORA), the ‘Leiden Manifesto’ (Hicks et al. 2015), and ‘The Metric Tide’ report (Wilsdon et al. 2015).

Besides several other general lines of critique [see e.g. Seglen (1998), Larivière et al. (2016), Larivière and Sugimoto (2019), Casadevall and Fang (2014), Glänzel and Moed (2002)], the lack of correlation between the JIF and article citedness stems primarily from the (highly) asymmetric distribution of citations across a journal’s published articles. In particular, this means (1) that the citations received by a journal are not equally distributed among the papers the journal has published, and (2) that the majority of papers in a given journal are cited infrequently compared to a few highly cited ‘outliers’. About 30 years ago, Seglen (1989, 1992) provided convincing evidence for this “skewness of science” in terms of citation distributions, recently confirmed by Zhang et al. (2017). Altogether, the resulting arithmetic mean is strongly influenced by a minority of highly cited papers and does not adequately reflect the “average” citation rate that characterises a particular journal (Campbell 2008). Consequently, the median value of citations to a journal has been suggested as a more appropriate expression of the journal’s citedness [e.g. Editor(s) (2011), Opthof (2019), Pulverer (2013, 2015), Weale et al. (2004)].
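
Written out, the definition just described takes the following form (a restatement in formula notation, anchored here to the 2018 JIF analysed in this study):

$$\text{JIF}_{2018} = \frac{\text{citations received in 2018 by items published in 2016 and 2017}}{\text{number of citable items published in 2016 and 2017}}$$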

As summarised in the discussion section, several previous studies have investigated the distribution of citations using a variety of descriptive approaches and rather heterogeneous selections of journals or groups of journals. The current study aimed to provide a comprehensive, up-to-date analysis of citation distribution characteristics (citation inequality, skewness) based on three independent cohorts of journals listed in the recent Journal Citation Reports (JCR 2018): the journals of two complete medical categories, i.e. ‘Medicine, Research & Experimental’ and ‘Medicine, General & Internal’, and the three best-ranking journals in each JCR category (further referred to as ‘Med-R&E’, ‘Med-G&I’, and ‘Top 3’, respectively). The two medical categories were chosen because of the perceived high relevance of the JIF, and of the discussion of its use and misuse, especially in the (bio-)medical sciences. The Top 3 cohort was included to extend the results to non-medical journals by analysing the “best” three journals in all JCR categories. The claim of novelty and comprehensiveness of this study rests on the combined use of various previously reported approaches to describe and quantify the skewness of the citation distribution in a large dataset comprising a total of 982 journals. Besides investigating the prevalence of citation skewness in the two complete general medical journal categories, the Top 3 cohort was included to study whether the phenomenon of skewed citation distributions for individual journals or (small) cohorts of journals within a JCR category extends to all journals and subject categories currently indexed in the JCR and thus assigned a JIF.

Data and methods

Journal cohorts

Journal datasets analysed in the current study comprised two complete medical categories in the SCIE edition of the 2018 Journal Citation Reports (JCR), ‘Medicine, Research & Experimental’ and ‘Medicine, General & Internal’, as well as the three highest-ranking journals (JIF-based ranking) from all categories of both 2018 JCR editions, i.e. the Science Citation Index Expanded (SCIE) and the Social Science Citation Index (SSCI). Table 1 lists the basic characteristics of these cohorts.

Table 1 Journal cohort characteristics

All journals in the respective cohorts were included for further analysis, except one journal in each of the two medical categories for which no JIF was published in the 2018 JCR. The third cohort (Top 3) comprises journals from all SCIE and SSCI categories (n = 236 categories). Where a category is listed under the same name in both JCR editions (SCIE and SSCI), the two journal lists were merged and treated as one category before the top 3 journals were selected. This was the case for the following seven categories: ‘Green & Sustainable Science & Technology’, ‘History & Philosophy of Science’, ‘Nursing’, ‘Psychiatry’, ‘Public, Environmental & Occupational Health’, ‘Rehabilitation’, and ‘Substance Abuse’. Therefore, a total of n = 229 categories (236 minus 7) were included in the Top 3 cohort. In the category ‘Nursing’, four journals were included because the third rank was occupied by two journals with identical JIF values. Consequently, n = 688 journals were included in this cohort: 229 categories × 3 journals = 687, plus one additional journal from the ‘Nursing’ category = 688. To ensure consistency, the three highest-ranking journals (based on JIF) in the two medical categories were also included in the Top 3 cohort. This cohort therefore includes all journals occupying the first three rank positions in their respective JCR categories.

Data collection

Using the web interface of Clarivate’s JCR (https://jcr.clarivate.com) via the authors’ institutional subscription, the citation data for each journal were retrieved as follows: in the ‘Browse by Journal’ option of the JCR, the list of journals was restricted to the particular category via ‘Select Categories’. Each journal (either the complete list for the two medical categories or the top 3 journals based on JIF ranking) was then opened in a new browser tab showing that journal’s ‘Journal Profile’. The complete list of articles (citable items, CI) was exported to a *.csv file as outlined in detail, with representative screenshots, in Online Resource 1. Besides basic bibliographic information on each article (citable items in 2016 and 2017), these files contained the number of citations the CI received in 2018. The data were retrieved in January 2020, cover the 2018 JCR, and reflect the content of the database at that point in time. For further analysis, the journal names and the citations for each CI per journal were compiled into a single Excel file for each of the three cohorts. The published JIF for the journals in the three cohorts was retrieved by downloading the latest JCR data (“JCR SSCI 2018 Metrics” and “JCR SCI 2018 Metrics”, published on Nov 8, 2019).

Descriptive analysis

The distribution of citations for each journal was assessed using several approaches: (i) Lorenz curves showing the cumulative percentage of citations a journal received in 2018 versus the cumulative percentage of articles (CI) published in 2016 and 2017 to which these citations refer, (ii) the percentage of CI with n = 0 citations, (iii) the percentage of CI achieving 50 or 90% of the journal’s total citations (50/90% cumulative citations threshold), (iv) the percentage of citations generated by the 50% most-cited CI per journal, (v) the percentage of CI with citations equal to or greater than the corresponding mean citation rate, (vi) the skewness of the distribution, and (vii) the Gini coefficient. In brief, for (i), the raw citation data (sorted by citations in descending order) and the number of CI per journal were transformed to relative percentages using Microsoft Office Excel (v. 2016; Redmond, WA, USA) for the double-cumulative plots (Lorenz curves). The procedure is described in detail in Online Resource 2. For (ii), (iii) and (v), the Excel function COUNTIF was used to determine the number of CI fulfilling the respective criterion. For (iv), the Excel function SUMIF was used in combination with the OFFSET function, the latter specifying the range of the 50% most-cited citable items. The skewness of the citation distribution (vi) was calculated from the raw citation data per journal using the STATISTICS ON COLUMNS function of OriginPro 2020b (OriginLab Corp., Northampton, MA, USA) according to the formula given in Eq. 1

$$\text{skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{sd} \right)^{3} \tag{1}$$

where n is the number of values (x1, x2, …, xn) and sd is the standard deviation (OriginLab manual, https://www.originlab.com/doc/X-Function/ref/moments). For (vii), the Gini coefficient as a measure of inequality (De Maio 2007; Gini 2005) was calculated from the double-cumulative percentages (% citations versus % CI): for each journal’s Lorenz curve, the area (Aue) was calculated using the OriginPro INTEGRATE function (mathematical areas, i.e. the algebraic sum of trapezoids). The Gini coefficient (G) was then calculated as shown in Eq. 2

$$G = \frac{A_{\text{ue}} - A_{\text{e}}}{A_{\text{e}}} \tag{2}$$

where Aue is the area under the putatively unequal distribution (i.e. the journal’s citation distribution) and Ae is the area below the theoretical line of equality (the 45° line, corresponding to an area of 5000 in the double-cumulative plot with 0–100% on each axis). Division by Ae normalises the result, yielding Gini coefficients between 0 (completely equal distribution) and 1 (highest possible inequality). For a numerical example of this procedure, see Online Resource 2. Like the median citation rate, the mean citation rate was calculated manually from the raw data. As further outlined in the results section, these “mean citations” correspond to the published JIF but count only citations to the “citable items” listed in the denominator.
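
For readers who prefer a script to a spreadsheet, the following Python sketch reproduces measures (ii)–(vii) from a journal’s raw citation counts. It is not the Excel/OriginPro pipeline used in this study but an equivalent computation under the definitions above; the function name, the ceiling convention for the “50% most-cited” items, and the toy example are illustrative.

```python
import numpy as np

def citation_measures(citations):
    """Descriptive measures (ii)-(vii) for one journal, given the raw 2018
    citation counts of its 2016/2017 citable items (CI). Assumes at least
    one citation in total."""
    c = np.sort(np.asarray(citations, dtype=float))[::-1]  # descending, as for the Lorenz curves
    n, total = len(c), c.sum()

    pct_uncited = 100 * np.count_nonzero(c == 0) / n             # (ii) %CI with n = 0 citations
    cum = np.cumsum(c) / total * 100                             # cumulative % of citations
    pct_ci_50 = 100 * (np.searchsorted(cum, 50) + 1) / n         # (iii) %CI generating >= 50% of citations
    pct_ci_90 = 100 * (np.searchsorted(cum, 90) + 1) / n         #       %CI generating >= 90% of citations
    pct_cit_top50 = 100 * c[:int(np.ceil(n / 2))].sum() / total  # (iv) % citations from the 50% most-cited CI
    pct_ci_ge_mean = 100 * np.count_nonzero(c >= c.mean()) / n   # (v) %CI cited at least at the mean rate

    # (vi) sample skewness as in Eq. 1
    sd = c.std(ddof=1)
    skewness = n / ((n - 1) * (n - 2)) * np.sum(((c - c.mean()) / sd) ** 3)

    # (vii) Gini coefficient as in Eq. 2: trapezoidal area under the descending
    # Lorenz curve (A_ue) versus the area below the 45-degree line (A_e = 5000)
    y = np.concatenate(([0.0], cum))          # cumulative % of citations (0 .. 100)
    dx = 100.0 / n                            # uniform step on the %CI axis
    a_ue = dx * (y[:-1] + y[1:]).sum() / 2.0  # algebraic sum of trapezoids
    gini = (a_ue - 5000.0) / 5000.0
    return dict(pct_uncited=pct_uncited, pct_ci_50=pct_ci_50, pct_ci_90=pct_ci_90,
                pct_cit_top50=pct_cit_top50, pct_ci_ge_mean=pct_ci_ge_mean,
                skewness=skewness, gini=gini)

# toy example: a skewed journal where a few CI attract most citations
print(citation_measures([40, 12, 8, 5, 3, 2, 1, 1, 0, 0, 0, 0]))
```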

Statistics and data visualisation

Basic calculations were performed in Microsoft Office Excel (v. 2016), and data were visualised using OriginPro 2020b and Corel Designer 2018 (Corel Corp., Ottawa, Canada). Box plots show the individual data points (left) as well as the median (horizontal line), the 25th–75th percentiles (box, right) and the min–max values. Spearman correlation analysis was used to investigate the relationship between the mean citations and various measures of citation inequality, considering p values < 0.05 (< 0.01) as (highly) statistically significant.
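
As a minimal sketch of this correlation step (using SciPy rather than the graphical software named above; the input arrays and their toy values are purely illustrative, not study data):

```python
import numpy as np
from scipy.stats import spearmanr

# illustrative per-journal values for one cohort (not actual study data)
rng = np.random.default_rng(0)
mean_citations = rng.lognormal(mean=1.0, sigma=0.8, size=100)
gini = 0.8 - 0.1 * np.log1p(mean_citations) + rng.normal(0, 0.05, size=100)

rho, p = spearmanr(mean_citations, gini)
# significance convention from the text: p < 0.05 significant, p < 0.01 highly significant
label = "highly significant" if p < 0.01 else "significant" if p < 0.05 else "not significant"
print(f"Spearman rho = {rho:.4f}, p = {p:.2g} ({label})")
```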

Results

Citation distributions

Figure 1a–c show the double-cumulative plots (% citations versus % CI) for each journal in the categories Med-R&E (Fig. 1a) and Med-G&I (Fig. 1b) and for the Top 3 journals from all JCR categories (Fig. 1c). These plots demonstrate the skewness of the citation distribution, since the Lorenz curves of all journals in the two medical categories deviate substantially from the theoretical 45° line of equality. Figure 1d shows the percentage of CI receiving n = 0 citations (and thus not contributing to the JIF), with medians of 24.2% and 43.6% of papers for the medical cohorts Med-R&E and Med-G&I, respectively. Notably, several journals in these JCR categories have published up to 87% (Med-R&E) and 96% (Med-G&I) of citable items that have never been cited within the JIF window. For the Top 3 cohort, these values are considerably lower: the median is 9.0% uncited items, and the journal with the highest proportion of uncited publications contains 66% uncited items. In this cohort, i.e. the top 3 journals from all JCR categories based on JIF ranking, the majority of journals have < 20% CI with n = 0 citations.

Fig. 1

Citation distributions. a–c Lorenz curves showing the cumulative % of citations versus the cumulative % of CI for the three journal cohorts Med-R&E, Med-G&I and Top 3, respectively. Horizontal dashed red lines indicate the 50 and 90% cumulative citations thresholds. The vertical dashed line indicates the 50% best-cited CI threshold. d % CI with n = 0 citations. e, f % CI contributing ≥ 50%/≥ 90% of citations to the journal. g % citations generated by the 50% best-cited papers. The box plots in d–g show the median (horizontal line), the 25–75% interquartile range, and maximum/minimum values. Abbreviations: CI = citable items, Med-G&I = Medicine, General & Internal, Med-R&E = Medicine, Research & Experimental, Top 3 = the three highest-ranking journals (JIF-based ranking) from all Journal Citation Reports categories

The inequality of the citation distributions is further illustrated by the % CI published in a journal required to generate ≥ 50% (Fig. 1e) or ≥ 90% of its total citations (Fig. 1f; in Fig. 1a–c, these thresholds are indicated by dashed horizontal lines). For the medical JCR categories Med-R&E and Med-G&I, a median of 15.2% (52.4%) and 13.4% (43.9%) of citable items is needed to generate ≥ 50% (≥ 90%) of the total citations to the journal, respectively. For the Top 3 cohort, a median of 18.3% (59.6%) of citable items generates ≥ 50% (≥ 90%) of the journal’s total citations. In all three cohorts, several journals show a single-digit percentage of citable items contributing ≥ 50% of all citations. Maximum values are about 30% for CI contributing ≥ 50% of citations and about 70% for CI contributing ≥ 90% of citations in the three journal cohorts. The majority of journals cluster around the median value of % CI required for ≥ 50% or ≥ 90% of citations (Fig. 1e, f). A complementary approach calculates the % of total citations generated by the 50% most-cited citable items (Fig. 1g): here, a median of 88.6/94.7/84.3% of all citations was received by the 50% most-cited papers in the three cohorts, respectively. In the cohorts Med-R&E, Med-G&I and Top 3, 15 (11.1%), 62 (39.0%) and 9 (1.3%) journals, respectively, even generate all of their citations (100%) with their 50% most-cited articles.

Measures of inequality

The numerator of the published (‘official’) JIF can differ from the actual sum of citations generated by the CI in the JIF window. This discrepancy can be explained by unmatched citations (Larivière et al. 2016) and by the known asymmetry between the JIF’s numerator and denominator [e.g. Glänzel and Moed (2002)]: a journal can acquire citations to (‘non-citable’) content that is not counted (classified as CI) in the denominator. As shown in Online Resource 3, this holds for almost all journals investigated in this study; only for six journals in Med-R&E and three in Med-G&I is the difference between the JIF numerator and the sum of citations to CI zero. For all other journals, the JIF numerator counts more citations than those received by publications classified as citable items. The median reduction when using the actual sum of citations instead of the JIF numerator is −5.3%/−10.6%/−5.9% for the three cohorts Med-R&E, Med-G&I and Top 3, respectively. While most journals show a reduction of that order of magnitude, extreme examples in the three cohorts are journals having 28.8%/47.8%/80.5% fewer citations, and consequently also lower mean citations, than indicated by the JIF numerator and the published JIF, respectively (see Online Resource 3d).

Therefore, the subsequent analyses did not use the published JIF from the JCR database; instead, this indicator was calculated manually as the arithmetic mean of the raw citation counts of the citable items (CI) per journal. Thus, only citations unambiguously matched to citable items were used for calculating mean citation values in the current study, in line with previous arguments (Larivière et al. 2016). Because the median value (citations per journal) is not included in the downloadable version of the JCR, both mean and median citations were calculated manually for all subsequent analyses to ensure consistency and comparability between the measures of central tendency. The unequal distribution of citations to the CI per journal can then be further characterised by how many citable items (% CI) per journal actually receive citations equal to or higher than the journal’s arithmetic citation mean (corresponding to the JIF, but without citations to non-citable items): as shown in Fig. 2a, only a median of 33.6%/33.7%/34.7% of citable items receives at least as many citations as the mean in the three cohorts, respectively.
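
A minimal sketch of this recalculation, assuming per-journal citation counts as exported from the JCR (the helper name and toy numbers are illustrative, not actual study data):

```python
import numpy as np

def matched_mean(ci_citations, published_jif):
    """Arithmetic mean over citations matched to citable items only,
    plus the % change relative to the published JIF."""
    mean_ci = float(np.mean(ci_citations))
    pct_change = 100 * (mean_ci - published_jif) / published_jif
    return mean_ci, pct_change

# toy journal: 200 CI, with a published JIF inflated by unmatched citations
# and by citations to non-citable items
rng = np.random.default_rng(1)
citations = rng.poisson(2.5, size=200)
m, d = matched_mean(citations, published_jif=2.8)
print(f"mean citations to CI = {m:.3f}, change vs published JIF = {d:+.1f}%")
```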

Fig. 2

Measures of inequality of citation distributions. a % CI with citations equal to or greater than the JIF. b, c Skewness and Gini coefficients of the citations per journal, respectively. The box plots show the median (horizontal line), the 25–75% interquartile range, and maximum/minimum values. Abbreviations: CI = citable items, Med-G&I = Medicine, General & Internal, Med-R&E = Medicine, Research & Experimental, Top 3 = the three highest-ranking journals (JIF-based ranking) from all Journal Citation Reports categories

Notably, only one journal in Med-R&E, two journals in Med-G&I and five journals in the Top 3 cohort contain 50% or more CI receiving citations equal to or greater than the respective journal’s mean citation rate; for a symmetric, equal distribution of citations, half of each journal’s CI (50%) would receive citations equal to or higher than the journal’s JIF. It is worth mentioning that using the published (‘official’) JIF for this calculation (i.e. % CI with citations ≥ JIF) would result in even smaller numbers of journals.

The asymmetry of a distribution can additionally be quantified by its skewness: if the right tail is longer (a right-skewed, left-leaning distribution), a positive skew is expected (Henderson 2006). For the current dataset, this is true for all journals, as shown in Fig. 2b: the min–max skewness values are 0.9–13.2/1.1–11.0/0.1–22.5 for the Med-R&E, Med-G&I and Top 3 cohorts, respectively. The median skewness is rather similar across the three groups of journals, at 2.9/2.6/2.8, respectively.

Finally, the Gini coefficient, a widely used measure of inequality in economic studies, was calculated for the journals’ citations in the three cohorts. This measure is based on the difference between the area under the theoretical equal distribution (the 45° straight line in the Lorenz curve plots; compare Fig. 1a–c and Online Resource 2) and the area enclosed by the actual distribution of the variable of interest (e.g. income across a population), normalised as in Eq. 2. While a Gini coefficient of zero indicates complete equality, a value of 1 indicates an entirely unequal distribution (De Maio 2007); in the present context, this would mean a single CI receiving all citations to a journal. For the three cohorts in the current study (Fig. 2c), the median Gini coefficients are 0.58/0.64/0.51, with min–max ranges of 0.40–0.89/0.36–0.97/0.35–0.89, respectively. In all cohorts, extreme examples of journals reach Gini coefficients of about 0.9 or higher (Fig. 2c).

The relationship between the mean citation values and the above-mentioned measures of unequal citation distribution is shown graphically in Online Resource 4 and summarised in Table 2.

Table 2 Correlation of mean citations with measures of citation inequality

While there is a positive and significant correlation, of varying strength, between mean citations and the % CI generating ≥ 50 or ≥ 90% of total citations in all three cohorts, the association of mean citations with the % CI with n = 0 citations is negative and highly significant in all cohorts. The relationship between mean citations and the percentage of CI reaching at least the journal’s mean citation rate is rather weak, although significant in the Top 3 cohort (see also Online Resource 4d). The same applies to the association between mean citations and the skewness of the citation distribution. A moderate and significant correlation can be observed between the mean citations and the Gini coefficient: the journals’ mean citation values are inversely related to the Gini coefficient.

Taken together, several mathematical approaches confirm a generally unequal and skewed distribution of citations for all journals analysed.

Mean versus median

Given the asymmetric distribution of citations per journal, the median appears more appropriate for representing the central tendency. Therefore, median citations were calculated for each journal and compared to the mean citation rates for the journals of the three cohorts (Fig. 3a–c): for the vast majority of journals, the median is considerably lower than the mean, with very few exceptions where the opposite is the case (n = 1, 1 and 4 journals for the Med-R&E, Med-G&I and Top 3 cohorts, respectively). For some journals (n = 4, 1 and 15 for the Med-R&E, Med-G&I and Top 3 cohorts, respectively), calculation of the median citation rate gives a non-integer value (*.5): these are cases with an even number of CI where the arithmetic mean of the two central values is non-integer.

Fig. 3

Mean versus median citations (I). a–c Mean (left) versus median citations (right) for the three journal cohorts Med-R&E, Med-G&I and Top 3, respectively. For better readability, the data are split into two diagrams using a JIF threshold of 10 for the two medical cohorts and 25 for the Top 3 cohort. d % reduction of the numerical value when median instead of mean citations are used for each journal. e Correlation of the median and mean citation values per journal. The box plots show the median (horizontal line), the 25–75% interquartile range, and maximum/minimum values. Abbreviations: Med-G&I = Medicine, General & Internal, Med-R&E = Medicine, Research & Experimental, Top 3 = the three highest-ranking journals (JIF-based ranking) from all Journal Citation Reports categories

As summarised in Fig. 3d, the numerical values of the mean citations are reduced by a median of 36.3/50.1/31.4% when median values are used instead to characterise the three cohorts, respectively. Furthermore, the median citation rate drops to zero for n = 15/62/9 journals (11.1/39.0/1.3% of journals) in the three cohorts, respectively. By definition of the median, these values correspond to the percentages of journals generating all of their citations with 50% (or fewer) of their most-cited papers, as shown in Fig. 1g. The correlation between mean and median citations shown in Fig. 3e furthermore demonstrates an increasing absolute difference between these variables for journals with higher mean citation values (i.e. ‘top journals’ in terms of mean citation rates).

The median of citations each journal receives in the JIF window was further used to rank the journals in the two medical JCR categories: Fig. 4a (Med-R&E) and Fig. 4b (Med-G&I) compare the rankings of journals determined by mean citations (left) and by median citations (right). Using the median as the ranking criterion reduces the number of distinct rank positions to 13, including 1–3 positions with a non-integer (*.5) median citation value. In particular, rank positions with median citation rates of 0–4 are each occupied by several journals.
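
The collapse of rank positions is straightforward to reproduce; a minimal sketch with made-up journal names and median values (not actual study data):

```python
from collections import defaultdict

def median_ranking(journal_medians):
    """Group journals by median citation value and assign dense rank
    positions (highest median = rank 1)."""
    groups = defaultdict(list)
    for name, med in journal_medians.items():
        groups[med].append(name)
    ranked = sorted(groups.items(), key=lambda kv: kv[0], reverse=True)
    return {rank: (med, names) for rank, (med, names) in enumerate(ranked, start=1)}

# toy cohort: many journals collapse onto few rank positions,
# including one non-integer (*.5) median
demo = {"J01": 24.0, "J02": 5.0, "J03": 5.0, "J04": 2.5,
        "J05": 1.0, "J06": 1.0, "J07": 1.0, "J08": 0.0, "J09": 0.0}
for rank, (med, names) in median_ranking(demo).items():
    print(f"rank {rank}: median = {med}, journals = {names}")
```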

Fig. 4

Mean versus median citations (II). a, b Mean-based versus median-based ranking of journals in the medical categories Med-R&E and Med-G&I, respectively. The number of journals at each median-based rank is coded by the area of the circle. Abbreviations: Med-G&I = Medicine, General & Internal, Med-R&E = Medicine, Research & Experimental

Discussion

This study investigated the citation distributions, and their quantitative characteristics, of three cohorts of journals: two complete JCR categories comprising journals with a general medical focus (‘Research & Experimental’ and ‘General & Internal’) as well as, for comparison, the three top-ranked journals from all categories of the JCR. The analyses presented in this paper confirm a highly skewed distribution of citations per journal for all investigated cohorts, quantified by several measures to approach this phenomenon from different angles.

Table 3 provides an overview of previous studies reporting quantitative characteristics of citation distributions at the journal level [for citation skew at the article level, with essentially corresponding results, see e.g. Albarrán et al. (2011) and Bornmann and Leydesdorff (2017)]. For the first indicator, i.e. the percentage of citable items receiving fewer citations than indicated by the JIF, the present study obtained about 65% of CI for all three journal cohorts (corresponding to about 35% of CI receiving more citations than the mean), similar to previously reported values (Asaad et al. 2019; Larivière et al. 2016; Larivière and Sugimoto 2019). A symmetric distribution would result in about 50% of CI below/above the mean (JIF), demonstrating that the JIF overestimates the real ‘average’ citation rate in virtually all journals.

In contrast to the other characteristics of citation distributions, which are similar across the cohorts of the current study, the percentage of CI receiving zero citations is rather heterogeneous: while the Top 3 journals from all JCR categories comprise only a median of 9% CI without citations in the JIF window, medians of 24% and 44% of papers in the Med-R&E and Med-G&I categories, respectively, have not been cited. Although further detailed analyses are needed to test this assumption, this difference could be explained by the Matthew effect (Larivière and Gingras 2010), i.e. the relatively high JIF of the Top 3 journals within each category tends to attract citations to all items published in these journals, thus reducing the proportion of non-cited articles. As shown in Table 3, other studies reported values in the range between < 20 and 70% of CI (Weale et al. 2004; Asaad et al. 2019; Opthof et al. 2004; Lustosa et al. 2012; Bozzo et al. 2017). Weale and colleagues (Weale et al. 2004) discussed the percentage of articles per journal receiving no citations (rate of non-citation) as a possible alternative for measuring journal quality, based on their observation that high-JIF journals have lower rates of non-citation. The main advantage could be that this approach might reduce the “temptation to use a journal’s ranking to judge individual articles” (Weale et al. 2004). The present study confirms their findings: the negative correlation between non-citation rates and the journals’ JIF is strong and highly significant, with Spearman rank correlation coefficients of −0.9581/−0.9830/−0.8607 for the three cohorts, closely matching the values of −0.854 and −0.924 reported by Weale et al. (2004). Additionally, the current analysis clearly shows that below a threshold of, e.g., JIF = 10 (~ 25 in the Top 3 cohort), the rate of non-citation increases dramatically (see Online Resource 3c); the non-citation rate in such journals can readily reach 50%. Therefore, especially in the lower segment of JIF-ranked journals, the general inability of the JIF to predict individual article performance is underscored by the considerably high fraction of non-cited papers.

Table 3 Literature review—summary of studies reporting quantitative characteristics of citation distributions on a journal level

Although the summary in Table 3 does not claim completeness, and the included studies are rather heterogeneous with respect to the numbers of journals analysed (and how they were selected), the percentage of citable items contributing 50/90% of the journals’ total citations is very similar across previous studies and our current results: between 15 and 25% of the papers published in any journal analysed receive 50% of the citations to that journal.

This is further illustrated in Fig. 5: studies comprising larger sets of journals report that 12–20% of citable items per journal are responsible for 50% of all citations. Similarly homogeneous are the results on the % of citations received by the 50% most frequently cited citable items: in published studies, including our current results, usually 85–90% of all citations to a journal are generated by the 50% best-cited CI within that journal. Altogether, these indicators (% CI required for 50% of citations and % citations generated by the 50% best-cited papers) appear quite robust and show no obvious dependence on the selection and number of journals or the date of analysis.

Fig. 5

Summary of previously published and current results on the % CI contributing ≥ 50% of citations. The number of journals investigated in each study is coded by the area of the circle. The labels refer to the following studies: Ref 1 = Seglen (1992), Ref 2 = Opthof and Coronel (2002), Ref 3 = Opthof et al. (2004), Ref 4 = Weale et al. (2004), Ref 5 = Falagas et al. (2010), Ref 6 = Editor(s) (2011), Ref 7 = Larivière et al. (2016), Ref 8 = Zhang et al. (2017), Ref 9 = Pang (2019), Ref 10 = Asaad et al. (2019), Ref 11 = the current study (red = Medicine, Research & Experimental, blue = Medicine, General & Internal, black = the three highest-ranking journals (JIF-based ranking) from all Journal Citation Reports categories). Abbreviations: CI = citable items

The correlation analyses summarised in Table 2 and Online Resource 4 reveal a relationship between the various measures of unequal citation distribution and the JIF (mean citations): in all three investigated cohorts, journals with high JIFs have significantly (i) higher percentages of CI contributing 50/90% of total citations, (ii) lower proportions of non-cited CI, and (iii) lower Gini coefficients. This could be interpreted as an indication that the unequal distribution of citations is less pronounced for high-JIF journals than for low-JIF journals, with correspondingly smaller measures of inequality. One could consequently argue that the problem is less important for such high-JIF journals and that the JIF would therefore adequately reflect those journals’ high quality and impact. This argument echoes the previously expressed opinion that “in each speciality the best journals are those (…) that have a high impact factor” and that “the use of the impact factor as a measure of quality is widespread because it fits well with the opinion we have in each field of the best journals in our speciality” (Garfield 2006; Hoeffel 1998).

While it is indeed often the high(est)-JIF journals where it is most difficult to have a manuscript accepted (Hoeffel 1998), partly due to editorial policies aiming to attract the most citable (trendiest, mainstream) articles (Falagas and Alexiou 2008; Taylor et al. 2008), this argument can be countered by saying “if you are a mature and active scholar in your field, you do not need the JIF (or any other metric) to know which journals are the best” (Browman and Stergiou 2008). In line with this, a reliable, robust and more meaningful metric of journal quality would be most important in the lower-JIF segment of (less well-known, less prestigious) journals, since citation inequality is most pronounced there. Given the inability of the JIF to adequately represent the ‘average’ citation rate a journal receives (especially for lower-JIF journals with highly skewed citation distributions), any JIF-based ranking (to three decimal digits) in this segment is quite meaningless. As recently shown (Koelblinger et al. 2019), journals publishing small numbers of papers show more pronounced JIF changes over time. Therefore, in addition to the general citation skewness, for lower-volume journals the sample size (number of CI per journal) on which the JIF is based further calls into question the significance of the JIF and the relevance of its temporal dynamics.

Due to the known skewness of citations, the median has been suggested as more appropriate for representing the central tendency of the citation distribution [e.g. Editor(s) (2011), Opthof (2019), Pulverer (2013, 2015), Weale et al. (2004)]. The current results indicate that this measure is considerably smaller than the mean (median reductions of about 30–50% of the numerical value), reflecting the bias of the latter by highly cited articles, as reported previously (Colquhoun 2003; Larivière et al. 2016; Pulverer 2015; Seglen 1992). Additionally, when journals are ranked by the median instead of the mean (JIF), up to 40% of the journals in the JCR category Med-G&I drop to a value of zero. These results are in line with Bozzo et al. (2017), who showed for 74 orthopaedic journals that the median number of citations is zero for the majority, i.e. 67 journals (90.5%). Furthermore, the number of rank positions is reduced to 13, compared with 135 and 159 mean-based rank positions for the categories Med-R&E and Med-G&I, respectively, resulting in individual journals at high median-based ranking positions and rather large numbers of journals sharing the lower positions. Similar observations were reported by Pang (2019), who identified three broad groups, separated by a median difference of one, which sufficiently rank veterinary journals.

As discussed by the creator of the JIF, E. Garfield (2006): “The precision of impact factors is questionable, but reporting to 3 decimal places reduces the number of journals with the identical impact rank.” Due to the small differences in JIF, “however, it matters very little whether, for example, the impact of JAMA is quoted as 24.8 rather than 24.831.” (Garfield 2006). The current data support this statement: the median, supposed to better characterise journals in terms of an ‘average’ citation rate, results in a value of one or zero for virtually all journals in the Med-R&E and Med-G&I categories with a JIF of 1–2 or below 1, respectively. The present study thus clearly confirms previous data showing that mean citations (the JIF) are not suitable for representing the citation rate of a journal. While median citations are more appropriate for skewed citation distributions, their power to separate and rank journals in the lower segment of citation counts is quite limited. Since any measure of central tendency captures only a small part of the information (Adler et al. 2009), it must be supplemented by information on the underlying distribution to adequately reflect the data; such information could be (1) the measures of inequality used here and in previous studies or, probably more familiar to most researchers, (2) interquartile ranges of citations. Altogether, as discussed by Adler et al. (2009), in the current “culture of numbers” we should be aware of the illusory accuracy and seductive precision of crude statistics such as the JIF.

One limitation of the current study might be that manually calculated arithmetic mean values, rather than the ‘official’ JIF values, were used for those analyses involving the journals’ JIF. As mentioned above, this approach was chosen to ensure comparability with the manually calculated median values. With this procedure, however, the previously noted problem with the traceability of the JIF calculation (e.g. Glänzel and Moed 2002; Larivière et al. 2016) became obvious (Online Resource 3): while most journals in all three investigated cohorts show an approximately 5–10% reduction of the value upon manual calculation (i.e. using the sum of citations to citable items as provided in the JCR database), extreme examples of journals show up to −30/−50/−80% lower values than the official JIFs in the cohorts Med-R&E, Med-G&I and Top 3, respectively. Interestingly, while all n = 982 journals in the current study’s cohorts showed lower mean citation values than their JIFs, owing to JIF numerators exceeding the actual citations referring to CI only, previous data (Pang 2019) also report (minor) positive numerical changes for individual journals when means were used instead of the JIF. In contrast, for a selection of the 30 most-cited cardiovascular journals, Opthof (2019) demonstrated that the JIF of these journals is 15.0 ± 3.5% (mean ± s.e.m.) higher than a mean citation value considering only citations to citable items in the numerator.

Irrespective of such contrasting results, the underlying problem is that the JIF numerator contains citations for which no corresponding citable item is counted in the denominator (Larivière et al. 2016; Opthof 2019; Rossner et al. 2007). Although the classification of citable items in the JIF’s denominator was claimed to be “accurate and consistent” by employees of the previous owner of the JCR database (McVeigh and Mann 2009), current and previous results (e.g. Opthof 2019) demonstrate a considerable effect of citations to non-citable items. This asymmetry leaves room for negotiations between journals and the publisher of the JIF on which items should be counted towards the denominator (Editor(s) 2006; Rossner et al. 2007). In addition to the limited replicability of the JIF, the whole process of calculating JIFs and rating science by ‘journal impact’ has therefore been called “unscientific and arbitrary” and “unscientific, subjective, and secretive” (Editor(s) 2006) and criticised as reflecting “the lack of transparency surrounding items included in the calculation, in contrast to the standards expected of published research” (Pang 2019). Notably, the current study’s results on citation inequality would be even more pronounced had the official, published JIF been used instead of the mean citations to citable items employed in this study.

Another limitation of the present study is the composition of the Top 3 cohort. While this cohort was intended to serve as a comparison demonstrating the generality of the statements concerning citation inequality, future studies could expand such analyses to larger and more representative journal selections covering more than the “three best journals” per category.

Conclusions

Referring to the question posed in the article’s title: yes, it does matter. By analysing a large sample of medical journals and a selection of non-medical journals, the present analysis confirms a considerable inequality in citation distributions, as illustrated by several quantitative measures, thus disqualifying the JIF as an adequate representation of a journal’s or a paper’s citedness. While replacing the JIF (mean) by median citations does not necessarily overthrow journal rankings in the upper JIF segment, for lower-JIF journals the three-decimal-digit precision of JIF-based rankings is obviously meaningless when the citedness of individual journals, articles (or authors) is to be inferred. Consequently, the current results provide additional up-to-date evidence for why assessing the scientific quality or performance of individual authors or institutions by the JIF must be considered an inappropriate and mostly meaningless (mis-)use of this metric (Casadevall and Fang 2014; Garfield 1996; McKiernan et al. 2019; Simons 2008). In that context, instead of falling into the trap of a simple number pretending to represent a complex endeavour, multiple criteria need to be applied, since science and research themselves have multiple dimensions and goals (Adler et al. 2009).