Introduction

Publishing an article is not an easy task. Authors associate the publishing process with peer review, which is often considered to be a source of frustration by scientists trying to share their work (Björk and Solomon 2013; Nguyen et al. 2015; Powell 2016). However, peer review is one of the most important mechanisms for ensuring the quality of published papers and despite the fact that it is not a perfect system (Alberts et al. 2008; Resnik et al. 2008), the scientific community still trusts it and believes it is necessary (Nicholas et al. 2015).

While peer review is the aspect of academic publishing that authors of articles are most exposed to, journal editors can paint a different picture and offer a broader perspective. Journals and editors must adapt to the ever-changing landscape of academic publishing. In particular, with the number of published manuscripts increasing every year (Ware and Mabe 2015), open access gaining in popularity (Laakso et al. 2011; Laakso and Björk 2012) and the advent of mega-journals (Björk 2015), it is becoming progressively more difficult to secure reviewers (Lajtha and Baveye 2010; Arns 2014; Merrill 2014), though the extent of this problem may vary from one field of study to another and is yet to be fully determined (Breuning et al. 2015; Albert et al. 2016; Kovanis et al. 2016).

Peer review and the editorial process in general are also, in some ways, understudied (Squazzoni and Takács 2011). A large body of literature is available on the sociological and statistical aspects of these processes, but only recently have complex systems scientists begun to study this undoubtedly complex phenomenon using extensive numerical simulations and models known, for example, from interdisciplinary applications of physics. The problem is that data on the editorial process are not easily available. However, thanks to COST Action PEERE, we were able to form an interdisciplinary team of physicists and chemists, who are editors of the Journal of the Serbian Chemical Society (JSCS), and work closely with the journal on the analysis of submitted manuscripts and editorial practices. This collaboration resulted in four papers (Mrowinski et al. 2016, 2017; Ausloos et al. 2016a, b) that would not otherwise have been possible.

When we were analysing the JSCS data, we noticed certain trends and phenomena (e.g. origin and volume of manuscripts, imbalance of genders) that are, in our opinion, indicative of the general state of peer review and academic publishing today. In this article, we want to share some of our findings that may be interesting to both authors and editors. At the same time, we want to offer a different perspective—not of a manuscript author, but of a journal editor.

JSCS is an international journal, in the sense that it receives submissions from many different countries. One of the goals of our analysis, considering the scope and impact factor of the journal (Sect. 2 contains more details), was to see whether any differences exist between local (i.e. Serbian) and external (i.e. non-Serbian) submissions. To this end, we often divide the sets of data under study into subsets corresponding to local and external authors, and compare the results.

Having access to such a unique dataset, with information about articles that were submitted but not published (which could not be obtained from publicly available databases like Web of Science or Scopus), we were able to study the country of origin of all submitted manuscripts and the number of manuscripts submitted by users from different countries (these results can be found in Sect. 5). We analysed the articles that constitute the majority of all submissions, that is, manuscripts that were rejected for technical reasons (results are presented in Sect. 6). We also studied the distribution of genders of JSCS users and the differences in technical rejection and acceptance rates of manuscripts submitted by male and female users (Sect. 7). The analysis of handling time, which is a very important factor for both authors of manuscripts and editors, can be found in Sect. 8. Section 9 contains information about the number of authors of submitted papers and the link between this number and the probability that a manuscript will be accepted. Finally, in Sect. 10, we analyse automatic classifiers of articles and the relevance of various features.

Journal of the Serbian Chemical Society

The journal was established in 1930 by the Chemical Society of the Kingdom of Serbians, Croatians and Slovenians under the name of Glasnik Hemijskog Drustva Kraljevine Jugoslavije (The Journal of the Chemical Society of the Kingdom of Yugoslavia). When the first issue after the Second World War was published in 1947, the name was changed to Glasnik hemijskog društva Beograd (Journal of the Chemical Society of Belgrade). In 1984, the Serbian Chemical Society decided that, starting from volume 50 for 1985, all papers would be published in English only and the name would be changed to the Journal of the Serbian Chemical Society (JSCS).

The journal received an Impact Factor (0.277) in 2000, and the value has grown since (0.828 in 2018, with a 5-year IF of 0.917). Every year users submit about 1700 papers to the journal, though many of these papers are resubmissions of the same article; the number of unique submissions is close to 300 (see Sect. 6 for details). About 140 manuscripts are published after peer review. Articles are available on-line, with about 2000 downloads (cumulative for all manuscripts) per year. The journal is open access without any article processing charge.

Dataset

The dataset we studied consisted of two related databases corresponding to the users registered in the online submission system and the articles submitted by these users between March 2015 and July 2016. While there were 2221 users registered in total, not all of them submitted an article or completed a submission; some of the 2388 articles in the database were in fact partial, incomplete submissions. In the end, we decided to limit our analysis to 2089 fully completed submissions and the 795 users who submitted them. It seemed strange to us at first that only about one third of users actually submitted a manuscript, but we discovered that most of the accounts without submissions had bogus names (random letters or concatenations of words) and seemed to have been created either by bots or by users for test purposes.

The editorial process in JSCS

In order to interpret the results presented in the subsequent sections, one must understand how the editorial process in JSCS works. The journey of an article from submission to publication or rejection can be divided into two major steps.

Firstly, all newly submitted manuscripts are checked by the technical editor, whose role is to ensure that authors followed the JSCS submission guidelines. Articles that were not prepared correctly can be rejected at this stage—returned to authors accompanied by a letter which lists all problems found in the paper. Such rejections for technical reasons are analysed in Sect. 6.

Articles that passed the technical check are assigned, according to their subject, to one of the main journal editors. The editors may desk-reject an article or begin the proper peer review process (which is single-blind) by sending invitations to potential reviewers. We will discuss the possible grounds for desk-rejection in Sect. 5, as they are closely tied to some of the interesting phenomena that we found. As for the peer review process itself (that is, the problems associated with the selection of reviewers and the handling of a manuscript), we direct all interested readers to Mrowinski et al. (2016, 2017), where we described these issues in detail.

Country of origin

When users create their accounts, they can provide certain additional information, including their country. In the dataset, 607 out of 795 users who submitted a manuscript specified this particular information (though there is very little reason not to, as an affiliation must be a part of the submitted manuscript). For the rest, we were able to deduce their country from the location from which their accounts were accessed most often. It was the only way we could fill in these blanks short of manually inspecting the manuscripts, which would be impossible for ethical reasons. While this method may not be foolproof, we believe it is sufficiently accurate: for the 607 users with a known country, the location-based approach failed in only 30 cases.
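The location-based inference described above can be illustrated with a minimal sketch. The function name `infer_country`, the country codes and the example access log are hypothetical and serve only as an illustration; the actual procedure used on the JSCS database may differ in its details. The last lines recompute the agreement rate from the two counts quoted in the text (607 users with a known country, 30 mismatches).

```python
from collections import Counter

def infer_country(access_countries):
    """Return the most frequent country in a user's access log, or None if empty."""
    if not access_countries:
        return None
    return Counter(access_countries).most_common(1)[0][0]

# Hypothetical access log for one user (illustrative data, not from the JSCS dataset).
log = ["IN", "IN", "RS", "IN"]
print(infer_country(log))

# Agreement on users with a self-reported country, using the counts quoted in the text.
known_users = 607
mismatches = 30
agreement = (known_users - mismatches) / known_users
print(f"{agreement:.1%}")
```

Under these counts the location-based guess agrees with the self-reported country for roughly 95% of users, which is the basis for treating it as sufficiently accurate.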

Fig. 1 Distribution of the country of origin for (a) all users, (b) all submitted manuscripts, (c) suitable manuscripts (not rejected for technical reasons) and (d) published manuscripts

When we analysed this combined geographical data, we found the results very surprising. While, considering current trends, we did expect that users from Asian countries would constitute a large portion of the entire userbase, our initial prediction was that Serbian users would be in the majority. However, that was not the case. As can be seen in Fig. 1a, the largest groups of registered JSCS users are from India (163, 21%) and Iran (158, 20%). Serbian users are the third biggest group (93, 12%), followed by China (64, 8%) and Pakistan (44, 6%). In total, there are 529 (67%) Asian users registered in the system who submitted a manuscript, which makes them a clear majority.

These results do seem to reflect the fact that the number of publications from Asian countries is increasing every year at a high rate. According to UNESCO Science Report Towards 2030 (UNESCO 2015), the number of publications from China and Iran more than doubled between 2008 and 2014, while publications from India increased by 44%. Asian countries in total produced 72% more publications, while European countries 14% and North American 11%. Considering that chemistry, according to the same report, is the main topic, or one of the main topics, of Indian, Iranian and Chinese publications, a high number of submissions from these countries in JSCS is to be expected and, in fact, is confirmed by data.

Figure 1b shows the number of manuscripts submitted to JSCS from various countries, with one unavoidable simplification: we assumed that the country assigned to each paper is the same as that of the submitting user. Unsurprisingly, the proportions noticeable in the distribution of users are also clearly visible in the distribution of manuscripts. Most articles come from India (460 out of 2089, 22%) and Iran (419, 20%). Serbia still holds the third place with 214 (10%) papers and China is fourth with 179 (9%) papers. In total, users from Asian countries contributed 1430 articles (68%).

So far, these results seem consistent. There is a twist in this story, however, which manifests itself when articles rejected for technical reasons are removed from the pool. Their contribution is hardly insignificant, as we discovered, since actually most of the submitted articles are rejected on purely technical grounds. Out of 2089 submissions only 286 (that is 14%) passed the technical check and were assigned to editors. This phenomenon requires further explanation and analysis, which we will provide in the next section.

The distribution of countries for manuscripts approved by the technical editor (see Fig. 1c) is in line with our original predictions. Most of the articles (75, 26%) were submitted from Serbia, while Iran, India and China contributed 48 (17%), 30 (10%) and 28 (10%) papers respectively. Romania and Turkey submitted 19 articles each (7%). Among the published articles (that is, in the subset of 110 out of 2089 articles that passed the technical check and were rejected neither by editors at the desk nor by reviewers during peer review; see Fig. 1d), 55 Serbian submissions constitute 50% of all manuscripts. Also, there were 11 (10%) articles from Iran accepted for publication, 10 (9%) from China, 9 (8%) from Romania and 6 (5%) from India.

It means that out of the initial 75 local (Serbian) submissions not rejected for technical reasons, 73% were published in the journal after peer review. On the other hand, only 26% of external manuscripts were accepted for publication. This is a staggering difference, especially considering the sheer number of publications submitted to JSCS from other countries. In order to shed some light on this phenomenon, we asked the editors in JSCS about their experiences with external manuscripts. Additional discussion of acceptance rates is provided in Sect. 8, in which we study handling times of submissions.
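The two acceptance rates quoted above follow directly from the counts given in this section. The short sketch below reproduces the arithmetic; the variable names are ours and are introduced only for illustration.

```python
# Counts quoted in this section (manuscripts that passed the technical check).
suitable_total = 286
suitable_local = 75
published_total = 110
published_local = 55

# External counts are obtained by subtraction.
suitable_external = suitable_total - suitable_local
published_external = published_total - published_local

local_rate = published_local / suitable_local
external_rate = published_external / suitable_external

print(f"local: {local_rate:.0%}, external: {external_rate:.0%}")
```

With 211 external manuscripts passing the technical check and 55 of them published, the external rate is about 26%, against about 73% for local manuscripts.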

When asked to estimate the share of external manuscripts they process, the editors (10 in total) responded that, on average, such submissions constitute 56% of all submissions. However, three editors in particular (the ones responsible for inorganic chemistry, organic chemistry and chemical engineering) made a notably higher estimate, ranging from 70% to 90%. The rejection rate of external papers was, on average, estimated by the editors to be 50%, with 25% (out of these 50%) of papers ending up desk-rejected without peer review.

The reasons for desk-rejections are rather baffling. The editors often ask for additional information or corrections before sending a paper to peer review; however, in many cases external authors do not respond. Thus, such papers must be desk-rejected. As for manuscripts sent for peer review, most often they are rejected because of their insufficient scientific contribution, for being out of scope or because their results are identical or similar to existing work. In many cases the conclusions presented in rejected papers are not derived from the results, or the results are misinterpreted. Also, sometimes authors do not respond to the reviews they receive, which must result in the rejection of their article. Generally, there are no problems when it comes to communication with authors (assuming, of course, that they do respond), but it does happen sometimes that they do not understand the review reports, which leads to their dissatisfaction.

According to JSCS editors, many of the external articles seem to be authored by young scientists and are at the level of seminar papers or essays that are required as one of the preconditions needed to earn a scientific degree. After taking everything into consideration (especially the poor preparation of manuscripts not in accordance with JSCS guidelines) the editors conclude that many of the external papers submitted to the journal are resubmissions of articles that were rejected in other journals. It is possible that after receiving multiple rejections from peer-review journals similar to JSCS, authors of rejected manuscripts turn their attention to “predatory” journals. The market of such journals is blooming in the regions from which the majority of external submissions originate and many young scientists end up publishing their work in them (Beall 2012; Xia et al. 2014; Frandsen 2017).

Rejections for technical reasons and multiple resubmissions

Fig. 2 Distributions of (a) the number of submitted manuscripts per user, (b) the number of resubmissions of each manuscript, (c) the number of submissions of each manuscript for local users, (d) the number of submissions of each manuscript for external users, (e) the number of unique submitted manuscripts per user. Subpanels b, c and d contain separate data for suitable (not rejected for technical reasons) and unsuitable (rejected for technical reasons) submissions

The number of all submitted manuscripts (2089, more than one thousand submissions per year) seems rather high. However, as we will show, this number is heavily inflated and does not really reflect the actual number of submissions. What it does reflect is the sheer volume of work that must be done by the technical editor.

In order to understand the source of this high number of submissions, we have to study the distribution of the number of submitted articles per user, which is shown in Fig. 2a. Many users submitted only one (300 out of 795, that is 38%), two (200, 25%) or three manuscripts (115, 14%), which is still a reasonable number given the time frame. However, 23% of users (180) submitted four or more papers, which does seem rather unlikely, and there were even some who submitted 15 or 22 manuscripts. A closer inspection of these submissions reveals that we are dealing not with unique articles but rather with multiple resubmissions of the same paper.

The technical editor inspects all newly submitted manuscripts to ensure that they were prepared according to the JSCS author guidelines. Some of the more common problems that can be found in submissions are associated with images (wrong format, insufficient resolution, lack of descriptions and titles), references (DOI is not included or references are not submitted using a separate form in the submission system), equations (which must be prepared using Microsoft Equation or MathType), designation of physical quantities and units (must be consistent with the IUPAC recommendations), and affiliations (incomplete information). Articles that were not prepared correctly are rejected for technical reasons (we will also call such articles unsuitable) and returned to authors accompanied by a letter which lists the problems that were found by the technical editor and led to the rejection. Authors, of course, can correct and resubmit the paper but these resubmissions count as separate submissions and constitute a large fraction of the 2089 total submissions.

Figure 2b shows the distribution of the number of submissions of each unique article (that is, the number of times each unique article was submitted and analysed by the technical editor). In total, there were 1010 such articles, 522 (52%) of which were submitted only once, 225 (22%) twice, 121 (12%) thrice and 66 (7%) four times. Interestingly, there is little difference between the local (Fig. 2c) and external (Fig. 2d) submissions when it comes to the aggregate numbers. There were 124 (12%) unique local submissions, with 65 (52%) submitted only once, 37 (30%) twice, 13 (10%) thrice and 9 (7%) four times. Among the 886 (88%) unique external manuscripts, 457 (52%) were submitted once, 188 (21%) twice, 108 (12%) thrice and 57 (6%) four times.

The differences between regions become more apparent when we separate the submissions that were rejected for technical reasons from the rest. Out of 1010 unique submissions, 279 passed the technical check. The discrepancy between this number and the one presented in the previous section (286, which is consistent with the number of articles that were assigned to editors) can be attributed to the way we find unique submissions in the database (that is, by grouping together manuscripts with the same title) combined, most likely, with human error, omissions or peculiarities of the on-line system. Taking that into account, out of 75 unique local submissions that passed the technical check (which is 60% of 124 local unique submissions), 31 (41%) did not require any corrections, 27 (36%) were resubmitted once, 11 (15%) twice and 6 (8%) three times. As for the external submissions, 48 (24%) out of 204 (which is 23% of 886 unique external submissions) did not require corrections, 64 (31%) were resubmitted once, 34 (17%) twice, 20 (10%) thrice and 18 (9%) four times. It means that papers submitted by external users were almost three times less likely to pass the technical check and needed more resubmissions on average than those of local users before they were accepted by the technical editor.
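Grouping submissions by title, as mentioned above, can be sketched as follows. The record layout, the field names `title` and `passed_technical_check`, and the normalisation step (lower-casing and collapsing whitespace) are assumptions made for this illustration; the text only states that manuscripts with the same title were grouped together.

```python
from collections import defaultdict

def normalise(title):
    # Assumed normalisation: case-insensitive comparison with collapsed whitespace.
    return " ".join(title.lower().split())

def group_by_title(submissions):
    """Map each normalised title to the list of its submission records."""
    groups = defaultdict(list)
    for record in submissions:
        groups[normalise(record["title"])].append(record)
    return groups

# Hypothetical records (illustrative only, not taken from the JSCS database).
submissions = [
    {"title": "Synthesis of X", "passed_technical_check": False},
    {"title": "synthesis of  X", "passed_technical_check": True},
    {"title": "Kinetics of Y", "passed_technical_check": False},
]

groups = group_by_title(submissions)
for title, records in groups.items():
    resubmissions = len(records) - 1  # submissions after the first one
    suitable = any(r["passed_technical_check"] for r in records)
    print(title, resubmissions, suitable)
```

A title-based key is simple but fragile: a manuscript whose title was changed between attempts is counted as two unique articles, which is one plausible source of small discrepancies such as the one between 279 and 286.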

The technical editor, when asked about his experience with these multiple resubmissions, estimated that authors who receive the rejection letter (for technical reasons) and decide to correct the manuscript manage to do so at the first attempt in only 30% of cases (which is confirmed by the data). He believes that authors with rejected papers do not read the instructions carefully enough, neither before their first attempt nor before subsequent ones. Sometimes it is obvious that they do not understand English well enough or have problems understanding certain technical aspects, like image resolutions or image formats. Illustrations are usually the most problematic part of submissions, as they cannot be fixed by the editorial team.

Finally, we can look at the actual number of unique submissions per user, which is shown in Fig. 2e. Out of 795 users, 621 (78%) submitted just one manuscript, 140 (18%) two and 24 (3%) three, which seem like much more realistic numbers than the ones that we presented at the beginning of this section, without taking into account multiple resubmissions.

Gender balance

Fig. 3 Distributions of genders (F, female; M, male; O, other; U, unspecified) for (a) all registered users, (b) all submitted manuscripts, (c) suitable manuscripts (not rejected for technical reasons) and (d) published manuscripts

For a long time science has been dominated by men. While there are programmes implemented to change that, remove prevailing biases and encourage women to pursue a career in academia, even today there are more male than female researchers and men tend to publish more than women (Lerback and Hanson 2017; Larivière et al. 2013). These gender differences vary from one field of study to another (Elsevier 2017), with some (health and life sciences) having a larger fraction of female scientists than others (physical sciences). Chemistry is one of the subjects in which there are more male scientists and more publications are written by men. Using the JSCS dataset, we were able to test these gender imbalances.

Among the information that JSCS users may provide during registration, or when they edit their profile, is their gender. However, this particular information is not required. Users may choose one of three options (female, male or other) or they may leave this field blank, which we will refer to as unspecified. It is therefore important to note that one should be careful when interpreting the numbers and rates provided in this section. The actual distribution of genders in the unspecified group, which does contain a significant fraction of all users, is not known and is not necessarily uniform. It is possible, for example, that this group contains a higher proportion of female authors, who may have a higher incentive to hide their gender.

Figure 3a shows the distribution of genders among JSCS registered users who submitted at least one manuscript. Out of 795 such users, 210 (26%) selected female as their gender, 405 (51%) selected male, 2 (< 1%) chose other, while 178 (22%) did not select anything. Unspecified category aside, it seems that there are roughly twice as many men as women, which is consistent with the general trend in chemistry.

Once again, though, a different picture emerges if users are divided into two groups: local and external. Among the 702 external users, 169 (24%) were female, 380 (54%) were male, 151 (22%) did not specify their gender and there were 2 other users. That is, for external users, there are more than twice as many men as there are women. The lack of gender balance among external users could be attributed not only to general trends in chemistry, but also to the social structure in the regions from which external submissions originate. For local users, the distribution is different: out of 93 users, 41 (44%) were female, 25 (27%) male, 0 other and 27 (29%) unspecified. Thus, in this group, there are more women than men.

By assigning a gender to each article, corresponding to the gender of the submitting user, we can easily study the number of submitted articles by users belonging to each gender group and look for interesting imbalances. Figure 3b shows this distribution of genders for all submitted manuscripts (that is without filtering out numerous resubmissions). The distribution is, as one could expect, following the distribution of gender among users. Out of 2089 articles, 549 (26%) were submitted by female users, 1039 (50%) by male, 6 (< 1%) by other and 495 (24%) by unspecified. In the subset of 214 local manuscripts, 100 (47%) were submitted by female users, 50 (23%) by male, 64 (30%) by unspecified and 0 by other. For 1875 external papers, 449 (24%) were submitted by female users, 989 (53%) by male, 431 (23%) by unspecified and 6 (< 1%) by other.

The distribution of genders for articles that were not rejected for technical reasons (see Fig. 3c) is more interesting, as it is not entirely consistent with the distribution of genders of all users. Out of 286 manuscripts that passed the technical check, 95 (33%) come from female users, 116 (41%) from male, 1 (< 1%) from other and 74 (26%) from unspecified. These numbers are skewed towards men by external users, for whom out of 211 papers 58 (27%) were submitted by female users, 97 (46%) by male, 1 by other and 55 (26%) by unspecified. On the other hand, there were almost twice as many female submissions as male submissions among the local users, with 37 (49%) female papers out of 75, 19 (25%) male, 0 other and 19 (25%) unspecified.

As for the articles accepted for publication (see Fig. 3d), out of 110 papers, 39 (35%) were submitted by female users, 36 (33%) by male, 35 (32%) by unspecified and none by other. This balance is yet again broken after dividing users into the local and external groups. Within the latter group, male users dominate with 23 (42%) out of 55 submitted manuscripts, followed by 13 (24%) female and 19 (35%) unspecified. In the former group, female users are in the majority with 26 (47%) submissions, followed by 13 (24%) male users and 16 (29%) unspecified.

All these data allow us to see if the acceptance rate differs between users belonging to different gender groups. When we look at all articles submitted to JSCS, without filtering out the ones rejected for technical reasons, the acceptance rate is 7% for female users, 3% for male and 7% for unspecified. For the local users it translates to 26%, 26% and 25% respectively, and for the external users to 3%, 2% and 4%. After taking into account only the articles that passed the technical check, we get 41% for female, 31% for male and 47% for unspecified users. For local users, these adjusted acceptance rates are equal to 70%, 68% and 84%, while for the external users they are equal to 22%, 23% and 35%.
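The overall and adjusted acceptance rates for all users can be recomputed from the counts reported earlier in this section (submitted manuscripts, manuscripts that passed the technical check and published manuscripts per gender group). The sketch below does only that; the dictionary layout and the function name `rates` are ours.

```python
# (submitted, passed technical check, published) per gender group,
# using the counts quoted in this section for all users.
counts = {
    "female": (549, 95, 39),
    "male": (1039, 116, 36),
    "unspecified": (495, 74, 35),
}

def rates(submitted, suitable, published):
    """Return (overall acceptance rate, rate among manuscripts past the technical check)."""
    return published / submitted, published / suitable

for gender, triple in counts.items():
    overall, adjusted = rates(*triple)
    print(f"{gender}: {overall:.0%} overall, {adjusted:.0%} after the technical check")
```

The same function can be applied to the local and external counts; the denominators change, which is why the ranking of the groups differs between the two subsets.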

Handling time

Fig. 4 Handling time (a) for all manuscripts (first bin is only shown partially, cf. the next subpanel), (b) for manuscripts rejected for technical reasons, (c) for published manuscripts of local users, (d) for published manuscripts of external users, (e) for rejected manuscripts of local users, and (f) for rejected manuscripts of external users

Handling time, that is, the time between the submission of a manuscript and the editorial decision, is a very important aspect of the peer review process for both authors and editors. Two of our articles (Mrowinski et al. 2016, 2017) were devoted to this issue: we studied peer review in JSCS as a dynamical process with handling time as our main focus point. These papers, however, were based on data provided by only one of the sub-editors of the journal, who had to collect all the information manually (it was before the journal switched to an automated on-line submission system). In this section we present some interesting results that we uncovered when we studied a more comprehensive sample from the entire submission database.

Figure 4a shows the distribution of handling time. The median is equal to 0 days (the mean is 11 days), but this result is highly distorted by articles rejected for technical reasons. Such articles are usually handled by the technical editor on the same day they were submitted (see Fig. 4b). It is much more informative to study manuscripts that passed the technical check and were either rejected or accepted for publication.

Figures 4c and 4d show the distribution of handling time for published articles for local and external submissions. As can be clearly seen, the review process for local submissions is faster than for the external ones. The median handling time for local submissions is equal to 65 days, while for external submissions it is equal to 110 days. It means that submissions from external authors require almost twice as much handling time as local submissions. This trend is also present in the distribution of handling time for rejected manuscripts (Fig. 4e and f). The median handling time of rejected submissions for local authors is equal to 21 days, while for external authors it is equal to 41 days.

When analysing handling time, it is worth keeping in mind that journals usually set deadlines for both authors and reviewers. Peer review in JSCS can be divided into rounds. At the beginning of the process, reviewers get 30 days to complete their reviews. After receiving the reviews, authors have 60 days to revise their manuscripts. The revised versions are sent back to reviewers, who must check the revisions within 30 days. Multiple rounds of this revision-review cycle may be necessary if reviewers decide that changes made by authors are not satisfactory.
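As a rough point of reference for the medians reported above, the deadlines can be added up. The sketch below computes how long the process would take if every deadline were used in full; it assumes that no time passes between steps (for example while reviewers are being invited), which is a simplification of ours, and the function name is hypothetical.

```python
FIRST_REVIEW_DAYS = 30  # reviewers' deadline for the initial reports
REVISION_DAYS = 60      # authors' deadline for a revision
RE_REVIEW_DAYS = 30     # reviewers' deadline for checking a revision

def days_if_deadlines_fully_used(revision_cycles):
    """Elapsed days when every deadline is used in full.

    Assumption: no time is spent between steps (e.g. on inviting reviewers).
    """
    if revision_cycles < 0:
        raise ValueError("revision_cycles must be non-negative")
    return FIRST_REVIEW_DAYS + revision_cycles * (REVISION_DAYS + RE_REVIEW_DAYS)

for cycles in range(3):
    print(cycles, days_if_deadlines_fully_used(cycles))
```

Actual handling times can be shorter, when reviewers and authors respond before their deadlines, or longer, when invitations are declined or deadlines are missed.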

Another important factor is that peer review in JSCS is single-blind and scientists who agree to review articles for JSCS are in many cases Serbian (see Nedic and Dekanski (2015) for results of a survey that was conducted among sub-editors, reviewers and authors of articles submitted to JSCS). It means that reviewers move in the same circles as authors of local manuscripts and some of them are even members of the same society (that is the Serbian Chemical Society). As such, it is possible that local authors are more likely to submit manuscripts that are appropriately prepared and are of high quality because they do not wish to tarnish their reputation in the eyes of their colleagues—a sentiment which is most likely not shared by external authors. It is also possible that reviewers are more prone to quickly review manuscripts submitted by local authors. While we cannot verify the validity of these theories, our results are consistent with other studies—it has been shown, for example, that handling times are shorter when editors handle submissions of previous co-authors (Sarigöl et al. 2017). Also, studies suggest that single-blind peer review may lead to biases towards some groups of submitting authors (Seeber and Bacchelli 2017). Such biases could serve as an alternative explanation for the differences in handling times and acceptance rates of local and external submissions. However, to test such a hypothesis, one would require access to data from a double-blind peer reviewed journal similar in scope to JSCS.

Number of authors

Fig. 5 Distribution of the number of authors for (a) all, (b) unique, (c) suitable (not rejected for technical reasons) and (d) published manuscripts

The last interesting phenomenon we want to mention is tied to the number of authors of submitted manuscripts. The distribution of the number of authors, for all manuscripts, is shown in Fig. 5a. As can be seen, there are a lot of manuscripts with just one author among external submissions—643 (34%) out of 1875. The distribution for local manuscripts is more uniform, with a maximum at 7 authors (36 out of 214, that is 24%). The median of the number of authors for local submissions is 5, while for external submissions the median is 2. The distribution for unique articles is very similar (see Fig. 5b).

What is more interesting is that this distribution changes significantly for articles that passed the technical check (see Fig. 5c). Out of 279 such articles, only 9% (25) had just one author, and the median of the number of authors equals 4. After dividing the data into local and external submissions, articles with one author constitute 4% (3 out of 75) of the former set and 11% (22 out of 204) of the latter set. The distribution becomes more sharply peaked at 7 for local submissions, with the median equal to 6, while the median of external submissions increases to 3. There are even fewer articles with one author among the published articles (see Fig. 5d), that is 5% (6 out of 110), with 2% (1 out of 55) for local (the median is 6) and 9% (5 out of 55) for external submissions (the median is 3).

Ultimately, only 2% of unique articles with one author were published, 8% (12 out of 146) with two authors, 12% with three (21 out of 180), 8% (12 out of 147) with four and 18% (16 out of 91) with five. The median of the number of authors of published papers is equal to 5. While it could be tempting to explain the fact that articles with one author have a very low publication probability by assuming that manuscripts submitted by teams are prepared more carefully and the results they contain are of higher quality, the actual explanation is most likely more prosaic. Out of 87 papers that both passed the technical check (possibly after multiple resubmissions) and were initially submitted with just one author listed, 59 had more than one author listed after passing the technical check. This means that in many cases users do not fill in the metadata correctly when they submit manuscripts, which leads to rejections for technical reasons. The overall distribution of articles accepted for publication, and the fact that articles with just one author are in the minority, seems to be consistent with the fact that research is increasingly done in teams (Wuchty et al. 2007).

Classification of articles

Fig. 6 (a) Fraction of correctly predicted suitable (not rejected for technical reasons) and unsuitable (rejected for technical reasons) manuscripts for all classifiers. All \(2^{16}\) classifiers are visible on the plot: each point corresponds to one classifier and points were made partially transparent to emphasize the distribution of classifiers; red circles correspond to classifiers obtained using logistic regression for various cut-off values of probability. (b) Highest values of Matthews correlation coefficient for subsets of classifiers that use subsets of features (O: country of origin, A: number of authors, G: gender)

Fig. 7 (a) Fraction of correctly predicted published and rejected (after peer review) manuscripts for all classifiers (only articles that passed the technical check were used to evaluate the performance of classifiers). All \(2^{16}\) classifiers are visible on the plot: each point corresponds to one classifier and points were made partially transparent to emphasize the distribution of classifiers; red circles correspond to classifiers obtained using logistic regression for various cut-off values of probability. (b) Highest values of Matthews correlation coefficient for subsets of classifiers that use subsets of features (O: country of origin, A: number of authors, G: gender)

The data we studied show that there are differences in distributions of gender, country of origin and the number of authors between accepted and rejected articles. It could be interesting to see if these differences can be used to automatically classify articles, that is, to determine whether an article should be accepted or rejected based on the aforementioned features alone. Such a study could potentially bring additional insight into the importance of these features.

The results presented in Sect. 9 suggest that there are differences between articles with one author and with more than one author. As such, in order to simplify the problem, we decided to assume that this feature is binary. It means that the set of features we used for classification and their possible values are as follows:

  • origin—two values: local or external

  • number of authors—two values: one or greater than one

  • gender—four values: female, male, other, unspecified

There are only \(2 \cdot 2 \cdot 4 = 16\) combinations of these features and a classifier must assign each combination to a class (rejected/accepted in our case). While there are a lot of various algorithms that can be used to create classifiers, with such a limited number of possible combinations of features the number of all classifiers is also limited. There are \(2^{16} = 65536\) classifiers for this problem and no matter which algorithm is used, it must result in a classifier that belongs to this set. Thus, instead of relying on a specific algorithm, we studied the entire space of all possible classifiers. In order to quantitatively measure the quality of each classifier, we used the Matthews correlation coefficient (Matthews 1975):

$$\begin{aligned} MCC = \frac{TP\cdot TN-FP\cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \end{aligned}$$
(10.1)

where TP is the number of true positives in the sample (that is accepted articles that were classified correctly), TN is the number of true negatives (rejected articles that were classified correctly), FP is the number of false positives (rejected articles that were classified incorrectly), and FN is the number of false negatives (accepted articles that were classified incorrectly).
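The exhaustive search over the space of classifiers described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the names `COMBOS`, `mcc`, `best_classifier` and `TOY_COUNTS` are hypothetical, the counts are made up, and returning 0 when the denominator of Eq. (10.1) vanishes is a common convention that we assume here. Each classifier is encoded as a 16-bit mask in which bit i is set when the i-th combination of features is assigned to the "accepted" class.

```python
from itertools import product
from math import sqrt

ORIGINS = ("local", "external")
AUTHORS = ("one", "more")
GENDERS = ("female", "male", "other", "unspecified")

# All 2 * 2 * 4 = 16 combinations of features, in a fixed order.
COMBOS = list(product(ORIGINS, AUTHORS, GENDERS))


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, Eq. (10.1).

    Returns 0.0 when the denominator is zero (assumed convention).
    """
    denom = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / sqrt(denom)


def best_classifier(counts):
    """Search all 2^16 classifiers.

    counts[combo] = (number of accepted, number of rejected) articles.
    Returns (highest MCC, bit mask of the first classifier reaching it).
    """
    acc = [counts.get(c, (0, 0))[0] for c in COMBOS]
    rej = [counts.get(c, (0, 0))[1] for c in COMBOS]
    total_acc, total_rej = sum(acc), sum(rej)
    best_score, best_mask = -2.0, 0
    for mask in range(1 << len(COMBOS)):
        tp = fp = 0
        for i in range(len(COMBOS)):
            if (mask >> i) & 1:  # combination i is classified as accepted
                tp += acc[i]
                fp += rej[i]
        score = mcc(tp, total_rej - fp, fp, total_acc - tp)
        if score > best_score:
            best_score, best_mask = score, mask
    return best_score, best_mask


# Made-up counts: local combinations are mostly accepted, external mostly rejected.
TOY_COUNTS = {c: ((8, 2) if c[0] == "local" else (1, 9)) for c in COMBOS}

score, mask = best_classifier(TOY_COUNTS)
accepted = [c for i, c in enumerate(COMBOS) if (mask >> i) & 1]
print(round(score, 3), accepted)
```

With these artificial counts the best classifier accepts exactly the eight local combinations. With real data, the same loop would also give the two per-class fractions plotted for every point in Figs. 6a and 7a.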

Figure 6a shows fractions of correctly classified manuscripts for classifiers that determine whether a manuscript should be rejected for technical reasons (unsuitable) or sent to the handling editor (suitable). As can be seen, performance among all possible classifiers varies greatly, but in general higher rates of correctly classified accepted manuscripts result in lower rates of correctly classified rejected manuscripts, which should not be surprising.
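The trade-off between the two fractions can be illustrated with the two trivial classifiers that mark the extremes of such a plot. The sketch below uses a hypothetical helper `per_class_fractions` and made-up sample sizes (72 suitable and 88 unsuitable manuscripts); it is not taken from the study.

```python
def per_class_fractions(tp, tn, fp, fn):
    """Return (fraction of positives classified correctly,
    fraction of negatives classified correctly)."""
    pos = tp + fn
    neg = tn + fp
    return (tp / pos if pos else 0.0, tn / neg if neg else 0.0)


# A classifier that marks every manuscript as suitable ...
accept_all = per_class_fractions(tp=72, tn=0, fp=88, fn=0)
# ... and one that marks every manuscript as unsuitable.
reject_all = per_class_fractions(tp=0, tn=88, fp=0, fn=72)

print(accept_all, reject_all)
```

Every other classifier lies between these two extremes: moving one combination of features from the "unsuitable" class to the "suitable" class can only keep or raise the first fraction and keep or lower the second.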

The highest value of Matthews correlation coefficient (MCC) for all possible classifiers is 0.237 (see Fig. 6b), which is not high. When we look at the subsets of classifiers that use only one feature, then the highest value of MCC is 0.210 for origin, 0.196 for number of authors and 0.07 for gender. The highest value of MCC for classifiers that use both number of authors and origin is 0.237, which is equal to the MCC of classifiers incorporating all features. It suggests that origin and number of authors are important features that do influence the rejection and acceptance of manuscripts.
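One way to make "classifiers that use only a subset of features" concrete is sketched below. We assume here (the definition is ours, not necessarily the one used for Fig. 6b) that a classifier uses only a given subset when its output stays the same whenever the remaining features are changed. The names `FEATURES`, `COMBOS` and `classifiers_using` are hypothetical.

```python
from itertools import product

FEATURES = {
    "O": ("local", "external"),
    "A": ("one", "more"),
    "G": ("female", "male", "other", "unspecified"),
}
# All 16 combinations, each a tuple in (O, A, G) order.
COMBOS = list(product(*FEATURES.values()))


def classifiers_using(subset):
    """Yield every classifier (dict: combination -> True if accepted)
    whose output depends only on the features named in `subset`."""
    names = list(FEATURES)
    idx = [names.index(f) for f in subset]
    reduced = list(product(*(FEATURES[f] for f in subset)))
    for labels in product((False, True), repeat=len(reduced)):
        table = dict(zip(reduced, labels))
        yield {c: table[tuple(c[i] for i in idx)] for c in COMBOS}


# Number of classifiers available to each subset of features.
sizes = {s: sum(1 for _ in classifiers_using(s)) for s in ("O", "A", "G", "OA")}
print(sizes)
```

Under this definition the subsets are nested: every classifier that uses only origin also belongs to the set that uses origin and number of authors, which in turn belongs to the full set of \(2^{16}\) classifiers. The highest MCC therefore cannot decrease when a feature is added, which is consistent with the values reported above.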

Figure 7a is analogous to Fig. 6a. It shows fractions of articles that passed the technical check and were correctly classified as accepted for publication or as rejected after peer review. Once again, the performance of classifiers varies greatly. The highest values of MCC can be seen in Fig. 7b. Incorporating all features results in classifiers with the highest MCC equal to 0.436, which is higher than the corresponding value for classifiers dealing with rejections for technical reasons. Classifiers that use only one feature have MCC equal to 0.427 for origin, 0.108 for the number of authors, and 0.131 for gender. This suggests that country of origin is highly relevant in this case.

Table 1 The results of logistic regression—the dependent variable was whether an article should be rejected for technical reasons
Table 2 The results of logistic regression—the dependent variable was whether an article should be accepted for publication

In order to verify these conclusions using a specific classifier, we performed logistic regression with the following model:

$$\begin{aligned} \text {logit}(p) = \log \frac{p}{1 - p} = \beta _0 + \beta _1 G_m + \beta _2 G_o + \beta _3 G_u + \beta _4 O_l + \beta _5 N_o \end{aligned}$$
(10.2)

where \(\beta _i\) are coefficients. \(G_m\), \(G_o\), and \(G_u\) are equal to 1 only when an article was submitted by users with male (\(G_m\)), other (\(G_o\)) or unspecified gender (\(G_u\)). Otherwise, these parameters are equal to 0. \(O_l\) is equal to 1 for articles submitted by local authors and \(N_o\) is equal to 1 for articles with only one author. The results of logistic regression are presented in Table 1 (the dependent variable was whether the article should be rejected for technical reasons) and Table 2 (acceptance for publication; only articles that passed the technical check were analysed). The classifiers that result from logistic regression for different cut-off values of probability are marked with red circles in Figs. 6a and 7a. These results confirm that both origin and number of authors are significant factors for the probability of technical rejection, while only origin is significant for the probability of acceptance for publication.
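The way a fitted model of the form (10.2) turns into one of the \(2^{16}\) classifiers for a given cut-off can be sketched as follows. The coefficients in `BETA` are hypothetical placeholders, not the fitted values from Tables 1 and 2, and the function names are ours. Female gender, external origin and more than one author form the baseline, for which all indicator variables are 0.

```python
from itertools import product
from math import exp

COMBOS = list(product(("local", "external"),
                      ("one", "more"),
                      ("female", "male", "other", "unspecified")))

# Hypothetical coefficients, NOT the values reported in Tables 1 and 2.
BETA = {"b0": -1.0, "G_m": 0.1, "G_o": -0.2, "G_u": -0.5, "O_l": 1.5, "N_o": -0.8}


def probability(origin, authors, gender, beta=BETA):
    """Invert the logit of Eq. (10.2) for one combination of features."""
    z = beta["b0"]
    z += beta["G_m"] * (gender == "male")
    z += beta["G_o"] * (gender == "other")
    z += beta["G_u"] * (gender == "unspecified")
    z += beta["O_l"] * (origin == "local")
    z += beta["N_o"] * (authors == "one")
    return 1.0 / (1.0 + exp(-z))


def classifier_for_cutoff(cutoff, beta=BETA):
    """Assign a combination to the positive class when p >= cutoff."""
    return {c: probability(*c, beta=beta) >= cutoff for c in COMBOS}


# Sweep the cut-off and collect the distinct classifiers it produces.
distinct = {tuple(classifier_for_cutoff(k / 100).values()) for k in range(0, 102)}
print(len(distinct))
```

Because the model assigns a single probability to each of the 16 combinations, sweeping the cut-off can produce at most 17 distinct classifiers, so the classifiers obtained from logistic regression form only a small subset of the full space.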

Results presented in this section seem to confirm that there is a significant difference between local and external authors. However, one should be careful when analysing data in this way. The notion of “best classifier” is, in the end, subjective and so are the measures of the quality of classification. For example, one may be willing to use a classifier which classifies 80% of accepted manuscripts correctly and only 40% of rejected manuscripts correctly (there are such classifiers, as can be seen in Fig. 7b), but such a classifier is not necessarily characterised by the highest value of MCC and may use some features that would otherwise be deemed unimportant.

Conclusions

While there are many tales of frustration told by authors of manuscripts who, after submitting their articles, had to deal with long review (or desk rejection) times and subpar reviews, the data we presented in this paper offer a retelling of some of these stories from the perspective of journal editors. It is a sample from just one journal collected during a period slightly longer than one year, but we do believe it reflects the general trends that can be currently observed in scientific publishing, and informal discussions with editors from other journals of similar scope confirm this assumption.

When we began analysing the data, we suspected that there may be a significant difference between local (in this case, Serbian) and external submissions. Our analysis, including the analysis of automatic classification of articles, confirms this prediction. The articles that are actually published in the journal are just the tip of a huge iceberg of all submitted manuscripts. The journal is flooded with external articles, the authors of which do not follow submission guidelines or provide all necessary information. Such articles end up rejected for technical reasons: the technical editor must check them all thoroughly and provide feedback to authors, which is a time-consuming task. What is worse is that this task must often be repeated more than once, as many authors resubmit their manuscripts multiple times without making all necessary corrections. Local submissions are usually better prepared and more likely to pass the technical check.

According to JSCS editors, external articles that do pass the technical check are often rejected for being out of scope, for insufficient scientific contribution or for erroneous conclusions. Interestingly, some articles are desk rejected because authors do not respond to editors after receiving either a request for additional information or reviews of the manuscript. The editors believe that many external submissions are resubmissions of articles rejected in other journals and are written by young, inexperienced scientists.

We showed that the peer review process for external submissions takes more time. Both in the case of rejected and accepted manuscripts, local authors receive the editorial decision twice as fast as external ones. We were also able to, at least partially, check whether some gender imbalances can be observed among JSCS users. While there are more female than male local users, there are more men than women in the external subset. In the former subset, articles submitted by women constitute the majority of manuscripts that were not rejected for technical reasons (this trend is also present in the set of published manuscripts). In the external subset, the aforementioned relations are reversed. The acceptance rate (the probability that a submitted manuscript will be published) for both subsets combined is higher for women than men. However, one should be careful before drawing conclusions from these results and consider both the size of the sample and the significant fraction of users who did not declare their gender. Similarly, we showed that articles submitted with just one author listed are less likely to both pass the technical check and be published, but this most likely indicates that such articles have incomplete metadata.

The results of our study raise some interesting questions. Is it beneficial for all journals to open for external submissions? Maybe journals of certain scopes and sizes would benefit from allocating review time, which is a very scarce and expensive resource (especially in the current peer review landscape), only to local submissions? And if so, what would be the right time for a journal that aspires to be international to open for external submissions? How does, in terms of the dynamics of the process, a journal turn from being a local journal (in which the majority of published manuscripts come from local authors) to an international one? These are all open questions that would require further research. What JSCS data show is that this journal is open for external submissions and that it gives all manuscripts an equal fighting chance, as many times as necessary. Whether this is a general trend among all similar journals, we cannot say. We would like to believe, though, that it means scientific publishing, despite being overburdened, is still holding out.