Introduction

Since the Budapest Open Access Initiative (2002), the Berlin Declaration on Open Access (2003), and the Bethesda Statement (2003)Footnote 1, free access to digital publications has become a major topic in science policy. Many funding organizations now mandate open access (OA) to publications that derive from research projects that received financial support.Footnote 2 In addition, several large-scale ‘transformative’ or ‘disruptive’ OA initiatives have emerged in a number of countries and aim to turn large parts of the scientific publication landscape into OA.Footnote 3 However, electronic publication providing free access to research via the internet is more than a decade older than the declarations and nearly three decades older than the transformative initiatives: it was invented in the 1990s and driven by parts of the scientific community (Ginsparg 1994, 2011). At that time, new ways of disseminating scientific information evolved in some disciplines and fields, and they still serve as models and reference points for the development of publishing within science at large.

Regarding the adoption of OA, the role of science policy, on the one hand, and the role of the publication culture of different disciplines, on the other, are today only partly understood. It is beyond question that important aspects of the development towards OA publishing have already been studied. These include the extent of the adoption of open access publishing (e.g., Gargouri et al. 2012; Archambault et al. 2014; Crawford 2015; Wohlgemuth et al. 2017; Piwowar et al. 2018; Martín-Martín et al. 2018; Abediyarandi and Mayr 2019; Huang et al. 2020; Hobert et al. 2020), the attitudes towards open access (e.g., Creaser et al. 2010; Kim 2011), possible citation advantages of OA publications over non-OA publications (e.g., Lawrence 2001; Kurtz et al. 2005; Harnad and Brody 2004; Archambault et al. 2016), as well as the structure of the publication market and journal prices (e.g., Dewatripont et al. 2006; Ware and Mabe 2015; Larivière et al. 2015). What is missing, with few exceptions (Gunnarsdóttir 2005), are studies that aim to draw a more contextualized picture and analyze the evolution of publication models in the context of disciplinary cultures. In particular, the question of under what circumstances an OA publication model succeeds and is adopted by a scientific community has so far remained unanswered.

The goal of this article is to develop a perspective that allows answering this question. Instead of making claims or assumptions about the role of an OA publication infrastructure, it asks what scientists actually do with such an infrastructure, and how and for what purpose they use it. The focus here is on green OA, i.e., open access provided by an institutional or subject repository or a website, where authors self-archive their manuscripts.Footnote 4 We are interested in how both authors and readers are included in the communication system of their disciplines through the use of self-archived manuscripts, and we inquire about the main characteristics of such an inclusion, the problems that may result from it, and how these are solved. To this end, the routines of action in which authors and readers mobilize repositories are reconstructed. The overall question of the conditions under which green OA stabilizes is answered by the presence of complementary routines of authors and readers.

The article is organized as follows: It starts with a clarification of the term open access and develops a heuristic model that helps to understand how digital infrastructures, such as repositories for publications, are embedded in two social contexts and how they support scientists in achieving their aims. The second step sketches the methodological design of the study on OA in astronomy and mathematics from which the results derive. The third step presents the empirical results and focuses first on the authors, their motives for self-archiving, and the point in time at which papers are made freely available online. Since some of the manuscripts on repositories are self-archived not only before publication but also before peer review is completed, the fourth step asks whether such an early point in time causes problems for readers in terms of the trustworthiness of the reported results. Moreover, the routines that respond to this problem are investigated. The article concludes with some remarks about the complementarity of authors’ and readers’ routines and the characteristics of inclusion in the communication system of science (fifth step).

Open Access from the Perspective of Sociology of Science

With the advent of the internet in the late 1980s and early 1990s, new means of digital dissemination were developed, making research freely available online. For this type of publishing the term open access was coined many years later (Suber 2002). Following the text of the old but still very influential Budapest Open Access Initiative (BOAI), the term is defined as follows:

By ‘open access’ to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited. (BOAI 2002)

This definition mentions four aspects that are constitutive for OA. First, the publication has to be in an electronic format. This allows the (re-)production of as many copies as required and is therefore the precondition for the absence of rivalry in consumption. Second, the publication is freely available on the public internet, without technical or legal barriers. This means that anyone who has access to the internet can also access the publication. Third, the term refers to scientific research and not to other kinds of publications such as popular writings or newspaper articles. Fourth, the BOAI also mentions the legal status of the publication and the different kinds of use that should be permitted, namely, reading, downloading, searching, printing and linking. In this article, the term OA will not be used in a legal but in a more practical sense: OA means that publications are freely available online and can be used practically.Footnote 5

In the current discussion various types of OA are introduced.Footnote 6 To some extent, the invention of new types is driven by science policy and changing preferences. Nevertheless, the most important distinction is still the one between green and gold OA. In the case of green OA, free access is provided by self-archiving of manuscripts on an institutional or subject repository or a website (Guédon 2004: 315; Suber 2012: 5). In the case of gold OA, free access is provided by formal communication channels like journals, conference proceedings, books or anthologies, no matter at what point in time access is provided and whether or not OA is provided to all publications in the medium.Footnote 7

Up to now, OA has not attracted much attention from a sociology of science perspective and has largely been left to library and information science and bibliometrics. This might have to do with the way in which the communication system of science (Garvey and Griffith 1967; Whitley 1968) is usually conceptualized from a sociological point of view: The common understanding focuses on social mechanisms like peer review, the recognition of merit or the attribution of reputation, but far less on the technical facilities and infrastructures on which the publication media are based. These are merely treated as preconditions that allow the dissemination of new research and findings but are usually not regarded as a research topic that promises new and interesting insights.

In order to overcome these shortcomings and to develop such a perspective, the subject is re-conceptualized by distinguishing three layers: the communication system of science, the publication infrastructure, and service organizations.

The first layer, the communication system of science, has been discussed extensively in the literature. In the context of OA, it has often been highlighted that this publication model improves dissemination, i.e., the circulation of research within a specific community of scholars. Nevertheless, one should note that dissemination is only one of at least four functions of the communication system (Kircz and Roosendaal 1996, 107f.; Andermann and Degkwitz 2004, 8). The communication system also has a registration function, which records the point in time at which new findings or novel claims first appeared. It allows identifying the person who published a result first and to whom priority should be attributed. The third function is often called certification of a publication as a contribution to a common body of knowledge as a result of peer review,Footnote 8 but this term has been criticized as drawing a naïve picture of the significance of published research. What happens to novel claims and findings in the course of publication can be described more carefully as a symbolic appreciation of a contribution as noteworthy. In many fields, appreciation results from peer review. Symbolic appreciation is the source of the reputation system, which combines two important regulatory mechanisms of science. On the one hand, it provides an external incentive for scientists to conduct and publish their research. On the other hand, it guides attention within a discipline or field, as it points to eminent scholars and renowned journals that are more likely to deserve attention than others (Luhmann 1970). Fourth, it is necessary to have an archive where all the contributions can be found. The archiving function not only allows the reconstruction of developments within a field but also, more fundamentally, the distinction between old and new knowledge (Stichweh 1979: 96), and is in this respect a precondition for the identification of a research frontier.

This concept of the communication system of science has undoubtedly been fruitful in the past. However, the analysis of the effects of digitization and OA seems to make it necessary to rethink it. For this purpose, it would be more suitable to conceptualize the formal communication system of science as a system of action, in which scientists are included in changing roles like authors, editors, reviewers and readers and in doing so, they contribute to the registration, dissemination, symbolic appreciation and archiving of research. Within these actions, a second layer, that is the publication infrastructure, comes into play. It consists of all technical means that maintain the communication system of science. It contains electronic journals gathered on publishing platforms but also online editorial management systems that help to organize the peer review process and the technical production of publications. Moreover, the publication infrastructure also includes academic social network sites, reference management systems, subject- and citation databases, repositories for searching and selection of literature, (alt-)metrics systems as well as tools that provide various indicators for different purposes.

In order to understand the relation between the communication system and the publication infrastructure, the concept of the duality of resources and routines developed by Ingo Schulz-Schaeffer (1999, 2000) is instructive. It distinguishes three elements of an action that involves technology. First, the infrastructure provides the actors of the communication system (authors and readers) with resources that allow them to achieve their aims. The second element consists of specific rules that have to be followed to activate the resources and that are part of the routines of the actors. But routines do not simply execute rules. They also have a routine aspect, the third element, which is understood as a specific manifestation and interpretation of the rules that results from a user-specific adoption of the infrastructure. One of the advantages of the concept is that it is fruitful for empirical analysis, as it relates two key aspects of action involving technology. On the one hand, it answers the question of how technology shapes action, as the rule aspect of routines calls for investigating the requirements that the technology places on an actor. This rule aspect results in restrictions on how an actor performs his or her action. On the other hand, the routine aspect acknowledges that technology is always appropriated by users and used in a specific, sometimes even idiosyncratic way and for purposes that may have been unforeseen by the inventors. An example may help to clarify these abstract considerations: In the context of peer review, online editorial management systems used by scientific journals provide automatically generated emails that the editor can use to organize the correspondence with authors and referees (resource aspect). They can only be used by authorized persons, whom the system identifies by pre-defined roles (like ‘editor-in-chief’), and in pre-defined situations, e.g., when a new manuscript is submitted or a review has been completed (rule aspect). The editor can simply send these emails, or add to or change their content according to his habits (routine aspect), resulting in a more or less standardized correspondence between editors, authors and referees (see Taubert 2012).

For a more complete understanding of open access and electronic publishing, a second extension of the perspective is needed. It is often highlighted that digital infrastructures “reach beyond a single event or one-site practice” (Star 1999: 381; Star and Ruhleder 1996: 113) and that they are maintained and updated by someone (Ribes and Lee 2010: 234). This is usually done within maintenance or service organizations that can be regarded as a second social context in which the publication infrastructure is embedded. Such organizations keep the publication infrastructure up and running and adapt it to further technological developments and to the needs of its users. Examples of these organizations are small and large publishing houses, libraries, as well as private and public information service providers. The type of organization often shapes the resources provided by the infrastructure and the rules that users have to follow. Thus, a private and a public infrastructural regime can be distinguished (Taubert 2017). The relation between the second and the third layer is also not one-way only. The resources the infrastructure provides for the communication system also legitimate the service organizations.Footnote 9 Figure 1 summarizes the conceptual extensions of the communication system of science in a heuristic model.

Fig. 1 Extended perspective on the communication system of science. See also Taubert (2017, 2019)

Methods

The empirical results presented here analyze the usage of green OA in astronomy and mathematics. The two disciplines were chosen as they are well known for an early adoption of OA and for an early establishment of patterns regarding the use of repositories in particular.

  • In astronomy, the astro-ph repository (part of arXiv.org), which is used for self-archiving of manuscripts, was created in 1992 and has shown a steady growth in the number of manuscripts since then. Essentially, self-archiving built upon an earlier circulation of paper pre-prints that were produced by observatories and published in pre-print series (Trimble 2010: 26; Lim 1996: 22), a system that was already in place in the 1940s. Originally, paper pre-prints were circulated between observatories until, at the end of the 1970s, their volume grew so large that this decentralized system reached its limits. From that time onwards until the advent of electronic repositories, pre-prints were collected in a central registry and sent to astronomers on request (Till 2001).

  • In mathematics, self-archiving also had the circulation of pre-prints as a predecessor, but the development differed from that in astronomy. From 1991 onwards, a number of smaller repositories were created for different subject fields (Jackson 2002: 24), and a centralization of self-archiving happened later, at the end of the 1990s, driven by the development of arXiv features that met the requirements of mathematicians. Even though arXiv is now “by far the dominant preprint repository” (Crowley 2011: 1128) in mathematics, a few of the small preprint servers continued their operation until recently.Footnote 10

Against the background of the historical development just sketched and the continuously high level of self-archiving reported by arXiv (2019) as well as by a number of studiesFootnote 11, it can be assumed that green OA has found its role in the communication system of both disciplines and that the actors have developed stable ways of dealing with it. Therefore, the two disciplines are suitable cases for the analysis of authors’ and readers’ routines.

Design of the StudyFootnote 12

To investigate the usage of green OA and to analyze the routines applied by authors and readers, an interview study was conducted with 20 interviewees. In order to represent a maximum diversity of conditions under which research is published and publications of other scientists are accessed and used, a sampling strategy was applied involving three sampling dimensions:

  • Disciplines: First, and as already described, the study aims to compare the patterns of use of repositories in two disciplines. Therefore, half of the sample consists of astronomers, while the other half are mathematicians.

  • Countries: Second, the study aims to compare scientists from different countries. The rationale behind this is that the financial conditions under which scientists conduct their research and participate in the communication system may influence how they use repositories and the content deposited on them. Therefore, half of the interviewees come from a country where science is relatively well funded (Germany), while the other half come from a country where funding for science is rather scarce (South Africa).

  • Cohort: Third, the study compares different cohorts of scientists. The rationale behind this dimension is that scientists may primarily develop routines in the use of the publication infrastructure at an early point in their scientific career and that the routines might therefore differ. Half of the interviewees started publishing before the advent of free electronic publishing (before 1991), while the other half began when free electronic publishing was already in place in their discipline (after 1995).Footnote 13
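The three sampling dimensions span a grid of 2 × 2 × 2 = 8 combinations. The following is a minimal illustrative sketch in Python and not part of the study itself; the names DIMENSIONS and sampling_cells are hypothetical. How the 20 interviewees are distributed across the individual combinations is not stated here, so the sketch only enumerates the cells.

```python
from itertools import product

# Sampling dimensions as described in the text; each has two values.
DIMENSIONS = {
    "discipline": ["astronomy", "mathematics"],
    "country": ["Germany", "South Africa"],
    "cohort": ["before 1991", "after 1995"],
}


def sampling_cells(dimensions):
    """Return every combination of dimension values as a dict."""
    names = list(dimensions)
    return [dict(zip(names, values)) for values in product(*dimensions.values())]


cells = sampling_cells(DIMENSIONS)
for cell in cells:
    print(cell)
```

Each printed line is one cell of the design, for example astronomers from Germany who started publishing before 1991.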

The expert interviews, conducted in February/March 2012, focus on open access and address scientists both in their role as authors and as readers in the communication system of their discipline. The length of the interviews varied between 39 and 122 minutes, and an interview guideline was used to conduct them. The interviews were transcribed using a simple transcription scheme and analyzed by adopting the grounded theory coding process.Footnote 14 For the analysis and the interpretation of the interview data, Atlas.ti software was used.

Empirical Results

The empirical section analyzes how disciplinary repositories and the content deposited on them are being used by the interviewees. In the course of the analysis, it turned out that the dimension ‘discipline’ is more important than ‘cohort’ and ‘country.’ Therefore, the results section is organized by this dimension.

Repositories as Used by Authors

Astronomy

For astronomy, we start by asking why authors self-archive their publications on the arXiv and what kind of resources the repository provides for them. From the perspective of many protagonists of the open access movement, the main goal of self-archiving is to provide access to research and to make it usable. In contrast, the analysis of the interviews shows that the astronomers have something different in mind in the first place. Two astronomers explain:

The main reason is that once it’s been accepted, it takes a few months to get it published, so to disseminate it quicker, that information, that’s one main reason. (I 3, 0:11:54)

I think everyone wants to get their work out into the public domain as soon as possible. That’s the driving reason. (I 15, 00:45:28)

The most important objective of self-archiving is not to provide access to one’s research but to make it available at an earlier point in time. Given that a number of steps have to be taken between submission and acceptance of a manuscript and also between acceptance and publication, self-archiving saves time, as the upload to the arXiv repository offers immediate access. In other words, the resource that is provided by the repository can be characterized as speedy dissemination. However, the interviews also show that the relevance of speed differs between subfields within astronomy.

The second prominent motive for self-archiving is the provision of access, but the reasons why this is important differ in the interviews with astronomers:

Well my intention is […] just because I deal with Russian astronomers, I know that they do not have the funds to subscribe to Astronomy & Astrophysics.Footnote 15 And those colleagues can only read research that is archived on this server. I think it is good for those people if my own contributions are accessible via that way. (I 14, 00:34:10, own translation)

The other one is accessibility, if I know I have published proceedings for some conference and it will eventually after one year appear or it would appear in that book and only the conference delegates will get the copy of the book, then I would like to upload a copy of mine so that other people who might be interested can get a copy electronically even if they don’t have the book or if they have not attended the conference. So for me it’s about accessibility. It’s a nice record for myself, I know okay I have been to ten conferences, I have written ten proceedings and if somebody wants it, I say, oh okay, go to the arXiv, it’s all there. I don’t have to look on my hard drive and so on. (I 3, 00:12:53)

The aim of self-archiving is accessibility in both cases, but the problem self-archiving reacts to is framed in a different way: In the first quotation, the astronomer, who works in the field of cosmic dust, reflects on the needs of other astronomers who are marginalized by the subscription model because their research institutions lack funds. Self-archiving is a means to include such recipients. In the second quotation, self-archiving refers not to a specific group of colleagues, their local situation, and insufficient funds but to deficits of a communication channel itself. The conference proceedings that are mentioned in the quotation are sometimes available in print for the participants of a conference only, and self-archiving is a means to compensate for the lack of reach.

As a third motive, the arXiv is used as a platform to gather feedback from colleagues and to improve a manuscript before it is submitted to a journal. One astronomer describes his manuscript preparation strategy, which includes a feedback loop on the repository.

Normally what I would do is I’d send it to a couple of people I know to have a look, to comment. I would like a week turnaround time for that, and then send it to the arXiv after that. Then I have another week turnaround time and then after that, once I’ve had varied comments from various people at that point I submit to the journal. […] Feedback from the arXiv is very important, yeah. (I 13, 00:15:18)

I 13 follows a multi-step strategy that gradually increases the size of the group of astronomers who provide feedback. The strategy aims to gather evaluations of the quality of the manuscript and suggestions on how to improve it. In this context of use, self-archiving is a means to adapt a manuscript to the standards of the field and thereby to improve its chance of passing peer review. In this strategy, the repository provides the resource of a two-way medium that allows organizing the feedback of a specific community on new knowledge claims and findings.

Mathematics

Let us now turn to mathematics. What kind of motives for self-archiving can be found in the interviews and how do they differ from astronomy? In contrast to astronomy, ‘providing access to publications’ is the most prominent motive here. One interviewee refers to the limitations of the reach of journals and points out that

[…] my own stuff appears here [on the arXiv, author] in principle and is therefore freely available online. (I 16, 01:08:17)

As in the quotation of the astronomer above, providing access to a specific group of colleagues who would otherwise be excluded can also be found as a motive in the interviews with mathematicians:

But the papers that appeared in older editions of that, I’ve noticed people from overseas can’t get hold of. So I’ve listed those papers on arXiv myself. (I 10, 00:07:20)

The second motive also has to do with gaining time but the context differs from astronomy. In some fields of astronomy, the speed of the research frontier results in a race for priority and leads to self-archiving as a means to protect priority. In the interviews with mathematicians, a different threat to priority claims occurs.

If I’ve got research that is submitted for publication but hasn’t been accepted yet, I sometimes put that on the arXiv. If I feel that the whole review process is taking too long and I would like to talk about the work at a conference, but I want to make sure that my intellectual property is protected, I put it on the arXiv. (I 10, 00:08:05)

Typically, mathematics is not known for high dynamics in the evolution of knowledgeFootnote 16 and a threat to priority is usually not caused by competing colleagues. Instead, mathematicians have to struggle with a different problem: the duration between the submission of a manuscript and its publication. This time span can be extremely longFootnote 17 and the reason for it is explained by another interviewee:

Peer review is of high importance, and can take quite a long time. Sometimes, it takes more than two years from submission to publication and some cases are even worse. […] the reason for this has to do with the fact that understanding an article in mathematics is difficult as one has to go through the proof step-by-step and one might not get it. Sometimes the author’s argument fails to elaborate an important point and one does not find the idea that is necessary to understand the proof – that is quite difficult and one has to be an expert in the particular field. Thus, I would not be able to review a manuscript with precision beyond my narrow field of expertise. (I 8, 00:28:53, my translation)

Besides the processes in the editorial offices and the production steps of the publisher, there are two factors that increase the time span until publication in mathematics. First, mathematics is a discipline with a high degree of differentiation, which allows mathematicians to review contributions only within a narrow field of expertise. The identification of suitable referees can therefore be time consuming. Second, as the interviewee points out, the complexity of the task of reviewing a manuscript is high and can cause problems for the referee. Like the author of a manuscript, the referee can fail to understand the mathematical problem, the idea of a proof and its execution, or may be able to evaluate it only after a longer period of time.

Long time spans between submission and publication can delay the communication of research or make it riskier. A mathematician could choose to wait until his manuscript has been published in a journal (or at least until it is accepted) before he presents his research at a conference. Alternatively, he could decide to present his research at conferences before publication, but would then run the risk that a colleague might pick up the results and publish them first. Given these two options, self-archiving of unpublished research is a means to protect priority claims, as the manuscript is freely available online and the act of making it public is documented by a time stamp that is assigned by the repository.

Besides the third motive (self-archiving as a means to gather feedback from colleagues and to improve the chances of passing peer review at a journal), a fourth motive can be found in the interviews with mathematicians:

I think one can increase it by being more visible. […]. And so the citation rate goes up I think, the more visible, the more easily accessible the paper is. (I 10, 00:37:59)

The idea here is that publishing an article in a journal does not adequately reach all members of the community that is addressed by the research. Within this context of use, the repository acts as a resource that attracts additional attention and hence boosts citations.Footnote 18

After having reconstructed the motives for self-archiving within the two disciplines, three findings are worth highlighting. First, repositories provide different resources to authors: The increased reach of research, which is often highlighted as the main advantage of OA for authors, is only one type besides speedy dissemination, provision of feedback, protection of priority claims and boosting citations. Second, the two disciplines differ with regard to the relevance of the different types of resources. Third, in both disciplines there are motives that foster early self-archiving not only before publication but even before peer review is completed. One such motive is the race for priority (astronomy) or the protection of priority claims (mathematics); the other is gathering feedback from colleagues to improve the chances of passing the peer review process at a journal.

Readers’ Usage of Self-Archived Manuscripts

Given that self-archiving at such an early point in time de facto bypasses peer review, and given that peer review is highly regarded in both disciplines, it is asked in a next step whether the readers take the possible non-peer reviewed nature of manuscripts on repositories into account. In other words: Are there any specific routines on the side of the readers that react to this characteristic of preprints?

Astronomy

Starting again with astronomy, the interviewees show high awareness of the location at which they access the research of their colleagues. When it comes to repositories, four types of routines can be distinguished that support an assessment and make content deposited on repositories usable.

Interpretation of metadata. First, readers are highly aware of the existence of preprints on the arXiv repository that have not passed peer review or might not even have been subject to it. A first routine is technically supported by the arXiv. In the course of the self-archiving procedure, authors have to provide metadata for their manuscript. Besides mandatory information like author, abstract and subject classification, there is the optional field ‘comments.’ According to the instructions for self-archiving, it is “proper” to use this field to provide information about the status of the manuscript at the journal, like “to be published in,” or “submitted to.”Footnote 19 This information is interpreted by the readers:

If they are on a preprint-server and did not appear in a refereed journal, one would not use them. Or, more precise: I wouldn’t use them. (I 4, 00:11:29, my translation)

This interviewee distinguishes between manuscripts that have already been published in a journal and can be used (in the sense that they can be trusted and referred to) and manuscripts that have not yet been published and, from his perspective, cannot be used. Other astronomers are less restrictive and refer to other kinds of information. Another interviewee describes his assessment of unpublished manuscripts as follows:

Would I trust them less? Not necessarily – it depends also who the author is. That also brings an interesting point, because you sort of know that the work of certain people and so on. […] I mean in that sort of areas where I work in – close binaries, it’s a small community, there are a few hundred people […] so you know most of the people who are working on the kinds of things. (I 12, 00:40:59).

In this case, trust in a manuscript and the possibility of using it result from an interpretation of the author’s name. If the author is known to the reader for his research or, in other words, if the reader has a positive preconception of the author’s work, a preprint will be trusted even though it has not yet passed peer review. Hence, trust in the institution of peer review can be replaced by personal trust.

Restriction of citation: A second way of dealing with preprints is to restrict their citation to infrequent situations that are regarded as exceptional. One astronomer describes such a situation: Footnote 20

It [The citation of preprints, author] only happened a couple of times. […] Because it was directly relevant to the work I was publishing. I can’t ignore the fact that this other piece of work was out there. […] It would be okay for me to do it, I think. I’m not sure, but if I’m aware of other people’s work, I will cite it. If it’s relevant to the work I’m working on. (I 15, 00:16:01)

In this quotation, the interviewee describes a normative dilemma. On the one hand, he knows about a manuscript of immediate relevance to his research and feels the normative obligation to acknowledge it. On the other hand, he is committed to the norm of institutionalized skepticism and regards only peer-reviewed research as ready for citation. The interviewee resolves the dilemma by giving priority to the norm of recognizing his colleagues’ prior work.

According to the interviews with astronomers, there is a second situation in which the citation of a preprint is considered legitimate. This can be called an unproblematic citation:

For example, if you have an overview talk on a conference, on overviews about how many pulsars have been discovered lately and things like that then it’s new news, it’s not been published somewhere else, it’s the latest update. So it’s something insignificant […] Then it’s okay, they say there has been a talk by so and so and this is the latest numbers, but it’s not going to change the big issue I address in my paper. So I wouldn’t really place big important things on pre-review papers, it’s probably better to go to the journals, journal papers. So there is a small role for that I would say, but yeah, keep it to a minimum. (I 3, 00:15:25)

This interviewee sharply distinguishes between references that are central to his argument and those that are marginal. Only in the latter case does the citation of preprints seem appropriate to him.

From the perspective of the communication system of science, both ways of restricting citations of preprints can be interpreted as mechanisms that hinder the accumulation of errors. In the first case, the higher risk of building on erroneous results is limited to situations in which a normative commitment requires the acknowledgment of previous work. In the second case, the argument of the citing article will not be affected, as the citation is located in the wider context of the presentation of results but does not support the core argument or finding.

Trustworthy components of preprints: Besides the context in which a non-peer-reviewed preprint is cited, a subtle distinction is made between trustworthy and untrustworthy components of a preprint. In empirical astronomy, research data are trusted to a greater extent than interpretations of data. According to this distinction, peer review is not regarded as equally important for different components of a manuscript:

But, I mean […] it [peer review, author] adds value to the publications, often publications are much better following the peer review. So I refer to observations that could be on the web or so on. But papers and interpretation should have been subject to peer review. (I 12, 00:39:46)

One does not read the whole article, one looks at a few figures and if there are data in it […] one can use that from a paper even if it is submitted only, yeah. (I 14, 01:05:29, my translation)

Especially in this area where I’m quite interested in the observation on astronomy so the simple just reporting of observations doesn’t necessarily need to be peer reviewed. It’s the interpretation of the results, of the data that needs peer reviewing really. (I 15, 00:19:21)

While observational data are considered trustworthy and do not require an evaluation by peers, the same does not hold for interpretations and conclusions. These are suspected of containing errors, flaws or subjectivity. According to this distinction, data can be used in a pragmatic way, while such a use is out of the question for the other components of a manuscript.

To understand why data are exempted from institutional doubt, it is helpful to consider how observational data are created in astronomy. Today, progress in observational astronomy depends heavily on a relatively small number of earthbound and spaceborne optical, infrared and radio telescopes. Data published in observational astronomy originate almost exclusively from no more than 150 observatories and are distributed disproportionately among them.Footnote 21 Because of the cost of observation time at these facilities, observations follow a strict schedule and are delegated to experts including technical staff. After the observation, the data are processed, compressed and quality controlled by a so-called data pipeline.Footnote 22 This process is black-boxed and can hardly be evaluated by outsiders. One can conclude that the trust in observational data expressed in the quotations above is rooted in the organization of observations in astronomy, with its delegation of data production and processing to experts and software.

Mathematics

How do readers in mathematics deal with preprints, what kinds of routines do they apply, and do these differ from those in astronomy? In the interviews with mathematicians, there are also ample passages that refer to the potentially non-peer-reviewed nature of preprints.

Interpretation of metadata: The first type of routine for dealing with preprints is already known from astronomy and consists in interpreting the contextual information of a preprint. Following this routine, the authors’ names indicate trustworthiness or at least notability of results:

Also on the level of certain authors you start to trust, you trust their results from past experience you know pretty much that if they put something up [on the arXiv, author], it’s quite sure that there are not too many mistakes in it. (I 2, 00:12:14)

Another mathematician states that recommendations from valued colleagues are an important criterion for identifying publications worth reading. In this context, a preprint is deemed worth the time not because of past experiences with the author but because of the qualified assessment of a colleague.

Reading an article in mathematics is troublesome as it often takes a whole day to understand it. And if I read crap I will lose that day. […] Therefore, I only read unpublished articles written by an acquaintance or an article is recommended by someone I appreciate. (I 8, 00:10:58, my translation)

Plausibility checks: The second routine that creates trust in preprints can be called a plausibility check and is used in pure mathematics to check a proof.

I would try to understand the basic idea and check if it is correct. […] You can see quite fast, aha, this is correct, there is the new idea and therefore he made it. Then you think to yourself “Why didn’t I get the idea myself?” In particular if you are working on a field for some time. Then you can assess the idea quite fast even if you do not overlook the details of the argument. […] One could say a plausibility check. (I 5, 00:11:52, my translation).

The plausibility check refers to one of the two levels of mathematical thinking distinguished by Bettina Heintz. The first is the understanding of a mathematical problem as an open and uncompleted process, quite similar to the understanding of social situations (Heintz 2000: 223), while the second is the check of the correctness of each individual step of a proof. The quotation refers to the first level of thinking, which allows a quick assessment of whether or not it is worth moving to the second level and going into the details.

Comparison of evaluation: In mathematics, trust in the results of preprints is not created in a single act but in a process that consists of different steps. One step is the exchange with colleagues that leads to a consensus model of truth (Thom 1971; Heintz 2000: 178). To decide whether or not a paper is worth reading, specialists in a particular field discuss new preprints and share their impressions of the quality. Unlike the first routine, the comparison of evaluation is not restricted to a simple recommendation but refers more to the content, e.g., the mathematical problem, the solution and the argument that is developed in it.

In most cases, the community is rather small. You know the people anyway. And if someone comes up with something new, one would talk about it […] “Did you read that? Is it something valuable or not?” (I 6, 00:17:22, my translation)

Conclusion

The empirical results presented in this article can be summarized in three points. First, it could be shown that green OA plays an important role in the communication system of both disciplines. In astronomy as well as mathematics, a central disciplinary repository was established at an early point, namely, at the beginning (astronomy) and end (mathematics) of the 1990s, thus even before the term open access was coined in a science policy debate. In both disciplines, the repository did not replace journals but acts as a second layer of the communication system (Gunnarsdóttir 2005). The reason why repositories have not replaced journals can be pinpointed with reference to the four functions of the communication system of science. Given that the point in time of self-archiving is recorded (as part of the rule aspect of the infrastructure), repositories register claims for new results and findings and thus exercise the registration function. As already stated, repositories and journals share the dissemination function as well as the archiving function, which allows a reconstruction of old and new knowledge claims. The only function that remains a monopoly of journals is the symbolic appreciation of a contribution, because peer review is applied exclusively there. Regarding the author, the inclusion in the communication system of science via self-archiving of manuscripts can therefore be characterized as incomplete since the symbolic appreciation is missing. For that reason, authors usually also strive for publication in a journal. Regarding the reader, the inclusion in the communication system through the reception of manuscripts from repositories tends to be riskier, since the manuscripts found in repositories might be preprints and may not have been subject to peer review. One can therefore conclude that as long as repositories do not provide the resource of symbolic appreciation, they are unlikely to replace journals.

Second, the theoretical framework suggests analyzing the usage of repositories in terms of rules and resources that are both components of routines (Schulz-Schaeffer 1999, 2000). It sharpens the perspective and draws attention to the fact that repositories provide more than one type of resource. The increase of reach, speedy dissemination, provision of feedback and the boost of citations are different resources that an author may desire, depending on his or her situation when publishing research. Regarding the reader, it could be shown that content self-archived on a repository is not a resource per se but is of value for readers only in the context of routines. These include, for example, the interpretation of metadata of self-archived publications (such as the status of a manuscript at a journal) or subtle distinctions between trustworthy and non-trustworthy components of a preprint.

Third, and in contrast to studies that aim to understand the adoption of green OA by looking at the motivations of authors (e.g., Kim 2011), this empirical analysis offers a two-sided perspective on the use of repositories: the adoption of OA is understood as a result of a co-stabilization of authors’ and readers’ routines that mutually relate to each other. In other words, a complementarity between authors’ and readers’ routines can be found in both disciplines. This finding suggests that the emergence of complementary routines could be a necessary condition for the green OA model to succeed. Finally, the analysis shows that motives for self-archiving and routines of readers differ between the two disciplines. Furthermore, there are some links between epistemic characteristics of the disciplines and the routines of action. For authors in astronomy, competition for priority is an important driver for self-archiving, while in mathematics, the long time span necessary for peer review (caused to some extent by the complexity of the task) plays an important role. For readers in astronomy, trust in preprints relates to how data are collected and processed, while in mathematics, routines to establish trust refer to a specific type of mathematical thinking about proofs. The routines seem to be anchored in the two disciplines. Regarding a possible generalization of the results, it therefore seems unlikely that green OA works along the same principles in science at large.