Introduction

The field of third sector studies is inherently interdisciplinary, drawing on political science, management, sociology and social work, among other disciplines. Within the field, a large share of studies (between 40% and 80%) employ qualitative methods such as interviews, focus groups and ethnographic observation (von Schnurbein et al., 2018). To ensure rigor, qualitative researchers devote considerable time to developing interview guides, consent forms and coding frameworks. While there is a vast literature on the collection and analysis of qualitative data, comparatively little attention has been paid to audio transcription: the conversion of recorded audio into a written form that can be analyzed. Despite advances in qualitative methodologies and increasing attention to positionality, subjectivity and reliability in qualitative data analysis, the transcription of interviews and focus groups is often presented uncritically as a direct conversion of recorded audio to text. As the technology to facilitate transcription improves, many researchers have shifted from human transcription to voice-to-text software and to companies that use AI. These technological advances, along with shifts in the way that research is undertaken (for example, the growth of interviewing via video conferencing as a result of the COVID-19 pandemic), make it all the more urgent to reflect critically on the place of transcription in third sector research.

In this article, I explore the place of transcription in qualitative research, with a focus on its importance for third sector researchers. The article is structured as follows. First, I review the qualitative methods literature on audio transcription and the key themes that arise. Next, I report on a review of recent qualitative research articles in Voluntas and the ways their authors discuss transcription. Finally, I propose a framework to help qualitative third sector researchers build transcription into their research design and decide what to report about the transcription process when writing up their research.

Audio Transcription: What We Know

At a basic level, transcription refers to the transformation of recorded audio (usually the spoken word) into a written form that can be used to analyze a particular phenomenon or event (Duranti, 2006). For many qualitative researchers, transcription has become a fairly taken-for-granted part of the research process. In this section, I review the methods literature on audio (and video) transcription as it relates to qualitative research on the third sector, focusing on three key areas: how transcription is undertaken, epistemological and ethical considerations, and the role of technology.

Qualitative Research and Transcription

While quantitative research seeks to explain, generalize and predict patterns through the analysis of variables, qualitative research is more interested in understanding and interpreting the socially constructed world around us (Bryman, 2016). Qualitative data are therefore collected through documents, observation and interviews, and interviews are often recorded so that they can be analyzed as documents. In third sector research, recordings are most commonly made of interviews and focus groups, but may also be made of meetings, events and other activities so that researchers do not have to rely on their powers of recall or scribbled notes.

Transcription is a notoriously time-consuming and often tedious task: transcribing one hour of audio can take anywhere from three to more than eight hours, depending on typing speed. Transcription is not, however, a mechanical process in which the written document becomes an objective record of the event; indeed, written text differs from the spoken word in syntax, word choice and accepted grammar (Davidson, 2009). The transcriber therefore has to make subjective decisions throughout about what to include or leave out, and whether to correct mistakes, grammar and repetitions. These choices have been described as a spectrum between “naturalized” transcription (or “intelligent verbatim”), which adapts the oral to written norms, and “denaturalized” transcription (“full verbatim”), in which everything is left in, including utterances, mistakes, repetitions and all grammatical errors (Bucholtz, 2000).

While some contend that denaturalized transcription is more ‘accurate’, the same can equally be argued for naturalized transcription, since it allows the transcriber to omit occasions when, for instance, an individual misspeaks and corrects themselves, and so to record something closer to what was intended and to how the interviewee might have presented themselves in writing. As Lapadat (2000, p. 206) explains, “Spoken language is structured and accomplished differently than written text, so when talk is re-presented as written text, it is not surprising that readers draw on their knowledge of written language to evaluate it.” Nonverbal cues such as laughter and tone of voice (e.g. sarcasm, frustration, emphasis), as well as the transcriber’s use or omission of punctuation, can also drastically alter the meaning or intention of what an individual says. In addition, the transcriber must decide how much contextual information to include, such as interruptions, crosstalk and inaudible segments (Lapadat, 2000). Because qualitative methods are used in such a wide range of research, there is no single set of rules for transcription; rather, these decisions must be based on the research questions and approach.

Epistemological and Ethical Considerations

Because the researcher (or external transcriber) must make these decisions while translating audio into written text, transcription is an inherently interpretive and political act, influenced by the transcriber’s own assumptions and biases (Jaffe, 2007). Every choice the transcriber makes therefore shapes how the research participant is portrayed and determines which knowledge or information is treated as relevant and valuable and which is not. Indeed, two transcribers may hear the same recording differently and select different spoken material as relevant (Stelma & Cameron, 2007). As Davidson (2009) notes (and as I explore in further detail later in this article), despite being a highly interpretive process, transcription is frequently depicted using positivist norms of knowledge creation.

Transcription also raises ethical considerations and dilemmas. When working with disadvantaged communities, deciding how to depict research participants in written text can highlight the challenges of ethical representation. As Kvale (1996, pp. 172–3) notes, “Be mindful that the publication of incoherent and repetitive verbatim interview transcripts may involve an unethical stigmatization of specific persons or groups of people”. Oliver et al. (2005) similarly demonstrate how transcribers must decide how to represent participants’ use of slang, colloquialisms and accents in ways that are accurate but also respectful of the respondent’s intended meaning. Some researchers send finished transcripts to interviewees for approval, in order to honor commitments to fully informed consent, to ensure transcription accuracy or, in some cases, to redress the balance of power between researcher and interviewee. As Mero-Jaffe (2011) describes, on the one hand, this may empower interviewees to control the way they are portrayed in the research. On the other hand, Mero-Jaffe found that seeking transcript approval sometimes increased interviewees’ embarrassment at the way their statements appeared in text. This may be especially problematic with full verbatim transcripts.

Technology and Transcription

As technology improves and AI becomes increasingly able to create written text from recorded audio, researchers might ask: is human transcription even necessary? Computer Assisted Qualitative Data Analysis Software (CAQDAS) such as NVivo, Atlas.ti and MAXQDA now gives qualitative researchers the option to forgo audio-to-text transcription altogether and instead engage in live coding of audio or video files. Using this method, researchers first watch or listen to recordings to code for nonverbal cues; they then take notes and code the recordings against predefined themes, matching these with time codes and nonverbal cues; finally, they transcribe specific quotes of interest from the recording (Parameswaran et al., 2020). This process may improve immersion in the data and allow researchers to account for dynamics that are often lost in complete audio-to-text transcription, such as group interactions and nonverbal communication.

There is a considerable need to develop the evidence base on the role of AI in transcription for qualitative research, as many important publications on the issue (e.g. Gibbs et al., 2002; Markle et al., 2011) are now out of date given the swift pace of change in AI technologies. Over the last few years, voice and speech recognition technologies have improved dramatically and may now be able to provide researchers with “good enough” first drafts of transcripts (Bokhove & Downey, 2018), provided that certain conditions are met (e.g. a limited number of speakers and excellent audio quality). Using these technologies can save researchers time and money. As a result of the COVID-19 pandemic, many qualitative researchers now conduct interviews over Zoom or other video conferencing apps, a trend that may continue beyond the pandemic (Dodds & Hess, 2020). Zoom offers automated live transcription, which benefits from the generally clear audio of a video conference, compared with in-person interviews, where there is a greater chance of audio interference and background noise that may go undetected in the moment.

While AI may offer a cheaper and quicker alternative to human transcription, the resulting transcripts need to be meticulously checked by the researcher to ensure accuracy, fill in missing details and edit for context and readability. Using cloud-based AI transcription services also raises ethical concerns about data protection and confidentiality (Da Silva, 2021). Many of the subjective decisions involved in creating a transcript are ones that AI cannot make, such as where to place punctuation, which words to include or exclude (such as filler words and hesitations) and how to denote interruptions and nonverbal cues. Voice-to-text software is also generally less accurate at distinguishing multiple voices or different accents (Bokhove & Downey, 2018). Several studies have considered how researchers or transcribers can use voice recognition software by listening to an interview and repeating its spoken content into the software, as a shortcut to typing (Matheson, 2007; Tilley, 2003), but the shortcomings and cautions above still apply.

Transcription and Third Sector Research

Transcription matters for third sector research because qualitative methodologies account for a large share of studies of nonprofits: an estimated 40–80% of research published in this field (Igalla et al., 2019; Laurett & Ferreira, 2018; von Schnurbein et al., 2018). Audio transcription is also particularly important in this field. As qualitative researchers (who aim to produce rich, rigorous description) and as third sector researchers (who study organizations that seek to improve society and who may be working with traditionally disenfranchised or disadvantaged communities), we have a particular ethical obligation to ensure that our research provides an accurate depiction of our participants’ lives and of the organizations with which they are involved.

However, transcription is perhaps the most underacknowledged aspect of the qualitative research process, and this is also evident in the way that transcription is discussed in research articles. To survey how the transcription process is currently depicted in third sector research, I reviewed the 212 most recent papers in Voluntas that include the word ‘interview’, exploring how qualitative research articles discuss transcription as part of their methodology (see Footnote 1). Of these papers, 79 were deemed not applicable because they were quantitative papers that mentioned interviews in another context, used the word ‘interview’ to denote the administering of a structured questionnaire, or were systematic reviews reporting on other research. This left 133 articles, which were analyzed to explore the extent to which transcription was described, if at all, as part of the research methodology (see Footnote 2).

The analysis (illustrated in Fig. 1) found that 41% of papers employing interviews as a research method did not mention transcription at all, while 11% mentioned transcripts but not the process of transcription. In these papers it was not clear whether interviews were recorded or whether researchers relied on written notes taken during interviews, nor how information from the oral interview was converted into analyzable text. A further 19% discussed transcription only in a single sentence along the lines of “interviews were recorded and transcribed”, while 26% gave some further information, such as who undertook the transcription (the researcher(s), a research assistant or a commercial company) or that the interviews were transcribed ‘verbatim’ (none explained what they meant by this term). These findings are similar to those of a study of qualitative research in nursing, which found that 66% of articles reported solely that interviews were transcribed, while the remaining articles indicated only “full” or “verbatim” transcription to clarify the process (Wellard & McKenna, 2001). I also surveyed the first authors’ departmental affiliations/fields of study to gauge any differences between academic fields (Table 1), but found no considerable differences.

Fig. 1 Transcription in Voluntas qualitative articles

Table 1 Description of transcription and field of first authors

The fact that over half of the Voluntas articles using interviews as a research method make no mention of the transcription process is a problem for transparency in qualitative research. This tendency may be a symptom of the greater challenges that qualitative researchers face in academic publishing, where prescribed word limits disadvantage longer-form, in-depth qualitative research (Moravcsik, 2014). In their efforts to ensure that qualitative research meets requirements for transparency, rigor and reliability, researchers concentrate on describing case and participant selection and data analysis, while transcription, the conduit between data collection and analysis, remains unproblematized. This emphasis reflects the growing influence of positivist views of validity. Ignoring the subjective decisions and theoretical perspectives that shape the creation of a transcript therefore inadvertently presupposes a positivist stance on the objective nature of data, which is inconsistent with qualitative methodologies.

A Framework for Undertaking and Reporting on Transcription

As shown in the previous section, there is currently widespread neglect of transcription as part of interpretive qualitative research on the third sector. In this section, I present key elements for third sector researchers to consider in regard to transcription, both to ensure rigor as part of the qualitative research process and in writing up qualitative research, drawing upon examples of good practice from previous research in Voluntas. These recommendations are based on a review of the literature as well as my personal experience as a qualitative researcher, qualitative methods teacher, and professional transcriber.

Before Transcribing: Ethics and Data Management

All decisions regarding research design, data collection and data management, including transcription, should be made at the beginning of a qualitative research project, when applying for ethics/IRB approval from one’s university. At this stage, the researcher should also confirm with their university whether they have a budget for transcription. Undertaking ethical qualitative research means ensuring standards of transparency, informed consent, confidentiality and protection of the data obtained from the research (Blaxter et al., 2001). Increasing concerns about data protection, and legislation such as the GDPR in the European Union, have prompted many universities to institute strict rules about where research data can be stored; some universities do not allow the use of certain cloud storage services, such as Dropbox. These considerations should be taken into account when deciding how to undertake and record interviews (Da Silva, 2021). For instance, if you are recording on your mobile phone, you need to know whether recordings automatically upload to the cloud. For this reason, it may be preferable to use a traditional digital recorder, so that you can manually transfer the files to your computer and know exactly where everything is saved.

Before Transcribing: The Interview

Before transcription can even be considered, researchers must ensure that they have a suitable audio recording, and this begins with the interview itself. Whenever possible, interviews should be conducted in a quiet environment without background noise or interruptions, and the recording device should be placed close enough to the respondent to pick up their voice clearly. While recording interviews on a mobile phone has become increasingly common and easy, using a backup recording device is always a good idea to guard against flat batteries, full memory cards and human error. If recording on your mobile phone, it is also critical to remember to switch it to airplane or ‘do not disturb’ mode for the duration of the interview.

To Transcribe or Not to Transcribe?

While transcription of audio recordings is considered standard practice in qualitative research (Tracy, 2019), it is not the only way of undertaking qualitative interviews, and there are many reasons why it may not be desirable, appropriate or possible to record interviews at all. In third sector research, this is most commonly the case in community-based research, research with political elites or research in challenging environments. The authors of one article explained that they did not record interviews because “In sectors marked by fear, intimidation, and strong security apparatuses, recording devices would almost certainly have led to self-censorship and limited our access.” (Atia & Herrold, 2018, p. 1046). Similarly, researchers may be unable to record in community settings because of sub-optimal recording conditions (e.g. meeting outside or in noisy environments) or because using a recording device makes participants uncomfortable or reinforces power relations between the researcher and participants (Quintanilha et al., 2015).

If researchers decide not to transcribe recordings comprehensively, or not to record qualitative fieldwork at all, this should be noted and explained in the methods section. Other methods of notetaking and analysis may be better suited to certain types of ethnographic research, such as reflexive journaling (Halcomb & Davidson, 2006) or Systematic and Reflexive Interviewing and Reporting, a process in which a researcher and research assistant jointly interview participants and write their own reports, including observations and analyses, which are then analyzed collaboratively (Loubere, 2017).

How to Transcribe?

Traditionally, transcribers used foot pedals to play, rewind and fast-forward tape recordings while they typed. Now that audio files are digital, several free and low-cost programs (such as Express Scribe and oTranscribe) let transcribers set up hotkeys to perform the same actions without having to navigate away from their transcript document.

The degree of detail to include in transcripts should be decided before interviews are transcribed. This is important because previous research has demonstrated that the format selected for transcription significantly affects how the researcher interprets the data (Mishler, 2003; Packer, 2017). There is no single best or “most accurate” style of transcription; rather, researchers should consider the theoretical background and research questions of their study to determine which point on the spectrum from full verbatim to intelligent verbatim is most appropriate. Because third sector research is most commonly associated with social science and business disciplines rather than linguistics, it will rarely be necessary or appropriate to employ the conventions of conversation analysis or extreme levels of denaturalized transcription (Bucholtz, 2000). Indeed, a version of naturalized/intelligent verbatim transcription may often be the most appropriate choice, so that participants’ quotes included in written work are more ‘readable’ and do not include excessive repetitions or verbal fillers such as ‘um’.
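As a purely illustrative aid for readers who handle transcripts as plain text, the mechanical part of tidying a full verbatim utterance towards intelligent verbatim can be sketched in a few lines of Python. This is a hypothetical sketch, not a tool used in this study or a recommended workflow: the filler list, the regular expressions and the function name `to_intelligent_verbatim` are assumptions made for the example, and the code simply deletes a fixed set of fillers and collapses immediately repeated words.

```python
import re

# Hypothetical filler list: which sounds count as "fillers" is a research
# decision for each project, not a fixed rule.
FILLERS = ("um", "uh", "er", "erm")

# A filler as a whole word, with an optional trailing comma and spaces.
FILLER_RE = re.compile(r"\b(?:" + "|".join(FILLERS) + r")\b,?\s*", re.IGNORECASE)

# A word immediately repeated one or more times ("we we", "the the the").
REPEAT_RE = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def to_intelligent_verbatim(utterance: str) -> str:
    """Crudely tidy a full verbatim utterance towards intelligent verbatim."""
    text = FILLER_RE.sub("", utterance)          # drop listed fillers
    text = REPEAT_RE.sub(r"\1", text)            # keep one copy of a repeated word
    text = re.sub(r"\s{2,}", " ", text).strip()  # tidy leftover spacing
    # Restore an initial capital if a leading filler was removed.
    return text[:1].upper() + text[1:]


if __name__ == "__main__":
    print(to_intelligent_verbatim("Um, we we started the the food bank in 2015."))
```

The example prints “We started the food bank in 2015.” Applied to “I think that that is fair.”, however, the same rule returns “I think that is fair.”, altering a grammatical sentence. This is exactly the kind of interpretive decision that a person would still need to check against the audio rather than leave to a script.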

If the researcher determines that naturalized or intelligent verbatim transcription is the most appropriate for their study, several considerations should be heeded to ensure that meaning is not distorted or lost. Most importantly, laughter and other nonverbal cues (such as sighs, huffs, finger-snaps, sobbing or even blowing raspberries) should be included if they convey important meaning. Other decisions about how to transcribe may rest more on personal preference and on producing a document that is easy to analyze in the researcher’s chosen medium. For instance, a wide margin on one side can be useful for researchers who choose to analyze their data on paper or in Microsoft Word, while simpler, free-flowing templates will work better for import into software such as NVivo. It can also be useful to include time stamps for unclear or inaudible statements, or at regular intervals (e.g. every minute), which makes it much easier to check a transcript against the original audio.
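To make the time-stamping suggestion concrete, the following Python sketch assumes (hypothetically) that a transcript is held as a list of segments, each with a start time in seconds, a speaker label and the text. The `Segment` class, the `render` function, the `[hh:mm:ss]` format and the ‘[inaudible]’ tag are illustrative conventions chosen for this example, not a standard transcription format. The sketch stamps the first segment in each new interval and any segment containing ‘[inaudible]’.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of talk (hypothetical structure): start in seconds, speaker, text."""
    start: float
    speaker: str
    text: str


def hms(seconds: float) -> str:
    """Format a time in seconds as an [hh:mm:ss] stamp (fractions are dropped)."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"[{h:02d}:{m:02d}:{s:02d}]"


def render(segments: list[Segment], interval: int = 60) -> str:
    """Render segments as transcript lines with time stamps.

    A stamp is added to the first segment that starts at or after each new
    interval boundary, and to any segment containing "[inaudible]" so that it
    can be located in the audio and checked later.
    """
    lines = []
    next_mark = 0  # the next interval boundary (in seconds) still to be stamped
    for seg in segments:
        needs_stamp = seg.start >= next_mark or "[inaudible]" in seg.text
        prefix = hms(seg.start) + " " if needs_stamp else ""
        if seg.start >= next_mark:
            # Move to the first boundary after this segment's start time.
            next_mark = (int(seg.start) // interval + 1) * interval
        lines.append(f"{prefix}{seg.speaker}: {seg.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        Segment(0, "Interviewer", "Could you tell me how the organization started?"),
        Segment(4.5, "Respondent", "It began as a tenants' group."),
        Segment(41, "Respondent", "We applied for [inaudible] funding."),
        Segment(63, "Interviewer", "And who led that?"),
    ]
    print(render(example))
```

With the default 60-second interval, the example stamps the start of the interview, the inaudible passage at 41 seconds and the first turn after the one-minute mark; a shorter interval would make checking against the audio easier at some cost to readability.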

Who Transcribes?

As the review of Voluntas articles above showed, the prevalence of the passive voice in reporting on transcription (i.e. “interviews were transcribed”) obscures the important question of who undertook the transcription. If researchers transcribe recordings themselves, it is generally reasonable to assume coherence between the research approach and the approach to transcription, as well as the researcher’s confidence that the written transcript is an accurate record of the interview or event that took place. If, however, researchers choose to outsource transcription to a research assistant or commercial transcription company, they should take care to give detailed and thorough instructions about the elements described above. They should also spot-check transcripts for accuracy, fill in any missed words or inaudible passages, and ensure that the transcripts meet their expectations regarding level of verbatim, style and formatting.

Ideally, researchers should hire transcribers who have specialist knowledge of the subject matter and familiarity with the accents or dialects of the speakers, and should provide them with key information about the project, such as the research questions, important terms and acronyms. Lapadat (2000) offers several useful suggestions for working with hired transcribers to ensure transcription quality and increase rigor. First, rather than fully outsourcing transcription, researchers can transcribe some interviews, or portions of interviews, themselves in order to provide an example for transcribers and develop a transcription protocol. Another option, when employing research assistants to transcribe interviews, is to include them directly in the interviews (as a co-interviewer or observer), so that they are directly involved in the research and its context.

Finally, when working with external transcribers, it can be valuable to encourage them to keep memos about the transcription process and about contextual observations and impressions that may not come through in the written text. For instance, does the interviewee sound tired, frustrated, distracted or nervous? Does the interviewer interrupt the respondent frequently (which the transcriber may choose to edit out for readability)? Did the interview take place somewhere public, like a cafe, which may have made the respondent more guarded? Such information is often lost, particularly in projects that involve multiple research team members (for instance, a PI, multiple interviewers, research assistants and/or professional transcribers).

Writing about Transcription

Because of limited space and word limits, it is not usually possible or desirable to include all of the above details in research articles. First, at a minimum, researchers should state who transcribed the audio recordings, as part of a commitment to ethical and transparent qualitative research. If this was done by anyone other than the researchers, authors should ideally describe the measures taken to ensure accuracy (developing a protocol for transcribers, spot-checking, proofreading, sending transcripts to interviewees if appropriate) and ethical considerations (such as data protection and confidentiality).

Second, researchers should indicate the type of transcription: whether selective (pulling out relevant quotes and themes, or transcribing just the ‘gist’), intelligent verbatim/naturalized or full verbatim/denaturalized. The choice of type of transcription should align with the researcher’s epistemological position and theoretical framework.

Finally, researchers should report any other subjective decisions made during the transcription process, in much the same way that researchers are encouraged to be transparent about their subjectivity and positionality in undertaking interviews and analyzing qualitative data (McCorkel & Myers, 2003). This may include information about selecting the level of verbatim, working with external transcribers, feedback from interviewees on transcripts, or efforts to ensure the accuracy of transcripts and their coherence with the research approach.

The following quotes provide good examples of how to write about transcription:

The interviews, which were conducted in the native language of the interviewees by six female Hebrew-Arabic-speaking interviewers, were recorded, translated, and transcribed verbatim. […] Immediately following the interview, each interviewer transcribed and translated her interviews into Hebrew. In this manner, we sought to achieve a translation that was as close as possible to the interviewer’s insights regarding the participants, and we regarded the interviewers as active agents in the creation of knowledge. (Yanay-Ventura et al., 2020, p. 6)

Three Spanish speaking investigators transcribed all of the interviews from audio recording devices, checked each other’s transcription for accuracy, and analyzed the interviews using thematic analysis (Braun & Clarke, 2006). The transcribers observed the focus groups and took notes on participants’ voices and other identifying traits to help the transcription process go more smoothly. Researchers aided the transcribers in this regard by asking participants at the beginning of the focus groups to introduce themselves using a pseudonym and briefly remark upon how they preferred to spend their time. (Schwingel et al., 2017, p. 170)

In both of these examples, the authors treat transcription as part of the broader research process rather than as an automatic conversion of audio to text. Although there is little clarification of the type of transcription (beyond ‘verbatim’), the discussion of subjective decision-making during transcription and the acknowledgment of the agency of the individuals undertaking it increase transparency and therefore rigor.

Conclusions

Qualitative research can help us to understand important issues affecting the third sector in ways that quantitative methods cannot fully explain, such as how individuals and organizations make sense of public policy and societal challenges, how and why organizations design their services and activities in particular ways, and the intricacies of the relationships between boards, executives, staff and volunteers. Qualitative methods training stresses that an interpretivist epistemological position sees knowledge as socially constructed, yet transcription, a stage at which meaning is created and interpreted, has slipped through the cracks of methodological examination.

In this short article, I have sought to draw attention to this important stage of qualitative data collection and analysis and to call on third sector researchers to reflect critically on transcription, both in conducting research and in writing about it. I have focused primarily on the transcription of interviews rather than focus groups or other multi-person events. All of the points raised in my framework for transcription also apply to these methods of data collection; however, focus groups raise further issues that warrant attention, such as power and the accuracy of transcription when multiple people are speaking and interrupting one another. Researchers working with multi-person recordings should therefore devote more time and consideration to transcription. Finally, voice recognition technology continues to advance and may save researchers considerable time and/or money in transcription; however, I implore scholars to view transcription through an interpretivist rather than a positivist lens, so that the production of written transcripts is not approached as the creation of objective knowledge.