Introduction

Our nonverbal emotional signals help others recognize what we are feeling (Niedenthal, 2007), shape decision-making in interpersonal negotiations (Pietroni et al., 2008; Sinaceur & Tiedens, 2006), and support effective communication with others (Guyer et al., 2019; Ottati et al., 1997; Van Kleef et al., 2011, 2015). Research on nonverbal emotional signals often focuses on facial expressions (e.g., smiling, frowning; Dimberg et al., 2000). However, a growing literature suggests that vocal affect (i.e., conveying emotions through the voice) is also crucial in communication (Guyer et al., 2018; Juslin & Scherer, 2005; Kappas et al., 1991). The present work is a theoretical paper that outlines conceptual principles regarding how vocal affect may shape communication effectiveness. In so doing, we aim to organize past research into a framework that highlights potentially productive avenues for future research.

Vocal Affect in the Communication Process

People recognize vocal affect with high sensitivity and specificity (Banse & Scherer, 1996; Bänziger et al., 2014). Indeed, accurately identifying different emotions in the voice does not even require that listeners understand the language they hear (Elfenbein & Ambady, 2002; Pell et al., 2009), highlighting people’s exquisite sensitivity to vocal affect. Vocal affect can convey a speaker’s underlying personality, attitudes, and feelings (Hall & Schmid Mast, 2007). For example, when people feel specific emotions, they speak with corresponding profiles of speech melody, disturbance, tempo, and other vocal cues (Wallbott & Scherer, 1986).

Vocal affect plays critical roles in social life, including inter-group processes (e.g., in-group advantages in vocal affect recognition; Laukka et al., 2016) and relationship dynamics (e.g., vocal affect patterns involved in demand-withdraw behaviors; Baucom et al., 2011), and it facilitates quick inferences about other people’s perspectives (Juslin & Scherer, 2005). However, research has seldom examined vocal affect’s connection with communicative success (i.e., the successful transfer of information from speaker to recipient). This omission is surprising because communication is so often vocalized (e.g., television, public speeches, private conversations). Furthermore, affect is central to research on at least one major type of communication: persuasion, in which a source attempts to change a recipient’s opinion of an object (see Petty et al., 2003). Yet there is little clarity regarding how and why vocal affect influences the success of communications. The present article provides a theoretical framework to address this gap.

Affective and Cognitive Communication of Evaluations

People often form and communicate their opinions on the basis of both affective content (i.e., how an object makes them feel) and cognitive content (i.e., attributes that an object is believed to possess; Edwards, 1990; Fabrigar & Petty, 1999), and vocal affect may play distinct roles in these contexts. First, people’s opinions about objects are often rooted in affective reactions they have towards those objects (Forgas, 2010; Olson & Kendrick, 2008; Zanna & Rempel, 1988). We borrow the term objects from the attitudes literature, where it refers to any physical object, person, social group, or even abstract concept toward which people can hold opinions (e.g., Ostrom, 1989). Affective communications operate by having listeners feel key emotions that then guide recipients’ post-message opinions (e.g., making people feel disgust towards drugs so that they will dislike drugs). For instance, the ‘This is Your Brain on Drugs’ campaign (Cutler & Thomas, 1994) associated drugs with various disgusting and fearful imagery. The intended logic is that if people associate drugs with disgust and fear, they will dislike drugs, because people generally dislike gross, scary things.

A related question is how to conceptualize the link between a given type of vocal affect (e.g., an excited voice) and an affectively based communication. Vocal affect and the emotion expressed in the content of an affective message can be placed on a continuum from congruence to incongruence (Guyer et al., 2018). Vocal affect is congruent when a speaker vocally expresses an emotion (e.g., sounding scared) that is perceived as matching the emotion present in the message itself (e.g., a fear-based message). Vocal affect is incongruent when it is perceived as differing from the message’s emotions (e.g., a calm voice paired with fear-inducing message content).

But when will people generally perceive two emotions (i.e., the emotion in a message and the emotion in a speaker’s voice) as congruent versus incongruent? Strictly speaking, the voice might have to express an emotion identical to the emotion a message expresses to be considered congruent. More practically, however, people likely perceive a continuum from complete congruence to complete incongruence, and several perceived dimensions of emotions determine the degree of congruence between vocal affect and message content. These dimensions may include the valence (positivity versus negativity) and arousal (enervated versus energetic) of emotions. Dimensional approaches are common in studies of emotion and often receive some evidentiary support (Bakker et al., 2014; Huelsman et al., 1998; Mehrabian, 1996). For example, Remington and colleagues’ (2000) study of self-reported emotions found moderate support for a circumplex model, in which valence and arousal are modeled as two major dimensions of emotions (also see Russell, 1980). Apple and Hecht (1982) also provided evidence that people may rely on arousal and valence dimensions when recognizing vocally expressed emotion (also see Bachorowski, 1999). In sum, there is at least tentative support for the idea that congruence and incongruence can be modeled by varying emotions along valence and arousal dimensions. We label emotions congruent when they match on both valence and arousal, partially incongruent when they match on one dimension but mismatch on the other, and fully incongruent when they mismatch on both dimensions.
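The valence/arousal labeling rule can be made concrete with a short sketch. It assumes, as a simplification, that each emotion is reduced to a binary valence sign and a binary arousal level; the placements below are illustrative assumptions chosen to agree with the examples in this article (a fear-based message paired with fearful, bored, or content voices), not coordinates from any published emotion model. The names `EMOTIONS` and `classify_congruence` are hypothetical.

```python
from enum import Enum


class Valence(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"


class Arousal(Enum):
    HIGH = "high"
    LOW = "low"


# Illustrative (assumed) placements on the two dimensions. Real emotions vary
# continuously; these coarse labels exist only to demonstrate the rule.
EMOTIONS = {
    "fear": (Valence.NEGATIVE, Arousal.HIGH),
    "boredom": (Valence.NEGATIVE, Arousal.LOW),
    "contentment": (Valence.POSITIVE, Arousal.LOW),
    "excitement": (Valence.POSITIVE, Arousal.HIGH),
}


def classify_congruence(voice_emotion: str, message_emotion: str) -> str:
    """Label vocal affect relative to the message's emotion by counting
    how many of the two dimensions (valence, arousal) mismatch."""
    voice = EMOTIONS[voice_emotion]
    message = EMOTIONS[message_emotion]
    mismatches = sum(v != m for v, m in zip(voice, message))
    return {0: "congruent", 1: "partially incongruent", 2: "fully incongruent"}[mismatches]


for voice in ("fear", "boredom", "contentment"):
    print(voice, "->", classify_congruence(voice, "fear"))
```

For a fear-based message, this yields the same labels used for the three voice conditions in the lemphur experiments described below (fearful: congruent; bored: partially incongruent; content: fully incongruent). Because the framework treats congruence as a continuum, a fuller treatment would replace the binary labels with continuous valence and arousal ratings.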

Certainly, modern perspectives often advance more complex systems for understanding emotions (e.g., Ellsworth & Scherer, 2003) than simple valence/arousal frameworks (also see Goudbeek & Scherer, 2010). Our model does not assume the validity of any one system by which people might gauge emotional (in)congruence; the valence/arousal approach is simply a convenient heuristic, because listeners clearly recognize at least these dimensions of emotional speech.

Communication may also rely on cognitive information, which changes people’s beliefs about an object’s positive and/or negative attributes. These attributes often become guides to evaluating the object (Haddock et al., 2008; Mayer & Tormala, 2010; See et al., 2008). For instance, the ‘Know Overdose’ ad campaign has distributed information about the harms of drug use to the public, generally avoiding affect-provoking imagery in favor of facts and statistics (Harm Reduction Coalition, 2020). With the novel CIVA model, which explores the Contextual Influences of Vocal Affect, we propose that vocal affect can influence the outcomes of both affective and cognitive communication types; however, the processes by which vocal affect influences affectively-based and cognitively-based communication differ.

Contextual Influences of Vocal Affect: CIVA

Although numerous psychological mechanisms could link vocal affect to the effective communication of evaluations, we theorize that these mechanisms can be meaningfully organized into three primary categories, each comprising several mechanisms. First, vocal affect may alter recipients’ judgments and attributions regarding their emotions (emotion origin/construal). Second, vocal affect may lead recipients to change the type and/or the magnitude of emotions they are feeling (changing emotions). Third, vocal affect may lead recipients to form inferences about the communication’s source that may then affect how recipients respond to communicated information (communication source inferences). Each category shapes communicative success in a distinct way.

We outline the key features of the CIVA model in Fig. 1, which can be followed from top to bottom along paths that reflect contextual factors in the communication environment. The bolded/unitalicized cells each reflect one of the CIVA categories (i.e., emotion origin/construal, changing emotions, and communication source inferences), and the bolded/italicized cells each reflect one of the CIVA processes (e.g., validation effect, persuasive intent). Broadly, CIVA’s categories of processes are hypothesized to emerge differently (or not at all) for affective versus cognitive messages, which is why message type is the first (i.e., top) question in Fig. 1. For instance, emotion origin/construal processes are more likely to emerge given affective messages, communication source inference processes are more likely to emerge given cognitive messages, and changing emotions processes are hypothesized to operate in substantially different ways given affective versus cognitive messages. Figure 1 gives a broad overview of the complete CIVA model, capturing the critical moderators that we propose determine which category of processes, and which process within a given category, will govern vocal affect’s influence on communication.

Fig. 1

Vocal Affect’s Communication-related Processes Determined by Contextual Factors: the CIVA Model. Note. Although we display pathways as though they are categorical (e.g., “yes” versus “no”) for illustrative clarity, they are generally more likely to be continuous variables.

Category One: Emotion Origin/Construal

First, vocal affect may alter the inferences that recipients make about their emotions. Psychological research has examined people’s inferences and attributions about their emotions, showing that these inferences can prompt very different reactions to the same underlying emotion (Cooper, 2007; Losch & Cacioppo, 1990; Schwarz et al., 1985). For instance, imagine listening to a story claiming that an unfamiliar creature is highly dangerous, which prompts you to feel some fear. You might wonder whether your emotions are proportionate in magnitude, or appropriate in type. People desire social validation in many domains (Cialdini, 2009; Guadagno et al., 2013), and emotions are plausibly no exception. When people seek such validation, vocal affect that is congruent (vs. incongruent) with their emotions may validate or bolster those emotions. This may be important when communicating affective messages that attempt to instill feelings in the recipient in order to communicate an overall judgment (e.g., promoting anger in a recipient so they will dislike an object).

A second emotion origin/construal process is attribution. Recipients who feel emotions when listening to vocal affect may search the overall communication context for possible sources of those emotions. The existing attribution literature suggests that one’s internal experiences can be attributed to many different origins (Schwarz, 1990; Schwarz et al., 1985; Wallbott, 1986; Wyer et al., 1999). For example, if an excited-sounding actor’s message about cheesecake prompts excitement in recipients, recipients might attribute the emotion to the cheesecake, to the speaker, to themselves (e.g., to their own cheerful disposition), to a contextual origin (e.g., a caffeinated beverage they just consumed), or to any combination thereof.

Most clearly, attributions of emotion to the described object have implications for communication. For example, people who attribute their emotion to the object might infer that the object shares the emotion’s valence. In the prior example, the cheesecake is seen as good because it is exciting, which might lead the recipient to like the cheesecake. The implications of attributions to the speaker are less clear. One possibility is that recipients may reason that the speaker’s emotions reflect the speaker’s personal relationship with the object (e.g., the speaker is excited by cheesecake, as opposed to cheesecake being objectively exciting). Depending on how the recipient interprets this information, it may have different implications for the recipient’s judgment of the object. For instance, if the speaker is seen as a valid source of information, the speaker’s opinion may guide how the recipient views the object. Attributions to the recipient and the context are less likely to facilitate a communicator’s persuasive goal. For example, if people think they feel excited because they see themselves as an excitable person, they may attempt to correct for the positivity conferred by their perceived excitement bias (i.e., bias correction; Wegener & Petty, 1997). This could undermine the speaker’s goal of changing a recipient’s attitude by shaping the recipient’s emotional evaluations of the object. Similarly, attributions to the context (e.g., awareness that one has consumed caffeine pills, or noticing that upbeat pop music is playing nearby) may undermine a communicator’s ability to convey key emotions. In sum, emotions attributed to the object are diagnostic, emotions attributed to the speaker are moderately diagnostic, and emotions attributed to the recipient or the environment are non-diagnostic.

Emotion Origin/Construal: Moderators

CIVA proposes several factors that determine which process is more likely within the emotion origin/construal category. For example, emotion-validation effects are more likely when recipients are uncertain whether their emotional reactions are appropriate and proportionate. This uncertainty is probably more likely when emotional information about an object is ambivalent, such as when an object seems intensely exciting but also intensely distressing. A positive message about skydiving may call up positive feelings (e.g., excitement) in a recipient but also evoke negative feelings (e.g., terror). Emotional validation provided by a source’s vocal affect may help resolve this ambiguity: a source who sounds highly excited may help recipients’ excitement win out over their fear.

Attributional mechanisms should predominate if the origin of recipients’ emotions (e.g., whether those emotions are caused by a described object or by a speaker) is ambiguous. For example, suppose that recipients again consider the positive message about skydiving, but have recently consumed placebo pills that they believe contained caffeine. Recipients might then attribute arousal produced by the message to the caffeine rather than to skydiving (Cooper, 2007; Zillmann et al., 1972). Here, the speaker’s vocal affect could cement one attributional path over another. For example, a (presumably non-caffeinated) source who sounds excited should redirect recipients’ attribution of their excitement from the caffeine to the object by the principle of consensus (Kelley, 1973): if someone who has not taken the pills shares the excitement, skydiving becomes the more plausible common cause.

Emotion Origin/Construal Given Affective Messages

Most emotion origin/construal processes make sense primarily in the context of affective messages. These processes shape whether emotions are perceived as diagnostic guides to evaluating objects, which presupposes that emotions are emphasized in the communication. If anger is completely irrelevant to the communication, it matters little whether people attribute their irritability to their own temperament, the speaker’s voice, or their uncomfortable chair. But if a communicator needs the listener to feel angry and to ascribe that anger to some object’s being bad, these attributions matter substantially. We further hypothesize that emotion origin/construal processes will be more influential given high-intensity affective messages. For instance, participants’ inferences about high-intensity anger are likely to have extreme consequences, whereas their inferences about low-intensity anger are unlikely to seriously affect their thoughts or behaviors.

Empirical research. Before introducing the specific experimental work on emotion origin/construal processes, we return to the distinction between congruent and incongruent vocal affect. When listeners are focused on making inferences about the origins of their emotions, incongruence may produce some communication advantages. Figure 2 depicts how affective messages might influence communication effectiveness, flowing from left to right. To begin, affective communications should induce an emotional feeling in the recipient (a). However, for recipients’ emotions to substantially affect communication effectiveness, those emotions may need to be attributed to the object (b). In particular, path b represents a judgment that the recipient is feeling a given emotion because of the qualities of the object itself: the recipient thinks that the object provokes this emotion in them. If the object provoked their emotion, the emotion should be perceived as diagnostic for evaluating the object (c). That is, people will presumably dislike objects thought to be responsible for their own negative emotions and like objects that provoke positive emotions in them. However, people may also attribute emotions to the speaker (d). Speaker attributions have somewhat ambiguous relevance to communication because recipients might feel that their emotions are due to the source’s idiosyncratic relationship with the object (e). Finally, recipients may attribute their emotions to their own dispositions or to the environment (see Footnote 1).

Fig. 2

Theoretical model linking vocal affect with affective messages

Certainly, incongruent vocal affect could decrease the amount of emotion that people feel (f). However, recall that emotion origin/construal processes are predicted to occur in the context of very intense affective messages; thus, listeners should already be experiencing the message’s emotion intensely because of the message content itself. The more important role of vocal affect here might be to change the attributions that people make about these intense emotions.

People experiencing an emotion when listening to an emotional message by an emotional speaker have an attributional puzzle to solve: what/who is responsible for this emotion they feel? When the speaker’s voice does not match the message’s affective content, it seems unlikely that people will attribute their emotions to the speaker. Consequently, the object should be seen as proportionately more likely to be causing the recipients’ feelings. Thus, an incongruent voice is likely to substantially increase (g) the likelihood that emotions will be attributed to the object (b), subsequently increasing the probability that these emotions will be used as diagnostic guides to the object, boosting communicative success (c). Incongruent voices should decrease the odds that recipients’ emotions will be attributed to the speaker (path h). Decreasing speaker attributions may slightly undermine communication because speaker attributions are moderately diagnostic (e).
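The “proportionately more likely” step can be illustrated with a toy calculation. The sketch assumes, purely for illustration, that a recipient weighs each candidate cause of an emotion by a plausibility score and treats the normalized scores as attribution shares; the function name and every number are hypothetical and are not estimates from any study. Lowering only the speaker’s plausibility, as an incongruent voice is hypothesized to do, raises the object’s share even though the object’s own score is unchanged.

```python
def attribution_shares(plausibility: dict[str, float]) -> dict[str, float]:
    """Normalize hypothetical plausibility scores into shares that sum to 1."""
    total = sum(plausibility.values())
    return {cause: score / total for cause, score in plausibility.items()}


# Invented scores for illustration only.
congruent_voice = {"object": 4.0, "speaker": 4.0, "self": 1.0, "context": 1.0}
# Assumption: an incongruent voice makes the speaker an implausible cause,
# while the plausibility of the object (and of other causes) stays the same.
incongruent_voice = {**congruent_voice, "speaker": 1.0}

for label, scores in (("congruent", congruent_voice), ("incongruent", incongruent_voice)):
    shares = attribution_shares(scores)
    print(f"{label}: object={shares['object']:.2f}, speaker={shares['speaker']:.2f}")
```

With these made-up scores, the object’s share rises from 0.40 to about 0.57 while the speaker’s falls from 0.40 to about 0.14, mirroring the hypothesized paths g and h.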

For example, consider an intensely affective message about the horrors of global warming read by a happy-sounding speaker. Listeners will probably feel sad even though the speaker sounds happy, due to the intense emotionality of the message’s contents. Moreover, listeners will probably attribute their sadness to global warming rather than to the speaker, since it should seem quite unlikely that the speaker’s happy voice made them feel sad. If listeners attribute sadness to global warming, and sadness is a negative emotion, this should prompt more negative judgments about global warming. Because the communication aimed to lead people to a negative opinion of global warming, this would increase communicative success.

To empirically test these ideas, we recently conducted a line of research that isolated vocal affect rather than combining it with other nonverbal signals (compare Van Kleef et al., 2015) and held message content constant across all conditions (Guyer et al., 2018). Furthermore, this research used a highly intense affective message, which should reward incongruence. We conducted three experiments in which participants were exposed to an affective message about an ostensibly real aquatic mammal called a lemphur.

In each experiment in Guyer and colleagues (2018), participants were exposed to an initially positive written essay about the lemphur and reported their initial attitudes towards it on self-report measures (Crites et al., 1994). Next, participants were asked to listen to a negative, affective message about lemphurs. Specifically, the passage described a horrific encounter in which a lemphur kills and eats a human swimmer (see Fabrigar & Petty, 1999). Although the content of this message was always identical (a strong, fear-based message), we randomly assigned how the message was delivered across four conditions. Some participants simply read a transcript of the message (written condition). The other participants heard one of three audio recordings of the message, always spoken by the same male acting student. In one version, the speaker used a fearful voice (i.e., fully congruent with the message). In another, the speaker sounded bored (i.e., partially incongruent, matching the message on valence but mismatching on arousal). In the third, the speaker sounded content (i.e., fully incongruent with the message; see Footnote 2). Finally, participants reported their attitudes a second time.

Note that one could reasonably expect a congruence pattern: intuitively, pairing a fear-based message with a fearful voice seems logical, and we explore this sort of assimilation logic in Category Two (changing emotions). Alternatively, an incongruence pattern could occur if incongruent vocal affect increases recipients’ attribution of their emotions to the object (see Fig. 2). The CIVA model breaks this stalemate by pointing to the experimental context: an attribution effect should occur because the affective message was intense, which elevates the likelihood of emotion origin/construal effects. Across three experiments, communicative success increased (compared to the written condition) only when the speaker’s voice was partially or fully incongruent with the message (Guyer et al., 2018). The congruent voice did not increase communicative success relative to the written message. Furthermore, each incongruent voice produced more communicative success than the congruent voice, and the two incongruent voices did not differ from one another. Therefore, speakers may better communicate their opinions when speaking with emotions that mismatch their communication’s affective content. Ironically, the speaker was more persuasive about the dangers of a violent animal when he sounded bored rather than terrified. Though initially counterintuitive, this result is what the CIVA model predicts, and CIVA proposes a mechanism to account for it: different attributions occurring across the vocal affect conditions.

Indeed, incongruent voices shifted people’s attributions for their fear, leading them to view the object (i.e., the lemphur) rather than the source (i.e., the speaker) as primarily responsible (Guyer et al., 2018, Exp. 2–3). Verifying Fig. 2’s path g, incongruent voices led people to make more emotion attributions to the lemphur; and verifying path c, these attributions increased the communication of negative opinions about lemphurs. Thus, not only did incongruence increase communicative success, but CIVA’s proposed mechanism also emerged clearly: incongruence made the object seem particularly responsible for listeners’ feelings, which made those emotions a highly diagnostic guide for judging the object. Furthermore, verifying path h, incongruence made people less inclined to attribute their emotions to the speaker. Consistent with CIVA, incongruent speakers offer implausible explanations for why one feels a message’s emotions, because the incongruent voice mismatches those emotions. Interestingly, we found that speaker attributions (path e) slightly increased communicative success. In other words, incongruence made people less likely to attribute their emotions to the speaker, and this slightly undermined communicative success, because communication increased when people attributed their emotions to the speaker (see Footnote 3).

In Experiment 3 of Guyer and colleagues (2018), we replicated the core communication effects. We also replicated the attribution effects: once again, incongruence led to relatively more object than speaker attributions, and this bolstered communicative success because object attributions were more strongly related to attitude change than were speaker attributions. Importantly, Experiment 3 also tested three other CIVA mechanisms: contrast (a changing emotions process), persuasive intent (a communication source inferences process), and expectancy violation (a communication source inferences process). None of these held up as alternative mechanisms. CIVA suggests that contrast is unlikely given intense affective messages, and indeed participants’ self-reported emotions did not vary by condition. According to CIVA, persuasive intent should be implausible because the message was affective rather than cognitive, and indeed persuasive intent was completely unrelated to attitude change. Finally, CIVA suggests that a surprise-based (i.e., expectancy violation) account is unlikely given the affective nature of the message. Indeed, surprise did not account for the data: the calm (fully incongruent) voice was uniquely less surprising than the other conditions, yet this voice was associated with more rather than less communicative success.

Category Two: Changing Emotions

CIVA’s next category of processes concerns situations in which vocal affect changes the type and/or the magnitude of emotions felt by recipients. In other words, this category encompasses both cases in which vocal affect shifts a recipient from feeling one emotion to feeling a different type of emotion and cases in which vocal affect increases or decreases the magnitude of a recipient’s emotion. This can happen in at least three ways. First, vocal affect may sometimes cause a recipient to experience the emotion that the source is expressing. Recipients often assimilate the emotions that other people express (Hatfield & Rapson, 2008; Hatfield et al., 1992; Neumann & Strack, 2000), consequently experiencing those emotions themselves. For example, a source’s vocal anger may cause a recipient also to feel (more) angry. Source emotions need not be explicitly communicated to spread to listeners (Neumann & Strack, 2000).

Second, recipients may sometimes contrast how they feel about an object with how the speaker appears to feel about it, based on the speaker’s vocal affect. Contrast effects are well documented in psychology (Martin et al., 1990; Schwarz & Bless, 2007; Wanke et al., 2001). For example, recipients who feel slightly angry about political corruption might reason that, compared to an absolutely enraged anti-corruption speaker, their own anger seems quite trivial. Consequently, recipients may perceive themselves as feeling less anger.

Third, recipients’ emotions need not always mimic or contrast with those of a source; instead, they may sometimes be complementary, that is, different from the source’s emotion but responsive to it (Dimberg & Ohman, 1996; Keltner & Kring, 1998). For instance, a recipient who is personally criticized in an angry voice may not feel anger themselves, but rather might feel ashamed, sad, or even fearful.

Moderators of Specific Changing Emotions Processes

Summarizing the prior section, hearing a source’s vocal affect may generate more of that emotion (assimilation), less of that emotion (contrast), or an entirely different emotion (complementary reaction) in recipients. CIVA offers guidance through moderators that determine when each process is most likely to emerge. For example, what factors may switch recipients between assimilation and contrast effects? Past research indicates that assimilation is often more likely than contrast when an influence is present but not particularly salient in people’s attention (Schwarz & Bless, 2007), a principle that might equally apply to vocal affect (i.e., when a person speaks with subtle vocal affect). However, vocal affect’s influence on recipients might sometimes become quite salient, for instance if the source’s vocal affect is very extreme or if recipients are encouraged to listen carefully to the speaker’s voice. In these situations, recipients may engage in active bias correction (Martin, 1986; Martin et al., 1990; Wegener & Petty, 1997), attempting to remove this contamination from their judgment (e.g., dampening their anger when they realize that it is caused by a speaker). However, even very salient influences like vocal affect will not automatically provoke contrast effects if the influence is deemed valid. People may recognize that their anger is driven by a source’s voice but nonetheless consider the anger legitimate, for instance if the source is judged to be a valid source of information about how the recipient should feel (see Footnote 4). Moreover, bias correction models suggest that recipients must be sufficiently motivated and able to correct for bias before correction will occur (e.g., Wegener & Petty, 1997).
In short, contrast effects of vocal affect may require several preconditions: (i) salience in recipients’ attention, (ii) belief that the source’s vocal affect is a bias affecting them, and (iii) motivation and ability to correct for this bias.

Whether complementary emotions emerge may depend on the apparent target of the vocal affect. Facial expressions directed at (versus away from) the recipient can provoke complementary reactions (Dimberg & Ohman, 1983). Likewise, when recipients recognize that they are the target of a source’s vocally expressed emotions (versus there being no particular target), complementary reactions may also be more likely. For example, imagine a person angrily chastising a work colleague for smoking in front of the entrance to their office building. Rather than feeling anger at being reproached, the colleague may feel complementary emotions such as shame or guilt. Thus, vocal messages that target the recipient (by name, with second-person language such as ‘you,’ etc.) should be more likely to provoke complementary reactions, as might conditions that lead recipients to empathize with the emotional speaker.
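The moderators of the changing emotions processes can be summarized as a rough decision rule. This sketch is a deliberate simplification: the Fig. 1 note states that these pathways are generally continuous rather than categorical, and checking the recipient-as-target moderator before the contrast preconditions is an ordering assumed here for illustration, not one the framework specifies. The class, function, and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ListeningContext:
    # Hypothetical binary stand-ins for moderators that are really continuous.
    recipient_is_target: bool   # e.g., addressed by name or as "you"
    influence_is_salient: bool  # extreme vocal affect, or attention drawn to the voice
    perceived_as_bias: bool     # listener sees the voice's influence as illegitimate
    motivated_and_able: bool    # motivation and ability to correct for the bias


def likely_changing_emotions_process(ctx: ListeningContext) -> str:
    """Return the changing emotions process the moderators suggest is most likely."""
    if ctx.recipient_is_target:
        return "complementary"
    # Contrast requires all three preconditions (salience, perceived bias,
    # and motivation/ability to correct).
    if ctx.influence_is_salient and ctx.perceived_as_bias and ctx.motivated_and_able:
        return "contrast"
    # Otherwise (subtle influences, influences judged valid, or no capacity
    # to correct), predict assimilation.
    return "assimilation"


subtle = ListeningContext(False, False, False, True)
corrected = ListeningContext(False, True, True, True)
scolded = ListeningContext(True, True, False, True)
for name, ctx in (("subtle", subtle), ("corrected", corrected), ("scolded", scolded)):
    print(name, "->", likely_changing_emotions_process(ctx))
```

A salient voice that the listener nonetheless judges to be a valid guide to how they should feel falls through to assimilation, matching the point that salience alone does not produce contrast.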

Changing Emotions Given Affective Messages

Once changing emotions processes influence people’s emotional reactions to messages, this may affect how successful a communication will be, but the exact effects should depend on message content. Affective messages rely on producing a particular type of emotion in the message’s recipients, so that recipients will use those emotions as guides to evaluating the object described in a communication (Haddock et al., 2008; Mayer & Tormala, 2010; See et al., 2008). Thus, when vocal affect produces changing emotions effects, this may facilitate communication (if vocal affect produces an emotion in the recipient that matches the communication goal) or undermine communication (if vocal affect produces an emotion antithetical to the communication goal).

Empirical research. Surprisingly, although the idea that affective messages rely on successfully cultivating appropriate emotions in recipients is well documented, and substantial indirect evidence supports the plausibility of the processes described in this section, no extant literature directly demonstrates that vocal affect can boost these effects by instilling certain emotions in recipients. This lacuna constitutes one of the most promising opportunities in the vocal affect literature.

Changing Emotions Given Cognitive Messages

Changing emotions processes may initially seem irrelevant for cognitive communication; after all, cognitive messages operate by changing recipients’ beliefs about the attributes of objects, not by shaping recipients’ emotions. However, CIVA suggests that changing emotions processes should alter the success even of cognitive messages. Emotions can shape how people receive communications from others, even when these emotions are substantively irrelevant to the communication’s contents. We elaborate on two relevant processes here, although they represent only a small sample from a broad literature (e.g., Isbell et al., 2013; see Petty et al., 2003, for a review).

One example is the hedonic contingency perspective, proposed by Wegener and colleagues (1995). This viewpoint suggests that people experiencing positive emotions (e.g., happiness) generally wish to maintain these emotions, whereas people experiencing negative emotions (e.g., sadness) are less protective of their present emotional state (see also Wegener & Petty, 1994). Empirical studies support the idea that happy people are more sensitive to the hedonic consequences of processing a message. For example, a message that seems likely to be upsetting may not be thoroughly processed by a happy person but would be carefully processed by a sad person, whereas messages that seem likely to provoke positive feelings might be processed regardless of emotional state. These differences in processing have important consequences for whether a communication successfully alters a recipient’s opinion.

Another process involves emotions’ effects on recipients’ confidence. For instance, Briñol and colleagues (2007) had participants read either a weak or a strong persuasive message and afterwards induced them to feel happy or sad. Importantly, the mood induction was substantively irrelevant to the message topic itself. Participants led to feel happy (versus sad) had greater confidence in their thoughts, and thus the effect of argument quality on communicative success increased. That is, participants who read the weak message, generated thoughts opposing it, and then felt happy (sad) consequently trusted (distrusted) these negative thoughts and thus were particularly unpersuaded (persuaded). In contrast, participants who read the strong message, generated supportive thoughts, and then felt happy (sad) consequently trusted (distrusted) their positive thoughts and thus were particularly persuaded (unpersuaded). However, in Briñol and colleagues’ work, emotions were manipulated after participants had considered and listed thoughts about the message, likely causing the emotions to validate those thoughts. If the message itself were read with a particular vocal affect, participants might feel that emotion while still thinking about the message, so the emotion would be more likely to shape the thinking itself than to validate thoughts already generated; this timing could work against finding thought validation effects (Briñol et al., 2007; Petty et al., 2002).

Empirical research. Empirical support for this category of processes is best described as indirect, but the processes are highly plausible given effects found in two distinct literatures. First, people’s emotions are clearly shaped by other people’s vocal affect. Indeed, scholars have argued that a fundamental purpose of vocal affect is to alter listeners’ affect (Bachorowski & Owren, 2008; Russell et al., 2003). For example, listeners who heard an identical message (a philosophical essay by David Hume) read in a happy, sad, or neutral voice experienced relatively congruent emotions themselves (i.e., more sadness in the sad voice condition and more happiness in the happy voice condition; Neumann & Strack, 2000). Interestingly, Neumann and Strack also found that when listeners then read the same text aloud themselves, they spontaneously tended to mirror the speaker’s vocal affect. This matters because speaking in an emotional tone tends to produce a congruent emotional state in the speaker (Hatfield et al., 1992). Thus, for instance, a communication source’s vocal sadness may prompt listeners to speak sadly too, producing sadness in the listener.

Second, there is good reason to think that recipients’ emotions shape how they process communicated information. We gave two examples above: the hedonic contingency view, and thought validation effects caused by recipients’ current emotions. Both hypotheses are rooted in, and supported by, extensive literatures. For example, the hedonic contingency view is linked with a broader literature concerning mood management differences across levels of current emotion (Wegener & Petty, 1994), and its effect on message processing has been independently validated (Côté, 2005; Turner et al., 2013). Similarly, emotions serving to validate thoughts are linked with a rich background literature in which many recipient body states can validate thoughts (for a review, see Petty & Briñol, 2015), and emotion-based thought validation has been replicated and extended (Briñol et al., 2018).

Unfortunately, these two literatures are seldom combined in research where a communication source speaking emotionally shifts recipients’ emotions, altering how recipients then process communicated information. However, if the first and second steps of this process (communication source emotion to recipient emotion; recipient emotion to judgmental change) are valid, as we have argued above, it is reasonable to expect that their conjunction should also emerge when tested directly.

Category Three: Source Inferences

Finally, CIVA’s communication source inferences category describes how vocal affect may prompt recipients to form judgments about the communication source’s attributes and motives. Often, beliefs about a communication source’s characteristics can have a large effect on the likelihood that a recipient accepts the communication (e.g., Briñol & Petty, 2009). Vocal affect could alter such inferences about the communication source in numerous ways.

One possibility is that vocal affect could influence recipients’ perceptions that the speaker is trying to be persuasive. Past work indicates that forewarnings of persuasive intent sometimes decrease communicative success (Lee, 2010; Mühlberger & Jonas, 2019). For example, consider a speaker who gruesomely details how cigarettes damage one’s internal organs, and imagine further that the speaker’s voice becomes tremulous and fearful as she describes the fear-inducing effects of tobacco smoke and nicotine addiction. Recipients’ perceptions of the speaker’s persuasive intent may increase, especially if the vocal affect is seen as mawkish, which could then undermine effective communication.

Additionally, vocal affect might be interpreted by recipients as revealing the speaker’s confidence. For example, a highly enthusiastic speaker is likely perceived as quite confident, given that enthusiasm is understood to signal positive valence (Remington et al., 2000) and perhaps high dominance (i.e., because enthusiasm may imply good self-control and agency; Bakker et al., 2014). The positive valence might be taken to mean the speaker enjoys what they are discussing, and/or is comfortable speaking; dominance may indicate that the speaker feels in control and authoritative, suggesting confidence. If these ideas hold, perceptions of confidence should then have downstream consequences for communication. Indeed, past research indicates that voices perceived as being high (vs. low) in confidence are often more persuasive, for a variety of reasons (Guyer et al., 2019; Van Zant & Berger, 2020).

Finally, people’s beliefs about a source may be challenged or made salient when the source violates their expectations, eliciting surprise in recipients. For example, if a communication focuses on specific emotions (e.g., contains words conveying excitement), vocal affect reflecting extremely dissimilar emotions (e.g., sadness) might elicit surprise, because people might expect a source’s voice to match the tone and content of the message. Alternatively, messages with relatively low-stakes content (e.g., about a minor new tax policy) might provoke surprise when delivered in high-arousal voices (e.g., the person describing the tax policy sounds terrified). In turn, surprise can have a variety of communication-relevant effects. Most critically, surprise may increase the amount or depth of processing that people allocate to a communication. Increased processing can boost communicative success for strong (coherent, logical) communications and reduce it for weak (incoherent, illogical) communications (Petty et al., 2001; Schützwohl & Borgstedt, 2005).

Moderators of specific source inferences processes

Persuasive intent should be more likely to guide communicative success when reactance concerns (a state of psychological resistance in which people are motivated to resist other people’s communications; Miron & Brehm, 2006) are primed in recipients. Reactance is more common when people feel that their freedom is being constrained (Clee & Wicklund, 1980). For instance, Quick et al. (2015) found that when people were being persuaded to register as organ donors, loss-framed communications highlighting the negative consequences of failing to register led to greater reactance than gain-framed communications. In contrast, reminders that recipients are ultimately free to think as they see fit decrease reactance (Miller et al., 2007), making it less likely that vocal affect will operate by shifting perceptions of persuasive intent.

Confidence, too, may vary in its persuasive impact depending on circumstances. For example, confident sources increase communication effectiveness when they present strong arguments but actually decrease it when they present weak arguments (Tormala et al., 2006). A likely reason is that confident speakers draw more attention than doubtful speakers, which can backfire when communications are unconvincing and vacuous, because attentive recipients readily counter-argue weak arguments. Thus, argument quality may play a role in when and how vocal confidence affects communication.

Concerning expectancy violations, one probable moderator is the amount of careful thinking that recipients would otherwise devote to processing the communication. Past theory and research indicate that variables are more likely to influence people’s level of processing when their motivation and ability to think are unconstrained (Baker & Petty, 1994; Petty & Cacioppo, 1986; Petty et al., 2001). In contrast, if recipients’ circumstances or dispositions constrain processing to be low or high, surprise is unlikely to have much effect on the extent to which people process a communication.

Source Inferences Given Cognitive Communications

We propose that the communication source inferences processes we have outlined play much clearer roles in cognitive (rather than affective) communications. For example, consider how a speaker’s vocal affect may lead listeners to think that the speaker is confident. This may lead listeners to agree with the central claims of the speaker’s argument, because listeners tend to assume that confident speakers are more likely to be correct (Guyer et al., 2019). However, the same inference of speaker confidence has a more ambiguous role in affective communication. For instance, a confident-sounding speaker could undermine a communication that is trying to convey fear and anxiety to a listener, because the speaker does not sound fearful or anxious; alternatively, the speaker could benefit communicative success by simply drawing more attention from recipients. Thus, vocal affect seems more likely to affect an affective communication via an emotion origin/construal process (attribution) than via a communication source inference per se.

Empirical research. One line of our research explicitly addresses communication source inferences processes of vocal affect given cognitive communications (Vaughan-Johnston et al., 2019). This work was structurally similar to the Guyer and colleagues (2018) paradigm we explained earlier: participants formed and rated their initial opinions about lemphurs based on some mildly positive information. Next, they were exposed to a negative communication in one of four conditions: written, or read aloud by a speaker using one of three vocal tones. Attitudes were then measured again to gauge communicative success.

In Experiment 1, the communication characterized lemphurs as having problematic feeding habits that harm nearby communities’ economies (i.e., a negative, cognitive communication). Vocal affect conditions included fear and excitement, as well as an emotionless voice created by digitally altering a voice recording (via Praat; see Boersma & Weenink, 2018). The emotionless voice controlled for the mere effect of hearing the communication rather than reading it. The fearful voice likely violated listeners’ expectations because of its high-arousal nature (potentially increasing communication effectiveness), but it also could have undermined the speaker’s perceived confidence (potentially decreasing communication effectiveness, because the arguments were strong; Baker & Petty, 1994). In contrast, the excited voice was expected to improve communication effectiveness, being both surprising and indicative of higher confidence. In fact, Experiment 1 provided support only for excitement increasing communication effectiveness; the other conditions did not differ from one another.

Experiment 2 provided greater clarity about why this pattern emerged by including several mechanism variables. Participants’ own emotions were measured to help rule out the possibility that changing emotions processes (i.e., contrast) were responsible given this cognitive communication. An explicit measure of recipients’ surprise at the passage’s tone was also included. Expectancy violation was a plausible mechanism given that the experiment used a cognitive communication (favoring communication source inferences processes) under unconstrained processing conditions (favoring enhanced processing specifically). Experiment 2 also included ratings of the speaker’s confidence. Finally, the emotionless voice condition was replaced with a calm voice condition. This tested whether excitement had improved communication simply because it is positively valenced (i.e., because participants who assimilated a positive emotion would be more accepting of the communication; Petty et al., 1993). If so, the calm voice should similarly boost communication effectiveness.

The calm voice did not improve communication in Experiment 2. Instead, excitement again improved communication effectiveness (replicating Experiment 1), and the fearful voice now also produced greater communication effectiveness. The mechanism measures helped to explain why these patterns occurred. As shown in Table 1, first, changing emotions were a weak explanation for the data: although the fearful voice prompted emotions germane to communication effectiveness (e.g., more negative and less positive emotions for a negative communication), the excited voice did not influence emotions. Second, source confidence was a poor explanation because, although the excited voice was seen as confident, the fearful voice (which also bolstered communication effectiveness) was seen as lacking confidence. Third, expectancy violation was a good explanation, because both the fearful and excited voices were seen as surprising, and ratings of surprise at the speaker’s tone were related to increased communication effectiveness. This pattern is consistent with the idea that the fearful and excited voices surprised participants, who then analyzed the communication more carefully; because the communication was quite convincing, they were more persuaded (compared to participants who read the communication or heard it in a calm voice).

Table 1 Effects of Vocal Affect Conditions on Communication Effectiveness and Mechanisms (Vaughan-Johnston et al., 2019, Experiment 2)

When Experiments 1 and 2 were pooled meta-analytically, fear improved communication effectiveness compared to the written message, as did excitement. Excitement also outperformed the fearful voice.
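The pooling procedure is not specified here, so the following is only a sketch of what combining two experiments "meta-analytically" can mean, assuming a standard fixed-effect (inverse-variance) approach. The symbols are illustrative assumptions rather than the original analysis: d_k is the effect size for one contrast (e.g., fearful voice versus written message) in experiment k, v_k is its sampling variance, and K = 2 experiments.

```latex
\hat{d} = \frac{\sum_{k=1}^{K} w_k \, d_k}{\sum_{k=1}^{K} w_k},
\qquad w_k = \frac{1}{v_k},
\qquad \mathrm{SE}(\hat{d}) = \sqrt{\frac{1}{\sum_{k=1}^{K} w_k}},
\qquad z = \frac{\hat{d}}{\mathrm{SE}(\hat{d})}
```

Under this assumption, the more precise experiment (smaller v_k) contributes more to the pooled estimate. A random-effects alternative would add a between-study variance term to each v_k, although with only two experiments that term would be estimated very imprecisely.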

Although the data presented so far are quite consistent with the CIVA model, a limitation of Experiment 2 is its reliance on a measurement-based assessment of the psychological mechanism. The case for expectancy violation would be strengthened further by revealing this mechanism via experimental manipulation. For example, listeners could first hear the speaker in one vocal tone, establishing an expectation of how the speaker will sound, and then be randomly assigned to hear the persuasive communication in either the same or a different tone. Matching versus mismatching listeners’ expectations in this way would provide an experimental parallel to Experiment 2: communication effectiveness should increase when the later vocal affect differs from the vocal affect participants heard initially.

Summary

Psychologists are rich in data concerning the processes by which vocal affect is generated, communicated, and received, but have comparatively few theories and data addressing the functional significance of expressing vocal affect. In particular, a unified framework relating vocal affect to effective communication is lacking. The CIVA model, by drawing together many processes under a single explanatory umbrella, provides a useful resource for psychologists and applied researchers hoping to better understand vocal affect’s multifaceted relationship to communication effectiveness. The CIVA model posits three categories of processes by which vocal affect can influence communicative effectiveness: by shifting the emotion origin/construal through which people make sense of their emotions, by changing the emotions that the recipient is feeling, and by altering communication source inferences, through which vocal affect bolsters or challenges listeners’ understanding of the source’s character, abilities, or intentions.

CIVA posits moderators both within and across these categories. For instance, assimilation and contrast are both changing emotions processes, but each may be more likely under particular circumstances (e.g., when people are versus are not vigilant to the possibility of assimilation as an unwanted bias). Furthermore, changing emotions or communication source inferences processes in general may be more likely depending on the communicative context (e.g., the message’s relative emphasis on affective or cognitive information). Thus, the CIVA model helps to organize research concerning vocal affect and communication and offers guiding principles for understanding when various processes are more or less likely to emerge. Direct empirical evidence substantiating the CIVA model is incomplete (particularly for changing emotions processes), but we have outlined existing work that is very consistent with the model and suggested specific future research that would clarify several outstanding issues.

Future Research with the CIVA Model

An important limitation is that although all of CIVA’s mechanisms (e.g., assimilation, contrast, persuasive intent) are predicated on a rich background literature, our proposed applications to vocal affect are often relatively novel. Consequently, whereas some CIVA mechanisms have been tested quite directly in a vocal affect context (e.g., attribution, expectancy violation), others have not (e.g., persuasive intent, social validation). Because CIVA proposes moderators quite specific to each mechanism, it can help researchers design experiments that isolate each process and thereby enrich theoretical understanding of vocal affect’s many persuasive implications.

Relatedly, CIVA’s moderators that account for which category of mechanism should drive vocal affect’s effects draw on substantial prior literature, but their specific applications to vocal affect are theoretically novel and await direct testing. For example, attribution effects emerge in the context of highly intense affective messages (Guyer et al., 2018), consistent with the idea that emotion origin/construal processes should occur for high-intensity affective content. Communication source inferences processes, particularly expectancy violations, emerge for cognitive messages as predicted (Vaughan-Johnston et al., 2019). Thus, existing evidence is consistent with CIVA’s claims. However, future research should target these claims about moderation more directly. For instance, studies could systematically vary the intensity of affective messages (low versus high) in a between-participants design to demonstrate that changing emotions processes (i.e., assimilation, contrast, complementary reactions) emerge more strongly given low-intensity affect, whereas emotion origin/construal processes (i.e., validation, attribution) emerge more strongly given high-intensity affect. Similarly, studies could contrast an affective with a cognitive communication to compare emotion origin/construal and communication source inferences processes directly. For instance, traditional confidence effects should link vocal affect to communication effectiveness for the cognitive message but not for the affective message.

Extensions to the CIVA Model

Although substantiating all of CIVA’s primary claims is our most immediate concern, future work might extend the model in several ways, such as by proposing additional mechanisms of vocal affect. One interesting extension might integrate recipients’ pre-message, message-irrelevant emotions into the model. For example, participants might be induced to feel either angry or sad before attending to a negative, cognitive message (e.g., describing problems associated with a new tax policy) vocalized by an angry- or sad-sounding speaker. Matching the source’s vocal affect to recipients’ pre-message affect could facilitate communication effectiveness for several reasons. First, DeSteno et al. (2004) found that pre-message anger promoted a thought bias favoring angry- over sadness-framed messages (and vice versa for pre-message sadness), at least among participants who were dispositionally inclined to think carefully. Second, recipients might feel that a source is similar to themselves insofar as the speaker’s vocal affect mirrors (versus clashes with) their own emotions. Given that people prefer similar over dissimilar others (Burleson & Denton, 1992; Philipp-Muller et al., 2020), they may also like a source whose vocal affect suggests emotional experiences similar to their own. In turn, liking a message’s source may improve communication success (Roskos-Ewoldsen & Fazio, 1992).

Another possibility is that high-arousal voices could lead recipients to assume that the source considers the topic to be important or moralized (another communication source inference). Important attitudes often rouse intense emotions in people (Zuwerink & Devine, 1996), so a speaker who seems furious or overjoyed (high-arousal emotions) presumably considers the topic important. To some extent, this may convince the recipient to also perceive the topic as important, and therefore worthy of attention and cognitive elaboration (Holbrook et al., 2005), potentially bolstering communication effectiveness if the message is reasonably strong. Topics can be important because they are personally relevant, relevant to one’s core social identity groups, or relevant to one’s core values (Boninger et al., 1995; Eaton & Visser, 2008). Thus, if recipients feel that they share similar personal circumstances, overlapping social groups, and/or core values with the source, they may be more likely to infer from the speaker’s high-arousal voice that they too should consider the topic important. Alternatively, voices expressing highly intense emotions may indicate that the source considers the topic morally significant, given the close relationship between emotions and morality (e.g., Haidt, 2001). If communicated to recipients, the topic’s association with moral stakes could prompt more persistent, impactful attitudes (Luttrell et al., 2016). These examples illustrate that the CIVA model is not intended as an exhaustive set of processes; rather, its theoretical principles may be useful when incorporating other processes by which vocal affect influences communication.