Multiformat communication refers to personalized, bilateral, simultaneous communication through various channels; it is critical to relationship marketing efforts (Palmatier et al. 2008; Verma et al. 2016). Recent changes in technology and business practices have profoundly transformed the nature of bilateral (i.e., one-to-one) customer–firm communication (Appel et al. 2020; Grewal et al. 2020a, 2020b). An influx of new formats (e.g., video messaging, virtual worlds) occupies the communication landscape, as do opportunities to design novel options with unique characteristics for specific exchange needs (e.g., one-sided video conference). But even as technology-enabled formats have exploded in number and reach, “research has not kept pace with the rapid expansion of device types and interaction modes” (Yadav and Pavlou 2014, p. 25).

As a critical marketing research priority, identifying ways to manage customer relationships across a variety of communication formats represents a timely, complex challenge (Appel et al. 2020), especially amid a pandemic like COVID-19, in which mediated formats became a primary or sometimes only option for firms to build and maintain relationships with their customers (e.g., contactless or virtual appointments). According to the Aberdeen Group, companies with the strongest multiformat communication strategies retain an average of 89% of their customers, compared with 33% for those with weak strategies (Kulbyte 2018). With the recognition that current technology allows managers to vary formats across relational exchanges according to their communication goals, discussion topics (message factors), timing (temporal factors), and parties involved (dyadic factors), we seek a theoretically grounded, managerially relevant, characteristic-level framework of bilateral multiformat communication for achieving different goals, contingent on message, temporal, and dyadic factors.

The need for this framework also reflects the state of extant communication theories, which predate recent technological advances (e.g., artificial intelligence [AI], virtual reality [VR]) that facilitate sophisticated relationship marketing strategies (Steinhoff et al. 2019; van Doorn et al. 2017). Their insights tend to be limited to traditional (e.g., face-to-face, email) formats. On the other side, strategies that leverage emerging digital formats (e.g., video messaging, social media, virtual worlds) rarely are based in theory. With a more fine-grained analysis of the unique characteristics (e.g., visual cues, textual cues, synchronicity) embedded within communication formats, we seek to reveal how individual format characteristics (rather than the format as a whole) function in an exchange, so managers can use them and choose among existing formats, create new ones, or combine multiple formats to achieve key communication goals and performance objectives, without wasting resources. This characteristic-level perspective is essential to developing a foundation for multiformat communication (Palmatier et al. 2018).

Therefore, we start by introducing several communication fundamentals and goals (effective, efficient, experiential) that firms pursue with multiformat communication strategies. Next, we review extant communication theory and identify three research gaps, which motivate our specification of six fundamental format characteristics that constitute the building blocks for our framework (MacInnis 2011). Each characteristic can promote different communication goals; furthermore, theory suggests that distinct message (e.g., ambiguity) and temporal (e.g., relationship stage) factors create unique requirements for different characteristics (Table 1). We then consider prior bilateral communication research (Table 2), in light of the theoretical insights and gaps identified, to complement and extend prior theory. In line with the state of theory, extant research largely pertains to the format level, though the findings still yield essential insights for our holistic, characteristic-level framework. Current bilateral communication frameworks do not encompass pertinent aspects such as the public nature of social media posts, AI agents, virtual worlds, or avatars, despite the opportunities afforded by such features for relationship marketing efforts. We review relevant research to understand how these previously identified factors (Table 2) and new considerations become manifest in exchanges and affect performance; we recognize additional temporal (e.g., sequence) and dyadic (e.g., human vs. AI) factors that alter the impacts of format characteristics.

Table 1 Review of communication theories
Table 2 Summary of select bilateral communication research

The insights obtained from our reviews of communication theory and bilateral communication research then allow us to decompose traditional and emerging digital formats into their individual characteristics. Communication theory acknowledges traditional formats that bundle multiple characteristics (e.g., telephone, email); we seek to go beyond bundles and format-level analyses to investigate individual communication characteristics (e.g., visual cues, textual cues, channel synchronicity) and their interactive effects (i.e., characteristic-level analyses). We also predict how certain characteristics might be simulated through recent technology (e.g., avatars, virtual worlds). Thus, we generate new insights to predict how each characteristic (real and simulated) contributes to effective, efficient, and experiential communication goals in light of important message, temporal, and dyadic factors (Table 3). This contribution paves the way for future efforts; by identifying the building blocks of communication formats and revealing their effects and success drivers, we recommend how firms can implement existing formats and build new ones, now and as technology evolves to support novel combinatory possibilities. We conclude with 15 formal propositions categorized into five overarching themes (Table 4).

Table 3 Decomposition of communication formats: characteristic comparisons
Table 4 Characteristic-level multiformat communication strategies

Communication fundamentals

Relational multiformat communication strategies are “aimed toward guiding more personal relationships with clients and enhancing customers’ noneconomic (social) satisfaction” (Kim and Kumar 2018, p. 50). Communication theorists and marketing scholars concur that a primary goal of effective, bilateral communication is to reach mutual understanding, such that “between two people, the messages received equal the messages sent, with no distortion” (Mohr and Bitner 1991, p. 2). At its core, effective communication means that each party understands what the other is trying to communicate; if two parties cannot reach mutual understanding, the exchange does not persist. This inherent goal of customer–firm bilateral communication appears in early communication frameworks that identify the face-to-face format as optimal (e.g., Short et al. 1976; Walther 1992).

As newer formats emerged (e.g., email, video chat), so did another goal, namely, to minimize the communication costs associated with each format, such as the time, effort, and resources required (Palmatier et al. 2008), and thereby ensure efficient communication. Communication frameworks that acknowledge this goal note a trade-off between efficiency and effectiveness (e.g., Daft and Lengel 1986; Dennis et al. 2008). Efficiency tends to be less critical for early stages of a customer–firm relationship, because firms (and people) are willing to invest more resources to develop strong, profitable relationships, and new customers may be less concerned with costs as they learn about the firm and its offerings. In later stages though, both parties may be more concerned with efficiency.

Finally, firms have realized the value of interactions characterized as experiential and engaging too (Gonzalez 2019; Hilken et al. 2020). Chief marketing officers responding to a survey indicate that they plan to spend up to half of their budgets on experiences, and 77% of them identify experiential marketing as vital (Wertz 2019). Promoting experiential exchanges by stimulating sensory involvement in particular is a growing practice (Krishna and Schwarz 2014), as enabled by many emerging digital formats (Altman 2017). Virtual interactions, especially in the COVID-19 era, are notably prominent. However, bilateral communication frameworks have not kept pace with these developments, to specify how an experiential, engaging exchange should integrate sensory, emotional, and social information (Arnould and Price 1993) that can enrich customers’ existing or build new mental associations with the core offering, brand, or firm (Harmeling et al. 2017; MacInnis and Price 1987), as well as ultimately generate long-term shifts in beliefs or attitudes (Schouten et al. 2007). Although experiential communication can generate long-lasting, positive impressions that create affective, customer–firm connections, this outcome may not always be a primary goal. Rather, experiential communication may be more relevant in early (vs. later) relationship stages (e.g., acquisition, onboarding), to create strong first impressions and build new mental associations with the firm.

Our relationship-based conceptual framework accordingly addresses how each format characteristic might promote effective, efficient, and/or experiential communication goals, in light of relevant contextual factors. Specifically, we consider exchange factors pertaining to discussion topics (message), timing (temporal), and parties involved (dyadic).

Communication theories

We review and synthesize the most widely recognized communication theories (social presence, social information processing, media richness, and media synchronicity), selected for their prevalence, consideration of underlying format characteristics, and relevance to relationship marketing (Table 1). Even if developed prior to the emergence of many modern communication formats, they constitute the primary theoretical lenses available to assess bilateral communication in customer–firm exchanges and its effective, efficient, and experiential goals.

Social presence theory

Social presence theory proposes that communication formats differ in their ability to convey social presence, perceived intimacy, and immediacy (Short et al. 1976; Walther 1992). When the exchange requires interpersonal involvement or warm, personal communication, it should rely on a format with a high degree of social presence (Miranda and Saunders 2003; Short et al. 1976). For example, during service recovery encounters, formats with higher (face-to-face) versus lower (telephone) social presence promote satisfaction and trust, which should improve word of mouth and repurchase intentions (Lii et al. 2013). Proximal, visual, and verbal cues are characteristics that convey social presence, as does channel synchronicity (Miranda and Saunders 2003; Short et al. 1976). Still, this theory mainly focuses on the format (e.g., face-to-face, telephone), not the individual characteristics that define each format. In online environments, social presence arises when “situations trigger a feeling that a human being is present” (Grewal et al. 2020b, p. 98), and researchers have described options to heighten levels of social presence by enhancing formats with specific characteristics, such as adding visual cues (e.g., Gefen and Straub 2004; Hassanein and Head 2007) or interactive tools (Fortin and Dholakia 2005) to websites. Thus, in addition to predicting which communication formats promote effective exchanges when they require interpersonal involvement, this theory recognizes that new formats with unique characteristic combinations are possible and warrant attention in frameworks.

Social information processing theory

Similarly, this theory focuses on interpersonal benefits, but it highlights the dynamic effects of formats, by noting that some cues encourage faster relational development. That is, formats can prompt exchanges of social information at varying rates (Walther 1992; Yadav and Varadarajan 2005). Such social information extends beyond what is needed for the exchange, promoting both effective communication and accelerated relational development. Without it, mutual understanding is difficult, which would delay relationship development. For example, salesperson attractiveness in face-to-face interactions enhances performance, but this effect becomes suppressed over time (Ahearne et al. 1999). Face-to-face interactions also lead to more relational communication (a form of social information) than live chats (Walther et al. 2005), which is important during initial stages of customer–firm relationships but may diminish in value as the relationship matures. Originally developed to compare the relational benefits afforded by face-to-face relative to computer-mediated formats (e.g., email), this theory also tends to focus on the format level and emphasize how visual cues (e.g., facial expressions) offer social information (Walther 1992; Walther et al. 2005). Even with formats without visual cues or interpersonal information though, parties can adapt over time and cultivate strong relationships (Walther 1996), so ultimately, this theory suggests that ideal formats for effective communication shift with the relationship stage (i.e., temporally). In support of this, textual formats (with no visual cues) may grow more impactful in later (vs. earlier) relationship stages (Bandyopadhyay et al. 1994; Mohr and Sohi 1995), perhaps both because they are efficient and because customers appreciate a record of their interactions once they establish a relationship.
Customers with longer tenure are even more likely to exhibit favorable responses to firm communication on social media (Kumar et al. 2016a), perhaps due to the permanent record or public nature of the interaction.

Media richness theory

Media richness theory predicts that the effectiveness and efficiency of communication formats depend on their information richness, defined according to whether the information can enhance understanding, within a certain time interval (Daft and Lengel 1986; Yadav and Varadarajan 2005). For exchanges characterized by high message ambiguity and the potential for “multiple and potentially conflicting interpretations” (Daft and Lengel 1986, p. 556), a richer format promotes effectiveness, despite the higher associated costs (e.g., time, effort); richness benefits outweigh the costs. If an exchange instead features unambiguous, standardized messages, a leaner, less rich format can improve both effectiveness and efficiency. For example, in high-risk exchanges (which tend to be ambiguous), a richer format can elevate the human component and minimize potential losses by providing more trustworthy information, which also incurs greater costs. In low-risk (less ambiguous) exchanges, a leaner format reduces both customers’ and firms’ costs but still supports effective communication (Polo and Sese 2016). Although this theory originally pertained to manager–employee relationships, marketing scholars have applied it productively too. The primary focus is the format, yet research indicates that proximal, visual, and verbal cues, as well as channel synchronicity, can increase richness (Daft and Lengel 1986; Yadav and Varadarajan 2005). Overall, this theory recognizes which formats promote effective and efficient communication and suggests that richer formats are more useful for ambiguous messages, even though they conflict with efficient communication goals.

Media synchronicity theory

Media synchronicity theory extends beyond media richness theory, with the acknowledgment that a single exchange can involve multiple tasks, or steps, each of which might require a different level of behavioral coordination, defined as an “ability to support individuals working together at the same time with a shared pattern of coordinated behavior” (Dennis et al. 2008, p. 576). In this step-by-step perspective, format choices appear more complex than previous theories would suggest. Here, the format should match the message for each task, rather than the exchange as a whole. A task that demands message convergence (high levels of coproduction, ambiguous information) should take place in formats that promote more behavioral coordination (e.g., face-to-face, video-chat); one that entails message conveyance (diverse information, unambiguous information) can be paired with formats that feature lower levels of behavioral coordination (e.g., email, SMS) (Dennis et al. 2008). For example, face-to-face communication is better for tacit knowledge acquisition (e.g., ambiguous messages), whereas email is better for product knowledge acquisition (e.g., less ambiguous messages, high content variety) (Ganesan et al. 2005). Face-to-face and email communication, however, both lower operational costs, which are associated with complex issues (Cannon and Homburg 2001). Complex exchanges likely involve many steps (e.g., convergence and conveyance), such that exchange goals at each step require formats with different characteristics.

Behavioral coordination (also referred to as coordinated interactions or being in sync) can facilitate customer–employee rapport (Gremler and Gwinner 2000) but might hinder effective communication goals in other contexts, such as by creating cognitive overload, premature action, or distraction from the message content. Similar to media richness theory, media synchronicity theory posits that proximal, visual, and verbal cues and channel synchronicity contribute to behavioral coordination. Messages encoded in textual cues enable people to edit the message while encoding (i.e., rehearsability) and reexamine it during and after decoding (i.e., reprocessability) (Dennis et al. 2008), so they offer what we call channel revisability. We propose that revisable formats in turn should be more effective for messages that contain high content variety (e.g., words, numbers, and statistics), because they support reprocessing to reach mutual understanding (Berger 2014). For these reasons, firms may need to use multiple formats for one interaction or design a completely new format to accommodate all their exchange needs.

Communication theories: Summary of insights

This review of communication theories reveals six underlying format characteristics (proximal, visual, verbal, and textual cues; channel synchronicity and revisability) that drive performance. It also denotes some message (interpersonal involvement, ambiguity, convergence vs. conveyance, content variety) and temporal (relationship stage) factors that determine their impact on performance. This synthesis indicates three gaps, as well as the need for a theoretical framework that informs communication across the spectrum of format characteristics. First, we require a systematic review of how individual characteristics (not formats) affect exchange performance, individually and synergistically. Communication theories make format-level predictions, even as they acknowledge that underlying, fundamental building blocks are responsible for those effects. Current theories also focus on the individual effects of formats, without exploring potential synergies among formats or their underlying characteristics.

Second, extant theories’ format-level predictions largely prioritize effective and efficient communication goals, without clearly illustrating how different formats could promote experiential communication in bilateral exchanges, even though customers demand experiences and “firms face unique challenges with regard to providing compelling customer experiences at the online organizational frontline” (Hilken et al. 2020, p. 884). However, research finds that social presence on websites, for example, can increase pleasure, arousal, and flow (Wang et al. 2007). Thus, we infer the characteristics that promote social presence also could advance experiential communication goals, by offering multisensory, social, and emotional information.

Third, communication theory predates two major shifts in bilateral customer–firm communication practices: frequent social media interactions (Hewett et al. 2016) and technological advancements through AI and VR (Appel et al. 2020; Davenport et al. 2020). These trends represent major developments for relationship marketing strategies, in that “they can make online relationships more personal and virtually bring sellers and customers closer together while reducing some of the risk inherent to online contexts” (Steinhoff et al. 2019, p. 382). Existing bilateral communication frameworks thus operate under three assumptions: The exchange is private (without any observers), both parties are human, and the format characteristics (e.g., facial features, body language, tone of voice) are real instead of simulated. Thus, such theories cannot capture the differences between private social media messages and public posts, interactions involving a human or AI agent, or in-person interactions and those that occur in virtual worlds. To offer preliminary insights into these theoretical gaps and assumptions, we next review relevant bilateral communication research in marketing, after which we begin the characteristic-level analyses for our framework.

Bilateral communication research

A review of relevant bilateral (one-to-one) communication research in marketing provides additional insights that help complement and extend existing theory (Palmatier et al. 2018) and support key considerations to develop a characteristic-level framework (Fig. 1). We recognize the limitation that extant bilateral communication research is primarily conducted at the format level, yet it yields important insights into the aforementioned theoretical gaps (e.g., format synergies, social media). Marketing literature on bilateral communication that is most pertinent for addressing these identified theoretical gaps can be meaningfully grouped into four main streams, as in Table 2: format trade-offs and synergies, social media interactions, AI agents, and simulated cues.

Fig. 1 Conceptual framework for bilateral multiformat communication strategies

Format trade-offs and synergies

Current communication theory recognizes that format trade-offs exist; extant bilateral research goes a step further to explore trade-offs across exchanges and their synergistic effects across formats. The established, inverted U-shaped relationship between communication frequency and firm performance appears inversely related to format richness (Kumar et al. 2008; Venkatesan and Kumar 2004), such that communication costs accumulate across multiple customer–firm interactions. Rich formats thus might need to be used strategically, to avoid reaching a communication frequency threshold (i.e., point of diminishing returns) too soon. Researchers agree that spillover effects and synergies exist among different formats, such as social media and traditional marketing, and their influences vary over time (Hewett et al. 2016; Kumar et al. 2016a; Kumar et al. 2017). Still, some debate arises regarding whether the use of multiple formats in bilateral communication practices always helps or sometimes hurts performance. For example, Reinartz et al. (2005) indicate positive interaction effects among certain format pairs, such as face-to-face and email or telephone and email, which synergistically increase performance. Godfrey et al. (2011) find negative interactions across all format pairs though, suggesting substitution effects. These conflicting findings might reflect the variation in and trade-offs across format characteristics, such that redundant characteristics would induce substitutive effects, whereas complementary characteristics might evoke positive, synergistic effects. Following this logic, the format sequence across multiple interactions or steps within a single exchange should be determined by the trade-offs of the underlying characteristics. 
Accordingly, we recognize two additional temporal factors for our framework: communication frequency (i.e., number of customer–firm interactions per unit of time) and format sequence (i.e., order of communication formats used during the course of customer–firm interactions).

Social media interactions

Bilateral communication on social media platforms has become “the heart of online RM efforts in this era” (Steinhoff et al. 2019, p. 377). Social media communication enhances a variety of outcomes, including consumers’ shopping (Kumar et al. 2016a), repeat usage (Toker-Yildiz et al. 2017), interactive engagement (Viglia et al. 2018), and overall relationship quality (Achen 2016). Social media interactions can occur via private messages, public posts, or virtual worlds; we discuss virtual worlds in a separate section. Public social media interactions (posting) represent unique opportunities (e.g., consumer praise) and challenges (e.g., complaints), raising both the stakes and communication costs, because they reveal what was said and when, indefinitely (permanent record), which could have positive or negative influences on communicators and observers. For example, firms’ responses to customers’ posts may appeal to those who receive the feedback but alienate observers whose posts were ignored (Gu and Ye 2014). Replying to a complaint raises expectations (among the complainer and observers) and encourages more complaints, but it also improves the customer’s underlying relationship with the firm, which can lead to more positive online voices (Ma et al. 2015). In unfavorable situations, informational instead of emotional messages are also more effective for enhancing customer sentiment (Meire et al. 2019). Reacting to negative online reviews influences subsequent opinions positively; reacting to positive reviews influences subsequent opinions negatively; tailoring or personalizing replies amplifies these effects (Wang and Chaudhry 2018). An informational or personalized response in unfavorable situations (e.g., negative reviews) may add specificity and value, whereas responses that are emotional or that highlight positive reviews may call the firm’s intent into question (e.g., self-promotion). 
We include a dyadic public versus private interaction factor (i.e., whether the customer–firm interaction inherently has observers) in our framework; it will influence the impact of different format characteristics on performance.

AI agents

To enhance customer experiences, firms have been investing in new tools to further humanize AI agents in exchanges (e.g., humanlike avatars, virtual assistants, robots) (Mende et al. 2019; Steinhoff et al. 2019). Customers can engage in real-time dialogue with AI agents through in-store robots or call centers, text-based messaging, or virtual face-to-face contact in online simulations and worlds. When they function as customer service representatives or salespeople, AI agents might interact with individual customers as they browse a website, foster relational bonds (Keeling et al. 2010), enhance customer experiences, and improve sales and customer service outcomes (Daugherty and Wilson 2018). Still, AI agents struggle to mimic human behaviors fully (e.g., unique, flexible placement of punctuation or capital letters for emphasis), which can lead to customer frustration, disappointment, or discomfort (Castelo et al. 2018b; Mimoun et al. 2012), so purchase rates drop a reported 75% when firms disclose to online customers that they are conversing with a nonhuman agent (Davenport et al. 2020). Even AI agents with perfect human appearances lack the affective capability or empathy needed to perform certain tasks (Castelo et al. 2018a), suggesting simulated visual and verbal cues may not be as effective from an AI agent. Still, AI agents offer efficiency benefits (e.g., exceptionally fast responses) and “can be as effective as trained salespersons, and 4x as effective as inexperienced salespersons” (Davenport et al. 2020, p. 28).

A general consensus is that the message or task should determine the decision to use human versus nonhuman agents in bilateral exchanges; the task, in turn, dictates unique characteristic needs. More mechanical and analytical tasks are easier and more efficient for AI agents (Kumar et al. 2016b); tasks that require higher intelligence, intuition, and empathy instead are better performed by human employees (Huang and Rust 2018; Kumar et al. 2016b). AI agents lack emotional intelligence, and their response effectiveness depends on first categorizing the message correctly, which becomes more difficult in highly ambiguous and early relational exchanges (e.g., no previous customer data). Cues from AI agents may not be as effective for conveying a genuine apology or interpersonal involvement (Petterson 2019). AI agents may miscategorize confusing messages and deliver an unwarranted, inauthentic apology or an irrelevant response. We include the dyadic factor of human versus AI agent in our framework, which refers to whether the employee delivering the message in the exchange is human or nonhuman and technology generated (van Doorn et al. 2017) and will influence the impact of format characteristics on performance.

Simulated cues

Existing communication theory recognizes that underlying characteristics of communication formats exist but does not consider how proximal, visual, and verbal cues might be simulated in online interactions with avatars and virtual worlds, to promote effective and experiential goals. An avatar encompasses any visual representation (i.e., 2D or 3D) of an online user, including complex beings created in a shared virtual reality; it thus adds simulated visual (e.g., facial features, body language) and/or verbal (e.g., tone of voice) cues to the exchange. Avatars can be simple, personalized cartoon-like characters (e.g., Yahoo avatars) or complex, anthropomorphized, embodied persons. By addressing consumers’ needs for interpersonal involvement in mediated shopping environments (Papadopoulou 2007), they have potential benefits in terms of offering more information, greater entertainment value, and enhanced online experiences (McGoldrick et al. 2008). Furthermore, because people treat nonhuman agents as human beings when they are anthropomorphized and imbued with humanlike features (Kim et al. 2016), nonhuman, virtual agents with anthropomorphic features can help enhance medication adherence (Bickmore et al. 2010) and facilitate social engagement behaviors (Tapus et al. 2012). Anthropomorphized technology also tends to evoke greater self-disclosures from customers, because it provides social cues (Thomaz et al. 2020). Yet the uncanny valley theory, which predicts that technology that is too humanlike appears creepy, and the opportunities for adopting calculated self-presentation strategies suggest some issues. Simulated cues (e.g., facial features) may be perceived as less authentic and more carefully managed than real cues, especially for nonhuman (vs. human) agents. In other words, the effect of simulated cues on performance may vary depending on the task and agent type. 
For example, in online exchanges, AI agents with humanlike features sometimes negatively influence computer game enjoyment, because they undermine consumers’ autonomy (Kim et al. 2016).

Avatars also can be incorporated into existing formats (e.g., live chat) or mediate interactions in virtual worlds. These immersive, 3D, multi-user online spaces in which all participants interact using virtual representations provide the highest degree of media richness and social presence to consumers among the online platforms currently available (Grewal et al. 2020b; Moon et al. 2013), by offering simulated proximal (e.g., co-location), visual, and verbal cues. For example, in a virtual shopping environment with avatar-mediated communication, social presence increases enjoyment, brand attitudes, and purchase intentions; these effects are more pronounced when other consumer avatars are present (Moon et al. 2013).

Decomposition of communication formats

Our review of communication theory and extant bilateral communication research in light of the identified theoretical gaps establishes a foundation from which we can decompose the formats into their underlying characteristics (Table 3), which constitute two main categories: cue and channel. Cue characteristics (proximal, visual, verbal, textual) determine how people encode messages within a particular format (Te'eni 2001), whereas channel characteristics (synchronicity, revisability) refer to how the format enables people to transmit and process messages. With this categorization, we can predict how the underlying characteristics, individually and synergistically, promote communication goals (effective, efficient, experiential) and performance outcomes across different message, temporal, and dyadic factors (Table 4).

Cue characteristics

Cues vary across formats, and technological advances (e.g., avatars, virtual worlds) now allow simulated proximal, visual, and verbal cues in online interactions. We thus first discuss the real versions, followed by simulated ones, to clarify any similarities and differences.

Real proximal cues

The customer’s and employee’s physical copresence in an exchange creates proximal cues (e.g., physical touch, atmospherics; Wilson et al. 2012). We recommend them for effective and experiential goals, in earlier relationship stages, and to convey interpersonal involvement. Proximity can heighten both involvement and attachment (Price et al. 1995) and overall exchange evaluations (Hornik 1992). A handshake (physical touch) in greeting could promote intimacy (social presence), psychological closeness, and relationship development. Copresence also facilitates mutual understanding by making it easier to seamlessly incorporate other materials (e.g., handouts) or product demonstrations. Still, proximal cues require customer and employee colocation in spatial and temporal proximity (i.e., in the same room at the same time), which promotes effective (mutual understanding) and experiential communication (sensory involvement) but also invokes substantially greater costs (e.g., travel, attention) that accumulate quickly across exchanges. Face-to-face interaction is the only format that offers real proximal cues; they are thus the least accessible and flexible characteristic and will always be combined with visual and verbal cues.

Real visual cues

Physical appearances, facial expressions, eye contact, gestures, body language, and body orientation offer visual cues (Sia et al. 2002) and are relevant for effective and experiential communication goals. We contend that they will function well in early relationship stages, to accelerate relational development, and when there is a high need for interpersonal involvement. Visual cues complement spoken language by “repeating, substituting, complementing, accenting, regulating, and relating it better than mere words alone” (Bonoma and Felder 1977, p. 170), so they should help customers and employees reach mutual understanding. Even if visual cues contradict verbal cues, they provide additional information, such as about a person’s affect; recipients tend to perceive visual cues as more authentic than consciously managed verbal cues (Marinova et al. 2018). Eye contact, smiling, gestures, and body orientation can enhance rapport by signaling positivity, warmth, and friendliness, even in awkward interactions (Gremler and Gwinner 2000). Visual cues also enhance mental stimulation and evoke imagery (Elder and Krishna 2012), which should capture attention and discourage multitasking (Brasel and Gips 2017). Despite their relational and experiential advantages, visual cues make it impossible to maintain anonymity, which may create self-presentation, privacy, or embarrassment costs (Dahl et al. 2001). Visual cues also introduce cue-message consistency concerns (e.g., expressions and body language must match messages). Traditionally, face-to-face interaction was the only format that provided real visual cues, but more formats do so now, including video chat (e.g., Skype, FaceTime) and video messaging (e.g., Snapchat). These emerging digital formats embrace visual cue advantages (e.g., social presence, social information) without evoking high proximity costs. 
For example, HubSpot, an inbound marketing and sales platform, responds to new customers’ questions (even in a public forum) with video messages, to introduce tone and establish trust.

Furthermore, while visual cues inherent to bilateral formats are dynamic, we recognize that visual cues can also be static (e.g., picture of employee) and added to different formats (e.g., letter, email, live chat). Dynamic visual cues (e.g., multiple facial expressions, body language) provide more social information and evoke greater sensory involvement than static ones (e.g., one facial expression and body position). They will thus likely be important in situations requiring high levels of interpersonal involvement and in early relationship stages (e.g., first impressions, onboarding). For example, research finds that dynamic visual cues have greater carryover rates than static ones and evoke greater mental imagery in nonbilateral communication contexts (Bruce et al. 2017; Roggeveen et al. 2015). Static visual cues, however, will be less costly for both parties (e.g., fewer cue-message consistency concerns and technology requirements) and thus will likely be more important in later relationship stages and in situations that cannot accommodate dynamic visual cues but still require interpersonal involvement.

Real verbal cues

A bilateral message inherently includes verbal or textual cues. Verbal cues are made available from the vocal features of the spoken language, such as tone, pitch, inflection, and accent (Walther et al. 2005). We contend that they promote effective and experiential communication goals; should be offered at early stages; and can meet needs for interpersonal involvement, reduce ambiguity, or suffice for message convergence tasks. Approximately 38% of the emotional content in an exchange gets communicated through verbal cues (Barker and Gaut 2002). In face-to-face interactions (i.e., when coupled with proximal and visual cues), verbal cues can indicate competence (e.g., knowledge, skills) and problem-solving orientation (e.g., engaged, proactive), as well as compassion (e.g., empathy, caring) and agreeableness (e.g., courtesy, respect), which support emotional bonding and connections (Marinova et al. 2018). Verbal cues also can enhance perceptions of an employee’s personality, emotional state, credibility, and sincerity; they carry much of a message’s cognitive component (de Ruyter and Wetzels 2000). Thus, they should promote mutual understanding by assigning meaning and intent to the message and reducing ambiguity. Verbal cues can promote experiential communication too, by attracting and maintaining the customer’s attention and establishing an emotional narrative. These experiential benefits could be enhanced even more if combined with visual or proximal cues for a multisensory experience. As the customer–firm relationship evolves, verbal cues may become less necessary though; if an employee consistently expresses concern across multiple exchanges, the customer likely infers concern in future interactions, even without further verbal cues. They also can be carefully managed, so they risk appearing inauthentic, especially if no accompanying visual cues are available for reinforcement. 
In terms of costs, verbal cues evoke memory and recall demands; people must remember the conversation, because there is no physical record to review. In some cases, verbal cues can be combined with revisability (e.g., video messages recorded and sent via text), to reduce memory and recall demands. But in other cases, they create costs associated with dialect-related concerns (e.g., accent, speaking speed). Verbal cues are available through face-to-face, telephone, video chat, and video messaging formats.

Textual cues

Cues made available from written or typed language, such as spelling and punctuation, are textual cues (Sia et al. 2002). People obtain unique information from hearing words (verbal cues) and from seeing words (textual cues). We recommend textual cues for effective and efficient goals and contend that they are appropriate for customers in late relationship stages (e.g., maintenance), tasks that demand message conveyance, and messages with high content variety. The employee (human vs. AI) delivering the message and whether the interaction is public (vs. private) are also relevant. Compared with formats with verbal cues, those with textual cues typically appear more formal and establish physical documentation to some degree (Mohr and Sohi 1995), such that they tend to encourage long-term orientations, high interdependencies, and joint planning across cultures but discourage information distortion and withholding (Mohr and Sohi 1995). Customers may be more comfortable explicitly expressing their thoughts and opinions through textual cues in later relationship stages, especially if those textual cues combine with high channel revisability (i.e., permanent record of conversation) and when the interaction is public, such as in social posting. Lengthy messages or those with high content variety also may be conveyed better through textual cues, to avoid confusion and reduce cognitive loads. Varied content often is required for exchanges that involve product knowledge acquisition, complex services (e.g., financial), or comparisons. Formats with textual cues also support interactions across cultural and geographical boundaries (Bandyopadhyay et al. 1994) and thus should lower the communication costs (e.g., dialect, travel, time). Even if people are in relatively close geographical proximity, they might use textual formats for efficiency (Ganesan et al. 2005). 
Still, textual cues offer limited feedback capabilities compared with cues that provide meaning beyond the message; these capabilities are even more limited with AI agents (e.g., lack ability to use punctuation as a genuine expression of emphasis or empathy). Textual cues also raise conversational structuring (e.g., spelling), error (e.g., auto-correct), and privacy (e.g., data breaches) concerns, especially when combined with revisability. If the interaction is public (as with social media posts), the potential benefits and stakes become even higher; any benefit- or cost-related issues affect both communicators and observers (Gu and Ye 2014; Ma et al. 2015). Email, letters, live chat, SMS-text messaging, direct social messaging (e.g., private messages), and social posting all contain textual cues.

Simulated proximal, visual, and verbal cues

Because technological advances have enabled simulated proximal, visual, and verbal cues in online interactions, firms now commonly incorporate such cues into communication protocols through avatars (e.g., Slack’s 24/7 virtual assistant) and virtual worlds (e.g., Second Life), seeking the new advantages and opportunities. But we contend that they also raise some unique challenges. Simulated cues promote experiential communication potentially beyond that of their real counterparts; they provide multisensory, often highly immersive experiences, along with high levels of social presence and even customer escapism. In virtual worlds, a public interaction (with other customer avatars) enhances social presence along with outcomes such as enjoyment and purchase intentions (Moon et al. 2013). Simulated cues can even alleviate some of the costs associated with real cues (e.g., colocation and anonymity with real proximal and visual cues). Online interactions have become a dominant exchange mode, and VR can “mitigate some of the sensory disadvantages that customers face online” (Steinhoff et al. 2019, p. 375), which in turn should enhance social presence and performance outcomes (Grewal et al. 2020b). Still, simulated cues might be less effective than real cues, especially if the messages are highly ambiguous; simulated cues are selected and thus do not provide the same information as real ones. The employee sending the simulated cues (human vs. AI) matters too. Due to their lack of emotional intelligence, AI agents already are less effective in certain situations; this detriment may become more pronounced with simulated cues. Customers often perceive simulated cues as inauthentic and carefully constructed (Chakraborty and Sabharwal 2019), especially if they also appear extremely humanlike.

In more detail, simulated proximal cues are made available through the customer’s and the employee’s virtual copresence (e.g., virtual touch, virtual atmospherics). They can allow customers to experience products in various settings while providing vivid, transformative online interactions. They can also eliminate travel costs (i.e., no physical colocation required), even if they introduce limits on which materials can be shared (e.g., brochures) and on product demonstrations. Furthermore, unlike real proximal cues, simulated ones are not always combined with simulated visual (avatars) or verbal (voice features) cues. A virtual world may solely comprise simulated proximal cues (e.g., virtual product interaction), with a chat function (textual cues), for example. Simulated visual cues stem from an avatar’s physical appearance, facial expressions, eye contact, gestures, and body language. They eliminate many cue–message consistency, self-presentation, privacy, and embarrassment concerns, yet they may appear less authentic than real visual cues, especially when they are dynamic. In turn, it becomes harder for both parties to gauge and trust the social information in the exchange, preventing them from achieving mutual understanding in situations that demand interpersonal involvement or relationship building. Still, simulated visual cues offer new opportunities for self-presentation strategies. On the one hand, firms can customize employee avatars to match customer avatars, which might increase perceptions of similarity and overcome obstacles associated with nonhuman, AI agents. Such similarity helps build rapport (Gremler and Gwinner 2000). On the other hand, simulated cues that are too humanlike often seem creepy and perhaps opportunistic; they also threaten a customer’s autonomy in an exchange. Simulated verbal cues result from voice features in online interactions (e.g., simulated voice and pitch). 
They can eliminate dialect-related concerns (e.g., accent) but also reduce the emotional tone and information in the exchange, which may be essential for mutual understanding when the message is highly ambiguous or requires convergence. Real verbal cues help convey the tone of the message (e.g., concern, sincerity); simulated verbal cues are inauthentic and thus generally lack this capability.

In summary, designing and implementing formats with simulated cues requires investments from the firm (e.g., monetary, training, customer education) and the customer (e.g., learning curve). Firms also worry about introducing AI and VR solutions that customers might reject (e.g., millennials versus boomers). For example, simulated cues can enhance customers’ decision comfort, but privacy concerns attenuate this effect (Hilken et al. 2020). We anticipate that simulated cues will become even more relevant to multiformat communication strategies over time, growing more common in customer–firm exchanges (Gonzalez 2019; Grewal et al. 2020a, 2020b). As technology evolves, simulated cues may become more effective (e.g., convey empathy and feelings). Still, customers may crave human-to-human interaction if AI agents and simulated cues become commonplace (Mende et al. 2019). The only format that inherently has simulated proximal, visual, and verbal cues, combined, is a virtual world, but simulated visual and/or verbal cues can be added to many formats through avatars (e.g., live chat).

Channel characteristics

Channel synchronicity

The degree to which communication is temporally consistent, occurring at the same time and together, determines channel synchronicity (Berger and Iyengar 2013). We contend that channel synchronicity promotes effective, experiential, and (possibly) efficient goals, and we recommend that firms make it available for customers across all relationship stages, especially if exchanges involve ambiguous messages or message convergence. For relatively new customers, synchronicity allows for interruptions, so they can obtain clarification and offer real-time feedback, ensuring that both parties are on the “same page before moving forward” (Berger 2014, p. 600). In later relationship stages, synchronicity also can promote efficient communication, by enhancing behavioral coordination and quicker, substantial feedback. Such feedback is especially important if the task demands message convergence and an understanding of individual interpretations, as with ambiguous messages. Greater synchronicity implies a more interactive exchange, with enhanced sensory and cognitive involvement (Haeckel 1998). Without any response delay, the employee and customer pay close attention to each other, whereas delays create opportunities for disengagement, distraction, and multitasking. Still, synchronicity requires temporal colocation; when both visual cues (real or simulated) and synchronicity exist, multitasking is nearly impossible. Both these elements impose firm and customer costs. Wells Fargo reduced its operating costs by introducing live chat (high but not inherent synchronicity, no visual cues), which enabled its service employees to multitask and handle several customers at once.

Formats with inherent synchronicity include traditional face-to-face and telephone formats, as well as video chats (e.g., Skype, FaceTime) and virtual worlds. Other formats vary in their potential for a response delay. Live chat creates an implicit assumption that the other person is available to provide feedback nearly immediately, but it is not inherently synchronous. This expectation increases with an AI agent, which is quite common with live chat practices (Press 2019). In terms of decreasing synchronicity, we rank non-inherently synchronous formats as follows: live chat, video messaging/SMS-text messaging, direct social messaging/social posting, email, and letters. When people send video messages via Snapchat or SMS-texts, they presumably seek nearly immediate feedback, though the recipient may take seconds, minutes, or days to respond. Similar expectations arise with direct social messaging (e.g., private LinkedIn message) and posting, but people generally experience some delay. With email, longer delays are common. Letters are the least synchronous format, with the longest feedback delays.

Channel revisability

The degree to which messages can be edited during encoding and reviewed during and after it determines channel revisability, which supports effective and efficient communication. We recommend that this feature be available mainly in later relationship stages and when the task involves message conveyance or high content variety (e.g., stock market price and trend comparisons). In revisable formats, people can edit information before responding and control the pace of encoding, leaving behind a record of exchanged messages. Both parties can review the message content as many times and in any order they prefer. These elements increase precision and should enhance mutual understanding, because “rather than saying whatever comes to mind, or speaking off the cuff” (Berger and Iyengar 2013, p. 568), the customer and employee gain more control over what they say and how they react. Both parties may choose their words carefully and ensure the meaning of the message is as they intended, which prevents premature reactions or interruptions. Formats with high revisability typically exhibit lower synchronicity and are less likely to interrupt daily tasks or demand substantial mental resources, but they may be associated with longer response delays, effortful encoding, and permanency concerns. When a revisable interaction is public too (e.g., social media posts), it induces additional costs, because the customer and observers may view the message indefinitely. That is, people might not want their messages recorded, owing to privacy, embarrassment, or topic concerns.

Formats that are inherently revisable include email, letter, SMS-text messaging, direct social messaging, and social posting. Other formats’ revisability levels will vary. Live chat often occurs on company websites, and many companies provide an option to obtain a copy of the conversation after it ends. Because live chat also implies immediate responses (i.e., high synchronicity), the parties have limited time to revise their messages during actual encoding. Video messaging with Snapchat grants both parties control over revisability (i.e., length of time the message is revisable), but a video sent through email entails greater revisability.
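The decomposition described in this section can be sketched as a small data structure. The following Python sketch is illustrative only: it encodes the format-level statements made in the text above (which cues each format carries, which formats are inherently synchronous or inherently revisable, and the stated ordering of decreasing synchronicity), not the contents of Table 3. The class name `Format`, the field names, and the integer `sync_tier` coding are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Format:
    name: str
    cues: frozenset                     # subset of {"proximal", "visual", "verbal", "textual"}
    sync_tier: int                      # 0 = inherently synchronous; 1-5 = decreasing synchronicity
    inherently_revisable: bool = False
    simulated: bool = False             # True when the proximal/visual/verbal cues are simulated


def _f(name, cues, sync_tier, **kwargs):
    # Small helper (hypothetical) so the table below stays readable.
    return Format(name, frozenset(cues), sync_tier, **kwargs)


# Entries follow the statements in the text; tiers 1-5 follow its stated ranking.
FORMATS = [
    _f("face-to-face", {"proximal", "visual", "verbal"}, 0),
    _f("telephone", {"verbal"}, 0),
    _f("video chat", {"visual", "verbal"}, 0),
    _f("virtual world", {"proximal", "visual", "verbal"}, 0, simulated=True),
    _f("live chat", {"textual"}, 1),
    _f("video messaging", {"visual", "verbal"}, 2),
    _f("SMS-text messaging", {"textual"}, 2, inherently_revisable=True),
    _f("direct social messaging", {"textual"}, 3, inherently_revisable=True),
    _f("social posting", {"textual"}, 3, inherently_revisable=True),
    _f("email", {"textual"}, 4, inherently_revisable=True),
    _f("letter", {"textual"}, 5, inherently_revisable=True),
]


def formats_with(cue, real_only=False):
    """Return the names of formats that carry a given cue characteristic."""
    return {
        f.name
        for f in FORMATS
        if cue in f.cues and not (real_only and f.simulated)
    }
```

For instance, `formats_with("proximal", real_only=True)` returns only face-to-face interaction, which mirrors the observation that it is the sole format offering real proximal cues. Avatars added to other formats (e.g., live chat) are not modeled in this sketch.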

Decomposition insights: Toward a theory of multiformat communication

In our effort to address current theoretical gaps and summarize the insights we derived across communication theory, bilateral communication research, and the decomposition of communication formats, we propose formal propositions grouped into five overarching themes that serve as strategic guidelines (Table 4) and provide illustrative business case examples (Web Appendix). They can benefit both managers who design and implement multiformat communication strategies to enhance customer relationships and firm performance and scholars seeking to advance research in this domain.

Theme 1: Achieving communication goals with real versus simulated cues

Firms should select and bundle format characteristics together, according to the communication goals of the exchange, while taking into consideration that simulated cues will exert different (positive and negative) impacts on the goals than real cues. Characteristics vary in their ability to promote mutual understanding (effective communication), reduce communication costs (efficient communication), and stimulate sensory involvement (experiential communication). Effective communication, for example, is a primary goal in all exchanges; it is the foundation for a successful exchange. All characteristics can promote mutual understanding, but the degree to which they do so depends on the discussion topics (i.e., message factors). Verbal cues promote effective communication by reducing ambiguity in the exchange, in that they assign meaning and intent to the message beyond the words themselves. Textual cues promote effective communication with high content variety; they reduce cognitive load and avoid confusion. Thus, the former might be more appropriate when customers are trying to evaluate why stock prices have changed and what to do next, whereas the latter should be more meaningful for their efforts to compare actual stock prices. Understanding which characteristics are mandatory to achieve effective communication and additional characteristics needed to promote efficient and/or experiential communication thus is imperative when designing successful multiformat strategies.

Even further, cutting-edge technologies such as avatars and virtual worlds can simulate proximal, visual, and verbal cues, which can vary in their abilities to promote effective and experiential goals. Real verbal cues provide an emotional tone for the message (e.g., inflection, pitch); simulated verbal cues cannot, because they are manufactured. Real visual cues provide authentic, social information and are not as carefully managed as verbal cues (Marinova et al. 2018), especially when their changes are observable (i.e., dynamic); simulated visual cues instead are carefully selected and managed (e.g., avatar hair color, facial expressions, voice). Interactions across formats with simulated cues, however, provide novel, immersive experiences. With VR, a sense of colocation can be simulated, which should heighten customer experiences (Expert Panel 2019). With simulated proximal cues, customers can explore products in different settings, enhancing decision convenience and comfort (Heller et al. 2019; Hilken et al. 2020) and transporting customers into the firm’s story (Grewal et al. 2020b). An avatar can provide an escape for customers, who appear however they want. In other words, simulated cues will be less effective but likely more experiential than real cues. Accordingly, we propose the following:

P1:

For effective communication, the bundle of format characteristics must match the message at each step, or for each task, of the exchange. Specifically, formats with

  (a) proximal, visual, and/or verbal cues promote mutual understanding when the exchange requires interpersonal involvement.

  (b) verbal cues and/or channel synchronicity promote mutual understanding when the exchange features highly ambiguous messages or involves message convergence.

  (c) textual cues and/or channel revisability promote mutual understanding when the exchange involves message conveyance or features messages with high content variety.

P2:

For experiential communication, formats with proximal cues, visual cues, verbal cues, and/or channel synchronicity promote sensory involvement.

P3:

For efficient communication, formats with textual cues, low-to-medium degrees of channel synchronicity, and/or channel revisability have smaller positive impacts, overall, on communication costs than formats featuring proximal cues, visual cues, verbal cues, and/or high degrees of or inherent channel synchronicity.

P4:

Simulated proximal, visual, and verbal cues have smaller positive impacts on mutual understanding than their real counterparts but similar if not larger positive impacts on sensory involvement.

P5:

Dynamic visual cues, real or simulated, have larger positive impacts on mutual understanding and sensory involvement than static visual cues, real or simulated, but also larger positive impacts on communication costs.
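As an illustration of how P1 might be applied when screening a format for a given exchange, the following Python sketch encodes P1(a)–(c) as a lookup from message needs to the characteristics that promote mutual understanding. The dictionary keys, the function name `unmet_needs`, and the treatment of "and/or" as "at least one listed characteristic is present" are assumptions made for illustration; the propositions themselves are qualitative and do not prescribe a decision rule.

```python
# Message needs mapped to the characteristics named in P1(a)-(c).
P1_RULES = {
    "interpersonal involvement": {"proximal", "visual", "verbal"},   # P1(a)
    "ambiguous messages": {"verbal", "synchronicity"},               # P1(b)
    "message convergence": {"verbal", "synchronicity"},              # P1(b)
    "message conveyance": {"textual", "revisability"},               # P1(c)
    "high content variety": {"textual", "revisability"},             # P1(c)
}


def unmet_needs(format_characteristics, needs):
    """Return the needs for which the format offers none of the P1 characteristics."""
    available = set(format_characteristics)
    return [need for need in needs if not (P1_RULES[need] & available)]


# Characteristic bundles as described in the text for two formats.
video_chat = {"visual", "verbal", "synchronicity"}
email = {"textual", "revisability"}

# Video chat covers both needs; email leaves the ambiguity need unmet.
print(unmet_needs(video_chat, ["interpersonal involvement", "ambiguous messages"]))
print(unmet_needs(email, ["ambiguous messages", "high content variety"]))
```

The sketch covers only the matching logic of P1; the cost trade-offs in P3 and the real versus simulated distinction in P4 would need additional, likely weighted, inputs that the propositions do not quantify.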

Theme 2: Achieving communication goals in early versus late relationship stages

In early relationship stages, firms should select and bundle characteristics that accelerate relational development by promoting effective and experiential goals, whereas in late stages, firms should implement characteristics that promote effective and efficient goals. Efficient communication may be especially important in late relationship stages, when customers grow more sensitive to excessive and unnecessary costs. Experiential communication instead may be critical in early relationship stages, to create long-lasting first impressions, build new mental associations, and facilitate emotional connections between the firm and its customers (Harmeling et al. 2017; Schouten et al. 2007). Real proximal cues promote attachment (Price et al. 1995); real visual cues convey authentic, social information (Marinova et al. 2018); and real verbal cues provide emotional meaning and narrative (Barker and Gaut 2002). Thus, they all convey interpersonal involvement and promote mutual understanding in early stages. Visual cues in particular can accelerate relational development (Walther et al. 2005). Simulated cues promote mutual understanding but not as effectively as real cues, though they can evoke even greater sensory involvement (P4).

Adding to the complexity, formats with both textual cues and revisability, such as email or SMS-texting, can slow down relational development (Walther 1996), because they do not inherently convey interpersonal involvement or allow for behavioral coordination, which are important in early stages. Face-to-face interactions may be ideal for relationship building, but they also are the least accessible and most costly format. In response, emerging digital formats are designed specifically for relational and experiential purposes, such that they provide superior options for firms (e.g., more efficient, lower cost) and customers (e.g., more experiential, novel, immersive). If real proximal cues bear an unnecessary cost, firms can encourage video chat (e.g., Skype) for initial interactions or a virtual world meeting (e.g., Facebook Horizon), assuming such a novel, experiential aspect would be well received by the customer base (e.g., millennials vs. boomers). Accordingly, we offer the following propositions:

P6:

In early relationship stages, real or simulated proximal, visual, and verbal cues accelerate relationship development by promoting high levels of mutual understanding and sensory involvement, whereas textual cues and revisability, individually and especially synergistically, slow down relationship development.

P7:

In late (vs. early) relationship stages, textual cues and channel revisability, individually and especially synergistically, have larger positive impacts on mutual understanding and smaller positive impacts on communication costs.

Theme 3: Bundling and sequencing complementary versus substitutive characteristics

Formats with substitutive characteristics (e.g., visual and verbal cues both promote interpersonal involvement) can yield benefits that outweigh the high costs, for a single task. However, across multiple tasks (multistep exchange) or multiple interactions, a format sequence with complementary (vs. substitutive) characteristics may yield far greater benefits and fewer costs. The inherent trade-offs across characteristics described in Propositions 1–7 give rise to the idea of complementary and substitutive characteristics within and across formats. For example, video chats (e.g., Zoom, Skype, WebEx) feature visual and verbal cues, and are inherently synchronous (i.e., real-time interaction with no time delay); video messaging (e.g., Snapchat) features visual and verbal cues, medium to high synchronicity (i.e., not an inherently synchronous, real-time interaction), and low to high revisability (e.g., ability to review message more than once during the interaction or re-review after the interaction has ended), depending on response times and the platforms used. These two formats overlap on visual and verbal cues (substitutive across formats) but differ in their degree of synchronicity and revisability (complementary across formats). A format featuring substitutive characteristics with similar benefits should be used for a single task to enhance performance, especially in early relationship stages, when customers tolerate higher costs and firms see greater returns from rich, costly formats. For example, a format with proximal cues, visual cues, verbal cues, and channel synchronicity (e.g., face-to-face, virtual world) likely promotes greater sensory involvement than a format with visual cues, verbal cues, and channel revisability (e.g., video sent through email) but also increases communication costs (e.g., spatial and temporal colocation); in early relationship stages, the substitutive benefits can outweigh the costs for a single task or exchange.

Alternatively, formats featuring complementary characteristics with unique benefits should be used across multiple tasks (i.e., multistep exchange) or multiple exchanges, to enhance performance and avoid reaching the communication frequency threshold (i.e., point of diminishing returns) prematurely. Complementary characteristics across tasks or exchanges can reduce unnecessary costs. A video chat (e.g., Skype) followed by a telephone call would incur high temporal colocation and behavioral coordination costs, but a video chat followed by an email does not, so it likely extends the communication threshold, or the point at which communication returns start to diminish. As noted, communication frequency has an inverted U-shaped relationship with performance, inversely related to format richness (Godfrey et al. 2011), due to the high communication costs associated with rich formats. The complementary versus substitutive nature of cue and channel characteristics may determine how the simultaneous use of multiple formats affects performance, whether negatively (substitutive; Godfrey et al. 2011) or positively (complementary; Reinartz et al. 2005). Following up a video chat with email correspondence (complementary sequence) can bring new advantages and enhance mutual understanding via textual cues and channel revisability; following it with a phone call cannot, due to their high characteristic overlap (substitutive sequence). We thus propose the following:

P8:

Formats with substitutive (vs. complementary) characteristics have larger positive impacts on mutual understanding and sensory involvement but also larger positive impacts on communication costs, when used for a single exchange task.

P9:

Formats with complementary (vs. substitutive) characteristics have larger positive impacts on mutual understanding and smaller positive impacts on communication costs when used across multiple exchange tasks (i.e., multistep exchange) or sequenced across multiple exchanges.

P10:

When formats with substitutive (vs. complementary) characteristics are used across multiple exchange tasks (i.e., multistep exchange) or sequenced across multiple exchanges, the communication frequency threshold is reached sooner.

Theme 4: Characteristics delivered by human versus AI agents

Firms should carefully manage the use of AI agents (e.g., chatbots) in bilateral exchanges across tasks, especially online when simulated cues become involved (e.g., avatars, virtual worlds). The specific task will dictate the characteristic needs, and AI agents will inhibit mutual understanding or relational development for certain tasks. People will interpret characteristics delivered by nonhuman agents as less authentic and more strategically managed than ones delivered by human agents (e.g., robot vs. human smile in retail setting, capitalization for emphasis in online setting). Customers will also expect an even faster response from an AI agent in online exchanges; the heightened expectations may in turn reduce the positive effect synchronicity exerts on performance. Human employees are generally better suited for situations that require interpersonal involvement, ambiguity reduction, or message convergence, and in early relationship stages (e.g., onboarding, recovery). Such exchanges benefit from flexible interpretations of and responses to ambiguous information and authentic emotional expressions. While human employees vary in their interpretation and emotional capabilities, characteristics delivered by them (e.g., sincere voice) should still be perceived as more genuine and less manufactured than those delivered by nonhuman agents. However, AI agents are ideal for unambiguous messages or message conveyance purposes (e.g., repetitive call center tasks), which reflect situations that demand accurate information but not authentic emotional expressions to promote both effective and efficient communication (Thomaz et al. 2020). Thus, to manage the use of human versus nonhuman agents, firms might plan to use AI agents for efficiency until the AI agent identifies topics that demand flexibility and authentic emotional expressions. Customers then can be redirected to a human employee, through the same or another format with more interpersonal cues.

We suggest that simulated cues are less effective but possibly more experiential than their real counterparts (P4). They may become even less effective with AI agents. When nonhuman agents take on humanlike features, people treat them like human beings, heightening their expectations of the agent and the exchange (Kim et al. 2016). When an online exchange (e.g., live chat) calls for interpersonal involvement, an AI agent cannot meet the heightened expectations; in this situation, we do not recommend adding simulated cues. When an exchange features unambiguous messages (e.g., pay a bill, place an order), an AI agent likely can meet expectations, and can incorporate otherwise absent cues and social information into the exchange (e.g., visual cues in live chat) or reduce communication costs (e.g., no colocation or anonymity concerns, available 24/7), so simulated cues may be beneficial. Still, simulated cues should not be too humanlike, because if customers perceive them as creepy, inauthentic, or opportunistic, such perceptions likely negate any otherwise positive effects (e.g., lower costs). Characteristics delivered by an AI agent also might evoke lower levels of sensory involvement and social presence (experiential communication) in an exchange, compared with a human agent (Steinhoff et al. 2019). Adding simulated cues to the online exchange (e.g., avatar, virtual world) creates a multisensory, socially charged, immersive experience, which may offer a means to offset the potential negative impacts of an AI agent on sensory involvement. We propose:

P11:

An AI (vs. human) agent suppresses the characteristics’ positive impacts on mutual understanding when the exchange requires interpersonal involvement, features ambiguous messages, or involves message convergence, and this effect (i.e., less effective) is more pronounced in early relationship stages and with simulated cues.

P12:

An AI (vs. human) agent suppresses the characteristics’ positive impacts on communication costs when the exchange features unambiguous messages or involves message conveyance, and this effect (i.e., more efficient) is offset by simulated cues that are too humanlike.

P13:

An AI (vs. human) agent suppresses the characteristics’ positive impacts on sensory involvement, and this effect (i.e., less experiential) is offset by simulated cues.

Theme 5: Managing public interactions via social media and virtual worlds

A public customer–firm interaction, with observers, exerts different (positive and negative) effects across communication goals and formats. Social media posts inherently have textual cues and channel revisability, and the interaction is both public and permanent. Public social media posts provide opportunities for enhancing mutual understanding (e.g., transparency, multiple people can see responses), but at the expense of increased costs. For example, responding to customers through social media posts usually exerts a positive impact on them, such that addressing a complaint generally improves the underlying customer–firm relationship and potentially leads to more positive online posts, from both the focal customer and observers (Ma et al. 2015). However, when firms respond to positive posts, their intent may be questioned (Wang and Chaudhry 2018), especially with AI (vs. human) agents (e.g., no emotional intelligence or empathy, less authentic). Many firms (e.g., JetBlue, Hyatt Hotels) use human employees to communicate with customers via social posting. In avatar-mediated virtual worlds, the presence of others enhances both mutual understanding and sensory involvement. Creating socially engaging virtual environments that foster sensory involvement, social presence, and social interaction among customer avatars thus may yield numerous benefits, with few additional costs, especially with a receptive customer base. We formally propose:

P14:

A public (vs. private) interaction via social posting enhances the characteristics’ positive impacts on mutual understanding but also on communication costs, and this effect on mutual understanding is offset by an AI agent.

P15:

A public (vs. private) interaction via avatar-mediated virtual worlds enhances the characteristics’ positive impacts on mutual understanding and sensory involvement.

Conclusion, limitations, and further research

Bilateral communication is a critical antecedent of successful relationship marketing; firms can use it to differentiate their offerings. Communication theories predate many technological advances and emerging digital formats, so they focus primarily on traditional, format-level decisions. We propose a holistic, characteristic-level view of bilateral multiformat communication for relationship marketing by synthesizing communication theory and research and decomposing formats into their underlying characteristics. We offer 15 formal propositions, grouped into five overarching themes, to encapsulate the resulting strategic insights and offer a platform for research and a guide for managers’ multiformat strategy decisions (Table 4).

Several limitations of this research suggest avenues for further investigation. First, to derive communication strategies at the characteristic level, we have relied on theory and researchers’ post hoc explanations for their findings. Further research should explicitly test characteristic-level strategies for customer–firm exchanges, as well as explore potential influences of other contextual factors. Investigating customers’ receptivity to certain technologies (e.g., simulated cues, virtual worlds, avatars) is especially relevant. We expect that younger consumers will be more likely to adopt and respond well to technologically advanced agents and characteristics (e.g., millennials vs. boomers). Theoretically, our relationship-based framework could be adapted to reflect different exchange types (e.g., transactional, economic) or non-bilateral (i.e., one-to-many) communication contexts (e.g., advertising, marketer-generated content). Research finds that relationship marketing and economic marketing communications affect performance differently (Kim and Kumar 2018). In an online digital advertising context, factors such as featured emotions and ad length affect virality and sharing (Tellis et al. 2019). Second, our summary of how different characteristics promote effective, efficient, and experiential goals is not exhaustive. Additional research could explore and empirically test how each characteristic drives performance, to determine optimal characteristic combinations. Communication goals also vary for the firm versus the customer, so research is needed to distinguish and empirically examine these constructs from each party’s perspective. Third, we examine the differential impacts of characteristics delivered by human versus AI agents on performance but do not investigate uses of service scripts (e.g., AI-generated message delivered by human), which generally are discouraged in relational exchanges. Scripts might yield firm benefits (e.g., consistent service, lower costs) but offset human (vs. nonhuman) characteristic advantages (e.g., authentic smile, genuine punctuation emphasis), if customers can detect their use and dismiss the scripted exchange as inauthentic or opportunistic on the firm’s part. Investigating whether the firm directly discloses the nature of the agent to the customer is also relevant. In summary, our framework offers managers novel multiformat communication insights for enhancing customer relationships and offers scholars many avenues to advance research.