Introduction

Robo advisors have been praised as the next operating system in finance and the “new wealth management interface of the 21st century” (Andrus 2014). Robo advisors provide investment advice without the intervention of a human advisor. In short, robo advisors are digital interfaces that guide investors through an entirely automated process of investment advisory, from assessing financial goals and evaluating consumers’ risk profiles to ultimately managing the entire portfolio (Faloon and Scherer 2017; Gomber et al. 2017; Williams-Grut 2017). While discretionary input from consumers is possible, the key property is the fully automated process of risk assessment, asset allocation, and portfolio management, consistent with consumers’ current financial situation, financial goals, and appetite for risk.

Despite the increasing presence of robo advisors in the financial industry, with a share of $200 billion in assets under management worldwide in 2017 (Euler 2018) and companies such as Betterment or Wealthfront accumulating $1 billion in assets under management in less than 2.5 years after market entry (Moyer 2014), recent academic and industry studies reveal that growth rates are lagging behind expectations and that broad consumer acceptance of robo advisors, even among young, affluent investors, has been surprisingly low (Jung et al. 2017; Schweitzer 2019). The majority of consumers still express a preference for human financial advisors, due to the lack of a “human touch” of robo advisors (Salmon 2018) and a human’s greater ability to understand and personalize investment advice to consumers’ unique financial situation (Hohenberger et al. 2019).

The current work introduces a novel conceptualization of robo advisors, which we refer to as “conversational robo advisors” using AI-enabled chatbots. We define conversational robo advisors as advisory interfaces that possess a dialogue-based process of financial advisory, which emulates fundamental properties of human-to-human conversations (such as turn-taking or the presence of social cues throughout a conversation; Davenport et al. 2020; Thomaz et al. 2020). The key hypothesis of the current research is that conversational robo advisors can provide an unexplored alternative to address and compensate for the lack of “human touch” during the advisory process compared to traditional, non-conversational robo advisors. Building on the effective properties of human-to-human conversations in prior communication research and interpersonal psychology (Fiske et al. 2007; Levinson 2016; Sprecher et al. 2013), we develop and empirically test our conceptualization of “conversational robo advisors” and how they affect consumer perceptions of trust, firm evaluation, and investor behavior. Table 1 compares the key conceptual features of non-conversational robo advisors (i.e., possessing static, self-report, and one-way communication features) with those of conversational robo advisors (i.e., possessing dynamic, dialogue-based, and turn-taking communication features).

Table 1 Key conceptual features of non-conversational vs. conversational robo advisors

Across four studies, we provide empirical evidence that conversational as opposed to non-conversational robo advisors evoke greater levels of affective trust toward a robo advisor, and that these greater levels of trust in turn alter firm perception and investor behavior. Specifically, we demonstrate that conversational as opposed to non-conversational robo advisors increase consumers’ likelihood to follow portfolio recommendations, positively affect consumers’ attributions of benevolence toward a financial services firm (believing that a financial services firm acts in one’s best interest), and provide more engaging, positively valenced advisory experiences for consumers.

In what follows, we first review prior work on robo advisory and the relationship between automation and trust. We then develop a set of key hypotheses on how conversational robo advisors alter attributions of trust, firm perception, and investor behavior, and present the results of four studies designed to test our theorizing. We conclude with a discussion of the current findings for research on automation, trust, and public policy in the digital economy.

Theory and hypotheses

Theoretical background and literature review

Robo advisors are digital interfaces that guide private investors through an entirely automated process of investment advisory (Gomber et al. 2017). The predominant mode of robo advisory employs a static, formalized process of self-reports to assess consumers’ financial situation and individual risk profile (see Table 1 for an example). These self-reports are generally used to gather information on investors’ goals, existing financial assets, and appetite for risk (Tedesco 2015), which are then translated into an adequate portfolio of financial investments that can be managed automatically by the advisory system (Faloon and Scherer 2017).

As summarized in the literature review of Table 2, previous research on financial robo advisors has mainly focused on designing algorithms with respect to the “optimal” composition of asset classes based on consumers’ risk profile and financial goals (Day et al. 2018; Kilic et al. 2015; Musto et al. 2015), usability aspects of the interface (Jung et al. 2018), or the interactive role of the company’s sales channel and profit orientation in consumers’ reliance on robotic investment advice (Lourenço et al. 2020). Robo advisors have been considered predominantly as static tools to replace and optimize the human advisory process (see Lourenço et al. 2020 for a notable exception, using an interactive “pension builder” interface), to collect client data and allocate an appropriate portfolio consistent with a client’s risk profile, rather than leveraging the relationship-building potential for the financial services firm or the opportunity to more directly address the lack of broader acceptance of financial robo advisory and lack of a “human touch.”

Table 2 Review of relevant literature

At the same time, recent advances in artificial intelligence and natural language processing have given rise to conversational interfaces, or so-called AI-enabled “chatbots” (Dale 2016; Davenport et al. 2020; Thomaz et al. 2020), that use a retrieval-based dialogue system that emulates the characteristics of human-to-human conversations. Thus, a second generation of robo advisors is emerging which we refer to as “conversational robo advisors.” The dialogue-based interaction modality of conversational robo advisors possesses key features (see Table 1), whose malleability enables a more personalized, one-to-one interaction between the investor and financial services firm. First, the interaction is based on a dynamically construed dialogue flow that emulates the characteristics of a human-to-human conversation through a sequential process of turn-taking. Thus, rather than a static, questionnaire-like request for information as in non-conversational robo advisors, the interaction with a conversational robo advisor is based on the structural aspects of a two-sided conversation. Second, the language used by the interface can be semantically adapted to display richer social cues, such as emojis or trivial acknowledgments to signal emotions and active listening (Thomaz et al. 2020). Finally, the conversational interface can further be humanized by integrating anthropomorphic design cues into the interface and altering the visual appearance through avatars or other forms of visual display (Araujo 2018).

Recent research has started to investigate the impact of some of these malleable factors in conversational interfaces, showing that anthropomorphic design cues can positively impact service satisfaction and firm performance (Adam et al. 2019; Köhler et al. 2011), while research on so-called “conversational sales agents” in the online retail domain revealed that interfaces with human-like cues tend to foster greater trust in and likability of a sales agent (Araujo 2018; Luo et al. 2006), mirroring related research highlighting the importance of creating a stronger social presence in human–robot relationships (van Doorn et al. 2017; Mende et al. 2019). Building on and integrating these distinct literature streams on financial advisory and conversational agents, the key objective of the current research is to provide a causal test of whether “conversational robo advisors” may alter consumers’ experience of the advisory process, firm perception, and ultimately investor behavior. In what follows, we develop a set of key hypotheses along with an overview of the empirical studies designed to test these hypotheses.

Turn-taking and affective trust

The critical feature of conversational robo advisors is their capacity to take turns during the initial onboarding phase. Such turn-taking mimics natural iterations in human-to-human conversations which have been shown to act as an inherent trust-building mechanism (Bickmore and Cassell 2001). Although healthy, pre-existing relationships exert this property naturally, experimental evidence suggests that turn-taking can effectively contribute to the formation of a more affective, trusting relationship (Sprecher et al. 2013). Specifically, turn-taking during conversations among humans and a more equal share of “air time” are linked to better relational outcomes and greater perceptions of trustworthiness (Levinson 2016; Wiemann and Knapp 1975). For example, inviting mutually unfamiliar individuals to a 12-minute conversation using a pre-defined turn-taking protocol has been shown to maximize liking, closeness, and enjoyment of a conversation partner compared to non-reciprocal dyads that were not requested to take turns during a conversation (Sprecher et al. 2013).

The origin of this mechanism lies deep in human development, stemming from a combination of instinctive and learned behavior (Levinson 2016), representing social and cooperative efforts between humans (Grice 1975). The process of turn-taking is often further governed by both verbal and non-verbal social cues, indicating active listening such as providing trivial acknowledgments of what the conversation partner just said or implicit signals to indicate whether the speaker is ready to yield the turn or whether an answer is expected from the listener (Wiemann and Knapp 1975). This back-and-forth communication protocol is an essential trust-building mechanism in human-to-human interactions. Indeed, even trivial acknowledgments or interludes of “small talk” can signal greater involvement and understanding from the side of the interaction partner and build stronger rapport (Bickmore and Cassell 2000; Cappela 1985). Thus, turn-taking is not only a process defining a dialogue-based interaction (i.e., how the conversation is structured), but is also an inherently social activity with both verbal and non-verbal social cues (i.e., acknowledgments and verbal affirmations such as “Got it!” or “I see” as well as non-verbal affirmations such as emotional displays or body movements such as head nodding to indicate active listening), all of which shape the social roles and relationships of the interacting parties. Furthermore, prior work on embodied interfaces has shown that users generally feel more comfortable and prefer to interact with systems that are capable of turn-taking, and that adherence to this unique conversational protocol is at least as important as other factors, such as the actual display of emotions by the interface (Cassell and Thórisson 1999). This is consistent with prior work in human-robot interactions, showing that more social behaviors, such as turn-taking, contribute to the formation of trust and are associated with a greater willingness to interact with the same robot in the future (Looije et al. 2010).

Thus, we expect that the inherent turn-taking capacity of conversational as opposed to non-conversational robo advisors enhances affective levels of trust toward the robo advisor. Affective trust is a distinct measure of relational trust between two parties, differing from other forms of trust (such as cognitive trust, which focuses on the objective assessment of competence and quality dimensions of an interaction partner; Johnson and Grayson 2005). Affective trust is a more emotional, subjective dimension of trust, linked to the social nature of a relationship. Taken together, we expect that the turn-taking capacity of conversational robo advisors increases perceptions of affective trust relative to non-conversational robo advisors.

H1:

Conversational as compared to non-conversational interfaces lead to greater levels of affective trust in the robo advisor.

Firm perception and investor behavior

Greater levels of affective trust have been shown to positively affect a wide array of both perceptual and behavioral outcomes, from increasing relationship satisfaction and team performance (Jones and George 1998) to cooperative behavior (Rousseau et al. 1998) and greater long-term customer loyalty (He et al. 2012). Most importantly, affective trust is a critical ingredient when facing decisions that involve risk (Pavlou 2003). Building on our theorizing that a conversational versus non-conversational interface might cause greater affective trust, we expect subsequent changes in firm perception and investor behavior in response to these enhanced levels of affective trust. First, developing greater affective trust in one situation has been shown to generate the belief that the other entity is willing to act in the best interest of the focal individual in a subsequent task (Cho 2006; Johnson and Grayson 2005). Thus, we hypothesize that greater perceptions of affective trust in turn stimulate subsequent attributions of benevolence, i.e., the assumption that another entity might act in one’s best interest (Xie and Peng 2009). This is particularly important in the context of financial institutions in which consumers often expect the financial institution to act in its own self-interest (Monti et al. 2014). Thus, we hypothesize that greater levels of affective trust toward the robo advisor generate subsequent attributions of benevolence toward the financial services firm to act in a client’s best interest.

H2:

Greater levels of affective trust in a conversational as compared to a non-conversational robo advisor lead to higher attributions of benevolence toward a financial services firm.

Greater affective trust may also change investor behavior more directly. Greater affective trust has been shown to make people more susceptible to influence attempts (Aronson 1999; Fransen et al. 2015) and to elevate persuasion effectiveness more broadly (Betancourt 1990; Hexmoor et al. 2008). For example, greater levels of trust in a bank have been found to elevate clients’ use of more uncertain transaction infrastructures (Yousafzai et al. 2010). Thus, we expect that increasing consumers’ levels of affective trust in turn increases their willingness to follow or accept the financial advice received, such as choosing a recommended financial portfolio.

H3:

Greater levels of affective trust in a conversational as compared to a non-conversational robo advisor increase consumers’ willingness to accept a recommended financial portfolio.

In summary, the current research tests three key predictions of how conversational as opposed to non-conversational robo advisors change consumer financial decision making and the downstream consequences of these decisions. First, we hypothesize that conversational robo advisors evoke greater levels of affective trust. Second, we predict that increased levels of affective trust caused by conversational as opposed to non-conversational robo advisors lead to a more benevolent perception of the financial services firm. Finally, we expect that the increased levels of affective trust caused by conversational as opposed to non-conversational robo advisors lead to higher recommendation acceptance.

Overview of studies and experimental paradigm

Study 1 tests our baseline hypothesis of whether conversational robo advisors cause greater levels of affective trust compared to non-conversational robo advisors, and whether enhanced levels of affective trust spill over to a more benevolent evaluation of a financial services firm. Study 2 provides evidence that greater affective trust not only enhances firm perception (attribution of benevolence toward the financial services firm) but also influences behavioral outcomes (asset allocation toward conversational as opposed to non-conversational robo advisors). Studies 3 and 4 further extend these findings by showing that greater affective trust increases consumers’ likelihood to accept portfolio recommendations even if they are objectively wrong (Study 3) or if they invoke larger annual fees (Study 4).

Study 1

Study 1 was designed to provide a first test of our theory on whether conversational as opposed to non-conversational robo advisors alter perceptions of affective trust and whether these changes in affective trust trigger the predicted increase in benevolence attributions toward a financial services firm.

Design and procedure

A total of 307 participants were recruited from a nationwide online consumer panel (Prolific) and preselected based on their interest in financial services and being active private investors (MAge = 43.79, SDAge = 15.20, 52% females). Participants were randomly assigned to one of three conditions: (1) a non-conversational interface condition, (2) a conversational interface condition without social cues, or (3) a conversational interface condition with social cues (see Fig. 1 for an illustration of the three robo advisor conditions). Across all conditions, participants went through a risk profiling assessment involving their current financial situation, their financial goals, and their perception of financial risk (see Web Appendix A.1). The sequence and format of all questions were identical across conditions. The two conversational robo advisor conditions differed in the presence versus absence of social cues and were otherwise identical in the extent of turn-taking and sequence of risk profiling questions. In the conversational interface condition with social cues, the robo advisor provided trivial acknowledgments (e.g., “Great, thanks!”) and displayed emoticons during the conversation (see Fig. 1). In the conversational interface condition without social cues, the interface was identical to the non-conversational interface condition except for the key conceptual difference in response and expression modality (i.e., using a chat console during the advisory process but holding the sequence, content, and all other features of the interaction constant).

Fig. 1 Exemplary robo advisor interface conditions (Study 1)

Measurement

Immediately after participants completed the robo advisory task, they were forwarded to the original questionnaire and we assessed their level of affective trust toward the robo advisor using three items (scale adapted from Johnson and Grayson 2005; sample item: “This financial advisory system displayed a warm and caring attitude toward me”; 7-point Likert scale, from 1: “I do not agree at all” to 7: “I fully agree”; αTrustAff = .90, see Web Appendix A3.1 and A5 for further details on scale consistency). Next, we measured participants’ attribution of benevolence toward the financial services firm using five items (scale adapted from Schlosser et al. 2006; sample item: “This advisory company seems very concerned about my welfare”; 7-point Likert scale, from 1: “I do not agree at all” to 7: “I fully agree”; αFirmBenevolence = .92, see Web Appendix A3.2 and A5).

To generate additional qualitative insight, we used an open response technique at the end of the study to further assess consumers’ spontaneous thoughts about the interaction with the financial advisory system. Participants first wrote down their spontaneous thoughts and feelings related to the robo advisory interface they used and then evaluated the valence of each of these thoughts as negative (coded as −1), neutral (coded as 0), or positive (coded as 1). All open text responses of the thought-listing task were concatenated into a single text vector. We used the AFINN lexicon and the tidytext package in R (Nielsen 2011) for all subsequent text processing and sentiment analyses.
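The self-rated valence coding described above can be illustrated with a minimal sketch. It assumes that each participant's score is the mean of their coded thoughts and that condition scores are means over participants; the text does not state the exact aggregation rule, so both choices, along with the function names and the sample data, are hypothetical.

```python
from statistics import mean

# Coding scheme taken from the text: negative = -1, neutral = 0, positive = 1.
VALENCE_CODES = {"negative": -1, "neutral": 0, "positive": 1}


def participant_valence(ratings):
    """Average the coded self-ratings of one participant's listed thoughts.

    Assumption: the participant-level score is the mean of the codes.
    """
    if not ratings:
        raise ValueError("participant listed no thoughts")
    return mean(VALENCE_CODES[r] for r in ratings)


def condition_means(data):
    """Map each condition to the mean of its participant-level scores.

    `data` maps a condition label to a list of per-participant rating lists.
    """
    return {
        condition: mean(participant_valence(r) for r in participants)
        for condition, participants in data.items()
    }


# Hypothetical responses, for illustration only.
example = {
    "non_conversational": [["negative", "neutral"], ["negative", "positive"]],
    "conversational": [["positive", "neutral"], ["positive", "positive"]],
}
print(condition_means(example))
```

Scores computed this way are bounded between −1 and 1, which is in line with the scale of the condition means reported in the sentiment analysis.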

Pretest

We conducted a pre-test with a total of 178 participants (MAge = 34.41, SDAge = 9.86, 30% females) from the same population and using the same selection criteria as in the main study to assess the effectiveness of the experimental manipulation. Specifically, the objective of this pre-test was to assess (1) whether the two conversational robo advisor conditions were perceived as greater in the extent of turn-taking than the non-conversational robo advisor condition and (2) whether the conversational interface condition with social cues was perceived as richer in the extent of social cues compared to the conversational interface condition without social cues and compared to the non-conversational interface condition. Participants were randomly assigned to the same three experimental conditions as in the main study (non-conversational robo advisor interface vs. conversational robo advisor interface with vs. without social cues) and completed the same risk profiling questionnaire as in the main study. Immediately after completing the risk profiling questionnaire, we assessed participants’ perceived degree of turn-taking (adapted from Song and Zinkhan 2008; sample item: “The system facilitated a two-way conversation,” αTurnTaking = .91, see Web Appendix A3.3) and the perceived presence of social cues (sample item: “The financial advisory system … displayed smiling faces and other emoticons during the task / acknowledged and confirmed my input (e.g., ‘Great, thanks!’),” αSocialCues = .85, see Web Appendix A3.4). Supporting the effectiveness of the experimental manipulation, the interface manipulation had a significant effect on turn-taking (F(2, 175) = 12.61, p < .001) and perception of social cues (F(2, 175) = 15.79, p < .001). Follow-up contrasts confirmed that participants in both conversational interface conditions perceived the extent of turn-taking as significantly higher compared to the non-conversational interface condition (MConversational_SocialCuesAbsent = 5.21, MNonConversational = 4.28, t = 3.975, p < .01; MConversational_SocialCuesPresent = 5.37, t = 4.691, p < .001), while we found no significant difference between both conversational interface conditions (t = 0.732, p > .46). Furthermore, participants in the social cues present condition perceived a significantly larger number of social cues compared to both the non-conversational interface condition (MConversational_SocialCuesPresent = 5.31, MNonConversational = 3.59, t = 5.566, p < .001) and the conversational interface condition without social cues (MConversational_SocialCuesAbsent = 4.28, t = 3.404, p < .01), while the difference between the non-conversational interface condition and the conversational interface without social cues was marginally significant (t = 2.235, p = .07). Taken together, these findings provide support for the effectiveness of the experimental manipulation, indicating that both conversational interface conditions were perceived as greater in the extent of turn-taking compared to a non-conversational interface and that the conversational interface with social cues possessed a significantly higher presence of social cues during the advisory task.

Results

Affective trust toward advisor

The interface manipulation had a significant effect on perceptions of affective trust (F(2, 304) = 11.740, p < .001). Specifically, follow-up contrasts with Holm correction for family-wise errors revealed that participants attributed a significantly greater level of affective trust toward the conversational interface without social cues compared to the non-conversational interface (MConversational_SocialCuesAbsent = 3.90, MNonConversational = 3.27, t = 2.750, p < .05). This effect further increased in the conversational interface with social cues condition, both when comparing to the non-conversational interface (MConversational_SocialCuesPresent = 4.37, t = 4.834, p < .001), as well as to the conversational interface without social cues (t = 2.093, p < .05).
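For readers unfamiliar with the Holm correction applied to the follow-up contrasts, the step-down procedure can be sketched as follows. This is a generic illustration of the method, not the authors' analysis script; the function name and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order.

    The smallest raw p-value is multiplied by m, the next by m - 1, and so
    on; a running maximum keeps the adjusted values monotone, and each
    value is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for three pairwise contrasts.
print(holm_adjust([0.01, 0.04, 0.03]))
```

With three pairwise contrasts, as in the three-condition design here, the smallest raw p-value is tripled, the middle one doubled, and the largest left unchanged (subject to the monotonicity constraint).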

Benevolence attribution toward firm

Similarly, we found a significant effect of the type of interface on attributions of benevolence toward the financial services firm (F(2, 304) = 8.259, p < .001). Mirroring the results of affective trust, follow-up contrasts with Holm correction confirmed that participants who interacted with the conversational interface without social cues perceived the financial services firm as significantly more benevolent than participants who interacted with the non-conversational interface (MConversational_SocialCuesAbsent = 3.75, MNonConversational = 3.22, t = 2.627, p < .05). This effect further increased in the conversational interface with social cues compared to the non-conversational interface (MConversational_SocialCuesPresent = 4.03, t = 4.006, p < .001). The difference between both conversational interface conditions was non-significant (t = 1.384, p = .17).

Mediation

Next, we estimated a simple mediation model (5000 bootstrap samples) to provide a direct test of our theorizing. The interface condition, coded with effect contrasts (non-conversational robo advisor = −1, conversational robo advisor without social cues = 0, conversational robo advisor with social cues = 1), served as the independent variable, affective trust as the mediator, and perceived benevolence of the financial services firm as the dependent variable. In support of our theorizing, the positive main effect of the conversational interface on attributions of benevolence toward the financial services firm (βConversational = .37, t = 3.124, p < .01) was significantly mediated by affective trust in the robo advisor (βTrustAff = .64, t = 17.926, p < .001), rendering the residual direct effect non-significant (βConversational_Direct = .03, t = .342, p = .73), indicating full mediation (βIndirect = .34, 95% CI: [.17; .51]).
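The logic of this mediation test can be sketched in a few lines. In an ordinary least squares mediation model the total effect equals the direct effect plus the indirect effect, which matches the reported estimates (.37 = .03 + .34). The sketch below assumes OLS path estimates and a percentile bootstrap; the specific software and bootstrap variant used in the study are not stated, and all function names are hypothetical.

```python
import random
from statistics import mean


def slope(x, y):
    """OLS slope of y on x (single predictor plus intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def partial_slopes(x, m, y):
    """OLS slopes of y on x and m jointly: returns (direct effect, b path)."""
    mx, mm, my = mean(x), mean(m), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((a - mm) ** 2 for a in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    smy = sum((a - mm) * (b - my) for a, b in zip(m, y))
    det = sxx * smm - sxm ** 2
    direct = (sxy * smm - smy * sxm) / det
    b_path = (smy * sxx - sxy * sxm) / det
    return direct, b_path


def indirect_effect(x, m, y):
    """Product of the a path (x -> m) and the b path (m -> y given x)."""
    _, b_path = partial_slopes(x, m, y)
    return slope(x, m) * b_path


def bootstrap_ci(x, m, y, n_boot=5000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        estimates.append(indirect_effect(
            [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Here `x` would hold the effect-coded condition (−1, 0, 1), `m` the affective trust scores, and `y` the benevolence scores. Mediation is supported when the bootstrap interval for the indirect effect excludes zero.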

Sentiment analysis

To gain additional qualitative insight into consumers’ thoughts and feelings, we analyzed the thought listing task as follows. As highlighted in the methods and procedure section, we concatenated participants’ thoughts into a single text vector and assessed sentiment valence based on participants’ polarity ratings. Providing additional evidence for our theorizing, these results reveal a significant effect of interface type on sentiment valence (F(2, 304) = 13.500, p < .001). Follow-up contrasts with Holm correction for family-wise errors confirm that participants who interacted with the conversational interface without social cues evaluated the robo advisor interface more positively than those who interacted with the non-conversational interface (MConversational_SocialCuesAbsent = .07, MNonConversational = −.19; t = 2.959, p < .01). This effect further increased in the conversational interface with social cues compared to both the non-conversational interface (MConversational_SocialCuesPresent = .26, t = 5.183, p < .001) and the conversational interface without social cues (t = 2.234, p < .05).

To illustrate the pattern of results based on these qualitative insights, we analyzed participants’ qualitative responses using the AFINN lexicon in the tidytext package in R. Fig. 2 demonstrates the positive relationship between affective trust and firm benevolence (i.e., positive slope across all conditions). Each observation represents the most frequent sentiment per participant and is further color-coded based on the corresponding sentiment valence (using the AFINN lexicon). These findings illustrate that greater affective trust is associated with a more positive evaluation of the financial services firm and a greater density of positively valenced associations in the conversational interface conditions (such as feeling excited, joyful, and interested) as opposed to the non-conversational interface condition with more negatively valenced sentiments (such as feeling bored or stressed).

Fig. 2 Affective trust, benevolence, and consumer sentiment (Study 1)

Discussion

The findings of Study 1 provide evidence that both conversational robo advisors (with or without social cues) evoke greater levels of affective trust than non-conversational interfaces. These changes in the interface modality in turn led to a more positive firm evaluation and a more positively valenced advisory experience for consumers. Providing additional social cues led to overall larger effect sizes when contrasting both conversational interface conditions. Based on these initial findings and the increasingly dominant trend to anthropomorphize conversational interfaces across industries (BenMark and Venkatachari 2016), all subsequent studies build on the conversational interface condition with social cues present during the advisory process.

Study 2

Can conversational as opposed to non-conversational interfaces also affect investor behavior more directly? The key objective of Study 2 was to test whether private investors are willing to let a conversational as opposed to non-conversational robo advisor manage their assets as part of an incentive-compatible investment game.

Design and procedure

A total of 309 participants took part in this study using the same preselection criteria as in Study 1 (MAge = 36.32, SDAge = 10.48, 36% females). Participants were randomly assigned to either a non-conversational interface condition or a conversational interface condition with social cues. As in Study 1, all conditions involved the same risk profiling task in exactly the same sequence. At the end of the risk profiling task, participants received a personalized investment portfolio based on their risk profile, displaying the percentage of asset classes such as bonds, high yield bonds, stocks, and cash, with either a capital preserving (only 5% stocks), balanced (30% stocks), or growth-oriented investment portfolio (85% stocks) (see Web Appendix A1.2 and A2.1 for a detailed description of the portfolio matching algorithm).

Immediately after the risk profiling task, participants completed an incentive-compatible investment game. The investment game was designed as follows: Participants received a virtual currency of 100,000 coins and had to decide on the amount they were willing to invest in the portfolio they received during the risk profiling task. Participants were informed that all asset classes may fluctuate and can produce negative and positive returns while non-invested coins will receive a return of zero. Participants then decided which percentage they wished to invest and have managed by their respective robo advisor (dependent on condition). Participants’ decision was consequential, as one participant was chosen at random and received 1% of the surplus from the investment game (e.g., if a participant made a surplus of 8000 coins, they had the chance to win $80.00). The key dependent variable in this study was participants’ selected amount to invest and be managed by the robo advisor. Immediately after completing the investment game, participants were forwarded to a follow-up questionnaire to assess their perceptions of affective trust toward the robo advisor (αTrustAff = .85) and attributions of benevolence toward the financial services firm using the same items as in Study 1 (αFirmBenevolence = .87).
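The payoff rule of the investment game can be made concrete with a short sketch. It assumes a one-to-one conversion between coins and dollars (implied by the example of 8000 coins yielding $80.00 at a 1% rate) and that a negative surplus pays nothing; neither detail is spelled out in the text. The function names and the 16% return are hypothetical.

```python
def investment_surplus(endowment, invested_share, portfolio_return):
    """Surplus in coins: only the invested share earns the portfolio return.

    Non-invested coins earn a return of zero, as stated in the game rules.
    """
    if not 0.0 <= invested_share <= 1.0:
        raise ValueError("invested_share must lie between 0 and 1")
    return endowment * invested_share * portfolio_return


def lottery_payout(surplus_coins, payout_rate=0.01, dollars_per_coin=1.0):
    """Dollar payout for the randomly drawn participant.

    Assumptions: 1 coin corresponds to $1, and losses are floored at zero.
    """
    return max(surplus_coins, 0.0) * payout_rate * dollars_per_coin


# Investing half of the 100,000 coins at a hypothetical 16% return
# produces a surplus of about 8000 coins.
surplus = investment_surplus(100_000, 0.5, 0.16)
print(round(lottery_payout(surplus), 2))
```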

Results

Investment amount

Participants in the conversational robo advisory interface condition were willing to invest a significantly larger amount managed by the robo advisor compared to participants in the non-conversational robo advisor condition (MConversational = 62.71%, MNonConversational = 55.69%, t = 2.203, p < .05). Comparing the proportion of those participants who were willing to invest their entire endowment (i.e., 100%) to be managed by the robo advisor further revealed a significantly larger proportion in the conversational compared to the non-conversational robo advisor condition (PConversational = 19.4%, PNonConversational = 7.8%; χ2(1, N = 309) = 7.834, p < .01).
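The comparison of proportions above rests on a chi-square test for a 2 × 2 table (condition by whether the full endowment was invested). A minimal sketch of that test follows; the cell counts are not reported in the text, and whether a continuity correction was applied is not stated, so the correction is left as an option and any counts passed in are the reader's own.

```python
import math


def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic (df = 1) for the 2x2 table [[a, b], [c, d]].

    Set yates=True to apply the Yates continuity correction.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denominator


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))
```

In this layout, `a` and `b` would be the numbers of participants in the conversational condition who did and did not invest their full endowment, and `c` and `d` the corresponding counts in the non-conversational condition.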

Affective trust and benevolence

Participants in the conversational robo advisory interface also attributed significantly greater levels of affective trust compared to the non-conversational robo advisory interface (MConversational = 5.49, MNonConversational = 4.79, t = 4.860, p < .001). Furthermore, consumers also attributed greater benevolence toward the financial services firm in the conversational as opposed to the non-conversational robo advisor condition (MConversational = 5.16, MNonConversational = 4.79, t = 2.792, p < .01). Consistent with our previous findings, the effect on attributions of benevolence was significantly mediated via perceptions of affective trust (95% CI of indirect effect [.30; .75]), rendering the residual direct effect of the advisory interface condition on benevolence non-significant (βConversational = −.15, t = 1.847, p > .07).

Path model

Next, we estimated a path model using the lavaan package in R with robust standard errors to assess the effect of interface condition (coding the non-conversational robo advisor condition as 0 and the conversational robo advisor condition as 1) on attributions of benevolence and investment amount via affective trust. The path model results are summarized in Fig. 3 and demonstrate that the increase in affective trust evoked by the conversational robo advisor (βConversational = .53, z = 4.873, p < .001) in turn led to a significant increase in attributions of benevolence (βTrustAff = .81, z = 21.773, p < .001) and a significant increase in investment amount (βTrustAff = .40, z = 8.076, p < .001).
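
As a rough illustration of how the two-step pathway combines, the sketch below applies the standard product-of-coefficients rule for indirect effects in a linear path model to the coefficients reported above. It is not the authors' lavaan code; the function and variable names are hypothetical, and the text does not state whether the coefficients are standardized, so the resulting numbers should be read only as an illustration of the arithmetic.

```python
def indirect_effect(a_path: float, b_path: float) -> float:
    """Product-of-coefficients estimate of an indirect (mediated) effect."""
    return a_path * b_path


# Coefficients as reported for the Study 2 path model.
a_condition_to_trust = 0.53
b_trust_to_benevolence = 0.81
b_trust_to_investment = 0.40

via_trust_benevolence = indirect_effect(a_condition_to_trust, b_trust_to_benevolence)
via_trust_investment = indirect_effect(a_condition_to_trust, b_trust_to_investment)
print(round(via_trust_benevolence, 3), round(via_trust_investment, 3))
```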

Fig. 3 Path model results (Study 2)

Discussion

Study 2 demonstrates that conversational as opposed to non-conversational interfaces affect both investor perception (attributions of affective trust toward the robo advisor and benevolence of a financial services firm) and investor behavior (asset allocation toward the conversational as opposed to non-conversational robo advisor in an incentive-compatible investment game). Extending the findings of Study 1, the current study also highlights the central role of affective trust in the robo advisor and how greater levels of trust in turn affect investors’ decision making using an incentive-compatible measure of behavior.

Study 3

The findings of Studies 1 and 2 provide initial insight into how conversational robo advisors affect consumer financial decision making, demonstrating the positive effects of conversational robo advisors in terms of a more benevolent firm perception, an increase in asset allocation toward robo advisors, and overall positive consumer sentiment. The next two studies explore further downstream consequences of these effects for consumers. Specifically, Study 3 tests whether consumers accept a portfolio recommendation even if the recommendation might be objectively wrong. That is, the current study tests whether the acceptance of a recommendation that is inconsistent with consumers’ actual risk profile varies as a function of the type of robo advisor. We focus on the acceptance of portfolio recommendations that are objectively wrong (i.e., inconsistent with investors’ actual risk profile) as the more interesting case, as it provides the opportunity to directly assess whether consumers are adequately attuned to the investment advice they receive in a conversational versus non-conversational interface.

Design and procedure

Mirroring the experimental paradigm used in Studies 1 and 2, participants were randomly assigned to a non-conversational robo advisor condition or a conversational robo advisor condition with social cues. Participants first completed exactly the same risk assessment task as in all preceding studies. At the end of the risk assessment task, participants received a personalized portfolio recommendation based on their risk profile as in Study 2. However, this recommended portfolio was opposite to their actual risk profile. Specifically, if an investor’s risk profile was above the midpoint of the risk profile index (risk-seeking investor), s/he received a recommended capital-preserving portfolio. On the other hand, if an investor’s risk profile was below or equal to the midpoint of the risk profile index (risk-averse investor), s/he received an aggressive, growth-oriented portfolio. That is, more risk-averse investors received a portfolio recommendation with a greater percentage of stocks as opposed to bonds or money market funds, whereas risk-seeking investors received a portfolio recommendation involving a greater percentage of bonds and money market funds as opposed to stocks (see Web Appendix A1.2 for details of the matching algorithm). All other procedures were identical to those in the preceding studies, and we used the same scale items to assess the level of affective trust toward the robo advisor (αTrustAff = .94) and attributions of benevolence toward the financial services firm (αFirmBenevolence = .93). A total of 154 private investors (MAge = 34.57, SDAge = 10.62, 33.7% females) participated in this study and were preselected based on the same pre-screening criteria as before (active investors with an interest in financial advisory).
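
The mismatch rule described above can be sketched as a simple threshold function. This is a minimal illustration rather than the matching algorithm documented in Web Appendix A1.2; the function name, the portfolio labels, and the 1 to 7 index with midpoint 4 are assumptions introduced here.

```python
def mismatched_recommendation(risk_index: float, midpoint: float) -> str:
    """Return the deliberately mismatched portfolio type (illustrative)."""
    if risk_index > midpoint:
        # Risk-seeking investor: receives the capital-preserving portfolio
        # (more bonds and money market funds than stocks).
        return "capital-preserving"
    # Risk-averse investor (at or below the midpoint): receives the
    # aggressive, growth-oriented portfolio (more stocks).
    return "aggressive growth"


# Hypothetical 1-7 risk index with midpoint 4, chosen only for illustration.
print(mismatched_recommendation(6.0, midpoint=4.0))
print(mismatched_recommendation(4.0, midpoint=4.0))
```

Note that an investor exactly at the midpoint is treated as risk-averse, following the "below or equal to the midpoint" wording above.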

Results

Affective trust and benevolence

Participants in the conversational robo advisory interface displayed significantly greater levels of affective trust compared to the non-conversational robo advisory interface (MConversational = 4.78, MNonConversational = 3.24, t = 5.662, p < .001). Furthermore, consumers attributed greater benevolence toward the financial services firm in the conversational as opposed to the non-conversational robo advisor condition (MConversational = 4.39, MNonConversational = 3.66, t = 2.916, p < .01). Consistent with our previous findings, the effect on attributions of benevolence was significantly mediated via perceptions of affective trust (95% CI of indirect effect [.62; 1.47]), rendering the residual direct effect of the advisory interface condition on benevolence non-significant (βConversational = −.29, t = 1.505, p > .13).

Recommendation acceptance

Next, we estimated a logit model to assess the effect of the interface condition on recommendation acceptance. The findings demonstrate that participants in the conversational robo advisor condition were significantly more likely to accept the objectively incorrect portfolio recommendation compared to participants in the non-conversational robo advisor condition (PConversational = 73.4%, PNonConversational = 40.0%; βConversational = 1.42, z = 4.096, p < .001). Critically, this effect was robust across both risk-averse investors (PConversational = 71.2%, PNonConversational = 38.0%, βConversational = 1.39, z = 3.406, p < .001) and risk-seeking investors (PConversational = 80.0%, PNonConversational = 44.0%, βConversational = 1.42, z = 2.362, p < .05) (see Fig. 4).
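
For readers less familiar with logit coefficients, the sketch below shows how a coefficient for a single binary predictor relates to the two acceptance rates: it equals the difference in log-odds between conditions. The code assumes that interface condition is the only predictor in the model and uses the overall acceptance rates reported above; the function names are hypothetical.

```python
import math


def log_odds(p: float) -> float:
    """Log-odds (logit) of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def logit_coefficient(p_treatment: float, p_control: float) -> float:
    """Coefficient of a binary predictor in a one-predictor logit model."""
    return log_odds(p_treatment) - log_odds(p_control)


# Overall acceptance rates reported for Study 3.
beta_overall = logit_coefficient(0.734, 0.400)
odds_ratio = math.exp(beta_overall)
print(round(beta_overall, 2), round(odds_ratio, 2))
```

Under this assumption, the log-odds difference implied by the two acceptance rates agrees with the reported overall coefficient to two decimals, and exponentiating it gives the corresponding odds ratio.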

Fig. 4 Recommendation acceptance by risk profile (Study 3)

Path model

Mirroring the analyses of Study 2, we estimated a path model using the lavaan package in R with robust standard errors to model the effect of the interface condition on affective trust, and how greater levels of affective trust in turn affect firm perception and recommendation acceptance. As summarized in Fig. 5, the conversational interface increased affective levels of trust (βConversational = 1.54, z = 5.699, p < .001), which in turn had a systematic influence on both firm perception and actual behavior. Specifically, affective trust in the robo advisor led to an increase in attributions of benevolence toward the financial services firm (βTrustAff = .625, z = 13.379, p < .001) and greater recommendation acceptance (βTrustAff = .10, z = 5.343, p < .001). These model results were robust even after controlling for differences in investors’ age, risk-profile index, current level of debt, or the volume of assets that investors possessed, and the control model showed a higher (i.e., less favorable) BIC than our main model (BICMainModel: 1297.5; BICControlModel: 1311.9).

Fig. 5 Path model results (Study 3)

Discussion

Study 3 provides evidence that greater perceptions of affective trust not only affect attributions of benevolence toward the firm but also the likelihood to accept a recommended portfolio, even if this portfolio is inconsistent with consumers’ actual risk profile. Given that all participants were active private investors and that the effect was robust across both risk-averse and risk-seeking investors, these findings indicate the malleability of investor decision making in the context of conversational robo advisors and the central role of greater affective trust to alter financial decision making. We provide a more extensive discussion on the potential policy implications for consumer welfare and financial regulation in the General Discussion section.

Study 4

Regulations in the financial sector have led to a requirement for disclaimers during the advisory process in which (human) advisors have to inform their clients about the cost implications of their financial choices (Nussbaumer et al. 2012). Recent debates on the regulation of the robo advisor industry have discussed the necessity to hold robo advisors to the same regulatory requirements as human advisors (Baker and Dellaert 2018). The current study is specifically designed to test whether such informational disclaimers are sufficient to debias investors and to reduce the observed over-reliance on recommendations received from a conversational robo advisor. The key hypothesis in Study 4 was that this observed over-reliance on the algorithmic advice of a robo advisor could be attenuated by providing disclaimers that a financial services firm (and the robo advisor they employ) might act in the interest of the firm providing the advisory service.

Design and procedure

A total of 259 active private investors (same selection criteria as in the preceding studies) participated in this study (MAge = 34.42, SDAge = 10.56, 44% females). Participants were randomly assigned to a 2 (type of robo advisor: conversational vs. non-conversational) × 2 (disclaimer: present vs. absent) between-subjects experiment. Participants completed the same risk assessment procedure as in the preceding studies using either a non-conversational robo advisor or conversational robo advisor with social cues. Next, all participants received a portfolio recommendation consistent with their risk profile. This portfolio comprised both actively and passively managed positions. All participants were led to believe that the ideal composition of a portfolio would be a spread of 30% in actively managed positions and 70% in passively managed positions (this spread was chosen based on prior work on the long-term performance of active to passive investments; see Sorensen et al. 1998). In both conditions, the advisory system recommended increasing the percentage of actively managed positions to outperform the market, even though this adjustment would cause greater annual fees. This recommendation was identical between conditions and no further information was provided about the specific range of recommended active positions. In the disclaimer-present condition, participants were then shown a prominent disclaimer prior to the opportunity to adjust the percentage of actively managed positions indicating that “The advice received on this website might be based on partial information about your actual financial situation, needs, or objectives. An automated financial advisory system might be specifically programmed to act in the interest of the company providing the advisory service.” In the disclaimer-absent conditions, no additional information was provided before participants selected their preferred percentage of actively managed investments.
After that, participants were free to modify the specific amount of active to passive positions in their portfolio as they wished. Participants were then forwarded to a final questionnaire and indicated their perceptions of affective trust (αTrustAff = .92) and attributions of benevolence toward the financial services firm (αFirmBenevolence = .93). The deviation from the initially provided 30% benchmark of actively managed funds served as the main dependent variable in this study.
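
The design and the dependent variable can be summarized in a short sketch. The code enumerates the four cells of the 2 × 2 design and computes the deviation from the 30% benchmark; treating the deviation as signed (chosen share minus benchmark) is an assumption, and the three participant choices are hypothetical values for illustration only.

```python
from itertools import product

BENCHMARK_ACTIVE_PCT = 30.0  # benchmark share of actively managed positions

# The four between-subjects cells of the 2 x 2 design.
CONDITIONS = list(product(
    ("conversational", "non-conversational"),
    ("disclaimer present", "disclaimer absent"),
))


def benchmark_deviation(chosen_active_pct: float) -> float:
    """Signed deviation from the 30% benchmark (signed form is an assumption)."""
    return chosen_active_pct - BENCHMARK_ACTIVE_PCT


# Hypothetical choices of the actively managed share, in percent.
hypothetical_choices = [30.0, 45.0, 25.0]
deviations = [benchmark_deviation(c) for c in hypothetical_choices]
print(CONDITIONS)
print(deviations)
```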

Results

Recommendation acceptance

As shown in Fig. 6, a two-way ANOVA with the deviation from the 30% benchmark of actively managed positions as the dependent variable and the robo advisor condition (conversational vs. non-conversational) and the disclaimer condition (present vs. absent) as independent variables revealed a significant main effect of the robo advisor condition (F(1, 255) = 23.753, p < .001), while the main effect of the disclaimer condition (F(1, 255) = 1.405, p > .23) and the interaction between both experimental factors (F(1, 255) = 0.935, p > .33) were non-significant. Follow-up contrasts with Holm correction for family-wise errors revealed that participants in the conversational robo advisory condition with disclaimer (MConversational_DisclPresent = 39.73%) still deviated significantly more from the 30% benchmark than participants in the non-conversational robo advisor condition without disclaimer (vs. MNonConversational_DisclAbsent = 33.04%, t = 2.499, p < .05) or with disclaimer (vs. MNonConversational_DisclPresent = 32.46%, t = 2.681, p < .05). Also, the conversational robo advisory condition without disclaimer (MConversational_DisclAbsent = 43.95%) deviated significantly from the non-conversational robo advisor conditions both with (t = 4.337, p < .001) and without disclaimer (t = 4.173, p < .001). Contrary to our expectation, the difference between both conversational robo advisor conditions was non-significant (p > .26). While we observed only minor deviations from the provided benchmark in the non-conversational robo advisor conditions (tested against zero: MNonConversational = 2.76%, t = 2.791, p < .01), participants in the conversational robo advisory conditions deviated significantly from the 30% benchmark (tested against zero: MConversational = 11.93%, t = 7.05, p < .001).
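
The Holm correction applied to the follow-up contrasts can be sketched as a step-down adjustment of p-values. The raw p-values below are hypothetical (the text reports only thresholds after correction), and the function name is introduced here for illustration.

```python
def holm_adjust(p_values: list[float]) -> list[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for three contrasts (not taken from the study).
raw_p = [0.01, 0.04, 0.03]
adjusted_p = holm_adjust(raw_p)
print([round(p, 3) for p in adjusted_p])
```

A contrast is then declared significant when its adjusted p-value falls below the chosen alpha level, which controls the family-wise error rate across all contrasts.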

Fig. 6 Deviation from 30% benchmark by interface and disclaimer condition (Study 4)

Path model

Next, we estimated a path model using the lavaan package in R with robust standard errors and a linear link function to model the effect of the interface condition on affective trust, and how greater levels of affective trust in turn affect firm perception and recommendation acceptance in terms of deviation from the 30% benchmark. The positive effect of the conversational interface condition on affective levels of trust (βConversational = 1.79, z = 9.913, p < .001) in turn had a systematic influence on both firm perception and behavior (see Fig. 7). Specifically, affective trust in the robo advisor led both to an increase in attributions of benevolence toward the financial services firm (βTrustAff = .54, z = 12.043, p < .001) and to greater recommendation acceptance in terms of benchmark deviation (βTrustAff = 2.42, z = 4.197, p < .001). These model results were robust even after controlling for differences in investors’ age, risk-profile index, current level of debt, or the volume of assets that investors possessed, with the control model showing a higher BIC than our main model (BICMainModel: 3893.4; BICControlModel: 3904.9).

Fig. 7 Path model results (Study 4)

Discussion

The combined evidence of Studies 3 and 4 suggests that consumers are significantly more likely to follow investment advice from a conversational robo advisor compared to non-conversational robo advisors even if this investment advice is inconsistent with their actual risk profile (Study 3) or invokes larger annual management fees (Study 4).

General discussion

Theoretical implications

The current work makes four novel contributions. First, to the best of our knowledge, the current research is the first that contrasts the effects of non-conversational robo advisors compared to conversational, dialogue-based robo advisors and how they systematically affect firm perception and investor behavior. We provide a novel conceptualization building on the psychology of human-to-human conversations and demonstrate how the conversational capacity of conversational robo advisors can systematically alter consumer financial decision making and firm perceptions. The current findings demonstrate a more positively-valenced consumer experience (see qualitative insights of Study 1), a more positive evaluation of a financial services firm, and greater recommendation acceptance (even for objectively wrong portfolio advice as in Study 3). The underlying psychological process revealed in the current research provides evidence that the affective relationship between a human investor and a robo advisor is central to our understanding of the perceptual and behavioral changes in response to interacting with a conversational robo advisor.

Second, the findings of this research contribute to prior work on trust formation between machines and humans (Coeckelbergh 2012; Hancock et al. 2011; Laursen 2013) and the role of trust as a critical component in market exchange processes (Bart et al. 2005; Hoffman et al. 1999; Palmatier et al. 2013). While prior work on trust in robots primarily explored trust formation processes between physical and typically anthropomorphic robots (Laursen 2013; Wright et al. 2013), the current research provides evidence that the formation of trust is not limited to humanoid robots possessing anthropomorphic characteristics and can be induced by altering the interaction modality between humans and machines (as in the current research involving a dialogue or chat-based response modality). These findings indicate that trust formation processes and affective relationships between machines and humans may form independently of anthropomorphic traits or morphology (as with physical robots for example) and can be induced merely by altering the properties of how the interface is designed to structure the interaction between humans and machines.

Third, the current findings also contribute to prior work on affective trust more specifically. The formation of affective trust has been considered a long-term process (Rousseau et al. 1998), one that requires significant time evaluating the other entity to build and enhance affective levels of trust during the relationship formation phase (McAllister et al. 2006). However, the current findings demonstrate that affective levels of trust can be enhanced by changing the modality of interaction (such as a greater extent of turn-taking) or through the specific use of social cues during the interaction (such as the use of a more affect-rich language or trivial acknowledgments). Thus, the current findings reveal that affective trust can also be induced in the short term through the specific use of social cues, enhancing consumers’ onboarding and investment experience.

Fourth, the current work contributes to the emerging work on immersive consumer experiences in marketing (Agarwal and Karahanna 2000; Brakus et al. 2009; Schmitt et al. 2015) and the formation of consumer–object relationships through the use of new technologies (van Doorn et al. 2017; Hildebrand et al. 2020; Hoffman and Novak 2018; Melumad et al. 2020; Mende et al. 2019). Even though the time to complete the advisory process was almost twice as long in the conversational compared to the non-conversational robo advisory interface across studies (pooled duration data: MConversational = 288 s, MNonConversational = 162 s, t = 8.647, p < .001), the current findings illustrate that consumers experienced the advisory process as more engaging, intimate, and enjoyable (see also the overall positive consumer sentiment revealed in the qualitative insights of Study 1). These findings highlight that the time to complete a task is a shortsighted measure compared to the importance of the actual task experience. Thus, the conversational capacity of interfaces and the effects reported in the current work illustrate how the conversational modality of a robo advisory system provides more engaging user experiences and a more enjoyable digital onboarding experience during the customer acquisition phase, positively affecting a series of perceptual and behavioral outcomes for both consumers (more positively valenced advisory experience) and the firm (more benevolent view of the firm, more positive consumer sentiment, greater willingness to follow investment advice, and overall greater trust toward a robo advisor).

Future research

We hope that the current research might inspire future work in two notable ways. First, future research may further examine the extent to which conversational interfaces can be specifically designed to express pre-designed “personality traits” that are consistent with the personality of the brand or firm. We see a variety of interesting and unanswered questions in the marketing and branding domain such as whether specific brand or firm associations could be expressed by tailoring the language, appearance, or other design features of a conversational interface. Second, we see great potential for future work at the marketing and finance intersection and how robo advisors might alter consumer–firm relationships more broadly. The current work hints at the possibility that the creation of more affective relationships between humans and machines might not be limited to interfaces that are intentionally anthropomorphized, but that they could be “designed” by altering the modality of interaction within the interface itself (such as building on the capacity to take turns with a richer set of social cues to induce greater affective trust). Recent research by Hoffman and Novak (2018) could provide the conceptual foundations along which future work may organize and study such emerging consumer–object and consumer–firm relationships for the effective design of conversational robo advisors. Future work may also test these effects under realistic market conditions with more consequential outcomes or further assessing the robustness of the current findings across firm types and sales channels (see also Lourenço et al. 2020).

Managerial and policy implications

The findings of this research have important managerial and policy implications. First, we see great potential for the financial industry and other industries to improve the digital customer experience during the onboarding process. Repeated dyadic interaction during service encounters critically contributes to greater customer satisfaction (Solomon et al. 1985) and recent surveys suggest that private investors strongly anchor on the affective experience during these first financial advisor encounters (Darwish 2006). The turn-taking paradigm of conversational robo interfaces as illustrated in the current research demonstrates that conversational interfaces can contribute to such positive, more affective onboarding and firm experiences. Although the current research did not explicitly investigate the entire onboarding process, both future academic research and practitioners may further explore whether the trust and benevolence attributions shown in the current research translate into a more successful onboarding and customer acquisition process.

Second, we see great potential to link conversational interfaces to contextual factors during the investment selection process. For example, conversational robo advisors provide a natural interface to acquire consumers’ personal interests either explicitly through direct interaction or unobtrusively by using referrer or cookie information from the browser. This information can be used to tailor specific recommendations throughout the advisory process, similar to prior work in retargeting and journey analytics on consumer websites (Hildebrand and Schlager 2019; Urban et al. 2014). These avenues hold the potential to provide a more personalized customer experience based on information that is generally unobservable in traditional human-to-human service encounters. Thus, conversational interfaces provide a powerful tool to shape and unlock value in digital customer journeys and provide a more tailored one-to-one customer experience.

Finally, the current findings also address recent debates on the increasing share of actively managed funds among robo advisor providers. For example, Betterment’s “smart beta” strategy was introduced recently for those investors who wish to “outperform the market” (Salmon 2018). Similarly, companies such as Wealthfront have strayed from putting investors solely into a low-cost indexing strategy and decided to invest 20% of their clients’ assets into an internal “risk parity” fund that invested in derivatives such as total return swaps. Both cases caused significant negative press in the media and among investors, questioning these changes in the firm’s investment strategy (Salmon 2018). The findings of Study 4 suggest that conversational robo advisors can lead to an increase in the acceptance of a greater share of actively managed funds in consumers’ portfolios without jeopardizing perceptions of trust or leading to negative perceptions of the firm. However, the combined evidence of Studies 3 and 4 also underlines recent calls for regulation of robo advisors in the financial industry (Baker and Dellaert 2018) and suggests that consumers are not sufficiently attuned to question the automated investment advice they receive.

Conclusion

As robo advisors and other digital advisory systems grow in scale and modalities (from traditional, non-conversational interfaces to the natural language processing capabilities of conversational interfaces), a new cross-disciplinary science of new technologies and augmented consumer decision making is emerging. We hope that the current work inspires more work in this emerging field on the consumer psychology of new technologies and the implications for consumer welfare in the digital economy.