1 Introduction

Paid advertising could potentially increase earned media exposures such as social media posts and word of mouth (WOM, hereafter). Brand conversations commonly reference advertisements, with estimates of online buzz about movie trailers ranging from 9% (Gelper et al. 2016) to 15% (Onishi and Manchanda 2012), and Keller and Fay (2009) estimate that 20% of all WOM references TV ads. Some industry reports claim that the impact of advertising on WOM is considerable (Graham and Havlena 2007; Nielsen 2016; Turner 2016), and that the impact on total WOM (online and offline) can amplify the effect of paid media on sales by 15% (WOMMA 2014). In some cases, this expectation of boosting earned mentions is used to justify buying high-priced ad spots in programs like the Super Bowl (Siefert et al. 2009; Spotts et al. 2014).

In contrast, scholarly research that estimates the WOM impressions gained from advertising is scarce. As illustrated in Fig. 1, the current literature focuses on the influence of advertising on sales (Naik and Raman 2003; Sethuraman et al. 2011; Danaher and Dagger 2013; Dinner et al. 2014), the influence of WOM on sales (Chevalier and Mayzlin 2006; Liu 2006; Duan et al. 2008; Zhu and Zhang 2010), or their joint influence on behaviors (Hogan et al. 2004; Chen and Xie 2008; Moon et al. 2010; Stephen and Galak 2012; Onishi and Manchanda 2012; Gopinath et al. 2013; Lovett and Staelin 2016). Research on how advertising induces WOM is mostly conceptual (Gelb and Johnson 1995; Mangold et al. 1999) or theoretical (Smith and Swinyard 1982; Campbell et al. 2017). Existing empirical studies that measure the effect of advertising on WOM are sparse and focus on case studies of a single company (Park et al. 1988; Trusov et al. 2009; Pauwels et al. 2016) or on specific settings, such as Onishi and Manchanda (2012) and Bruce et al. (2012) for movies, Gopinath et al. (2014) for mobile handsets, and Hewett et al. (2016) for US banks. Recently, Tirunillai and Tellis (2017) studied how a TV advertising campaign for HP influenced the information spread and content of the blogs and product reviews of the brand. Tirunillai and Tellis (2012) studied the effect of online WOM for 15 firms from 6 markets, but the main focus was on firms’ stock market performance. All these studies focused on online social media and did not incorporate offline WOM, although offline WOM is estimated to be 85% of WOM conversations (Keller and Fay 2012). Further, the results from these studies are mixed, with some positive effects (Onishi and Manchanda 2012; Tirunillai and Tellis 2012, 2017; Gopinath et al. 2014), some non-significant effects (Trusov et al. 2009; Onishi and Manchanda 2012; Hewett et al. 2016; Pauwels et al. 2016), and even negative effects (Feng and Papatla 2011).

Fig. 1 Overview of the literature of advertising and WOM

The main goal of this paper is to evaluate the effect of advertising on WOM. We distinguish two separate measures for WOM using different data sources. The first measure (labeled as total WOM) is drawn from the Keller-Fay TalkTrack database. This dataset includes comprehensive information about the number of mentions of brands in individuals’ online and offline conversations. From this dataset, we use information on 538 US national brands across 16 broad categories and over 6.5 years. The second measure (labeled as online WOM) comes from the Nielsen-McKinsey Insight database and comprises the number of mentions in online social media posts for the same set of 538 brands over 5 years on Twitter, blogs, and user forums.

We use two distinct analysis approaches to evaluate the effect of advertising on WOM. Our main analysis leverages monthly WOM and advertising expenditures on Internet, TV, and other media (from Kantar Media’s Ad$pender database) to quantify the relationship between advertising expenditures and WOM. We use panel regressions that include brand fixed effects, category-quarter fixed effects and time effects (trends), while also controlling for past WOM and news mentions. All variables have brand-level heterogeneous effects.

We find that the relationship between advertising and WOM is significant, but small. The average implied elasticity of total WOM is 0.019 for TV advertising expenditures and 0.014 for Internet display advertising expenditures. The average implied elasticities of online WOM are in similar ranges: 0.009 for TV advertising and 0.010 for Internet display advertising. To put these numbers into some perspective, for the average monthly spending on TV advertising in our sample, approximately 58 million ad exposures are generated. Based on our estimates, a 10% increase in TV advertising expenditure is associated with about 69,000 additional impressions from total WOM.

We find significant heterogeneity across brands and categories in the estimated relationship between advertising and WOM. For instance, categories with the largest implied elasticities to TV advertising on total WOM are Sports and Hobbies, Telecommunications, and Media and Entertainment. However, the average implied elasticity, even for these largest categories, is still relatively small (e.g., average elasticity between 0.03 and 0.05). Similar conclusions can be drawn for the online WOM.

We conduct a series of robustness tests and find our results are largely consistent across these specifications. In some of these tests, we use instrumental variables (advertising costs and political advertising expenditures) to obtain exogenous variation to estimate the advertising-WOM relationship. Because our results suggest small effects, the main endogeneity concerns are downward biases which could arise from WOM acting as an advertising substitute or measurement errors in the advertising expenditure variables. The results from these robustness tests are supportive of a limited role for these concerns.

Our second set of analyses uses a different approach to causal inference and studies the effect of advertising on WOM where the effect is expected to be large—Super Bowl advertising. We conduct an analysis using the generalized synthetic control technique (Abadie et al. 2010; Bai 2013; Xu 2017), which constructs a difference-in-difference type estimator by matching the treatment group to a control group synthesized from a weighted combination of the non-treated brands. This causal inference technique aims to reduce the potential sources of bias in order to assess from non-experimental data the causal impact of a treatment (in this case, advertising on the Super Bowl) on the outcome (WOM).

From this second set of analyses, we find that being a Super Bowl advertiser increases monthly total WOM by 16% in the month of the Super Bowl and by 22% in the week after the Super Bowl. This increase suggests “free” impressions of the order of 10%–14% of the average monthly ad impressions. This magnitude represents a meaningful contribution because most evidence suggests the impact of WOM engagements on consumer choices is larger than that of advertising exposures (Sethuraman et al. 2011; You et al. 2015; Lovett and Staelin 2016). However, it is perhaps still not as large as one might expect given the large cost of becoming a Super Bowl advertiser. The effect is stronger, but short-lived for online social media posts: we find an average increase of 68% on the day of the Super Bowl. These estimates for online WOM posts suggest that in some cases online posts respond more to advertising than total WOM.

Our findings portray a world in which typical advertising does not really buy lasting, broad-based earned impressions, but might increase online posts in the short term for specific, large-scale campaigns. Paid advertising developed for TV and the Internet should not automatically be associated with meaningful increases in WOM. If a brand has the goal of increasing WOM, and uses advertising as the vehicle to do so, then care must be taken both to design the campaign for this goal (Van der Lans et al. 2010) and to monitor that the design is successful. In particular, our results suggest that monitoring needs to include more than counts of online posts, as such measures are neither representative of nor a reflection of total WOM. Our results also demonstrate that some campaigns for some brands, such as Super Bowl advertisements, generate a far higher WOM response, but the small average implied elasticity and low heterogeneity across brands and categories suggest that these larger effects are relatively rare and are not obtained without a focused investment of considerable resources.

2 Existing theory and evidence on the advertising-WOM relationship

Marketing theory provides a foundation for both a positive and a negative advertising-WOM relationship. On the positive side, engaging in WOM is driven by the need to share and receive information, have social interactions, or express emotions (Lovett et al. 2013; Berger 2014). Advertising can trigger these needs and potentially stimulate a WOM conversation about the brand. Four routes through which advertising might trigger these needs are attracting attention (Batra et al. 1995; Mitra and Lynch 1995; Berger 2014), increasing social desirability and connectedness (Aaker and Biel 2013; Van der Lans and van Bruggen 2010), stimulating information search (Smith and Swinyard 1982), and raising emotional arousal (Holbrook and Batra 1987; Olney et al. 1991; Lovett et al. 2013; Berger and Milkman 2012).

However, advertising can also have a negative influence on WOM. Dichter (1966) argues that advertising decreases involvement, and if involvement has a positive influence on WOM (Sundaram and Webster 1998), advertising would cause a decrease in WOM. Feng and Papatla (2011) claim that talking about an advertised brand may make an individual look less unique and may harm her self-enhancement. Similarly, if advertising provides sufficient information so that people have the information they need, they will tend to be less receptive to WOM messages (Herr et al. 1991), which diminishes the scope for WOM.

The overall balance between the positive and negative influences is not clear. Scholarly empirical research on this issue is limited and the available results are mixed. Onishi and Manchanda (2012) estimated the advertising elasticity of TV advertising exposures on blog mentions for 12 movies in Japan, and found an elasticity of 0.12 for pre-release advertising, and a non-significant effect for post-release advertising. Gopinath et al. (2014) studied the impact of the number of ads on online WOM for 5 models of mobile phones and estimated elasticities of 0.19 for emotion advertising and 0.37 for “attribute” (i.e., informational) advertising. Feng and Papatla (2011) used data on cars to show both positive and negative effects of advertising on WOM. Using a model of goodwill for movies, Bruce et al. (2012) found that advertising has a positive impact on the effectiveness of WOM on demand, but did not study the effect on WOM volume. Bollinger et al. (2013) found positive interactions between both TV and online advertising and Facebook mentions in influencing purchase for fast moving consumer goods, but did not study how one affects the other. Tirunillai and Tellis (2017) studied how a TV advertising campaign for HP influenced the information spread and content of the blogs and product reviews of the brand. They found a 10-day elasticity of 0.15, and short-term elasticity of 0.08 on the volume of WOM. Tirunillai and Tellis (2012) studied 15 firms from 6 markets and estimated the elasticity of online WOM on advertising expenditures to be 0.09. Hewett et al. (2016) find that advertising spend by four banks does not affect online Twitter posts, and Pauwels et al. (2016) find that for one apparel retailer the effects of advertising on electronic brand WOM are relatively large in the long term, but small in the short term.

Thus, both marketing theory and scholarly empirical research offer mixed guidance about the direction and size of the advertising-WOM relationship. Our focus is to quantify this relationship using data that cuts across many industries and brands, spans a long time-period, and captures a wide set of controls. Our setting is mostly large established brands with relatively large advertising budgets. We next describe the main dataset used in our analysis.

3 Data

Our dataset contains information on 538 U.S. national brands from 16 product categories (the list is drawn from that of Lovett et al. 2013, see Web Appendix 1). The categories include: beauty products, beverages, cars, children’s products, clothing products, department stores, financial services, food and dining, health products, home design and decoration, household products, media and entertainment, sports and hobbies, technology products and stores, telecommunication, and travel services. The brands in the list include products and services, corporate and product brands, premium and economy brands. For each brand from January 2007 to June 2013, we have monthly information on advertising expenditures, total number of word-of-mouth mentions, and number of brand mentions in the news. We also have data on online WOM between July 2008 and June 2013. We elaborate on each data source and provide some descriptions of the data below.

3.1 Advertising expenditure data

We collect monthly advertising expenditures from the Ad$pender database of Kantar Media. For each brand, we have constructed three categories of advertising—TV advertising, Internet advertising, and other advertising. For TV advertising, we have aggregated expenditures across all available TV outlets (DMA-level as well as national and cable). For Internet advertising, the Kantar Media measure captures aggregated expenditures on display advertising. We focus on these TV and Internet advertising expenditures for three reasons. First, for our brands, these two types of expenditures make up approximately 70% of the total advertising expenditures according to Ad$pender. Second, TV advertising is the largest category of spending and has been suggested to be the most engaging channel (Drèze and Hussherr 2003) and often cited as generating WOM. Third, Internet advertising is touted as the fastest growing category of spending among those available in Kantar and reflects the prominence of “new media.” That said, we also collect the total advertising expenditures on other media, covering the range of print media (e.g., newspapers, magazines), outdoor, and radio advertisements.

3.2 Word of mouth and news data

We use two sources of word of mouth data. The first relates to total WOM and comes from an industry-standard measure of WOM that uses a representative sample in each week of self-reported brand conversations. The second is more typical of social media listening data and comes from queries into a large corpus of text from public online posts. In addition, we also collect the number of news and press mentions for each brand.

3.2.1 Total WOM data from the TalkTrack panel

Our primary word-of-mouth data is drawn from the TalkTrack dataset of the Keller-Fay Group. The dataset is an industry standard for measuring WOM, and has been used in various marketing academic studies (Berger 2014; Baker et al. 2016; see Lovett et al. 2013 for a detailed description). It contains the number of mentions for each brand every week across a sample of respondents, who are recruited to self-report for a 24-hour period on all their word of mouth conversations. During the day they record their brand conversations and list the brands mentioned in the conversation. Note that a list of brands is not provided to respondents – i.e., they can mention any brand. These conversations can happen both online and offline. The inclusion of offline WOM is important, since it is estimated to be 85% of WOM conversations (Keller and Fay 2012).

The sample includes 700 individuals per week, spread approximately equally across the days of the week. This weekly sample is constructed to be representative of the U.S. population. The company uses a scaling factor of 2.3 million to translate from the average daily sample mentions to the daily number of mentions in the population. We aggregate the WOM data to the monthly level to match with the monthly advertising data on all brands in our main analysis.

3.2.2 Online posts from social media

The second source of word-of-mouth data we use is a dataset of social media posts extracted using the Nielsen-McKinsey Insight user-generated content search engine. This search engine conducts daily searches through blogs, discussion groups, and microblogs, and the brand-specific information is retrieved using designated queries written for each brand (see Lovett et al. 2013 for a detailed description). This dataset covers the time period between July 2008 and June 2013. We use this dataset to study the effects of advertising on online posts. In addition, because the online posts data are available at a daily level, this dataset allows us to conduct the second part of our analysis, described in Section 6, at a more granular level.

3.2.3 News and press mentions

WOM may be triggered by news media, which might also proxy for external events (e.g., the launch of a new product, a change of logo, product failure or recall). Such events could lead both the firm to advertise and consumers to speak about the brand, so that the WOM is caused by the event, not the advertising. To control for such unobserved events and news, we use the LexisNexis news and press database to collect the monthly number of news and press mentions for each brand.

3.3 Descriptive statistics

Table 1 presents category-specific information about the advertising, media mentions, and WOM mentions data. This table communicates the large variation across categories in the use of the different types of advertising and in the number of media mentions. For example, the highest spender on TV ads is AT&T, the highest spender on Internet display ads is TD Ameritrade, and the brand with the highest number of news mentions is Facebook. The average number of total mentions for a brand in the sample is 15.8 (equivalent to 36 million mentions in the population), the brand with the highest total WOM is Coca-Cola, and the brand with the highest online WOM is Google. In Web Appendix 1, we present time series plots for four representative brands as well as descriptive statistics and correlations for the data.

Table 1 Monthly spending on advertising (in millions of dollars) on TV and Internet, and number of news and press mentions (in thousands) per category

4 Model for main analyses

In our main analysis, we focus on relating advertising expenditures to WOM. Our empirical strategy is to control for the most likely sources of alternative explanations and evaluate the remaining relationship between advertising and WOM. Hence, causal inference requires a conditional independence assumption. We are concerned about several important sources of endogeneity due to unobserved variables that are potentially related to both advertising and WOM and, as a result, could lead to a spurious relationship between the two. The chief concerns and related controls that we include are 1) unobserved (to the econometrician) characteristics at the brand level that influence the advertising levels and WOM, which we control using brand fixed effects (and in one robustness test, first differences), 2) WOM inertia that is spuriously correlated with time variation in advertising, controlled for by including two lags of WOM, 3) unobserved product introductions and related PR events that lead the firm to advertise and also generate WOM, which we control using news media mentions of each brand, and 4) seasonality and time varying quality of the brand that lead to both greater brand advertising and higher levels of WOM. For seasonality and time-varying quality, we use a (3rd order) polynomial function of the month of year to control for high-low seasons within a year, and category-quarter-year fixed effects. We also introduce common time effects in a robustness test.

With these controls in mind, our empirical analysis proceeds as a log-log specification (where we add one to all variables before the log transformation; see Footnote 1). Under the conditional independence assumption, this specification imposes a constant elasticity for the effect of advertising expenditures on WOM and implies diminishing returns to levels of advertising expenditures. For a given brand j in month t, the empirical model is defined as

$$
\begin{aligned}
\log(WOM)_{jt} ={}& \alpha_j + \alpha_{cq} + \beta_{1j}\log(AdTV)_{jt} + \beta_{2j}\log(AdInternet)_{jt} \\
& + \gamma_{1j}\log(WOM)_{jt-1} + \gamma_{2j}\log(WOM)_{jt-2} + X_{jt}\beta_{0j} + \varepsilon_{jt}
\end{aligned}
\tag{1}
$$

where α_j are brand fixed effects, α_cq are category-quarter-year fixed effects, log(AdTV) and log(AdInternet) are the focal variables (logged dollar expenditures for TV and Internet display ads), and X_jt contains control variables that include the logged dollar expenditures for other advertising (print, outdoor, and radio), the logged count of news and press articles mentioning the brand, and a cubic polynomial of month of year. The γ_1j, γ_2j, β_0j, β_1j, and β_2j are random coefficients for, respectively, the two lagged word-of-mouth variables, X_jt, and the two focal advertising variables. Here, we focus on the short-term impact of advertising on WOM by including contemporaneous advertising only. In Section 5.3 and Web Appendix 2, we report the empirical results of the model with both contemporaneous and lagged advertising as well as the estimated long-term cumulative effects of advertising on WOM.

In what follows, we focus on the average relationship between advertising and WOM across brands. In one set of results we also allow observable heterogeneity in brand coefficients in the form of category-level differences (see Footnote 2). For the models that include both random coefficients and fixed effects we use PROC MIXED in SAS with REML. For the models without random coefficients we use plm in R, which estimates the model using a fixed effects panel estimator. In both cases, our long time series implies negligible ‘Nickell bias’ in the lagged dependent variables (Nickell 1981; see Footnote 3).
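
To make the role of the brand fixed effects concrete, the following is a minimal sketch of a fixed effects (within) estimator on simulated data. It is not the estimation code used for the paper: it has a single advertising regressor, no lagged WOM, no random coefficients, and no category-quarter-year effects, and every number in `simulate_panel` (including the true elasticity of 0.02) is a hypothetical value chosen for illustration. All function names are likewise illustrative.

```python
import math
import random
from collections import defaultdict


def ols_slope(x, y):
    """Slope from a one-regressor OLS fit with an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(groups, values):
    """Within transformation: subtract each brand's own mean."""
    totals, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for g, v in zip(groups, values)]


def simulate_panel(n_brands=200, n_months=78, beta=0.02, seed=0):
    """Toy panel (all values hypothetical): brands with a high fixed effect
    alpha_j both advertise more and are talked about more."""
    rng = random.Random(seed)
    brand, log_ad, log_wom = [], [], []
    for j in range(n_brands):
        alpha_j = rng.gauss(0.0, 1.0)  # unobserved brand characteristic
        for _ in range(n_months):
            ad_dollars = math.exp(10.0 + 0.8 * alpha_j + rng.gauss(0.0, 1.0))
            x = math.log1p(ad_dollars)  # log(1 + ad), as in the specification
            y = 2.0 + alpha_j + beta * x + rng.gauss(0.0, 0.2)  # stands in for log(1 + WOM)
            brand.append(j)
            log_ad.append(x)
            log_wom.append(y)
    return brand, log_ad, log_wom


brand, log_ad, log_wom = simulate_panel()
pooled = ols_slope(log_ad, log_wom)
within = ols_slope(demean_by_group(brand, log_ad), demean_by_group(brand, log_wom))
print(f"pooled OLS elasticity:    {pooled:.3f}")
print(f"fixed-effects elasticity: {within:.3f}  (simulated true value 0.020)")
```

In this toy setting the pooled estimate is far larger than the simulated elasticity because it attributes cross-brand differences to advertising, whereas the within estimate uses only variation over time inside each brand.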

5 Main results

We organize the results from our main analysis into four sections. The first section presents our results on the magnitude of the average relationship between advertising and WOM and interprets this magnitude in the broader context of advertising. The second section explores how much heterogeneity in the advertising-WOM relationship exists across brands and categories. The third section presents cumulative effects of advertising from a model with multiple lags of advertising, and the final section discusses other robustness tests, including the use of instrumental variables to obtain exogenous variation to estimate the advertising-WOM relationship.

5.1 The advertising-WOM relationship

The first set of columns in Table 2 (Total) presents the results from estimating Eq. (1) on the total TalkTrack WOM dataset. In this section, we focus on the parameters related to the population mean. We find significant positive coefficients for both TV (0.019, s.e. = 0.0017) and Internet display ad expenditures (0.014, s.e. = 0.0021). The second set of columns in Table 2 (Online) describes the results for the dataset of online posts. The estimated coefficients are similar but smaller: the coefficient for TV advertising is 0.009 (s.e. = 0.001), and for Internet advertising it is 0.01 (s.e. = 0.002). The differences between the coefficients for TV and Internet advertising are not significant in either dataset. This is consistent with the results of Draganska et al. (2014), who find that advertising on TV and the Internet do not have significantly different effects on brand performance metrics.

Table 2 Main model with dependent variable Ln(WOM)

The control variables take the expected signs, are significant, and have reasonable magnitudes. Based on the estimated effects for the lagged dependent variables, WOM shocks have a low level of average persistence that diminishes rapidly between the first and second lag, keeping in mind that these effects are net of the brand fixed effects. News mentions have a much larger, significant, and positive estimate, but we caution against interpreting this effect as arising from news per se, since this variable could also control for new product introductions, which typically are covered in the news. The variance parameters for the heterogeneity across brands are also all significant (see Footnote 4). We discuss the heterogeneity related to the brand advertising variables in more detail in Section 5.2.

How big are these estimated advertising effects on WOM? Since the analysis is done in log-log space, the estimated coefficients on advertising are constant advertising elasticities under the causal interpretation of the coefficient. The implied elasticity of total WOM to TV advertising expenditures is 0.019 and to Internet advertising expenditures is 0.014. For online WOM, the implied elasticity to TV advertising expenditures is 0.009 and to Internet advertising expenditures is 0.01.

We offer some perspective on the magnitude of this relationship. First, the relationship is quite modest even in absolute magnitudes (see Footnote 5). For instance, in our sample, the average number of conversations about a brand in a month is 15.8. Given the sampling procedure of Keller-Fay, they project that one brand mention in their sample equals 2.3 million mentions in the United States. This suggests there are 36.4 million conversations about the average brand in our dataset in a month. Our elasticity estimate implies that a 10% increase in TV advertising corresponds to around 69,000 additional conversations in total WOM about the brand per month. For the large, high-WOM national brands that we study, this number of brand conversations is quite small. Consider the average spending of $5.89 million on TV advertising. A 10% increase in spending at 1 cent per advertising impression on average generates 58.9 million advertising impressions. In this case, the additional WOM impressions associated with advertising are orders of magnitude smaller than the advertising impressions, only about one thousandth as many.
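
The arithmetic behind this comparison can be retraced with a short script. This is only a sketch of the calculation as we read it from the paragraph above: it uses the figures quoted in the text and a first-order (linear) elasticity approximation, which is an assumption on our part; the variable and function names are illustrative.

```python
# Figures quoted in the text above.
SAMPLE_MENTIONS_PER_MONTH = 15.8    # average brand, TalkTrack sample
POPULATION_SCALING = 2.3e6          # one sample mention = 2.3 million mentions
TV_ELASTICITY = 0.019               # total WOM with respect to TV spend
AVG_TV_SPEND_USD = 5.89e6           # average monthly TV spend
COST_PER_AD_IMPRESSION_USD = 0.01   # 1 cent per impression
SPEND_INCREASE = 0.10               # the 10% thought experiment


def extra_wom_conversations(baseline, elasticity, pct_change):
    """First-order approximation (assumed here): dWOM = WOM * elasticity * pct_change."""
    return baseline * elasticity * pct_change


baseline_wom = SAMPLE_MENTIONS_PER_MONTH * POPULATION_SCALING
extra_wom = extra_wom_conversations(baseline_wom, TV_ELASTICITY, SPEND_INCREASE)
extra_ad_impressions = AVG_TV_SPEND_USD * SPEND_INCREASE / COST_PER_AD_IMPRESSION_USD
ratio = extra_wom / extra_ad_impressions

print(f"baseline conversations per month: {baseline_wom / 1e6:.1f} million")
print(f"extra WOM conversations:          {extra_wom:,.0f}")
print(f"extra ad impressions:             {extra_ad_impressions / 1e6:.1f} million")
print(f"extra WOM per extra ad impression: {ratio:.4f}")
```

With the rounded inputs, the baseline comes out at about 36.3 million conversations rather than 36.4 million; the small gap presumably reflects rounding of the 15.8 figure. The script reproduces the roughly 69,000 additional conversations and the ratio of about one thousandth.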

Second, the translation to sales based on the estimated WOM elasticities in the literature is quite small, too. For instance, You et al. (2015) in a meta-study of electronic WOM find an overall elasticity of 0.236 on sales. At this elasticity for WOM on sales, the average impact of advertising through WOM would be less than 0.004. Further, the 0.236 eWOM elasticity of You et al. (2015) is relatively large compared to recent studies that find elasticities between 0.01 and 0.06 (Lovett and Staelin 2016; Seiler et al. 2017). With these lower elasticities, the effect would be an order of magnitude smaller. Given that the meta-studies on the influence of advertising on sales (e.g., Sethuraman et al. 2011) reveal average advertising elasticities of 0.12, the implied impact of advertising on sales through WOM is only a very small part of the overall advertising influence.
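
As a back-of-the-envelope check on this chain of elasticities, one can assume that the indirect elasticity of sales to advertising through WOM is approximately the product of the two elasticities. This multiplicative form and the subscript labels are our reading of the argument, not a formula stated in the text:

$$
\begin{aligned}
\eta_{\text{sales, ad via WOM}} &\approx \eta_{\text{WOM, ad}} \times \eta_{\text{sales, WOM}} \\
\text{TV:}\quad 0.019 \times 0.236 &\approx 0.0045 \\
\text{Internet display:}\quad 0.014 \times 0.236 &\approx 0.0033 \\
\text{average of the two:}\quad 0.0165 \times 0.236 &\approx 0.0039
\end{aligned}
$$

Under this reading, the average across the two media is just below 0.004. With a WOM-to-sales elasticity between 0.01 and 0.06, the TV figure falls to between about 0.0002 and 0.0011. Even the largest of these values is under 4% of the 0.12 average advertising elasticity.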

How do these elasticities relate to the elasticities reported in the specific cases studied in the scholarly literature? As mentioned above, reported results are mixed, with some analyses showing a positive effect (Onishi and Manchanda 2012; Tirunillai and Tellis 2012; Gopinath et al. 2014; Pauwels et al. 2016), some showing no significant effect (Trusov et al. 2009; Onishi and Manchanda 2012; Hewett et al. 2016), and some even showing negative effects (Feng and Papatla 2011). The comparison, even in the cases of positive elasticities, is not very direct. For example, Onishi and Manchanda (2012) provide an estimated elasticity of 0.12 for daily advertising exposures on pre-release WOM, where the WOM is blogs about 12 different movies in Japan. For five models of mobile phones, Gopinath et al. (2014) find elasticities between 0.19 and 0.37 for monthly online WOM to the number of advertisements. Pauwels et al. (2016) find long-term brand electronic WOM elasticities of 0.085, 0.149, 0.205, and 0.237 for TV, print, radio, and paid search ads, respectively, for weekly data about one apparel retailer. We differ notably in two ways. First, our measure is the response of total monthly WOM, which may smooth some of the daily variation captured in Onishi and Manchanda (2012) and the weekly variation in Pauwels et al. (2016). Second, our data cover over 500 brands, span 6.5 years, and include all types of WOM, not just online. With these broader definitions and sample, it appears the estimated average relationship between advertising and WOM is much smaller than what is currently reported in the literature (see Footnote 6). Hence, in absolute terms and relative to the positive findings in the literature, we find a weak average advertising-WOM relationship.

5.2 Does the average effect mask larger effects for some brands or categories?

We now turn to how much stronger the relationship is for some brands and categories than the average coefficients we have reported thus far. Brand-level heterogeneity in the relationship between advertising and WOM could lead some brands to have strong relationships and others to have weak relationships, resulting in the small average coefficients described above. For instance, this variation could arise from different customer bases, different brand characteristics, different degrees of engagement with the brand, or different types or quality of advertising campaigns between brands. The heterogeneity variances for both total WOM and online WOM reported in Table 2 show that the standard deviations for the heterogeneity in advertising coefficients are roughly the same size as the coefficients themselves, indicating that brands differ meaningfully in the relationship between WOM and advertising. However, the cross-brand variation does not produce an order of magnitude shift in the point estimates. For example, for the TV ads effect, a two standard deviation shift implies that a few brands have point estimates as large as 0.059 for total WOM and 0.049 for online WOM. Although the maximum of these point estimates is larger than the overall average, 0.059 and 0.049 are still less than half the size of the typical sales elasticity to advertising. This suggests that even for the brands with the largest relationships between advertising and WOM, the magnitudes are relatively modest.
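
For readers who want to retrace the two standard deviation figures, the following worked calculation backs out the implied cross-brand standard deviation of the TV coefficient from the numbers quoted above. It assumes that the upper values are the mean coefficient plus two standard deviations; this is our reading of the passage, and Table 2 remains the authoritative source for the variance estimates.

$$
\begin{aligned}
\sigma_{\beta,\text{total}} &\approx \frac{0.059 - 0.019}{2} = 0.020, &
\sigma_{\beta,\text{online}} &\approx \frac{0.049 - 0.009}{2} = 0.020, \\
\frac{0.059}{0.12} &\approx 0.49, &
\frac{0.049}{0.12} &\approx 0.41
\end{aligned}
$$

The last two ratios compare the upper values with the 0.12 average sales elasticity to advertising from Sethuraman et al. (2011), which is the sense in which both remain below half of the typical sales elasticity.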

To understand whether the relationships systematically differ across categories, we incorporate category dummy variables and interact them with the variables in Eq. (1). Figure 2a presents the category-level estimates with ± one standard error bars for both TV and Internet dollar spend effects on total WOM. As is apparent, the beauty category has the smallest average TV advertising-total WOM relationship (−0.003, but not significantly different from zero), whereas the highest estimate is 0.046 for Sports and Hobbies, which is significantly larger than both zero and the coefficient for beauty. Also on the high end are Telecommunications, which includes mobile handset sellers, and Media and Entertainment, which includes movies. These latter two categories are ones that past research has found to have significant, positive effects of advertising on WOM (mobile handsets and movies). Hence, the category variation we find is directionally consistent with the categories that have been studied in the past being exceptionally large. For Internet display advertising expenditures, we find that the Clothing category has one of the weakest relationships, whereas Media and Entertainment has the highest.

Fig. 2 a: Effect of advertising on total WOM by product category. b: Effect of advertising on online WOM by product category

Figure 2b shows the same estimates for online WOM. Sports and Hobbies have the strongest relationship for the Internet display advertising expenditure, followed by Media and Entertainment; whereas department stores have the weakest relationship. The highest estimate of the TV advertising-online WOM relationship occurs in Media and Entertainment. However, despite this variation across categories, we find that the effects for the categories with the largest advertising elasticities are still relatively small for both total WOM and online WOM.

5.3 Does the average effect mask larger cumulative effects?

The estimates we report are for contemporary advertising effects, but WOM could also be influenced by advertising in previous months, potentially leading to a larger cumulative effect. Therefore, we also consider models with lagged advertising expenditure variables. The details of this examination are available in Web Appendix 2. Our finding is that although some lags are statistically significant, the results do not meaningfully alter the conclusions reported here. Our estimates indicate that the cumulative relationship of advertising on total WOM is 0.031 for TV advertising expenditure and 0.020 for Internet display advertising. For online WOM the cumulative relationship is 0.017 for TV advertising and 0.013 for Internet display advertising. Interestingly, TV advertising expenditures appear to have some longer-term effects, but Internet advertising expenditures do not.

5.4 Robustness tests and instrumental variables analyses

In Web Appendix 3, we provide details on a range of model tests that support the robustness of the main results presented above (Footnote 7). First, we drop or add different components to the model to evaluate robustness to specification. We find that as long as either lagged WOM or brand fixed effects are included in the model, the small advertising-WOM relationships described above persist. Importantly, the brand fixed effects are critical controls, since without them the relationship between WOM and advertising expenditures would appear to be stronger than it actually is.

Second, we evaluate instrumental variables specifications. Our main empirical strategy leverages control variables to reduce potential endogeneity concerns related to seasonality, unobserved brand effects, secular trends, and new product/service launches. The causal interpretation of our results relies on a conditional independence assumption. The main concerns in estimating advertising causal effects typically involve positive biases (e.g., brands advertise in the high season of sales that might falsely be attributed to the advertising). We have attempted to control for these concerns and show that our control variables do not overly influence the results. Since failing to account for endogeneity of advertising is usually expected to produce larger effect sizes, our small effect size suggests the typical concerns are not a major threat.

Two main arguments specific to our context could lead to a downward bias. The first is that advertising and WOM could serve as substitutes. However, since the advertising for large established brands tends to be planned well in advance, advertising cannot easily respond to short-term fluctuations in WOM. Hence, we can narrow the substitutes concern to planning to cut advertising when WOM is expected to be high and vice versa. For example, when the product would be on consumers’ minds and talked about (e.g., summer for ice cream), the firm would choose not to advertise. On its face, this argument appears contrary to fact (i.e., ice cream is advertised more in summer). Even so, our brand-level seasonality and secular trend controls are intended to address this kind of concern. The second main argument that could lead to smaller effect sizes is measurement error in the advertising expenditure variables. Classical errors-in-variables arising from measurement problems are known in linear models to produce attenuation bias, underestimating the effect size. We next consider models that can account for these endogeneity concerns.

We examine whether our results are robust to an instrumental variables approach to obtaining exogenous variation to estimate the advertising-WOM relationship. We use many instruments: interactions of costs and political advertising with brand identities. We adopt the standard linear instrumental variables specification as well as the post-Lasso IV approach to approximate the optimal instruments (Belloni et al. 2012). All of these specifications suggest quantitatively similar results to our main finding. Only in one case, for online WOM posts, do we find a larger and significant implied elasticity. However, these estimates face a weak instruments problem, as many of the first-stage coefficients have unexpected signs, making the theoretical argument for the instruments less clear.

Together, these additional analyses presented in Web Appendix 3 provide support for the robustness of our main results, and in particular the small average effect. However, because the instruments could be relatively weak, it is difficult to establish that endogeneity is not biasing our results toward zero as a result of measurement error or advertising and WOM acting as substitutes. In order to further address these potential issues, we use an alternative approach in Section 6 that specifically avoids both of these concerns.

6 Advertising in the Super Bowl

In this section, we examine what is typically considered a situation where advertising is intended to generate WOM and is believed to have very large effects: Super Bowl advertisements. While the heterogeneity in categories and brands described in Section 5.2 suggests that persistent differences do not lead to large magnitudes for the advertising-WOM relationship, it is possible that some events, periods, or specific campaigns might do so. One leading possibility is that certain campaigns or events are simply better at generating conversation than others. To evaluate this potential, we examine one of the most often cited sources of incremental WOM impressions from advertising, the Super Bowl (Footnote 8). We collected information on which of the brands in our dataset advertised in the Super Bowl in each of the years in our sample. We apply these data to two different analyses. In the first, we examine the main model results including main effects for being a Super Bowl advertiser and interactions between this variable and the advertising expenditure variables. In the second analysis, we apply a causal inference technique, synthetic controls, to evaluate the relationship between being a Super Bowl advertiser and WOM.

6.1 Regression analysis of Super Bowl advertising effects

We add to our main analysis of Eq. (1) both a main effect of being a Super Bowl advertiser in the month of the Super Bowl (February) and interaction terms between this variable and the logged advertising spending variables. If the Super Bowl increases the effectiveness of advertising spending on WOM impressions, we would expect the coefficients on the interaction terms to be positive. The Super Bowl main effect and interaction effects do not have random coefficients, because they are not separately identified from the brand fixed effects and the brand-specific advertising random coefficients.

Table 3 presents the results. We find some interesting differences between total WOM and online WOM. For total WOM, we find that none of the Super Bowl interaction terms is large or significant. In fact, the term for TV advertising, which one would expect to be positive if Super Bowl advertising is more efficient, is actually negative and small (but insignificant). While this finding suggests that advertising on the Super Bowl does not lead to stronger relationships between advertising expenditures and total WOM, the main effect potentially tells a different story. In particular, the main effect of being a Super Bowl advertiser for total WOM is positive, large (0.27) and significant (t-stat = 2.31). This indicates that although Super Bowl advertising expenditures are not more efficient per dollar than at other times, Super Bowl advertisers have on average 27% higher total WOM in the month of the Super Bowl than in other periods. This large effect size could suggest that advertising is more effective in the Super Bowl for creating total WOM, but that the variation in advertising spending on Super Bowl ads is insufficient to attribute that gain to advertising expenditures. Since such an increase could translate to a much larger effect than what we find in the small average elasticity, this result seems to provide an opportunity for advertising to play a larger role in creating total WOM than our previous findings suggest.
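Reading the 0.27 coefficient as a 27% difference relies on the usual first-order approximation for a dummy variable in a model with a logged dependent variable. The sketch below shows the exact conversion, assuming the dependent variable is the natural log of WOM (the synthetic controls discussion refers to "logged WOM"); the function name is hypothetical.

```python
import math


def log_coef_to_pct(coef: float) -> float:
    """Exact percent change implied by a coefficient on a 0/1 dummy
    when the dependent variable is in natural logs."""
    return (math.exp(coef) - 1.0) * 100.0


# Main effect of being a Super Bowl advertiser on total WOM (from the text).
print(round(log_coef_to_pct(0.27), 1))  # 31.0
```

Under this assumption the exact figure is somewhat above the 27% approximation, which does not change the qualitative reading of a large main effect.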

Table 3 Main model results with Super Bowl interactions

In contrast, for online WOM, we find that the Super Bowl interaction term with TV advertising expenditure is positive (0.036) and significant (t-stat = 1.99), while the other interactions are not significant. This implies that advertising in the Super Bowl does lead to a stronger relationship between TV advertising and online WOM posts.

Taken together with the large main effect of Super Bowl in total WOM, these findings are consistent with both the popular press and practitioner literature arguing that Super Bowl advertisements lead to a large increase in WOM impressions. If these Super Bowl effects are causal, then advertising may generate meaningful levels of WOM in some campaigns or when combined with specific events. In the next section, we examine this finding in more detail using a recent causal inference technique that can provide further robustness of our findings.

6.2 A synthetic controls analysis of advertising in the Super Bowl on WOM

Unlike in the main analysis, where we observe multiple continuous advertising expenditure variables, the analysis in this section focuses on whether being a Super Bowl TV advertiser causes an increase in WOM. In this case, we have a discrete “treatment” variable, Super Bowl, which takes a value of 1 for Super Bowl advertisers in the time period(s) when we test for an effect, and 0 otherwise. In this section, we present evidence about the effect of this Super Bowl treatment using a causal inference method to reduce potential bias.

To measure this causal effect, we would ideally like to calculate the difference between the realized WOM for the Super Bowl advertisers as compared to the counterfactual case, the WOM these brands would receive had they not advertised in the Super Bowl. Of course, by definition, we do not (and cannot) observe the counterfactual case for the same brands, and instead seek a way to generate the missing counterfactual WOM data. Ideally, we would run a field experiment that randomizes the assignment of Super Bowl advertising slots to brands in order to justify using the non-treated brands as the counterfactual measure. This is infeasible (Footnote 9).

To construct the prediction for this missing counterfactual data, we use a recently developed technique, the Generalized Synthetic Control Method (GSCM) of Xu (2017). This method is a parametric approach that generalizes to multiple treatment units the synthetic control method developed by Abadie et al. (2010). The method was originally developed for comparative case studies, and has been used and extended broadly, including in economics (Doudchenko and Imbens 2016), finance (Acemoglu et al. 2016), political science (Xu 2017), and, recently, in marketing (e.g., Vidal-Berastain et al. 2018).

The intuition behind these methods is to use the non-treated cases—so called “Donors”—to create a “synthetic control” unit for each treatment unit. The synthetic control unit is developed by using a weighted combination of the donor pool cases, where the weights are selected so that they create a synthetic control that closely matches the pre-treatment data pattern of the outcome variable (in our context, logged WOM) for the treated cases. The synthetic control’s post-treatment pattern is then used as the counterfactual prediction for the treated cases. Because the synthetic controls method uses the pre-treatment outcome variable, it naturally conditions on both observables and unobservables. As the pre-treatment time-series increases in length, the level of control increases. Thus, the synthetic control approach can account for unobserved variables that might otherwise invalidate causal inference.
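The weighting intuition can be illustrated with a minimal sketch of the classic synthetic control idea: choose non-negative donor weights that sum to one and minimize the pre-treatment squared gap to the treated series. This is a simplified stand-in that matches on outcomes only, not the GSCM estimator used in the paper; the toy series, function names, and the projected-gradient solver are all assumptions made for illustration.

```python
from typing import List


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection of v onto {w : w_j >= 0, sum_j w_j = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate  # keep the threshold from the largest valid j
    return [max(x - theta, 0.0) for x in v]


def synthetic_control_weights(treated_pre: List[float],
                              donors_pre: List[List[float]],
                              n_iter: int = 2000) -> List[float]:
    """Donor weights minimizing the pre-treatment squared gap between the
    treated series and the weighted donor combination (projected gradient)."""
    n_donors, n_periods = len(donors_pre), len(treated_pre)
    # Conservative step size: 1 / (2 * squared Frobenius norm of the donor matrix).
    step = 1.0 / (2.0 * sum(x * x for d in donors_pre for x in d))
    w = [1.0 / n_donors] * n_donors
    for _ in range(n_iter):
        fitted = [sum(w[j] * donors_pre[j][t] for j in range(n_donors))
                  for t in range(n_periods)]
        resid = [treated_pre[t] - fitted[t] for t in range(n_periods)]
        grad = [-2.0 * sum(donors_pre[j][t] * resid[t] for t in range(n_periods))
                for j in range(n_donors)]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(n_donors)])
    return w


# Toy pre-treatment outcomes (hypothetical, e.g. demeaned logged WOM).
donors_pre = [[1, 0, 0, 1, 0, 1], [0, 1, 0, 1, 1, 0], [0, 0, 1, 0, 1, 1]]
treated_pre = [0.3, 0.7, 0.0, 1.0, 0.7, 0.3]  # equals 0.3 * donor 1 + 0.7 * donor 2

weights = synthetic_control_weights(treated_pre, donors_pre)

# The weighted donors' post-treatment outcome serves as the counterfactual.
donors_post = [0.5, 1.0, 0.2]
counterfactual = sum(w * y for w, y in zip(weights, donors_post))
print([round(w, 3) for w in weights], round(counterfactual, 3))
```

In this toy example the treated series is an exact mixture of the first two donors, so the recovered weights are close to (0.3, 0.7, 0) and the counterfactual is close to 0.85.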

In the GSCM, a parametric model of the treatment effect and data generating process follows the interactive fixed effects model (see Bai 2009) and is assumed to be

$$ {Y}_{it}={\delta}_{it}{D}_{it}+{x}_{it}^{\prime}\beta +{\lambda}_i^{\prime }{f}_t+{\varepsilon}_{it}, $$
(2)

where

\( {D}_{it} \): binary treatment variable for a brand i in a Super Bowl in period t

\( {\delta}_{it} \): the brand-time specific treatment effect

\( {x}_{it} \): fixed effect for every brand/Super Bowl-year and period

\( \beta \): the vector of common coefficients on the control variables

\( {f}_t \): the unobserved time-varying vector of factors with length F

\( {\lambda}_i \): the brand-specific length F vector of factor loadings

\( {\varepsilon}_{it} \): stochastic error, assumed uncorrelated with \( {D}_{it} \), \( {x}_{it} \), \( {f}_t \), and \( {\lambda}_i \)

The method requires three further assumptions related to only allowing weak serial dependence of the error terms, some (standard) regularity conditions, and that the error terms are cross-sectionally independent and homoscedastic. Given these assumptions, the average treatment effect on the treated, \( AT{T}_t \), for the set of N Super Bowl advertising brands, \( \mathcal{T} \), can be estimated based on the differences between i’s observed outcome \( {Y}_{it,i\in \mathcal{T}} \) and the synthetic control for i, \( {Y}_{it, SC} \).

$$ AT{T}_t=\frac{1}{N}{\sum}_{i\in \mathcal{T}}\left[{Y}_{it,i\in \mathcal{T}}-{Y}_{it, SC}\right] $$
(3)
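As a small illustration of Eq. (3), the sketch below averages the gap between observed and synthetic outcomes across treated brands in each period. The brand values are made-up logged WOM numbers and the function name is hypothetical.

```python
from typing import List


def att_by_period(observed: List[List[float]],
                  synthetic: List[List[float]]) -> List[float]:
    """ATT_t = (1/N) * sum over treated brands i of (Y_it - Y_it,SC).

    observed[i][t] and synthetic[i][t] hold outcomes for treated brand i
    in treatment period t.
    """
    n_brands = len(observed)
    n_periods = len(observed[0])
    return [
        sum(observed[i][t] - synthetic[i][t] for i in range(n_brands)) / n_brands
        for t in range(n_periods)
    ]


# Two hypothetical treated brands observed over two treatment periods.
observed = [[5.2, 5.0], [4.1, 4.0]]
synthetic = [[5.0, 5.0], [4.0, 3.9]]
print([round(a, 3) for a in att_by_period(observed, synthetic)])  # [0.15, 0.05]
```

Because the outcome is in logs, an ATT of 0.15 corresponds to roughly a 15% difference between the treated brands and their synthetic controls.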

Estimation proceeds in three steps. First, we estimate the parameters \( \beta \), the \( {\lambda}_i \) vectors for all donor pool cases, and the vector \( {f}_t \). These are estimated using only the data from the pre-treatment period for the donor pool. Second, the factor loadings \( {\lambda}_i \) for each of the treated units are estimated using the pre-treatment outcomes for the treatment cases, conditioning on the \( \beta \) and \( {f}_t \) estimates. Third, the synthetic control for the treated counterfactuals, \( {Y}_{it, SC} \), is calculated using the \( \beta \) and \( {f}_t \) estimates from step one and the \( {\lambda}_i \) estimates from step two. This then allows calculating \( AT{T}_t \) for each period. The number of factors, F, is selected via a cross-validation procedure in which some pre-treatment observations are held back and predicted. The three-step procedure is run for each candidate number of factors, and the number of factors with the lowest mean squared prediction error is chosen. Inference proceeds using a parametric bootstrap. See Xu (2017) for details on the procedure and inference.

We implement the procedure using the available software package in R, gsynth. We estimate the causal effects including two-way fixed effects (time and brand-year). Our standard errors are clustered at the brand-year level and we use 16,000 samples for bootstrapping the standard errors. We report analyses for both the Keller-Fay total WOM measure and the Nielsen-McKinsey Insight (NMI) online WOM measure. The two datasets overlap from 2008 onward, so we use this common period to make the analyses comparable. We note that for the Keller-Fay measures the reported subsample and the full available time period have quite similar effect sizes and significance levels.

We report the average treatment effects in Table 4 along with the number of factors used and the number of pre-periods, post-periods, and total observations. In most cases, the number of factors reported is the optimal number selected by the cross-validation technique. In the total WOM cases, the optimal number of unobserved factors was zero suggesting no meaningful remaining interactive fixed effects in the data. This indicates that the fixed unit and time effects already control for the unobserved time-varying influences. This finding provides indirect support for our conditional independence assumption used in the main analysis section. In these cases with zero optimal factors, we also present solutions where we forced the model to have one unobserved factor to ensure robustness against more factors.
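With zero latent factors, the counterfactual reduces to additive unit and period effects. The sketch below illustrates that special case on toy data. It is not the gsynth implementation: it ignores covariates, assumes a balanced panel, and uses the donors' outcomes in every period to recover the period effects; all names and numbers are hypothetical.

```python
from typing import List


def zero_factor_counterfactual(donors: List[List[float]],
                               treated: List[float],
                               n_pre: int) -> List[float]:
    """Counterfactual path for one treated unit under Y_it(0) = alpha_i + xi_t.

    donors[j][t] and treated[t] cover all periods; the first n_pre periods
    are pre-treatment. Donors are never treated.
    """
    n_periods = len(treated)
    # Period effects: average donor outcome in each period
    # (this also absorbs the average donor unit effect).
    xi = [sum(d[t] for d in donors) / len(donors) for t in range(n_periods)]
    # Treated unit effect: average pre-treatment deviation from the period effects.
    alpha = sum(treated[t] - xi[t] for t in range(n_pre)) / n_pre
    return [alpha + xi[t] for t in range(n_periods)]


# Toy logged WOM: two donor brands, one treated brand, treatment in the last period.
donors = [[4.0, 4.1, 4.2, 4.3], [5.0, 5.1, 5.2, 5.3]]
treated = [6.0, 6.1, 6.2, 6.5]
counterfactual = zero_factor_counterfactual(donors, treated, n_pre=3)
effect = treated[3] - counterfactual[3]
print(round(effect, 3))  # 0.2
```

In this special case the estimate coincides with a difference-in-differences comparison, which is why a zero-factor solution suggests that the unit and time fixed effects already absorb the relevant unobserved influences.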

Table 4 Average Treatment Effect on the Treated (ATT) for total WOM and for online WOM, in various time resolutions

We begin with the monthly data that most closely approximate our main analysis. We include the last 6 months prior to the Super Bowl as pre-treatment periods and consider the Super Bowl treatment beginning in February (time 0) and continuing through March. We find a significant and positive average causal effect of being a Super Bowl advertiser for the month of and after the Super Bowl. The average ATT for the 2 months is 10.8% (s.e. = 0.043, p value = 0.026) with the best fitting number of factors (zero) and 10.3% (s.e. = 0.050, p value = 0.035) with one factor. The ATT for the month of the Super Bowl, February, is estimated to be 15.9% (s.e. = 0.054, p value < .01) with the optimal zero factors and 15% with one factor (s.e. = 0.062, p value = 0.013), but this effect rapidly declines in later months. Already in March, the effect is insignificant with the ATT estimated to be 6% (s.e. = 0.054, p value = 0.246) with zero factors. Panel A of Fig. 3 shows the time-varying estimated ATT for each month of the data, showing that the only individually significant month is the month of the Super Bowl. Thus, the effect on total WOM caused by being a Super Bowl advertiser is reasonably large, but only lasts approximately 1 month.

Fig. 3 Time-varying estimated average treatment effect on the treated (ATT) for total WOM and online WOM, in various time resolutions

One major concern with this analysis is that, if the Super Bowl advertiser effect is actually shorter-lived than 1 month, monthly data could have an aggregation bias. To examine this, we conduct the analysis on weekly total WOM measures, which is the finest periodicity the Keller-Fay dataset allows. We use 16 weeks prior to the Super Bowl week as pre-treatment periods, and a total of 4 weeks of treatment periods including the week of and 3 weeks after the Super Bowl. Panel B of Fig. 3 shows the weekly pattern of the effects. The week of the Super Bowl has no increase in total WOM (0.1%, s.e. = 0.056), which may not be too surprising since the Super Bowl airs on the last day of the week. We find the first week following the Super Bowl has a 22.1% increase (s.e. = 0.057, p value < .01) in total WOM, but that the following weeks have lower effect sizes of 10.9% (s.e. = 0.056, p value < .061), 14.3% (s.e. = 0.058, p value = 0.012), and 10.4% (s.e. = 0.058, p value = 0.068) respectively for weeks 2–4. The average ATT across the first 4 weeks is estimated to be 11.8% and significant (s.e. = 0.033, p value < .01). While the weekly data indicate a higher peak of WOM effect in the week following the Super Bowl, the general patterns do not suggest the monthly data dramatically understate the average effect. In particular, the effect stays significant for the entire month (4 weeks). Overall, these results indicate that being a Super Bowl advertiser causes a sizable increase in total WOM of 16% in the month of the Super Bowl and 22% in the first week after it.

These results speak to the potential aggregation bias in the total WOM data, one possible source of measurement error. First, the point estimate for the peak weekly effect is less than 50% larger than that of the monthly average. Second, the estimated ATT for February from the monthly data has a 95% confidence interval of (5.2%, 26.6%). This interval actually covers the maximum weekly estimated value, suggesting we cannot statistically distinguish them. These results suggest that our small result in the main analysis that uses monthly data is unlikely to be explained away by shorter-lived total WOM effects. In sum, although there might be aggregation bias, it appears not to be large enough to overturn the main result for total WOM.
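The two comparisons above can be checked with simple arithmetic. The sketch below uses a normal-approximation interval built from the reported February point estimate and standard error; since the paper's inference is based on a parametric bootstrap, its reported interval of (5.2%, 26.6%) differs slightly from this approximation. The function name is hypothetical.

```python
from typing import Tuple


def normal_ci(estimate: float, se: float, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval: estimate +/- z * se."""
    return (estimate - z * se, estimate + z * se)


monthly_att, monthly_se = 0.159, 0.054   # February ATT for total WOM (from the text)
peak_weekly_att = 0.221                  # first week after the Super Bowl

low, high = normal_ci(monthly_att, monthly_se)
print(round(low, 3), round(high, 3))             # 0.053 0.265
print(low <= peak_weekly_att <= high)            # True
print(round(peak_weekly_att / monthly_att, 2))   # 1.39
```

The peak weekly estimate is about 39% larger than the monthly estimate and falls inside the approximate interval, in line with both points made above.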

We conduct the same kind of analysis on the online WOM measure in order to examine whether the Super Bowl effect is larger for online social media posts and whether the effect is shorter-lived than that of the total WOM. Panels C and D of Fig. 3 present the effect patterns. In the monthly analysis, the measured ATT for the month of the Super Bowl is significant at 26.6% (s.e. = 0.039, p value < .01), and in the month following the Super Bowl, the effect size falls to be insignificant at 4.9% (s.e. = 0.044, p value = 0.240). Thus, the effect does appear to be larger for online posts than total WOM, but lasts at most 1 month. Considering weekly data, the ATT for the week of the Super Bowl is significant at 48.0% (s.e. = 0.042, p value < .01), and the 3 weeks after the Super Bowl are all insignificant at 4.1% (s.e. = 0.048), 1.8% (s.e. = 0.048), and 2.3% (s.e. = 0.051), respectively. This analysis suggests that the Super Bowl has a much larger but shorter-lived effect on counts of online posts than on representative, total WOM mentions.

Because the Nielsen-McKinsey Insight data come daily, we can perform the analysis at this even more fine-grained level. We use 60 days prior to the Super Bowl as the pre-treatment period. Panel E of Fig. 3 indicates that the incremental posts concentrate heavily on the first few days with significant causal estimates of 67.7% (s.e. = 0.062, p value < .01) for the day of the Super Bowl, 62.8% (s.e. = 0.058, p value < .01) for the day after, 39.7% (s.e. = 0.068, p value < .01) for the second day after, 25.2% (s.e. = 0.081, p value < .01) for the third, 12.3% (s.e. = 0.084, p value = .179) and insignificant for the fourth, and dropping to below 10% and insignificant thereafter. These causal effects on online posts for the first 3 days are much larger than the effects on total WOM measured with a representative sample. This analysis also reaffirms the concentration of incremental impressions close to the Super Bowl for online posts, which is distinct from the more spread out effects for total WOM.

How should we interpret these results for the online posts from Nielsen-McKinsey Insight compared to the total WOM from Keller-Fay? First, the effects for online posts are larger for a short duration (a few days for daily or 1 week for weekly data). In contrast, the effect on the total WOM persists for approximately the full month. These shorter-term, stronger effects in the online data might explain why studies that focus entirely on online posts may find larger effects of advertising on WOM. Second, the monthly periodicity does not appear to produce measurable aggregation bias for total WOM, since the effect is relatively consistent over the whole month. In contrast, aggregation bias appears likely to be more severe in the monthly data for online WOM posts. Daily and weekly effects are much larger than the monthly effects and do not last the full month. This implies that we should interpret with caution the small effect sizes found in the monthly data of Sections 5.1 to 5.3 for online WOM posts.

It is important to keep in mind that the total WOM from Keller-Fay is measured with a representative sample of the U.S. population and can be interpreted as impressions. In contrast, the online posts have only a vague connection to impressions with some posts never seen by anyone and others seen by many people. Moreover, these posts are not collected to be representative. It is possible that the difference in effects for these two types of WOM could arise from sampling differences in the data or that the individuals that post online are only a (selected) subset of those that talk about brands. In either case, for generalizations to earned impressions that advertising creates, the Keller-Fay data has a stronger foundation.

7 Discussion

Can firms buy earned media impressions with paid media? We conducted an empirical analysis to evaluate the relationship between advertising expenditures and WOM conversations about brands. Our dataset contains information on 538 U.S. national brands across 16 categories over a period of 6.5 years and covers both online and total (including online and offline) WOM mentions. Our main analyses control for news mentions, time-lagged WOM, seasonality, secular trends, brand fixed effects, category-quarter fixed effects, and random coefficients, and check robustness against model misspecification. In a second set of analyses, we apply a causal inference technique, generalized synthetic controls (Xu 2017), on Super Bowl advertisers to evaluate the possible impact of large, WOM-focused advertising campaigns on WOM. Together, these analyses present a compelling story. Our main findings include:

1. The relationship between advertising and WOM is positive and significant, but small. Assuming causality, the average implied elasticity on total WOM is 0.019 for TV advertising and 0.014 for Internet advertising, and on online WOM posts is 0.009 for TV advertising and 0.010 for Internet advertising. Projecting from our sample to the entire US population, for an average brand in our dataset this implies that a 10% increase in TV advertising leads to 69,000 additional total WOM conversations about the brand per month. This amounts to approximately 0.1% of the paid advertising exposures for the same advertising spend.

2. Cross-brand and cross-category heterogeneity in the advertising-WOM relationship is significant. The categories with the largest implied elasticities of total WOM to TV advertising are Sports and Hobbies, Media and Entertainment, and Telecommunications. However, even for these categories, the average implied elasticity is relatively small, with values between 0.03 and 0.05. Similarly, the “best” brands are estimated to have average elasticities of only around 0.05. This implies the brands with the most effective brand advertising for total WOM would be associated with increases in WOM conversations that are less than 1% of the increase in advertising exposures. Online WOM posts also have small elasticities among the most responsive categories and brands.

3. Certain events and campaigns are able to achieve higher impact on total WOM. Our synthetic controls analysis of the Super Bowl advertisers indicates that total WOM mentions increase 16% in the month of the Super Bowl and 22% in the week after the Super Bowl. This implies an increase of 10–13% of the average advertising impressions.

4. The Super Bowl advertiser impact on online posts, harvested from the Internet (instead of using a representative sample), is even higher, but much shorter-lived. The effect of being a Super Bowl advertiser is a 27% increase in online posts for the month of, 48% for the week of, and 68% for the day of the Super Bowl.
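To make the elasticity magnitudes in the first finding concrete, the sketch below converts an elasticity into the percent change in WOM implied by a given percent change in advertising. It assumes a constant-elasticity (log-log) relationship, consistent with the logged spending variables in the main model; the function names are hypothetical.

```python
def pct_change_in_wom(elasticity: float, pct_change_in_ads: float) -> float:
    """Exact percent change in WOM under a constant-elasticity (log-log) model."""
    return ((1.0 + pct_change_in_ads / 100.0) ** elasticity - 1.0) * 100.0


def linear_approx(elasticity: float, pct_change_in_ads: float) -> float:
    """First-order approximation: elasticity times percent change in advertising."""
    return elasticity * pct_change_in_ads


# Average implied elasticities from the first finding above.
elasticities = {
    "TV -> total WOM": 0.019,
    "Internet -> total WOM": 0.014,
    "TV -> online WOM": 0.009,
    "Internet -> online WOM": 0.010,
}
for label, e in elasticities.items():
    print(label, round(pct_change_in_wom(e, 10.0), 3), round(linear_approx(e, 10.0), 3))
```

Under this assumption, a 10% increase in TV advertising corresponds to an increase in total WOM of roughly 0.18% to 0.19%, which illustrates why the implied number of additional conversations is small relative to paid exposures.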

These results imply that the advertising-WOM relationship is small on average, but that a larger effect is possible for both total and online WOM in certain campaigns. The effects for online WOM posts may be relatively large, but also short-lived. As a result, one should be cautious about generalizing the impact of advertising on WOM based only on online post data collected by crawling the web. Generally, the online posts may signal larger effects than one should expect for total WOM mentions, where the bulk of brand conversation happens.

What are the managerial implications of our findings? Our findings suggest “there is no free lunch” when it comes to WOM. Mass TV and Internet display advertising expenditures do not automatically imply large gains in WOM. More precisely, across 538 brands and many campaigns per brand over the 6.5-year observation window, high advertising expenditures on average are not associated with a large increase in total WOM. Similarly, based on our analysis, no single brand or category appears to generate large average effects. We do find that for Super Bowl advertisers, where expenditures are very large and the event is a social phenomenon with the advertisements playing a relatively central role in media attention about the event, the causal effect on total WOM can be larger, though still modest. However, such successful WOM campaigns must be relatively rare for the average advertising effects to be so small.

Does the small average effect we find imply that investing in advertising to generate WOM does not pay off? Not necessarily. First, our results suggest that online posts might be more responsive to advertising. Second, if marketers seek to enhance WOM through advertising, they will likely need to go beyond the typical advertising campaigns contained in our dataset. We suggest that for managers to pursue the goal of generating WOM from advertising, they need to be able to track WOM carefully and use methods that can assess the effectiveness of advertising in generating WOM at a relatively fine-grained level (e.g., campaign or creative). Importantly, because of the disconnect between online posts and total WOM, it is critical to evaluate total WOM in order to understand whether the more easily tracked online posts translate into meaningful changes in total mentions.