Elsevier

Journal of Interactive Marketing

Volume 53, February 2021, Pages 80-95

LSTM Response Models for Direct Marketing Analytics: Replacing Feature Engineering with Deep Learning

https://doi.org/10.1016/j.intmar.2020.07.002

Highlights

  • Traditional customer response models rely heavily on feature engineering.

  • Their performance depends on the analyst's domain knowledge and expertise to craft relevant predictors.

  • In contrast, long short-term memory (LSTM) neural networks rely exclusively on raw data.

  • Still, we demonstrate that LSTM models outperform traditional models.

  • LSTM neural networks are excellent candidates for modeling direct marketing responses, brand choices, clickstream data, and churn.

Abstract

In predictive modeling, firms often deal with high-dimensional data that span multiple channels, websites, demographics, purchase types, and product categories. Traditional customer response models rely heavily on feature engineering, and their performance depends on the analyst's domain knowledge and expertise to craft relevant predictors. As the complexity of data increases, however, traditional models grow exponentially complicated. In this paper, we demonstrate that long short-term memory (LSTM) neural networks, which rely exclusively on raw data as input, can predict customer behaviors with great accuracy. In our first application, an LSTM model outperforms standard benchmarks. In a second, more realistic application, an LSTM model competes against 271 hand-crafted models that use a wide variety of features and modeling approaches. It beats 269 of them, most by a wide margin. LSTM neural networks are excellent candidates for modeling customer behavior using panel data in complex environments (e.g., direct marketing, brand choices, clickstream data, churn prediction).

Introduction

In direct marketing, a firm targets a customer with a marketing solicitation such as a catalog, a direct mailing, or a coupon, and the customer decides whether or not to respond. Since soliciting a customer unlikely to respond is unprofitable, and not soliciting a potentially profitable customer leaves money on the table, the ability to predict customers' responses has long been a crucial endeavor for both practitioners and academics (e.g., Malthouse, 1999, Roberts and Berger, 1999).

Response models in direct marketing predict customer responses from past customer behavior and marketing activity. These models often summarize past events using features such as recency or frequency (e.g., Blattberg et al., 2008, Malthouse, 1999, Van Diepen et al., 2009), and the process of feature engineering has received significant attention (Kuhn and Johnson, 2019, Zheng and Casari, 2018).

In machine learning, a feature refers to a variable that describes some aspect of individual data objects (Dong & Liu, 2018). Feature engineering has been used broadly to refer to multiple aspects of feature creation, extraction, and transformation. Essentially, it refers to the process of using domain knowledge to create useful features that can be fed as predictors into a model.

However, feature engineering presents its own set of challenges.

First, the same features might identically summarize widely different behavior sequences (Blattberg et al., 2008, Fader et al., 2005). Consider the customer behavior pattern depicted in Fig. 1. All four customers in the figure have the same seniority (date of first purchase), recency (date of last purchase), and frequency (number of purchases). However, each of them has a visibly different transaction pattern. A response model relying exclusively on seniority, recency, and frequency would not be able to distinguish between customers who have similar features but different behavioral sequences.
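This loss of information is easy to reproduce. The short Python sketch below uses four hypothetical binary purchase histories (invented for illustration, not the ones in Fig. 1) that collapse onto identical seniority, recency, and frequency features:

```python
# Four hypothetical binary purchase histories over 10 periods (1 = purchase).
# They differ visibly, yet share the same seniority (first purchase),
# recency (last purchase), and frequency (number of purchases).
sequences = {
    "A": [1, 1, 1, 1, 0, 0, 0, 0, 0, 1],
    "B": [1, 0, 0, 1, 0, 1, 0, 1, 0, 1],
    "C": [1, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "D": [1, 0, 1, 0, 1, 0, 0, 0, 1, 1],
}

def rfm_features(seq):
    """Return (seniority, recency, frequency) for a binary purchase sequence."""
    first = seq.index(1)                       # seniority: first purchase period
    last = len(seq) - 1 - seq[::-1].index(1)   # recency: last purchase period
    return first, last, sum(seq)               # frequency: total purchases

features = {name: rfm_features(seq) for name, seq in sequences.items()}
# All four customers map onto the same feature vector (0, 9, 5), so a model
# using only these three features cannot tell them apart.
```

A feature-based model sees four identical customers here; a sequence model sees four different ones.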

Second, consider a complex, data-rich environment in which the analyst has access to historical marketing activity of various sorts (e.g., multiple types of solicitations sent through various marketing channels) and to diverse customer behaviors (e.g., purchase histories across various product categories and sales channels) observed across different contexts (e.g., multiple business units or websites; see Park & Fader, 2004). The vast number and exponential complexity of inter-sequence and inter-temporal interactions (e.g., sequences of marketing actions, such as email–phone–catalog vs. catalog–email–phone) make the data analyst's job arduous.

Let us reflect for a moment on one of the simplest and most commonly used features in direct marketing: recency, or the time elapsed since the customer's last purchase. How should the analyst hand-craft relevant recency features in an environment spanning multiple product categories? Should she take into account only the overall recency, regardless of the product category purchased (hence losing richness and granularity, and potentially hurting the model's predictive power)? Should she include in the model as many recency indicators as there are product categories in the data set (hence creating excruciating multicollinearity issues if customers buy from multiple product categories at each purchase occasion)? Should she combine individual and aggregate recency indicators? When crafting relevant recency indicators, should the analyst consider purchases in brick-and-mortar stores and purchases on the firm's website jointly, or should she treat these indicators separately?
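The first two of these design choices can be made concrete with a minimal sketch (the transaction log and field names below are invented for illustration, not taken from the authors' data):

```python
# Hypothetical transaction log: (day, category), observed up to day 100.
transactions = [(10, "books"), (40, "toys"), (75, "books"), (90, "garden")]
NOW = 100

def recency(log, now, category=None):
    """Days since the last purchase, overall or within one category.

    Returns None when the customer never bought in that category;
    handling that missing value is itself a modeling decision.
    """
    days = [day for day, cat in log if category is None or cat == category]
    return now - max(days) if days else None

# Option 1: a single, aggregate recency feature (loses granularity).
overall = recency(transactions, NOW)   # 10 days since the last purchase

# Option 2: one recency feature per category (risks multicollinearity).
per_category = {cat: recency(transactions, NOW, cat)
                for cat in {"books", "toys", "garden"}}
```

Neither option dominates, which is precisely why the feature-engineering burden grows with the number of categories and channels.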

When an analyst uses feature engineering to predict behavior, the performance of the model will depend greatly on the analyst's domain knowledge, and in particular, her ability to translate that domain knowledge into relevant features for the model. In complex environments, such as in the presence of multiple channels or multiple product categories, it can be quite challenging indeed for an analyst to capture all useful inter-sequence and inter-temporal interactions.

In this paper, we explore whether long short-term memory (LSTM) neural networks, a special kind of recurrent neural network (RNN) that relies on raw sequential data and does away with feature engineering, offer the promise of a solution to this general class of modeling problems in marketing.

In customer response models, the data are often in the form of panel data, where the firm's actions (e.g., solicitations) and customers' behavior (e.g., purchases) are observed repeatedly over time and along multiple dimensions (e.g., multiple channels or product categories).
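Panel data of this kind map naturally onto the three-dimensional input that recurrent models consume. A hypothetical sketch (the dimensions and variable meanings are ours, chosen for illustration):

```python
import numpy as np

# Hypothetical panel: 100 customers observed over 52 weeks, with 3 binary
# variables per week (e.g., solicited yes/no, purchased yes/no, online channel).
n_customers, n_periods, n_vars = 100, 52, 3
rng = np.random.default_rng(42)
panel = rng.integers(0, 2, size=(n_customers, n_periods, n_vars)).astype(float)

# Each customer is one sequence, each week one time step, each variable one
# input feature: the (samples, time steps, features) layout recurrent models expect.
X = panel[:, :-1, :]   # all weeks but the last, as raw sequential predictors
y = panel[:, -1, 1]    # e.g., the purchase indicator in the final week
```

No features are engineered here: the raw sequences themselves are the model input.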

Surprisingly, while RNN models are common in natural language processing, their applications to panel data in general, let alone marketing panel data, have been scarce, even close to nonexistent. In their seminal book, Goodfellow, Bengio, and Courville (2016) cite applications of RNN in the domains of machine translation, prediction of text sequences, handwriting recognition, and speech recognition. Pointer (2019, p. 70) mentions in passing that RNNs are particularly suited for “data that has a temporal domain (e.g., text, speech, video, and time-series data),” but dedicates the chapter to text analysis. Saleh (2018) dedicates an entire section to the numerous applications of RNN (pp. 153–157), but exclusively cites natural language processing, speech recognition, machine translation, unidimensional time-series forecasting, and image recognition. However, as we will demonstrate, RNN models in general, and LSTM models in particular, are particularly well suited to panel data analysis.

We organize the paper as follows. In the first section, we introduce the LSTM model as a special class of recurrent neural networks. Given the newness of the method to social scientists in general, and to marketing analysts in particular, we dedicate significant space to explaining its inner workings. While LSTM models take raw behavioral data as input and therefore do not rely on feature engineering or domain knowledge, our experience taught us that some fine-tuning is required to achieve optimal LSTM performance; in the second section, we pay special attention to the proper calibration of an LSTM model, including parameter and hyperparameter tuning, which can be fully automated and does not require domain knowledge either. In the third section, we demonstrate the superior performance of the LSTM model in a relatively simple direct marketing setting with only donations (yes/no) and solicitations (yes/no). We show that the LSTM model, relying on raw data, achieves a better average fit and performance than the feature-based benchmark models. In the fourth section, we benchmark a vanilla LSTM model in a much more complex environment (e.g., multiple channels and donation types) against 271 hand-crafted models developed by about as many human analysts; the LSTM outperforms 269 of them. In the fifth section, we discuss the marketing applications in which we expect LSTM neural networks to prove valuable; in the sixth section, we cover important technical considerations in the fast-moving field of deep learning. We conclude in the seventh section.

Section snippets

Recurrent Neural Network (RNN)

In a traditional feedforward neural network, an input vector x is propagated through the network to produce an output vector y, as depicted in Fig. 2(A). A recurrent neural network (RNN) is a kind of artificial neural network (ANN) adapted to model sequential tasks. Rather than relying exclusively on the vector x to make its predictions, an RNN also uses part of the output of the previous iteration (the hidden state) as input for the next prediction (see Fig. 2(B)). By
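The recurrence just described can be sketched in a few lines of numpy (the dimensions and random weights are illustrative, not taken from the paper): the hidden state produced at step t−1 is fed back in at step t.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h = 3, 4                   # input features and hidden units (illustrative)
Wx = rng.normal(size=(n_h, n_x))  # input-to-hidden weights
Wh = rng.normal(size=(n_h, n_h))  # hidden-to-hidden (recurrent) weights
b = np.zeros(n_h)

def rnn_forward(xs):
    """Run a vanilla RNN over a sequence of input vectors xs."""
    h = np.zeros(n_h)             # initial hidden state
    states = []
    for x in xs:                  # the hidden state carries information forward
        h = np.tanh(Wx @ x + Wh @ h + b)
        states.append(h)
    return np.stack(states)

xs = rng.normal(size=(5, n_x))    # one sequence of 5 time steps
hs = rnn_forward(xs)              # shape (5, 4): one hidden state per step
```

Because each hidden state depends on all previous inputs, the same inputs presented in a different order yield a different final state, which is exactly the order sensitivity that summary features such as recency and frequency discard.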

Bias, Variance, and Model Capacity

As discussed in the LSTM model section, the parameters of the LSTM module/cell are Wu, Wf, Wo, Wc, bu, bf, bo, and bc. We use the parameters Wy and by to generate the predictions ŷ<t> from the hidden state of the LSTM. The dimensions of the LSTM weight matrices depend on the dimension of the hidden state (referred to as hidden units) and the number of input features in x.
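Under the common convention that each gate operates on the concatenation of the hidden state and the input (our assumption here, consistent with the dimensions above), the model's capacity in trainable parameters follows directly:

```python
def lstm_param_count(n_x, n_h, n_y):
    """Trainable parameters in a single-layer LSTM plus a dense output layer.

    n_x: input features, n_h: hidden units, n_y: output dimension.
    Each of the four weight matrices (Wu, Wf, Wo, Wc) maps the concatenated
    [h; x] vector (size n_h + n_x) to n_h units, with a bias (bu, bf, bo, bc).
    """
    gates = 4 * (n_h * (n_h + n_x) + n_h)  # Wu, Wf, Wo, Wc and their biases
    output = n_y * n_h + n_y               # Wy and by
    return gates + output

# e.g., 3 input features, 4 hidden units, 1 output:
# 4 * (4 * (4 + 3) + 4) + (1 * 4 + 1) = 133 parameters
```

The quadratic term in n_h shows why the number of hidden units is the dominant driver of model capacity, and hence a key hyperparameter to tune.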

Objective

While an LSTM model does not depend on the analyst's ability to craft meaningful model features, traditional benchmarks do rely heavily on human expertise. Consequently, when an LSTM model shows superior results over a traditional response model (as we have shown in the previous illustration), we cannot ascertain whether this is due to the superiority of the LSTM model or to the poor performance of the analyst who designed the benchmark model.

To alleviate that concern, we asked 297 graduate

Applications of LSTM Neural Networks in Marketing

Though we set our studies in a direct marketing context, LSTM neural networks can provide a solution to the general class of prediction tasks that involve panel data. We foresee that, since panel data is ubiquitous in marketing, LSTM neural networks can find widespread applications in marketing academia and practice. We discuss some possible applications below.

Technical Considerations

It would be presumptuous to claim that LSTM models offer an ideal, one-size-fits-all solution to panel data analytics. In particular, the analyst is invited to be mindful of the following challenges.

First, hyperparameter tuning is not a trivial task. While a simple grid search may be sufficient to achieve optimal performance, Bayesian optimization may be required on occasion.

Second, as in all deep learning models, overfitting is a constant concern. Many solutions have been proposed, and can even be

Conclusions

Ben Weber (2019) stated that “One of the biggest challenges in machine learning workflows is identifying which inputs in your data will provide the best signals [i.e., features] for training predictive models. For image data and other unstructured formats, deep learning models are showing large improvements over prior approaches, but for data already in structured formats, the benefits are less obvious” [italics added].

In this paper, we have shown that recent neural network architectures,

References (70)

  • A.K. Basu et al.

    Modeling the response pattern to direct marketing campaigns

    Journal of Marketing Research

    (1995)
  • Y. Bengio

    Practical recommendations for gradient-based training of deep architectures

  • Y. Bengio et al.

    Learning long-term dependencies with gradient descent is difficult

    IEEE Transactions on Neural Networks

    (1994)
  • B. Weber

    (2019)

  • G.R. Bitran et al.

    Mailing decisions in the catalog sales industry

    Management Science

    (1996)
  • J. Bjorck et al.

    Understanding batch normalization

  • R.C. Blattberg et al.

    Database marketing: Analyzing and managing customers. International series in quantitative marketing

    (2008)
  • L. Breiman

    Random forests

    Machine Learning

    (2001)
  • A.D. Brown et al.

    Products of hidden Markov models

  • J.R. Bult et al.

    Optimal selection for direct mail

    Marketing Science

    (1995)
  • K. Cho et al.

    Learning phrase representations using RNN encoder-decoder for statistical machine translation

    arXiv preprint

    (2014)
  • J. Chung et al.

    Empirical evaluation of gated recurrent neural networks on sequence modeling

    arXiv preprint

    (2014)
  • A. De Bruyn et al.

    Artificial intelligence and marketing: Pitfalls and opportunities

    Journal of Interactive Marketing

    (2020)
  • G. Dong et al.

    Feature Engineering for Machine Learning and Data Analytics

    (2018)
  • B. Donkers et al.

    Deriving target selection rules from endogenously selected samples

    Journal of Applied Econometrics

    (2006)
  • R. Elsner et al.

    Optimizing Rhenania's direct marketing business through dynamic multilevel modeling (DMLM) in a multicatalog-brand environment

    Marketing Science

    (2004)
  • P.S. Fader et al.

    RFM and CLV: Using iso-value curves for customer base analysis

    Journal of Marketing Research

    (2005)
  • J. Friedman et al.

    glmnet: Lasso and elastic-net regularized generalized linear models

  • Y. Gal et al.

    Dropout as a Bayesian approximation: Representing model uncertainty in deep learning

  • F.A. Gers et al.

    Learning to Forget: Continual Prediction with LSTM

    (1999)
  • F. Gönül et al.

    Optimal mailing of catalogs: A new methodology using estimable structural dynamic programming models

    Management Science

    (1998)
  • F.F. Gönül et al.

    How to compute optimal catalog mailing decisions

    Marketing Science

    (2006)
  • I. Goodfellow et al.

    Deep Learning

    (2016)
  • A. Graves et al.

    Offline handwriting recognition with multidimensional recurrent neural networks

    Advances in Neural Information Processing Systems

    (2009)
  • A. Graves et al.

    Neural turing machines

    arXiv preprint

    (2014)