LSTM Response Models for Direct Marketing Analytics: Replacing Feature Engineering with Deep Learning
Introduction
In direct marketing, a firm targets a customer with a marketing solicitation, such as a catalog, a direct mail piece, or a coupon, and the customer decides whether or not to respond. Since soliciting a customer unlikely to respond is unprofitable, and not soliciting a potentially profitable customer leaves money on the table, the ability to predict customers' responses has long been a crucial endeavor for both practitioners and academics (e.g., Malthouse, 1999; Roberts & Berger, 1999).
Response models in direct marketing predict customer responses from past customer behavior and marketing activity. These models often summarize past events using features such as recency or frequency (e.g., Blattberg et al., 2008; Malthouse, 1999; Van Diepen et al., 2009), and the process of feature engineering has received significant attention (Kuhn & Johnson, 2019; Zheng & Casari, 2018).
In machine learning, a feature refers to a variable that describes some aspect of individual data objects (Dong & Liu, 2018). Feature engineering has been used broadly to refer to multiple aspects of feature creation, extraction, and transformation. Essentially, it refers to the process of using domain knowledge to create useful features that can be fed as predictors into a model.
However, feature engineering presents its own set of challenges.
First, the same features might identically summarize widely different behavior sequences (Blattberg et al., 2008, Fader et al., 2005). Consider the customer behavior pattern depicted in Fig. 1. All four customers in the figure have the same seniority (date of first purchase), recency (date of last purchase), and frequency (number of purchases). However, each of them has a visibly different transaction pattern. A response model relying exclusively on seniority, recency, and frequency would not be able to distinguish between customers who have similar features but different behavioral sequences.
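To make the point concrete, a minimal sketch (with hypothetical purchase data) shows how two visibly different transaction patterns collapse into identical seniority, recency, and frequency features:

```python
# Minimal sketch (hypothetical data): two customers whose purchase
# histories differ, yet whose seniority, recency, and frequency
# features are identical -- the scenario depicted in Fig. 1.
def rfm_features(purchase_weeks, current_week=52):
    """Summarize a sequence of purchase times into classic features."""
    return {
        "seniority": current_week - min(purchase_weeks),  # weeks since first purchase
        "recency":   current_week - max(purchase_weeks),  # weeks since last purchase
        "frequency": len(purchase_weeks),                 # number of purchases
    }

early_burst = [1, 2, 3, 50]    # three early purchases, a long gap, one late purchase
steady      = [1, 17, 34, 50]  # evenly spaced purchases

print(rfm_features(early_burst))  # {'seniority': 51, 'recency': 2, 'frequency': 4}
print(rfm_features(steady))       # identical features, different behavior
```

A model fed only these three features is blind to the difference between the two sequences.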
Second, feature engineering becomes arduous in complex, data-rich environments with multiple streams of data: historical marketing activity of various sorts (e.g., multiple types of solicitations sent through various marketing channels), diverse customer behaviors (e.g., purchase histories across various product categories and sales channels), and different contexts (e.g., multiple business units or websites; see Park & Fader, 2004). The number of inter-sequence and inter-temporal interactions (e.g., sequences of marketing actions, such as email–phone–catalog vs. catalog–email–phone) grows exponentially with the number of data streams.
Let us reflect for a moment on one of the simplest and most commonly used features in direct marketing: recency, or the time elapsed since the last customer's purchase. How should the analyst hand-craft relevant recency features in an environment spanning multiple product categories? Should she take into account the last absolute recency, regardless of the product category purchased (hence losing richness and granularity, and potentially hurting the model's predictive power)? Should she include in the model as many recency indicators as there are product categories in the data set (hence creating excruciating multicollinearity issues if customers buy from multiple product categories at each purchase occasion)? Should she combine individual and aggregate recency indicators? When crafting relevant recency indicators, should the analyst consider purchases in brick-and-mortar stores and purchases on the firm's website jointly, or should she treat these indicators separately?
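The trade-offs above can be made tangible with a small sketch (hypothetical data and category names): the same purchase log yields either a single overall recency or one recency indicator per product category, and the analyst must choose between them.

```python
# Illustrative sketch (hypothetical data): one purchase log, two
# competing recency definitions the analyst could hand-craft.
purchases = [  # (week, product category)
    (10, "apparel"), (30, "home"), (45, "apparel"),
]
current_week = 52

# Option 1: a single, absolute recency, regardless of category.
overall_recency = current_week - max(week for week, _ in purchases)

# Option 2: one recency indicator per product category.
per_category_recency = {}
for week, cat in purchases:
    # keep the most recent purchase observed in each category
    per_category_recency[cat] = min(
        per_category_recency.get(cat, float("inf")), current_week - week
    )

print(overall_recency)       # 7
print(per_category_recency)  # {'apparel': 7, 'home': 22}
```

Neither option is obviously right, and the number of such choices multiplies with every additional channel or category.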
When an analyst uses feature engineering to predict behavior, the performance of the model will depend greatly on the analyst's domain knowledge, and in particular, her ability to translate that domain knowledge into relevant features for the model. In complex environments, such as in the presence of multiple channels or multiple product categories, it can be quite challenging indeed for an analyst to capture all useful inter-sequence and inter-temporal interactions.
In this paper, we explore whether Long Short-Term Memory (LSTM) neural networks, a special kind of recurrent neural network (RNN), which rely on raw sequential data and do away with feature engineering, can offer a solution to this general class of modeling problems in marketing.
In customer response models, the data are often in the form of panel data, where the firm's actions (e.g., solicitations) and customers' behavior (e.g., purchases) are observed repeatedly over time and along multiple dimensions (e.g., multiple channels or product categories).
Surprisingly, while RNN models are common in natural language processing, their applications to panel data—let alone marketing panel data—have been scarce, even close to nonexistent. In their seminal book, Goodfellow, Bengio, and Courville (2016) cite applications of RNN in the domains of machine translation, prediction of text sequences, handwriting recognition, and speech recognition. Pointer (2019, p. 70) mentions in passing that RNNs are particularly suited for “data that has a temporal domain (e.g., text, speech, video, and time-series data),” but dedicates the chapter to text analysis. Saleh (2018) dedicates an entire section to the numerous applications of RNN (pp. 153–157), but exclusively cites natural language processing, speech recognition, machine translation, unidimensional time-series forecasting, and image recognition. However, as we will demonstrate, RNN models in general, and LSTM models in particular, are particularly well suited for panel data analysis.
We organize the paper as follows. In the first section, we introduce the LSTM model as a special class of recurrent neural networks. Given the newness of the method to social scientists in general, and to marketing analysts in particular, we dedicate significant space to explaining its inner workings. While LSTM models take raw behavioral data as input and therefore do not rely on feature engineering or domain knowledge, our experience taught us that some fine-tuning is required to achieve optimal LSTM performance; in the second section, we pay special attention to the proper calibration of an LSTM model, including parameter and hyperparameter tuning, a process that can be fully automated and does not require domain knowledge either. In the third section, we demonstrate the superior performance of the LSTM model in a relatively simple direct marketing setting with only donations (yes/no) and solicitations (yes/no). We show that the LSTM model, relying on raw data, achieves a better average fit and performance than the feature-based benchmark models. In the fourth section, we benchmark a vanilla LSTM model in a much more complex environment (e.g., multiple channels and donation types) against 271 hand-crafted models developed by about as many human analysts; the LSTM outperforms 269 of them. In the fifth section, we discuss the marketing applications in which we expect LSTM neural networks to prove valuable. In the sixth section, we cover important technical considerations in the fast-moving field of deep learning. We conclude in the seventh section.
Section snippets
Recurrent Neural Network (RNN)
In a traditional feedforward neural network, a vector x is processed through propagation in the network and produces an output vector y, as depicted in Fig. 2(A). A recurrent neural network (RNN) is a kind of artificial neural network (ANN) adapted to model sequential tasks. Rather than relying exclusively on the vector x to make its predictions, an RNN also uses part of the output of the previous iteration (the hidden state) as input for the next prediction (see Fig. 2(B)). By
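The recurrence just described can be sketched in a few lines; the dimensions and random weights below are purely illustrative, not taken from the paper:

```python
import numpy as np

# Minimal sketch of the recurrence in Fig. 2(B): at each step the
# hidden state h combines the current input x<t> with the previous
# hidden state h<t-1>, so information propagates across time steps.
rng = np.random.default_rng(0)
n_x, n_h = 3, 4                      # input and hidden-state sizes (illustrative)
Wx = rng.normal(size=(n_h, n_x))     # input-to-hidden weights
Wh = rng.normal(size=(n_h, n_h))     # hidden-to-hidden (recurrent) weights
b  = np.zeros(n_h)

def rnn_step(x_t, h_prev):
    # one recurrence step: new hidden state from current input + previous state
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

h = np.zeros(n_h)                        # initial hidden state
for x_t in rng.normal(size=(5, n_x)):    # a sequence of 5 input vectors
    h = rnn_step(x_t, h)                 # h carries memory of earlier inputs
```

A feedforward network would map each x_t to an output independently; here, the loop over time steps is what makes the network recurrent.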
Bias, Variance, and Model Capacity
As discussed in the LSTM model section, the parameters of the LSTM module/cell are Wu, Wf, Wo, Wc, bu, bf, bo, and bc. We use the parameters Wy and by to generate the predictions ŷ<t> from the hidden state of the LSTM. The dimensions of the LSTM weight matrices depend on the dimension of the hidden state (referred to as hidden units) and the number of input features in x.
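As an illustration, the following sketch lays out these parameter shapes under one common LSTM formulation, in which each gate matrix acts on the concatenation of the previous hidden state and the current input; the sizes chosen here are arbitrary, not from the paper:

```python
import numpy as np

# Sketch of LSTM parameter dimensions (notation follows the paper:
# Wu, Wf, Wo, Wc with biases bu, bf, bo, bc; Wy, by map the hidden
# state to the prediction). Sizes are illustrative.
n_x, n_h = 6, 16   # input features in x<t>, hidden units

# In a common formulation, each gate weight matrix acts on the
# concatenation [h<t-1>, x<t>], so it has shape (n_h, n_h + n_x);
# each bias has shape (n_h,).
params = {name: np.zeros((n_h, n_h + n_x)) for name in ("Wu", "Wf", "Wo", "Wc")}
biases = {name: np.zeros(n_h) for name in ("bu", "bf", "bo", "bc")}

# Output layer: one logit per time step, computed from the hidden state.
Wy, by = np.zeros((1, n_h)), np.zeros(1)

n_params = sum(p.size for p in params.values()) \
         + sum(b.size for b in biases.values()) + Wy.size + by.size
print(n_params)  # 4*(16*22) + 4*16 + 16 + 1 = 1489
```

Growing the hidden state from 16 to 32 units roughly quadruples the recurrent part of the parameter count, which is why the number of hidden units is a key driver of model capacity.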
Objective
While an LSTM model does not depend on the analyst's ability to craft meaningful model features, traditional benchmarks do heavily rely on human expertise. Consequently, when an LSTM model shows superior results over a traditional response model—as we have shown in the previous illustration—we cannot ascertain whether it is due to the superiority of the LSTM model, or to the poor performance of the analyst who designed the benchmark model.
To alleviate that concern, we asked 297 graduate
Applications of LSTM Neural Networks in Marketing
Though we set our studies in a direct marketing context, LSTM neural networks can provide a solution to the general class of prediction tasks that involve panel data. We foresee that, since panel data is ubiquitous in marketing, LSTM neural networks can find widespread applications in marketing academia and practice. We discuss some possible applications below.
Technical Considerations
It would be presumptuous to claim that LSTM models offer an ideal, one-size-fits-all solution to panel data analytics. In particular, the analyst is invited to be mindful of the following challenges.
First, hyperparameter tuning is not a trivial task. While a simple grid search may be sufficient to achieve optimal performance, Bayesian optimization may be required on occasion.
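A plain grid search can be sketched as follows; `validation_loss` is a hypothetical stand-in for training the model under one configuration and scoring it on held-out data:

```python
from itertools import product

# Hedged sketch of a grid search over typical LSTM hyperparameters.
# The grid values and the objective are illustrative placeholders.
grid = {
    "hidden_units":  [16, 32, 64],
    "dropout":       [0.0, 0.2, 0.5],
    "learning_rate": [1e-2, 1e-3],
}

def validation_loss(config):
    # placeholder objective: in practice, train the LSTM with this
    # configuration and return its loss on a validation set
    return abs(config["hidden_units"] - 32) + config["dropout"]

best = min(
    (dict(zip(grid, values)) for values in product(*grid.values())),
    key=validation_loss,
)
print(best)  # {'hidden_units': 32, 'dropout': 0.0, 'learning_rate': 0.01}
```

When the grid becomes too large to enumerate, Bayesian optimization replaces the exhaustive loop with a sequential search that proposes promising configurations based on the losses observed so far.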
Second, as in all deep learning models, overfitting is a constant concern. Many solutions have been proposed, and can even be
Conclusions
Ben Weber (2019) stated that “One of the biggest challenges in machine learning workflows is identifying which inputs in your data will provide the best signals [i.e., features] for training predictive models. For image data and other unstructured formats, deep learning models are showing large improvements over prior approaches, but for data already in structured formats, the benefits are less obvious” [italics added].
In this paper, we have shown that recent neural network architectures,
References (70)
- et al. A stochastic RFM model. Journal of Interactive Marketing (1999)
- et al. Customer churn prediction in the online gambling industry: The beneficial effect of ensemble learning. Journal of Business Research (2013)
- et al. Maximizing profits for a multi-category catalog retailer. Journal of Retailing (2013)
- et al. Mailing smarter to catalog customers. Journal of Interactive Marketing (2000)
- Ridge regression and direct marketing scoring models. Journal of Interactive Marketing (1999)
- et al. A machine learning framework for customer purchase prediction in the non-contractual setting. European Journal of Operational Research (2020)
- et al. Capturing evolving visit behavior in clickstream data. Journal of Interactive Marketing (2004)
- et al. Market share forecasting: An empirical comparison of artificial neural networks and multinomial logit model. Journal of Retailing (1996)
- et al. The perils of proactive churn prevention using plan recommendations: Evidence from a field experiment. Journal of Marketing Research (2016)
- et al. Neural machine translation by jointly learning to align and translate. arXiv preprint (2014)
- Modeling the response pattern to direct marketing campaigns. Journal of Marketing Research
- Practical recommendations for gradient-based training of deep architectures
- Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks
- Mailing decisions in the catalog sales industry. Management Science
- Understanding batch normalization
- Database marketing: Analyzing and managing customers. International Series in Quantitative Marketing
- Random forests. Machine Learning
- Products of hidden Markov models
- Optimal selection for direct mail. Marketing Science
- Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv preprint
- Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint
- Artificial intelligence and marketing: Pitfalls and opportunities. Journal of Interactive Marketing
- Feature Engineering for Machine Learning and Data Analytics
- Deriving target selection rules from endogenously selected samples. Journal of Applied Econometrics
- Optimizing Rhenania's direct marketing business through dynamic multilevel modeling (DMLM) in a multicatalog-brand environment. Marketing Science
- RFM and CLV: Using iso-value curves for customer base analysis. Journal of Marketing Research
- glmnet: Lasso and elastic-net regularized generalized linear models
- Dropout as a Bayesian approximation: Representing model uncertainty in deep learning
- Learning to forget: Continual prediction with LSTM
- Optimal mailing of catalogs: A new methodology using estimable structural dynamic programming models. Management Science
- How to compute optimal catalog mailing decisions. Marketing Science
- Deep Learning
- Offline handwriting recognition with multidimensional recurrent neural networks. Advances in Neural Information Processing Systems
- Neural Turing machines. arXiv preprint