1 Introduction

As the Internet emerges as a new marketing channel, analyzing and understanding the needs and expectations of their online users or customers are considered as prerequisites to activate the consumer-oriented electronic commerce. Thus, the mass marketing strategy cannot satisfy the needs and expectations of online customers. On the other hand, it is easier to extract knowledge out of the shopping process under the Internet environment. Market segmentation is one of the ways in which such knowledge can be represented and make it new business opportunities.” (Kim and Ahn 2004). Already in 2004, Kim and Ahn (2004) described an essential paradigm shift that online marketing was encountering in a time in which the world wide web was rising. The statement focused on the limitation of mass marketing in a period where data-driven technological possibilities arose to analyze web-users footprints and enable personalized-oriented marketing. About two decades later personalized-oriented marketing is still a key challenge that many researchers conduct in their work (Chen et al. 2018; Apichottanakul et al. 2021; de Marco et al. 2021; Nguyen 2021; Sokol and Holy 2021). Not only has it been shown that personalized customer targeting is more profitable for companies, but also that knowledge about customer behavior is a decisive factor for success and failure (Mulhern 1999; Zeithaml et al. 2001; Kumar et al. 2008). In this respect, it is essential to understand the customers and their needs, and to be aware of their behavioral changes over time (Liu et al. 2009; Ding et al. 2019; Griva et al. 2021; Apichottanakul et al. 2021). In addition to technological changes and increasing functional requirements, legal regulations are also subject to constant change. This results in further non-functional requirements, as these regulations firstly describe local conditions and secondly can counteract the functional objectives (Burri and Schär 2016; European-Parliament 2016). From a functional perspective, companies that want to analyze customer behavior need (1) the capacity to record customer data, (2) an algorithm to characterize similar user behavior, and (3) strategies or processes that use the extracted information to achieve the business goal.

Regarding the first requirement, it is necessary to collect data that enable algorithm-based characterization of user behavior. Thereby, we distinguish between customer behavior data that is collected explicitly and implicitly. As the names suggest, explicit data collection is intentional to collect customers’ information. In implicit data collection, the main purpose is not to collect information about customers, but to collect information about the process in which the customer appears as the interactant, such as purchase information for accounting purposes. Explicitly collected data such as demographic information, on the other hand, is difficult to collect and maintain for several reasons. Not all customers are willing to share demographic data or they browse anonymously on the web. In addition, information collected in this way is subject to change over time and, accordingly, is always subject to uncertainty that is difficult to quantify (Chan et al. 2011; Chen et al. 2018). Accordingly, implicitly gathered data is easier to collect. This data can be tracked with every user interaction. E.g. information about products that are purchased together or the amount of money spent for a purchase. Nonetheless, data collected in such an implicit manner requires deeper analytical skills to exploit.

For the second requirement, the gathered interaction data is used. A frequently used approach for managing different customers with diverse preferences is segmentation (Hong and Kim 2012; Hsieh 2004; Chen et al. 2018). Customer segmentation is an unsupervised-learning process and utilizes different clustering approaches which have the goal to separate aforementioned customer data based on similarity. Hereby, similarity is measured by an objective function such as euclidean distance. It should be noted that customer behavior is a continuous process, with customer needs, wants and satisfaction changing over time. Accordingly, the processes and underlying procedures implemented in companies must be flexible in order to accommodate this high level of dynamism (Liu et al. 2009; Ding et al. 2019; Griva et al. 2021).

The last requirement is to utilize the analyzed customer information. Domain experts like marketers can tailor appropriate marketing strategies for individual customer groups based on segmentation. As Birtolo et al. (2013) already stated and showed, instead of domain experts, more and more automated methods to extract and to learn underlying patterns in customer behavior allow to target customers in advance.

The aforementioned dynamics are not only reflected in the respective target market but also can be observed in the underlying segmentation methods. Therefore, the goal of our survey is to provide an overview of digital and autonomous customer targeting processes for customer relationship management (CRM) based on historical data. The main objective of the literature research lies in the customer segmentation process for different e-commerce related use cases like retailing or services in the banking sector. Our study is structured by three guiding questions, to which we provide answers in this work.

  1. 1.

    Which clustering processes and methods are frequently used to understand customer behavior and targeting afterward?

  2. 2.

    Are there methodological limits with regard to data dimensionality?

  3. 3.

    Do methodological trends exist that can be observed over a period of two decades?

The main difference between our survey and former ones is that we focus on the process of customer targeting and behavior analysis in the e-commerce domain. The most recent literature review with a related topic is from 2016 (Sari et al. 2016). However, six years have passed since then, which makes an updated view necessary. Besides that in our study, we conduct a more extensive literature review that leads to a different classification of segmentation methods and more use case examples. In addition, we recognized a more extensive e-commerce process for customer targeting. Our contribution and main finding are:

  • We provide an overview with examples from the literature of how customer behavior analysis is used.

  • We determine a customer targeting process with four phases.

  • We could not identify a consensus in metrics to evaluate and compare the quality of the segmentation algorithm and therefore it cannot be said which of the methods is “best”.

  • Based on the frequency in publications and ability to handle large amounts of data, we recommend a process that uses RFM-analysis as a feature representation and k-means for segmentation.

  • We identify open questions and possible research gaps regarding embeddings for customer representation and deep learning-based segmentation for customer analysis and customer targeting strategies

Our study is structured as follows: In Sect. 2, we present and explain our research methodology. In Sect. 3, we present a literature overview of the identified works. Hence, in this section, we address the first guiding question accordingly and provide an answer. Moreover, we present the survey literature more in-depth. Based on the identified process, we notice that feature selection (be it manual or computerized) is an essential preprocessing step of customer behavior segmentation. Therefore, we explain the different segmentation and feature selection methods that are used. Additionally, the methods in the surveyed literature are described regarding the applied use cases for customer targeting and data volume. Section 3 ends with an overview of the publications’ evaluation metrics for customer segmentation. We analyze and discuss our findings in Sect. 4 which is further divided into two subsections. The first subsection is about the feature selection. In terms of feature selection methods, we present an answer to guiding question two and three. Similarly, in the second subsection we analyzes, discusses, and answers guiding questions two and three regarding the reviewed segmentation methods. In each subsection, we state open research questions that are not covered by our survey but have future potential. Finally, we conclude the survey in Sect. 5 with a brief summary of the findings and new open research questions and potential.

2 Literature research methodology

As already encouraged in the introduction we want to scientifically investigate which processes exist for personalized customer marketing approaches. Especially, to get an overview of commonly used customer segmentation methods in the context of CRM in e-commerce, we have conducted an extensive literature review. Thereby, Vom Brocke et al. (2015) published a recommendation on how to conduct such a search in an effective and highly qualified way. Hence, we followed their recommendation for the most part. Figure 1 illustrates our review process. We started our literature research by reading survey papers to derive an integrated and consolidated understanding of the conceptualization of the subject. Thereafter, we started the literature search. Therefore, we defined our search scope. Vom Brocke et al. (2015) refer to Cooper (1988) which states four steps on how to define a search scope: (1) process, (2) sources, (3) coverage, and (4) technique. Leaning on these four steps we choose a sequential search process. As a publication source, we used the Web of ScienceFootnote 1 (WoS) online research tool as it is one of the leading scientific citation and analytical platforms and provides scientific publications across a wide amount of knowledge domains (Li et al. 2018). To keep the focus on the customer segmentation methods we used the following search term:

  • “Customer segmentation” or “customer clustering” or “user segmentation” or “user clustering”

Herein, we chose to use the word “user” as a synonym for “customer” and “clustering” for “segmentation”. We wanted the search to be as less restrictive as possible to not miss relevant publications. Therefore, we expected works that are not relevant to our research. After having a corpus of hundreds of publications, we started reading the title, abstract, and keywords of the publications. We filtered out all publications that did not deal with customer behavior in commerce, especially in the context of e-commerce. The next step was to read all remaining papers and excluded all publications that did not deal with customer segmentation in an e-commerce use case and it became apparent that customers were segmented based on their information and actions. We extracted all wanted information from publications we classified as relevant. Specifically, we retrieved bibliometric information, information about the use case, the used methods, information about the used data, and the results.

Fig. 1
figure 1

Flow chart of the literature research process

3 Literature overview

As aforementioned in Sect. 2, we started our literature review with reading related surveys. Plenty of research surveys in the field of segmentation prioritize the underlying methodology or class of methods but not their usage in specific domain (Gennari 1989; Rokach 2010; Hiziroglu 2013; Ben Ayed et al. 2014; Firdaus and Uddin 2015; Reddy and Vinzamuri 2018; Shi and Pun-Cheng 2019). For example, Shi and Pun-Cheng (2019) review clustering methods for spatiotemporal data which are collected in diverse domains like social media, human mobility, or transportation analysis. Another survey example is brought by Hiziroglu (2013). The author reviews segmentation approaches for applications of soft computing techniques. Other surveys or studies focus on specific methods like k-means or RFM-analysis (Sarvari et al. 2016; Deng and Gao 2020). The most related literature review we found in our literature search is from Sari et al. (2016) which reviews customer and marketing segmentation methods and the necessary data. They identify different segmentation approaches and e-commerce process which coincides in some parts with our outcomes. However, as already mentioned before, six years have already passed and their paper corpus consist of less than 20 publications. From this, we deduce the need for an up-to-date and more detailed review in the area of customer segmentation in e-commerce.

The WoS search from 2023/01/01 led to 852 publications, of which not all were related to our research as assumed. As described we excluded all publication that did not deal with customer behavior in e-commerce. The major domain that was not related to our research objective dealt with user segmentation in non-orthogonal multiple access (NOMA) techniques. Over half (66%) of the publications were not related to our research topic and we had 289 publications left that were somehow e-commerce related. From the 289 publications, we classified 149 publications as “not relevant” and 140 publications as “relevant” based on the title, abstract and keywords with the aforementioned criteria.

Fig. 2
figure 2

Process of customer targeting based on behavioral information gathered from data

Reading the remaining literature (140 publications), we paid particular attention to recurring processes. We identified a process that is constantly used to determine customer behavior with segmentation approaches. Figure 2 illustrates the identified process that depicts the answer to our first question. It illustrates the customer targeting process and it can be divided into four steps: (1) customer information, (2) customer representation, (3) customer analysis, and (4) customer targeting.

In the first step, the customer information is stored in form of data and is made available for further processing. In the literature, this step is usually given by provided datasets. Nevertheless, in some publications, the information is collected by the researcher. Especially, when data is collected explicitly which is for example done by Hong and Kim (2012), Nakano and Kondo (2018), and Wu (2011).

Based on the collected information a customer representation is built as the second step. The customers are represented by their features which are selected manually or with a feature selection method. In nearly half of the cases (47.6%), features are selected manually and in the other half (52.4%) feature selection methods are used. Both feature selection approaches have their advantages and disadvantages. For example, feature selection methods are utilized to eliminate features with less information content or to aggregate and extract additional knowledge out of the customer data. The most used method in our literature review is the Recency, Frequency, Monetary (RFM) analysis that aggregates additional information about the customers’ behavior and value to a company (Hughes 1994) which we show in Sect. 4. Manual feature selection usually is performed by extracting information like item view or click events, purchased items, and item information such as the associated category. In some other cases, mostly for recommendation, the authors additionally use ratings and reviews for the behavior analysis. Otherwise, demographic data is collected through membership or similar programs. Another approach to get demographic or psychographic information is by user surveys.

The third step of the found process is customer analysis which is the key component of the process and is done by applying segmentation methods. Customers are split into more homogeneous groups of similar behavior. This is done by different segmentation approaches, like methods that compare the similarity between the customer representation or other methods that partition the customers by given thresholds. In Sect. 3.4, we further explain the interaction of customer representation with feature selection methods and the customer analysis on found case studies.

The fourth and last step, customer targeting, uses the behavioral information from the customer analysis to target the right user with the right CRM decision. In the literature, we identified different targeting approaches which includes recommendation, marketing campaigns, and pricing strategies. The main difference in the literature is that recommenders are evaluated against others with evaluation metrics like hit-rate, accuracy, etc, and marketing campaigns or pricing strategies focus on the plausibility of the customer segmentation and try to explain the outcomes over the performance.

We decided to consider only literature that mostly adheres to this characteristic process because it fulfills all necessary conditions for personalized customer marketing which is our defined investigation scope. The work of Coussement et al. (2014) is an example of a scientific publication which we did not consider in our work because it is not in our scope. In their research, they investigated the impact of data quality on different segmentation methods and showed which methods are more robust to inaccuracies.

Fig. 3
figure 3

Distribution of surveyed publications from 2000 until 2022

Based on this aforementioned method, we further filtered our corpus to obtain a final corpus of 105 scientific papers. The literature is distributed between the years 2000 and 2022 over different use cases and journals. The reviewed publications are not equally distributed over the years. Figure 3 illustrates the distribution of the paper’s publication year. We see that there are more publications over time in the field of e-commerce considering customer analysis with segmentation methods. Before 2010, we usually find one publication per year. In the period from 2001 to 2003, however, there is no publication in the paper corpus at all. In total, there are 16 publications in the period from 2000 to 2010. After 2010, there are at least three publications per year with an increasing tendency. 43 out of 105 publications (about 41%) are published in 2020, 2021 and 2022.

Table 4 gives an overview of the 105 publications containing title, author, and year.Footnote 2

3.1 In-depth feature selection methods for customer representation

Fig. 4
figure 4

Distribution of the surveyed feature selection methods over the years

We identify customer representation as a fundamental step in the customer targeting process. Therefore, before applying segmentation methods for customer analysis an appropriated customer representation is needed. As mentioned earlier, this is achieved by applying feature selection methods. In the following, we will refer to manual feature selection as “none” feature selection method. Figure 4 displays the distribution of the used feature selection methods over the years as well as the total amount in percentage. In 50 publications, the authors decide to use handcrafted features to represent the customers.

The RFM-analysis is by far the most popular feature selection method with 44 (80%) of 55 publications that use feature selection methods and 41.9% in total. In the RFM-analysis three features are extracted from customer data. The features are recency, frequency, and monetary. Recency relates to the time of the last user activity, like a purchase. Frequency describes how often a customer interacts in a given period and monetary measures how much money a customer spends in that period (Hughes 1994). In some works, e.g., Stormi et al. (2020), Chang and Tsai (2011) the RFM-analysis is extended by additional features.

Principal component analysis (PCA) is applied in four publications. In 2015 and 2022 once and in 2020 twice. PCA is a dimensionality-reduction method in which the information content of the features is determined and features with low information content can be removed (Pearson 1901; Hotelling 1933).

Purchase Tree are used in two publications and were proposed by Chen et al. (2018). The fundamental idea is to represent purchased products by a tree in which products are the leafs and the product category the nodes of the tree.

The remaining five feature selection methods are each only used once. Chi-square Automatic Interaction Detectors (CHAID) is based on decision trees to handle categorical variables (Kass 1980). Customer Lifetime Value is a popular economic key performance indicators which describes the profit of the customer for the entire lifetime. Discrete wavelet transform captures location and frequency information. In Graph representation, the customer interaction is encoded in such. Multiple correspondence analysis (MCA) allows the representation of categorical features in lower-dimension.

3.2 In-depth customer segmentation methods

Fig. 5
figure 5

Distribution of the surveyed clustering methods over the years of publication

The authors of the reviewed publications utilize different customer segmentation methods for the customer targeting process. Figure 5 shows the distribution of segmentation methods among all publications and over the years.

K-means is the most frequently used customer segmentation method in our surveyed literature (41 of 105). The goal of the k-means algorithm is to partition a set of data points into k segments which minimize the distance between the data. Usually, the euclidean distance is used. Solving the underlying optimization problem is NP-hard and therefore, various approximation algorithms are used (MacQueen 1967; Lloyd 1982). The usage of k-medoids and other k-means variations is included in our k-means classification.

The second most used segmentation algorithms are Hybrid approaches that are used twelve times (11.4%), followed by Other approaches that are used ten times (9.5%). Hybrid clustering refers to the application of two or more clustering approaches to segment the customers. As “other” clustering, we define the clustering methods which don’t fit the previous cluster definitions. For example, Abbasimehr and Shabani (2021) propose a time series segmentation approach to get knowledge from customer behavior or Chen et al. (2018) proposed an segmentation an algorithm which they call PurTreeClust. Hsu and Chen, Y.-g.C. (2007) propose an algorithm to cluster mixed data which is named CAVE and An et al. (2018) proposes a segmentation algorithm based on non-negative matrix factorization.

Nine publications use Rule-based clustering to segment their customers into different behavioral groups. In rule-based approaches, data points are assigned to predefined segments by value thresholds.

In our surveyed literature, five publications utilize a Fuzzy C-Means (FCM) approach. In a fuzzy clustering algorithm, data points can be assigned to different clusters at the same time. The fuzzy c-means (FCM) clustering algorithm is a fuzzy version of the k-means algorithm (Dunn 1973; Bezdek et al. 1984).

Latent class models are used for the latent class analysis to classify discrete variables (Lazarsfeld 1950). This segmentation approach is used six times in the surveyed literature.

Evolutionary Algorithm (EA) are inspired by the biological evolution of living things. EAs are a class of optimization methods to find an approximate solution to a problem which also includes clustering. Simplified, the algorithm can be described as follows. In the first step, a random solution is initialized. The second step is to determine the quality of the solution using a fitness function. In the third step, the best solutions are selected and these are randomly changed, which is also referred to as mutation in this context. This process is repeated until a stopping criterion is met (Eiben and Smith 2003; De Jong 2016). Genetic algorithms (GA) like particle swarm optimization (PSO) (Kennedy and Eberhart 1995) or chaotic ant swarms (CAS) (Zhu et al. 2007) also belong to the family of EAs. Our survey contains five publications that utilize EAs.

Hierarchical clustering is utilized five times by the authors of the surveyed litterateur. The basic idea of hierarchical clustering is to bring similar data points close to each other regardless of the distribution. There are two approaches, known as the agglomerative and divisive approaches. In the agglomerative approach, the algorithm starts with each data point being in its own cluster. At each iteration step, the most similar clusters are merged until a distance criterion is met. The divisive approach works similarly, except that it starts with one cluster and splits in each iteration (Maimon and Rokach 2005).

Self-organizing map (SOM) is also used five times in our paper corpus and are based on neural networks. Neural networks are mainly used for supervised learning tasks. However, it is also possible to use neural networks in an unsupervised manner for clustering by pushing fully connected neurons towards the data points that are closest to them (Kohonen 1982).

The Rough Set Theory was introduced by Pawlak (1982) and is a data mining method to extract knowledge of databases. Besides the use for segmentation, the rough set theory can also be used for feature selection, data reduction, and other applications. In our research, we found three publication utilizing rougth sets to segment the customers.

Deep learning (DL)-based clustering, spectral clustering, and clustering via expectation-maximization are only used once. Similar to SOMs, deep learning-based clustering methods are based on neural networks. Nguyen (2021) presents a deep learning-based clustering approach named Deep Embedding Clustering that combines a deep neural network and a self-supervised probabilistic clustering technique. They state that their approach produces explainable customer segments. In the first step, they determine the optimal number of clusters with a spectral clustering approach and the elbow method. Then they encode their manually selected variables and apply the deep embedding clustering which is a deep autoencoder that is trained with the mean squared error (MSE) loss. The expectation-maximization (EM) algorithm performs a maximum likelihood estimation on given data points which consists of latent variables. It is an iterative approach that optimizes the mean and variance of the cluster distribution until it converges (Dempster et al. 1977). Spectral Clustering is a graph-based clustering approach in which distances between data points are represented by the edges. With the resulting graph’s Laplacian-matrix segments can be computed (Fiedler 1973; Donath and Hoffman 1973).

In the first decade (2000–2010) rule-based, Evolutionary Algorithms (EA), latent class, hybrid, and “other” clustering approaches were used twice. Both hybrid approaches were published in 2004. One hybrid approach combines k-means with a EA and the other combines a hierarchical approach with k-medoids. Hierarchical, fuzzy C-means, and rough set theory segmentation approaches are used once in the years between 2000 and 2010. Self-organizing map (SOM)-based segmentation was used three times which makes it the most applied method in this decade in our survey.

In the second decade (2011–2022), 89 of 105 (84.76%) relevant papers were published. K-means is used for the first time in 2011 (disregarding hybrid approaches). Since then, k-means has been used at least once a year. In 2014 k-means is used in two, 2015 in four, 2018 in five, 2020 in eight, and 2022 in eleven publications. Statistically, this indicates an upward trend. Also rule-based approaches are used repeatedly in the last years.

3.3 Overview customer targeting use cases

The underlying customer targeting process applies to a large amount of business and e-business use cases. In this section, we present an overview of which segmentation methods are used on which use case. Therefore, we briefly introduce the found e-commerce use cases.

The first category of use cases we want to introduce is Retailing. It is the sale of different goods that are not further specified and don’t belong to any other use case category. We also assign use cases to this category if it is not further specified. This means that a pure sports retailer is classified under the Sports use case, or a retailer that sells only clothing is classified under Fashion. Different to retailing, fashion is a dynamic industry (Brito et al. 2015). Like the fashion branch, Electronic is considered as a branch of e-commerce retailing. In the literature, some customer behavior segmentation use cases are related to Banking. Use cases in this category naturally have more information about the customer. In addition, the products and services don’t change as quickly as in retailing. In Mobile operators’ use cases the authors deal with data from mobile network providers. With Youtube, Netflix, and other companies, Video & music streaming platforms and services become very popular, and forecasts show that sales will also grow strongly in the coming years (statista.com 2022). In our literatur search we found some Book use cases that deal with book retailing or renting services. Nowadays, there a plenty of online services to plan a trip. In travel use cases we consider case studies that deal with trip-related action like hotel booking, reviewing, or trip and location recommendation. In our survey food use cases get their own category because in some cases it is difficult to distinguish between food retailing, restaurant reviews, or food production or manufacturing. Manufacturing in e-commerce comes with some benefits and new opportunities. One is product customization (Fan and Huang 2007). Another one is manufacturing-related services. In others use cases, we classify use cases that we could not determine explicitly or don’t fit in one of the other groups, e.g. online news or email campaigns of charitable organizations.

Table 1 shows in the rows all clustering method used in the literature. Each column represents one use case. A check mark indicates that we were able to identify an example for at least one use case. The number of checkmarks indicates the number of use cases we identify for the segmentation methods. Note, that in some publications, the utilized method is showcased on multiple use cases which leads to a mismatch between the number of publications and the number of use cases.

Table 1 Use cases and utilized segmentation approaches for customer targeting of commerce business from the survey literature

Retail is the most occurring use case in the surveyed litterateur with 43 case studies. The authors show with their publications that every segmentation method is usable to approach retail use cases. The retail use case is the only one that have examples for each segmentation method. Besides retail use cases, only travel and “other” use cases are approach by most of the segmentation methods for the customer analysis. The remaining use case categories have at least five different segmentation methods as an application example.

Regarding use case coverage, we found that k-means clustering are used to approach all use cases expect manufacturing. Thereby, k-means is utilized 16 times to approach retail and three times in bank, video and music, and “other” use cases each. Our literature review show that FCM is applied to seven different use cases. Rule-based, hierarchical, hybrid, and “other” segmentation approaches are applied on five different use cases.

3.4 Overview and examples of the interplay between customer representation and analysis for customer targeting use cases

The authors of the identified publications utilize different customer segmentation methods with different feature selection methods for the customer targeting process. In this section, we further investigate and describe these approaches to give a better insight into the interaction of the feature selection and segmentation methods. Table 2 provides an overview of the different segmentation methods with the corresponding feature selection approaches used. It also lists the number of times such a pair of segmentation method and feature selection was used in the paper corpus. The last column of the table shows the publication’s reference. In the following, we present some examples on how the different segmentation and feature selection methods are used in the found literature to approach customer targeting in e-commerce.

Table 2 Segmentation and used feature selection methods with corresponding references of the survey literature

In nine publications rule-based clustering is used to segment the customers into different behavioral groups. Therefrom, seven use the RFM-analysis to represent their customers. An example retail use case that combines RFM-analysis and k-means is provided by Hsu and Huang (2020). In their research they want to identify VIP customers. VIP customers are buyers of critical products which are not purchased by the average customer. In their approach, they apply the RFM-analysis on over 600,000 transactions from around 3800 customers. The segmentation is based on the 20%-quantile of th RFM-values. Another example which utilzes rule-based segmentation with RFM-analysis is from Jonker et al. (2004). In their publication, the authors want to find the best marketing policy out of a set of policies for a customer. The data are from a mailing scenario of a charitable organization. They first utilise an on the email data adapted RFM-analysis and segment the customers based on defined thresholds. To identify the the best policy for a segment the authors used a markov decision process.

Two authors applied rule-based segmentation without applying a feature selection method. Hjort et al. (2013) want to investigate the impact of product returns in a fashion use case provided by Nelly.com which is a Scandinavian online fashion retailer. For the research, the scientists selected six features for each customer which are total sales, average sales per order, total contribution margin, average contribution margin, the total number of orders, and the total number of returns. Based on the feature information, they assign each customer to one of four groups. The groups are based on the buying and returning habits of the customers. The authors conclude from the customer analysis, that customers who tend to return goods are also the more valuable for the company.

In 16 publications the authors decide to not use a feature selection method but select features by hand before applying k-means clustering to the customer data. Authors of 21 publications use the value of RFM-analysis for the segmentation with k-means. Three research groups use a principal component analysis (PCA) for feature selection before clustering with k-means. Only Ding et al. (2019) use a graph representation before segmentation. The graph is built based on user-item interactions.

Griva (2022) analysis the customer of 140 e-commerce stores in European countries with k-means and hand crafted features. The features are extracted from 270,000 responses from a customer satisfaction survey and 1 million orders from 800,000 customers. They propose a framework which is capable to build automated marketing actions based on the created customer satisfaction segments. Example for such marketing actions are social media sharing strategies for the satisfied segments or discounts for the less satisfied customer segments.

Guney et al. (2020) are looking for the best campaign in movie rental use case (video on demand). In a first step they apply an modified RFM-approach which extract two additional features from the data. The two features are the number of days between the first and last rental and the standard deviation of the days between two rented movies. These five features are clustered via a k-means algorithm. The clustering results in four customer groups. An apriori algorithm namely an association rule mining approach is than used to assign the best marketing campaign to the customer segment.

In our selected literature six publications utilize an FCM approach. Ozer (2001) collects the data from customers of an online music service via a customer survey and doesn’t use a feature selection method before applying FCM on the features.

Nemati et al. (2018) search for the most appropriated marketing strategy for the customers of a telecommunication industry use case. First, they compute the customer lifetime value (CLV) for each customer and group them with FCM. To assign the right marketing strategy to the right segment they utilize a fuzzy TOPSIS technique.

For hotel businesses, customers’ satisfaction is crucial. Alghamdi (2022a) investigate customers’ satisfaction of hotel visitors in Mecca and Medina (Saudi Arabia). Therefore, they apply PCA on data collected from TripAdvisor and segment the resulting features via FCM.

Hierarchical clustering is used in five publications. Three authors handcraft their features. Aghabozorgi et al. (2012) calculate the necessary features by applying a discrete wavelet transformation (DWT) on customer data of a bank use case. In their research, DWT is an appropriate approach because they consider customer activities as a time series which is not the norm. After using DWT on the data, the data is initially segmented with a hierarchical clustering method. The cluster is updated incrementally in a given period with new data. Zhou et al. (2021) combines hierarchical clustering with an extended the RFM-analysis for a retail use case. The RFM-analysis is extended by the interpurchase time which results in four different features. The interpurchase time is defined as the time gap between two consecutive purchases in the same location (same website). Afterwards, the customers are clustered by the calculated features.

In our research, we have one publication from Dhandayudam and Krishnamurthi (2014) that combines RFM-analysis for feature selection with rough sets for clustering. In addition, they add another feature to the RFM-values that describes the average time between purchase and payment. They categorize all four features in their 20%-quantiles and then utilize a slightly modified rough set theory approach for the clustering. Song and Shepperd (2006), Wu (2011) don’t use feature selection methods before segmenting the customers with a rough set approach.

Clustering based on latent class models is used six times in the surveyed literature. Four of them manually select the features and therefore, don’t use feature selection methods. Nakano and Kondo (2018) use psychographic, demographic, online store, social media, and device touchpoint data. The information is clustered with a latent class analysis approach which results in seven segments. Goto et al. (2015) propose a method based on latent class analysis that clusters items and customers. They assume valuable users purchase more often only browsing and valuable products are bought more often. They use the latent class model to cluster the customers into “good users” and “other users”. To analyse the resulting segments they use the Classification and Regression Tree (CART) Algorithmus.

Wu and Chou (2011), Apichottanakul et al. (2021) use RFM-analysis for the feature selection and apply a latent class approach for the clustering. Apichottanakul et al. (2021) use the proposed GRFM approach from Chang and Tsai (2011) to analyse the customers of a pork processing use case. First, the RFM scores are calculated for nine product categories and each feature is categorized in one of five categories based on the 20%-quantile. The features are clustered with a probabilistic latent class model. Apriori the optimal number of k is unknown therefore, a suitable number of clusters is determined with the Akaike Information Criterion (Akaike 1974). In the last step, the clusters are analyzed with the help of the RFM-values.

The only publication that uses the EM algorithm for clustering is from Rezaeinia and Rahmani (2016). The goal of their work is to recommend products in a retail use case. Therefore, they first compute the features via RFM-analysis and cluster them with an EM approach for customer targeting.

Spectral clustering is used by Chen et al. (2019) to segment customers buying behavior. Therefore, they use a Purchase Tree representation for customers transactions which was proposed earlier by Chen et al. (2018). For the customer segmentation, they propose a two-level subspace weighting spectral clustering algorithm. Spectral clustering approaches are used only once in our literature.

Our survey contains five publications that utilize EAs for customer clustering of which two use RFM-analysis and three don’t use a feature selection method on the available data. Both publications using RFM-analysis are published by or with Chu Chai Henry Chan. In his publication from 2007, the task is to determine an appropriated strategy for each customer of an Nissan automobile retailer. Therefore, Chan (2008) computes the features from the RFM-analysis and categorizes the values in one of five 20%-quantiles. Then the features are binary encoded with four bits. Based on the binary features a GA is used with the customer lifetime value (CLTV) as the fitness function. In 2016, Chan et al. (2016) apply the same feature preprocessing and PSO with CLTV as fitness function on a similar use case but with more data.

SOMs are used in five publications in total. Verdu et al. (2006); Nilashi et al. (2021) utilize handcrafted features to represent the customers. In the remaining three publications customers are represented by their RFM-values. For example, Hsieh (2004); Liu et al. (2009) combine an RFM-analysis feature extraction with a SOM clustering to segment the customers in their case study. A recent example of an SOM approach is proposed by Liao et al. (2022). They develop different marketing strategies for each segment for a retail use case. Therefore, they use an extended RFM-analysis approach to represent the customers. The extension is not only using RFM-analysis on customer purchase information but also on other behavioral information like clicks, add-to-cart, or add-to-favorite. For this, they utilize 2 million customer interaction records. The SOM approach is than applied on the different RFM-values of the customers to segment them in similar behavioral groups.

Nguyen (2021) presents a deep learning-based clustering approach named Deep Embedding Clustering that combines a deep neural network and a self-supervised probabilistic clustering technique. They state that their approach produces explainable customer segments. In the first step, they determine the optimal number of clusters with a spectral clustering approach and the elbow method. Then they encode their manually selected variables and apply the deep embedding clustering which is a deep autoencoder that is trained with the mean squared error (MSE) loss.

In our literature review, we found twelve research papers that use hybrid clustering methods. In ten publications no feature selecteion method is used. For example, Kang et al. (2012) don’t utilize a feature selection. They split the dataset into two sets of answering customers and not answering customers. The data points are clustered with a k-means and CSI Algorithm with different criteria. Kim and Ahn (2004) use (CHAID) as a feature preprocessing. The clustering is performed by a GA based on k-means clustering. Jadwal et al. (2022) use MCA as feature preprocessing and segment the customers of a bank use case with an segmentation approach based on k-means and hierarchical clustering.

In our survey, we classified ten publications as “other clustering”. Six authors have manually selected features. In three publications the RFM-analysis is used as feature selection method. For example, Abbasimehr and Shabani (2021) propose a time series clustering approach to get knowledge from customer behavior. First, they split the dataset into predefined time intervals. As a second step, they apply RFM-analysis on each interval and use the monetary value of the customer for the time series. On the resulting time series, a time series clustering approach is applied. Also, Hu and Yeh (2014) utilize RFM-analysis based features for the clustering. Therefore, they propose an RFM-pattern-tree to represent customers which also is used to approximate customers with less information. They can use this to detect similar customers with similar behavior. Simoes and Nogueira (2021) uses RFM-features and segment the customers with an ABC curve segmentation. Chen et al. (2018). represent the data as a Purchase Tree and propose for the segmentation an algorithm which they call PurTreeClust and is based on a partitional clustering algorithm.

3.5 An overview of the data dimensionality in the publications’ experiments

An essential component for the behavior analysis and customer targeting process is the information that is collected by the companies. In this section, we describe which methods are used for which data in respect to the order of magnitude. We distinguish between two different types of data amount. The first is the number of data points e.g. transactions and it describes the amount of data an algorithm can handle at least. The second type is the number of customers in a dataset.

The number of customers may indicate how much data an algorithm can process because customers and not data points are segmented. Therefore, it is important to consider the number of customers when analyzing the data dimensionality. Depending on the number of customers the number of data points can be reduced after a feature selection method. For example, in RFM-analysis the information of a user is aggregated for one period which leads to fewer data points the clustering algorithm needs to process. Other feature selection approaches like PCA doesn’t affect the number of data points or user but the number of features.

Table 3 Data dimensionality that the feature selection methods and clustering approaches handle in the experiments of the surveyed literature

Table 3 shows which feature selection methods and clustering algorithms are used with which data dimensionality regarding the number of data points and the number of customers in the use case. The number of data points is described by six columns of which each has a different order of magnitude. We choose a similar representation for the number of customers in a dataset but only have five columns. We annotate the methods that deal with this amount of data with checkmarks. Note, that not all publications describe the data in a way it is possible to extract the information of the data dimensionality. In some cases, only the number of data points are given, in others, we only know about the number of customers, and sometimes we don’t have information at all. How often a method is used, is indicated by the number of checkmarks. In some publications, different datasets with different sizes are utilized. If two datasets have different orders of magnitude, we indicated it by using checkmarks in the appropriated cells. However, if the datasets in the same publication have the same order of magnitude, we indicated it only once per publication.

In terms of feature selection, we see that RFM-analysis is applied up to \(10^8\) data points, but above this number of data points it is not used anymore. For example, Akhondzadeh-Noughabi and Albadvi (2015) apply RFM-analysis on 35,537,276 customer activities from 14,772 customers.

Based on the survey literature, PCA and DWT can be applied to data with up to 1 million data points. The graph approach utilized by Ding et al. (2019) is used on around 50,000 user activities. Chen et al. (2018, 2019) propose a purchase Tree approach which is tested on several datasets with different sizes in a range of a few thousand and 350 million transactions with customer numbers between 800 and 300,000.

The clustering method rows only refer to the clustering algorithms where no feature selection methods are applied. Regarding the number of users in datasets, we see that usually, their number doesn’t exceed 10,000. An expectation is provided by Kang et al. (2012). They test their hybrid approach on two datasets in which one dataset contains information about 101,532 customers. Another one comes from Goto et al. (2015) where they apply a latent class model on 37,278,907 browsing actions from 99,924 users. Abdolvand et al. (2015) apply k-means on 25,000 bank customers. Investigating which clustering methods are used for data with at least one million entries, we identify that it is k-means clustering, latent class models, and hybrid clustering approaches. K-means is used twice on over a million data points by Liu et al. (2015); Zhang et al. (2014). Liu et al. (2015) have access to 3 million transaction data from taoboa.com. Zhang et al. (2014) use the MovieLens datasets in which one has 100,000 movie ratings of 1682 different movies rated by 943 different users and the other has 1 million ratings for 3952 movies made by 6040 users.

3.6 Evaluation metrics

Usually, a clustering model learns in an unsupervised manner and the ground truth is unknown. Therefore different criteria need to be used to evaluate their performance. In the following the frequently used evaluation measures are described and briefly analyzed.

Clustering evaluation or cluster validation is an essential step in verifying the discovered groups in a data set. The fundamental challenge of evaluation lies in the missing ground truth, which can be a reason that we have not found a consensus between the evaluation methods in our literature research. Figure 6 presents the distribution of used evaluation methods for segmentation methods in the literature that also shows the missing consensus. We classified the evaluation criteria into seven different groups which are indicated by a color. In the following, we briefly introduce the evaluation criteria and give some examples. It should be noted, that in some publications the authors don’t apply evaluation metrics. Instead, they analyze the segments based on their plausibility. Chan et al. (2011) for example, measures the performance of the proposed method by comparing the company’s sails before and after the using the approach. Guney et al. (2020), Nie et al. (2021), Wu et al. (2020) evaluate the segments with the help of the RFM-values.

Fig. 6
figure 6

Distribution of the segmentation methods used evaluation methods

Statistical significance test The underlying concept of a Statistical significance test is to determine whether the data points are randomly distributed or not. Krishna and Ravi (2021) have used a statistical t-test to evaluate their genetic algorithm approach on five different datasets. Another approach is the Kendall coefficient (Kendall 1938) that is used by An et al. (2018).

Analysis of variance The basic idea behind the analysis of variance (ANOVA) is to analyze whether the expected values of variables differ in distinct groups. By testing, if the variance of a variable is larger or smaller between the groups than within the groups, a statement about the meaningfulness of the group can be determined. ANOVA tests are used by Li et al. (2009), Hong and Kim (2012), Hjort et al. (2013), Hiziroglu et al. (2018).

Silhouette analysis The silhouette analysis is a (visual) validation method that is independent of the number of clusters and determines the consistency within a cluster. In addition to validation, this method can also be used to find the optimal number of clusters (Rousseeuw 1987). For example, the silhouette analysis is used by Akhondzadeh-Noughabi and Albadvi (2015), Peker et al. (2017), Christy et al. (2018).

Indices As shown in Fig. 6 many different index metrics were used to validate the clustering performance. The most used indices in our literature review are Davies-Bouldin (DB) index, Calinski-Harabasz (CH) index, and Xie-Beni (XB) index. The DB index describes the average similarity of each cluster with its most similar cluster. The DB index is to be interpreted in such a way that the lower the value is, the better the clustering (Davies and Bouldin 1979). The CH index, is the ratio of intra-cluster dispersion and inter-cluster dispersion (Caliński and Harabasz 1974). The XB index is used for fuzzy segmentation approaches and describes the separation and compactness of the clusters. The optimal number of clusters has the lowest XB value (Xie and Beni 1991). Chan et al. (2016) evaluate their proposed EA clustering with the DB index. Munusamy and Murugesan (2020) evaluate their fuzzy c-means clustering approach with XB index but also with the Kwon index, and the Tang index. They also use error measures for the cluster evaluation.

Information criteria These measures are used to select the models that fit the given data best but also take the number of parameters into account to prevent overfitting. One popular information criterion is the Akaike information criterion (AIC) which describes the model’s information based on the number of parameters and the model’s log-likelihood (Akaike 1974). Apichottanakul et al. (2021) utilize the AIC for evaluation to determine the optimal number of clusters in their latent class model.

Error measures Another evaluation method that is used in the surveyed literature is based on error measures like the mean absolute error (MAE), sum of squared error (SSE), root mean squared error (RMSE), or symmetric mean absolute percentage error (SMAPE). Abbasimehr and Shabani (2021) measure the cluster performance with SMAPE. Aghabozorgi et al. (2012) evaluate their proposed hierarchical clustering with SSE. Also, Lam et al. (2021) evaluate their clustering approach with SSE.

Others Some authors combine several evaluation metrics to express the usefulness and quality of their clustering models or use methods which donot fit in the six categories above. The mostly used “other” metric is cluster distance. We classify all inter and intra-cluster distance metrics as cluster distance if they are not further explained by the authors. For example, Wan et al. (2010) utilize an inter and intra-cluster distance to show that their CAS clustering approach has better distances and is more stable than k-means. Sivaguru and Punniyamoorthy (2021) apply a within/total clustering error index (which we consider as a cluster distance metric) to evaluate their k-means approach. In addition, they utilize DB index and t-test too. Umuhoza et al. (2020) utilize the elbow method, silhouette score, and CH index to determine the optimal number of segments. Another metric is the concordance (C) statistic (C-index) also known as receiver operating characteristic (ROC) and associated area under curve (AUC) score is for example used by Hsu et al. (2012) (also use SVM, isolation, and AVG index) or Barman and Chowdhury (2019). Dhandayudam and Krishnamurthi (2014) uses cohesion and coupling to evaluate the cluster quality for their rough set theory approach. Griva et al. (2021) use cohesion, inter and intracluster distance, similarity, and separation for cluster validation and gap statistic plus silhouette analysis to determine the optimal number for their latent class model clustering. Ramadas and Abraham (2018) validate the hybrid clustering which combines GA and fuzzy c-means with a partition coefficient (degree of intersection of clusters), classification entropy (the fuzziness of clusters), XB index, separation index, and partition index. Abdolvand et al. (2015) utilize the DB index to determine the optimal number of segments for their k-means approach and data envelopment analysis (DEA) for the evaluation.

4 Analysis and discussion

As previously shown in Fig. 3, the reviewed publications were not equally distributed over the years. An upward trend in the number of publications can be recognized which indicates the importance of customer behavior analysis and therefore, their segmentation even after twenty decades of research. Especially, in the years 2020, 2021, and 2022, we have found more publications than the years before. There may be several reasons for this. The first reason that comes to mind is the current covid pandemic. This has increased the growth in e-commerce services. This could have prompted less digitalized companies to digitalize more and offer their services online. In many publications the company remains unknown. However, in some other publications the companies are named. Two examples are taobao.com or nelly.com that are established online companies which is an indication against our statement. From the literature conducted experiments did not show the state of digitization of the companies. Therefore, whether this connection exists remains open, and is not further investigated by us. Another reason, and in our opinion a more decisive one, is the increasing availability of the internet regardless of location. This means that a user can access the available online services at any time and from any place. For example, watching a series during a train ride or buying a new product at the online retailer of choice. With new requirements and necessities, the topic is also becoming more relevant in science and thus more is being published.

4.1 Analysis of feature selection methods

Based on our research, feature selection to represent customers is a fundamental step in the customer targeting process. For feature selection, customer information is indispensable. It is a challenge to get customers’ demographic information, physiographic information, or information about their preferences. As already stated, there are two possible ways to collect such data. Explicit information collection is done by questionnaires or user surveys that require customers’ accommodation to participate. Another, more implicit way is to collect demographic information via registration. Information can be collected by setting them as mandatory. Nevertheless, collecting data via registration is often limited to the usual information like age, gender, or address. In some use cases, like fashion, additional information about height and weight can be collected. It needs to be considered, that some users don’t want to provide any information and wish to remain anonymous. They either give false information or leave the website (service). In both cases, it is not possible to gather useful information and in the worst case, the former leads to false conclusions regarding the customers. Furthermore, user groups that don’t participate in a survey or are signed up are not represented in the data which makes the acquisition of unknown and new customers harder.

It is possible to gather customers’ preferences with the aforementioned method. Nevertheless, this comes with a huge disadvantage. The information is outdated soon and needs to be constantly updated which increases the maintaining effort. Constantly asking the customer for an information update can also cause him to quit as a consequence. Therefore, customer preference should be estimated based on their recent behavior. Customer behavior information can be recorded implicitly. Usually, purchase information with product information, timestamp, etc., is stored for a company’s financial overview. In addition, online touchpoints with the customer can be logged by the system. These logs can include various touchpoints like product views, click events, reviews, (dis)like, and many more. The advantages are that the customers do not disclose any personal information. Also, they are likely not interrupted on their shopping journey by unwanted questions. Nevertheless, disadvantages exist too. Predicting customer information from their behavior is not always correct that is for example caused by customers’ heterogeneity. Additionally, a large amount of data is required to make such predictions. Another challenge of implicit data collection is that the information needs to be linked to the customer. However, there are plenty of tracking-techniques to link the data with customers by using cookies or the browser identifier to name two examples.

As shown in Fig. 4, for the customer process as a whole, it makes no difference whether a feature selection method is used or the features are selected or handcrafted by an expert. However, manual feature selection and feature selection methods have their pros and cons.

One advantage of manual feature selection is that no additional computation is required. However, it requires expertise and domain knowledge to select customer information that is meaningful and representative. Feature selection methods are designed to automate the selection of features. One advantage is that domain knowledge is no longer required. However, this doesn’t mean that domain knowledge should generally be dispensed with. Another argument in favor of feature selection methods is that information redundancy can be removed. Redundancies come in hand with the amount of data collected. Removing unnecessary and redundant information can speed up the customer analysis algorithms. This information is hard to determine and select manually even with domain knowledge. Regarding Table 3, we notice that feature selection methods have processed larger amounts of data in our literature. Considering our second question from the introduction, we can state that feature selection methods allow larger amounts of data for customer behavior analysis. Particularly, the RFM-analysis and Purchase Tree have no limitation concerning the data dimensionality based on our research.

Our literature research shows that the RFM-analysis is by far the most popular feature selection method. Therefore, we analyze the RFM-analysis method in more detail hereafter and discuss the advantages and disadvantages. During the literature research, several points caught our attention. The RFM-analysis could be applied to almost any type of purchase or activity data since only three features need to be calculated. Furthermore, the calculation is very simple and requires only the basic arithmetic operations. So there is valuable customer representation in only three values. These values can be represented either numerically or categorically. For the categorical representation, the values were typically divided into five categories, each with 20%-quantiles. Thus, the obtained features are used for any clustering method. In addition, we notice that the RFM-analysis is often extended with additional features. The feature extension is usually use case-specific. Besides adding new features, the RFM-features are extended on different activity levels. For example, the RFM-values are calculated for all product categories or different customer activities. This provides additional information about the customer’s product preference at the category or activity level. Another advantage of RFM-analysis is that it can handle all sizes of data sets without having a scalability problem. This has been sufficiently demonstrated in the publications and is illustrated by Table 3. We also like to note that in some publications, the RFM-analysis is used to explain the resulting clusters and helps with the customer behavior analysis which shows that decision makers can easily understand and interpret the RFM-values. Based on our findings to feature selection methods, we can answer the third question as follows. For feature selection methods no time-depended methodological trend could be determined. However, the most popular feature selection method is the RFM-analysis.

These versatile properties of the RFM-analysis are the reason for its popularity which is also stated by Chan et al. (2011), Alberto Carrasco et al. (2019). Despite it being the most used feature selection method, we also identified weaknesses in the RFM-analysis that all found customer representation has in common. The RFM-analysis, other feature selection methods like PCA, or manual feature selection don’t consider the whole information content of the accessible data. However, to represent more information, more features and therefore, more memory is required, which also increases the computation time for the segmentation methods. Another issue is that there is information in the data that cannot be extracted using feature selection methods or expertise. Recently, embeddings become a popular approach for representations. Embeddings are capable to represent words as shown by Mikolov et al. (2013), time series (Nalmpantis and Vrakas 2019), or products (Vasile et al. 2016) but are not limited to them. With embeddings, it could be possible to encode additional behavioral information that could improve the customer targeting process. This was already demonstrated for product recommendation (Vasile et al. 2016; Tercan et al. 2021; Alves Gomes et al. 2021; Srilakshmi et al. 2022) or customers’ purchase behavior prediction (Alves Gomes et al. 2022). Despite the popularity in several e-commerce tasks, no author used an customer embedding representation in the reviewed literature. From our perspective, the reason is that embeddings are less interpretable, and therefore, non-automated customer targeting is more difficult.

4.2 Analysis of segmentation methods

We found 13 different types of segmentation methods. K-means is by far the most used approach. Especially, in the last years from 2020 to 2022 k-means is used 24 times. In regard to the third guiding question, we can conclude that besides a k-means upwards trend no other trend can be spotted. The question that now arises is “why is k-means becoming so popular recently”? One answer is that k-means is simple to implement and an established approach. In contrast, other approaches like EAs, hierarchical clustering, or SOMs are more complex according to how the run time or space requirements grow as the input size grows (Bachmann-Landau notation) and it needs more effort to implement them (Firdaus and Uddin 2015). The ever-increasing amounts of data in e-commerce amplifies this trend because simple methods can be used more quickly, and thus, results can be obtained faster. However, if this is the reason, then the question that follows is why are rule-based approaches not popular as well? As shown by Fig. 5 the density of rule-base approaches increased in the years between 2018 and 2021 but some other influencing factors play a major role on the methods popularity. While we can only make assumptions at this point, rule-based segmentation approaches have significant drawbacks. For example, they require domain knowledge to set appropriate thresholds for separating customer segments. The increasing and heterogeneous amount of data complicates this setting of appropriate thresholds or requires a higher dynamic, which in turn results in more rules and complex relationships. Our assumption is supported by the aggregated information in Table 3 that shows that k-means is applicable on 100 million data points.

Considering the data dimensionality which is used in the publications we see that k-means approaches can handle a larger amount of data and is in pair with latent class approaches. As we mentioned, the hybrid approach that uses the largest amount of data is a combination of the latent class model. However, concerning the number of customers in the data which are the objective of the clustering, the numbers rarely exceed the 10,000. This indicates that clustering approaches need an appropriate feature selection method to deal with a larger amount of data. All this doesn’t mean that the methods cannot be applied to larger data sets. Our argumentation is based solely on the paper corpus we saw. Based on the findings concerning the data dimensionality, we can state for guiding question number three, that k-means and latent class models can process the largest amount of data among all segmentation methods. However, as already stated this applies only in case of manual feature selection. We recommend using a feature selection methods namely the RFM-analysis that allows to process any kind of data dimensionality. Note that we don’t address the time or memory complexity of the segmentation methods, which is also a performance indicator, but evaluate them based solely on the amount of data used in the literature.

In terms of use cases, we can state that each clustering method is usable in retailing use cases. We cannot make such a generalized statement for other domains. However, it is not unlikely that all segmentation methods can be used independently of the domain. Especially with k-means, we can see that it has the largest variant of different use cases. Nonetheless, the reason for being used in different domains can be because k-means is applied in most publications.

Apart from a quantitative analysis of the segmentation method, we would like to make a qualitative analysis. Unfortunately, there is no way to determine which segmentation method performs best. The major issue in our opinion is that there is no ground truth for the customer segments to determine a score. Therefore, there is no unified method for qualitative evaluation which is necessary to state which segmentation method is superior to the other. We noticed that there are a vast amount of different evaluation methods as presented in Sect. 3.6. Different evaluation approaches are required for different clustering approaches, i.e. fuzzy (soft) clustering has different properties than hard clustering. It would simplify qualitative segmentation analysis if the scientific community agree on a small set of evaluation methods. The urge is there which we can see in the number of different evaluation metrics and the considered publication where the authors try to show that their approach is superior to others. If everyone would use the same metrics, the authors’ efforts would have more significance and the performance of the method could be compared over different publications which are usually done in other scientific disciplines. Nevertheless, due to the absence of ground truth, correctness can never be shown, and therefore, the purpose of unified evaluation methods may be questioned. Another aspect we want to consider is evaluation metrics with semantic interpretability. Such metrics would have the advantage to show which segmentation algorithm partitions the customers in a desirable way. Furthermore, it would create comparability between multiple segmentation methods for identical use cases. However, the challenge is to define evaluation metrics that have the capacity to be semantic interpretable and, at the same time, can be applied to different segmentation methods and use cases. In numerous publications, evaluation methods are used to find the optimal number of segments. Therefore, even if there is no defined uniform way to compare clustering approaches, they still have their reason of existence and are necessary methods for determining an optimal number of segments.

Before the study, we would not have expected such a distribution, as we thought that a relatively old method like k-means (first proposed around 1960 and published in 1982) is not so often used especially not so often in the last years of the considered literature. In addition, we assumed that there would be newer and more innovative approaches like deep learning-based approaches. The reason for our assumption is that deep learning techniques archived great results in a broad range of applications such as computer vision and natural language processing and we expected to see these methods transferred to customer segmentation and analysis. However, deep learning-based segmentation only appeared once in the literature. Regarding our initial assumption, an open question still remains. Will deep learning methods be used for customer segmentation in the future? As with embeddings used as feature representations, one advantage might be that the feature representation phase can be omitted, and thus less information is lost. However, a disadvantage and probably the reason why we did not find more than one deep learning-based segmentation method is that the customer segmentation needs to be formalized as a learning problem. Furthermore, segmentation is by design an unsupervised process and no ground truth exists. Another point that speaks against deep learning segmentation is that deep learning models are black boxes and therefore, interpretation, explainability, and reasoning for decision making are no longer achievable.

Based on our findings and analysis, we recommend using k-means or rule-based segmentation approaches which are easy to use and implement, to partition different customers for e-commerce use cases. In addition, if massive transaction data is available, we recommend RFM-analysis for the customer representation that can be extended with additional features.

5 Conclusion and future research

In this survey, we provided an extensive literature review on customer targeting process for e-commerce use cases whose main focus lies in the segmentation methods for customer behavior analysis. Our goal was to provide an overview of segmentation methods used in the literature and to determine best-practice approaches and their limitations. We introduced the steps of the research and key criteria for the paper selection and analyzed as well as discussed our findings afterward. In our work, we considered 105 publications with different case studies that focused on customer analysis with segmentation methods.

Summarizing the approaches examined, the identified four-step process emerges as the current gold standard for personalized customer targeting in e-commerce. For the customer representation, either hand-crafted features or an RFM analysis adapted to the use case are generally used. Subsequently, for customer analysis, the generated customer representation is segmented using a k-means approach.

Based on our research and literature analysis we made several findings regarding our investigated topic.

  • We identified a common process for personalized customer targeting which includes feature selection methods, customer segmentation, and customer targeting. This process is illustrated by Fig. 2 and can be utilized to plan customer targeting campaigns. Each of the four steps has its own requirements and its a discipline of its own worth to be investigated. We focused on the customer analysis and customer representation part.

  • Over the years, the number of publication that deals with customer targeting in e-commerce are continuously increasing. This supports the preceding assumption that it is a time-relevant subject.

  • Feature selection methods enable the usage of larger datasets and among the utilized methods the RFM-analysis is by far the most popular one. There are many reasons for this: first, the method is easy to use, and second, it is based on features that can be extracted and understood. Another advantage of RFM analysis is the possibility of its easy adaptation to specific use cases by adding further or changing existing features.

  • In approximately half of the publications (47.6%), manual feature selection was used.

  • Among all the used clustering methods, k-means has emerged as the most popular approach (39% in total). Since 2011, it was repeatably used. Besides that, no other over-time trend was identified. The popularity of k-means can be explained by its simplicity and applicability to large scale datasets.

  • We were not able to define the best clustering approach based on its performance because many different evaluation methods exist and were used to evaluate the cluster quality.

  • Some evaluation methods can be used to determine the optimal number of segments which is unknown from the beginning and is often a tunable hyperparameter.

  • The literature review doesn’t show that a segmentation method exists that is applicable to every e-commerce use case that involves customer analysis. This could only be suggested, if at all, for the retail use case. In terms of method, k-means has been used in every use case identified, with the exception of the manufacturing use case.

New insights always come with new challenges and opportunities. Based on our research and findings we propose future research ideas which should be investigated. Especially with regard to recent developments in the field of Deep Learning, there are many approaches that can be adapted and, according to the our assessment, display a lot of potential.

  • Deep learning introduced innovations in many domains such as natural language processing and computer vision. Nevertheless, we only found one DL-based segmentation approach in our research. Therefore, we see potential and a research gap in DL techniques for segmentation.

  • The process steps in the identified four-phase process for customer targeting are essentially based on a high level of understanding of the customers, i.e. their needs and behavior. This is necessary for marketing and domain expert to tailor personalized marketing strategies for the customers. However, with the advent of deep learning-based approaches personalized customer targeting can be done fully automated e.g. end-to-end model and therefore, the customer analysis step which includes customer segmentation can be omitted. This development can be seen for example in deep learning-based recommendation systems which make personalized recommendation without the need of the customer analysis. This leads to the question; How customizable are the individual phases of this process and can individual steps be omitted to increase efficiency or are all steps so fundamental that a deviation from these procedures would have a negative impact on the goal, customer targeting?

  • Manual feature selection is still frequently used. The feature quality is thereby highly depended on the underlying expertise to select or define important features for clustering. Progressive digitization is leading to growing challenges, especially in dealing with data volumes and data diversity. To meet these challenges, manual feature selection is reaching its limits as it is not able to tap the insight potential within this data. Hence, the question arises if approaches exist that can help experts to create meaningful and representative features for customer representation?

    In this regard a look outside the box to other e-commerce research, e.g. click-through rates prediction can yield new approaches. There researchers and professionals have started using feature embeddings on manual selected features with the underlying assumption that the learning models will learn meaningful representations from the data. This would simplify the manual feature selection process. However, these learning models are usually based on deep neural networks which are unfortunately black boxes and not interpretable. The question rises, if segmentation methods can be used as a post-processing to provide interpretability for the embedded features and therefore, an insight over the customers? (Which got lost by not using the customer analysis step).

  • In our research, we identified many different evaluation metrics to evaluate the performance of segmentation methods. Nevertheless, we could not find a consensus on evaluation metrics as in other domains. The reason is the missing ground-truth. This circumstance makes it difficult to determine the effectiveness and transferability of a segmentation approach from one use case to another. The open question that remains is, is it necessary, to develop evaluation metrics with semantic meaning and is it possible to transfer such metrics to different experiments to enable comparision of the segmentation methods?

In our literature review, we covered the usage of feature selection and segmentation method for personalized customer targeting. E-commerce is a dynamic environment with ever new challenges and therefore, new research opportunities.