1 Introduction

More than 20 years ago, Pine and Gilmore (1998) introduced the term “experience economy” to describe consumers’ increasing desire for experiential value in retailing and services. Since then, marketers and retailers have identified numerous ways to create extraordinary shopping experiences. However, this strategy might backfire, since highly arousing store environments are likely to result in confusion (Bitner 1992). In recent years, scholars and marketers have recognized the confusion potential of hyper-arousing retail store environments. Forbes magazine notes that most new technologies that seek to revolutionize the in-store shopping experience for consumers “may actually make the customer experience more complex or confusing” (Varon 2016). Similarly, academic research demonstrates that poor signage (Bitner 1992; Mitchell et al. 2005), a misleading visual merchandising strategy (Mitchell and Papavassiliou 1997), poorly crafted in-store experiences (Beverland et al. 2006), electronic price tags (Brüggen et al. 2011), and a confusing store layout (Baker et al. 2002) can all lead to shopper confusion.

The relevance of confusion during shopping situations is well reflected in efforts to develop scales that assess confusion. For example, extant scales in this field consider what consumers feel during confusing shopping situations (Garaus and Wagner 2016) or measure the personality trait “consumer confusion proneness” (Walsh et al. 2007). However, the only research that explicitly concentrates on store-related confusion is the work of Garaus et al. (2015). Using a qualitative survey, the authors categorized various in-store elements that exhibit confusion potential into social, design, and ambient factors. Nonetheless, the focus of their work was to explore experienced retail shopper confusion, defined “as a three-dimensional, temporary mental state consisting of the cognitive effort necessary to deal with confusion (cognition), emotions reflecting the discomfort associated with confusion (emotion), and restricted behavioral intentions (conation)” (Garaus et al. 2015, p. 1004).

The high relevance of confusion in shopping situations manifests not only in scale development efforts but also in the identification of negative consequences of confusing store environments. Bitner (1992, p. 63) states that “unpleasant environments that are also high in arousal (lots of stimulation, noise, confusion) are particularly avoided.” Furthermore, empirical studies demonstrate undesirable consumer responses to confusing shopping environments, including decreases in shopping value (Garaus et al. 2015), unplanned expenditures, in-store exploration, repeat visit intentions, store patronage intention, and time spent in the store (Garaus and Wagner 2016). Despite the acknowledged importance of environmentally induced confusion (see also Mitchell et al. 2005), the elements and properties that determine confusing store environments have received relatively little attention in the extant literature, and no efforts have yet been made to develop a valid and reliable measure of confusion evoked by store environments.

In light of the enormous budgets spent on store design, a better understanding of the potential negative effects of specific store factors and their properties would offer retailers the opportunity to build store environments with less (or even no) confusion potential. A valid, parsimonious measure of confusion evoked by the store environment would allow retailers to continuously monitor the confusion potential of their store environments.

Against this backdrop, this study has three research objectives. First, it seeks to present a conceptualization of store environmental confusion (SEC), wherein SEC is defined as a second-order, formative construct. Second, it develops and empirically tests a parsimonious instrument that measures SEC through specific environmental properties. In doing so, it creates a more nuanced view of the confusion potential of store environments. Third, it establishes the nomological and predictive validity of SEC by embedding this new construct within a nomological research framework and relating it to feelings of confusion, which in turn result in avoidance behavior.

The present research contributes to the extant confusion literature by developing, validating, and testing a parsimonious measure of SEC based on manifest properties of the store environment. In line with prior research on store-induced confusion, this study shows that SEC results in avoidance behavior.

The paper continues with a literature review, where SEC is differentiated from similar constructs. In doing so, it combines various research streams on confusion and highlights the lack of a measurement instrument for assessing SEC, thereby justifying the need for a new scale. The article then follows established index development guidelines. A general discussion, theoretical and managerial implications, and limitations as well as suggestions for further research conclude this paper.

2 Literature Review

Ever since Friedman (1966) first proposed that misleading unit price information will evoke confusion during shopping, the construct of consumer confusion has taken on increasing importance in the consumer behavior literature. Early research on consumer confusion concentrated on the causes of product- and market-related confusion (Mitchell and Papavassiliou 1997; 1999). Research revealed that missing or unclear product instructions (Mitchell and Papavassiliou 1997; Schweizer 2004; Walsh et al. 2007), unclear product pricing (Mitchell and Papavassiliou 1999), an ambiguous brand image (Mitchell and Papavassiliou 1997), and ambiguous front-of-pack labels (Leek et al. 2015) all cause confusion. Moreover, complexity in general seems to evoke confusion (e.g., complex technology (Drummond 2004) or the complexity of products (Kasper et al. 2010; Leek and Chansawatkit 2006; Leek and Kun 2006)).

Walsh and colleagues (Walsh et al. 2007; Walsh and Mitchell 2010) provided the first formal conceptualization of consumer confusion. These authors conceptualized consumer confusion proneness as a personality trait and a product-related construct, relating it to “how easily/often consumers experience this state of confusion or as consumers’ general tolerance for processing similar, too much or ambiguous information, which negatively affects their information processing and decision-making abilities” (Walsh and Mitchell 2010, pp. 839–840). In contrast, Schweizer (2004, p. 29) concentrates on the confusion potential of shopping situations, defining consumer confusion as an “emotional state that makes it difficult for consumers to select and interpret stimuli.” In a subsequent conference paper, Schweizer et al. (2006) developed a 25-item scale reflecting six consumer confusion properties: variety, novelty, complexity, conflict, comfort, and reliability. However, only five of these items consider store-environment factors; the other items relate to product-induced confusion.

More recently, Garaus and Wagner (2016, p. 3461) published a scale measuring the negative feelings that constitute retail shopper confusion. The authors define retail shopper confusion as “a three-dimensional, reflective second-order construct, consisting of the three reflective first-order dimensions”, represented by feelings of: (1) inefficiency, which captures the degree to which cognitive processing abilities are exceeded; (2) irritation, which represents affective feelings regarding the discomfort associated with retail shopper confusion; and (3) helplessness, which describes the restriction in behavioral intention. Even though the authors point to the confusion potential of store environments, the scale does not allow for assessing the sources of confusion within these environments.

The most valuable contribution to understanding SEC has been published by Garaus et al. (2015). The authors offer initial evidence on the confusion potential of the store environment through use of a qualitative survey. Based on these data, they classify 64 confusion triggers. In particular, the researchers define the confusion potential of ambient, design and social factors by referring to four environmental properties (variety, novelty, complexity, and conflict). However, no efforts were undertaken to develop a reliable and valid measurement instrument for assessing SEC.

This review of existing scales to measure confusion (proneness) reveals that the literature lacks a scale for assessing SEC. The absence of such a measurement instrument also implies that it remains unclear which specific store factors evoke considerable confusion, why they evoke confusion, and how retailers can assess and avoid such confusion triggers.

3 Store Environmental Confusion Measurement Instrument Development

We propose that specific environmental properties of store design factors represent suitable measures for assessing SEC triggers. By way of illustration, the theoretical underpinnings of this research suggest that the two environmental properties complexity and conflict of the store design factor signage reflect the SEC dimension “signage confusion”. Formative index procedures uncover the full scope of the SEC construct and identify and validate six SEC dimensions (aisles, customer flow, shelving and storage, signage, space allocation, visual merchandising). Fig. 1 outlines the four stages of the index development. Stages 1 and 2 are concerned with conceptual issues and hence draw primarily upon theoretical considerations. However, the highly fragmented literature on SEC further requires empirical data (i.e., an initial survey) to offer a more precise specification of the SEC dimensions and of the indicators that measure these dimensions. Stages 3 and 4 validate the previous conceptualizations using data collected in a large-scale quota-based survey.

Fig. 1 Index Development Stages, Objectives, Methods and Analysis, and Outcomes

Thus, stage 1 starts with a specification of the construct domain and determines the number of dimensions. Stage 2 identifies the indicators that measure these dimensions. Again, the conceptualization of formative indicators is guided by theory, and empirical data are used to create an initial item pool. Stage 3 assesses the validity of each SEC dimension by estimating eight multiple indicators multiple causes (MIMIC) models and verifies the dimensionality of each SEC dimension. Finally, stage 4 assesses the nomological and predictive validity of the instrument by providing statistically sound and meaningful estimates of measurement and structural parameters.

3.1 Stage 1: Specification of the Construct Domain

3.1.1 Theoretical and Conceptual Considerations

Theoretical Considerations

The specification of a construct domain is important for capturing and considering all of the domain’s facets (Diamantopoulos and Winklhofer 2001). Stage 1 in the scale development process thus strove to conceptualize the dimensionality of the SEC construct. Prior qualitative research identified design factors, ambient factors, and social factors as confusion sources in retail environments (Garaus et al. 2015). These store factors draw on Baker’s (1987) classification of the store environment. However, alternative classifications exclude social factors, as social interactions often occur as a consequence of store particularities (Bitner 1992; Donovan and Rossiter 1982). For instance, music can influence the desire to interact with sales staff (Dubé et al. 1995) and store design and store layout can create spatial crowding (Mehta 2013). In line with these considerations, social factors were not considered subsequently in this study.

Moreover, Baker (1987) suggests that customers perceive ambient factors unconsciously: they take such conditions for granted and become aware of them only when they are absent or malfunction. Furthermore, research indicates that consumers are not aware of the effect of music on their behavioral intentions (North et al. 1997). Similarly, scent influences both cognition and behavior unconsciously (Holland et al. 2005). The unconscious nature of ambient factors makes retrospective measurement of their confusion potential impossible. Hence, ambient factors are not eligible for inclusion in the SEC index. The focus on store design factors is also in line with the objective of this research, namely to develop a parsimonious measurement instrument for SEC; therefore, the present study focuses on the physical store environment by considering only design factors for the development of the SEC index.

The previously discussed literature suggests that eight design factors (aisles, architecture, customer flow, shelving and storage, signage, space allocation, technology, and visual merchandising) might constitute SEC (cf. Garaus et al. 2015). We label these factors with the suffix “confusion”; for instance, the confusion potential of the design factor “signage” is captured by “signage confusion”. As such, each of the eight design factors represents a defining component of the construct SEC. These components assess customers’ perceptions of the confusion potential of each respective design factor; they are not entities distinct from SEC but rather belong to the same entity. Thus, SEC cannot occur without confusion potential in at least one of the eight design factors. Accordingly, they are conceptualized as SEC dimensions and are thus conceptually different from exogenous causes of the construct SEC (cf. Lee et al. 2013). Exogenous causes would instead relate to retail management decisions, such as the installation of new technologies or the complete redesign of a store.

In sum, we conceptualize SEC as a formative construct, as each dimension represents an important and unique aspect of the higher-order construct SEC (Bollen and Lennox 1991). The formative conceptualization of the construct further implies that the confusion dimensions do not necessarily need to correlate with each other (Diamantopoulos et al. 2008). Hence, in a particular store, one design factor (e.g., signage) may bear high confusion potential while another design factor (e.g., visual merchandising) may possess limited confusion potential. A store environment is perceived as very confusing when it scores highly on all of these dimensions.
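To make the second-order formative specification concrete, it can be written in standard formative-measurement notation. This is a minimal sketch, and the symbols are our own assumptions rather than the paper's notation: \(\eta_{\mathrm{SEC}}\) is the higher-order construct, \(\eta_d\) the d-th SEC dimension (e.g., signage confusion), \(\gamma_d\) its weight, and \(\zeta\) a disturbance term assumed to be uncorrelated with the dimensions.

```latex
\eta_{\mathrm{SEC}} = \sum_{d=1}^{8} \gamma_d \, \eta_d + \zeta,
\qquad \operatorname{Cov}(\eta_d, \zeta) = 0 \quad \text{for all } d
```

No restriction is placed on the covariances among the \(\eta_d\), reflecting that formative dimensions need not correlate. With positive weights, \(\eta_{\mathrm{SEC}}\) is largest when a store scores highly on all dimensions, consistent with the description above.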

Despite the identification of eight design factors that likely represent SEC dimensions, no empirical data are available so far indicating which design factor, if any, indeed possesses confusion potential. Accordingly, it is not clear whether all eight design factors are suitable for capturing the SEC construct. Given the newness of the construct, no theoretically based criterion exists for selecting certain design factors to represent SEC dimensions while eliminating others.

Preliminary Conceptual Considerations Regarding Indicators of SEC Dimensions

In line with the multidimensional perspective of SEC (Garaus et al. 2015), this research proposes that it is not the existence of design factors themselves, but rather the specific structural environmental properties variety, novelty, complexity, and conflict (Mehrabian and Russell 1974) that bear confusion potential. These environmental properties serve as potential indicators of each environmental confusion dimension. Variety (or information density; Cupchik and Berlyne 1979) refers to stimuli associated with uncertainty, such that a wider range of alternative possibilities results in greater uncertainty. In product-related confusion, the variety of products represents a major trigger for confusion (Walsh et al. 2007). Novelty captures new, unusual, unfamiliar, mystical, changing, or surprising stimuli (Berlyne 1960). Complexity describes the degree of variation within a stimulus pattern (Nasar 1987); it does not reflect the number of elements in an environment (variety) but rather the specific patterns of these elements, their dissimilarity, and the potential for processing the elements as one unit (Berlyne 1960). Herrmann et al. (2013) demonstrate that complex environmental stimuli inhibit information processing. Finally, conflict is the opposite of coherence and refers to the assignment of two or more stimuli with different meanings (Nasar 1987).

In this regard, it is important to consider the relationship between these environmental properties, arousal, and affective evaluations. Research acknowledges that the relationship between arousal and affective evaluations (i.e., with respect to the store environment in the present context) is curvilinear, with the most favorable (i.e., least negative) evaluations experienced at moderate levels of arousal. Low arousal levels are experienced as unpleasant and evoke feelings of boredom (Berlyne 1960; Di Muro and Murray 2012), while intense levels of arousal evoke feelings of confusion (Bitner 1992). Consumers perceive nothing positive in confusion, and confusion evokes only negative feelings (Garaus and Wagner 2016). Based on this evidence, we postulate that (substantial) SEC requires higher levels of arousal (i.e., levels exceeding the optimal arousal level). Thus, the present research focuses on higher levels of arousal (Footnote 1), which further permits assuming linearity between confusing design factors and environmental properties.
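The argument for restricting attention to above-optimal arousal can be illustrated with a toy inverted-U function. This is a minimal sketch under assumptions not taken from the study: a quadratic shape and a hypothetical optimum of 0.5 on a 0 to 1 arousal scale. It shows only that the relationship is monotone on the branch above the optimum, so a linear specification cannot reverse sign there, whereas the full curve is not monotone.

```python
# Toy inverted-U relation between arousal and affective evaluation.
# The quadratic form and OPTIMAL_AROUSAL are illustrative assumptions only;
# nothing here is estimated from the study's data.

OPTIMAL_AROUSAL = 0.5  # hypothetical optimum on a 0-1 arousal scale


def affective_evaluation(arousal: float) -> float:
    """Hypothetical evaluation that peaks at OPTIMAL_AROUSAL."""
    return -((arousal - OPTIMAL_AROUSAL) ** 2)


def is_strictly_decreasing(values: list[float]) -> bool:
    """True if every value is smaller than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))


# Arousal levels above the optimum only (0.55, 0.60, ..., 1.00).
above_optimum = [OPTIMAL_AROUSAL + 0.05 * step for step in range(1, 11)]
branch = [affective_evaluation(a) for a in above_optimum]

# The full range 0.00 ... 1.00 includes the rising branch below the optimum.
full_range = [affective_evaluation(0.05 * step) for step in range(21)]

print(is_strictly_decreasing(branch))      # True: monotone above the optimum
print(is_strictly_decreasing(full_range))  # False: not monotone overall
```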

3.1.2 Empirical Validation

An initial data collection (data set I) was intended to clarify the role of design factors as SEC dimensions. Accordingly, a survey sought to quantitatively assess the confusion potential of the eight design factors (i.e., aisles, architecture, customer flow, shelving and storage, signage, space allocation, technology, and visual merchandising) (Baker 1987; Bitner 1992). This data collection required specifying indicators that measure these factors (operationalized by the environmental properties variety, novelty, complexity, and conflict). The quantitative assessment of the confusion potential of each of these factors sought to empirically verify the theoretically derived environmental confusion dimensions.

The questionnaire included examples for all 4 × 8 combinations (32 items) of the four properties that measure the eight design factors (e.g., for the design factor signage, the question assessing its variety read, “The environmental property variety, e.g., lots of signage, different sizes, colors or shapes of signage, evokes confusion” (Footnote 2)). As the response format, the questionnaire offered a five-point Likert scale for each of the 32 items. Twenty respondents pretested the preliminary questionnaire for comprehensibility and logical flow.
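The 32-item design is the full crossing of four properties and eight design factors. The following minimal sketch shows one way to generate such an item pool. The item template and the dictionary layout are hypothetical. The actual items gave factor-specific examples for each property (e.g., "lots of signage, different sizes, colors or shapes of signage").

```python
from itertools import product

# The four environmental properties and the eight design factors named in the text.
PROPERTIES = ["variety", "novelty", "complexity", "conflict"]
DESIGN_FACTORS = [
    "aisles", "architecture", "customer flow", "shelving and storage",
    "signage", "space allocation", "technology", "visual merchandising",
]
LIKERT_POINTS = range(1, 6)  # five-point response format

# Hypothetical template; the real wording included factor-specific examples.
ITEM_TEMPLATE = "The environmental property {prop} of {factor} evokes confusion"

items = [
    {
        "factor": factor,
        "property": prop,
        "text": ITEM_TEMPLATE.format(prop=prop, factor=factor),
    }
    for factor, prop in product(DESIGN_FACTORS, PROPERTIES)
]

print(len(items))  # 32
```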

Twenty students received course credits for interviewing at least ten respondents each based on a predefined quota. The sample was representative of the national population of the European country of investigation in terms of age, gender, and education (52% females; age: 16% 15–24 years, 16% 25–34 years, 19% 35–44 years, 17% 45–54 years, 12% 55–64 years, 10% 65–74 years, 10% older than 75 years; education: 11% university, 12% high school, 14% vocational school, 33% apprenticeship, 30% compulsory school).

Respondents qualified for the survey if they had been living in this country for at least three months and shopped regularly for groceries. The high number of interviewers combined with quota specifications enabled questionnaire distribution to a wide variety of respondents while simultaneously representing shoppers typical of the country under investigation. This approach, in turn, helps ensure that respondents had the varied shopping experiences that are essential for identifying the confusion potential of a broad range of shopping environments.

The first page of the questionnaire introduced respondents to the research topic (exploring the confusion potential of the four environmental properties of the store design factors). A short paragraph highlighted that the research focused on confusing store environments, not on other store characteristics. The instruction for all questions read as follows: “Please indicate which of the following properties of ‘design factor’ evoke confusion in a shopping situation.” Afterwards, for each design factor the following statement appeared: “The following properties of ‘design factor’ in a store evokes confusion in a shopping situation.” Subsequently, respondents indicated the extent to which they agreed that these design factors in combination with the four properties described the confusion potential of store environments (five-point rating scale). For instance, for the design factor aisles, the item assessing the confusion potential of the environmental property complexity reads: “complexity (e.g., no overview of where to find products)” and for the environmental property conflict the item reads: “conflict (e.g., too narrow or wide aisles, barriers, difficult access to other aisles, bad transition to other aisles)” (please see Table 5).

The questionnaire did not present any stimuli of specific confusing store environments but showed some small Microsoft Office clip-art images of typical shopping situations (women grocery shopping and clothes shopping) to make the questionnaire more vivid and appealing. After obvious response errors were eliminated, 214 questionnaires qualified for further analysis.

Because participants might have experienced fatigue when answering 32 items, leading to satisficing response behavior (Weijters and Baumgartner 2012), initial analyses checked for response bias. First, we examined the standard deviation of each respondent’s responses. If responses followed a discrete uniform distribution over the five scale points, their standard deviation would be \(\sqrt{2}\), which served as the benchmark. Nearly constant evaluations of items (implying less diligent response behavior) produce smaller standard deviations. We therefore eliminated nine questionnaires whose responses exhibited standard deviations below \(\sqrt{2}/2\), resulting in a final sample size of n = 205. Next, Mood’s (1940) asymptotic multinomial runs test explored the response patterns. Intuitively, this test assesses whether runs (sequences of identical consecutive responses) occur at random or follow regular patterns. Too few runs would indicate that a respondent evaluated many items in a row the same way, likely because of fatigue (Weijters and Baumgartner 2012). In contrast, too many runs could indicate deliberate, systematic response behavior. The runs test did not reject the hypothesis of random response sequences for 76% of the cases. Hence, at least 76% of respondents appeared to have evaluated the questionnaire conscientiously. Finally, we analyzed the correlation coefficients for all items across respondents to conclude the checks for response biases. Ninety percent of the correlations were less than 0.4, indicating that respondents discriminated diligently among the different items. The left part of Table 1 presents mean evaluations of these 32 items.
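The two screening steps can be sketched in Python. This is an illustrative implementation under assumptions, not the authors' code. Responses are taken to be integers 1 to 5 in questionnaire order, and the benchmark is the population standard deviation of a discrete uniform distribution on the five scale points (variance (5² − 1)/12 = 2). The runs statistic uses the usual normal-approximation moments for the total number of runs in a k-category sequence, which reduce to the Wald–Wolfowitz moments when k = 2. The helper names are hypothetical, and the critical value of 1.96 assumes a two-sided test at 5%.

```python
import math
from collections import Counter
from statistics import pstdev

SCALE = [1, 2, 3, 4, 5]  # five-point rating scale

# Benchmark: the SD of a discrete uniform distribution on 1..5 is sqrt(2).
UNIFORM_SD = pstdev(SCALE)
SD_CUTOFF = UNIFORM_SD / 2  # sqrt(2)/2, about 0.707


def passes_sd_screen(responses):
    """Keep a respondent whose response SD is not below sqrt(2)/2."""
    return pstdev(responses) >= SD_CUTOFF


def count_runs(sequence):
    """Number of maximal blocks of identical consecutive responses."""
    if not sequence:
        return 0
    return 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


def multinomial_runs_z(sequence):
    """Normal-approximation z for the total number of runs in a k-category
    sequence; returns NaN when the variance is zero (e.g., one category)."""
    n = len(sequence)
    if n < 2:
        return float("nan")
    counts = Counter(sequence).values()
    s2 = sum(c ** 2 for c in counts)
    s3 = sum(c ** 3 for c in counts)
    expected = (n * (n + 1) - s2) / n
    variance = (s2 * (s2 + n * (n + 1)) - 2 * n * s3 - n ** 3) / (n ** 2 * (n - 1))
    if variance <= 0:
        return float("nan")
    return (count_runs(sequence) - expected) / math.sqrt(variance)


def runs_look_random(sequence, critical=1.96):
    """True if a two-sided test at this critical value does not reject randomness."""
    z = multinomial_runs_z(sequence)
    return not math.isnan(z) and abs(z) <= critical


alternating = [1, 5] * 16       # 32 items, SD = 2, but 32 runs
blocked = [1] * 16 + [5] * 16   # 32 items, SD = 2, but only 2 runs
print(passes_sd_screen(alternating), runs_look_random(alternating))  # True False
print(passes_sd_screen([3] * 32))                                    # False
```

A strictly alternating 1/5 pattern passes the standard-deviation screen but produces far too many runs. This illustrates why the two checks complement each other: the SD screen catches near-constant answering, and the runs test catches regular patterns.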

Table 1 Evaluation of the Confusion Potential of Design Factors through Environmental Properties (based on data set I)

The column labelled “Overall Mean” in Table 1 shows the overall means of all eight design factors, where each design factor is measured by novelty, variety, conflict, and complexity. The entries in Table 1 are ordered by increasing composite evaluations of the design factors. The design factor “signage” (3.13) contributes the least to SEC, and the design factor “aisles” (3.53) contributes the most. In general, all eight design factors seem to reflect the confusion potential of store environments quite well. The row labelled “Global Mean” in Table 1 presents the means (over design factors) of the environmental properties.

Following the expectation that the scale midpoint value of three reflects neutral responses, potentially confusing design factors should exceed the scale midpoint (cf. Parks and Floyd 1996). One-sample t-tests of each design factor against the scale midpoint assessed their confusion potential. These tests revealed that all design factors bear considerable confusion potential, with overall means significantly higher than 3 (at a Type I error rate of 5%). This exploratory analysis demonstrates the confusion potential of aisles, architecture, customer flow, shelving and storage, signage, space allocation, technology, and visual merchandising. This empirical validation supports the specification of design factors as SEC dimensions. More formally, at this stage in the scale development process, SEC is conceptualized as an eight-dimensional, formative construct.
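The per-factor test can be sketched as follows. This is a minimal sketch that assumes each respondent's overall score for a factor is the plain average of its four property items and that the test is a standard one-sample t-test against the midpoint of 3. The respondent data are invented for illustration only. Obtaining a p-value would additionally require the t distribution (for example, via `scipy.stats.ttest_1samp`).

```python
import math
from statistics import mean, stdev

SCALE_MIDPOINT = 3.0  # neutral point of the five-point scale


def factor_composite(item_scores):
    """Average of one respondent's four property items for a design factor."""
    return mean(item_scores)


def one_sample_t(values, mu0=SCALE_MIDPOINT):
    """Return (t, df) for H0: the population mean equals mu0."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)
    return (mean(values) - mu0) / standard_error, n - 1


# Invented item scores (variety, novelty, complexity, conflict) for one factor.
respondents = [[4, 3, 4, 5], [3, 3, 4, 4], [2, 3, 3, 4], [4, 4, 5, 5], [3, 2, 4, 4]]
composites = [factor_composite(r) for r in respondents]
t_stat, df = one_sample_t(composites)
print(round(t_stat, 2), df)  # 2.41 4
```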

3.2 Stage 2: Indicator Specification and Initial Item Pool Generation

3.2.1 Theoretical and Conceptual Considerations

Having assessed the dimensionality of the SEC construct, the next step of the scale development process sought to identify indicators used to measure the eight SEC dimensions. This specification formed the basis for the development of an edited item pool.

Prior research emphasizes the relevance of correct indicator specification. Misspecifying a formative indicator as reflective would not only bias the measurement of the latent construct but might also affect all other coefficients in a model (Bollen and Diamantopoulos 2017). Moreover, misspecifying formative indicators might lead to indicator exclusion based on traditional reliability statistics, which are not applicable to formative measures and thus yield incorrect item elimination criteria (Bollen and Lennox 1991).

Three considerations guided the specification that the indicators of SEC should be formative (cf. Bagozzi and Yi 2012; Diamantopoulos and Winklhofer 2001). First, the indicators should induce variation in the respective SEC dimension. Second, each indicator should capture a specific aspect of a SEC dimension. This postulation implies that—in contrast to reflective measurement specification—the indicators are not interchangeable. Third, these indicators were not required to correlate with each other. Moreover, Diamantopoulos and Winklhofer (2001) emphasize the relevance of identifying indicators that cover the entire scope of the latent variable. This is of particular importance for the present index construction (e.g., when referring to the dimension signage confusion, the indicators assessing this construct need to capture the whole signage confusion potential).

Some literature provides guidance on selecting properties that are suitable for measuring SEC dimensions. Nasar (1987) showed that complexity and conflict can effectively describe the arousing properties of business signs, and high arousal levels have been found to correlate positively with confusion (Bitner 1992; Garaus and Wagner 2016). A recent study confirms that visually complex environments hamper processing fluency (Ketron 2018). However, the theoretical evidence provided by these reports does not sufficiently warrant including only complexity and conflict as formative indicators for each SEC dimension while eliminating novelty and variety. Hence, we follow the advice of Bollen (2011) and employ statistical procedures to investigate the relationships between the four environmental properties for each SEC dimension. This approach helps ensure that the developed SEC index captures only indicators that indeed constitute SEC and are likely to result in negative consumer responses.

3.2.2 Empirical Validation

The empirical validation identified the properties that best capture the confusion potential of each SEC dimension by relying on the same sample as in stage 1 (i.e., data set I). Stage 2, however, concentrated on environmental properties (the row labelled “Global Mean” in Table 1 presents means of novelty, variety, conflict, and complexity) rather than on overall evaluation of design factors. Columns of environmental properties are sorted according to increasing confusion potential: on average, the property novelty (2.98) appears to be of least importance for SEC and the property complexity (3.56) of most importance. The low relevance of novelty is probably due to infrequent changes in store environments. According to the global mean of variety, shoppers did not assess the variety of design factors (e.g., the variety of signage) as particularly confusing. This result is especially noteworthy because variety represents a major antecedent of product-related confusion and indicates that confusing properties have differing importance for products and stores.

The global means of conflict (3.46) and complexity (3.56) suggest that these two properties possess major confusion potential. One-sample t-tests of the global means against the scale midpoint were significant for both properties. A repeated-measures ANOVA contrast test reinforces the difference in confusion potential between novelty and variety on the one hand and conflict and complexity on the other (F = 107.51, p < 0.01).
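A single-degree-of-freedom contrast such as (conflict + complexity) versus (novelty + variety) can be computed from per-respondent contrast scores. This sketch assumes the contrast is tested against its own error term, as repeated-measures routines typically do for within-subject contrasts. In that case, F(1, n − 1) equals the squared one-sample t statistic of the contrast scores tested against zero. The data are invented, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def contrast_score(novelty, variety, conflict, complexity):
    """Per-respondent contrast: mean(conflict, complexity) - mean(novelty, variety)."""
    return (conflict + complexity) / 2 - (novelty + variety) / 2


def contrast_f(scores):
    """F(1, n-1) for a single-df within-subject contrast, computed as the
    squared one-sample t statistic of the contrast scores against zero."""
    n = len(scores)
    t = mean(scores) / (stdev(scores) / math.sqrt(n))
    return t ** 2, (1, n - 1)


# Invented global property means per respondent: (novelty, variety, conflict, complexity).
data = [(3, 3, 4, 4), (2, 3, 3, 4), (3, 3, 3, 4), (3, 2, 4, 5), (4, 3, 3, 4)]
scores = [contrast_score(*row) for row in data]
f_value, dfs = contrast_f(scores)
print(round(f_value, 2), dfs)  # 7.36 (1, 4)
```

Because rescaling the contrast weights rescales the mean and the standard error of the contrast scores equally, the resulting F does not depend on whether the weights are written as ±1/2 or ±1.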

The lower left part of Table 1 shows the mean confusion of each environmental property for each SEC dimension. The corresponding right part of Table 1 reports the results of pairwise (pairing of environmental properties) repeated-measures contrast tests for each SEC dimension, resulting in 32 planned comparisons. These contrasts thus analyze the superiority of the environmental properties conflict and complexity over variety and novelty at the factor level. The vast majority (81%) of these comparisons are in line with the global findings (i.e., the confusion potential of conflict and complexity exceeds that of novelty and variety) and are statistically significant. Four differences are not significant (indicated by regular font in Table 1), and only two comparisons are statistically significant but inconsistent with expectations (italic font in Table 1). Taken together, these results offer further empirical evidence for the suitability of the two environmental properties conflict and complexity for measuring SEC dimensions.
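The bookkeeping behind the 32 planned comparisons and the 81% figure can be reproduced from the counts reported in the text: 4 comparisons are non-significant and 2 are significant in the unexpected direction. The pairing scheme, in which conflict and complexity are each compared with variety and novelty within every design factor, is our reading of the text.

```python
from itertools import product

HIGH = ["conflict", "complexity"]  # properties expected to be more confusing
LOW = ["variety", "novelty"]       # properties expected to be less confusing
DESIGN_FACTORS = [
    "aisles", "architecture", "customer flow", "shelving and storage",
    "signage", "space allocation", "technology", "visual merchandising",
]

# Each (factor, high property, low property) triple is one planned comparison.
comparisons = list(product(DESIGN_FACTORS, HIGH, LOW))
n_total = len(comparisons)  # 8 factors x 2 x 2 = 32

n_not_significant = 4         # regular font in Table 1
n_significant_wrong_sign = 2  # italic font in Table 1
n_consistent = n_total - n_not_significant - n_significant_wrong_sign
share = n_consistent / n_total

print(n_total, n_consistent, round(share, 2))  # 32 26 0.81
```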

Prior research also demonstrates that conflict and complexity are highly relevant properties in predicting consumers’ responses to shopping environments (Deng and Poole 2010; Nasar 1987; Orth et al. 2016), while variety and novelty do not appear to reflect the confusion potential of store environments. From an index construction viewpoint, Bollen (2011, p. 362) points out: “removal of a causal indicator might change the nature of the latent variable.” Nevertheless, he also notes (p. 362) that “if the coefficient of the causal indicator is not significant or is the wrong sign, it is possible that you have an invalid causal indicator. Decisions on whether to eliminate indicators must be made taking account of the theoretical appropriateness of the indicator and its empirical performance in the researcher’s and the studies of others.” The limited theoretical evidence, in combination with this very simple analysis, prevents us from relying on only one sample before eliminating formative indicators. Hence, advanced statistical procedures applied to another sample are used to validate the preliminary finding of stage 2 of this index development process, namely that novelty and variety are not valid indicators of the SEC dimensions (see Sect. 3.3.2).

3.3 Stage 3: Assessment of Construct Validity

3.3.1 Theoretical and Conceptual Considerations

While the empirical analysis in this section offers insights into both stages 2 and 3 of this index development process, the theoretical considerations deal with assessing the construct validity of each SEC dimension (aisles, architecture, customer flow, shelving and storage, signage, space allocation, technology, and visual merchandising).

In contrast to reflectively measured constructs, assessing the construct validity of formative constructs remains challenging. Formative constructs are not identified on their own (Bollen 2011), and treating formative indicators as reflective ones leads to biased coefficients, especially when the formative indicators have low intercorrelations (Jarvis et al. 2003). Recent literature suggests predetermining the weights based on theoretical reasoning (Cadogan et al. 2013; Lee and Cadogan 2013); that is, extant theoretical knowledge determines the contribution (i.e., weight) of each indicator.

However, Howell et al. (2007) list a number of shortcomings of such an approach. In essence, they conclude that predetermining formative indicator weights entails a high risk of information loss, due to the great variety of possible weight configurations. Based on this reasoning, they argue that subjective weighting of formative measures is only appropriate when a specific dependent phenomenon is of interest and little meaning and interpretation is attached to the index itself. In support of these notions, Diamantopoulos (2013) notes that setting fixed weights does not allow for testing the significance of indicators, and Diamantopoulos and Temme (2013, p. 162) report that, compared to alternative approaches that rely on predefined weights for indicators or composites of those, “only the MIMIC model’s fit could be considered acceptable according to conventional criteria.”

Complementing this line of thought, we note that the estimation of the weight parameters of each indicator is essential for assessing construct validity, as they capture the contribution of each formative indicator to the construct. Accordingly, indicators associated with non-significant weight parameters should be considered for elimination as they “cannot represent valid indicators of the construct” (Diamantopoulos et al. 2008, p. 1215). The estimation of multiple indicators and multiple causes (MIMIC) models is an alternative to predetermining indicator weights. In a MIMIC model, the under-identification of the formative measurement model is circumvented by adding three reflective indicators (Diamantopoulos and Riefler 2008). This approach is the preferred option for evaluating formative measurement models since it does not require the inclusion of additional constructs for model identification and thus adheres to the requirement of model parsimony. In addition, estimates of measurement parameters are more stable than structural ones (MacKenzie et al. 2005). Employing a MIMIC model transforms a formative measurement model into a function that predicts a linear combination of reflective indicators (Bollen 2011).
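As a rough illustration of why reflective indicators are needed, the following Python sketch counts the degrees of freedom of a single-construct MIMIC model from the number of formative indicators q and reflective indicators p. It assumes a standard specification (one reflective loading fixed to 1 for scaling, freely correlated causes, uncorrelated measurement errors); the function name `mimic_df` is hypothetical, and non-negative degrees of freedom are only a necessary, not a sufficient, condition for identification.

```python
def mimic_df(q: int, p: int) -> int:
    """Degrees of freedom of a single-construct MIMIC model (assumed spec).

    q: number of formative (cause) indicators
    p: number of reflective indicators
    """
    # Distinct variances and covariances among all observed variables.
    moments = (p + q) * (p + q + 1) // 2
    params = (
        q * (q + 1) // 2  # variances/covariances of the formative indicators
        + q               # formative weights (gammas)
        + (p - 1)         # free reflective loadings (one fixed to 1)
        + 1               # disturbance variance of the latent construct
        + p               # measurement error variances of reflective indicators
    )
    return moments - params


if __name__ == "__main__":
    print(mimic_df(q=2, p=3))  # two causes, three reflective composites
    print(mimic_df(q=2, p=1))  # a single reflective indicator is not enough
```

Under these assumptions, two formative indicators (e.g., complexity and conflict) combined with three reflective composites leave 4 degrees of freedom, whereas a single reflective indicator always yields a negative count, because the disturbance and the error variance cannot be separated.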

The major problem some scholars associate with MIMIC models is that the meaning of the latent variable is not theoretically grounded in the formative indicators but rather empirically based in the covariance between the latent variable and its reflective indicators (Treiblmaier et al. 2011). By including reflective items, the error term in MIMIC models does not depend on the conceptual meaning of the formative model but results from the unexplained variance when trying to predict the common factor of the formatively measured construct and its reflective indicators (Lee et al. 2013). Accordingly, the formatively measured latent construct changes when the number and content of the reflective indicators change.

In contrast, Diamantopoulos suggests that theoretical considerations should guide the interpretation of the MIMIC model. Alternatively, one could interpret the focal construct as being measured by its formative indicators while “impacting several directly observed variables” (Diamantopoulos 2013, p. 33). We consider both perspectives legitimate and make a particular effort to overcome the shortcoming of MIMIC models discussed by Cadogan et al. (2013). We propose that MIMIC models that include reflective indicators of a specific mediating variable overcome this limitation.

In the present research, each formatively measured confusion construct (e.g., signage confusion) is by definition related to the mediating mental state retail shopper confusion. Compelling empirical and theoretical results indicate that confusing store elements evoke the negative mental state of confusion (Garaus et al. 2015; Garaus and Wagner 2016). As such, the negative feelings associated with confusion (inefficiency, irritation, and helplessness) relate by definition to confusing store elements, whereas no theoretical arguments exist for any other mediating construct. Accordingly, confusing store elements cannot be modelled with any set of reflective indicators other than those of retail shopper confusion. This theoretical reasoning also precludes the application of Adams et al.’s (2003) approach, namely considering the formative indicators as independent predictors of a variety of outcomes. Conceptually, confusing store elements do not influence outcomes directly, but through the specific mediating mental state retail shopper confusion.

Based on these considerations, MIMIC models appear to be the most suitable approach for assessing the construct validity of the eight SEC dimensions. In addition, the procedure follows Bollen (1989), who suggests eliminating non-significant formative indicators as they do not represent valid indicators of the respective construct. Composite scores of the three feelings associated with confusion (inefficiency, irritation, and helplessness; Garaus and Wagner 2016) qualify as reflective indicators that allow for the estimation of MIMIC models, since each SEC dimension is likely to evoke feelings of inefficiency, irritation, and helplessness.

3.3.2 Empirical Validation

A second descriptive study (i.e., face-to-face interviews; data set II) provided fresh data for this confirmatory analysis. To capture many confusing shopping experiences from a great variety of consumers, a pre-defined quota (age, gender, and education) guided respondent selection. Thirty students in a graduate class on consumer behavior received course credit for interviewing at least twenty respondents each who had resided in the country under investigation for at least three months and regularly shopped for groceries. Students were allowed to interview family members, friends, or colleagues but had to strictly observe their individual quotas. Thus, the overall sample characteristics (age, gender, and education) match the quotas for the country under investigation. Respondents read the questionnaire on their own, but the interviewers could provide clarification in case of questions and ambiguities. The data collection lasted three weeks and resulted in 552 response records (after data cleaning, 53% females; age: 16% 15–24 years, 16% 25–34 years, 18% 35–44 years, 17% 45–54 years, 14% 55–64 years, 10% 65–74 years, 9% older than 75 years; education: 9% university, 14% high school, 16% vocational school, 32% apprenticeship, 29% compulsory school).

Individual interviews were structured as follows: (i) A cartoon stimulus exposed respondents to a confusing shopping situation. The cartoon showed a grocery store with many different and conflicting signs, various acoustic stimuli, a confusing store layout, and four obviously confused shoppers. (ii) The questionnaire started with an assessment of the quota criteria (age, gender, education). (iii) Afterwards, respondents evaluated the eight design factors according to their confusion potential in terms of variety, novelty, complexity, and conflict. Since these properties might have appeared to be too abstract, concrete examples for each environmental property based on the research of Garaus et al. (2015) were included. For instance, “no overview of where to find products” accounts for assessing aisle confusion for the environmental property complexity (see Table 5). (iv) Another series of items asked for consumers’ feelings (inefficiency, irritation, and helplessness) in a confusing shopping situation. (v) Finally, several items inquired about respondents’ avoidance behavior in a confusing shopping situation (see Appendix, Table 7).

Indicator Validation

The discussion in Sect. 3.1.1 pointed to a potential curvilinear relationship between confusion-induced arousal and (negative) affective evaluations. In the present context, this type of nonlinearity might apply to the relationship between the store environmental design factors and the evoked feelings (see Footnote 3). To keep the analysis tractable, it is carried out at an aggregate level (composites over all environmental design factors, x, and over the three types of feelings, y). Low-arousal situations (e.g., boredom) are not covered here; therefore, a piecewise linear relationship with only two regimes is postulated:

$$y_{t}=\begin{cases}\alpha_{10}+\alpha_{11}\,x_{t} & \text{for } x_{t}\leq x_{s}\\ \alpha_{20}+\alpha_{21}\,x_{t} & \text{for } x_{t}> x_{s}\end{cases}\qquad\text{and}\qquad \alpha_{10}+\alpha_{11}\,x_{s}=\alpha_{20}+\alpha_{21}\,x_{s}$$
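Because the continuity constraint \(\alpha_{10}+\alpha_{11}x_s=\alpha_{20}+\alpha_{21}x_s\) ties the two regimes together, the model can be rewritten with a single hinge term, which makes it linear in its parameters once \(x_s\) is fixed. This is a standard reparametrization, shown here only as a sketch; the estimation procedure actually used for Table 2 is not described in this section:

$$\alpha_{20}=\alpha_{10}-(\alpha_{21}-\alpha_{11})\,x_s \quad\Longrightarrow\quad y_t=\alpha_{10}+\alpha_{11}\,x_t+\delta\,(x_t-x_s)_+,\qquad \delta=\alpha_{21}-\alpha_{11},\qquad (u)_+=\max(u,0)$$

For \(x_t\le x_s\) the hinge term vanishes and the first regime is recovered; for \(x_t>x_s\) the slope becomes \(\alpha_{11}+\delta=\alpha_{21}\) and the intercept \(\alpha_{10}-\delta x_s=\alpha_{20}\). Testing \(\delta=0\) is therefore equivalent to testing whether the slopes of the two regimes differ.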

The threshold \(x_s\) can either be set to the midpoint of the scale (i.e., to 3) or treated as a continuous quantity and estimated econometrically. Table 2 presents the estimated regression parameters. The estimated threshold (\(\hat{x}_s = 2.10\)) roughly corresponds to the vertex of the functional relationship between arousal and (negative) affective evaluations outlined in Sect. 3.1.1; consistent with this, the corresponding slope parameter (\(\hat{\alpha}_{11}^{(1)} = -0.38\)) is not significantly different from zero (for a type I error of 0.05). However, because the data were collected on a discrete number of response categories, setting \(x_s\) a priori (to 3, the midpoint of the scale in the present case) is more appropriate. Conceptually, this threshold marks the range of arousal in which affective evaluations start to deteriorate substantially, i.e., where arousal leaves the interval of optimal arousal in the arousal-negative evaluation relationship. This pattern is reflected in the slope parameter of the second regime (\(\hat{\alpha}_{21}^{(2)} = 0.58\)), which is significantly different from the slope parameter of the first regime (\(\hat{\alpha}_{11}^{(2)} = 0.31\)). These results support the postulated piecewise linear relationship. Furthermore, the findings reinforce the earlier decision to exclude the environmental properties variety and novelty because of their limited confusion potential (as determined for the first data set, with global means not significantly different from 3; cf. Table 1).
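A minimal Python sketch of the two variants discussed above, assuming ordinary least squares with the hinge form \(y_t=\alpha_{10}+\alpha_{11}x_t+\delta(x_t-x_s)_+\), where \(\delta=\alpha_{21}-\alpha_{11}\): `fit_hinge` fits the model for a threshold fixed a priori, and `estimate_threshold` picks the threshold by a grid search over candidate values (one common approach, not necessarily the authors'). Both function names are hypothetical, and the usage data are synthetic.

```python
import numpy as np


def fit_hinge(x, y, x_s):
    """OLS fit of y = a10 + a11*x + delta*max(x - x_s, 0).

    The hinge term enforces continuity at x_s, so the second-regime
    parameters follow as a21 = a11 + delta and a20 = a10 - delta*x_s.
    Requires observations on both sides of x_s (else X'X is singular).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x, np.maximum(x - x_s, 0.0)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sse = float(resid @ resid)
    sigma2 = sse / (len(y) - X.shape[1])       # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # OLS covariance of coef
    a10, a11, delta = coef
    se_delta = float(np.sqrt(cov[2, 2]))
    return {
        "a10": a10,
        "a11": a11,
        "a20": a10 - delta * x_s,
        "a21": a11 + delta,
        "sse": sse,
        # t statistic for H0: delta = 0, i.e., equal slopes in both regimes
        "t_delta": delta / se_delta if se_delta > 0 else float("nan"),
    }


def estimate_threshold(x, y, grid):
    """Grid-search estimate of x_s: the candidate with the smallest SSE."""
    return min(grid, key=lambda c: fit_hinge(x, y, c)["sse"])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(1, 5, 400)  # synthetic aggregate design-factor scores
    y = 1.0 + 0.3 * x + 0.3 * np.maximum(x - 3.0, 0.0) + rng.normal(0, 0.1, 400)
    print(fit_hinge(x, y, 3.0))                               # fixed threshold
    print(estimate_threshold(x, y, np.linspace(2, 4, 41)))    # estimated threshold
```

With a fixed threshold, the slope comparison reduces to the t statistic of the single hinge coefficient; with an estimated threshold, conventional standard errors ignore the search over \(x_s\) and should be read with caution.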

Table 2 Piecewise Linear Relationship between Aggregated SEC Dimensions and Aggregated Feelings of Retail Shopper Confusion (based on data set II)

MIMIC Models

Following the suggestion of Diamantopoulos and Siguaw (2006), the measurement assessment of each formative construct started with a multicollinearity check. Table 3 presents the correlations between indicators and the variance inflation factors (VIF). No VIF value exceeded the threshold of 3 (Hair et al. 2006); multicollinearity is therefore not a concern for the present data.
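The VIF check can be sketched in a few lines of Python. This is an illustration only, assuming raw indicator scores in the columns of a matrix; the helper name `vif` is hypothetical, and how the VIFs in Table 3 were actually computed is not described here. With only two formative indicators per dimension, the VIF reduces to \(1/(1-r^2)\), so the threshold of 3 corresponds to an inter-indicator correlation of about \(|r| < 0.816\).

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (rows = respondents).

    Uses VIF_j = [R^{-1}]_jj, where R is the indicators' correlation matrix;
    this equals 1 / (1 - R_j^2) from regressing indicator j (with intercept)
    on all remaining indicators.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    complexity = rng.normal(size=500)                    # synthetic indicator
    conflict = 0.6 * complexity + rng.normal(size=500)   # synthetic indicator
    values = vif(np.column_stack([complexity, conflict]))
    print(values, values > 3)  # flag indicators above the threshold of 3
```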

Table 3 Correlation and VIF Values of Formative Indicators (based on data set II)

The analysis proceeded with the estimation of eight MIMIC models in Lisrel version 8.51 (Jöreskog and Sörbom 1997). Composite scores of the three feelings (inefficiency, irritation, and helplessness) were added as reflective indicators to each SEC dimension for model identification (see Fig. 2 for a MIMIC model of the environmental confusion dimension signage).

Fig. 2 MIMIC Model of Signage Confusion

All MIMIC models show moderate or large effect sizes. With regard to the indicator weights, the indicator conflict of the SEC dimension shelving and storage and the indicator conflict of the SEC dimension architecture are not significant. The architecture construct also shows poor overall fit, as does the SEC dimension technology (see Table 4). Therefore, these two dimensions (architecture and technology) were not considered further in the analysis. In contrast, the satisfactory fit of the dimension shelving and storage qualified it for retention in further analysis (see Fig. 3). Hence, following the suggestion of Diamantopoulos and Winklhofer (2001), the non-significant indicator was omitted, so that only the indicator complexity constitutes shelving and storage confusion. All remaining indicators exhibit significant weights. In sum, six design factors (aisles, customer flow, shelving and storage, signage, space allocation, and visual merchandising) and 11 indicators constitute the SEC index (see Table 5).

Table 4 Fit Statistics of MIMIC Models (SEC Dimensions) (based on data set II)

Fig. 3 Measurement Model of SEC

Table 5 SEC Index

3.4 Stage 4: Assessment of Nomological and Predictive Validity

3.4.1 Theoretical and Conceptual Considerations

The final step in the index development process assesses nomological validity, which involves specifying the relationships between the focal construct and related constructs and confirming the multidimensional nature of the focal construct (MacKenzie et al. 2011). In line with previous research (e.g., Garaus and Wagner 2016), SEC is thought to result in avoidance behavior (i.e., low unplanned expenditures, low store exploration, low revisit intentions, low store patronage intentions, and little time spent in the store). Drawing on extant literature (Garaus and Wagner 2016; Garaus et al. 2015), this link between SEC and avoidance behavior is expected to be indirect rather than direct, operating via the mediating state retail shopper confusion. Accordingly, retail shopper confusion, operationalized through the three feeling states inefficiency, irritation, and helplessness, is included as a mediating construct in the structural equation model.

3.4.2 Empirical Validation

Nomological Validation

The assessment of the nomological validity of the SEC construct also relied on data set II, used in stage 3. As detailed in Sect. 3.3.2, step (v) of the interview asked respondents to imagine a confusing shopping situation and to answer items that operationalize avoidance behavior (see Appendix).

Structural equation modeling tested the nomological validity of the SEC index. In line with prior research (cf. Landis et al. 2000), using composite scores for the three feelings associated with confusion (inefficiency, irritation, and helplessness; the mediating construct) and for the six SEC dimensions allowed us to estimate a parsimonious model (see Footnote 4).

The structural model fits the data well: all fit statistics meet the recommended threshold levels (χ²/df = 2.72, RMSEA = 0.05, SRMR = 0.05, GFI = 0.95, NNFI = 0.93, CFI = 0.95). In addition, the exogenous constructs exhibit satisfactory explanatory power for the endogenous constructs, with R² values between 0.16 and 0.46. All but two path coefficients were statistically significant for a type I error of 0.05. Specifically, the complexity and conflict of aisles (γ1 = 0.24, p < 0.01), customer flow (γ2 = 0.17, p < 0.01), signage (γ4 = 0.16, p < 0.01), and visual merchandising (γ6 = 0.27, p < 0.01) increased the negative feelings associated with confusion, which in turn resulted in avoidance behavior. Shelving and storage and space allocation did not influence the mental state retail shopper confusion; hence, in the current sample, the confusion potential of these two SEC dimensions appears to be low. Since both SEC dimensions exhibit satisfactory fit statistics at the construct level (i.e., MIMIC models, see Table 4), they are retained in the final SEC index (see Table 5).

The estimates for the hypothesized consequences of SEC, mediated through negative feelings associated with confusion, were all significant and in the proposed directions. That is, negative feelings associated with confusion decrease unplanned expenditures (β1 = −0.40, p < 0.01) and in-store search (β2 = −0.58, p < 0.01). The negative relationship between these feelings and repeat purchase intention (β3 = −0.68, p < 0.01) was the strongest path in the model. In addition, negative feelings were negatively related to store patronage intentions (β4 = −0.53, p < 0.01) and time spent in the store (β5 = −0.52, p < 0.01) (see Table 6).
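To illustrate how the reported paths combine along the mediation chain, the sketch below multiplies each significant γ by each β (the product-of-coefficients indirect effect). It assumes that the reported estimates are standardized and that the SEC dimensions affect the outcomes only through retail shopper confusion, as the model specifies. The products are point estimates derived here purely for illustration, not results reported by the study, and their significance would require a separate test (e.g., bootstrapping). The helper name `indirect_effects` is hypothetical.

```python
# Path estimates as reported in the text (assumed to be standardized).
GAMMA = {  # SEC dimension -> retail shopper confusion (significant paths only)
    "aisles": 0.24,
    "customer flow": 0.17,
    "signage": 0.16,
    "visual merchandising": 0.27,
}
BETA = {  # retail shopper confusion -> avoidance outcome
    "unplanned expenditures": -0.40,
    "in-store search": -0.58,
    "repeat purchase intention": -0.68,
    "store patronage intentions": -0.53,
    "time spent in the store": -0.52,
}


def indirect_effects(gamma, beta):
    """Product-of-coefficients indirect effect for every (dimension, outcome) pair."""
    return {(d, o): g * b for d, g in gamma.items() for o, b in beta.items()}


if __name__ == "__main__":
    effects = indirect_effects(GAMMA, BETA)
    for (dimension, outcome), value in sorted(effects.items(), key=lambda kv: kv[1]):
        print(f"{dimension:>22} -> {outcome:<27} {value:+.3f}")
```

For example, the implied indirect effect of aisles on repeat purchase intention is 0.24 × (−0.68) ≈ −0.16.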

Table 6 Structural Model Estimates of the SEC Nomological Framework (based on data set II)

Checking for Common Method Bias

Reliance on the same scale formats and measurement of all variables at the same point in time might result in systematic response behavior or artifactual covariance (Podsakoff et al. 2003). Two approaches examined whether common method bias had occurred. First, Harman’s one-factor test explored whether a single factor can account for all the variance in the data. The chi-square difference test between the structural equation model and a one-factor solution confirmed the superiority of the structural model (Δχ²(28) = 977.27, p < 0.01), which to some extent alleviates common method bias concerns. Second, a partial correlation procedure included a measure of a potential source of method variance as a covariate in the analysis: a marker variable controlled for common method bias (Lindell and Whitney 2001; Podsakoff et al. 2003). Specifically, one item asked respondents whether they prefer plastic or paper bags when shopping. Partialling this marker variable out of the zero-order correlations among the constructs did not change the signs or the significance levels of the factor loadings. Thus, common method bias did not appear to affect the findings. Overall, the results indicate a high degree of predictive and nomological validity for the SEC construct.
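Both checks rest on simple formulas, sketched below in Python under stated assumptions. `chi2_sf` computes the upper-tail p-value of a chi-square statistic with integer degrees of freedom from closed-form expressions (a stand-in for a statistics library). `marker_adjusted_r` and `marker_adjusted_t` implement the Lindell and Whitney (2001) adjustment as we understand it, \(r_A=(r_U-r_M)/(1-r_M)\) with a t test on \(N-3\) degrees of freedom, where \(r_M\) is the correlation attributed to method variance. All names are hypothetical, and the marker correlation in the usage example is made up, since the study's marker correlations are not reported in this section.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with integer df >= 1.

    Uses closed forms of the regularized upper incomplete gamma Q(df/2, x/2);
    terms are computed in log space to avoid overflow for large x.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    k, odd = divmod(df, 2)
    if not odd:
        # Q(k, y) = exp(-y) * sum_{i<k} y^i / i!
        return sum(math.exp(-y + i * math.log(y) - math.lgamma(i + 1)) for i in range(k))
    # Q(k + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i<k} y^(i+1/2) / Gamma(i + 3/2)
    tail = sum(math.exp(-y + (i + 0.5) * math.log(y) - math.lgamma(i + 1.5)) for i in range(k))
    return math.erfc(math.sqrt(y)) + tail


def marker_adjusted_r(r_u: float, r_m: float) -> float:
    """Marker-variable adjusted correlation (Lindell and Whitney 2001)."""
    return (r_u - r_m) / (1.0 - r_m)


def marker_adjusted_t(r_a: float, n: int) -> float:
    """t statistic (df = n - 3) for testing an adjusted correlation against zero."""
    return r_a / math.sqrt((1.0 - r_a ** 2) / (n - 3))


if __name__ == "__main__":
    print(chi2_sf(977.27, 28))                 # Harman chi-square difference test
    r_a = marker_adjusted_r(r_u=0.45, r_m=0.05)  # hypothetical correlations
    print(r_a, marker_adjusted_t(r_a, n=552))
```

Applied to the reported difference statistic, `chi2_sf(977.27, 28)` returns a p-value far below 0.01.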

4 Discussion, Implications, Limitations and Future Research

4.1 Discussion

Three objectives inspired the present study. The first was to delineate a conceptualization of SEC. The second was to develop a measurement instrument for assessing SEC. The third was to assess the nomological and predictive validity of the construct. Introducing this construct responds to the call to explore the confusion potential of store environments. In particular, this study extends Garaus et al.’s (2015) research by verifying the confusion potential of six store design elements (aisles, customer flow, shelving and storage, signage, space allocation, and visual merchandising) through the environmental properties complexity and conflict. In doing so, this study seeks to move the literature beyond a one-facet classification of the store environment and to explain why and how shoppers perceive store environmental stimuli as confusing.

Following established index development guidelines, two large-scale, quota-based data collections allowed us to construct this parsimonious index. The results provide strong empirical support for the confusion potential of store environments. Furthermore, the assessment of the nomological validity of the index offers insights into the relationships between SEC, feelings associated with confusion, and behavioral responses. In particular, the data confirm that SEC evokes feelings of inefficiency, irritation, and helplessness, which mediate the relationship between SEC and avoidance behavior. The results therefore confirm extant research that points to the negative consequences of confusing store environments (Garaus et al. 2015; Garaus and Wagner 2016). However, in contrast to these studies, we demonstrate the multidimensional nature of the SEC construct and relate store-induced confusion to negative feelings and avoidance behavior.

The present research is the first to explore the varying confusion potential of different design factors. The six design factors aisles, customer flow, shelving and storage, signage, space allocation, and visual merchandising (see Table 5 and Fig. 3) bear a high confusion potential. Although all six SEC dimensions exhibit satisfactory fit statistics in the assessment of their construct validity, only four (aisles, customer flow, signage, and visual merchandising) increase the negative feelings that constitute confusion in the nomological network in this particular sample. Aisles and visual merchandising bear a higher confusion potential than signage and customer flow. Accordingly, a lack of overview of where to find products (complexity) and aisles that are too narrow or too wide, barriers, difficulties in accessing other aisles, and poor transitions to other aisles (conflict) evoke negative feelings of confusion. Likewise, price labels that do not match products and a barely understandable, unstructured visual merchandising strategy (complexity), as well as conflicting price labels, wrong signage on shelves, and wrong promotional signage (conflict), represent the environmental properties evoking confusion for the visual merchandising dimension. Complex signage content or a large amount of information on signs reflects the environmental property complexity for the signage dimension. The environmental property conflict manifests itself in wrong signage, conflicting content, multiple signage, and similar signage. Finally, for the dimension customer flow, a labyrinthine structure, an unclear customer flow, and poor orientation are examples of the environmental property complexity. Examples of the environmental property conflict include the absence of customer paths to specific products and a conflicting or ambiguous customer flow.

The properties conflict and complexity are suitable to represent the SEC dimensions. Conflict (i.e., mismatch) leads to enhanced cognitive processing (Mattila and Wirtz 2001) and confusion (Beverland et al. 2006). In addition, complex design factors require greater cognitive processing than simple ones (Herrmann et al. 2013), inducing SEC. These findings further emphasize the need to consider SEC as a unique construct that differs considerably from product-induced confusion (in contrast, the properties variety and novelty are of particular relevance for products).

Finally, this article contributes to the extant literature by applying a new approach to testing for response bias likely caused by respondent fatigue. Study 1 analyzed response patterns using Mood’s (1940) asymptotic multinomial runs test, which assesses whether response runs occur at random or follow regular patterns. Even though most studies endeavor to use parsimonious scales to avoid response fatigue, the development of an initial item pool in scale development procedures often requires comprehensive questionnaires. Research emphasizes the risk of response bias when using extensive measurement instruments (Weijters and Baumgartner 2012), especially because capturing the entire construct in the early stage of scale development necessitates large initial item pools (Netemeyer et al. 2003). The present study offers an approach for testing for response fatigue, thereby creating the opportunity to eliminate participants who did not respond diligently. Eliminating biased responses further supports the development of a reliable and valid measurement instrument.
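Because this section only names the test, the following Python sketch shows one way to compute an asymptotic runs test for a sequence with two or more response categories. It assumes that each respondent's ordered item responses form the sequence; the moment formulas are the standard ones for runs among k categories under random ordering and reduce to the familiar two-category (Wald–Wolfowitz) formulas. How Study 1 applied the test (which tail, what cut-off, how flagged respondents were handled) is not described here, so the function name `multinomial_runs_test` and its input format are hypothetical.

```python
import math
from collections import Counter


def multinomial_runs_test(responses):
    """Asymptotic runs test for a sequence with two or more categories.

    Returns (runs, expected_runs, variance, z). With category counts n_i,
    n = sum n_i, S2 = sum n_i^2, and S3 = sum n_i^3, under random ordering:
        E[R]   = n + 1 - S2 / n
        Var[R] = (S2 * (S2 + n*(n + 1)) - 2*n*S3 - n**3) / (n**2 * (n - 1))
    """
    seq = list(responses)
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two responses")
    # A run is a maximal block of identical consecutive responses.
    runs = 1 + sum(a != b for a, b in zip(seq, seq[1:]))
    counts = Counter(seq).values()
    s2 = sum(c ** 2 for c in counts)
    s3 = sum(c ** 3 for c in counts)
    expected = n + 1 - s2 / n
    variance = (s2 * (s2 + n * (n + 1)) - 2 * n * s3 - n ** 3) / (n ** 2 * (n - 1))
    if variance <= 0:
        # e.g., a respondent who ticked the same category throughout
        raise ValueError("zero variance: runs test undefined for this sequence")
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, variance, z


if __name__ == "__main__":
    # Hypothetical 5-point responses of one respondent, in questionnaire order.
    answers = [4, 4, 3, 5, 2, 4, 3, 3, 1, 5, 4, 2, 3, 4, 5, 3]
    print(multinomial_runs_test(answers))
```

A strongly negative z (too few runs) points to straight-lining, whereas a strongly positive z (too many runs) points to mechanical alternation; both are regular rather than random patterns.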

4.2 Implications

The findings of this research have several practical implications. So far, efforts to reduce confusion during shopping situations have been confined to the realm of manufacturing (e.g., by using less similar packaging). This research is the first to offer retailers a measurement instrument that helps them reduce the confusion potential of their store environments. Two large-scale studies showed that complex and conflicting aisles, customer flows, signage, and visual merchandising bear the highest confusion potential. Managers might use this information to revise their aisle and customer flow plans to reduce confusion in their stores. For example, complexity in aisles implies a lack of organization, whereas good organization helps customers find products and recognize blind alleys. Conflict-ridden aisles might be too narrow or too wide, difficult to access, or may not be aisles at all (e.g., star layout). Complexity conveyed by customer flow likely stems from a labyrinthine structure, unclear or poor traffic patterns, or a confusing shop-in-shop design. An ambiguous, unclear, or unidentifiable flow pattern reflects the conflict element. Hence, retailers are encouraged to pay particular attention to these design factors, and store designers should take the confusing environmental properties complexity and conflict into account.

However, the confusion potential likely differs among stores. At the individual store level, the SEC index developed here enables retailers to identify sources of confusion within a particular store. Assessing SEC points to design factors that managers can readily manipulate to reduce the confusion potential of the entire store environment and to create clear, non-confusing store environments. These insights are highly relevant for retailers.

4.3 Limitations

Despite the important implications of this study, the findings must be interpreted with several limitations in mind. There is an ongoing discussion on the validity of MIMIC models (Diamantopoulos 2013; Diamantopoulos and Temme 2013; Lee et al. 2013; Lee and Cadogan 2013; Cadogan et al. 2013). Many theoretical considerations guided our decision to rely on MIMIC models in this index development process. However, we agree that the applicability of MIMIC models in formative index construction remains open to debate. Even though we believe we have addressed one major criticism of MIMIC models (namely, variance explained by different sets of reflective indicators), we acknowledge that MIMIC models are associated with other problems and that further procedures for testing formative measurement models would enrich the extant literature.

Although much effort was devoted to eliminating potential common method bias, neither of the statistical approaches used in the present study can completely rule out common method variance. Additional data collection is required to provide further evidence on the robustness of the results of the nomological framework of SEC.

4.4 Further Research

These results cannot be generalized to different cultures without further research. Asian customers might perceive narrow aisles as less confusing because, for example, they have different acceptance levels of spatial distance. Hence, in Asia aisle confusion might not be as important as in European countries. Testing the whole SEC framework in various cultural contexts would represent a valuable avenue for future research.

Exploring different retail sectors concerning their inherent confusion potential might be worthwhile. The developed SEC index can be employed to identify industries that exhibit more pronounced confusion potential. Additionally, confusion causes might include the redesign of a whole store or the employment of new technologies. Extending these considerations, analysis of interactions between drivers of confusion offers further research options. In this context, it could be worth comparing SEC sources among different industries (e.g., grocery stores vs. clothing stores).

It is also reasonable to assume that SEC is not only of high relevance to retail industries but also to service settings. Hence, future studies might expand the SEC construct to a service context.

Another fruitful area of future research is the interplay between product-induced and store-induced confusion. Since retailers cannot reduce the confusion potential of products (i.e., similarity of products), it would be particularly interesting to investigate whether a pleasant and clear store design can reduce the overall confusion level during a shopping experience.