Introduction

Categories are central to virtually all cognitive processes. Much effort has been devoted to understanding how categories are represented and the particular training features that might influence how they are learned (e.g., Markman & Ross, 2003). Outside of the laboratory, however, learning a category representation is not typically an end in and of itself. Instead, the utility of category representations lies in their ability to support other functions (e.g., decision making in novel situations – Hoffman & Rehder, 2010; Markman & Ross, 2003) and the generalizability of category representations depends upon the nature of the representation itself (Carvalho & Goldstone, 2014; Ell, Smith, Peralta, & Helie, 2017; Hélie, Shamloo, & Ell, 2017; Hoffman & Rehder, 2010; Levering & Kurtz, 2015). Thus, it is important to understand the limits of different types of category representations and to identify training features that promote those representations that are most successful for generalization.

Category representations that focus on within-category similarities (e.g., prototypicality, covariation/range of stimulus dimensions within a category) have been argued to be more versatile in supporting generalization than representations that focus on between-category differences (e.g., learn what dimensions are relevant for classification, along with decision criteria or category boundaries) (Chin-Parker & Ross, 2002, 2004; Ell et al., 2017; Helie, Shamloo, & Ell, 2018; Hélie et al., 2017; Kattner, Cox, & Green, 2016; Yamauchi & Markman, 1998). For instance, within-category representations can support both generalization to novel stimuli and generalization to a novel task (Chin-Parker & Ross, 2002; Ell et al., 2017). Furthermore, within-category representations can be applied to novel categorization problems (Hélie et al., 2017; Kattner et al., 2016) and be reconfigured to form new category representations (Helie et al., 2018).

Although a number of methodological factors have been identified as being important for promoting within-category representations (e.g., blocked training – Carvalho & Goldstone, 2014; concept learning – Hélie et al., 2017; observational training – Levering & Kurtz, 2015; family-resemblance category structures – Markman & Ross, 2003), the emphasis of the present work is on the goal of the task (Goldstone, 1996; Hoffman & Rehder, 2010; Love, 2005; Markman & Ross, 2003; Minda & Ross, 2004; Yamauchi & Markman, 1998). The task goal of classifying a stimulus into one of a number of contrasting categories has been argued to lead to a between-category representation (Erickson & Kruschke, 1998; Hélie et al., 2017; Maddox & Ashby, 1993; Nosofsky, Palmeri, & McKinley, 1994; Smith & Minda, 2002), whereas the task goal of inferring a missing stimulus feature from a partial stimulus and a category label has been argued to lead to a within-category representation (Chin-Parker & Ross, 2002; Ell et al., 2017; Markman & Ross, 2003).

The importance of classification versus inference in promoting within-category representations has been argued to depend upon the category structure (Ell et al., 2017; Hélie et al., 2017). Information-integration category structures (in which information from multiple dimensions needs to be integrated prior to making a categorization response) generally promote within-category representations (Ashby & Waldron, 1999; Ell et al., 2017; Hélie et al., 2017; Thomas, 1998). In contrast, although rule-based category structures (in which logical rules are applied to the stimulus dimensions diagnostic of category membership) can promote within-category representations when learned by inference, rule-based structures may be incapable of promoting within-category representations when learned by classification (Ell et al., 2017; see Footnote 1).

An inability to learn within-category representations, however, may not be a general feature of rule-based category structures. Ell et al. (2017) used a unidimensional, rule-based structure in which the stimuli varied along two continuous-valued dimensions, but only a single stimulus dimension was diagnostic of category membership. Thus, successful classification depended upon a single dimension, but the within-category representation (i.e., knowledge of the correlational structure of the categories) depended upon both stimulus dimensions. The inability to learn and generalize within-category representations when classifying a rule-based structure was interpreted as reflecting a limitation of the between-category representation (i.e., the logical rule used for classification). While this may be true when the classification rule depends upon a single stimulus dimension, it is also possible that within-category representations could be learned if the classification rule depended upon the same number of dimensions as the within-category representation.

The following experiments investigate this issue using a two-dimensional, rule-based category structure (i.e., exclusive-or; Fig. 1, bottom). In this category structure, successful classification (i.e., classifying stimuli as a member of category A or B) requires attention to both stimulus dimensions. Similarly, successful inference (i.e., inferring a missing stimulus feature when given one feature and the category label) also requires attention to both stimulus dimensions. Although the between-category representation (i.e., the logical rule: members of category A either have larger circles and steeper lines, or smaller circles and shallower lines, than members of category B) would convey some rudimentary information about the within-category correlations, it is not at all clear if this information would be sufficient to support generalization from classification to inference.

Fig. 1

(Left) Example displays for the two training methodologies. (Right) Rule-based category structure used in Experiments 1 and 3. Category A (crosses) and B (circles) stimuli used during the training phase. The insets are example stimuli. The solid black boundaries represent the optimal conjunctive decision strategy. The dashed black boundaries represent an alternative decision strategy. Stimuli used during the test phase are plotted as filled red circles. Probe stimuli used during the final block of training are plotted as blue squares. See text for details (color figure is provided online)

Briefly, across three experiments, participants were trained on classification or inference and subsequently tested on inference. If it is not possible to learn within-category representations when classifying a rule-based structure, only participants trained by inference should evidence knowledge of the within-category correlations at test. In contrast, if attending to multiple stimulus dimensions during training is a critical factor promoting the learning of within-category representations, participants in both conditions should evidence knowledge of the within-category correlations at test. To foreshadow, the results support the latter hypothesis, suggesting that within-category representations can be learned in a rule-based task and generalized to a novel task (i.e., from classification to inference).

Experiment 1

Method

Participants and design

One hundred and nineteen participants were recruited from the University of Maine student community and received partial course credit for participation. Sample size (approximately 30 participants/condition) was estimated based upon a similar experiment in our lab (Ell et al., 2017). Data collection was continued beyond this target (until the end of the semester) in order to provide sufficient research opportunities for participants in an introductory psychology research pool. Participants were randomly assigned to one of two experimental conditions: classification or inference training. A total of nine participants were excluded from analyses: seven due to a software error and two because they did not complete the task within the hour-long experimental session, resulting in a sample size of 55 in each condition. All participants reported normal (20/20) or corrected-to-normal vision.

Stimuli and apparatus

The stimuli comprised a circle (varying continuously in diameter) with an attached line (varying continuously in orientation from horizontal) (Fig. 1, top). The category structures were created using a variation of the randomization technique (Ashby & Gott, 1988) in which the stimuli were generated by sampling from bivariate normal distributions defined in a diameter × angle (from horizontal) space in arbitrary units. The category means for the stimuli in each of the four quadrants of Fig. 1 (two per category) were μA1 = [650, 250], μA2 = [350, -50], μB1 = [350, 250], and μB2 = [650, -50]. The covariance matrices were \( \Sigma_{\mathrm{A}}=\begin{bmatrix}3875 & 3625\\ 3625 & 3875\end{bmatrix} \) and \( \Sigma_{\mathrm{B}}=\begin{bmatrix}3875 & -3625\\ -3625 & 3875\end{bmatrix} \) (i.e., a diameter-angle correlation of approximately +.94 within each quadrant assigned to category A and approximately -.94 within each quadrant assigned to category B).
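For concreteness, the sampling procedure can be sketched as follows (a minimal illustration in Python/NumPy rather than the original MATLAB implementation; the seed and function names are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Quadrant means in the diameter x angle space (arbitrary units)
MEANS = {"A1": (650, 250), "A2": (350, -50), "B1": (350, 250), "B2": (650, -50)}

# Positive diameter-angle covariance for category A quadrants, negative for B
COV = {"A": [[3875, 3625], [3625, 3875]], "B": [[3875, -3625], [-3625, 3875]]}

def sample_quadrant(label, n):
    """Draw n (diameter, angle) samples for one quadrant (e.g., 'A1')."""
    return rng.multivariate_normal(MEANS[label], COV[label[0]], size=n)

# One training block: 80 stimuli, 20 per quadrant (40 per category)
block = np.vstack([sample_quadrant(q, 20) for q in MEANS])
```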

On each trial a random sample (x, y) was drawn from category A or B and used to create a stimulus with a circle of \( x/2 \) pixels in diameter and a line oriented \( 180y/800 \) degrees counterclockwise from horizontal with a length of 200 pixels. The line was always connected to the highest point of the circle. For the training phase, 80 stimuli (40 from each category, 20 from each quadrant) were generated for each of the four blocks of trials (black symbols in Fig. 1). For the test phase, 56 stimuli (28 from each category, 14 from each quadrant) were used for the single test block (red circles in Fig. 1). The experiment was run using the Psychophysics toolbox (Brainard, 1997; Kleiner et al., 2007; Pelli, 1997) in the MATLAB computing environment. Each stimulus was displayed on a 1,600 × 1,200 pixel resolution 20-in. LCD with a viewing distance of 20 in.
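The mapping from sampled coordinates to screen units is a simple linear rescaling; the helper below is an illustrative sketch (not taken from the original experiment code):

```python
def to_screen_units(x, y):
    """Convert a sampled (x, y) pair from arbitrary units to screen units."""
    diameter_px = x / 2          # circle diameter in pixels
    angle_deg = 180 * y / 800    # line angle, degrees counterclockwise from horizontal
    return diameter_px, angle_deg

# Example: the quadrant mean (650, 250) becomes a 325-pixel circle with a
# line oriented 56.25 degrees above horizontal
print(to_screen_units(650, 250))  # (325.0, 56.25)
```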

Participants were expected to use a conjunctive strategy in the classification task (e.g., the solid black decision boundaries plotted in Fig. 1: If the circle is large and high on orientation or if the circle is small and low on orientation, respond A; otherwise respond B), but given the large separation between stimuli in the four quadrants, other strategies could also result in high levels of accuracy. For instance, a strategy assuming participants integrate the stimulus values prior to any decision process would also predict high levels of performance during training (e.g., the linear classifier plotted as dashed black boundaries in Fig. 1; see Appendix for more details). To address this issue, probe stimuli (16 total) were included in the final block of classification training, resulting in a total of 96 trials during the final block (light blue squares in Fig. 1; Table 1). The example conjunctive and linear classifiers plotted in Fig. 1 would predict different categorization responses for a subset of the probe stimuli. For instance, for the two circled probe stimuli, the conjunctive classifier (solid) would predict a category A response whereas the linear classifier (dashed) would predict a category B response. To keep the training conditions as similar as possible, probe stimuli were also included in the inference condition. Because the probe stimuli were not members of either category, there was no correct or incorrect response to them; accordingly, no feedback was provided on probe trials and the probe stimuli were excluded from accuracy analyses. Probe trials were only used to estimate individual participant decision strategies in the classification condition.

Table 1 Probe stimuli coordinates (arbitrary units)

Procedure

Each participant was run individually. At the beginning of the training phase, participants were informed that stimuli would comprise a circle with a line connected at the top, and that the stimuli would be presented individually, but would vary across trials in circle diameter and line angle. In the classification condition, participants were instructed that their goal was to learn to distinguish between members of category A and B by trial and error. On each trial, participants were shown a stimulus, prompted with “Is this image a member of category ‘A’ or category ‘B?’”, and instructed to indicate their choice by pressing a key labeled ‘A’ or a key labeled ‘B’ on the keyboard.

In the inference condition, participants were instructed that their goal was to learn to draw the missing stimulus component by trial and error (Fig. 1). On each trial, either a circle or a line was presented with the category label. Participants were instructed to draw the missing stimulus component. On half of the trials they were asked to “Draw the circle that goes with this line angle” and on the other half they were asked to “Draw the line that goes with this circle.” To draw the circle, participants used the mouse to indicate the location of the bottom of the circle (indicating the diameter of the circle relative to the dot at the beginning of the line). To draw the line, participants used the mouse to indicate the location of the end of the line (indicating the orientation of the line relative to horizontal). The circle or line was then drawn to match the participant’s selection, with the line always beginning at the dot at the top of the circle and having a constant length of 200 pixels. Subsequently, participants were able to fine-tune the circle diameter or the line angle using the arrow keys on the keyboard. Any selected stimulus values outside the allowable range (diameter: 10–600 pixels; angle: 50–110°) were reset to the nearest allowable value.

Stimulus presentation was response terminated with an upper limit of 60 s. After responding, feedback was provided. In the classification condition, the screen was blanked and the word “CORRECT” (in green, accompanied by a 500-Hz tone) or “WRONG” (in red, accompanied by a 200-Hz tone) was displayed. In the inference condition, the correct circle or line was overlaid upon the participant’s response (in black). In all conditions, feedback duration was 2 s and the screen was then blanked for 1 s prior to the appearance of the next stimulus.

In addition to the trial-by-trial feedback, summary feedback was given at the end of each training block. For the classification condition, proportion correct for the block was shown (participants were informed that higher numbers are better) and for the inference condition the root-mean-square error between the drawn and correct stimulus values was shown (participants were informed that lower numbers are better). The presentation order of the stimuli was randomized within each block, separately for each participant. Participants completed several practice trials prior to beginning the training phase to familiarize themselves with the task, using stimuli randomly sampled (with equal probability) from the training categories.
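For clarity, the block-level inference feedback can be written as \( \mathrm{RMSE}=\sqrt{\tfrac{1}{B}\sum_{i=1}^{B}(\hat{v}_i - v_i)^2} \), where \( \hat{v}_i \) is the drawn value and \( v_i \) the correct value of the missing dimension on trial \( i \) of a block of \( B \) trials (the assumption that the error is aggregated over all trials of the block, regardless of which dimension was missing, is ours).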

During the test phase, all participants performed the inference task (one block of 56 trials). Inference task instructions were provided to participants in both conditions, and participants completed several practice trials using stimuli randomly sampled (with equal probability) from the test-phase stimuli. No feedback was provided during the test phase.

Results

Training phase: Performance on classification and inference

In the inference condition, the diameter-angle correlations within each quadrant were in the appropriate direction (i.e., positive for the upper right and lower left, negative for the lower right and upper left), thus the following analyses average across the quadrants (see Footnote 2). Data were also averaged across quadrants in the classification condition, and for all subsequent analyses, unless otherwise noted.

The dependent measure was different for the classification (proportion correct) and the inference (correlation between the given and produced stimulus values) conditions; therefore, the data from each condition were analyzed separately. Performance generally improved across blocks for both conditions (Fig. 2, Table 2). Consistent with this observation, separate paired-samples t-tests indicated significant increases from block 1 to block 4 in proportion correct for the classification condition: [t(54) = -4.491, p < .001, d = .72] and in the diameter-angle correlation for the inference condition: [t(54) = -4.454, p < .001, d = .52].

Fig. 2

Training performance in the classification (proportion correct) and inference (correlation between the given and produced stimulus values) conditions of Experiment 1

Table 2 Training and test phase performance in Experiment 1

Training phase: Classification decision strategy

Participants were expected to learn conjunctive strategies in the classification condition. In order to confirm this, a number of decision bound models (Ashby, 1992a; Maddox & Ashby, 1993) were fit to the individual participant data from the classification condition. Four different types of models were evaluated in order to assess an individual’s strategy during the final training block. Unidimensional models assume that the participant sets a single decision criterion on one stimulus dimension (e.g., if the circle is large, respond A; otherwise respond B). Conjunctive models assume separate decision criteria on both dimensions (e.g., if the circle is large and high on orientation or if the circle is small and low on orientation, respond A; otherwise respond B; Fig. 1). Information-integration models assume that the participant integrates the stimulus information from both dimensions prior to making a categorization decision (Fig. 1). Finally, random responder models assume that the participant guessed. Each model was fit separately to the final block of training (including the probe stimuli), for each participant, using a standard maximum likelihood procedure for parameter estimation (Ashby, 1992b; Wickens, 1982) and the Bayes information criterion for goodness-of-fit (Schwarz, 1978) (see Appendix for a more detailed description of the models and fitting procedure). Based upon our previous work (Ell et al., 2017), the use of information-integration strategies during classification training, but not unidimensional strategies or guessing, would be expected to promote within-category representations that could be used to support performance on the test-phase inference task. If simply attending to both stimulus dimensions during classification training is sufficient to promote within-category representations, then participants using conjunctive strategies would also be expected to perform well during the test-phase inference task. Consistent with expectations, the majority of participants learned a task-appropriate, conjunctive strategy (64%), with the remaining participants being best fit by either the unidimensional (9%) or random responder (27%) models.
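To illustrate the general fitting approach (a simplified sketch rather than the procedure detailed in the Appendix; the probit response function, parameterization, and simulated data below are assumptions), a unidimensional decision-bound model can be fit by maximum likelihood and scored with BIC as follows:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def nll_unidimensional(params, x, resp_a):
    """Negative log-likelihood of a unidimensional decision-bound model:
    a single criterion on one dimension plus normally distributed
    perceptual/criterial noise (log-transformed to keep it positive)."""
    criterion, log_noise = params
    p_a = norm.cdf((x - criterion) / np.exp(log_noise))  # P(respond 'A' | x)
    p_a = np.clip(p_a, 1e-6, 1 - 1e-6)                   # avoid log(0)
    return -np.sum(np.where(resp_a, np.log(p_a), np.log(1 - p_a)))

def bic_unidimensional(x, resp_a):
    """Fit the model and return its BIC = k*ln(N) + 2*NLL (lower is better)."""
    fit = minimize(nll_unidimensional, x0=[np.median(x), np.log(np.std(x))],
                   args=(x, resp_a), method="Nelder-Mead")
    return 2 * np.log(len(x)) + 2 * fit.fun              # k = 2 free parameters

# Simulated example: a 'participant' who responds A whenever the (noisy)
# circle diameter exceeds 500
rng = np.random.default_rng(1)
diameter = rng.uniform(200, 800, size=96)
responses_a = diameter + rng.normal(0, 50, size=96) > 500
print(bic_unidimensional(diameter, responses_a))
```

The conjunctive and information-integration models extend this template with additional criteria or a linear combination of the two dimensions, and the model with the lowest BIC is taken as the best description of a participant's strategy.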

Test phase

Initial inspection of the correlations during the test phase suggests the learning of the correlational structure of the categories in both the inference and classification training conditions (Fig. 3). To analyze these data, one-sample t-tests (within each condition), an independent-samples t-test comparing the conditions, and the scaled JZS Bayes Factor, B01 (Jeffreys, 1961; Kass & Raftery, 1995; Rouder, Speckman, Sun, Morey, & Iverson, 2009) were computed. Consistent with the inspection of the Fig. 3 data, the correlation during test was significantly greater than zero in both the inference [t(50) = 6.22, p < .001, d = .87; B01 = 142785.4 to 1 in favor of the alternative hypothesis] and classification [t(53) = 5.88, p < .001, d = .80; B01 = 53171.69 to 1 in favor of the alternative hypothesis] conditions (see Footnote 3). For the classification condition, this result was driven primarily by participants using a task-appropriate, conjunctive strategy (conjunctive: M = .28, SD = .21; unidimensional: M = -.10, SD = .11; random responder: M = .05, SD = .17). An independent-samples t-test comparing the two conditions, however, indicated superior test-phase performance in the inference condition [t(103) = 2.26, p = .03, d = .44; B01 = 1.96 to 1, weakly favoring the alternative hypothesis].
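The JZS Bayes factor can be computed directly from a t statistic and sample size; the sketch below follows the one-sample formulation of Rouder et al. (2009). The prior scale (r = √2/2) is a common default and an assumption here, so the values it returns need not match the reported figures exactly:

```python
import numpy as np
from scipy.integrate import quad

def jzs_bf10(t, n, r=np.sqrt(2) / 2):
    """One-sample/paired JZS Bayes factor in favor of the alternative (BF10)."""
    df = n - 1
    null_marginal = (1 + t**2 / df) ** (-(df + 1) / 2)
    def integrand(g):  # integrate over the JZS prior on the scale parameter g
        a = 1 + n * g * r**2
        return (a ** -0.5 * (1 + t**2 / (a * df)) ** (-(df + 1) / 2)
                * (2 * np.pi) ** -0.5 * g ** -1.5 * np.exp(-1 / (2 * g)))
    alt_marginal, _ = quad(integrand, 0, np.inf)
    return alt_marginal / null_marginal

# Test-phase correlation in the inference condition: t(50) = 6.22, n = 51
print(jzs_bf10(6.22, 51))  # large values favor the alternative hypothesis
```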

Fig. 3

Performance on the inference task during the test phase (left). Note that the diameter-angle correlations from category B are multiplied by -1 prior to averaging with the diameter-angle correlations from category A, thus positive values suggest learning of the within-category correlations. Relationship between learning during training and test phase performance in the classification (middle, r = .57) and inference (right, r = .62) conditions. The grey areas represent 95% confidence intervals

If test-phase performance is driven by learning during the training phase, the amount of learning during the training phase should be predictive of test-phase performance. To assess this, in the classification condition, the Pearson correlation was computed between the change in accuracy across blocks (block 4 minus block 1) and the observed diameter-angle correlation during the test phase. In the inference condition, the Pearson correlation was computed between the change in the observed diameter-angle correlation (block 4 minus block 1) and the observed diameter-angle correlation during the test phase. There was a significant positive relationship between learning during training and test-phase performance in both the classification: [r(52) = .57, p < .001] and inference: [r(49) = .62, p < .001] conditions. The strength of this relationship, however, did not differ between the classification and inference conditions [Fisher’s z = 0.36, p = .72]. In sum, these data suggest learning of the within-category representations for both the classification and inference conditions, with a possible advantage for participants in the inference condition.
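The comparison of the two training–test correlations uses Fisher's r-to-z transformation; a brief sketch is given below (sample sizes are inferred from the reported degrees of freedom, so the exact z and p may differ slightly from those reported):

```python
import numpy as np
from scipy.stats import norm

def compare_correlations(r1, n1, r2, n2):
    """Two-tailed test of the difference between two independent Pearson
    correlations via Fisher's r-to-z transformation."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    z = (z1 - z2) / np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return z, 2 * norm.sf(abs(z))  # z statistic and two-tailed p value

# Classification: r = .57, df = 52 (n = 54); inference: r = .62, df = 49 (n = 51)
print(compare_correlations(0.57, 54, 0.62, 51))
```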

Summary

The goal of Experiment 1 was to determine if classification of a two-dimensional, rule-based category structure was sufficient to support the learning of within-category representations or if an inability to learn within-category representations is a more general feature of rule-based structures (Ell et al., 2017). Consistent with the former, participants demonstrated knowledge of the within-category correlations at test in both the inference and classification conditions, although test phase performance in the inference condition was superior. That being said, training phase performance was positively associated with test phase performance to a similar extent in both conditions. In sum, these data suggest that learning to classify a rule-based structure that requires attention to multiple stimulus dimensions is sufficient to support the learning of within-category representations that can be generalized to a novel task (i.e., from classification to inference).

Experiment 2

The results of Experiment 1 suggest that inference training may be superior to classification in promoting the learning of within-category representations, but there was evidence that within-category representations were learned in the classification condition as well. This latter result may be a consequence of the need to attend to both stimulus dimensions for successful performance during training, but there is another possible explanation. The Experiment 1 analysis computed the observed diameter-angle correlation within each quadrant of the stimulus space, and then averaged these results across the two quadrants assigned to each category, in order to estimate the within-category representations. There was, however, a congruence between the local diameter-angle correlation within each quadrant and the global correlation within each category (e.g., positive within the two quadrants assigned to category A and positive within category A, across the stimulus space). Thus, another possibility is that this local-global congruence facilitated learning of the within-category representations. Experiment 2 investigates this question using a category structure in which the local diameter-angle correlation within each quadrant is incongruent with the global within-category correlation (e.g., negative within the two quadrants assigned to category A and generally positive within category A, across the stimulus space – see Fig. 4). If learning within-category representations in the classification condition is dependent solely upon a need to attend to both stimulus dimensions, participants should still evidence knowledge of the within-category correlations at test. If, instead, the local-global congruence is critical, it should be difficult for participants to learn the within-category correlations. It is expected that participants in the inference condition will still be able to learn the within-category correlations with the Fig. 4 structure, but it is possible that inference too would be sensitive to a local-global incongruence.

Fig. 4

Conjunctive category structure used in Experiment 2. Category A (crosses) and B (circles) stimuli used during the training phase. Stimuli used during the test phase are plotted as filled red circles. Probe stimuli used during the final block of training are plotted as blue squares (color figure available online)

Method

Participants and design

Seventy-four participants were recruited from the University of Maine student community and received partial course credit for participation. Participants were randomly assigned to one of two experimental conditions: classification or inference training. Two participants were excluded from analyses due to a software error, resulting in sample sizes of 34 (classification) and 38 (inference). All participants reported normal (20/20) or corrected-to-normal vision.

Stimuli, apparatus, and procedure

The stimuli and procedure were identical to Experiment 1 with one exception. The stimuli within each quadrant of the stimulus space were rotated 45° (about the quadrant mean) in order to reduce the congruence between the diameter-angle correlation within each quadrant and the diameter-angle correlation within each category (Fig. 4).
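For concreteness, a rotation of this kind amounts to the following affine transformation (an illustrative sketch; the direction of rotation is determined by the Fig. 4 structure rather than by this code):

```python
import numpy as np

def rotate_about_mean(points, degrees=45):
    """Rotate an n x 2 array of (diameter, angle) points about their mean.
    The quadrant mean is unchanged; only the local covariance (and hence the
    local diameter-angle correlation) is altered."""
    theta = np.radians(degrees)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    center = points.mean(axis=0)
    return (points - center) @ rot.T + center
```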

Results

Training phase

Only participants in the classification training condition showed learning of the category structures, as accuracy was higher in block 4 than in block 1: [t(33) = -4.36, p < .001, d = .76] (Fig. 5, Table 3). There was no significant increase in the diameter-angle correlation from block 1 to block 4 in the inference condition: [t(37) = -0.28, p > .78, d = .06].

Fig. 5

Training performance in the classification and inference conditions of Experiment 2

Table 3 Training and test phase performance in Experiment 2

The decision-bound models described in Experiment 1 were fit to the final training block in the classification training condition. A majority of the participants in the classification condition learned a task-appropriate, conjunctive strategy (53%) with the remaining participants being best fit by the unidimensional (10%), information-integration (3%), or random responder (33%) models.

Test phase

Inspection of the test phase data revealed that participants often mis-estimated the direction of the diameter-angle correlation across quadrants of the stimulus space in both conditions (Table 4). Due to this issue, the correlations were not averaged across quadrants. Instead, diameter-angle correlations were evaluated within each quadrant against the critical value M = ±.11 [estimated using α = .05, two-tailed, t(33) = 2.04, and an average SD = .31]. The diameter-angle correlations in the lower left quadrant (classification) and the upper left quadrant (inference) were significantly different from 0 and in the opposite direction of the actual correlation. The correlations in the remaining quadrants were not significantly different from 0. In sum, there was no evidence that participants were able to learn the within-category correlations with the Fig. 4 category structures.
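This criterion follows directly from the one-sample t statistic; with n = 34 (i.e., df = 33) and the reported average SD, \( M_{\mathrm{crit}} = \pm\, t_{\mathrm{crit}}(33)\,\frac{SD}{\sqrt{n}} = \pm\, 2.04 \times \frac{.31}{\sqrt{34}} \approx \pm .11 \).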

Table 4 Within-category correlations by quadrant

Summary

The goal of Experiment 2 was to investigate if the learning of within-category representations while classifying was dependent upon a congruence between the local, diameter-angle correlations within each quadrant and the global diameter-angle correlations within each category. The results suggest that this was the case. Although participants learned to classify the Fig. 4 structure, there was no evidence at test that within-category representations had been learned. Unexpectedly, this was also true in the inference condition. The results of Experiment 1, along with previous work from our lab (Ell et al., 2017), suggested that inference training facilitated the learning of within-category representations regardless of the category structure. The results of Experiment 2 suggest that even for inference training, there is a limit to the learning of within-category representations. In sum, the learning of within-category representations, with the rule-based structures investigated here, is dependent upon a local-global congruence regardless of the task goal.

Experiment 3

The results of Experiments 1 and 2 suggest that inference training more strongly promotes the learning of within-category representations, at least when there is a congruence between local and global regions of the stimulus space. This advantage may be driven by a practice effect, given that participants in the inference condition performed the same task during the training and test phases whereas participants in the classification condition performed different tasks during the training and test phases. Experiment 3 addresses this issue using a two-alternative, forced-choice version of inference training that more closely matches classification training and enables the investigation of generalization to a novel task in the inference condition (i.e., from forced-choice to a production task). In addition, the forced-choice procedure in Experiment 3 is more similar to inference training procedures used in previous work (e.g., Yamauchi & Markman, 1998). The vast majority of previous work with the forced-choice procedure, however, has used discrete-valued dimensions with a small number of stimuli. Experiment 3 extends this work to a category structure with continuous-valued dimensions and a large number of stimuli.

Method

Participants and design

Seventy-one participants were recruited from the University of Maine student community and received partial course credit for participation. Participants were randomly assigned to one of two experimental conditions: classification or inference training. One participant was excluded from analyses due to a software error. The resulting sample sizes by condition were classification: 37; inference: 33. All participants reported normal (20/20) or corrected-to-normal vision.

Stimuli and apparatus

The stimuli were identical to Experiment 1 with the exception of the displays used in the inference condition. Two response alternatives were presented 325 pixels below the stimulus, one offset 325 pixels left of center and the other offset 325 pixels right of center (Fig. 6). One of the response alternatives was correct. The incorrect alternative was generated by selecting the corresponding value for the missing dimension from the contrasting category. The locations of the correct and incorrect alternatives were counterbalanced.

Fig. 6

Example display for the inference training methodology (color figure available online)

Procedure

The procedure was identical to Experiment 1 with the exception that during training, participants in the inference condition were asked to choose from one of the two response alternatives rather than drawing the missing stimulus dimension. In addition, trial-by-trial feedback in the inference condition was presented in the same way as in the classification condition. The test phase was identical to Experiment 1 (i.e., all participants were instructed to draw the missing stimulus dimension and no feedback was provided).

Results

Training phase

Learning was evident in both conditions (Fig. 7, Table 5). Although the dependent measure (proportion correct) was now the same across conditions, training performance was analyzed separately for the two conditions to maintain consistency with the analyses in the previous experiments. Separate paired-samples t-tests indicated significant increases from block 1 to block 4 in proportion correct for the classification condition: [t(36) = -4.163, p < .001, d = .87] and for the inference condition: [t(32) = -2.920, p = .006, d = .56].

Fig. 7

Training performance in the classification and (forced-choice) inference conditions of Experiment 3

Table 5 Training and test phase performance in Experiment 3

The decision-bound models described in Experiment 1 were fit to the final training block in the classification training condition. Consistent with expectations, the majority of participants learned a task-appropriate, conjunctive strategy (57%) with the remaining participants being best fit by the unidimensional (5%), information-integration (5%), or random responder (33%) models.

Test phase

Participants in the classification condition evidenced knowledge of the within-category correlations [t(36) = 3.53, p = .001, d = .58; B01 = 1/27.44, favoring the alternative hypothesis], whereas participants in the inference condition performed marginally better than chance [t(32) = 1.99, p = .06, d = .35; B01 = 1.07, equivocal support for the null and alternative hypotheses]. The two conditions, however, were not significantly different from each other [t(68) = .53, p = .60, d = .13; B01 = 3.6 in favor of the null hypothesis] (Fig. 8; see Footnote 4). For the classification condition, this result was driven primarily by participants using a task-appropriate, conjunctive strategy (conjunctive: M = .23, SD = .24; unidimensional: M = .05, SD = .14; random responder: M = .003, SD = .16).

Fig. 8

Performance on the inference task during the test phase (left). Note that positive values suggest learning of the within-category correlations. Relationship between learning during training and test phase performance in the classification (middle, r = .47) and inference (r = .67) conditions. Note that the diameter-angle correlations from category B are multiplied by -1 prior to averaging with the diameter-angle correlations from category A. The grey bars represent 95% confidence intervals

In both conditions, however, greater learning during the training phase was associated with higher performance during the test phase [classification: r(36) = .47, p = .003; inference: r(32) = .67, p < .001]. A re-analysis of the data from the inference condition excluding five potential multivariate outliers [robust Mahalanobis squared distances were calculated, and values that exceeded \( \chi^2_{\mathrm{critical}}(1) = 5.02 \), α = .025, were considered outliers] indicated an association identical in magnitude to that of the classification condition [r(25) = .47, p = .01].
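For reference, robust squared Mahalanobis distances can be obtained from a minimum covariance determinant estimate; the sketch below (using scikit-learn, an implementation choice not specified in the article, and simulated data) screens bivariate training-change/test-correlation observations against the criterion reported above:

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

def flag_outliers(data, alpha=0.025, df=1):
    """Flag rows whose robust squared Mahalanobis distance exceeds the
    chi-square critical value (alpha and df follow the criterion above)."""
    d2 = MinCovDet(random_state=0).fit(data).mahalanobis(data)
    return d2 > chi2.ppf(1 - alpha, df)  # chi2.ppf(.975, 1) ~ 5.02

# data: one row per participant, columns = (training change, test correlation)
data = np.random.default_rng(2).normal(size=(33, 2))
print(flag_outliers(data).sum(), "potential outliers")
```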

Summary

The primary goal of Experiment 3 was to investigate the extent to which a two-choice version of inference training would support the learning of within-category representations. Participants in the classification condition were able to learn the test-phase correlations. Participants in the two-choice inference condition, however, performed only marginally better than chance. That being said, test phase performance was not significantly different in the classification and inference conditions. Similar to Experiment 1, there was a positive correlation between training and test phase performance that did not differ by condition. Taken together, these results suggest that once a potential practice effect was eliminated by introducing a two-choice version of the inference task, participants in both conditions learned the within-category correlations equally well.

General discussion

Previous research suggests that the between-category representations (i.e., logical rules) thought to support the learning of rule-based tasks do not also support the learning of within-category representations (Ell et al., 2017). This work, however, focused on a rule-based structure for which learning required attention to only a subset of the stimulus dimensions that were critical for the within-category representation. The present work investigated whether an inability to learn within-category representations reflects a general limitation of rule-based structures or a more specific limitation resulting from a mismatch between the information necessary for learning between- and within-category representations. The results of Experiment 1 were consistent with the latter hypothesis. More specifically, participants were able to learn to classify a two-dimensional, rule-based structure and this knowledge was able to support the learning of within-category representations that could be generalized to a novel task (i.e., inference). This result was dependent upon a congruence between local and global features of the category structure (Experiment 2). Although participants who learned by inference in Experiment 1 demonstrated stronger knowledge of the within-category representations at test, this advantage seems to have reflected a practice effect (Experiment 3). In sum, these results suggest that a task goal thought to promote the development of between-category representations (i.e., classification) can promote the development of within-category representations, but such learning is sensitive to characteristics of the category structure.

Learning and generalization of within-category representations

Consistent with previous work (Anderson & Fincham, 1996; Ell et al., 2017; Thomas, 1998), within-category correlations could be learned during categorization. These representations could also be generalized across tasks with knowledge of the within-category correlations learned during classification training being able to support inference at test. Both learning and generalization, however, depended upon a congruency between the local, diameter-angle correlations within each quadrant and the global diameter-angle correlations within each category (Experiment 2). Disrupting this congruency seems to have impaired the ability to learn within-category representations while sparing category learning, suggesting a different type of category representation may have supported the learning of the Experiment 2 categories. Although we do not have a direct measure of the category representation learned in Experiment 2, the model-based analyses suggest that nearly half of the participants learned a between-category representation (i.e., logical rules) and previous work suggests that rule-based strategies are used with other exclusive-or category structures (Kurtz, Levering, Stanton, Romero, & Morris, 2013; Nosofsky et al., 1994). Nevertheless, we cannot rule out the possibility that participants learned a different type of within-category representation (e.g., exemplars, prototypes, within-category range).

A related, and important, question is how exactly within-category representations are learned from classification (Experiments 1 and 3). If most participants are learning between-category representations during classification of the Fig. 1 category structure, as suggested by the model-based analyses, are these between-category representations facilitating the development of within-category representations? The optimal conjunctive rule (i.e., members of category A either have larger circles and steeper lines, or smaller circles and shallower lines, than members of category B) conveys some basic information about the within-category correlations. It may be the case that this information was sufficient to support generalization from classification to inference. The results of Experiment 2, however, suggest that this is unlikely. In Experiment 2, the optimal rule was the same, but participants only learned within-category correlations consistent with this rule in one quadrant of the stimulus space. That being said, our method does not allow for distinguishing between participants who are good at using this rule to perform inference and participants who have a richer knowledge of the within-category correlations, thus more work is needed to address this possibility.

Alternatively, perhaps there is a learning system operating that is acquiring within-category representations that could be used to support both classification and inference. For instance, the DIVA (Kurtz, 2007) and SUSTAIN (Love, Medin, & Gureckis, 2004) models of category learning, other kinds of models that learn multiple category prototypes (e.g., Ashby & Waldron, 1999), or hybrid models that combine exemplar and prototype processes (Minda & Smith, 2001; Smith & Minda, 1998) would, in principle, be able to estimate within-category correlations. Indeed, SUSTAIN has been successful in accounting for different patterns of performance across linearly separable and nonlinearly separable category structures in inference versus classification (Love et al., 2004). Given that within-category representations can be used to mimic rule-like behavior (e.g., Hélie, Ell, Filoteo, & Maddox, 2015), this would provide a possible means by which within-category representations could support a wide range of observable behavior.

Boundary conditions on the learning of within-category representations

The aim of Experiment 2 was to determine if disrupting the congruence between the local, diameter-angle correlations within each quadrant and the global diameter-angle correlations within each category would impair the learning of within-category representations with classification training. Participants were able to learn during classification training, albeit at lower levels of accuracy than in Experiment 1, where there was local-global congruence. Unlike Experiment 1, however, this learning did not promote the knowledge of within-category representations that could be used to support inference during the test phase. Surprisingly, the local-global incongruence also impaired the learning of within-category representations with inference training, suggesting the learning of within-category representations may be generally sensitive to characteristics of the category structure. That being said, we cannot rule out the possibility that our approach to introducing incongruence altered some other factor that may be critical for learning within-category representations. For instance, although there is minimal overlap between the categories, the overlap occurs in different parts of the stimulus space in Experiments 1 (center of the stimulus space) and 2 (edge of the stimulus space), but it is not clear why this would make it impossible to learn the within-category correlations with inference training while preserving learning during classification training.

The results of Experiment 1 suggest that inference may be superior to classification for promoting the learning of within-category representations. In the inference condition, the training and test phases were identical with the exception of the removal of feedback during the test phase. Thus, the test phase advantage in the inference condition may reflect a practice effect. The goal of Experiment 3 was to address this issue using a two-alternative, forced-choice version of inference training that more closely matches classification training and is more similar to inference training procedures used in previous work (e.g., Yamauchi & Markman, 1998). Test phase performance in the inference and classification conditions did not significantly differ in Experiment 3, suggesting that the Experiment 1 inference advantage may reflect a practice effect. In comparison to the Experiment 1 inference task, the Experiment 3 inference task discretized the response and feedback. Although these methodological changes increased the similarity between the classification and inference conditions, it is possible that they contributed to the relatively weak test phase performance in the inference condition of Experiment 3. This is somewhat surprising given the success of two-alternative, forced-choice inference tasks (Markman & Ross, 2003). The vast majority of previous work, however, has used discrete-valued dimensions with a small number of stimuli. It is possible that forced-choice inference works well in promoting within-category representations when the categories comprise discrete-valued dimensions with a small number of stimuli, but is not well suited to a category structure having continuous-valued dimensions with a large number of stimuli.

A common theme in the research on the kinds of category representations learned during training is that participants learn what is necessary to perform the task at hand (Ell et al., 2017; Hélie et al., 2017; Love, 2005; Markman & Ross, 2003; Pothos & Chater, 2002; Yamauchi & Markman, 1998). For example, with the rule-based structure used by Ell et al. (2017), successful performance during classification training did not depend upon learning the relationship between diameter and angle. Instead, participants needed only to attend selectively to a single, diagnostic stimulus dimension in order to achieve perfect classification performance. The present results suggest that selective attention to a single stimulus dimension may hinder the ability to learn the two-dimensional, within-category correlations. Attention to multiple stimulus dimensions would seem to be necessary, but not sufficient, to promote the learning of this kind of within-category representation.

That being said, it is possible to learn and generalize other types of within-category representations when learning to categorize based upon a single stimulus dimension. A seemingly minor tweak of the typical classification instructions (i.e., concept training – participants learn categories by classifying stimuli as a member/nonmember of a target category; Maddox, Bohil, & Ing, 2004a; Posner & Keele, 1968; Reber, 1998; Smith & Minda, 2002; Zeithamova, Maddox, & Schnyer, 2008) shifts the emphasis from between-category differences to within-category similarities (Casale & Ashby, 2008; Hélie et al., 2017). In Hélie et al. (2017), participants learned two rule-based category structures (simultaneously) along a single diagnostic stimulus dimension (category A vs. category B and category C vs. category D). Participants were subsequently tested on a novel categorization problem using the same categories (i.e., category B vs. category C). Participants were successfully able to generalize this knowledge when receiving concept training, but not when receiving traditional classification training, suggesting that concept training promoted a representation based on the categories themselves rather than between-category differences (see also Hoffman & Rehder, 2010; Kattner et al., 2016). Thus, it may be the case that concept training promotes a minimal within-category representation that is sufficient to support classification on a novel rule-based categorization problem (e.g., the range of values on the stimulus dimensions), but not so rich as to include knowledge that was not required during training (e.g., the correlational structure of the categories).

Conclusions

In sum, taken together with previous work, the current results suggest that the demands of learning may be the most critical factor in promoting within-category representations. If the task requires participants to learn about the relationship between dimensions, they can learn within-category representations. Such demands can be imposed by the nature of the category structure (e.g., the exclusive-or structure used here, the information-integration structure used by Ell et al., 2017) or by the goal of the task (e.g., inference with unidimensional rule-based structures). These data also suggest important boundary conditions on the learning of within-category representations. For instance, even when learning about the relationship between stimulus dimensions, incongruency between local and global regions of the stimulus space can disrupt the learning of within-category representations. Knowledge of this limitation may be an important factor to consider when developing training regimens intended to promote the learning of within-category representations. These results complement the growing body of work highlighting the impact of category structure and task goal on category representations (Carvalho & Goldstone, 2015; Hammer, Diesendruck, Weinshall, & Hochstein, 2009; Levering & Kurtz, 2015). These results also build upon previous work by investigating the relationship between these factors and the generalization of categorical knowledge (Carvalho & Goldstone, 2014; Chin-Parker & Ross, 2002; Hoffman & Rehder, 2010), thereby providing a window into the cognitive utility of category representations in novel situations.