The main objective of the experiments reported here was to examine how being a member of a group in which one member is linked to a positive or negative social label (“good” or “bad”) has consequences for the allocation of resources. Acquiring such a relatively simple social label might be compared to a person having a certain reputation, something that has consequences for how others behave towards you in terms of social and economic payoffs (e.g., Bolton, Katok, & Ockenfels, 2004; Cadsby, Servátka, & Song, 2010; Dale, Morgan, & Rosenthal, 2002; Schmidt, Shupp, Walker, & Ostrom, 2003; Servátka, 2010). In the study by Servátka, for example, the dictator game approach was used in which the “dictator” can give money to the player with whom they are paired (the recipient; see also Guala & Mittone, 2010; Rousu & Baublitz, 2011; Tammi, 2013). The experiment consisted of two conditions, one in which participants were paired with a person who had a particular reputation, and another in which participants were paired with a stranger. It was found that, on average, dictators allocated more money to recipients who had a reputation for being generous than to recipients with no reputation (strangers), a result that clearly stresses the effects that a person’s reputation may have on how others behave towards you, treat you, or, in this case, reward you.

In recent years, research in the area of stimulus equivalence has provided experimental protocols for establishing relations between arbitrary stimuli such that a group of related stimuli emerges. A standard procedure used is “matching-to-sample” (MTS). For example, a “sample” stimulus (e.g., YIM; coded as A1 by the experimenter) might be presented at the top of a computer screen and the participant has to select between two “comparison” stimuli placed underneath (e.g., VEK (coded as B1) and FAP (coded as B2)). Counterbalanced across trials is another arrangement using the same comparison stimuli but a different sample stimulus (e.g., GIX (coded as A2)). The computer is programmed to reinforce a participant’s response of selecting A1–B1 and A2–B2 relations. Following mastery of this training, two other sets of relations are trained (e.g., B1–C1 and B2–C2) in the same way. One measure of the relatedness of these stimuli is to use a procedure that tests for C1–A1 relations and C2–A2 relations using the matching-to-sample procedure. Called an “equivalence” test, it shows that in general (but not always), and in some contexts, participants treat these stimuli as equivalent, even though these stimuli were never directly paired during training. Thus, at the end of training the result is that two sets of stimulus classes emerge, A1B1C1 and A2B2C2. These stimulus classes are called “equivalence classes” because the stimuli can be shown to be substitutable or equivalent for each other.
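The way derived relations follow from trained ones can be illustrated with a minimal Python sketch. It assumes that equivalence behaves like the symmetric and transitive closure of the trained pairs, which is an idealization: as noted above, participants do not always respond this way. The function name `equivalence_classes` and the union-find helper are illustrative only and are not part of any experimental software.

```python
def equivalence_classes(trained_pairs):
    """Group stimuli into the classes implied by the trained relations.

    Idealized assumption: each trained pair links two stimuli, and links
    combine symmetrically and transitively (a simple union-find).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in trained_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    classes = {}
    for stimulus in parent:
        classes.setdefault(find(stimulus), set()).add(stimulus)
    return sorted(sorted(members) for members in classes.values())


# Trained relations from the example: A1-B1 and A2-B2, then B1-C1 and B2-C2.
trained = [("A1", "B1"), ("A2", "B2"), ("B1", "C1"), ("B2", "C2")]
classes = equivalence_classes(trained)
print(classes)
```

Under this idealization, C1 and A1 fall in the same class even though they were never paired directly, which is the relation the equivalence test probes.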

The procedures used for generating equivalence classes have been shown to be relevant to social psychological research in the areas of social attitudes, social categorization and stereotyping (Leslie et al., 1993; McGlinchey, Keenan, & Dillenburger, 2000; Moxon, Keenan, & Hine, 1993). Social attitudes are the evaluations that people make about socially significant objects, events, symbols, groups of people, or individuals, usually in either a positive or negative way (Hewstone, Stroebe, & Jonas, 2012). The concept of an attitude is a hypothetical construct and the three-component model of attitude structure typically comprises an emotional, cognitive, and behavioral component (Hogg & Vaughan, 2005) and can involve either explicit (conscious) or implicit (unconscious) associations that are said to influence decisions and behavior. The development of measuring systems for addressing and investigating these issues is the mainstay of much social psychological research (Ajzen & Fishbein, 2005; Kruglanski & Stroebe, 2012). The recent interest in relational responding in the field of behavior analysis has sparked the development of techniques for measuring implicit associations (Grey & Barnes, 1996; Hughes, Barnes-Holmes, & De Houwer, 2011; O’Reilly, Roche, Ruiz, Tyndall, & Gavin, 2012; Roche, Barnes-Holmes, Barnes-Holmes, Stewart, & O’Hora, 2002; Schauss, Chase, & Hawkins, 1997).

Common to all these techniques is the focus on generating networks of relations between stimuli, often arbitrary, instead of retrospectively speculating about the preceding development of networks of certain stimuli in an individual’s social history. One of the first studies to adopt this approach was an experiment by Watt, Keenan, Barnes, and Cairns (1991). The goal of this study was to generate relations between stimuli as a basis for examining how previous social learning might interfere with what might have been expected if socially relevant stimuli had not been used. Using a simple MTS procedure, relations were established first between Catholic names and nonsense syllables (A–B relations) and then between the same nonsense syllables and Protestant symbols (B–C relations). Because the experiment involved participants from Northern Ireland who had a strong tendency to categorize people and events as either Catholic or Protestant (Cairns, 1984), it was thought that it might be possible to disrupt the responding to C–A relations that would ordinarily appear in such an experiment when socially neutral stimuli are used. This expectation was confirmed and was bolstered by the finding that English participants who were unfamiliar with the Protestant–Catholic stimuli responded to the C–A relations as would normally be expected from participants who lack such social learning history. Variations of this simple experimental paradigm have produced similar results in a wide variety of contexts (Barnes, Lawlor, Smeets, & Roche, 1996; Dixon, Rehfeldt, Zlomke, & Robinson, 2006; Merwin & Wilson, 2005; O'Reilly et al., 2012; Plaud, 1995; Roche & Barnes, 1996; Roche, O’Reilly, Gavin, Ruiz, & Arancibia, 2012; Roche, Ruiz, O’Riordan, & Hand, 2005).

The study of equivalence responding has been extended by additional procedures wherein a specific function/behavior is trained at one of the stimuli in an equivalence class. Once one stimulus in a class has acquired discriminative properties for a behavior, it is likely that all stimuli in that class acquire similar properties and the class is called a “functional equivalence class.” This phenomenon is called transfer (or transformation; see Dymond & Rehfeldt, 2000) of function. Tonneau (2001) argued that the theoretical strength of equivalence-based analysis for complex human behavior hinges on transfer of function (see also Greenway, Dougher, & Wulfert, 1996). Transfer of function has been demonstrated for a wide range of operant and respondent behaviors, including rate of responding (e.g., Barnes & Keenan, 1993), respondent eliciting and extinction functions (e.g., Dougher, Augustson, Markham, Greenway, & Wulfert, 1994), consequential functions (Hayes, Kohlenberg, & Hayes, 1991), and motor functions such as clapping and waving (e.g., Barnes, Browne, Smeets, & Roche, 1995; Bones et al., 2001). Transfer of function has also been demonstrated using derived relations other than equivalence, such as sameness, opposition, and difference (Dymond & Barnes, 1996; Whelan & Barnes-Holmes, 2004), and more than and less than (O’Hora, Roche, Barnes-Holmes, & Smeets, 2002; Whelan, Barnes-Holmes, & Dymond, 2006).

The present study extended this line of research in transfer of function by examining it in a social context. We opted for a relatively simple procedure by using a descriptive statement to establish social functions within an equivalence class. McGuigan and Keenan (2002) showed that instructions could be used to generate a transfer of function effect with a simple motor response. The general goal in the current study was to establish an equivalence class, load a social function via a descriptive statement about one of the stimuli, and assess the effects of that function by getting participants to allocate tokens to each member of the stimulus classes. In effect, this procedure is tantamount to a role-play in which one member of a group (i.e., a stimulus class) is given a “reputation” and an assessment is made of how the subsequent distribution of available resources is influenced by group membership of a stimulus class.

Experiment 1

Method

Participants

Participants were five students (three males and two females, 21–23 years of age) from Ulster University in Northern Ireland. They were enrolled via opportunity sampling and were given little information initially as to the purpose of the study other than it being a “study of learning.” Participants could leave the study at any time.

Materials and Apparatus

The experiment was conducted in an experimental room in the psychology lab at Ulster University in Northern Ireland. In the room there were two tables, one of which had a chair, and a laptop for training equivalence classes. On the second table were six stimuli (i.e., A1 (ZID), B1 (YIM), C1 (FAP), A2 (VEK), B2 (RIX), and C2 (KUD)) printed on flash cards and distributed randomly, face up. A bowl containing 25 tokens was on top of the desk.

General Procedure

There were seven phases in total during the experiment (Figure 1, top panel) and participants were trained and tested individually. Phases 1, 2, and 3 were used to train and test the two three-member equivalence classes at one table. Phase 4 was used to add a social function to stimulus B1, followed by Phase 5 where token distribution was studied. Finally, Phase 6 was used to reverse the social function and Phase 7 involved a replication of Phase 5.

Fig. 1

A schematic diagram of the sequence of phases used in each experiment; Experiment 1 top panel, Experiment 2 bottom panel. Solid lines between stimuli in Phases 1–2 indicate directly trained relations whereas dashed lines in Phases 3–9 indicate emergent relations; the desk in the middle of each panel illustrates where and how (face up) stimuli (printed on flash cards) were presented in the presence of a bowl containing tokens (figure made with Smith Micro Software “Poser” and Adobe Photoshop)

Phase 1: A–B training

Using continuous reinforcement in a conditional discrimination procedure, relations were established between stimuli A1–B1 (ZID–YIM) and A2–B2 (VEK–RIX) in blocks of 12 trials. The first block of 12 trials involved A1 as a sample stimulus, and after the mastery criterion was reached, the next block of 12 trials involved A2 as a sample stimulus. Stimulus A1 was presented at the top of the screen with comparison stimuli B1 and B2 positioned underneath. Across all trials, the positions of the comparison stimuli B1 and B2 were counterbalanced semi-randomly so as to eliminate any position bias in responses. In the presence of sample stimulus A1, if the participant chose comparison stimulus B1 then the word “Correct” appeared onscreen. If the participant picked comparison stimulus B2 in the presence of A1 then the word “Incorrect” appeared onscreen. If A1–B1 training was less than 92% accurate (i.e., more than one error in 12 trials), the program repeated the block of 12 trials. If the 92% criterion was not met after three blocks of trials, the experiment was terminated. Once relations between stimuli A1 and B1 had been established to 92% accuracy, A2–B2 relations were trained. The sample stimulus A2 appeared onscreen with B1 and B2 as comparison stimuli presented as before; there were 12 trials. This time, selection of B2 produced the word “Correct” whereas selection of B1 produced the word “Incorrect.” If A2–B2 training was less than 92% accurate, the program repeated the block of 12 trials. If the 92% criterion was not met after three blocks of trials, the experiment was terminated. Once relations between stimuli A2 and B2 had been established to 92% accuracy, the procedure progressed automatically to Phase 2.
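The progression rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program used in the study: the function names are hypothetical, and the rounding rule (11 of 12 correct is 91.7%, treated as 92%) is inferred from the statement that the criterion allows at most one error in 12 trials.

```python
def block_passes(n_correct, n_trials=12, criterion_percent=92):
    """Return True if one training block meets the mastery criterion.

    Assumption: accuracy is rounded to the nearest whole percent, so
    11/12 (91.7%) counts as 92% and 10/12 (83.3%) does not.
    """
    return round(100 * n_correct / n_trials) >= criterion_percent


def train_relation(block_scores, max_blocks=3):
    """Walk through the number correct per block for one relation (e.g., A1-B1).

    Returns a (status, blocks_used) pair, where status is "mastered",
    "terminated" (criterion not met within max_blocks), or "in progress".
    """
    for blocks_used, n_correct in enumerate(block_scores, start=1):
        if block_passes(n_correct):
            return "mastered", blocks_used
        if blocks_used == max_blocks:
            return "terminated", blocks_used
    return "in progress", len(block_scores)


# Hypothetical scores: two errors in the first block, one error in the second.
print(train_relation([10, 11]))
```

The same rule would apply to each trained relation in turn (A1–B1, then A2–B2), with the procedure advancing only after both reach criterion.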

Phase 2: A–C training

The conditional discrimination training here was the same as that used in Phase 1, with the same numbers of trials and criteria for progression, only this time the relations trained initially were between A1–C1 (ZID–FAP) and then between A2–C2 (VEK–KUD). Across all trials, the positions of the comparison stimuli C1 and C2 were counterbalanced semi-randomly in order to eliminate any position bias in responses. As in Phase 1, a criterion of 92% of correct responses was used to determine when to move from A1–C1 training to A2–C2 training. When the criterion of 92% correct for A2–C2 responding was met, Phase 3 began automatically.

Phase 3: Testing for B–A, C–A, B–C, and C–B relations

Phase 3 tested for symmetrical relations (i.e., between B–A and C–A) and equivalence relations (i.e., between B–C and C–B). Each of the eight relations (B1–A1, B2–A2, C1–A1, C2–A2, B1–C1, B2–C2, C1–B1, C2–B2) was tested a maximum of 10 times in a random order. Thus B–A relations were tested with either B1 or B2 as sample and with A1 and A2 as comparisons. There was no feedback during testing of any of the relations. A criterion of ≥ 90% correct responses had to be met before participants could move on to the next phase of the experiment. If they did not meet this criterion at the end of the 80 trials in this phase, they could choose to return to Phase 1 or they could finish the experiment.
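As a rough illustration of how such a test block could be assembled and scored, the following Python sketch builds the 80 unreinforced trials and applies the 90% criterion. It assumes that each relation is presented exactly 10 times (the text states a maximum of 10) and does not model the on-screen position of the comparisons; all names are hypothetical.

```python
import random

# Stimulus codes for the two classes, keyed by stimulus set (A, B, C).
CLASSES = {
    1: {"A": "A1", "B": "B1", "C": "C1"},
    2: {"A": "A2", "B": "B2", "C": "C2"},
}
# (sample set, comparison set) for the four tested relation types.
TESTED = [("B", "A"), ("C", "A"), ("B", "C"), ("C", "B")]


def build_test_trials(repeats=10, seed=None):
    """Return a shuffled list of test trials covering all eight relations."""
    rng = random.Random(seed)
    trials = []
    for sample_set, comparison_set in TESTED:
        for cls in CLASSES:
            for _ in range(repeats):
                trials.append({
                    "sample": CLASSES[cls][sample_set],
                    "comparisons": [CLASSES[c][comparison_set] for c in CLASSES],
                    "correct": CLASSES[cls][comparison_set],
                })
    rng.shuffle(trials)
    return trials


def meets_test_criterion(n_correct, n_trials, criterion_percent=90):
    """Integer comparison avoids floating-point edge cases at exactly 90%."""
    return 100 * n_correct >= criterion_percent * n_trials


trials = build_test_trials(seed=0)
print(len(trials), meets_test_criterion(72, 80))
```

With 80 trials, the criterion corresponds to at least 72 correct selections.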

Phase 4: Addition of Social Function

In this part of the experiment a social function was added to one of the stimuli, B1. The experimenter simply looked at the participant and calmly said “YIM is a good person” (GOOD function). This statement was made only once, by the same experimenter on each occasion, and there were no standardized procedures for which clothes the experimenter was to use; this was the case on each occasion in which a social function was added.

Phase 5: Instruction and Distribution of Tokens

Participants were told that there was a further step in the experiment and were brought over to the second table; there was no chair at this table. Participants were read the following instructions: “Here we have six stimuli. I would like you to distribute these resources (Experimenter pointed to box containing the tokens) by placing them on top of stimuli of your choosing. I will turn my back to you, so tell me when you are finished.” Once the participants told the experimenter that they were finished, the experimenter asked the participant to turn their back to the table, took a photograph of the results on the table, returned the tokens to the bowl, reorganized the cards on the table in a semi-random manner, and thanked the participant. Immediately afterwards the same instructions were read to the participant. This process was repeated five times in total for this phase.

Phase 6: Reversal of Social Function

With their backs to the table used in Phase 5, participants were verbally instructed that a mistake had been made within the experiment (BAD function): “Sorry I have made a mistake. YIM was actually a bad person not a good person.”

Phase 7: Distribution of Tokens after Reversal of Social Function

This phase was an exact replication of Phase 5.

Results

All participants scored over 90% on the equivalence test prior to the transfer of function tests and again after the tests were conducted. Figure 2 summarizes the data obtained for all participants across all trials during Phases 5 and 7. When tokens were distributed, not everyone used all the tokens that were available. As a result, the data presented here show the percentage distribution of tokens across all trials and participants that were allocated to each stimulus. At first, Class 1 (ZID, YIM, FAP) received more tokens than Class 2, with YIM receiving the most tokens. Across these participants there was some variation in the extent of this effect with one participant (P1) giving 100% of the tokens to Class 1 whereas other participants gave 93% (P3), 61% (P4), and 72% (P5) to this class, respectively. One participant (P3) gave substantially more tokens to YIM than to other members of Class 1.

Fig. 2

The total percentage distribution of tokens for each stimulus across all participants in Experiment 1 when YIM was described as “GOOD” and “BAD”; the bottom value and top value are, respectively, represented in each frame beside each stimulus (figure made with Smith Micro Software “Poser” and Adobe Photoshop)

In Phase 6, participants were told that a mistake had been made and that in fact YIM was a “Bad person and not a Good person.” The main effect for all five participants was a decrease in the percentage of tokens given to Class 1, whereas Class 2 now received more tokens than it had done in the previous Phase 5. This effect was most pronounced for participant P1 with a change from 0 tokens in Phase 5 to 20 (out of a total of 25) tokens in Phase 7 for Class 2 (i.e., a change from 0% to 80% of all tokens distributed between classes). The changes in tokens allocated for Class 2 between Phase 5 and Phase 7 for the other participants were from 49.6% to 60% (P2), from 7% to 61% (P3), from 39% to 62.5% (P4), and from 28% to 58% (P5). Within Class 1, there was also a pronounced decrease in the relative percentage of tokens given to YIM by all participants. The effect was most pronounced for participant P3, for whom YIM received 81% of Class 1 tokens in Phase 5 but 0% in Phase 7. The relative distribution within the class for participant P1 was unchanged but there was a decrease in the total number of tokens allocated. The change in relative distribution within Class 1 for YIM by other participants was from 33.3% to 20% (P2), from 33.3% to 22% (P4), and from 33.3% to 21% (P5). Across participants there was a decrease in the number of tokens given to B1 in Phase 7 (B1 is “Bad”) compared to Phase 5 (B1 is “Good”; Figure 2). A detailed overview of results of each individual participant can be found in Supplementary Materials (Appendix 1: Tables 1, 2, 3, 4, and 5).
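The two percentages reported above (the share of all distributed tokens given to a class, and the share of a class's tokens given to one stimulus) can be computed as in the following Python sketch. The function names are hypothetical, and the token counts in the example are invented for illustration; only the total of 20 out of 25 tokens for Class 2 corresponds to a figure in the text, and the counts are not any participant's actual data.

```python
CLASS_1 = ("ZID", "YIM", "FAP")
CLASS_2 = ("VEK", "RIX", "KUD")


def class_share(tokens, members):
    """Percentage of all distributed tokens that were placed on one class."""
    total = sum(tokens.values())
    if total == 0:
        return 0.0
    return 100 * sum(tokens.get(s, 0) for s in members) / total


def within_class_share(tokens, stimulus, members):
    """Percentage of a class's tokens that were placed on one stimulus."""
    class_total = sum(tokens.get(s, 0) for s in members)
    if class_total == 0:
        return 0.0
    return 100 * tokens.get(stimulus, 0) / class_total


# Hypothetical allocation of 25 tokens, with 20 placed on Class 2.
example = {"ZID": 2, "YIM": 1, "FAP": 2, "VEK": 7, "RIX": 6, "KUD": 7}
class_2_percent = class_share(example, CLASS_2)
yim_percent = within_class_share(example, "YIM", CLASS_1)
print(class_2_percent, yim_percent)
```

Because the denominator is the number of tokens actually distributed, the same functions apply when a participant leaves some tokens in the bowl.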

Fig. 3

The total percentage distribution of tokens for each stimulus across all participants in Experiment 2 when no function (NEUTRAL) was associated with YIM and when it was described as “GOOD” and “BAD”; the bottom value, middle value, and top value are, respectively, presented in each frame beside each stimulus (figure made with Smith Micro Software “Poser” and Adobe Photoshop)

Discussion

The goal of this experiment was to establish two equivalence classes (A1, B1, C1 and A2, B2, C2) and then examine the effects of manipulating a social function within one class. After all participants had progressed through training and tests for equivalence showed the existence of both classes, a social function was established at B1 (YIM) in Class 1 using a verbal statement (i.e., YIM was described as a “Good person”). Following this instruction, participants were given tokens to distribute to each of the six stimuli from the two classes (Phase 5). There were two general findings. First, for all participants, stimuli in Class 1 received more tokens than stimuli in Class 2. In addition, for one participant, B1 received more tokens than the other stimuli in Class 1. Second, for all participants, B1 received more tokens than B2 after the original training. The social function was then changed in Phase 6 (i.e., YIM was now described as a “Bad person”). The effect of this was assessed in Phase 7 by again asking participants to distribute tokens. The main finding was that for all participants, the percentage distribution of tokens allocated to Class 1 stimuli decreased and the percentage distribution of tokens allocated to Class 2 stimuli increased. Also, for all participants, B2 received more tokens than B1 following reversal of the social function.

Bearing in mind that there were no explicit instructions on how tokens were to be distributed at any time, these findings show that assigning a social value to an arbitrary stimulus in an equivalence class is a useful strategy for examining social behavior. The behavior arising from a general instruction to distribute tokens was clearly influenced by the relation between stimuli within the equivalence classes. The distribution of tokens indicates that social value has function-altering effects that are in keeping with findings from the transfer/transformation of function studies mentioned earlier. However, the conclusion is somewhat compromised because there was no assessment of token distribution before a social value was assigned to B1. That is, there is no way of determining whether the initial distribution of tokens arose as a consequence of the social value assigned to B1 or whether it arose because of a bias in favor of the stimuli chosen for Class 1. The next experiment addressed this shortcoming by getting participants to distribute tokens before a function was added to B1. This would serve as a baseline from which to assess the effects of adding the social function.

Experiment 2

Method

Participants

The participants for this study were eight students (three males and five females, 21–23 years of age) from Ulster University in Northern Ireland. They were enrolled through opportunity sampling via face-to-face interaction. The participants were given little information about the study prior to the experiment other than a participation number and that the experiment was a study of learning. Afterwards they were fully debriefed.

Apparatus and Materials

The experimental setup was virtually the same as that used in Experiment 1 (Figure 1, bottom panel). The only difference was that due to an experimenter oversight 21 tokens were used instead of 25.

Procedure

Phases 1–3 were exactly the same as those in Experiment 1 (see Figure 1 for an overview of the general procedure).

Phase 4: Distribution of Tokens

Here participants were asked to distribute tokens using the general procedure in Experiment 1. There was no function added to any stimuli (NEUTRAL function).

Phase 5: Adding a Social Function

Using the general procedure described in Experiment 1, a social function was assigned to stimulus B1 (YIM) informing the participants that “YIM” is a good person (GOOD function).

Phase 6: Distribution of Tokens

Using the general procedure described in Experiment 1, participants were instructed to distribute the tokens as they considered appropriate.

Phase 7: Reversal of Social Function

Using the general procedure described in Experiment 1, participants were informed that there had been a mistake made and that YIM is actually a bad person and not a good person (BAD function).

Phase 8: Distribution of Tokens after Reversal of Social Function

This phase was an exact replication of Phase 6.

Phase 9: Equivalence Testing

The final phase was an exact replication of Phase 3 except that there was no criterion performance required. Participants were tested without further training for B–A, B–C, C–A, and C–B relations within Class 1 (A1–ZID, B1–YIM, C1–FAP) and Class 2 (A2–VEK, B2–RIX, C2–KUD). This phase was used to ensure that the equivalence classes were still in place.

Results

In Experiment 2, all participants scored over 90% on the equivalence test prior to the transfer of function tests and again after the tests were conducted (Phase 9). Figure 3 summarizes the data obtained for all participants across all trials during Phases 4, 6, and 8. When tokens were distributed, not everyone used all the tokens that were available. As a result, the data presented here show the percentage distribution of tokens across all trials and participants that were allocated to each stimulus. In Phase 4 (NEUTRAL function), there were no functions trained at any of the stimuli and participants were instructed to distribute the tokens in whatever way they considered appropriate. The general finding was a relatively even distribution of tokens across all stimuli from both classes. During Phase 6 (GOOD function), the distribution changed and Class 1 (ZID, YIM, FAP) now received substantially more tokens than Class 2 (VEK, RIX, KUD). However, within Class 1, the percentage distribution of tokens was skewed towards YIM; some participants (P7, P10, P11, and P12) assigned all of the tokens to YIM. Each of the stimuli in Class 2 received about half of the tokens given to each of A1 (ZID) and C1 (FAP). During Phase 8 (BAD function), there was a substantial reduction in the distribution of tokens given to YIM accompanied by an increase in the distribution of tokens for ZID and FAP. The distribution of tokens for Class 2 also matched that for ZID and FAP. A detailed overview of results of each individual participant can be found in Supplementary Materials (Appendix 2: Tables 6, 7, 8, 9, 10, 12, and 13).

General Discussion

Experiment 1 used an AB design in which the initial performance can be viewed as the baseline condition after which there was a change in the contingencies. There is no doubt that we should have returned to the baseline and used an ABA design. However, at the time, we did not expect reviewers to argue that such a strong preference for YIM could be interpreted as simply a biased preference for a particular nonsense syllable, unrelated to the contingencies that were arranged. In Experiment 2, we provided a different baseline to examine potential biased preference for this nonsense syllable. There was no evidence for any supposed bias for YIM. However, the main effect in Experiment 1 was not as robust in Experiment 2. It could be argued that there was little evidence of experimental control. However, the alternative argument is that such findings demonstrate history effects where control is influenced by the effects of prior exposure to the contingencies in the initial condition (see also Hayes, Brownstein, Zettle, Rosenfarb, & Korn, 1986; Keenan, 1999; Watt et al., 1991). That said, there was still evidence of how the distribution of resources was influenced by the procedures used to assign “value” to the nonsense syllables.

In the baseline phase of Experiment 2 (Phase 4), most participants, with the exception of P10, showed no preference for any particular stimulus and tokens were evenly distributed across all six stimuli. Once the social function was added to B1 (i.e., by saying YIM is a “Good person”) the distribution of tokens changed markedly and there was preferential treatment for Class 1, with B1 often receiving the most tokens. This effect was reversed, however, when the social function was reversed. Of course, there was no function explicitly added to Class 2 and therefore the subsequent change in the distribution to this class raises another question that needs to be investigated. Perhaps the increase in the number of tokens given to Class 2 as well as two members of Class 1 (A1 and C1) is a simple demonstration of the behavioral contrast effect. Behavioral contrast refers to the finding that there is a change in the strength of one response when the rate of reward of a second response is changed or when a change in reinforcement in one context causes behavior to change in the opposite direction in another context (Catania, 1992; Killeen, 2014). Because social labels “Good” and “Bad” may be considered actual social opposites, these labels may well be suitable for demonstrating the behavioral contrast effect, similar to “Catholic” versus “Protestant” in the study by Watt et al. (1991). Also, procedural characteristics that enabled participants’ acquisition of trained relations (i.e., the use of “correct” and “incorrect” after comparison selection) may have contributed to the behavioral contrast effect; after all, A1, B1, and C1 are distinctly different (nonequivalent) from A2, B2, and C2.

Both experiments of the current study add to the literature on transfer of function in equivalence classes. There is clear evidence of experimental control when we examine performance across participants, and some striking examples within participants. But there is nothing exceptional about this range of findings for it is well known that in transfer/transformation of function studies, the effects are not always an inevitable outcome for each participant (see McVeigh & Keenan, 2009).

It has been standard practice in the literature on transfer/transformation of function to train a function to one stimulus and examine the effects across stimuli. We followed this tradition and extended the way in which a function was trained using a rule-based procedure. We then examined the effect of this rule-based intervention by monitoring the distribution of resources given to each stimulus. We did not simply train the function of giving X amount of resources directly to YIM to see what happens. This would have been asking completely different questions to the rule-based intervention we used. Of course, a rule about one nonsense syllable might have an effect on all nonsense syllables for verbally competent participants. The question is, though, once those nonsense syllables are segregated into different classes, how might you examine the effects of the rule? Our decision was to examine the allocation of resources. Any procedure that examines the consequences of using a rule for establishing a function for one particular stimulus will inescapably be construed as an assessment of the indirect effects of that rule.

Unlike prior studies in transfer of function, though, there were no explicit instructions on how to behave and instead a simple verbal description was used to attach an attribution to one member of a class (cf. McGuigan & Keenan, 2002). It might be argued, though, that the phrase used when reversing the social function (i.e., “YIM was actually a bad person not a good person”) acted as an instruction and the results could have been different if the phrase had been just “YIM was actually a bad person.” In other words, including “not a good person” is like a double directive to change their initial responses or allocation. There are two responses to this argument. First, when the social function was added initially (Phase 5, Experiment 2), there was no double directive (“YIM was actually a good person not a nonsense syllable”) to change their initial responses, yet the distribution of tokens was still affected by a phrase that referenced the stimulus. Second, the phrase used was in keeping with the colloquial use of English in Northern Ireland. Of course, this still leaves open the possibility of instructional effects arising from the use of phrases in this context. But this is not a problem because in many respects this is precisely what the study is about. It shows that the function of stimuli in a class can be influenced by social interactions that reference stimuli in a class. The findings also add to the body of research examining attitudes in the context of procedures used to establish complex conditional discriminations between stimuli (e.g., Watt et al., 1991; McGlinchey & Keenan, 1997).

The results reported here are relevant to areas of social psychological research such as stereotyping and social categorization (Ellemers, Kortekaas, & Ouwerkerk, 1999; Giles, Reed, & Harwood, 2010; Hogg & Vaughan, 2010). The preferential treatment of Class 1 compared to Class 2 regarding the distribution of tokens when B1 was described as a “Good person” could be viewed as being analogous to social discrimination or prejudice. This analogy was further evident when B1 was subsequently described as a “Bad person.” A basic definition of prejudice is an unjustified attitude toward an individual based solely on that individual’s membership within a social group. In Experiment 1, all members of Class 1 received fewer tokens even though B1 was “the only bad apple,” so to speak. In Experiment 2, this finding was not replicated and instead other members of the class (A1 and C1) now received more tokens than B1. The initial inclusion of a neutral function in Experiment 2 might account for this disparity in so far as both A1 and C1 recovered tokens to a level previously obtained.

Using the allocation of tokens to describe and study transfer of function may have both advantages and disadvantages. On the one hand, it may not be a sufficiently precise measure, because allocating a particular number of tokens to any stimulus on test probes could be affected by factors other than the actual transfer of functions (such as an individual’s preexperimental learning history). Also, when the number of tokens that participants allocated to stimuli on test probes is the same, interpretation in terms of function transfer may be difficult. On the other hand, in theory, using the allocation of tokens to study transfer of function does allow for the objective ranking of stimuli in terms of the number of tokens that participants allocated. This is considered an advantage because the functions used in previous studies did not allow for such ranking. This advantage, however, will only materialize when enough differences and changes in token allocation are observed in a sufficient number of participants. This would then also enable the use of inferential statistical analyses, which would support more substantial conclusions regarding possible transfer of functions using social labels. Future experiments could focus on this and other methodological variations not studied here, such as (1) history effects across different sequencing of conditions, such as starting with “YIM is bad,” (2) different training methods for establishing equivalence classes, (3) different reversal instructions, and (4) different social labels. Regarding the first suggestion, future studies could focus on the effects of variations in sequencing “GOOD” and “BAD” conditions on subsequent token allocation.

Regarding the second suggestion, in the current experiments, blocks of 12 training trials were used to establish A–B and A–C relations. If, however, participants were to receive overtraining of, for example, A–B while receiving training as usual on A–C, this may positively influence the strength of the A1–B1 and A2–B2 relations and affect transfer of functions in two ways. First, increased relational strength may produce increased transfer of function (i.e., A1 receiving more resources than C1) and, second, it may lead to less resistance to reversal (i.e., A1 receiving fewer resources than C1 after B1 is labelled negatively). The latter may be the experimental operationalization of social psychological concepts such as “loyalty” and “in-group identification” (Gaertner, Dovidio, Anastasio, Bachman, & Rust, 1993). The effects of overtraining on relational strength were recently demonstrated in a study by Bortoloti, Rodrigues, Cortez, Pimentel, and de Rose (2013) and offer promising opportunities for the current aim of approaching social psychological concepts experimentally. Of course, this is all speculation on our part and there may be better ways to manipulate the strength of relations within equivalence classes. Also, more attention could be given to issues around contextual control (Bush, Sidman, & de Rose, 1989; Gatch & Osborne, 1989; Kohlenberg, Hayes, & Hayes, 1991; Randell & Remington, 2006).

Regarding the third suggestion, the instructions used in the current experiments (“YIM is a good/bad person”) not only provided social labels but also suggested that originally nonsense words actually represented individuals, which may or may not have an extra effect on resource allocation (before and after reversal). Recent research by Arntzen, Nartey, and Fields (2014) and Fields, Arntzen, Nartey, and Eilifsen (2012) may be relevant here. They examined the effects of meaningful stimuli in the creation of equivalence classes and found that class formation was enhanced relative to class formation with arbitrary stimuli only. In the current study, however, equivalence classes already existed before the social functions were added. Thus, adding a social function may be viewed as including a meaningful stimulus (e.g., good/bad) in an existing class, and the resulting changes may in turn have contributed in some way to the eventual distribution of resources. Future experiments should also explore the effects of a variety of methods to assign social labels to members of equivalence classes on subsequent resource allocation. One example of this might be to use differential consequences during A1–B1 and C1–B1 training on the one hand and A2–B2 and C2–B2 training on the other hand, where the consequences are already associated with “good” or “bad,” such as tokens showing angelic versus devilish emoticons. Previous research has demonstrated that class-specific consequences are likely to become members of their respective equivalence classes (Barros, Lionello-DeNolf, Dube, & McIlvane, 2006; Dube, McIlvane, Mackay, & Stoddard, 1987; Dube & McIlvane, 1995; Lionello-DeNolf, Dube, & McIlvane, 2006; Pilgrim, 2004; Schenk, 1994). If differential socially loaded emoticon tokens were provided as consequences during A–B and C–B training, A and C stimuli may therefore acquire social loading through emergent relations with “good” or “bad” emoticons, which may generate different resource allocation effects compared to acquiring social loading by function transfer from B.

Regarding the fourth suggestion, the current experiments were limited to the use of only two social labels (i.e., good and bad). Using a variety of socially opposite labels, such as altruistic versus egoistic or capitalist versus socialist, one could also study effects of reversal, or resistance to reversal, when using social labels that imply more permanent versus more temporary person characteristics.

To conclude, the present study offers another avenue for an experimental study of social behavior. We have not proffered a conceptual framework for interpreting the effects of the procedures used beyond staying within the approach taken in general by the experimental analysis of behavior. When contingencies are arranged in the way described here, then it makes sense for the interpretation of outcomes to be anchored in an analysis of the contingencies that were employed. Future research can systematically vary aspects of what was done here to explore and extend the general findings. The basic approach was to generate relations between arbitrary stimuli such that two different classes emerged and then to examine transfer/transformation of function using a social behavior that involved the distribution of tokens to class members. The advantage of doing this is that it provides control over the kinds of relations that are used to establish group membership. Future studies could examine other kinds of functions that might be added via descriptions (e.g., rule-governed behavior) and compare the results to behaviors that are directly trained (e.g., contingency-shaped behavior; for discussions of rule-governed and contingency-shaped behavior, see Hayes, 1989). It might also be interesting to use a variety of other procedures to initially establish relational networks (see Barnes-Holmes, Barnes-Holmes, Smeets, Cullinan, & Leader, 2004) instead of the simple match-to-sample procedure used here, and then to determine how the outcomes produced by each procedure map onto assessments conducted with traditional paper-and-pencil measures of attitudes.

Other variations in the basic experimental setup could include experiments that explore what happens when a variety of names are used to establish equivalence classes instead of using arbitrary nonsense syllables. For example, studies could use a participant’s own name, or examine the effects of using a friend’s name compared to the name of a stranger. Likewise, the effects of using an immigrant’s name versus a nonimmigrant’s name could be compared. In a follow-up to the study by Watt et al. (1991), Catholic and Protestant names could be used with Northern Irish participants and this could be compared to results obtained in similar contexts where there is community conflict. Of course, the distribution of tokens is a very limited example of social behavior and there is no way within the current procedure to determine its construct validity with regard to actual social behavior in real-life situations. Examples of such situations might include a case in which one member of a group loses their reputation, or changes in behavior after a particular reputation is reestablished. Another suggestion might be to examine whether arbitrary stimuli could also be used and worn as badges by real people to see whether the interactions demonstrated here extend to real-world simulations. Other studies might explore whether the effects reported here can be replicated when more complex social behaviors are used. In sum, the current findings tentatively suggest that procedures used in the study of relational responding within equivalence classes could provide opportunities for examining social psychological constructs such as social attitudes (Stahlberg & Frey, 1988), social categorization, and stereotyping.