Cochlear implants (CIs) electrically stimulate auditory nerve fibers and allow people with severe or profound sensorineural hearing loss access to speech sounds (Svirsky, 2017). Children with prelingual bilateral severe or profound sensorineural hearing loss who used CIs were the target population in the present study. These children usually learn to relate sounds to world events (i.e., auditory recognition), understand what they hear (i.e., auditory comprehension), and acquire oral language. The development of these repertoires, however, depends on the degree to which these children have opportunities to interact with sounds and communicate through speech (Houston, Stewart, Moberly, Hollich, & Miyamoto, 2012). Auditory or listening comprehension is often a prioritized skill in auditory rehabilitation strategies (Moog & Stein, 2008). Such comprehension can be described in operant terms (Mackay & Sidman, 1984; Sidman, 1971, 2000) and is better developed under systematic teaching conditions (Lund, 2016).

Investigations with newly hearing populations, such as individuals with prelingual hearing loss who receive CIs after a long time of auditory deprivation, can allow the identification of relevant information about ontogenesis and conditions under which listening comprehension can be learned (Almeida-Verdu et al., 2008; da Silva et al., 2006). In this population, the use of CIs implies the possibility of hearing for the first time. The potential of CIs to foster language acquisition depends on a learning history through which hearing behavior acquires symbolic properties (i.e., behavior that is controlled by stimuli that are arbitrarily related by equivalence; Sidman, 1994, 2000; Sidman & Tailby, 1982).

The stimulus-equivalence paradigm (Sidman, 1994, 2000; Sidman & Tailby, 1982) provides operational support for investigating symbolic relations and analyzing auditory comprehension, conceived as a network of equivalence relations between auditory stimuli and other physical and social events (Almeida-Verdu et al., 2008; da Silva et al., 2006; Mackay & Sidman, 1984; Sidman, 1971, 1994). According to this model, establishing arbitrary relations between stimuli or between stimuli and responses that share common elements can produce emergent relations that were not previously taught. These relations can be defined as equivalence if they conform to the mathematical properties of equivalence (Sidman, 1994; Sidman & Tailby, 1982), implying that stimuli that are related by equivalence become interchangeable with each other. A refinement of this definition was advanced by Sidman (2000): “equivalence relations...consist of ordered pairs of all positive elements that participate in the contingency” (p. 127).

A procedure that is broadly used to establish equivalence relations and teach relations between stimulus pairs with common elements or nodes is the matching-to-sample (MTS) procedure (Critchfield, Barnes-Holmes, & Dougher, 2018; Mackay & Sidman, 1984; Sidman, 1994, 2000; Sidman & Tailby, 1982). The typical MTS procedure usually teaches at least two conditional discriminations simultaneously, which requires the learner to discriminate between two sample stimuli that are presented successively and two comparison stimuli that are displayed simultaneously. The selection of a specific comparison is correct for one of the samples but incorrect for the other. Differential consequences are provided for correct and incorrect responses. These consequences are critical for establishing the sample-comparison conditional relations.

Beginning with the canonical paper by Sidman (1971), teaching a conditional relation between the dictated word “pen” and a picture of a pen (AB relation) and teaching a conditional relation between the same dictated word and the written word “PEN” (AC relation) using the auditory stimulus (A) as a node have been well documented to favor the derivation of the conditional relations between the picture and the written word (BC and CB) that were not directly taught. Positive results in BC and CB probes would indicate the formation of an equivalence class that includes the dictated word “pen,” the written word “PEN,” and the picture of a pen (ABC class). Therefore, assuming equivalence as a model of meaning (Sidman, 1994, 2000; Sidman & Tailby, 1982; Wilkinson & McIlvane, 2001), these results would suggest that learners would then understand what they hear when one says “pen” or read the written word “PEN.”
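The derivation described above can be illustrated with a minimal sketch, assuming that the trained relations are represented as ordered pairs and that the mathematical properties of equivalence (reflexivity, symmetry, and transitivity) are applied until no new pairs appear. The function name and data layout are illustrative assumptions, not part of any of the cited procedures.

```python
from itertools import product

def equivalence_closure(trained, stimuli):
    """Return every ordered pair implied by reflexivity, symmetry, and transitivity."""
    relation = set(trained)
    relation |= {(s, s) for s in stimuli}  # reflexivity
    changed = True
    while changed:
        changed = False
        for x, y in list(relation):  # symmetry
            if (y, x) not in relation:
                relation.add((y, x))
                changed = True
        for (x, y), (u, v) in product(list(relation), repeat=2):  # transitivity
            if y == u and (x, v) not in relation:
                relation.add((x, v))
                changed = True
    return relation

# A = dictated word, B = picture, C = written word; A is the node.
stimuli = ["A", "B", "C"]
trained = {("A", "B"), ("A", "C")}
reflexive = {(s, s) for s in stimuli}
derived = equivalence_closure(trained, stimuli) - trained - reflexive
print(sorted(derived))
```

In this sketch, the untrained BC and CB pairs appear in `derived` alongside the symmetric BA and CA pairs, which mirrors the probes that are used to document class formation.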

Matching-to-sample procedures have been successfully used with CI users (with pre- and post-lingual hearing loss). Equivalence class formation has been reported after teaching conditional relations between stimuli of different modalities (e.g., textual, pictorial, and especially auditory) with different verbal units, such as words (Almeida-Verdu et al., 2008; Anastácio-Pessan, Almeida-Verdu, Bevilacqua, & de Souza, 2015; Lucchesi, Almeida-Verdu, & de Souza, 2018) and sentences (Neves, Almeida-Verdu, Assis, Silva, & Moret, 2018; Silva, Neves, & Almeida-Verdu, 2017). In these studies, equivalence-based instruction programs (EBIs; cf. Cooper, Heron, & Heward, 2020; Critchfield et al., 2018; Pilgrim, 2020) have been shown to effectively foster auditory comprehension, integrate verbal skills (e.g., verbal operants, such as tact and textual behavior, that are controlled by members of equivalence classes), and extend speech accuracy from text to pictures in reader children with CIs.

However, despite the many successful cases with this population, arbitrary auditory-visual MTS has been challenging for some participants, especially young children or older children with late CI implantation. For example, in our initial study (da Silva et al., 2006), children with prelingual hearing loss failed to establish auditory-visual discrimination with electrical stimuli (i.e., stimuli that are received by the CI) as samples and abstract pictures as comparisons. In another study (Almeida-Verdu et al., 2008), one of the participants showed equivalence class formation only after two reapplications of auditory-visual conditional discrimination training that involved stimuli with extra-experimental histories. In a recent study (Neves et al., 2018), all of the participants needed more than five exposures to auditory-visual MTS training with sentences to achieve the learning criterion. In other words, people with CIs need more efficient procedures to learn auditory-visual symbolic relations (Almeida-Verdu et al., 2008; Lund, 2016).

Comparable difficulties with arbitrary matching have been found in other studies with young children and children with developmental disabilities (e.g., Augustson & Dougher, 1991; Pilgrim, Click, & Galizio, 2011; Pilgrim, Jackson, & Galizio, 2000). For this reason, the development and investigation of the effectiveness of other teaching procedures are relevant issues for research with this population.

Empirical studies have shown that class-specific consequences (including auditory consequences) may become part of equivalence classes. A pioneering study demonstrated stimulus class membership that was established by stimulus-reinforcer relations (Dube, McIlvane, Mackay, & Stoddard, 1987). Positive results have been reported in other studies using class-specific consequences to establish different types of baseline relations that support the emergence of equivalence relations: identity matching (Barros, Lionello-DeNolf, Dube, & McIlvane, 2006; Dube & McIlvane, 1995; Schenk, 1994; Silveira, Mackay, & de Rose, 2018; Varella & de Souza, 2015), arbitrary matching (Dube et al., 1987; Dube, McIlvane, Maguire, Mackay, & Stoddard, 1989; Goyos, 2000; Guld, 2005; Luffman, 2012; Johnson, Meleshkevich, & Dube, 2014; Joseph, Overmier, & Thompson, 1997; Minster, Jones, Elliffe, & Muthukumaraswamy, 2006; Schenk, 1994; Varella & de Souza, 2014), and arbitrary constructed response MTS (Calado, Assis, Barboza, & Barros, 2018). Specific consequences could optimize teaching, given the increase in the number of relations that are potentially established without direct training (Pilgrim, 2020; Varella & de Souza, 2015; Vladescu & Kodak, 2013), fostering the learning of untrained relations when target responses to a stimulus are followed by consequences with extra nontarget stimuli (Reichow & Wolery, 2011; Vladescu & Kodak, 2013).

In the study by Varella and de Souza (2015), for example, a child with Autistic Spectrum Disorder (ASD) demonstrated equivalence classes with letters (i.e., dictated letters and written letters in upper- and lowercase) after identity MTS training with lowercase letters (stimulus Set 1) and uppercase letters (stimulus Set 2), in which correct responses were followed by specific consequences that consisted of the uppercase letter (Set 1) or the lowercase letter (Set 2) and its dictated name. For example, in a trial with the sample written stimulus “e” that was displayed simultaneously with three comparison stimuli (written letter “e” as the S+ and written letters “a” and “j” as the S-), selecting “e” produced the simultaneous presentation of the dictated letter “e” and written uppercase letter “E” on the computer screen (i.e., a compound stimulus). Notably, identity MTS training (lowercase-lowercase and uppercase-uppercase) fostered the emergence of arbitrary relations (lowercase-uppercase, uppercase-lowercase, dictated name-uppercase, dictated name-lowercase), a result that was only possible because of the role of the class-specific compound consequences that were used as nodes. These results suggest that procedures that use auditory stimuli as elements of multicomponent specific consequences to engender auditory-visual relations could be a feasible alternative for teaching these repertoires to CI users.

Equivalence class formation can also result from simple discrimination training, as suggested by Sidman (1994, 2000) and empirically demonstrated in some studies (e.g., Debert, Matos, & McIlvane, 2007; Sidman, Wynne, Maguire, & Barnes, 1989), including simple discrimination training with multicomponent specific consequences (Luffman, 2012; Yonkers, 2012; cf. Pilgrim, 2020). For example, in the initial phase of the study by Yonkers (2012), four young children with learning disabilities mastered three simple discriminations with correctly oriented numerals (4, 7, or 10) as the S+. In each trial, three stimuli were displayed simultaneously: the stimulus that was defined as positive (S+) and two stimuli that were defined as negative (S-). Each negative stimulus was the same numeral that was presented as the S+, with the difference that the numeral was displayed in either an upside-down or a 90° orientation. Selection of the S+ produced a compound class-specific consequence: the spoken numeral name (e.g., “Seven”) and the corresponding written word (e.g., “SEVEN”). Selecting either S- produced a buzzer sound. Probe trials were conducted in the MTS format and presented the numeral, spoken word, and written word both as samples and as comparisons. For all children, responding was class-consistent. In another condition, the children learned to select arithmetic addition operations that were displayed in the correct orientation (e.g., S+: “1 + 1”). Correct responses were followed by the spoken numerical result (e.g., “Two”) and the corresponding written numeral (e.g., “TWO”). The multicomponent specific consequences functioned as a node. This training was also applied to arithmetic subtraction operations. Most of the participants had positive results in derived conditional relation probes, including auditory (dictated numbers) and visual (written Arabic numerals) components of specific consequences, thus indicating the formation of equivalence classes. These results confirmed previous findings on the emergence of equivalence classes after simple discrimination and extended the effects of differential outcomes or specific consequences to this training arrangement.

The present study investigated the auditory comprehension of sentences in CI users using simple discrimination and class-specific compound consequences (auditory and visual) as a baseline to assess the formation of equivalence classes that would include dictated sentences, written sentences, and pictures. Simple discriminations are typically easy to acquire in young children and children with disabilities (Pilgrim, 2020). One issue is whether the population of children who use CIs can equally benefit from simple discriminations as an initial step to establish a complex network of equivalence relations.

To determine effective teaching arrangements to promote auditory comprehension in CI users, we need to empirically verify whether these children would also benefit from the use of specific consequences. As in the study by Varella and de Souza (2015), teaching visual discriminations using auditory stimuli as consequences may establish auditory-visual relations through derived relations, without requiring, from the beginning, the acquisition of auditory discriminations.

Simple discrimination training was conducted with two sets of discriminative stimuli (SD or S+): written pseudo-sentences (C) and abstract pictures (D). With both sets, positive stimuli were correctly oriented, and negative stimuli (SΔ or S-) were the same sentences or pictures as the positive stimuli but were displayed in either an upside-down or inverse orientation. Specific consequences were auditory-visual compounds: a dictated sentence (A) and the corresponding picture (B). Potentially emergent auditory-visual AB, AC, and AD relations were evaluated in probe trials, together with all other visual-visual relations: BC, CB, BD, DB, CD, and DC. Positive results in these probes would demonstrate the formation of equivalence classes between all elements of the teaching contingencies, including the dictated sentences, and would support the use of this experimental arrangement to promote auditory comprehension in CI users.

Method

Participants

Three girls, aged 9–11 years, who were diagnosed with prelingual severe-profound bilateral sensorineural hearing loss and were users of unilateral or bilateral CIs, participated in the study. The participants attended elementary school and were accompanied by audiological and educational services at the Craniofacial Anomaly Rehabilitation Hospital (HRAC) in Bauru. One participant (LAR) had no history with stimulus control experiments, whereas the other two (DEM and BIA) had previously participated in at least one study. The research was conducted according to HRAC ethical protocols (CAAE 45782215.2.0000.5441).

Before the study, the participants underwent an individual assessment using standard tests for intellectual abilities (Columbia Mental Maturity Scale [CMMS], standardized for Portuguese; Alves & Duarte, 2001), receptive vocabulary (Peabody Picture Vocabulary Test, 4th revision [PPVT-4R]; Dunn & Dunn, 2007), and reading, writing, and math skills (Teste de Desempenho Escolar [School Performance Test; TDE]; Stein, 1994). Information on the participants’ clinical history, CI use, and hearing and language assessments was obtained from medical records of the HRAC Cochlear Implant Section.

Table 1 presents the characteristics of each participant. All three participants had below average scores in intellectual abilities (specifically, reasoning and concept formation) on the CMMS. All three had scores in reading, writing, and arithmetic subtests on the TDE that were lower than expected based on their school history. All three participants had receptive vocabulary scores on the PPVT-4R that were lower than expected for their chronological age.

Table 1. Characterization of Participants: Gender, Age, Hearing Loss Etiology, Time of Hearing Deprivation and Hearing with Cochlear Implant (CI), CI Model and Laterality, Hearing and Language Categories, Results in Columbia, PPVT, TDE (Reading, Writing and Arithmetic) and School Years

With regard to hearing/listening, the participants’ records included several measures that resulted from a set of speech-language assessments that placed the children in categories 1–6 (as proposed by Geers, 1994). As shown in Table 1, the participants reached the highest levels on those scales. For example, the participants were able to identify words by consonant recognition (DEM: Category 5) and could identify words in open sets (LAR and BIA: Category 6). Finally, in several communication assessments that place children in categories from 1 to 5, the participants reached level 4 (e.g., LAR and DEM emitted sentences) and level 5 (e.g., BIA spoke sentences fluently).

Hearing deprivation refers to the time period from birth to CI implant surgery, which varied between participants from nearly 3 years (LAR and BIA) to 5 years (DEM). The use of CIs (from implantation to the beginning of the study) varied from approximately 4 to 9 years. LAR and BIA underwent CI surgery in one ear during the sensitive period of neuroplasticity (Kral & Sharma, 2011), and they later received the CI in the other ear. DEM received the CI after 4 years of hearing deprivation, had been using the CI for almost 4 years at the beginning of the study, and had received therapy with an emphasis on lip reading before receiving the CI. Therefore, all three participants likely presented some well-developed level of symbolic function in the visual modality (even before receiving CIs) and auditory-visual modality (after receiving CIs), both by exposure to the natural environment and by exposure to speech therapy.

Experimental settings

Data collection was conducted in rooms in a university laboratory, a public school, and the participants’ homes. The researcher ensured that all of the environments had good lighting, ventilation, and low noise. Each setting included a table and two chairs, one for the participant and one for the experimenter. The participant was positioned in front of a notebook computer. The researcher sat on the right side, gave instructions, praised correct responses (during training only), and operated the software to change or end trial blocks.

Sessions were conducted individually, two to three times weekly, and each session lasted approximately 30 min. After the session was completed, the participants played with toys or pre-installed free games (e.g., Jogo da Forca®) on a Samsung Galaxy tablet for 10 min, followed by the opportunity to choose a small gift from a box.

Material and equipment

For data collection, a Dell notebook with access to PROLER 10 software (Assis & Santos, 2010), speakers, and a Sony GR-AX837 camcorder were used. The software presented tasks (i.e., stimuli, instructions, and consequences) and automatically recorded the participants’ responses in each trial.

Games (e.g., Uno®), a Samsung Galaxy Tab A8 4 GB tablet (with free games installed), colored pencils, and drawings were used during playtime. School items, stickers, storybooks, and small toys were chosen as gifts by the participants at the end of the sessions.

Standardized instruments were administered individually to characterize some of the participants’ skills. The CMMS (Alves & Duarte, 2001) was used to measure the ability to form concepts. The assessment required that the participants point to one picture, among others that were presented simultaneously, that did not belong to a given category. The PPVT-4R (Dunn & Dunn, 2007) was used to evaluate receptive vocabulary and required that the participant point to one of four pictures that corresponded to a dictated word. The TDE (Stein, 1994) was used to assess basic academic skills through handwriting dictation tasks, word-list reading, and arithmetic tests, producing a score relative to the expected score for school placement.

Experimental stimuli

Considering the participants’ level of symbolic function, to control for past experience, stimuli that were used as the S+ in discrimination training were pseudo-sentences and abstract pictures (Fig. 1). Each of the three pseudo-sentences (adapted from de Souza, Postalli, & Schmidt, 2013) was composed of three terms: [infinitive pseudo-verb]-[definite article]-[pseudo-object]. Stimulus Sets A and B were used as specific consequences. Stimulus Sets C and D were used as the S+ in discrimination training. Stimulus Sets C’ and D’ were used as the S-.

Fig. 1. Stimuli used in the study and their respective alphanumeric labels. The dashed area represents stimuli of potentially emerging equivalence classes (A1B1C1D1, A2B2C2D2, and A3B3C3D3)

Stimulus Set A comprised dictated pseudo-sentences (A1, A2, and A3) that were pronounced slowly (but without long pauses after each syllable) in a male voice. Set B comprised representative pictures (B1, B2, and B3) that were designed especially for this study (500 × 500-pixel files). Dictated pseudo-sentences (A) and representative pictures (B) were used as components of specific compound consequences (A1 + B1, A2 + B2, and A3 + B3). They were always presented simultaneously after a correct response during discrimination training.

Stimulus Set C was composed of written pseudo-sentences (C1, C2, and C3) that were typed in black font (75 Arial) in a 3 cm × 6 cm text box with a white background in occidental written language orientation (left-to-right and up-down). The sequence of letters in the words was compatible with Portuguese (i.e., the participants’ native language). Set D consisted of two-element abstract pictures (Fig. 1) with the same file configurations and parameters as Set B. The relation between pictures in Sets B and D was established experimentally. The abstract pictures in Set D had no physical similarity to or iconic representation of pictures in Set B.

Stimuli that served as the S- in C and D discrimination training were rotations of stimuli in Sets C and D (180°, mirrored, or 270°, “upside down”; see Fig. 1). The rotation of these stimuli followed the parameters that were reported by Yonkers (2012). The rotated written pseudo-sentences were designated as Set C’ (C’4, C’5, C’6, C’7, C’8, and C’9), and rotated pictures were designated as Set D’ (D’4, D’5, D’6, D’7, D’8, and D’9).

Auditory stimuli were presented through the computer speakers. Visual stimuli were presented in demarcated locations (approximately 6 cm × 6 cm squares) that were arranged as a cross-shaped matrix with three squares that were displayed horizontally and three that were displayed vertically (see examples in Fig. 2). The sample stimulus that was used in the visual-visual conditional discrimination probe trials was presented in the square in the middle of the cross arms (in the center of the screen). Three visual stimuli were displayed simultaneously in simple discrimination training or as comparisons in conditional discrimination probe trials. Each stimulus was presented in one of the three (out of four) squares at the ends of the cross. The position of the empty square varied throughout the tests.
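The placement rule described above (three stimuli in three of the four end squares, with the empty square varying) can be sketched as follows. The square labels, the function name, and the use of uniform random sampling are assumptions made for illustration; the text does not specify how the software selected positions.

```python
import random

# Hypothetical labels for the four squares at the ends of the cross.
END_SQUARES = ("top", "bottom", "left", "right")

def place_stimuli(stimuli, rng):
    """Assign three stimuli to three of the four end squares; one square stays empty."""
    if len(stimuli) != 3:
        raise ValueError("expected exactly three stimuli")
    squares = rng.sample(END_SQUARES, k=3)  # three distinct squares
    return dict(zip(squares, stimuli))

rng = random.Random(0)  # seeded only to make the example reproducible
layout = place_stimuli(["C1", "C'4", "C'5"], rng)
empty_square = set(END_SQUARES) - set(layout)
print(layout, empty_square)
```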

Fig. 2. Schematic representation of simple discrimination training of C1 (upper left panel) and D1 (lower left panel) with specific multicomponent consequences (A1 + B1). The dashed arrows in the upper right panel indicate potentially emergent relations tested in probe trials. The lower right panels illustrate the display of stimuli for two types of MTS in probe trials

Procedure

Overview

The procedure was organized into teaching and probe steps. The main teaching task was simple discrimination with specific consequences using two stimulus sets, with three stimuli per set. Probe blocks assessed both simple and conditional discriminations. Programmed (teaching and probing) tasks were presented as blocks of computer-managed discrete trials. In simple discrimination training, correct responses (i.e., mouse click on the stimulus that was defined as the S+) were followed by specific multicomponent consequences (auditory + visual), and incorrect responses were followed by a 3-s black screen, a 1-s intertrial interval (ITI), and presentation of the next trial. Blocks of probe trials that were conducted in extinction assessed the taught simple discriminations and the conditional auditory-visual and visual-visual relations that could potentially result from simple discrimination training. Conditional discrimination (MTS) trials were interspersed with simple discrimination trials.

Experimental design

Probe blocks were conducted according to a multiple-probe design (Horner & Baer, 1978). The general sequence of teaching and probe blocks was arranged as follows: probes (Block 1), simple discrimination training with stimulus Set C (written pseudo-sentences), full baseline (Set C), probes (Block 2), simple discrimination training with stimulus Set D (abstract pictures), full baseline (Set D), and probes (Block 3). Therefore, the assessment preceded and followed C training and D training. Simple discrimination probes after training assessed the maintenance or stability of taught (baseline) relations because this was the potential basis for emergent performance. Figure 2 shows representations of teaching (left panels) and probe (right panels) procedures, which are detailed below.

Simple discrimination training C (written pseudo-sentences) and D (abstract pictures)

Table 2 presents the sequence and content of each simple discrimination training block. The general teaching procedure was the same for C training (three discriminations of written pseudo-sentences) and D training (three discriminations of abstract pictures). The stimuli are shown in Fig. 1. For each stimulus set (C or D), training was arranged in four successive blocks. Each of the first three training blocks taught one discrimination in a sequence of 12 consecutive trials. For example, C1 (written pseudo-sentence MUPAR A GUZATA) was the S+ in all 12 trials in the first block. C2 (VOQUER A REVECA) was the S+ in all 12 trials in the second block. C3 (ZABIR A TABILU) was the S+ in all 12 trials in the third block. Table 2 shows the two S- stimuli that were presented with each written sentence (C1+/C’4-C’5-; C2+/C’6-C’7-; C3+/C’8-C’9-) and with each abstract picture (D1+/D’4-D’5-; D2+/D’6-D’7-; D3+/D’8-D’9-). In this training, no reversal of the function of the S+ or S- occurred. No S+ functioned as the S- in any trial, and no S- functioned as the S+ in any trial (cf. Yonkers, 2012).

Table 2. Simple Discrimination Training: Distribution of Training Trials per Block, Number of Trials, S+ and S-, Specific Consequences and Block Destination according to Criterion Attainment

Each trial began with a blue square that was presented in the center of the screen. A mouse click on this square was followed by the simultaneous presentation of three visual stimuli on the screen: the S+ (with the conventional orientation) and two S- (with rotated orientations; see left panels in Fig. 2). A mouse click on the S+ was followed by the specific multicomponent consequence: a dictated pseudo-sentence (auditory component, stimulus Set A) and a representative picture (visual component, stimulus Set B; see stimuli in Fig. 1). The dictated pseudo-sentence that corresponded to the written pseudo-sentence that was presented as the S+ was presented through the computer speakers (A). Therefore, when MUPAR A GUZATA (C1) was the S+, the auditory component of the consequence was A1 (“mupar a guzata”), and the visual component of the consequence was the representative picture B1. Correctly responding to C2 produced A2 (“voquer a reveca”) and picture B2. Correctly responding to C3 produced A3 (“zabir a tabilu”) and picture B3.

The learning criterion for each 12-trial block was 100% correct responses. If this criterion was not met, then up to two exposures to the block were presented in the same session. If more exposures were needed, then the session ended, and the procedure resumed in the next session. Once this accuracy criterion was reached in simple discrimination C1, the same procedure was used to teach C2. Once the criterion was reached in simple C2 discrimination, C3 training was introduced. The fourth training block mixed all three discriminations (full baseline of simple discriminations C1, C2, and C3). The block had nine trials, with three trials for each simple discrimination. Each trial was presented as in the previous training, including the two rotated S- and specific consequences (C1→A1+B1; C2→A2+B2; C3→A3+B3). The three trial types alternated in no systematic order throughout the training block.
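The criterion logic described above can be sketched as follows, assuming each exposure to a block is stored as a list of per-trial outcomes. The function names and the reading that at most two exposures occur within one session are assumptions made for illustration, not a description of the PROLER software.

```python
def block_mastered(responses):
    """True when every trial in the block was correct (100% criterion)."""
    return len(responses) > 0 and all(responses)

def run_session(exposures, max_exposures=2):
    """Evaluate the exposures to one block within a single session.

    `exposures` holds one list of per-trial outcomes (True = correct) per exposure.
    Returns (mastered, exposures_used); if the criterion is not met within
    `max_exposures`, training on the block would resume in the next session.
    """
    for used, responses in enumerate(exposures[:max_exposures], start=1):
        if block_mastered(responses):
            return True, used
    return False, min(len(exposures), max_exposures)

# One error in the first 12-trial exposure, none in the second.
first = [True] * 11 + [False]
second = [True] * 12
print(run_session([first, second]))
```

Under these assumptions, the example session reaches criterion on the second exposure, so the next discrimination could be introduced.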

The main criterion remained at 100% correct responses. After discrimination training with Set C was completed, a probe block was conducted and followed by discrimination training with Set D. The same training procedure that was used for Set C then commenced, including the presentation of specific consequences (i.e., when D1 was S+, the consequences were A1+B1; the programmed consequences for correct responses to D2 were A2+B2; the programmed consequences for correct responses to D3 were A3+B3). Therefore, the compound A and B consequences were the possible nodes between C and D.

Probes of learning outcomes and derived relations

Blocks of probe trials that were conducted in extinction assessed the participants’ repertoire in simple and conditional discrimination with the experimental stimuli before training, after C training, and after D training. The participants were informed that they would not receive feedback but that they should respond as best they could and they could play with their favorite game and choose an item from the gift box at the end of the session.

Each block included 33 randomly arranged trials, one for each trial type listed in Table 3 (six simple discrimination trials and 27 conditional discrimination trials). The blocks tested the simple discriminations (C and D) that were trained but without consequences. The blocks also tested all potentially emergent auditory-visual (AB, AC, and AD) and visual-visual (BC, CB, BD, DB, CD, and DC) conditional relations. Table 3 presents a list of probed relations.
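The composition of a probe block can be checked with a short sketch, assuming one trial per stimulus for each simple discrimination set and one trial per class for each conditional relation. The trial labels (e.g., `D1B1`) and the function name are illustrative assumptions; the actual trial list is the one given in Table 3.

```python
import random

CLASSES = (1, 2, 3)
SIMPLE_SETS = ("C", "D")
CONDITIONAL_RELATIONS = ("AB", "AC", "AD", "BC", "CB", "BD", "DB", "CD", "DC")

def build_probe_block(rng):
    """Return a shuffled list of (trial_type, label) tuples for one probe block."""
    trials = [("simple", f"{s}{k}") for s in SIMPLE_SETS for k in CLASSES]
    trials += [
        ("conditional", f"{rel[0]}{k}{rel[1]}{k}")  # sample then correct comparison
        for rel in CONDITIONAL_RELATIONS
        for k in CLASSES
    ]
    rng.shuffle(trials)
    return trials

block = build_probe_block(random.Random(1))
n_simple = sum(1 for kind, _ in block if kind == "simple")
n_conditional = len(block) - n_simple
print(len(block), n_simple, n_conditional)
```

With two simple discrimination sets and nine conditional relations across three classes, the sketch yields 6 + 27 = 33 trials, which matches the block size reported above.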

Table 3. Trial Types Assessed in Probes Conducted Before and After the Simple Discrimination Training: Stimuli or Relations Tested, Discrimination Type, Required Responses, Sample Stimuli in Conditional Discrimination Trials, and S+ and S- Stimuli in All Trials

Simple discrimination probe trials

These trials were similar to training trials, with the exception that specific consequences did not follow correct responses. Each trial displayed one S+ (correctly oriented stimulus) and two S- stimuli (rotations of the original S+). In probes with stimuli from Set C, each trial simultaneously presented one written pseudo-sentence in the correct orientation (S+: C1, C2, or C3) and two stimuli in rotated orientations (S-: C’4 and C’5, C’6 and C’7, or C’8 and C’9; see Table 3). Probes with stimuli from Set D simultaneously presented one abstract picture in the correct orientation (S+: D1, D2, or D3) and two stimuli in rotated orientations (S-: D’4 and D’5, D’6 and D’7, or D’8 and D’9). These trials were intermixed with conditional discrimination trials.

Conditional discrimination probe trials

The assessment of potentially derived relations, which were indicative of the formation of equivalence classes, was programmed via the MTS procedure (see Table 3 and the diagram of tested relations in the top right panel in Fig. 2). For visual-visual relation trials (BC/CB, BD/DB, and CD/DC), the sample stimulus was presented in the center of the screen together with three comparison stimuli that were displayed on three of the four ends of the cross-shaped matrix on the computer screen (see example of a D1B1 relation in the bottom right panel in Fig. 2). The locations of the three stimuli varied across trials. For auditory-visual relations (AB, AC, and AD), a blue square was displayed in the center of the screen. A mouse click on the square (i.e., the trial initiation response) resulted in the auditory sample being played through the computer speakers and the comparison stimuli appearing on the screen. The selection of the comparison stimulus that was consistent with the potential class was recorded as a correct response; otherwise, the response was recorded as an incorrect response.

Fig. 3

Percentage of correct responses in simple discrimination training with written pseudo-sentences (stimulus set C) and abstract pictures (stimulus set D)

Data analysis procedure

The participants’ responses were automatically recorded by the computer software. Performance on simple and conditional discrimination tasks was analyzed as the number of correct responses per relation and per block and is expressed as a percentage.
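The percentage measure can be illustrated with a short sketch. The record format (block, relation, correct) and the function name are assumptions made for illustration, and the example records are invented; they are not the study's data.

```python
from collections import defaultdict


def percent_correct(trials):
    """Return {(block, relation): percentage correct} from (block, relation, correct) records."""
    counts = defaultdict(lambda: [0, 0])  # [number correct, number of trials]
    for block, relation, correct in trials:
        counts[(block, relation)][0] += int(correct)
        counts[(block, relation)][1] += 1
    return {key: 100.0 * c / n for key, (c, n) in counts.items()}


# Invented records for illustration only.
records = [
    ("pre", "AB", False), ("pre", "AB", False), ("pre", "AB", True),
    ("post_D", "AB", True), ("post_D", "AB", True), ("post_D", "AB", True),
]
scores = percent_correct(records)
```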

Results

The participants completed the study in approximately ten sessions (5 h) that were distributed over 4 weeks (three sessions/week). Figure 3 shows the results of simple discrimination training. Figure 4 shows the results of the probe blocks that were conducted before and across training.

Fig. 4

Percentage of correct responses in probe trials of simple discriminations (baseline) and of derived relations (conditional discriminations)

The simple discrimination training of written pseudo-sentences as the S+ was completed within five to ten blocks (Fig. 3, left panel). DEM and BIA, who were already readers before the study, met the learning criterion (100% correct) for the first block in two exposures. The other two discriminations (C2 and C3) were mastered by DEM and BIA in a single block. LAR was a pre-reader and required up to five exposures to learn the first discrimination (C1; MUPAR A GUZATA), and this participant required two exposures to master C2 (VOQUER A REVECA) and C3 (ZABIR A TABILU). The black bars show that two of the girls (DEM and LAR) scored 100% correct on the first exposure to the full baseline block (mixed discriminations C1, C2, and C3). BIA required two blocks, but the percentage of correct responses was above 75% in the first block.

As shown in the right panel in Fig. 3, LAR and BIA required more blocks to learn discriminations of the abstract pictures (D) than to learn the C discriminations. DEM required two exposures only to the first block (same as for C training). In the full baseline (black bars) with abstract pictures (D), DEM and LAR showed an accurate baseline in the first exposure, whereas BIA required two exposures.

Probes assessed whether the simple discrimination training of written pseudo-sentences (C) and abstract pictures (D) with specific multicomponent consequences resulted in emergent relations between stimuli with the properties of equivalence classes (Fig. 4). Each graph presents three bars, corresponding to probes before training (white), after C training (gray), and after D training (black). Generally, performance before training, represented by the white bars, was absent or lower than in subsequent probes.

Simple discrimination probes

As shown by the first set of bars to the left of the dashed line in Fig. 4, in the first probes (white bars) that were conducted before written pseudo-sentence (C) discrimination training, differences were observed across participants. In subsequent probe blocks, BIA maintained 100% accuracy, whereas DEM and LAR reached 100% accuracy after direct training with sentences (gray bars) and maintained that performance after abstract picture discrimination training (black bars).

For simple discriminations that involved abstract pictures (D), as shown by the second set of bars, DEM and LAR scored 0% correct at the initial baseline, and BIA scored 66% correct. Accuracy in this task increased after training C discriminations (100% for DEM and BIA and 25% for LAR; i.e., before any D training with pictures). All three participants reached 100% accuracy after D training.

Conditional discrimination probes (emergent relations)

The first three columns of the graphs that are shown to the right of the dashed line in Fig. 4 show auditory-visual conditional relations (MTS tasks) AB, AC, and AD. These tasks verified whether the auditory component of specific consequences (A) became related to the visual component (B) that was presented simultaneously with it during baseline and related to visual stimuli C and D, in the presence of which those consequences followed correct responses (see diagram of tested relations in the right panels in Fig. 2).

Conditional relations between dictated pseudo-sentences and pictures (AB) were demonstrated by DEM and LAR, whose accuracy increased from 0% correct at baseline to 100% correct after training. For conditional relations between dictated and written pseudo-sentences (AC), baseline performance was more accurate for DEM and BIA (who were reader participants) than for LAR (the pre-reader participant, who scored 0% correct). DEM maintained 100% accuracy in the two subsequent tests. BIA reached 100% accuracy after C training. LAR exhibited better accuracy after C training and reached 100% correct after D training. Conditional control by dictated pseudo-sentences over abstract pictures (AD) was achieved by DEM and LAR only after teaching D discriminations. BIA achieved an intermediate accuracy score (66%). In summary, training two sets of simple visual discriminations (C and D) engendered emergent auditory-visual relations that were consistent with the potential formation of equivalence classes. The auditory component of the consequences became positively related to the other component (the representative picture [B]) and related to the corresponding stimuli that were used as the S+ in C and D simple discrimination training.

The middle group of bars to the right of the dashed line in Fig. 4 represents the results of probes that verified whether the visual component of specific consequences (representative pictures [B]) became related to the visual stimuli from simple discriminations (C and D). For DEM and LAR, these visual-visual conditional relations (BC/CB and BD/DB) increased to 100% of responses that were consistent with the potential emergent relations after completing the relevant training. BIA had low and variable accuracy across all tests.

The last two sets of bars in Fig. 4 show the results of CD and DC probes, in which the visual stimuli that were used separately in simple discrimination training were presented as samples and comparisons in conditional discrimination trials. Two participants (DEM and LAR) had low accuracy (< 40%) in the baseline tests and after C discrimination training, but responses that were consistent with potential emergent relations increased to 100% after D discrimination training. BIA presented an increase in these relations after mastering the two sets of simple discriminations, but performance remained under 70%.

Discussion

The present study evaluated the effects of simple discrimination training with multicomponent specific consequences on emergent conditional relations and the formation of equivalence classes with pseudo-sentences in three children with CIs. All three participants learned the simple discriminations that involved written pseudo-sentences (C) and abstract pictures (D). Dictated sentences (A) and corresponding pictures (B), both according to language conventions, were components of specific consequences. Two children developed auditory comprehension, demonstrated by positive results in probes of conditional relations (visual-visual and especially auditory-visual relations) that were consistent with equivalence class formation.

Establishing baseline simple discrimination

The acquisition of discriminations that involved written pseudo-sentences (C) was achieved after different numbers of exposures to training for individual participants. DEM and BIA (who were readers) required only one or two exposures. LAR (who was a pre-reader) required more training sessions to meet the learning criterion. This difference may have been attributable to the control that was exerted by some dimensions of the textual stimuli, which was well established for DEM and BIA and still being established for LAR. Although the sentences had no pre-experimental meaning, to correctly indicate the pseudo-sentence that was defined as the S+, the participant had to attend to the left-right and top-down orientation of the text, which are relevant and arbitrary conventions of Western written languages (Hulme & Snowling, 2014). Control by these textual dimensions is established in early reading instruction (Hulme & Snowling, 2014), which may account for the need for more training for LAR compared with DEM and BIA. Further studies should control for this variable and verify the pace of discrimination acquisition as a function of the reading repertoire of children with CIs.

Two participants (BIA and LAR) required more repetitions of discrimination training with abstract pictures (D) than with written pseudo-sentences (C). The pictures were abstract compounds (two elements each), and discrimination may have been difficult because control depended solely on the position of rotations, especially when rotation implied only a left-right inversion of the two elements relative to the S+ (see Fig. 1; compare S+ stimuli D1, D2, and D3 to S- stimuli D’5, D’7, and D’9). Additionally, these abstract pictures (D) were completely unknown to the participants, and S+ was established solely by direct exposure to contingencies, unlike the written pseudo-sentences (C), for which textual dimensions were familiar (such as left-to-right orientation) and quickly mastered. Eventually, the learning challenge of D discriminations was successfully overcome. Previous studies that used the same procedure to teach the simple discrimination of numbers and arithmetic operations with children (Guld, 2005; Yonkers, 2012) reported rapid learning, but these participants already had contact with numbers in the correct orientation (S+). These results suggest that familiarity with stimuli may facilitate discriminative learning and the formation of equivalence classes (Fields, Arntzen, Nartey, & Eilifsen, 2012) or that the relevant discrimination had already been established. For example, for reader participants, when written sentences or numbers were used as the S+, stimulus orientation was probably a cue for correct responding during discrimination training (i.e., the orientation itself became the discriminative stimulus property instead of the content or topography of the stimulus). This is particularly plausible when considering that C1, C2, and C3 in the present study were always used as the S+ only. Stimuli that were used as the S- were rotations of S+, but S+/S- reversals were never used. 
These possibilities (i.e., stimulus familiarity and responding by the exclusion of incorrect stimulus orientations) are plausible when comparing the results of C and D training, which intra-experimentally replicated differences that were observed in previous studies. C and D discriminations were purposefully easy because the objective was to favor the establishment of relations between the S+ and the stimuli used as consequences, with the goal of establishing a baseline for the emergence of equivalence relations. However, the greater difficulty that was observed in learning D discriminations suggests that the participants’ lack of familiarity with abstract pictures may have favored the occurrence of more incorrect responses because of the procedure’s trial-and-error feature (Ferrari, de Rose, & McIlvane, 1993). This hypothesis should be experimentally verified by examining whether errorless learning procedures (e.g., fading and exclusion) accelerate the learning of simple discriminations with these pictures and minimize errors (Sidman, 2010). Another source of difficulty was likely the compound nature of D pictures. A replication of this procedure could verify whether unitary pictures would make D discrimination easier.

Unlike BIA and LAR, DEM performed accurately with the first exposure to three of four blocks of D discrimination training, as she did in C training. DEM only made a few errors in the very first trials of the first training block and quickly mastered the three discriminations (D1, D2, and D3), even when all three discriminations were mixed in the fourth block. Notably, DEM had already reached 100% correct responses in D probes that were conducted before D training (see Fig. 4). However, one limitation of these probes was that they assessed only one trial of each discrimination, thus leaving open the possibility that this score had been obtained by chance in the absence of feedback. Support for this possibility comes from the results of BIA, who also obtained 100% correct responses in D probes that were conducted before D training but required many training blocks to master each of these discriminations during subsequent training (see Fig. 3). DEM may have benefited from C training by responding rapidly to differential consequences for correct and incorrect responses at the beginning of D training.
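The concern about chance responding in these probes can be made concrete with a short calculation. The sketch below assumes three single-trial probes, three choices per trial (one S+ and two S-), and independent random selections; these are illustrative assumptions, and the function name is hypothetical.

```python
from fractions import Fraction


def chance_all_correct(n_trials, n_choices):
    """Probability of selecting the S+ on every trial when each selection is random."""
    return Fraction(1, n_choices) ** n_trials


# Three probe trials, each with one S+ and two S- stimuli.
p = chance_all_correct(3, 3)
```

Under these assumptions, a fully correct score by unguided chance has a probability of 1/27 (about 4%), which is low but cannot be excluded with a single trial per discrimination.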

Testing for the emergence of novel relations

After training simple C and D discriminations, the stimuli that were previously used as antecedents (C in the first training and D in subsequent training) and consequences (simultaneously presented A and B) in simple discrimination were presented as sample and comparison stimuli in MTS probe trials. Positive results for DEM and LAR revealed emergent relations that were consistent with equivalence relations (Sidman, 2000; see Fig. 4) for the auditory-visual relations (AB, AC, and AD) and visual-visual relations (BC, CB, BD, DB, CD, and DC). Responding in these several types of MTS probes was consistent with the formation of three four-member equivalence classes (A1B1C1D1, A2B2C2D2, and A3B3C3D3). Specifically, positive results on tests of emergent conditional relations between written pseudo-sentences and abstract pictures (CD and DC), which were never presented together in training, support a transitive relation interpretation (e.g., if C1 as the discriminative stimulus is related to the A1+B1 consequence and D1 as the discriminative stimulus is related to the same A1+B1 consequence, then C1D1 and D1C1 relations are expected) and more generally equivalence class formation (Barros et al., 2006; Guld, 2005; Johnson et al., 2014; Pilgrim, 2020; Sidman, 2000; Yonkers, 2012). Notably, CD and DC tests were similar to those that were used in the seminal paper by Sidman (1971), which tested relations between two stimulus sets that were not directly related in training but related to a common auditory node. The formal definitions of transitivity and equivalence that were established by Sidman and Tailby (1982) were formulated under the control of aspects of the MTS procedure for training conditional discrimination and testing interchangeable functions that involved sample and comparison stimuli. The refinement of the definition of equivalence relations as “ordered pairs of all positive elements that participate in the contingency” (Sidman, 2000, p. 127) allowed the extension of those original definitions to new procedures, such as the three-term contingency that was used in the present study and other studies. The present results add to several previous findings that support the claim of Sidman (2000), based on emergent relations that derive from three-term contingencies.
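The logic of this interpretation can be sketched computationally: if every pair of positive elements in a contingency is treated as related, and relations are closed under symmetry and transitivity, then the trained three-term contingencies should yield three four-member classes. The following is a minimal illustration using a union-find structure; the function and variable names are hypothetical, and the sketch models only the formal relation, not the behavioral process.

```python
def merge_classes(pairs):
    """Group stimuli into classes: stimuli linked by any chain of pairs share a class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    classes = {}
    for stimulus in list(parent):
        classes.setdefault(find(stimulus), set()).add(stimulus)
    return sorted(sorted(members) for members in classes.values())


# Positive elements of each trained contingency: an S+ (C or D) and its
# compound consequence (A and B). C and D never appear together in training.
contingency_pairs = []
for n in (1, 2, 3):
    for s_plus in (f"C{n}", f"D{n}"):
        contingency_pairs += [(s_plus, f"A{n}"), (s_plus, f"B{n}")]

classes = merge_classes(contingency_pairs)
```

In this sketch, C1 and D1 end up in the same class only because both are linked to the common consequence elements A1 and B1, which mirrors the role of the consequences as nodes.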

Concerning specifically derived conditional relations between dictated pseudo-sentences and pictures (AB; i.e., both components of specific consequences), one should consider the nature of stimulus-stimulus pairing during the simultaneous presentation of these stimuli in training (Amd, de Almeida, de Rose, Silveira, & Pompermaier, 2017; Amd, de Oliveira, Passarelli, Balog, & de Rose, 2018; Canovas, Queiroz, Debert, & Hübner, 2019; Debert et al., 2007; Leader, Barnes-Holmes, & Smeets, 2000; Takahashi, Yamamoto, & Noro, 2011). Derived AB relations were produced because of the effects of the simultaneous presentation of compound stimuli that differed from operant pairing (i.e., a stimulus-stimulus relation that depended on the response and consequence), which was implemented by Canovas et al. (2019) and Debert et al. (2007) and comes closer to respondent-type training (Amd et al., 2017, 2018; Leader et al., 2000; Takahashi et al., 2011).

In summary, the main outcome of the present study was that teaching six simple visual discriminations with common outcomes fostered the emergence of 27 novel relations and the formation of three four-member equivalence classes. The classes included C and D stimuli, which became discriminative (S+) as a result of direct training, and A and B stimuli, which were elements of the specific compound consequences and common to C and D. These results also confirmed previous reports, in which specific consequences functioned as nodes to merge the classes (Barros et al., 2006; Sidman, 2000; Yonkers, 2012), and extended the results to a population of children with auditory deficits. The procedure and design of this study were similar to Yonkers (2012) with regard to teaching a math program to young children with learning disabilities, in which the stimuli were numerals and arithmetic operations (cf., Pilgrim, 2020). The results of two of our children with CIs replicated the findings of Yonkers (2012) and extended them to this population and complex auditory and visual stimuli, such as sentences.
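One way to arrive at the count of 27 is to multiply the nine probed relation types by the three classes. The enumeration below is a sketch under that assumption; the labels follow the naming used in this article (e.g., C1D1), and the variable names are illustrative.

```python
# Relation types probed in the MTS tests: auditory-visual and visual-visual.
TESTED_RELATION_TYPES = ["AB", "AC", "AD", "BC", "CB", "BD", "DB", "CD", "DC"]
CLASSES = (1, 2, 3)

# One probed relation per relation type and class, e.g., "C1D1".
probed = [
    f"{rel[0]}{n}{rel[1]}{n}"
    for rel in TESTED_RELATION_TYPES
    for n in CLASSES
]
total = len(probed)  # 9 relation types x 3 classes
```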

DEM’s and LAR’s positive results in auditory-visual (AB, AC, and AD) and visual-visual (BC, CB, BD, and DB) conditional relations can be interpreted as evidence that each component of the specific consequences (A and B, which served as nodes) operated as a positive element in the contingency and was incorporated as a member of the equivalence classes (Guld, 2005; Sidman, 2000; Varella & de Souza, 2014; Yonkers, 2012). Specific consequences with one or more components became members of equivalence classes when contingent on (i) simultaneous simple discrimination (as shown in the present study and Yonkers, 2012) and conditional discrimination by (ii) identity MTS (Varella & de Souza, 2015) and (iii) arbitrary MTS (Dube et al., 1987, 1989). The present results confirm and extend to children with CIs the finding that specific consequences have the potential to participate as members of equivalence classes (Barros et al., 2006; Calado et al., 2018; Dube et al., 1987, 1989; Dube & McIlvane, 1995; Guld, 2005; Johnson et al., 2014; Schenk, 1994; Sidman, 2000; Silveira et al., 2018; Varella & de Souza, 2014, 2015; Yonkers, 2012).

Our third participant, BIA, failed to consistently exhibit derived conditional and equivalence relations, although she achieved intermediate scores just below the criterion. She responded at chance level in the auditory-visual probes (AB and AD) that required picture recognition (with both representative and abstract pictures), but she did well in written sentence recognition probes (AC), which replicated findings with CI users with a well-established reading repertoire (Anastácio-Pessan et al., 2015; Lucchesi et al., 2018). BIA had a clinical condition of partial electrode insertion, the intermittent interruption of CI use, and attentional and cognitive deficits, which may have interfered with her performance. However, BIA’s data converge with previous studies that employed specific consequences and showed inconsistent results for some participants (Barros et al., 2006; Schenk, 1994; Silveira et al., 2018; Varella & de Souza, 2014), indicating the need to further control and investigate variables that may interfere with baseline acquisition (Sidman, 1960) or with the emergence of novel relations. For example, she made many more errors in discrimination training with abstract pictures (D) than in discrimination with written sentences (C). The errors may have occurred under the control of undesired stimulus dimensions (McIlvane & Dube, 1992, 2003), such as responding to the orientation (rotation of the abstract picture) cue only, regardless of the left-right position of the two elements (e.g., D1, which functioned as the S+, differed from D’5 solely in the left-right position of its two elements). BIA reached the learning criterion, but if irrelevant stimulus control topographies (McIlvane & Dube, 1992, 2003) occurred at baseline, then this would result in responding that was inconsistent with the classes in probe trials (Johnson & Sidman, 1993). 
Greater exposure to training and stimulus control-shaping procedures might have helped promote symbolic relations with BIA. Alternatively, a review of her accuracy in acquiring and maintaining simple discriminations (Fig. 3 and left of the dashed line in Fig. 4) suggests that BIA may have had difficulty performing arbitrary discriminations, a possibility that could be measured by the ABLA-R Test (Assessment of Basic Learning Abilities - Revised; Kerr, Meyerson, & Flora, 1977). The ABLA evaluates a hierarchy of discriminative tasks (levels 1 through 6) and has predictive value for discriminative learning in similar tasks (cf. Varella, de Souza, & Williams, 2017; Vause, Martin, & Yu, 2000). Studies have found that individuals who do well at level 3 tasks but have difficulty at level 4 rarely learn levels 5 and 6. Thus, BIA may have transitioned from performing simple discrimination to conditional discrimination as described by Dube (1996). An accurate assessment of the participants’ discriminative performance could help design teaching strategies to master more demanding levels of discrimination.

Despite BIA’s difficulties, DEM’s and LAR’s results are consistent with previous research that documented the formation of equivalence classes by pre- and post-lingual CI users with different types of stimuli (auditory, textual, and pictures) and different extensions of verbal stimuli (Almeida-Verdu et al., 2008; Anastácio-Pessan et al., 2015; da Silva et al., 2006; Lucchesi et al., 2018) and specifically converge with the results of studies that used sentences as stimuli (Neves et al., 2018; Silva et al., 2017). Overall, the present findings contribute to systematic replication of the phenomenon (i.e., equivalence class formation with sentences). In previous studies, equivalence classes in CI users derived from direct auditory-visual conditional discrimination training (Almeida-Verdu et al., 2008; Anastácio-Pessan et al., 2015; Lucchesi et al., 2018; Neves et al., 2018; Silva et al., 2017). The simple discrimination training and specific consequence procedure in the present study also generated equivalence class formation. Therefore, the present findings add to the literature on the symbolic function of auditory stimuli via CIs (Almeida-Verdu & Golfeto, 2016, for a synthesis) and suggest that children with CIs can acquire auditory-visual symbolic relations through contingencies of four terms (conditional discrimination) and three terms (simple discrimination, with specific consequences). The simple discrimination procedure with specific consequences may be a viable alternative for programming teaching contingencies for learners who fail at conditional discrimination (Pilgrim, 2020; Pilgrim et al., 2000; Sidman, 2000). Further studies and replications should be performed to assess the effectiveness and generalizability of the present data to support teaching decisions.

As argued before, acquisition of the symbolic function of auditory stimuli (auditory recognition and comprehension) is crucially important for children with auditory deficits and CI users. These children can make progress with these repertoires, but they usually encounter difficulties in the learning process and lag behind their typically developing peers in oral language acquisition (Almeida-Verdu et al., 2008; Houston et al., 2012; Lund, 2016; Moog & Stein, 2008). Therefore, there is a demand for evidence-based interventions for the auditory rehabilitation of this population (Spencer & Marschark, 2010). Behavior analysis has contributed to the development and evaluation of teaching procedures (Pilgrim, 2020; Sidman, 2010; Stromer, Mackay, & Stoddard, 1992) and identified both strengths and difficulties in teaching contingencies for children with CIs (Almeida-Verdu & Golfeto, 2016; Lucchesi et al., 2018), some of which are shared with other populations, such as young children who have difficulty in acquiring conditional discriminations (Pilgrim, 2020; Pilgrim et al., 2011). Considering the needs of this population, the present procedures suggest a new alternative to promote auditory comprehension, in which auditory-visual and symbolic conditional relations can be achieved through simple discrimination by using specific compound consequences with combined auditory and visual components (Guld, 2005; Johnson et al., 2014; Varella & de Souza, 2015; Yonkers, 2012), even in people with a history of the early and long-lasting deprivation of auditory stimulation, followed by CI surgery. However, conclusions about the potential utility and scope of this procedure are limited by the small number of participants and their varied entry repertoires. The findings should be replicated and extended across a greater variety and complexity of stimuli, baseline verbal repertoires, and number of participants, among other variables.

One other aspect that deserves consideration is that the participants in the present study had slightly more accurate performance in emergent auditory-visual relations that involved written sentences than in emergent visual-visual relations. Control by auditory stimuli was well established and can be related to both experimental and extra-experimental variables. LAR and BIA received CIs during a sensitive period of neuroplasticity, and their performance in auditory tasks may reflect auditory sensorineural development as suggested in the audiological literature (Kral & Sharma, 2011). This performance may also be a function of extra-experimental contingencies that systematically teach discriminative control by auditory stimuli, such as educational and rehabilitation settings. Concerning the training contingencies, the programmed amount of training and systematic exposure to auditory stimuli as one of the components of specific consequences (as a node) engendered equivalence class-consistent responding. Several studies reported that emergent crossmodal conditional relations do not necessarily depend on the training of crossmodal conditional relations. Varella and de Souza (2015) described the emergence of auditory-visual (crossmodal) relations after visual-visual (unimodal) identity matching training with auditory stimuli that were used as specific consequences. The results of the present study can be seen as a systematic replication (Sidman, 1960) of that study, thus increasing the generalizability of the data.

One limitation of the present study was that only one baseline test was conducted before training began. Future research should incorporate more baseline measures to ensure the stability of the dependent variable (DV) before presenting the independent variable (IV). Furthermore, a multiple-baseline design across participants could clarify the effect of training in a more controlled fashion (Gast, 2010). Participants’ verbal baseline repertoires, especially reading and writing, could be better controlled to assess isolated and combined effects of training.

Despite these limitations, systematic data from two participants suggest strong applied potential for this procedure. Future research should investigate the possibility of incorporating the present procedure into instructional technologies to program equivalence-based instruction (EBI; Cooper et al., 2020; Critchfield et al., 2018; Pilgrim, 2020) and contribute to stimulus-control research (Almeida-Verdu & Golfeto, 2016) and evidence-based practice (Spencer & Marschark, 2010).