Introduction

When adults simultaneously see an optically distorted size and feel another size of the same object, their estimates of the perceived size tend to be between the separate visual and haptic sizes (Helbig & Ernst, 2007; Hershberger & Misceo, 1996). To predict whether the perceived estimate leans toward either the visual or the haptic cue, Welch and Warren (1980) proposed a modality-precision hypothesis. It holds that the immediate response to discordant intersensory cues favors the more reliable modality. Ernst and Banks (2002) quantified the hypothesis with a computational model that minimizes the variance (i.e., maximizes precision) of the merged estimate to improve veridical performance. Surprisingly, this understanding of intersensory relations has been built mainly on distorting the accuracy and reliability of the visual cue. Rarely has the precision hypothesis been examined when touch is made unreliable and when the separate sources of stimulation arise from actual (non-illusory) discordances between the seen size and the felt size.
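
For reference, the minimum-variance rule usually associated with Ernst and Banks (2002) can be sketched as below. The notation is assumed here for illustration and does not appear in the text: \( \hat{S}_V \) and \( \hat{S}_H \) denote the single-cue visual and haptic estimates, \( \sigma_V^2 \) and \( \sigma_H^2 \) their variances, and \( w_V \) and \( w_H \) the cue weights. The sketch also assumes independent, unbiased, Gaussian-distributed cue noise.

```latex
\hat{S}_{VH} = w_V \hat{S}_V + w_H \hat{S}_H, \qquad
w_V = \frac{1/\sigma_V^2}{1/\sigma_V^2 + 1/\sigma_H^2}, \qquad
w_H = 1 - w_V,
\qquad
\sigma_{VH}^2 = \frac{\sigma_V^2 \, \sigma_H^2}{\sigma_V^2 + \sigma_H^2}
\le \min\left(\sigma_V^2, \sigma_H^2\right)
```

Under these assumptions, making touch less reliable (larger \( \sigma_H^2 \)) raises \( w_V \), so the merged estimate should lean toward the seen size.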

Specifically, the precision hypothesis has been repeatedly supported with procedures that optically distort the seen size (or shape) of an object (Ernst & Banks, 2002; Gepshtein & Banks, 2003; Helbig & Ernst, 2007). For example, Rock and Victor (1964) had participants only once simultaneously feel through a hand-concealing cloth and see through a minifying lens a thin square whose optical size was half its haptic size. They found that participants’ subsequent estimates of the perceived size were determined visually, whether their estimates (matches) were made using only sight or only touch. However, studies since Rock and Victor’s seminal experiment suggest that the visual dominance (i.e., non-integration) found by them may have arisen not from a natural superiority of vision over touch in processing spatial information (i.e., a modality-appropriateness hypothesis), but from a discounting (ignoring) of the intersensory cues as arising from a single source due to the large intersensory discrepancy (Ernst & Bülthoff, 2004; Gepshtein, Burge, Ernst, & Banks, 2005; Welch & Warren, 1980).

Some studies, however, have ruled out response biases arising from the visually concealed hand and from a large discordance. It has been shown that discordant visual and haptic cues merge when the hand is either visually cloaked or uncloaked and when the optical size is half its haptic size (Helbig & Ernst, 2007; Heller, Calcaterra, Green, & Brown, 1999; Hershberger & Misceo, 1996; Jovanovic & Drewing, 2014). Other studies have shown that what generally matters for observing cue binding are the spatial coincidence of the concurrent intersensory stimulations and the cue sampling of the haptic information through repeated estimation of the felt object (Gepshtein et al., 2005; Heller et al., 1999; Van Doorn, Richardson, Wuillemin, & Symmons, 2010). What yet remains unexamined is whether discordant cues arising from distorted touch matter for cue combination and whether, although frequently invoked to explain sensory independence (Ernst & Banks, 2002; Jovanovic & Drewing, 2014), the intersensory interaction varies with the objective difference between the felt size and the seen size (cf. Takahashi & Watt, 2014).

Studies that used actual size discordances have found the perceived size to display an incomplete fusion of the separate cues, i.e., estimates made either visually or haptically were biased in the direction of the modality of the estimate (Derrick & Dewar, 1970; Klein, 1966; McDonnell & Duffett, 1972). Such response effects also have been found with optically distorted sight so as to make the separate cues to be perceived as arising from one source (Helbig & Ernst, 2007, 2008; Heller et al., 1999; Hershberger & Misceo, 1996; Jovanovic & Drewing, 2014). Whether similar response effects would occur when the haptic cue is distorted was examined by Misceo and Jones (2017). They had participants see one size and feel another unseen size either with bare fingers or with fingers sleeved in rigid tubes to decrease haptic precision. Participants’ subsequent visual estimates of the felt size leaned toward the visual size and the estimates of the seen size leaned toward the haptic size as the haptic explorations decreased in precision. Also, the effect size of the manipulated explorations for each modality was reciprocally symmetrical (Cohen’s d ≈ .40). Yet, even though the results showed the response modality and touch imprecision to interact, its effect size may have been weak (\( {\upeta}_{\mathrm{p}}^2 \) = .12) and, more troublesome, the 95% confidence intervals (CIs) for the means of each response modality overlapped appreciably across the different exploratory haptics (Cumming, 2013), especially for the estimates of the seen size. Those overlaps suggest that the combined estimate of the seen size was only slightly influenced by touch imprecision. This greater influence of vision on touch could mean that the observed interaction represented a response-bias effect, as participants could have ignored touch when judging the seen size but they could not have ignored sight when judging the felt size. 
Also, cue discounting has been thought to be likely with large intersensory discrepancies, presumably because they promote the inference that the inputs arise from separate sources (Ernst & Banks, 2002; Ernst & Bülthoff, 2004; Jovanovic & Drewing, 2014).

Consequently, the main purpose of the present experiments was to examine whether the observed group-by-modality interaction was a trivial response bias arising from objectively discordant intersensory cues. If the interaction solely reflects a discounting bias because the perceptual system rejects the inputs as originating from the same source (Ernst & Banks, 2002; Ernst & Bülthoff, 2004), then the effect size of the interaction should remain about the same whether the discordance is small or large, because only one modality would predominate the response. This expectation partly arises from Jovanovic and Drewing’s (2014) finding that children and adults were more likely to combine separate cues when their discrepancy was small (25%) rather than large (50%). Conversely, if the interaction evinces a cue-merging effect, then the effect size should change as the intersensory discrepancy increases. Indeed, even if the combination is suboptimal, even under conditions expected to promote the discounting of their combination, a modality bias should still be reflected in their combination. For even though Jovanovic and Drewing found the composite response to be modality dependent, a cue combination model more faithfully matched their data than a discounting (cue switching) model.

Accordingly, then, Experiment 1 varied the intersensory discordance amount (0%, 50%, and 85%) to observe changes in the response modality by precision interaction. Experiment 2 further tested the presence of the interaction in the absence of intersensory inputs. Specifically, Experiment 2 examined whether the response modality would interact with haptic precision when stimuli are solely felt with either bare fingers or tube-sleeved fingers. Observing an interaction effect in the absence of multisensory cues would confidently suggest that the effect is dependent on the response measure itself (i.e., discounting) rather than on an underlying computational binding process. Decisions regarding either possibility were based on the behavioral measures of perceived size, not to decide if the modality reliability weights mediate integration, but to provide supporting evidence for intersensory convergence (Stevenson et al., 2014).

Experiment 1

Method

Participants

Forty-six undergraduates (21 males and 25 females) enrolled in general psychology classes volunteered for the study after the local ethics board (IRB) approved the study. Based on the effect size (\( {\upeta}_{\mathrm{p}}^2 \) = .12) finding for the group-by-modality interaction in previous research, the G*Power application with alpha set at 5% and power set at 80% indicated that a sample size greater than 24 would adequately safeguard against false negatives (Faul, Erdfelder, Lang, & Buchner, 2007; Misceo & Jones, 2017).
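
Power programs such as G*Power typically take Cohen's f rather than partial eta squared as the effect-size input for ANOVA designs. The sketch below shows the standard conversion applied to the value of .12 cited above. The function name is hypothetical, and the sketch does not attempt to reproduce the G*Power sample-size output itself.

```python
import math


def cohens_f_from_partial_eta_sq(eta_p2: float) -> float:
    """Convert partial eta squared to Cohen's f: f = sqrt(eta / (1 - eta))."""
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


# Effect size reported for the earlier group-by-modality interaction.
f_value = cohens_f_from_partial_eta_sq(0.12)
print(f"Cohen's f = {f_value:.3f}")
```

By Cohen's conventional benchmarks, an f of this size falls between a medium (.25) and a large (.40) effect.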

Procedure

Misceo and Jones (2017) described the details of the experimental apparatus; what follows is a summary of their apparatus as adapted for the present studies. Briefly, volunteers inspected pairs of black plastic cubes. The pairs had the same height (5 mm), but their widths and lengths could be 35, 40, 50, 60, and 65 mm. Hereafter these cubes are called squares because they varied in only two dimensions and because equilateral rectangles can be fully described by an edge length. Square pairs were placed in an inspection box (42 × 52 × 52 cm). It had two compartments separated by a nontransparent white surface. One square was placed on the floor of the top compartment, and another square was placed in the bottom compartment. The bottom square was held in midair with a long steel rod attached to a side and was spatially aligned below the square in the top compartment. The other end of the rod was fastened to a retort stand on the back side of the inspection box. Participants could simultaneously inspect each square by looking at the top square through a viewing tube (8 cm) while actively feeling the edges of the bottom square with all fingertips of their hand. They manually grasped the edges of the bottom square by inserting their writing hand in an opening (15 × 15 cm) on the front of the inspection box. Figure 1a shows a participant looking at one square and feeling another one in the separate compartments of the inspection box.

Fig. 1

(a) A participant simultaneously looking at one size located in the top compartment and feeling another unseen size located in the bottom compartment of the inspection box. (b) The ways participants explored the haptic stimulus, either with bare fingers or with fingers sleeved with rigid tubes. (Adapted from “Again, knowledge of common source fails to promote visual-haptic integration,” by G. F. Misceo, S. V. S. Jackson, and J. R. Purdue, 2014, Perceptual & Motor Skills, 118, p. 186. Copyright 2014 by Sage Publications)

Participants were assigned to one of two randomized groups. One looked at one square and felt another square with their bare (B) fingers, and the other looked at one square and felt another one with rigid PVC tubes (T) sleeved over their fingers. Before inspecting the squares, participants chose five tubes that best fit their fingers from a set of 19, which were all 3 mm thick but varied in diameter (1.27, 1.90, and 2.54 cm) and in length (6, 7, and 8 cm). Figure 1b shows the ways participants explored the square in the bottom compartment of the inspection box.

All participants, after looking at and haptically exploring the squares, estimated either the felt (H) size or the seen (V) size of each square. The differences between the seen size and the felt size were 30, 20, and 0 mm on a side (i.e., 35/65H, 40/60H, 50/50H, 60/40H, 65/35H, 65V/35, 60V/40, 50V/50, 40V/60, 35V/65). Each participant received the ten pairs in a randomized order, and all estimated twice the felt size and twice the seen size of each pair. The 20 test-trials comprised a 5 (size pair) × 2 (matching modality) × 2 (practice) design.
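
The trial structure just described can be enumerated in a short sketch. Two points are assumptions rather than statements from the text: each pair is read as (seen size, felt size), and the discordance percentage is expressed relative to the smaller member of the pair, which gives 50% for the 20-mm difference and about 86% for the 30-mm difference (reported as 85%). All names are hypothetical.

```python
from itertools import product

# Assumed reading of the notation: each pair is (seen size, felt size) in mm.
SIZE_PAIRS_MM = [(35, 65), (40, 60), (50, 50), (60, 40), (65, 35)]
MODALITIES = ["H", "V"]  # H = estimate the felt size, V = estimate the seen size
REPETITIONS = 2          # the "practice" factor


def discordance_percent(seen_mm: float, felt_mm: float) -> float:
    """Size difference as a percentage of the smaller member (an assumption)."""
    return 100.0 * abs(seen_mm - felt_mm) / min(seen_mm, felt_mm)


# One entry per test trial: 5 pairs x 2 modalities x 2 repetitions.
trials = [
    {
        "seen_mm": seen,
        "felt_mm": felt,
        "estimate": modality,
        "repetition": repetition,
        "discordance_mm": abs(seen - felt),
    }
    for (seen, felt), modality, repetition in product(
        SIZE_PAIRS_MM, MODALITIES, range(1, REPETITIONS + 1)
    )
]

print(len(trials))
print(sorted({t["discordance_mm"] for t in trials}))
print(round(discordance_percent(40, 60), 1), round(discordance_percent(35, 65), 1))
```

In the experiment the order of the pairs was randomized for each participant; the sketch only lists the cells of the design.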

Note that if either the seen size (35V/65 and 65V/35) or the felt size (35/65H and 65/35H) of each size in each stimulus pair is veridical, the average of the two estimates would be around 50 mm. Alone, however, the 50-mm response cannot decide whether the estimate represents a response effect or a perceptual effect, for either modality independence or modality convergence could give rise to the intermediate response value. To decide between the two possibilities, one must compare the composite estimates with and without encumbered fingers. For example, if the tubes hobble the precision of touch, the perceived size for stimuli explored with the tubed fingers should be below those explored with the bare fingers.

It also should be noted that the same-sized pair (50/50) offered participants the opportunity to confirm the instructions they received before initiating their task, as they were told that they would examine squares whose sizes “may or may not be the same,” i.e., the cues could arise from the same source. Additionally, the size-congruent pair allowed a check on the efficacy of the tube manipulation, as the estimates of felt size should be more variable with clouded than with unclouded touch. Finally, observing a modality-by-precision interaction with physically nondiscordant pairs would support a cue-merging interpretation of the interaction, if the tubes clouded haptic perception.

Participants began their inspection of each stimulus pair by standing in front of the inspection box. After the experimenter asked if they were “ready” and after they responded “yes,” the experimenter said “go,” which signaled them to remove with their non-writing hand a flap over the viewing tube and to simultaneously view one square and feel another unseen square with their writing hand. They were told to “get a good feel and a good look for the size of the square” and to “make sure that you actively explore the square with the tips of all five fingers, like this … .” The experimenter showed the participants how to actively explore the edges of a practice (wooden) square with naked fingertips by moving the fingers along its edges.

During the inspection trials, the experimenter monitored the participant’s compliance with the scripted instructions. Of 20 monitored participants, 95% moved their fingers along the edges of a square. Two active explorations were a contour-following (63%) metric and a finger-span (37%) metric. Only 5% of the participants propped the square by repeatedly gripping its edges (Klatzky & Lederman, 1993; Klatzky, Lederman, & Matula, 1993). Also monitored was whether participants looked at the stimulus. The experimenter occasionally glanced at an image of the participant’s head produced by a beveled mirror. Figure 1 shows the beveled aperture holding the mirror on the experimenter-occluding screen. No participant was observed looking away from the inside of the inspection box while manually grasping a stimulus.

Participants were told to stop inspecting the squares after 5 s had lapsed on a stopwatch from the time their fingers made contact with the haptic stimulus. Afterward, they replaced the flap over the viewing tube and turned around to look at a visual display of comparison sizes. These visual comparisons (VCs) were 5-mm thick black plastic squares whose sizes ranged on a side from 25 to 75 mm in 5-mm increments. The comparisons were arrayed on a board (56 × 50 cm) in a circular order of size. The board was attached to a wall at eye level about 2.5 m from the inspection box. Participants read aloud an alphabetic letter just below each comparison size to identify the one that “best matched” what they had either felt or seen. Which match they would make was announced immediately after the inspection period to prevent them from anticipating the response type. Participants never received feedback about the accuracy of their responses. No hypothesis was entertained about the two practice trials. Their purpose was solely to improve the reliability of the estimates to appraise the effect of primary interest – the interaction of the between-group (precision) variable with the within-group (response modality) variable.

After participants completed the 20 test trials, they completed four (unimodal) control trials to verify the reproducibility of the estimates of the bimodally explored 50/50 pair with the bare fingers. Participants either only viewed a 50-mm stimulus (VS) or only grasped with bare fingers a 50-mm stimulus (HS). They estimated the seen size twice and the felt size twice from the visual comparison set (VC). Each participant rendered a counterbalanced order of the modality estimates (e.g., HVVH or VHHV).

Results

Statistical procedures

A side length (mm) of the comparison match served as the measure of perceived size, and the estimates of perceived size for each discordant pair (0, 20, and 30 mm) were analyzed in three separate 2 (group) × 2 (modality) × 2 (practice) mixed ANOVAs. The anticipated between-group heterogeneity of variance motivated mindful data analyses to meaningfully interpret the observed effects. For example, the ANOVA partial eta squared (\( {\upeta}_{\mathrm{p}}^2 \)) was used to compare the efficacy of the manipulations across the three analyses and to explain the variability associated with the group-by-modality interaction independently of the effects from other incidental variables (Keppel & Wickens, 2004). Moreover, Cohen’s d values were calculated by dividing the difference between the treatment mean and the bare fingers mean by the standard deviation of the bare fingers mean, not the pooled standard deviation (Cumming, 2012). It was simply more meaningful to finely gauge the efficacy of the precision manipulation against a measure of uncertainty expected in the untreated group. Similar reasoning applied to the computations of the standard errors for the t-values. Finally, even though the statistical assumptions underlying NHST were expected to be unattainable, null tests were nonetheless undertaken when the assumptions were statistically warranted (α set at .05).
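
The reference-group effect size described above can be sketched as follows. Several details are assumptions made for illustration: the direction of the difference (bare minus tubed), a reference-group size of 23 (inferred from the reported df of 22 and the 46 participants), a standard error equal to the bare-fingers SD divided by the square root of that group size, and the function names. The example values are the felt-size means and the bare-fingers SD reported for the congruent (50/50) pair.

```python
import math


def reference_group_d(mean_bare: float, mean_tubed: float, sd_bare: float) -> float:
    """Standardized mean difference scaled by the bare-fingers SD only
    (not the pooled SD). The sign convention, bare minus tubed, is assumed."""
    return (mean_bare - mean_tubed) / sd_bare


def reference_group_t(
    mean_bare: float, mean_tubed: float, sd_bare: float, n_bare: int
) -> float:
    """t-value whose standard error is built from the bare-fingers SD alone
    (an assumed reading of the 'similar reasoning' for the standard errors)."""
    standard_error = sd_bare / math.sqrt(n_bare)
    return (mean_bare - mean_tubed) / standard_error


# Felt-size estimates for the 50/50 pair: bare M = 50.76 (SD = 6.59), tubed M = 47.93.
d_felt = reference_group_d(50.76, 47.93, 6.59)
t_felt = reference_group_t(50.76, 47.93, 6.59, n_bare=23)
print(f"d = {d_felt:.2f}, t(22) = {t_felt:.2f}")
```

Under these assumptions the sketch gives values close to the d = .43 and t(22) = 2.06 reported for that comparison, which suggests, but does not confirm, that this is how the statistics were computed.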

Descriptive statistics

Before presenting the findings of main interest, an overview of the results is worthwhile, especially those bearing on the validity of the between-group manipulation. It was expected that the size estimates would be less precise if the sizes were explored with tubed fingers than with bare fingers (Misceo & Jones, 2017). This expectation was affirmed by inspecting the between-group standard deviations (SDs) for each stimulus size, matching modality, and practice combination (i.e., 2 × 5 × 2 × 2 mixed design). The results showed that 70% of the SDs of the tubed-fingers group were greater than the SDs of the bare-fingers group and that 50% of the SDs of the felt estimates of the tubed-fingers group were greater than the SDs of the felt estimates of the bare-fingers group. The unexpected modest 50% figure arose mainly from three responses by two participants who, when asked to match the felt size, matched instead the visual size (e.g., 65/35H) or, when asked to match the seen size, matched instead the haptic size (e.g., 65V/35). Such inverse matching inflated the SDs of the estimates for the bare-fingers group. These clear cases of cue-switching (i.e., maximum response bias) were retained in the data set to give a complete account of the possible responses. On the whole, then, the dispersion of the estimates showed that the tubes decreased the precision of the size estimates.

Size congruent data

Exploratory analyses of the responses for the congruous stimulus pair (50/50) showed few (12.5%) between-group heterogeneous variances and many (50%) symmetrically distributed responses. On average, the estimates of the seen size (M = 45.65, SD = 2.85) were smaller than those of the felt size (M = 48.27, SD = 4.59). As expected, the estimates of the felt square by the tubed-fingers group were smaller (M = 47.93, SD = 5.03) than those by the bare-fingers group (M = 50.76, SD = 6.59). These estimates of the bare-fingers group agreed with the estimates of the seen-only and the felt-only 50-mm square (see planned analyses below). The unimodal data showed only a modality main effect, F(1, 45) = 25.98, p < .001 (\( {\upeta}_{\mathrm{p}}^2 \) = .36). The felt-only size rendered visually was greater than the seen-only size rendered visually, respectively, MHS/VC = 49.08, SDHS/VC = 5.58 and MVS/VC = 45.76, SDVS/VC = 4.24. These baseline estimates verified those of the bare-fingers group for the size-congruent squares for the felt size, t(22) < 1 (d = .17, SDBH = 6.59), and for the seen size, t(22) < 1 (d = .05, SDBV = 3.89). Thus, modality independence in the estimates of the discordant inputs could be readily compared across the haptic manipulation.

Figure 2a shows the group-by-modality interaction for the congruent size pair (50/50). Note that the average estimated felt size decreases (M = 50.76, SD = 6.59 and M = 47.93, SD = 5.04) and the average estimated seen size slightly increases across the groups (M = 45.43, SD = 3.89 and M = 46.85, SD = 5.01). With the ANOVA assumptions reasonably met, the analysis showed a modality main effect, F(1, 44) = 12.23, p = .001 (\( {\upeta}_{\mathrm{p}}^2 \) = .21), and a group-by-modality interaction, F(1, 44) = 5.34, p = .026 (\( {\upeta}_{\mathrm{p}}^2 \) = 0.11). It was expected that the tubed fingers would alter the accuracy and the dispersion of the estimates. Accordingly, planned analyses used the estimates of the bare (B) fingers group as the reference distribution to estimate the effect magnitude of the manipulation. The between-group comparisons showed that the tubed fingers biased the estimates of the felt size toward the visual size, t(22) = 2.06, p = .025 (d = .43, SDHB = 6.59), and the estimates of the seen size toward the haptic size, t(22) = –1.75, p = .047 (d = –0.36, SDVB = 3.89). Evidently, even though the inspected sizes were physically the same, their sensed sizes were not the same, for the tubes reduced the accuracy and the precision of the felt estimates.

Fig. 2

Mean estimated felt size and seen size for the bare-fingers group and tubed-fingers group. (a) The group-by-modality interaction for the nondiscordant pair (50/50). (b) The interaction for the 20-mm discordant pairs (40/60 and 60/40), and (c) the interaction for the 30-mm discordant pairs (35/65 and 65/35). Error bars show 95% confidence intervals based on the estimates in each cell

Small discordance data

Misceo and Jones (2017) tested the precision hypothesis with a 20-mm visual-haptic discordance. To assess the reproducibility of their findings, Fig. 2b displays the group-by-modality interaction for the small (50%) discordance. Note that the boundaries of the 95% CIs about the means are wider for the felt size than for the seen size, especially when the stimuli were explored with sleeved fingers. Also, similar to Misceo and Jones’ findings, Fig. 2b shows the average estimated felt size to decrease (M = 48.21, SD = 3.95 and M = 46.63, SD = 5.95) and the average estimated seen size to increase (M = 44.73, SD = 2.32 and M = 45.98, SD = 4.28) across the groups. An ANOVA showed a response modality main effect, F(1, 44) = 10.97, p = .002 (\( {\upeta}_{\mathrm{p}}^2 \) = .20), and a group-by-modality interaction, F(1, 44) = 5.14, p = .028 (\( {\upeta}_{\mathrm{p}}^2 \) = .11). The modality main effect indicated that the felt size (M = 47.42, SD = 5.06) was bigger than the seen size (M = 45.35, SD = 3.46).

An inspection of the variability (Footnote 1) of the estimates in each cell of the 2 × 2 mixed design pooled over practice showed that the response distributions were generally symmetrical (75%) and that the between-group variances were heteroscedastic. The planned between-group comparisons indicated that the haptic explorations with sleeved fingers biased the estimates of the felt size toward the visual size, t(22) = 1.93, p = .033 (d = .40, SDHB = 3.95) and the estimates of the seen size toward the haptic size, t(22) = –2.58, p = .008 (d = –.54, SDVB = 2.32). The d-values, taking into account the inflated errors in the estimation of the seen size, suggest that the bias strength of the manipulation for each response modality was symmetrically reciprocal.

Large discordance data

Figure 2c shows the group-by-modality interaction for the 30-mm (85%) intersensory discordance. Generally, the boundaries of the 95% CIs about the means are wider for the estimates of the felt size than for the seen size, and the interaction shown in Fig. 2c corroborates that shown in Fig. 2b. For example, Fig. 2c shows the average estimated felt size to decrease (M = 50.00, SD = 5.98 and M = 47.17, SD = 4.53) and the average estimated seen size to slightly increase (M = 45.11, SD = 2.96 and M = 46.30, SD = 3.12) across the group manipulation. An ANOVA showed a modality main effect, F(1, 44) = 12.15, p < .001 (\( {\upeta}_{\mathrm{p}}^2 \) = .22), and a group-by-modality interaction, F(1, 44) = 5.92, p = .019 (\( {\upeta}_{\mathrm{p}}^2 \) = .12). The modality main effect indicated that the felt size (M = 48.59, SD = 5.44) was bigger than the seen size (M = 45.71, SD = 3.07).

An inspection of the variability (Footnote 2) of the estimates in the 2 × 2 mixed design pooled over practice showed 50% of the response distributions were symmetrical and the between-group variances were homoscedastic. Again, using the estimates of the bare (B) fingers group as the reference distribution, the observed convergence in the modality estimates suggests that the haptic explorations with the sleeved fingers biased the estimates of the felt size toward the visual size, t(22) = 2.26, p = .017 (d = 0.47, SDHB = 5.98), and the estimates of seen size toward the haptic size, t(22) = –1.92, p = .033 (d = –0.40, SDVB = 2.96). Also, the d-values suggest that the bias strength of the manipulation was symmetrically reciprocal.

Discussion

Experiment 1 examined whether the effect size of the group-by-modality interaction would increase as the intersensory discordance increases. Figure 2 shows that the interaction pattern remains about the same across the intersensory size differences. Additionally, the group and the modality variables together accounted for about 11% of the response variability (cf. also Misceo & Jones, 2017). Thus, even though sight biased touch and touch biased sight as touch precision decreased, the stable effect size across the discordances seems contrary to the presumed notion that the group-by-modality interaction represents a perceptual effect whose magnitude would depend on the visual-haptic size discordance (Jovanovic & Drewing, 2014).
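
The stability of the interaction effect size can be checked against the reported F ratios with the standard identity that links partial eta squared to F and its degrees of freedom. The sketch below is illustrative only: the function name is hypothetical, and because the F values are taken as printed (rounded to two decimals), the recomputed values can differ from the reported ones in the second decimal place.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio:
    eta_p^2 = (F * df_effect) / (F * df_effect + df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Group-by-modality interaction F(1, 44) values as reported for Experiment 1.
interaction_f = {"0 mm": 5.34, "20 mm": 5.14, "30 mm": 5.92}

for discordance, f_value in interaction_f.items():
    eta = partial_eta_squared(f_value, df_effect=1, df_error=44)
    print(f"{discordance}: partial eta squared = {eta:.3f}")
```

All three recomputed values fall between roughly .10 and .12, consistent with the description of an effect size that remains about the same across the discordances.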

Yet, it is conceivable that the stable effect sizes affirm the precision hypothesis if the bias depends not on the size of the discordance but on the degree of the haptic imprecision. Indeed, optimal models of integration postulate shifts in the composite perception in the direction of the least variable sensory cue, not in the direction of the least discordance. But Experiment 1 varied the discordance without varying the nature of the haptic explorations for hobbling touch precision. Although Jovanovic and Drewing (2014) optically varied the intersensory discordance, they too did not vary the visual noise. It may then be unwise to expect the interaction strength to grow when the precision of touch (or vision) remains unchanged.
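
The point that reliability-weighted models tie the bias to cue variability rather than to the size of the discordance can be illustrated with a minimal sketch. It assumes independent Gaussian cue noise and inverse-variance weighting; the standard deviations below are hypothetical values chosen for illustration, not estimates from the present data, and the function names are likewise hypothetical.

```python
def visual_weight(sd_visual: float, sd_haptic: float) -> float:
    """Inverse-variance (reliability) weight given to the visual cue."""
    reliability_visual = 1.0 / sd_visual**2
    reliability_haptic = 1.0 / sd_haptic**2
    return reliability_visual / (reliability_visual + reliability_haptic)


def combined_estimate(
    seen_mm: float, felt_mm: float, sd_visual: float, sd_haptic: float
) -> float:
    """Reliability-weighted average of the seen and felt sizes."""
    w_visual = visual_weight(sd_visual, sd_haptic)
    return w_visual * seen_mm + (1.0 - w_visual) * felt_mm


# Hypothetical noise levels: doubling the haptic SD mimics encumbered touch.
for seen, felt in [(40, 60), (35, 65)]:
    for sd_haptic in (3.0, 6.0):
        w = visual_weight(3.0, sd_haptic)
        estimate = combined_estimate(seen, felt, 3.0, sd_haptic)
        print(f"seen={seen}, felt={felt}, haptic SD={sd_haptic}: "
              f"w_V={w:.2f}, estimate={estimate:.1f} mm")
```

In this sketch the visual weight is identical for the 20-mm and 30-mm discordances and changes only when the haptic SD changes, which is the sense in which a stable interaction strength need not contradict the precision hypothesis.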

It is worth noting that Jovanovic and Drewing (2014) found cue combination, though suboptimal, to vary with the difference between the felt size and the optically distorted seen size. However, whether the intersensory discordance was small or large, the composite estimate was still in between the seen-only and the touched-only sizes. Whereas the children’s estimates leaned more toward the seen size than the adults’ estimates, the adults’ estimates were much closer than the children’s estimates to the combined (weighted) average value of the separate visual and haptic sizes. It remains uncertain, though, if the children’s suboptimal estimates can unambiguously be attributed to the large intersensory discordance. Evidently, the response scale for the large discrepancy, unlike the scale for the small discrepancy, did not capture the middle value of the seen-alone and the felt-alone sizes. Response scales for the large discordance were either one unit below or one unit above the (absent) middle value of the seen-alone and the felt-alone sizes. Consequently, the confounded response scale with the size discrepancy lessens confidence in the view that children are less likely than adults to combine cues arising from large size discrepancies.

Nonetheless, Jovanovic and Drewing’s (2014) findings from the adults corroborate the present results, which show that young adults may combine the intersensory cues regardless of their size discrepancy. Furthermore, that even the estimates of the congruent stimulus pair (Fig. 2a) showed the group-by-modality interaction further suggests that response bias cannot alone explain the present findings. Consequently, Experiment 2 further explored the group-by-modality interaction as a response rather than a perceptual effect. Experiment 2 used procedures that would prevent the occurrence of intersensory interactions. For example, squares explored only haptically (HS) and matched using either haptic comparisons (HC) or visual comparisons (VC) could answer whether the responses interact with touch precision in the absence of vision as well as whether the crossmodal estimates (HS/VC) change with touch precision. It was expected that touch precision would matter only for the intramodal haptic estimates (HS/HC).

Experiment 2

Method

Participants

Thirty-two undergraduates (16 males and 16 females) enrolled in general psychology courses volunteered for the IRB-approved study.

Procedure

Participants were assigned to one of two randomized groups. One examined squares of various sizes with their bare fingers, and another examined the same sizes with rigid tubes sleeved over the fingers. The haptically explored squares were, on a side, 30, 40, 50, and 60 mm (M = 45, SD = 13). Each was horizontally suspended in midair inside a modified inspection box. An opaque surface fastened to the top of the box prevented participants from seeing a square inside the box. They could only manually explore each square by reaching into the opening on the front of the box to grasp the edges of the square with the fingertips of their writing hand. In Experiment 2, the experimenter also monitored the haptic exploratory strategies (Klatzky & Lederman, 1993). More participants actively grasped (75%) than passively propped (25%) the squares. Frequently, participants actively felt the edges of a square with all their bare fingers (53.3%) or tubed fingers (68.7%), and these participants exhibited either a contour-following (edge-length) strategy (79.2%, N = 24) or a finger-span strategy (16.6%). Other than repeated gripping, the remainder (4.2%) of the participants showed an undiscernible metric to their haptic explorations.

After participants haptically inspected each square for 5 s, they found a “best match” either haptically or visually. They were never told to find the best match for what they had either “felt” or “seen.” Instead, they were told to find their match from either the “wall” (VC) or the “draped box” (HC). These comparisons ranged in size from 20 mm to 70 mm in 5-mm increments. The range was arrayed horizontally in a draped box (100 × 31 × 29 cm). Figure 1 shows the draped box next to the inspection box. When participants found a “match from the draped box,” they placed their writing hand under the drape and freely felt the linearly arrayed sizes for the best match. Participants always used their naked fingers to find a comparison match by saying “this one.” As in Experiment 1, the circular array of the visual comparisons was attached at eye level on a wall behind the participant. When asked to “find a match from the wall,” they turned around, viewed the arrayed sizes, and read aloud a letter to identify their match.

Each participant estimated the four sizes twice, once using the haptic comparisons and once using the visual comparisons. The eight trials were repeated twice in a 2 (group) × 4 (size) × 2 (comparison) × 2 (practice) mixed design. Participants never received feedback about their matches, and the main interest of Experiment 2 remained the group-by-modality interaction.

Results

Figure 3 shows the mean estimated size rendered with the haptic comparisons (HC) and the visual comparisons (VC) for each type of haptic exploration. Note that the boundaries of the 95% CIs overlap appreciably for the haptic estimates and for the visual estimates across the groups. Recall too that it was expected that touch precision would not matter for the crossmodal visual estimates, but would matter for the intramodal haptic estimates (Footnote 3).

Fig. 3. Mean estimated size of the haptic stimuli with either haptic comparisons (HC) or visual comparisons (VC) for the bare-fingers group and tubed-fingers group. Error bars show 95% confidence intervals based on the estimates in each cell of the group-by-modality mixed design.

Figure 3 shows that the estimates diverge as haptic precision decreases when they were rendered haptically (HC) rather than visually (VC). On average, the haptic estimates of the felt size, M = 42.67, SD = 4.36, 95% CI [41.24, 44.11], were smaller than the crossmodal visual estimates of the (unseen) haptic size, M = 46.11, SD = 3.77, 95% CI [44.68, 47.55]. The estimates of the bare-fingers group made haptically were less than those made visually (respectively, M = 43.92, SD = 1.91 and M = 45.87, SD = 2.66). The estimates of the tubed-fingers group made haptically were likewise less than those made visually (respectively, M = 41.42, SD = 5.68 and M = 46.36, SD = 4.71), and less than those made haptically by the bare-fingers group. Although the ANOVA assumptions of normality and sphericity were statistically satisfied, the between-group variability of the intramodal (HS/HC) estimates was heterogeneous. Thus, Cohen’s d relied on the standard deviation of the bare-fingers group. Results showed that the manipulation was seven times more effective when the estimates were made with the haptic comparisons, t(15) = 5.23, p < .001 (dHC = 1.30, SDBH = 1.91), than with the visual comparisons, t(15) < 1, p = .23 (dVC = –0.18, SDBV = 2.66).
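The effect sizes above can be approximately recovered from the reported group means. The following is a minimal sketch, assuming that each d is the bare-fingers mean minus the tubed-fingers mean, divided by the bare-fingers standard deviation; the function name is hypothetical, and the small departure from the reported dHC = 1.30 presumably reflects rounding of the published means.

```python
def standardized_difference(mean_bare, mean_tubed, sd_bare):
    """Mean difference scaled by the bare-fingers SD rather than a pooled SD."""
    return (mean_bare - mean_tubed) / sd_bare


# Reported means and SDs (mm) from the Results of Experiment 2.
d_hc = standardized_difference(43.92, 41.42, 1.91)  # haptic comparisons
d_vc = standardized_difference(45.87, 46.36, 2.66)  # visual comparisons

# How many times larger the manipulation's effect is for HC than for VC.
ratio = abs(d_hc) / abs(d_vc)

print(f"d_HC = {d_hc:.2f}, d_VC = {d_vc:.2f}, ratio = {ratio:.1f}")
# prints "d_HC = 1.31, d_VC = -0.18, ratio = 7.1"
```

Scaling by the bare-fingers standard deviation alone avoids pooling the two heterogeneous group variances, which is the rationale given in the text.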

Discussion

Experiment 2 results suggest that intersensory cues may be necessary for observing a group-by-modality interaction. Support for this view can be seen from the way the interaction differs across the experiments. Experiment 1 found the average estimate of the felt size rendered visually (HS/VC) to be greater than the average estimate of the seen size rendered visually (VS/VC). Conversely, Experiment 2 found that the average estimate of the felt size rendered haptically (HS/HC) was smaller than the average estimate rendered visually (HS/VC). Also, Fig. 3 shows that the mean estimate rendered haptically is generally below the estimate rendered visually. This group-by-response pattern was unlike the pattern observed in Experiment 1. In contrast to the latter pattern of the means, the ordinal inversion of the means in Experiment 2 occurred for practically every stimulus size. Lastly, Experiment 1 found a convergent group-by-modality interaction, but Experiment 2 found a statistically nonsignificant divergent group-by-modality interaction, F(1, 30) = 3.67, p = .065 (\( \eta_{\mathrm{p}}^2 \) = .11). Finding that explorations of the felt size with tubed fingers were more likely to change the estimates of the felt size than the estimates of the seen size indicates that the findings of Experiment 2 run contrary to the results of Experiment 1.
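The reported partial eta squared is consistent with the F ratio and its degrees of freedom. As a brief check, the sketch below uses the standard identity that converts an F statistic into partial eta squared; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Group-by-modality interaction in Experiment 2: F(1, 30) = 3.67.
eta_p2 = partial_eta_squared(3.67, 1, 30)

print(f"{eta_p2:.2f}")  # prints "0.11"
```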

General discussion

The present experiments examined whether size perception would be weighted by the visual cue as the reliability of touch decreases. Past studies using objective size differences without distorting haptic perception found the strength of the sensory bias to depend on the modality of the response (Klein, 1966; McDonnell & Duffett, 1972). Figure 2 shows that, when size explorations are kinesthetically impaired, the estimated felt size and the estimated seen size converge to be midway between the respective estimates of the naked-fingers group. This convergence supports the hypothesis that perceived size favors the more reliable modality.
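To illustrate what favoring the more reliable modality means quantitatively, the following sketch applies the variance-minimizing (inverse-variance weighted) combination rule associated with Ernst and Banks (2002). It is not an analysis performed in the present experiments, and every number in it (the seen and felt sizes and the two haptic standard deviations) is hypothetical.

```python
def combine_cues(visual_size, haptic_size, visual_sd, haptic_sd):
    """Inverse-variance weighted average of a visual and a haptic size estimate.

    Returns the merged estimate, the visual weight, and the merged SD.
    """
    visual_reliability = 1.0 / visual_sd**2
    haptic_reliability = 1.0 / haptic_sd**2
    total_reliability = visual_reliability + haptic_reliability
    visual_weight = visual_reliability / total_reliability
    merged = visual_weight * visual_size + (1.0 - visual_weight) * haptic_size
    merged_sd = (1.0 / total_reliability) ** 0.5
    return merged, visual_weight, merged_sd


# Hypothetical discordant object: seen as 40 mm, felt as 50 mm.
# Hypothetical noise: touch is as precise as vision with bare fingers
# (SD = 2 mm each) and half as precise with tubed fingers (SD = 4 mm).
bare, bare_weight, bare_sd = combine_cues(40.0, 50.0, 2.0, 2.0)
tubed, tubed_weight, tubed_sd = combine_cues(40.0, 50.0, 2.0, 4.0)

print(bare, bare_weight)    # equal reliabilities give the midpoint
print(tubed, tubed_weight)  # less reliable touch shifts the estimate toward vision
```

With these hypothetical values the merged estimate moves from the midpoint toward the seen size as touch becomes noisier, which is the direction of change that the precision hypothesis predicts.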

Varying the amount of the discordance mattered neither for the general pattern of the interaction nor for the strength of its effect. Also, the interactions replicated the observations of others (Helbig & Ernst, 2007; Misceo & Jones, 2017). Replicated too was the uncertainty associated with whether the strength of the bias is symmetrical. Misceo and Jones found that when touch was distorted the visual size biased the estimates of the felt size more than the converse. Figure 2 displays similar asymmetries in the strength of the modality bias. Specifically, the change in the felt estimates relative to the change in the seen estimates was 1.9 times more for the congruous sizes, 1.3 times more for the small discordance, and 2.3 times more for the large discordance. These modality asymmetries (together with the overlaps of the 95% CIs) may give credence to the notion that touch is more likely to be discounted than vision, for felt size need not be taken into account to estimate the directly registered seen size.

Experiment 1, however, suggests that the observed interactions may not be entirely consistent with a discounting interpretation. Cohen’s d values for each response modality were reciprocally symmetrical for the congruous sizes (M|d| = .39) and for the small (M|d| = .46) and large intersensory discordances (M|d| = .43). These effect sizes are comparable to those found by Misceo and Jones (2017). Experiment 2 also offers additional evidence contrary to the response-bias interpretation of the interactions. Recall that participants in Experiment 2 only touched the different sizes, either with bare fingers or with tubed fingers, and after exploring each size, they estimated its perceived size using either haptic comparisons or visual comparisons. Although they did not match (as in Experiment 1) either the “felt” or the “seen” size, they nonetheless made analogous matches from the separate comparison sets without having their attention expressly drawn toward either response type. It was expected that the intramodal estimates (HS/HC) would be smaller as well as more variable for the tubed-fingers group than for the naked-fingers group.

Results support both predictions, which together validate the expectation that the manipulation impaired the accuracy and the precision of haptic perception. For example, Fig. 3 shows considerable (> 60%) overlaps of the 95% CIs for the “seen” (VC) estimates, and the crossmodal responses (HS/VC) did not vary much in the way the squares were haptically explored. More importantly, Fig. 3 shows the response modalities to diverge, especially the “felt” (HC) estimates, across the group manipulation. This apparent interaction, although not statistically significant, was quite different from the interactions observed in Experiment 1, which found a convergence of the responses across the group manipulation. Additionally, Experiment 1 found the felt estimates to be greater than the seen estimates. Both qualities of the interaction were the reverse of those observed in Experiment 2. Finally, the effect size (dHC = 1.30) of the group manipulation for the “felt” (HC) estimates in Experiment 2 was about three times that of the average effect size (M|d| = .43) in Experiment 1 and, in Experiment 2, the manipulation was more effective at changing the “felt” estimates than the “seen” estimates (dVC = –.18).

Altogether the findings suggest that the reliable changes in the modality estimates across the precision manipulation cannot be fully explained as artifacts of the response modality. Evidence instead favors the explorations with sleeved fingers biasing the estimates of the felt size toward the visual size and the estimates of seen size toward the haptic size. The reciprocally symmetrical effect sizes support this interpretation of the intersensory interaction. Additional support comes from the congruent sizes. Observing the interaction with physically identical sizes shows that what matters for cue merging is the sensed discordance, not the amount of the discordance. Conversely, the interaction asymmetry in Experiment 2 suggests that the estimates of the “seen size” (HS/VC) were released from touch, for participants never experienced bimodal sensory inputs. In other words, the symmetries may have arisen from the coupling of the intersensory cues, whereas the asymmetries may have arisen from the uncoupling of touch from vision, and the uncoupling manifests itself in the greater efficacy of the tubes at altering touch perception.

The above interpretations could be strengthened by examining whether the symmetries but not the asymmetries correlate with EEG oscillatory responses of brain regions. It is known, for example, that dynamic interactions of neuronal populations, leading to synchronized oscillatory (γ-bands) firing patterns, play a key role in mediating cue coupling when the oscillations are phase coherent across the modalities (Chen et al., 2017; Sarko, Ghose, & Wallace, 2013; Senkowski, Schneider, Foxe, & Engel, 2008). The observed symmetries would then be expected to concur with the synchronized phase-coherent oscillations.