Introduction

Our sensory systems provide us with timely and reliable warnings about consequential changes in our surroundings. Increasingly, such changes come from technologies that emit sensory alerts to notify their users. One way technologies can create salient signals is by generating rhythmic patterns of light, sound, or vibrations. Our growing reliance on auditory and vibrotactile signaling devices like cell phones makes it important to understand the capabilities and limitations of these sensory systems as communication channels for temporal information (Meng & Spence, 2015).

Surprisingly little is understood about the potential of the skin, our largest sensory organ, for communicating temporal information. An influential essay by von Békésy (1959) provides a valuable introduction to the skin’s advantages and disadvantages in this domain. Comparing the temporal responsiveness of audition and somatosensation, von Békésy (1959) wrote that “all the phenomena in which time patterns are involved can be discriminated by the skin only if the changes are slow relative to changes that the ear can recognize” (p. 6). While von Békésy’s generalization has enjoyed considerable support (Azari, Mioni, Rousseau, & Grondin, 2020; Grondin & Rousseau, 1991; Jones, Poliakoff, & Wells, 2009; Lauzon, Russo, & Harris, 2016; Mayer, Di Luca, & Ernst, 2014), there has been at least one dissenting view, namely an assertion that the two modalities possess similar temporal acuity (Nagarajan, Blake, Wright, Byl, & Merzenich, 1998).

To help clarify the matter, we reasoned that studying bimodal auditory and somatosensory stimulation could afford additional insight into each modality’s individual capacity for communicating temporal information. Our approach builds on the modality appropriateness hypothesis (Welch & Duttonhurt, 1986; Welch & Warren, 1980), which holds that different modalities are optimized for different types of processing: vision is purportedly specialized for processing spatial information, whereas audition is specialized for processing temporal information. Studies of audio-visual temporal processing support this hypothesis, showing that auditory cues dominate bimodal judgments (Burr, Banks, & Morrone, 2009; Repp & Penel, 2002; Walker & Scott, 1981). Adapting this approach, we studied duration discrimination for vibrotactile and auditory stimuli, as well as for bimodal stimuli in which the two were combined. Following the modality appropriateness hypothesis, if auditory temporal precision were indeed superior to vibrotactile temporal precision, auditory stimuli should dominate when the two are paired.

Here, we report two experiments comparing temporal acuity for auditory and vibrotactile stimuli. In both experiments, subjects were presented with a series of empty intervals demarcated by auditory, vibrotactile, or bimodal pulses. Isochronous standard intervals were followed by a comparison interval of variable duration. Subjects judged the duration of the final interval relative to that of the isochronous standard intervals. Our first experiment evaluated auditory, vibrotactile, and bimodal temporal sensitivity by comparing performance across those three conditions, keeping the number of standard intervals presented constant. There is some evidence, however, that varying the number of repetitions of a standard stimulus affects temporal sensitivity (Grondin, 2001; Grondin & McAuley, 2009; Miller & McAuley, 2005; Ten Hoopen, Van Den Berg, Memelink, Bocanegra, & Boon, 2011). It is unclear whether this effect arises from early sensory processing (e.g., von Békésy, 1959), from higher-order cognitive processes, or from both (VanRullen & Thorpe, 2001; Agam & Sekuler, 2007; Cavanagh, 2011; Waskom, Okazawa, & Kiani, 2019). Our second experiment tested whether the number of standard intervals on each trial affected temporal sensitivity in each unimodal condition.

Experiment 1

Our first experiment adapted a task introduced by Lumaca, Trusbak Haumann, Brattico, Grube, and Vuust (2019). In our adaptation, shown schematically in Fig. 1, subjects received sequences of four empty intervals, each demarcated by brief auditory, vibrotactile, or bimodal pulses, indicated by the solid vertical lines in the figure. The first three intervals were isochronous, each 400 ms long. The duration of the fourth interval, however, was randomized so that its terminating pulse could arrive either earlier or later than 400 ms after the previous pulse. Subjects categorized the timing of the final pulse as either “early” or “late” relative to the preceding interval(s). We used these judgments as an index of subjects’ perception of the duration of the final interval.

Fig. 1

Sample “early” (a) and “late” (b) trials. Each 50-ms stimulus pulse is represented by a solid vertical line. Three standard intervals, each 400 ms, preceded a final comparison interval of variable duration. The sign and magnitude of the variable x represents the trial’s rhythmic deviancy: on an “early” trial (a), the comparison interval was shorter than 400 ms (i.e., 400–x), and on a “late” trial (b), the comparison interval was longer than 400 ms (i.e., 400+x). Dashed vertical lines represent when the final stimulus pulse would have occurred with no rhythmic deviancy

Methods

Subjects

Twenty-one subjects (13 female; mean age = 20.8 years, SD = 2.1 years) participated in this experiment. All subjects denied having difficulty in sensing vibrations on the skin or in hearing. We performed an a priori power analysis to assess whether our sample size was large enough to detect a previously reported effect size in a similar task (Jones et al., 2009) with 85% power. The Superpower package in R (Caldwell & Lakens, 2019) showed that a sample of ten subjects would be needed to observe a similar effect, confirming the sufficiency of our sample size. All experimental procedures had been approved by Brandeis University’s Institutional Review Board, and work was conducted in accordance with the Declaration of Helsinki. All subjects provided written informed consent prior to their participation. During the consent process, subjects were informed that they would receive at least $12.00 (USD) for their time, as well as an additional $0.25 for each block on which their accuracy exceeded 80%.
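For readers who wish to run a comparable power analysis, the sketch below shows how such a simulation could be set up with the Superpower package. The condition means, standard deviation, and correlation are placeholders chosen for illustration, not the effect-size values derived from Jones et al. (2009).

library(Superpower)

# Illustrative design: one within-subjects factor (modality) with three levels.
# mu, sd, and r below are assumed values, not those used in the study.
design <- ANOVA_design(design = "3w",
                       n  = 10,                 # candidate sample size
                       mu = c(60, 45, 44),      # assumed condition means (ms)
                       sd = 15,                 # assumed common standard deviation
                       r  = 0.7,                # assumed repeated-measures correlation
                       labelnames = c("modality", "T", "A", "AT"),
                       plot = FALSE)

# Simulate power for the omnibus within-subjects ANOVA at this sample size
power <- ANOVA_power(design, alpha_level = 0.05, nsims = 1000, verbose = FALSE)
power$main_results   # check whether simulated power reaches the 85% target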

Apparatus and stimuli

Stimulus presentation and response collection were controlled by a Macintosh computer running MATLAB (version 9.3.0; 2017) and extensions from the Psychophysics Toolbox (version 3.0.14; Brainard, 1997). The computer worked in tandem with an Arduino UNO microcontroller that directly controlled the stimulus generation devices (Fig. 2).

Fig. 2

A test subject’s left hand cradled within a 3D-printed support that positioned the index finger on the vibrotactor’s active element. The subject used their right hand to communicate their binary judgments on the computer’s numerical keypad. In the photo, the computer is displaying one of the experiment’s instruction screens. The Arduino microcontroller and associated circuitry are enclosed in the 3D-printed box. Photo used with permission from this subject

Tactile (hereafter, T) stimulation was produced by a linear resonant actuator vibrotactor (Engineering Acoustics, Inc., Casselberry, FL, USA). The active element of the vibrotactor was 7.90 mm in diameter, and produced vibrations at 250 Hz with < 2 ms rise time and 0.5 mm peak vertical displacement. Note that the frequency of our T stimuli, 250 Hz, is near the peak sensitivity for Pacinian corpuscles, the mechanoreceptor type most sensitive to vibrations (Makous, Friedman, & Vierck, 1995; Mountcastle, Lamotte, & Carli, 1972). For auditory (hereafter, A) stimuli, we chose a higher frequency because detection of auditory stimuli is appreciably slower for low frequencies like 250 Hz, compared to detection of higher-frequency tones (Woods, Alain, Covarrubias, & Zaidel, 1993). Our A stimuli were \(\sim \)1000-Hz pulses at \(\sim \)70 dB generated by a piezo active buzzer. Bimodal auditory-vibrotactile (hereafter, AT) stimuli were created by activating the vibrotactor and the piezo buzzer simultaneously, with a negligible delay (< 10 μs) between sending the signals to activate the vibrotactor and buzzer. Throughout the experiment, the intensities of the vibrotactile and auditory stimuli were held constant at levels well above threshold.

Vibrations were delivered to the ventral surface of the subject’s left index finger, which rested on the vibrotactor’s active element. To stabilize the finger’s contact with the vibrotactor, the subject’s hand was positioned within a close-fitting, custom 3D-printed container. This container also included insulation around the vibrotactor to reduce unexpected auditory emissions resulting from vibrations against the plastic of the 3D-printed container. The piezo buzzer used to generate the auditory stimuli was located next to the container housing the vibrotactor (Fig. 2).

Brownian noise calibration

Despite the insulation in its 3D-printed container, activation of the vibrotactor produced an audible, low-frequency sound. To prevent this sound from contaminating the conditions in which the vibrotactor was used, it was masked by continuous Brownian noise whose level was customized for each subject using the calibration procedure described below. Subjects heard the continuous Brownian noise through supra-aural headphones, and, at subjects’ option, also wore in-ear, sound-absorbing earplugs.

The calibration procedure used a modified method of limits to find a level of Brownian noise that completely masked the sound of the vibrotactor and still allowed the piezo buzzer to be heard. At each level of masking noise, five A stimuli were presented with the buzzer; the subject then reported how many tones they heard. Brownian noise amplitude was decreased following incorrect responses. This procedure allowed us to find the maximal level of noise through which subjects could still hear the A stimuli. With this level of masking noise, we repeated the procedure with T stimuli, increasing noise amplitude if subjects verbally noted they could hear the sound of the vibrotactor. This iterative process produced a level of noise that allowed the subject to hear A stimuli but not the sound generated by the vibrotactor.
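The logic of this calibration is summarized in the schematic sketch below. It is not the code we used; sound delivery is hardware-specific, so present_buzzer_tones() is a hypothetical stand-in for playing the five A pulses while the Brownian masker plays at a given level, and the starting level and step size are arbitrary.

# Schematic sketch of the modified method of limits (illustrative only)
present_buzzer_tones <- function(noise_level) {
  cat(sprintf("[Play 5 tones with Brownian noise at level %.2f]\n", noise_level))
}

noise <- 1.0    # start with an intense masker (arbitrary units; assumed)
step  <- 0.1    # assumed decrement per incorrect report

# Phase 1: find the loudest masker through which all five A stimuli are heard
while (noise > 0) {
  present_buzzer_tones(noise)
  reported <- as.integer(readline("Tones reported by the subject: "))
  if (!is.na(reported) && reported == 5) break   # A stimuli audible; keep this level
  noise <- noise - step                          # otherwise, lower the masker
}

# Phase 2 (not shown): present T stimuli at this masking level and raise the
# masker if the subject reports hearing the vibrotactor, re-checking Phase 1.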

Experimental task

On each trial, subjects received five 50-ms pulses of A, T, or AT stimulation. These five pulses demarcated four empty intervals, each defined by the temporal separation between successive pulses. The first three intervals were isochronous, at 400 ms each. However, the fourth interval (before the fifth and final pulse) was equally likely to be less than or greater than 400 ms. This made the final pulse occur either “earlier” or “later”, respectively, than a pulse following a standard, 400-ms interval. The difference between the duration of the final (comparison) interval and the duration of each standard interval defined what we will call the trial’s rhythmic deviancy. The subject’s task was to characterize the arrival of the fifth pulse as either “early” or “late” compared to the timing of the previous pulses. If the fourth interval had a negative rhythmic deviancy (duration < 400 ms; Fig. 1a), the last pulse arrived “earlier” than a pulse terminating a standard interval would have, and if the fourth interval had a positive rhythmic deviancy (duration > 400 ms; Fig. 1b), the last pulse arrived “later” than a pulse terminating a standard interval would have.

In each trial, a white fixation cross appeared at the center of a black computer screen for 500 ms. Then, the stimulus sequence began. After the fifth pulse, subjects signaled their responses with their right hands, using the “1” and “2” keys on a numeric computer keypad (Fig. 2). The mapping of the keys onto the response categories (“early” and “late”) was counterbalanced across subjects. Following each response, immediate feedback (“Correct!” or “Incorrect.”) was displayed on the screen for 1000 ms, along with the subject’s average accuracy so far during that block of trials. Subjects had up to 1000 ms to enter their responses; if they failed to respond within that time, the trial was categorized as incorrect.

Preliminary testing confirmed that rhythmic deviancies of ±1–80 ms produced response accuracies ranging from 50% to nearly perfect (100%). Comparison-interval durations for the experiment were therefore drawn from two ranges: 320–399 ms (“early” trials) and 401–480 ms (“late” trials). To facilitate our planned data analysis, each range was subdivided into four time bins, each spanning 20 ms (e.g., 320–339 ms). Within each 20-ms time bin, values were drawn randomly from uniform distributions. This arrangement of bins (i) allowed each psychometric function to be based on eight values of the independent variable, and (ii) facilitated serial communication between MATLAB and the Arduino. In total, for each modality, a subject received 20 trials from each of the eight time bins (four “early” and four “late”), yielding 480 trials per subject (3 [modality conditions] × 8 [rhythmic deviancy bins] × 20 [trials per modality condition and rhythmic deviancy bin]).
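As an illustration of this binning scheme, the sketch below generates one modality’s comparison-interval durations. It reflects the constraints described above but is not the presentation code itself; the actual implementation may, for example, have used integer milliseconds.

# Sketch: sample comparison-interval durations for one modality (assumed code)
set.seed(1)

# Eight 20-ms bins: four "early" (320-399 ms) and four "late" (401-480 ms)
bin_edges <- list(c(320, 339), c(340, 359), c(360, 379), c(380, 399),
                  c(401, 420), c(421, 440), c(441, 460), c(461, 480))

trials_per_bin <- 20
comparison_ms <- unlist(lapply(bin_edges, function(b)
  runif(trials_per_bin, min = b[1], max = b[2])))

# Rhythmic deviancy is the signed difference from the 400-ms standard
deviancy_ms <- comparison_ms - 400
length(comparison_ms)       # 160 trials for this modality
table(sign(deviancy_ms))    # equal numbers of "early" (-1) and "late" (+1) trials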

Subjects first completed the calibration procedure described above, and then completed a familiarization phase. During the familiarization phase, subjects received two example trials (1 “early” and 1 “late”), and then completed six practice trials (3 “early” and 3 “late”) for each modality (three rounds total). The rhythmic deviancy on all pre-experimental trials was ± 60–80 ms. All trials in the familiarization phase followed the general trial structure described above. Following the familiarization phase, each subject completed 12 blocks of 40 trials per block. The experiment followed a block-randomized design, with all experimental trials in a single block devoted exclusively to one modality. Subjects were permitted ad lib breaks between blocks, and received a bonus of $0.25 for each block on which their accuracy exceeded 80%. During breaks, an on-screen text display informed the subject which modality would be tested in the next block of trials. Subjects completed the entire experiment, including instructions and practice, in \(\sim \)45–60 min.

Data curation

Subjects failed to respond on 0.8% of all trials. One-way ANOVAs and t tests showed that the number of missed trials was unrelated to stimulus modality (p = 0.170), response key mapping (p = 0.884), or response category (“early” or “late”, p = 0.358). We subsequently excluded these trials from our analyses.

Psychometric modeling

Psychometric functions (PFs) supplied the main analytical parameters used to characterize subjects’ temporal sensitivity. Each PF modeled the likelihood that a subject would categorize a trial’s final pulse as “late” given the trial’s rhythmic deviancy value and its modality. To visualize the main trends and overall differences among modality conditions, we first fit a group-level model by aggregating data from all subjects. We then fit a subject-level model to generate distributions of estimated parameters for our analyses. We fit logistic, hierarchical Bayesian PFs using the simulation engine Just Another Gibbs Sampler (JAGS; Plummer, 2003), as implemented in the Palamedes toolbox for MATLAB (Prins & Kingdom, 2018). We used uninformative, flat priors for all model parameters, with one chain, 5000 (group-level)/1000 (subject-level) samples per chain, and 2000 discarded samples per parameter. Each PF was defined by four parameters: location along the rhythmic deviancy axis (α), slope (β), guess rate (γ), and lapse rate (λ).
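For concreteness, the function below sketches a standard four-parameter logistic parameterization of a PF. Palamedes/JAGS may use a different internal scaling, so the specific form and the parameter values shown are illustrative assumptions rather than the fitted model.

# A standard four-parameter logistic psychometric function (illustrative form)
pf_logistic <- function(x, alpha, beta, gamma, lambda) {
  # x: rhythmic deviancy (or comparison duration) in ms
  # alpha: location; beta: slope; gamma: guess rate; lambda: lapse rate
  gamma + (1 - gamma - lambda) / (1 + exp(-beta * (x - alpha)))
}

# Example: probability of a "late" response for a +20-ms deviancy (made-up values)
pf_logistic(20, alpha = 0, beta = 0.08, gamma = 0.02, lambda = 0.02)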

We assessed goodness-of-fit of the subject-level model by the width of the 95% high-density interval (HDI) for the location parameter. Briefly, the 95% HDI bounds the inner 95% of the estimated parameter’s posterior distribution, with narrower HDIs indicating more precise parameter estimation (Calin-Jageman & Cumming, 2019; Kruschke, 2013).

Treatment of model parameters

We used the estimated model parameters to derive two separate dependent measures: the just noticeable difference (JND) and the point of subjective equality (PSE). Each of these measures captures different aspects of subjects’ timing judgments.

In duration discrimination studies, the JND captures sensitivity to changes in interval duration (Bausenhart, Di Luca, & Ulrich, 2018). In our model, the JND indicates how much rhythmic deviancy (in ms) is needed to produce a reliable change in “early” and “late” responses, with smaller JND values denoting higher sensitivity. We calculated the subject-level JND for each modality by averaging the distances between the PSE and the values of the inverse PF at P(“late”) = 0.25 and P(“late”) = 0.75.

The location parameter (α) provided a different perspective on temporal resolution, namely, in terms of the PSE. This parameter provides the interval duration subjects were equally likely to categorize as either “early” or “late”, or the duration deemed perceptually equivalent to that of the standard intervals (400 ms). If comparison intervals were perceptually represented with perfect fidelity to the standard, that is, with no constant error, the PSE would be 400 ms. Subject-level PSEs for each modality were generated directly by our model-fitting procedures, where α = PSE.
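The sketch below shows how the JND and PSE can be recovered from fitted parameters, assuming the logistic form sketched earlier and the common convention that the JND equals half the distance between the 25% and 75% points of the PF; the parameter values are invented for illustration.

# Derive JND and PSE from a fitted logistic PF (illustrative parameter values)
inv_pf_logistic <- function(p, alpha, beta, gamma, lambda) {
  pc <- (p - gamma) / (1 - gamma - lambda)   # rescale to remove guess/lapse rates
  alpha - log(1 / pc - 1) / beta             # invert the logistic
}

alpha <- 404; beta <- 0.06; gamma <- 0.02; lambda <- 0.02   # assumed values

pse <- alpha                                     # point of subjective equality
x25 <- inv_pf_logistic(0.25, alpha, beta, gamma, lambda)
x75 <- inv_pf_logistic(0.75, alpha, beta, gamma, lambda)
jnd <- (x75 - x25) / 2                           # average distance from the PSE
c(PSE = pse, JND = jnd)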

Quantitative evaluation of PSE and JND estimates

We used one-way, within-subject ANOVAs to guide interpretations of modality effects on subject-level distributions of JND and PSE values. Mauchly’s test assessed potential violations of the assumption of sphericity; when it was violated, we corrected degrees of freedom and p values using the Greenhouse–Geisser epsilon. Because we expected a significant main effect of modality, we performed planned pairwise comparisons using estimated marginal means following the ANOVAs. In addition to p values and effect sizes for each comparison, we report Bayes factors (BF10) calculated using the BayesFactor package in R (Morey & Rouder, 2011). Reporting BF10 allows us to present evidence in support of not only our alternative hypotheses (i.e., observing a meaningful difference between conditions, BF10 ≥ 3) but also our null hypotheses (BF10 < 3; Schönbrodt & Wagenmakers, 2018; Jeffreys, 1961).
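The sketch below illustrates how such Bayes factors can be obtained with the BayesFactor package; the data are simulated stand-ins for the subject-level JNDs, and the code for the frequentist ANOVAs and estimated marginal means is omitted.

# Hedged sketch of the Bayes factor computations (simulated data)
library(BayesFactor)
set.seed(2)

n <- 21
dat <- data.frame(
  subject  = factor(rep(1:n, times = 3)),
  modality = factor(rep(c("A", "T", "AT"), each = n)),
  jnd      = c(rnorm(n, 45, 10), rnorm(n, 60, 15), rnorm(n, 44, 5))
)

# Omnibus BF10 for the effect of modality, treating subject as a random effect
bf_omnibus <- anovaBF(jnd ~ modality + subject, data = dat, whichRandom = "subject")

# Planned pairwise comparison (e.g., A vs. T) with a paired Bayesian t test
a  <- dat$jnd[dat$modality == "A"]
t_ <- dat$jnd[dat$modality == "T"]
bf_pair <- ttestBF(a, t_, paired = TRUE)

bf_omnibus; bf_pair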

Cue combination

For a quantitative account of how each type of unimodal cue influenced bimodal sensitivity, we adopted a model-based approach introduced by To, Baddeley, Troscianko, and Tolhurst (2011). As the slope β of a PF directly reflects sensitivity to temporal change, we used models of cue combination to estimate m, an index of the way that information from the two modalities was combined:

$$ \beta_{AT} = \left(\beta_{A}^{m} + \beta_{T}^{m}\right)^{1/m} $$
(1)

Specifically, we tested whether the combination of unimodal cues followed linear summation (both cues weighted equally, m = 1) or a MAX metric (complete dominance by one of the two inputs, m = \(\infty \)). We also tested whether audio-tactile cue combination in our task mirrored the mode of audio-visual cue combination reported by To et al. (2011), m = 2.56. Summed squared error (SSE) values were used to assess model fit.
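To make the model comparison concrete, the sketch below computes the SSE of Eq. 1’s prediction for each candidate exponent m, using invented slope values in place of the fitted ones.

# Sketch of the cue-combination model comparison in Eq. 1 (made-up slopes)
set.seed(3)
n <- 21
beta_A  <- runif(n, 0.04, 0.10)    # assumed unimodal auditory slopes
beta_T  <- runif(n, 0.02, 0.06)    # assumed unimodal tactile slopes
beta_AT <- pmax(beta_A, beta_T)    # stand-in for observed bimodal slopes

# Predicted bimodal slope for a given combination exponent m (Eq. 1);
# m = Inf corresponds to the MAX rule.
predict_beta_AT <- function(bA, bT, m) {
  if (is.infinite(m)) pmax(bA, bT) else (bA^m + bT^m)^(1 / m)
}

sse <- sapply(c(linear = 1, audiovisual = 2.56, max = Inf), function(m)
  sum((beta_AT - predict_beta_AT(beta_A, beta_T, m))^2))
sse   # smaller SSE = better-fitting combination rule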

Post hoc analyses

We used post hoc tests to explore trends in the results of our primary analyses. To compare PSE estimates to 400 ms, we calculated the constant error (CE), defined as CE = 400 ms - PSE. To test whether CE = 0, we ran Bayesian t tests using R’s BEST package (Kruschke, 2013). We also used Bayesian t tests to compare the standard deviations of the JND estimates across modality conditions. We considered standard deviations to be reliably different if the 95% HDI of the difference distribution did not include zero (Kruschke, 2013). We report adjusted R2 values for each regression model.
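A minimal sketch of the constant-error test is shown below, with simulated PSEs in place of the model estimates; BESTmcmc() is called in its one-sample form.

# Hedged sketch of the post hoc constant-error test (simulated PSEs)
library(BEST)
set.seed(4)

pse <- rnorm(21, mean = 398, sd = 4)   # simulated subject-level PSEs (ms)
ce  <- 400 - pse                       # constant error, as defined above

# Bayesian one-sample test of whether CE differs from zero
fit <- BESTmcmc(ce, verbose = FALSE)
summary(fit)   # inspect the 95% HDI of the mean; CE differs from 0 if it excludes zero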

Results

Figure 3 shows the results of our model fits. Group-level PFs are shown in Fig. 3a, and two representative subject-level PFs are shown in Fig. 3b (best fit) and Fig. 3c (worst fit). Subject-level 95% HDI widths (Fig. 3d) suggest that T measures were estimated with less precision than either A or AT measures.

Fig. 3

Experiment 1: Psychometric model fits. a Group-level psychometric functions fit via Bayesian estimation for the three modalities show that subjects were least sensitive to rhythmic deviancy in the T condition (red). b, c Representative subject-level PFs showing good and poor model fit, respectively. d Boxplots of the 95% HDI widths for the subject-level PFs reveal that goodness-of-fit was poorest in the T condition, compared to the A and AT conditions. e Boxplots of subject-level JND estimates for the three modalities show that T JNDs were larger than AT JNDs (p < 0.001, BF10 > 100) and A JNDs (p < 0.001, BF10 > 100). AT JNDs were not different from A JNDs (p = 0.253, BF10 = 0.24). In each boxplot, the vertical line inside each box signifies that distribution’s median

Subsequent analyses performed on subject-level JNDs (Table 1, Fig. 3e) revealed that temporal discriminability significantly differed among the three modalities (F(2, 40) = 42.13, p< 0.001, \({\eta }_{p}^{2}\) = 0.68, BF10 > 100). Contrasts showed that this effect was driven by higher T JNDs compared to A and AT JNDs (T vs. A: F(1, 40) = 53.37, p< 0.001, \({\eta _{p}^{2}}\) = 0.57, BF10 > 100; T vs. AT: F(1, 40) = 71.67, p< 0.001, \({\eta _{p}^{2}}\) = 0.64, BF10 > 100). A and AT discriminability did not significantly differ (F(1, 40) = 1.35, p = 0.253, \({\eta _{p}^{2}}\) = 0.03, BF10 = 0.24).

Table 1 Mean JNDs and PSEs for each modality, in milliseconds. Values of standard deviation are shown in parentheses

The analysis of subject-level PSEs showed a significant effect of modality, F(2, 40) = 17.55, p< 0.001, \({\eta _{p}^{2}}\) = 0.47, BF10 > 100 (Table 1). All pairwise comparisons were statistically significant (A vs. T: F(1, 40) = 35.04, p< 0.001, \({\eta _{p}^{2}}\) = 0.47, BF10 > 100; A vs. AT: F(1, 40) = 9.98, p = 0.003, \({\eta _{p}^{2}}\) = 0.20, BF10 = 14.45; T vs. AT: F(1, 40) = 7.62, p = 0.009, \({\eta _{p}^{2}}\) = 0.16, BF10 = 5.76). We also tested how veridical each PSE was with respect to the 400-ms standard interval duration (Table 1). The mean PSE with A stimuli showed the highest fidelity, followed closely by the mean PSE for AT stimuli. The least veridical PSEs were found in the T condition.

As explained above, we used Eq. 1 to examine the separate, unimodal contributions to bimodal temporal sensitivity. Results showed that AT slopes were best predicted from individual A and T slopes using a MAX metric, mean SSE < 0.001. The MAX rule outperformed both linear summation, m= 1 (SSE = 0.915), and a previously reported metric of audio-visual cue combination, m = 2.56 (SSE = 0.473). This suggests that one of the two unimodal inputs strongly dominated bimodal sensitivity, and the similarity between the A and AT JNDs points to A signals as the dominant input to the bimodal condition.

Exploratory analysis

An additional, exploratory analysis of Experiment 1’s data uncovered a result that deserves to be reported here. The analysis began with an observation about bimodal stimulation: although the mean JND did not differ between the AT and A conditions, the variability of subject-level JND estimates was reduced five-fold in the bimodal condition compared to the unimodal auditory condition (Fig. 3e, Table 1). A Bayesian t test showed that these standard deviations were reliably different, 95% HDI = [2.17, 5.88], suggesting that inter-individual differences in sensory discrimination were diminished in the bimodal condition. Notably, Gamache and Grondin (2010) found a similar result: reduced variance in temporal reproductions of intervals demarcated by audio-visual stimuli compared to intervals demarcated by visual cues only; Murray, Thelen, Ionta, and Wallace (2019) observed an analogous effect in response times to audio-visual stimuli.

We asked what might account for this finding. Møller et al. (2018) suggest that, in accordance with the inverse effectiveness principle, individual differences in unimodal sensitivity mediate the differential effects of bimodality across subjects. This allows for reduced bimodal inter-individual differences despite large unimodal inter-individual differences. In particular, Møller et al. propose that subjects with lower unimodal sensitivity benefit more from bimodality than subjects with higher unimodal sensitivity. We tested this hypothesis by correlating each subject’s unimodal discriminability (JNDA or JNDT) with the improvement in discriminability they showed when the other unimodal component was added. We represent the bimodal discrimination benefit of adding T stimuli to unimodal A stimuli (JNDA − JNDAT) in Fig. 4a, and the bimodal discrimination benefit of adding A stimuli to unimodal T cues (JNDT − JNDAT) in Fig. 4b. In Fig. 4, data points in the green region (y > 0) represent subjects who showed improvements in bimodal sensitivity relative to their discriminability in one unimodal condition (shown along the abscissa); data points in the red region (y < 0) represent subjects whose bimodal sensitivity was impaired relative to their discriminability in that unimodal condition.

Fig. 4

Individual subjects’ benefit in discriminability with bimodal stimulation as a function of their unimodal discriminability. We define “bimodal discrimination benefit” as the difference between a subject’s unimodal and bimodal JNDs, where this difference reflects the effect of information added from the other modality. a Individual subjects’ unimodal auditory sensitivity (x-axis) predicts their bimodal discrimination benefit from the addition of T stimuli (y-axis), p < 0.001, R2 = 0.968. Only subjects with poor A discriminability (higher JNDs) benefited from the addition of T stimuli (green region, y > 0), whereas subjects with better A discriminability (lower JNDs) were impaired by the addition of T stimuli (red region, y < 0). b Individual subjects’ unimodal T discriminability (x-axis) likewise predicts their bimodal discrimination benefit from the addition of A stimuli (y-axis), p < 0.001, R2 = 0.972. All subjects benefited from the addition of A stimuli to T signals (green region, y > 0)

Figure 4 shows two important results. First, the amount of multisensory gain increases linearly with unimodal JND in both the A and T conditions, as expected from the principle of inverse effectiveness. Poorer unimodal discriminability, characterized by larger A or T JNDs, is associated with a greater discriminability gain from combining both forms of stimulation (supplementing A cues with T signals: F(1, 19) = 605.0, p < 0.001, R2 = 0.968, Fig. 4a; supplementing T cues with A signals: F(1, 19) = 703.7, p < 0.001, R2 = 0.972, Fig. 4b). Second, all subjects benefited from adding A to T stimuli, but only subjects with poorer A discriminability benefited when T stimuli were added to A stimuli; the performance of subjects with exceptional A discriminability was actually degraded by the addition of T stimuli in the bimodal condition (y < 0 in Fig. 4a).
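The regression underlying Fig. 4a can be sketched as follows, again with simulated JNDs standing in for the subject-level estimates.

# Sketch of the Fig. 4a regression (simulated JNDs, not the study data)
set.seed(5)
n <- 21
jnd_A  <- runif(n, 20, 80)                   # simulated auditory JNDs (ms)
jnd_AT <- pmin(jnd_A, 45) + rnorm(n, 0, 2)   # simulated bimodal JNDs (ms)

benefit_from_T <- jnd_A - jnd_AT             # gain from adding T to A

fit <- lm(benefit_from_T ~ jnd_A)            # linear regression, as in Fig. 4a
summary(fit)$adj.r.squared                   # adjusted R^2, as reported in the text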

Note that although T signals clearly affected performance by reducing variance in estimates of bimodal discriminability and reducing bimodal PSEs (Table 1), the T modality’s variable effects on participants and overall weaker temporal acuity allowed A inputs to disproportionately influence estimates of bimodal sensitivity. Indeed, the bimodal discrimination benefit across all subjects conferred by T signals, as shown along the ordinate in Fig. 4a, was not significantly different from zero, t(20) = 1.41, p = 0.175, d = 0.31, BF10 = 0.54. In comparison, adding A stimuli to T signals consistently aided discrimination (green region of Fig. 4b), further confirming auditory dominance, t(20) = 8.21, p < 0.001, d = 1.79, BF10 > 100.

Discussion

Our analysis of JNDs showed better discrimination of duration when intervals were demarcated by A pulses than by T pulses. Additionally, analysis of PSEs showed that comparison intervals between A pulses were perceived with greater fidelity to the standard than those between T pulses. Both results are consistent with previous research comparing duration judgments for unimodal auditory and tactile stimuli (Azari et al., 2020; Grondin & Rousseau, 1991; Jones et al., 2009; Lauzon et al., 2016; Mayer et al., 2014). Importantly, the bimodal (AT) condition directly assessed how concurrent stimulation with both modalities compared to either modality alone. Bimodal discrimination (measured with JND values) closely approximated auditory-only discrimination (Fig. 3e), but PSEs in the AT condition were significantly shorter than those with A stimuli. This divergence between the two psychophysical measures suggests that the addition of a tactile pulse to an auditory pulse affected perceived duration but not sensitivity to changes in duration (Fig. 4a).

Our finding of strong but somewhat incomplete auditory dominance when A and T stimuli were combined reflects the complexities of multisensory combination. Often, discriminability with bimodal stimuli surpasses what would have been achieved with either modality alone (e.g., Diederich & Colonius, 2004; Hershenson, 1962; Meredith & Stein, 1986; Murray et al., 2019). However, if one unimodal input is less reliable than the other, their bimodal combination is dominated by the more reliable of the unimodal inputs (Ernst & Banks, 2002; Burr et al., 2009; Repp & Penel, 2002; Walker & Scott, 1981). Our results are consistent with this latter outcome: as unimodal auditory processing was considerably more precise than tactile processing, the dominant input in determining bimodal discriminability was the auditory one. Finally, to complete the picture, it should be noted that under some conditions, individual subjects’ responses to bimodal combinations suggest differences in the weights they place upon the separate components in multisensory combination (Zhou, Wong, & Sekuler, 2007), a fact further supported by the exploratory analyses whose results are shown in Fig. 4.

We believe that our exploratory analysis offers a useful lesson for developers of various human-interface devices. Incorporating bimodal stimuli can reduce individual differences in performance by compensating for subject-level differences in processing unimodal stimuli: bimodal stimulus presentation reduced inter-individual variability in JND (Fig. 3e), supporting the results from Gamache and Grondin (2010) and Murray et al. (2019). For technology developers using time-locked stimuli, this means that supplementing unimodal signals with information from another modality could help homogenize different users’ ability to exploit the information from the signals. However, the measurements we made are merely snapshots of individual differences in bimodal combination, and are silent as to whether those striking differences are stable over time, whether they extend to other modality combinations or tasks (Murray et al., 2019), and how they might relate to other, well-established predictors of bimodal combination (Stevenson, Zemtsov, & Wallace, 2012). We believe that follow-up research could profitably explore individual differences in multisensory processing.

Experiment 2

In Experiment 1, subjects judged the duration of a trial’s final interval relative to its isochronous predecessors. To make these comparisons, subjects must have formed some internal representation of the duration of the standard interval (i.e., 400 ms). We cannot tell, though, what information subjects actually used to construct that representation. For example, in every trial, did subjects incorporate each and every one of the preceding intervals into an internal representation against which to compare the trial’s final interval?

The “multiple-look model” proposed by Drake and Botte (1993) hypothesizes that repeated exposure to a stimulus promotes formation of a memory trace, or perceptual memory, of that stimulus. With additional “looks”, the memory trace strengthens and sharpens, presumably up to the precision limit of the system. While there is some evidence that increasing the number of repetitions of a standard stimulus aids temporal sensitivity (Grondin, 2001; Grondin & McAuley, 2009; Miller & McAuley, 2005; Ten Hoopen et al., 2011), several studies have failed to show a “multiple-look effect” (Grondin & McAuley, 2009; Miller & McAuley, 2005; Ten Hoopen et al., 2011). This failure could be explained by subjects’ ability to draw on some robust memory trace that developed early in the experiment. Such a memory trace would render the standard intervals on subsequent trials superfluous (e.g., Morgan, Watamaniuk, & McKee, 2000; Nachmias, 2006; Dyjas, Bausenhart, & Ulrich, 2012; Zimmermann & Cicchini, 2020).

These findings prompted us to conduct a second experiment in which we varied the number of standard intervals that preceded the final comparison interval. If subjects based their judgments on each and every one of the standard intervals presented on a trial, performance should improve with added standard interval presentations. Alternatively, if subjects developed and based their judgments on some robust, persistent memory trace of the isochronous interval, then the number of standard intervals presented before the final comparison interval should be inconsequential.

Methods

Except where explicitly noted, the methods were the same as in Experiment 1.

Subjects

Fifteen subjects (eight female, mean age = 19.3 years, SD = 1.3 years) participated in this experiment; none had participated in Experiment 1. No subject reported issues related to processing auditory or vibrotactile signals. However, data from two subjects were excluded from analysis: one subject withdrew before completing the experiment, and data from another subject were lost because of a computer error. To confirm the sufficiency of our sample size, we conducted an a priori power analysis for each independent variable following the procedures used in Experiment 1. We found that, with 85% power, we would need n = 12 subjects to observe an effect of modality similar to that reported by Jones et al. (2009) and n = 6 subjects to observe an effect of standard intervals similar to that reported by Grondin and McAuley (2009), so our sample size of n = 13 was adequate. All experimental procedures were approved by Brandeis University’s Institutional Review Board and were conducted in accordance with the Declaration of Helsinki. All subjects provided written informed consent prior to participation. Subjects were awarded credits towards a university course requirement for their time.

Apparatus and stimuli

With the exception of some additional sound-absorbing insulation inserted beneath the vibrotactor, the apparatus was unchanged from Experiment 1. Note that this extra sound-absorbing material did not affect the intensity of T stimulation. Additionally, the strong similarity of AT and A performance in Experiment 1 led us to drop the bimodal condition and test performance with just the two unimodal stimuli.

Brownian masking noise calibration

Although we had no reason to suspect that subjects could hear the T stimuli in Experiment 1 (our analyses revealed significant differences between the A and T JNDs and PSEs), we made our calibration procedures more rigorous for Experiment 2. We used a two-alternative forced-choice task to find, for each subject, the level of Brownian noise needed to mask the sound of the vibrotactor. Subjects were told that the vibrotactor would emit an audible sound when activated, and were instructed to listen for any such sound on the calibration trials. Subjects removed their hand from the vibrotactor before the calibration started so they could not feel any of the vibrations. Each trial consisted of two components presented in random order. During one component, the vibrotactor was actively vibrating; in the other component, it was inactive. Subjects responded by indicating which trial component, the first or the second, contained the sound. Accuracy of \(\sim \)50% constituted chance performance, and was taken as an indication that the subject could not hear the sounds produced by the vibrotactor. Each block contained 24 trials. If a subject’s accuracy fell outside the middle 90% of the binomial distribution expected under guessing, the amplitude of the masking noise was increased and another block of calibration trials was presented.
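The chance-performance criterion can be made explicit with the short sketch below, which computes the middle 90% of the binomial distribution expected if a subject were guessing on a 24-trial block; this is a reconstruction of the stated criterion, not our calibration code.

# Middle 90% of the chance (guessing) distribution for a 24-trial 2AFC block
n_trials <- 24
lower <- qbinom(0.05, size = n_trials, prob = 0.5)
upper <- qbinom(0.95, size = n_trials, prob = 0.5)
c(lower = lower, upper = upper) / n_trials
# Accuracy within roughly these proportions is consistent with the subject being
# unable to hear the vibrotactor; otherwise, the masking noise is increased.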

Task

We modified Experiment 1’s task by varying the number of standard intervals, hereafter “nSTD”, delivered on each trial. Across different sets of trials, nSTD was block randomized from 1STD (one standard interval preceding the comparison interval) to 4STD (four standard intervals preceding the comparison interval). All 48 trials in a block presented stimuli from the same modality and had the same nSTD, but rhythmic deviancy was randomized from trial to trial. Before each block of experimental trials, subjects completed six practice trials with that block’s modality and nSTD.

The complete set of 16 blocks of trials took \(\sim \)1.5 h to complete. In total, for each modality and nSTD, each subject received 12 trials with rhythmic deviancies drawn randomly and equally from each of the eight bins (four “early” and four “late”). Each subject completed 768 total trials (2 [modalities] × 4 [nSTD] × 8 [rhythmic deviancy bins] × 12 [trials per condition]). As in Experiment 1, subjects were allowed short, self-timed breaks between blocks. Subjects could earn additional course credits with accuracy scores ≥ 80% on eight or more of the 16 blocks.

Data curation

Except where noted, we used the same analytic procedures as in Experiment 1. Subjects failed to respond before the 1000-ms maximum response time on just 0.9% of all trials; these trials were excluded from analysis. One-way ANOVAs and t tests showed that trials with missing responses were not disproportionately represented for modality (p = 0.293), nSTD (p = 0.769), response key mapping (p = 0.520), or response category (“early” vs. “late”, p> 0.999) conditions.

Psychometric modeling

The group- and subject-level models each contained eight conditions (2 [modalities] × 4 [nSTD]). From the psychometric functions, we derived values of JND and PSE as in Experiment 1. We used two-way, within-subjects ANOVAs to test for additive and interactive effects of modality and nSTD on subject-level JND and PSE estimates.

Results

Figure 5 shows the model-fitting results. Table 2 lists the mean JND and PSE for each experimental condition. Table 3 reports full omnibus ANOVA results for both JND and PSE; only p values for these effects are reported in the following sections.

Fig. 5

Experiment 2: Psychometric model fits. The psychometric functions (PFs), high-density intervals (HDIs), and just noticeable differences (JNDs) comparing A and T sensitivity (panels a–c) were estimated by collapsing across nSTD; those comparing the nSTD conditions (panels d–f) were estimated by collapsing across modality. a Group-level PFs fit via Bayesian estimation show that, as in Experiment 1, T temporal sensitivity was poorer than A temporal sensitivity. b Boxplots of 95% HDI widths for subject-level PSEs show that model fits were poorer for the T condition than for the A condition. c Boxplots of JNDs derived from subject-level models show that A JNDs were smaller than T JNDs. d, e Temporal sensitivity was similar across the nSTD conditions when averaging over A and T performance. f No significant differences existed among the JNDs from the four nSTD conditions

Table 2 Mean JND for each modality and nSTD value. Standard deviations are shown in parentheses
Table 3 Results of two-way [modality × nSTD] within-subjects ANOVAs on JNDs and PSEs

Analysis of JNDs

Replicating a main finding from Experiment 1, an omnibus ANOVA showed that JNDs were larger for T stimuli than for A stimuli, p < 0.001 (Fig. 5c). Averaging over both modalities, nSTD had no significant effect on JND, corrected p = 0.082 (Fig. 5f); however, the interaction between modality and nSTD was significant, corrected p = 0.012. To dissect the ANOVA’s significant interaction, we separated JNDs by modality and performed two separate one-way ANOVAs with polynomial contrasts, testing for linear and quadratic trends across the nSTD conditions. Neither linear nor quadratic components were statistically significant in the A ANOVA, confirming that JNDs with A stimuli were invariant as nSTD increased (F(1, 36) = 7.4, p = 0.547, \({\eta _{p}^{2}}\) = 0.01; and F(1, 36) = 3.6, p = 0.674, \({\eta _{p}^{2}}\) = 0.005, respectively). For T stimuli, the picture was a bit more complex. First, a non-significant linear component suggested that there was no monotonic change in T JNDs with nSTD, F(1, 36) = 0.26, p = 0.616, \({\eta _{p}^{2}}\) = 0.007. However, there was a significant quadratic component, reflecting the non-monotonicity shown in Table 2, F(1, 36) = 9.14, p = 0.005, \({\eta _{p}^{2}}\) = 0.20.
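The trend analysis can be sketched as follows, using simulated JNDs; the aov()/polynomial-contrast approach shown is one standard way to obtain linear and quadratic components and may differ in detail from the software we used.

# Sketch of linear and quadratic trend tests across nSTD (simulated JNDs)
set.seed(6)
n <- 13
dat <- data.frame(
  subject = factor(rep(1:n, times = 4)),
  nSTD    = factor(rep(1:4, each = n), ordered = TRUE),   # ordered => polynomial contrasts
  jnd     = rnorm(4 * n, mean = rep(c(60, 55, 52, 56), each = n), sd = 10)
)

# One-way repeated-measures ANOVA with the trend components split out
fit <- aov(jnd ~ nSTD + Error(subject/nSTD), data = dat)
summary(fit, split = list(nSTD = list(linear = 1, quadratic = 2)))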

Analysis of PSEs

Averaging across levels of nSTD, an omnibus ANOVA showed that PSEs did not significantly differ between modalities, p = 0.808. Also, collapsing across modalities, variation in nSTD did not affect PSEs, p = 0.194. However, there was a significant interaction between nSTD and modality, p = 0.013. We again examined this significant interaction by separating PSEs by modality and performing two separate one-way ANOVAs with linear and quadratic contrasts. PSEs for A stimuli were constant over values of nSTD, with non-significant linear and quadratic components (F(1, 36) = 1.65, p = 0.207, \({\eta _{p}^{2}}\) = 0.04; and F(1, 36) = 0.03, p = 0.869, \({\eta _{p}^{2}}\) < 0.01, respectively). However, the decrease in T PSEs as nSTD increased was significantly linear, F(1, 36) = 10.71, p = 0.002, \({\eta _{p}^{2}}\) = 0.23. The quadratic trend was not significant, F(1, 36) = 0.007, p = 0.932, \({\eta _{p}^{2}}\) < 0.001.

We performed additional post hoc tests to compare the PSE in each condition (see Table 2) to the value of the standard interval, 400 ms. In three of the eight conditions, subjects’ mean PSE significantly differed from the standard: A-2STD, A-4STD, and T-1STD (BF10 = 23.99, 95% HDI = [1.15, 3.95]; BF10 = 2.23, 95% HDI = [0.09, 6.66]; and BF10 = 3.01, 95% HDI = [1.03, 11.64], respectively).

Discussion

Overall, Experiment 2 showed that increases in nSTD brought little or no systematic benefit to temporal sensitivity as measured by JNDs. Specifically, JND values for A stimuli remained stable despite variation in nSTD, and JNDs for T stimuli varied somewhat, but did so non-monotonically (Table 2). Thus, neither modality exhibited what might be expected had there been a multiple-look effect (Drake & Botte, 1993). At first glance, this null result seemed surprising; after all, comparable studies showed performance improvements with additional presentations of a standard stimulus (e.g., Drake & Botte, 1993; Grondin, 2001; Grondin & McAuley, 2009; Miller & McAuley, 2005; Ten Hoopen et al., 2011). The explanation for the divergence between our and others’ results may lie in experimental design. Specifically, multiple-look effects are more likely when the number of comparison intervals varies, rather than the number of standard intervals (Grondin & McAuley, 2009; Miller & McAuley, 2005; Ten Hoopen et al., 2011). The number of standard intervals becomes a more critical factor when the duration of the standard interval varies (Miller & McAuley, 2005), probably because a fixed standard interval duration promotes the formation of a robust memory trace that renders additional standard presentations on each trial superfluous (Morgan et al., 2000; Nachmias, 2006; Dyjas et al., 2012; Zimmermann & Cicchini, 2020). Such speculation about the role of memory raises but does not answer a number of questions, such as the rate at which such a memory forms, its robustness (Tong, Dubé, & Sekuler, 2019), and whether, once formed, memories for multiple stimuli can co-exist as demonstrated in psychophysical experiments with standard stimuli very different from those used here (Morgan et al., 2000).

Although the analysis of JNDs in Experiment 2 echoed Experiment 1’s modality difference, the analysis of PSEs did not: there was no main effect of modality on PSE. For a more direct comparison with Experiment 1, we evaluated differences between modalities in Experiment 2 within the 3STD condition specifically. The analysis showed no significant difference between the modalities, t(12) = 1.16, p = 0.269, d = 0.32, BF10 = 0.49. While PSE values did not differ between modalities, we found differences in how PSE varied within each modality as nSTD varied. The linear reduction in T PSE with increasing nSTD suggests that the comparison interval seemed longer when it was preceded by more standards. No such effect was present in the A condition.

Both JND and PSE analyses echo the results of other studies conducted using auditory stimuli (e.g., Grondin & McAuley, 2009; Miller & McAuley, 2005; Ten Hoopen et al., 2011). Importantly, however, our data also extend to the vibrotactile domain, showing that increasing the number of standard intervals in this modality did not consistently affect temporal sensitivity but did affect perceived duration.

Summary and conclusions

The first experiment compared auditory, vibrotactile, and bimodal temporal discriminability using a task in which subjects compared the duration of a variable interval with the duration of three preceding standard intervals. JND and PSE estimates for each modality were derived and compared using hierarchical Bayesian psychometric fits, and the unimodal contributions to bimodal processing were explored by modeling how stimuli from different modalities were combined, and by examining the degree to which individual subjects benefited from bimodal stimulation. Our second experiment also compared auditory and vibrotactile temporal discriminability, as well as the effect of varying the number of standard intervals from one to four. These data were also interpreted using psychometric fits, JND estimates, and PSE estimates.

Overall, temporal sensitivity measured with JND was lower for vibrotactile than for auditory signals (Experiments 1 and 2), and bimodal audio-tactile processing was dominated by auditory cues (Experiment 1). Following from the principle of inverse effectiveness (Meredith & Stein, 1983; Ross, Saint-Amour, Leavitt, Javitt, & Foxe, 2007), inter-individual differences in discriminability were significantly reduced in the bimodal condition, suggesting that developers of human-interface technology would be well advised to utilize bimodal stimuli for a more uniform user experience (Experiment 1). Moreover, the number of standard intervals presented per trial had no consistent effect on temporal discriminability in either modality, pointing to the likely contribution of memory for the non-roving standard duration (Experiment 2).

Combined with results from another study from our lab (Villalonga, Sussman, & Sekuler, 2020), our findings suggest that different modalities occupy different levels in a hierarchy of temporal acuities, with audition at the top, followed by vibrotactile sensation and then vision. This general framework is consistent with a comparable ordering of Weber fractions reported by Azari et al. (2020): smallest for auditory stimuli (0.121), intermediate for vibrotactile stimuli (0.170), and slightly higher for visual stimuli (0.186). However, attempts to draw parallels between multiple sets of results need to be tempered by a recognition of differences among tasks and among the particular exemplar stimuli used to represent various modalities.

Open practices statement

Datasets generated and analyzed during the current study are available in the Open Science Framework (OSF) repository at https://osf.io/q8y6j/?view_only=76df6923d10a498b84ec44883a75b525. Neither experiment was preregistered.