Introduction

Natural sounds such as speech or music contain temporal structure at multiple time scales. Particularly slow acoustic modulations in the delta-theta range (2–9 Hz) are considered crucial for speech and music processing (Ding et al., 2017; Pellegrino et al., 2011; Singh & Theunissen, 2003). Such natural statistics are arguably not accidental and co-occur with potential neuronal coding principles in auditory cortex (Ravignani et al., 2019; Singh & Theunissen, 2003). Support for this proposal comes from electrophysiological studies that identified endogenous oscillations in auditory cortex in the delta-theta band (Giraud et al., 2007; Keitel & Gross, 2016; Lakatos et al., 2005; Lubinus et al., 2019). By entraining to acoustic signals at these time scales, neuronal oscillations in auditory cortex might contribute to the processing of temporal information in sound (Ghitza, 2012; Giraud & Poeppel, 2012; Gross et al., 2013; McAuley & Jones, 2003; Miller & McAuley, 2005; Rimmele, Gross, et al., 2018). The optimal processing range of neuronal populations should, therefore, constrain auditory perception by facilitating auditory temporal processing within this range (Haegens & Zion Golumbic, 2018; Rimmele, Morillon, et al., 2018).

Perceptual constraints, such as decreased neuronal tracking of speech and reduced speech comprehension at fast rates outside of the presumably optimal range, have been shown previously (Ahissar et al., 2001; Brungart et al., 2007; Doelling et al., 2014; Ghitza & Greenberg, 2009). Similarly, for amplitude-modulated sounds and isochronous tone sequences, reduced neuronal tracking (Teng et al., 2017; Teng & Poeppel, 2020) and reduced temporal sensitivity have been observed at stimulus rates associated with the higher alpha range compared to lower rates (Drake & Botte, 1993; Friberg & Sundberg, 1995; Teng et al., 2017; Viemeister, 1979). Overall, there is considerable evidence from studies on rate perception for “constant” optimal auditory temporal processing in the theta range (4–8 Hz, including lower rates in the delta range 2–4 Hz). This has typically been assessed through relative difference thresholds for rate discrimination, that is, the minimal difference between two stimulation rates necessary for discrimination normalized by the standard rate. Relative difference thresholds have been shown to be lowest and constant in the theta range, which is commonly interpreted as a zone of optimal temporal processing, referring to Weber’s law (Drake & Botte, 1993; Friberg & Sundberg, 1995; Viemeister, 1979). According to Weber’s law, the ability to distinguish stimulation rates is proportional to the frequency of the presentation rate. The absolute rate difference necessary for discrimination, thus, scales with the stimulation rate, resulting in a constant relative difference threshold. Although the temporal sensitivity has been shown to be constant at low stimulation rates (corresponding roughly to the delta-theta band, in neural terms), the onset of the decrease in temporal sensitivity is controversial. While some studies already find higher relative thresholds for rates around 8 Hz (Drake & Botte, 1993; Friberg & Sundberg, 1995; ten Hoopen et al., 1994; ten Hoopen et al., 2011), others report a threshold increase at 10 Hz (McAuley & Kidd, 1998; Michon, 1964), 12 Hz (Ehrlé & Samson, 2005; Elliott & Theunissen, 2009), 16 Hz (Nordmark, 1968; Viemeister, 1979), or even 40 Hz (Dau et al., 1997; Sheft & Yost, 1990). In these studies, typically only a coarse range of standard (modulation) rates in the upper theta and alpha range was tested.

Here, we investigate whether interindividual differences in auditory-motor coupling strength might contribute to the controversial or mixed findings regarding the (onset of) sensitivity changes, using a behavioral paradigm. “Temporal predictions” from motor cortex have been shown to modulate auditory processing in studies presenting periodic tone sequences (Arnal et al., 2015; Morillon et al., 2014; Morillon & Baillet, 2017) or continuous speech (Keitel et al., 2018; Park et al., 2015), even during passive listening (Chen et al., 2008; Grahn & Rowe, 2013). Top-down effects from motor cortex seem to particularly affect auditory processing and facilitate behavior during demanding listening situations (Stokes et al., 2019; Wu et al., 2014). Interestingly, it has been proposed based on MEG findings that the right hemispheric lateralization of speech processing is reduced at more challenging fast stimulation rates (Assaneo, Rimmele, et al., 2019), whereas such lateralization might be linked to motor top-down predictions (Tang et al., 2020). Recently, Assaneo et al. (2019) developed a simple behavioral protocol to measure spontaneous auditory-motor synchronization. The synchronization of one’s own speech production to a perceived speech stream differed widely but systematically across individuals, revealing a bimodal distribution of high and low synchronizers. On a neuronal level, high synchronizers showed stronger functional and structural connectivity between frontal speech-motor and auditory cortices, rendering this behavioral protocol suitable to estimate individual differences in auditory-motor cortex coupling strength. Furthermore, rhythmic speech production more strongly modulated perception in high synchronizers compared to low synchonizers, suggesting increased motor top-down predictions with increasing auditory-motor coupling strength (Assaneo et al., 2020; Assaneo, Ripollés, et al., 2019).

Here, we investigate the following hypotheses: (1) There is a particular auditory sensitivity for processing stimulation rates in the theta range that decreases at higher rates in the alpha range, reflected in increasing rate-discrimination thresholds. (2) Interindividual differences in the auditory-motor synchronization, measured behaviorally with the spontaneous speech synchronization test (SSS-test; as an estimate of fronto-temporal structural connectivity strength) (Assaneo, Ripollés, et al., 2019), modulate auditory temporal processing sensitivity. Based on previous findings, we expect (1) an overall increased auditory temporal sensitivity in high compared to low synchronizers due to increased temporal top-down predictions from production areas, or, alternatively, (2) the benefit might particularly occur at higher “non-optimal” rates. In an adaptive weighted up-down staircase procedure, we tested auditory discrimination thresholds for a fine-grained range of rates in the theta to alpha range (4–15 Hz). Bayesian model comparison was used to test our hypotheses. To validate the threshold measure, we used the method of constant stimuli to measure rate discrimination performance at two rates in the theta and alpha range (4, 11.86 Hz).

Method

Participants

All participants reported normal hearing and absence of any type of dyslexic, neurological, or psychiatric disorder or intake of psychotropic substances during the last 6 months. The experimental procedures were ethically approved by the Ethics Council of the Max Planck Society (No. 2017_12). All participants gave written informed consent prior to the study and received monetary compensation. Participants were excluded from the statistical analyses because of outlier behavioral performance (n = 3; mean relative difference threshold in at least one psychophysical procedure exceeded the median ±3 median absolute deviation; Leys et al., 2013) or because the two runs in the SSS-test were inconsistent (n = 2). The final sample included 55 participants (age range: 19–32 years (M = 25.38, SD = 3.78), 27 female, three left-handed), clustered into 35 high and 20 low synchronizers.

Procedure

The experiment was conducted in a sound-attenuated experiment booth. Stimulus presentation and response acquisition were run on a Windows 7 computer using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) for Matlab R2017a (version 9.2). Responses were collected on a standard computer keyboard. Auditory stimuli were presented binaurally at a 44,100 Hz sampling rate via a RME Fireface UCX audio interface, a Phone Amp Lake People G109 headphone amplifier, and Beyerdynamic DT770 Pro (80 Ohm) closed-back headphones. In the SSS-test, we used Etymotic Research 3C insert earphones (50 Ohm) with foam ear tips and a directional gooseneck condenser microphone (Shure MX418 Microflex) placed about 5 cm in front of the participant for speech recording.

Rate-discrimination task

We adopted an auditory two-interval forced-choice (2IFC) rate discrimination task (Fig. 1a), in which a standard and a faster comparison sequence were presented in random order (Drake & Botte, 1993). Participants were asked to indicate as accurately and quickly as possible by keypress whether the first or second sequence was faster. They were instructed to guess whenever they failed to discriminate the rate and to refrain from any movement to the beat such as tapping.

Fig. 1
figure 1

Experimental design. (a) In a two-interval forced-choice (2-IFC) task, participants judged whether the first or second of two isochronous tone sequences was faster. Standard and comparison sequence were randomly presented at either position. (b) Using a weighted up-down staircase procedure, we measured relative difference thresholds at eight standard rates between 4 and 15 Hz (exemplary run from one participant at a standard rate of 4 Hz). (c) Additionally, the discrimination performance was measured in a constant stimuli procedure for two standard rates at seven stimulus levels (i.e., levels of the comparison sequence). The psychometric function was fitted to estimate the relative difference threshold (exemplary data and psychometric function from one participant at a standard rate of 4 Hz). (d) In the auditory-motor speech synchronization test, participants repeatedly whispered the syllable /te/ while listening to an isochronous train of random syllables (Assaneo, Ripollés, et al., 2019). The synchronization between the produced and presented speech was measured (icons based on resources from Flaticon.com). Each participant completed two runs. (e) Histogram of the syncrhonization measurements, computed as the phase-locking value (PLV) between the envelope of the produced and perceived speech signals. PLV values were averaged across runs. We clustered participants into two groups, low and high synchronizers. Lines represent normal distributions fitted to each cluster

We generated isochronous sequences by concatenating a 15-ms 440-Hz sinusoidal pure tone (5-ms rise/fall of the sound envelope) with silent intervals, depending on the stimulation rate. One of the two sequences consisted of five and the other of seven tones, randomized trial-by-trial. Auditory stimuli were presented at 70 dB SPL (peak amplitude normalization).

On each trial, a fixation cross appeared 500–750 ms prior to stimulus onset (random uniform distribution). Standard and comparison sequences were presented in random order separated by an inter-sequence interval (random uniform distribution between 4.5 and 5.5 times the tone inter-onset interval of the first sequence). After giving their response, participants received visual feedback (correct, incorrect) in the weighted up-down procedure, but not in the method of constant stimuli. The intertrial interval was 1,000–1,500 ms (random uniform distribution).

Weighted up-down procedure

We used an adaptive weighted up-down (WUD) staircase method implemented in the Palamedes toolbox (Kingdom & Prins, 2010) to measure discrimination thresholds block-wise for eight standard rates linearly spaced from 4 to 15 Hz (Fig. 1b). The relative difference between the standard and the comparison rate was reduced by a certain step size given a correct response and was increased by three times that step size following an incorrect response to converge on a 75% correct level, defined as the threshold (Kaernbach, 1991). All parameters were chosen in terms of the relative difference between standard and comparison rate (in Hz), given by dr = (ratecomparison – ratestandard)/ratestandard. Start values for each standard rate condition (linearly increasing across rates from 20–27%) were selected well above the expected threshold to observe a convergence to threshold level and to capture the threshold area in all participants at each standard rate (Green, 1990) based on previous studies and pilot results. We chose linearly increasing start values to avoid differential adaptation and learning effects as higher thresholds were expected at higher rates. The initial step size (1%) was divided in half after six (0.5%) and again after 12 reversals (0.25%), that is, after 12 changes of direction in the adaptive run (e.g., up to down). Progressively decreasing step sizes were expected to yield reliable and fine-grained threshold estimates more efficiently (Levitt, 1971; Rammsayer, 1992). In case the relative rate difference became smaller or equal to one step-size, the current difference was divided in half after a correct response.

The relative difference threshold for each standard rate was computed as the mean of the last six reversals. Short breaks were included every 20 trials. One adaptive run ended after 18 reversals. To familiarize participants with the task, a minimum of five and a maximum of 15 training trials (75% correct criterion) were completed at the beginning of each block. In training trials, the relative difference between the standard and the comparison rate was 10% higher compared to the start values in the main part (i.e. 30–37%, 1% steps).

Method of constant stimuli procedure

We used the method of constant stimuli (CS) to validate the WUD threshold estimates by fitting the psychometric function at two standard rates of main interest, 4 and 11.86 Hz (Fig. 1c). The psychometric function describes the relationship between a participant’s response behavior and a stimulus feature (Wichmann & Hill, 2001). We measured performance (percentage correct) at seven comparison levels (relative rate difference in the comparison sequence). For every participant, we calculated the rate of the comparison sequences by multiplying a seven-point scale logarithmically spaced from 0.2 to 3 with the individual WUD threshold to capture the whole range of the psychometric function from 50% chance level to 100% correct (see: Herbst & Obleser, 2019). Every run contained one trial of each comparison level in random order (mixed presentation). There were 30 runs in total, amounting to 30 trials measured per comparison level. Pauses were included after 21 trials. The procedure was completed for each standard rate separately in randomized order across participants.

We fitted a Weibull function to the individual data at both standard rates using Bayesian inference methods implemented in the Psignifit Toolbox 4 (Schütt et al., 2016). The guess rate was fixed at 50%, which is the chance level given the 2IFC task, and three parameters were estimated: the 75% correct threshold, the width, and the lapse rate. We assessed the goodness-of-fit for individual data by comparing the deviance to a parametric bootstrap sample distribution (N = 10,000; 95% percentile).

Auditory-motor speech-synchronization test

In the SSS-test (Fig. 1d; for details see Assaneo, Ripollés, et al., 2019) we measured participants’ ability to synchronize their speech production to a heard syllable sequence. First, participants wearing earphones were instructed to cautiously increase the volume of a background babble while continuously whispering the syllable /te/, until they could not hear their own speech anymore (start sound level: 70 dB SPL, maximum: 95 dB SPL). In two training trials, participants listened to a 10-s periodic syllable /te/ sequence (at 4.5 Hz) and were asked to whisper /te/ at the same rate for 10 s directly afterwards. Finally, a 70-s random syllable train with a progressively increasing rate (M = 4.5 Hz, range: 4.3–4.7 Hz, steps: 0.1 Hz after 60 syllables) was presented. The random syllable stream audio file was created using the MBROLA synthesizer (male American English voice, 200-Hz pitch) (Dutoit et al., 1996). Participants were instructed to whisper /te/ continuously in synchrony with the audio. Each participant completed two blocks of the test.

We quantified auditory-motor speech synchronization by calculating the phase-locking value (PLV) (Lachaux et al., 1999) between the cochlear envelope of the audio stimulus and the envelope of the recorded speech signal (time windows 5 s, overlap 2 s). The cochlear envelope was determined using the NSL Auditory Model toolbox for Matlab (auditory channels: 180–7,246 Hz). Phases of both envelopes were calculated via the Hilbert transform after resampling to 100 Hz and band-pass filtering (3.5–5.5 Hz). To classify high and low synchronizers, we performed k-means clustering (Arthur & Vassilvitskii, 2007) of participants into two clusters based on mean PLVs across both blocks (Fig. 1e, mean PLV across high synchronizers = 0.74, SD = 0.1, mean PLV across low synchronizers = 0.34, SD = 0.12). Note that one participant was not assigned to one cluster consistently given the nature of the clustering algorithm. Our findings were not altered by the classification of this participant as a high synchronizer (reported here, because this was the case in ~58% of 10,000 simulation runs), a low synchronizer, or an exclusion.

Finally, participants filled out a questionnaire on demographic data, task strategies, and musical sophistication (Goldsmiths Musical Sophistication Index (Gold-MSI); Müllensiefen et al., 2014).

Analysis

Data analysis was performed in Matlab 2018b (version 9.5) and in R (version 3.6.1). In order to specifically test our hypotheses on temporal sensitivity in the WUD procedure, we performed a Bayesian model comparison using the RStan software (version 2.19.2) (Carpenter et al., 2017). We compared five different model types (Fig. 2a), each specified by a set of parameters θ, to explain the observations D (i.e., the relative difference thresholds) depending on auditory-motor speech synchronization behavior and the rate of the standard tone sequences. For model simplicity, we assumed that the observations D are independent and identically distributed given the models’ parameters. We approximated the data generation process by log-normal distributions, given that threshold values could only be positive and were expectedly right-skewed. In an additional control analysis, Gaussian normal distributions truncated at zero yielded lower posterior probabilities, while the main results were equivalent. All model parameters were set to be lower bounded by 0, given the definition of the relative difference threshold and our prior assumptions. We further assumed that the variance in each standard rate condition and subgroup (high and low synchronizers) was drawn from the same distribution.

Fig. 2
figure 2

Weighted up-down method indicates an optimal processing range modulated by auditory-motor coupling. (a) We tested five model types, which included a constant threshold (Null), a difference between high and low synchronizers (Group Baseline Difference), a linear threshold increase (Increase), an additional baseline difference (Increase + Group Baseline Difference), or an additional variable slope (Increase + Group Slope Difference). In the models that included a threshold increase, we considered all seven possible different onset points (see b), that could vary between groups. P(M|D) gives the marginal posterior probability of each model type given the data marginalized over possible onset points. (b) Heatmap illustrating the marginal posterior probabilities P(M|D) of each onset point of the threshold increase in high and low synchronizers marginalized over model types. (c) Relative difference thresholds for high and low synchronizers. Colored dot: individual participant, white dot: median, thick line: quartiles, thin line: quartiles ± 1.5 × interquartile range. (d) Median relative difference thresholds (dots) and model predictions of the best model (line) with 95% confidence interval (shaded area) for high and low synchronizers

In the null model M1, the relative difference threshold was constant at μ across all standard rates.

$$ D\sim \mathrm{LogNormal}\left(\log \left(\mu \right),{\sigma}^2\right),\mathrm{where}\ \mu, {\sigma}^2\in {R}_0^{+} $$

The prior distribution for μ was set to be a normal distribution with prior mean μ0 and prior variance \( {\sigma}_{\mu_0}^2 \). Summarizing previous results on rate discrimination in isochronous sequences (Drake & Botte, 1993; Ehrlé & Samson, 2005; Friberg & Sundberg, 1995; ten Hoopen et al., 1994; ten Hoopen et al., 2011), we expected the mean threshold μ across the tested frequency range to be 5% on average, with single observations ranging from minimally 0.1% to maximally 50%. We doubled the maximal variance possible given that range to obtain a prior estimate for \( {\sigma}_{\mu_0}^2 \). The same prior distribution for μ was used in the remaining models as well.

$$ \mu \sim \mathrm{Normal}\left({\mu}_0,{\sigma}_{\mu_0}^2\right),\mathrm{where}\ {\mu}_0=0.05,{\sigma}_{\mu_0}^2=2\frac{{\left(0.5-0.001\right)}^2}{4}\approx 0.125 $$

The prior distribution for σ2 was specified as a uniform distribution with a lower bound α = 0. The upper bound γ was determined by the maximum expected variance, given an expected minimal threshold of 0.1% and a maximum threshold of 50%. The same prior distribution for σ2 was used in the remaining models as well.

$$ {\sigma}^2\sim \mathrm{Uniform}\left(\alpha, \gamma \right),\mathrm{where}\ \alpha =0,\gamma =\frac{{\left(\log (0.5)-\log (0.001)\right)}^2}{4}\approx 9.66 $$

Model M2 included the group baseline difference in relative difference thresholds between high and low synchronizers, coded by a group indicator variable j = {1, 2} for every data point and two group-specific parameters μj.

$$ D\sim \mathrm{LogNormal}\left(\log \left({\mu}_j\right),{\sigma}^2\right),\mathrm{where}\ j\in \left\{1,2\right\}\ \mathrm{and}\ {\mu}_j\in {R}_0^{+} $$

Based on differences between musicians and non-musicians in previous studies, we expected the threshold to be on average 4% higher in low synchronizers. Correspondingly, we set the mean of the Gaussian prior to 5% for high synchronizers and 9% for low synchronizers. The prior variance \( {\sigma}_{\mu_{j-0}}^2 \) was assumed to be the same as in M1 for both groups.

$$ {\mu}_j\sim \mathrm{Normal}\left({\mu}_{j-0},{\sigma}_{\mu_{j-0}}^2\right),\mathrm{where}j\in \left\{1,2\right\},{\mu}_{1-0}=0.05,{\mu}_{2-0}=0.09,{\sigma}_{\mu_{j-0}}^2\approx 0.125 $$

In the models of type “Increase,” we included the hypothesized departure from a constant relative threshold μ in the theta range and modelled a linear threshold increase by including a slope parameter β. For each participant, the threshold increase at a particular starting point was coded by an indicator variable xk (length corresponding to the eight standard rates tested). We considered all possible seven points k, where the threshold could start to differ from a constant baseline threshold (5.57–15 Hz). Thus, the first point of the indicator variable, corresponding to the standard rate 4 Hz, was always zero. For example, an increase starting at 5.57 Hz was coded by x5.57 = {0, 1, 2, 3, 4, 5, 6, 7}. The starting point could vary in high versus low synchronizers (again coded by the indicator variable j: xk, j), which resulted in a total of 49 possible combinations (see also Fig. 2b), tested in models M3 to M51.

$$ D\sim \mathrm{LogNormal}\left(\mu +{x}_{k,j}\beta, {\sigma}^2\right),\mathrm{where}\mu, \beta, {\sigma}^2\in {R}_0^{+},k\in \left\{5.57,\dots, 15\right\},j\in \left\{1,2\right\} $$

The prior distribution of β was a normal distribution with a prior mean β0 of 2% predicated upon previous results on threshold increases in the alpha range. The prior variance \( {\sigma}_{\beta_0}^2 \) was set equal to \( {\sigma}_{\mu_0}^2 \). The same prior distribution was adopted in the remaining models as well.

$$ \beta \sim \mathrm{Normal}\left({\beta}_0,{\sigma}_{\beta_0}^2\right),\mathrm{where}\ {\beta}_0=0.02,{\sigma}_{\beta_0}^2\approx 0.125 $$

The fourth type of model (models M52 to M100) combined a constant baseline group difference and a linear threshold increase at varying starting points between high and low synchronizers.

$$ D\sim \mathrm{LogNormal}\left({\mu}_j+{x}_{k,j}\beta, {\sigma}^2\right),\mathrm{where}j\in \left\{1,2\right\},\left({\mu}_j,\beta, {\sigma}^2\right)\in {R}_0^{+},k\in \left\{5.57,\dots, 15\right\} $$

Upon suggestion of a reviewer, we included another model type, in which the slope βj of the threshold increase could vary between high and low synchronizers. In models M101 to M149, we tested this possibility again for all possible starting points.

$$ D\sim \mathrm{LogNormal}\left(\mu +{x}_{k,j}\beta, {\sigma}^2\right),\mathrm{where}j\in \left\{1,2\right\},\left(\mu, {\beta}_j,{\sigma}^2\right)\in {R}_0^{+},k\in \left\{5.57,\dots, 15\right\} $$

The previous prior for the slope parameter was applied for both groups.

$$ {\beta}_j\sim \mathrm{Normal}\left({\beta}_0,{\sigma}_{\beta_0}^2\right),\mathrm{where}\ {\beta}_0=0.02,{\sigma}_{\beta_0}^2\approx 0.125 $$

In total, we compared 149 models, which we assigned equal prior probability, \( P\left({M}_i\right)=\frac{1}{149} \) . According to Bayes’ rule, the posterior probability of a model Mi is given by the product of its prior probability and the marginal likelihood of the observations given the model divided by the marginal probability of the observations:

$$ p\left({M}_i\right|D\Big)=\frac{p\left({M}_i\right)p\left(D|{M}_i\right)}{p(D)},\mathrm{where}\ p(D)={\sum}_{j=1}^mp\left(D|{M}_j\right)p\left({M}_j\right) $$

The Bayes factor (BF) was calculated to determine the amount of evidence in favor of one model as the ratio between its posterior probability and the posterior probability of the remaining models (Kass & Raftery, 1995). Similarly, we obtained the probability for a specific effect by marginalizing over all secondary effects and comparing the resulting posterior probability.

We used numerical methods implemented in the RStan and the bridgesampling package (Gronau et al., 2017) in R to estimate the log posterior probability. We first obtained samples of the posterior distribution (log density function) by running each model in RStan, which uses a No-U-Turn (NUTS) sampling Markov chain Monte Carlo (MCMC) algorithm (Carpenter et al., 2017). We ran five chains (maximal tree depth = 10, target average acceptance probability for adaptation = 0.95), each containing 2,000 iterations, including 1,000 warm-up iterations (stan default settings). Further increasing the number of iterations did not change the parameter estimates and posterior probabilities, indicating convergence. The accuracy of the MCMC algorithm was checked via the diagnostics provided in the RStan environment (rhat statistics, divergences, saturation of the maximum three depth, and the Bayesian fraction of missing information). Finally, we used the RStan output to compute the log marginal likelihood via bridge sampling and, based on that, the posterior probability (Gronau et al., 2017).

Results

Weighted up-down procedure

Median relative difference thresholds in the weighted up-down (WUD) procedure ranged from 4.08% at 4 Hz to 10.5% at 15 Hz (threshold averaged across standard rates in each participant: Mdn = 5.25%, MAD = 2.54%). High synchronizers displayed lower thresholds averaged across all standard rates (Mdn = 5.52%, MAD = 2.15%) compared to low synchronizers (Mdn = 8.21%, MAD = 3.7%) (Fig. 2c; Wilcoxon rank-sum test, one-sided, W = -2.25, p = .012, r = -0.21). We used a Bayesian model comparison to test the influence of stimulation rate and auditory-motor speech synchronization behavior more specifically. The Bayesian sampling algorithm displayed appropriate behavior for all of the 149 models (rhat statistics < 1.001; no divergence in any of the fits; no saturations of the maximal tree depth 10; non-suspicious Bayesian fraction of missing information). Bridge sampling therefore allowed for accurate estimation of the posterior probability of each model given the data (see Table S1 in the Online Supplementary Materials). As we used a uniform prior probability for a large number of models (\( P\left({M}_i\right)=\frac{1}{149} \)), we did not expect a high BF for a single model. More importantly, the posterior probability of the most likely model (M33) increased considerably from a prior probability of about 0.67% to 28.61% (BF ≈ 0.4). This model included a constant baseline threshold (μ ≈ 4.48%, 95% CI [4.17%, 4.84%]) from 4 to 7.14 Hz in low synchronizers and from 4 to 10.29 Hz in high synchronizers, and a subsequent increase with the same slope in both groups (β ≈ 1.44%, 95% CI [1.13%, 1.75%]) (Fig. 2d). Overall, the model type “Increase” yielded the highest posterior probability marginalized over onset points, P(M3-51| D) ≈ 74.53% (BF ≈ 2.93) (Fig. 2A). In contrast, an additional group difference between high and low synchronizers in the baseline threshold (P(M52-100| D) ≈ 24.5%, BF ≈ 0.32) or the slope of the threshold increase (P(M101-149| D) ≈ 0.96%, BF ≈ 0.01) was not supported by our data. Crucially, we found considerable evidence for an earlier threshold increase in low synchronizers compared to high synchronizers, when comparing the corresponding models to all the remaining ones, marginalizing over model types (P(M(Onset Low < Onset High) | D) ≈ 87.97%, BF ≈ 7.31) (Fig. 2b). The results suggest that the threshold increase is most likely to start at around 7.14–8.71 Hz in low synchronizers and around 10.29–11.86 Hz in high synchronizers.

Above we reported the measured relative rate discrimination thresholds. In order to relate our findings to oscillatory theories of auditory perception (e.g., Poeppel, 2003; Teng et al., 2017; Teng & Poeppel, 2020), the relative rate difference can be expressed as a relative fraction of the respective oscillatory cycle (in ms) required for reliable rate discrimination. This can be straightforwardly computed from the relative difference thresholds reported here and yields similar results (Table S2, Online Supplementary Materials).

Constant stimuli procedure

To validate the WUD threshold measure procedure, we examined rate discrimination sensitivity in the constant stimuli (CS) procedure by fitting the psychometric function at 4 Hz and 11.86 Hz (Fig. 1c). Inspection of goodness-of-fit for individual data demonstrated proper fit for all 110 individual fits (55 subjects, two standard rates; see Fig. 3a for mean data in each standard rate condition across all participants). The lapse rate was generally low, with a trend for higher lapse rates at 11.86 Hz (Mdn = 1.16 × e–07%, MAD = 2.07%) compared to the 4-Hz standard rate condition (Mdn = 6.05 × e–08%, MAD = 1.3%) (Wilcoxon signed-rank test, two-sided, T = -1.93, p = .054). This suggests that participants remained attentive throughout the whole procedure, producing only a few lapses.

Fig. 3
figure 3

Constant stimuli method suggests optimal processing range in the theta range. (a) Mean psychometric functions across all participants at two standard rates and seven comparison levels. Error bars represent the standard error of the mean. (b) Scatter plots of average thresholds across standard rates in the weighted up-down (WUD) and constant stimuli (CS) procedure. Correlation calculated on high/low group-demeaned variables. (c) Relative difference thresholds at two standard rates across all participants. Grey dot and horizontal line: individual participant, white dot: median, thick line: quartiles, thin line: quartiles ± 1.5 × interquartile range. Thresholds were significantly lower at rates of 4 Hz compared to 11.86 Hz. *** p < .001. (d) Thresholds for high and low synchronizers. Colored dot: individual participant

Relative thresholds at 4 Hz and 11.86 Hz strongly correlated between the WUD and CS method (average across standard rates: Spearman-rank correlation, high/low synchronizer group-demeaned variables, rs = 0.68, p < .001, Fig. 3b; 4 Hz: rs = 0.66, p < .001; 11.86 Hz: rs = 0.55, p < .001), as well as the difference in thresholds between 4 Hz and 11.86 Hz (rs = 0.39, p = .003). Thresholds were overall lower in the CS procedure compared to the WUD procedure at both 4 Hz (Wilcoxon signed-rank test, two-sided, T = -4.37, p < .001, r = -0.42) and 11.86 Hz (Wilcoxon signed-rank test, two-sided, T = -3.24, p = .001, r = -0.31), presumably reflecting increased familiarity with the task. But the difference between relative difference thresholds at 4 and 11.86 Hz (indicating a threshold increase at rates in the alpha range) did not differ between methods (Wilcoxon signed-rank test, two-sided, T = 0.91, p = .361). Together, these results confirm that temporal sensitivity was measured adequately in the adaptive weighted up-down procedure, which was our main procedure to test temporal sensitivity across the theta and alpha range.

Thresholds in the CS method were also higher at 11.86 Hz (Mdn = 4.84%, MAD = 2.38%) compared to 4 Hz (Mdn = 3.13%, MAD = 1.55%) (Fig. 3c; Wilcoxon signed-rank test, one-sided, T = 4.95, p < .001, r = 0.47). This provides further evidence for a decrease in temporal sensitivity from the theta to the alpha range at 11.29 Hz. Average thresholds across standard rate conditions were descriptively lower in high (Mdn = 3.82%, MAD = 1.42%) versus low synchronizers (Mdn = 4.87%, MAD = 2.05%), but there was only a trend for statistical significance (Wilcoxon rank-sum test, one-sided, W = -1.58, p = .057) (Fig. 3d). (Note that in the CS method we only tested two rates, and thus could not access group differences as nuanced as in the WUD threshold measure).

Results from the CS procedure expressed as the relative fraction of the oscillatory cycle (in ms) are reported in the Online Supplementary Materials (Table S3).

Correlation of temporal sensitivity, auditory-motor synchronization, and musicality

Our previous analyses suggest a close relationship between auditory-motor speech synchronization behavior and auditory temporal sensitivity. Here, we analyzed the impact of general musical sophistication (Gold-MSI) on these measures. High synchronizers reported higher musicality compared to low synchronizers (Wilcoxon rank-sum test, one-sided, W = 3.27, p < .001, r = 0.3). Given the bimodal distribution of auditory-motor speech synchronization behavior, we demeaned all variables in the high and low synchronizer clusters to control for spurious correlations when calculating correlations among the variables (Fig. 4). Musical sophistication correlated positively with mean PLVs in the SSS-test (rs = 0.46, p < .001), indicating that higher musicality went along with improved auditory-motor synchronization behavior. There was only a trend for a negative correlation between musical sophistication and average relative threshold levels across standard rates in the CS procedure (rs = -0.26, p = .056) and the WUD procedure (rs = -0.23, p = .092). Crucially, threshold levels in the WUD method correlated moderately with mean PLVs in the SSS-test (rs = -0.36, p = .008), even when controlled for musical sophistication (partial correlation rs = -0.29, p = .034). This suggests that differences in auditory-motor synchronization behavior explained individual variance in temporal sensitivity beyond musical sophistication. However, the correlation between average threshold levels in the CS procedure and mean PLVs did not reach statistical significance (rs = -0.21, p = .13).

Fig. 4
figure 4

Scatter plots and correlations between rate-discrimination threshold, auditory-motor coupling, and musicality measures. Correlations were calculated based on high/low group-demeaned variables to account for spurious correlations

Discussion

The experiments we present here lie at the intersection of two fundamental questions in perception that have been investigated by and large independently. On the one hand, rate sensitivity is a basic question in auditory perception. On the other hand, the interaction between perception and action systems has been increasingly studied, with new insights about auditory-motor coupling. We combine these two lines of research in psychophysical experiments and discover a surprising generalization: listeners with increased synchronization of speech perception and production have broader sensitivity to acoustic rates. Our evidence invites two conclusions: first, there are indeed preferred, fine-grained regimes of temporal processing in hearing at low-frequency modulation rates in the theta range. Second, as the speech synchronization test has been shown to estimate coupling between auditory and speech motor cortices (Assaneo, Ripollés, et al., 2019), our findings suggest that the motor system influences perceptual timing; in particular, individual variability in auditory-motor coupling seems to undergird listeners’ temporal processing thresholds, with a larger range of optimal processing in high compared to low auditory-motor synchronizers.

Overall, individuals showed constant low relative rate discrimination thresholds in the theta range. The range of optimal processing varied for high and low synchronizers, with the most likely optimal range for low synchronizers from 4 to 7.14 Hz and high synchronizers from 4 to 10.29 Hz. Thresholds increased at frequencies above the optimal range up to 15 Hz, indicating reduced temporal sensitivity in the alpha range (and above). Some previous studies report reduced temporal sensitivity for rate discrimination at higher rates in the alpha range (Drake & Botte, 1993; Friberg & Sundberg, 1995; McAuley & Kidd, 1998; ten Hoopen et al., 1994; ten Hoopen et al., 2011), whereas others did not (Dau et al., 1997; Elliott & Theunissen, 2009; Sheft & Yost, 1990; Viemeister, 1979). Compared to previous reports, we tested rate discrimination within a fine-grained range of rates in a larger sample of participants. Furthermore, we applied an additional procedure, the constant stimuli method, to confirm that the thresholds were reliably estimated by the adaptive WUD procedure. Our data support previous behavioral findings, and neurophysiological evidence, suggesting that neuronal oscillatory activity in auditory cortex shows optimal temporal processing in the theta range (Teng et al., 2017; Teng & Poeppel, 2020). The constant relative threshold in the theta range corresponds to a linear increase in absolute rate difference required for discrimination, congruent with Weber’s law. In contrast, Weber’s law did not apply for the observed increase of relative thresholds at higher rates, suggesting a decreased temporal resolution at higher rates. In other words, this finding indicates that at higher rates the absolute rate difference required for discrimination failed to scale with the rate, that is, the temporal resolution at higher rates seems to be constrained by a minimum “time constant,” which differed for high and low synchronizers (at ~7–10% relative difference threshold, corresponding to ~6–8 ms). On a neuronal level, the constant relative thresholds might reflect the preferred frequencies of the auditory cortex neuronal population (Teng et al., 2017; Teng & Poeppel, 2020). More specifically, this leads to the prediction that the relative difference threshold is constant - reflecting a constant fraction of the oscillatory cycle (Table S2, S3, Online Supplementary Materials) - only for optimal rates that can be tracked reliably.

Interestingly, in additional to the theta range, an optimal auditory processing range in the gamma band has been reported using amplitude-modulated sound (Galambos, 1992; Teng et al., 2017; Teng & Poeppel, 2020; see also: Poeppel, 2003). In our paradigm – which uses pure tone sequences – investigating rate perception in the gamma range is complicated, as rate perception would cross over to (roughness and) pitch perception in the gamma range (Oxenham, 2012). Further research, with a different experimental paradigm, is required to investigate whether this second auditory processing regime in the gamma band (Giraud, 2020; Hoonhorst et al., 2009; Joliot et al., 1994; Teng et al., 2017; Teng & Poeppel, 2020) is affected by auditory-motor coupling.

Crucially, auditory temporal sensitivity was modulated by interindividual differences in the auditory-motor speech synchronization behavior, which is indicative of the auditory-motor cortex coupling strength (Assaneo et al., 2020; Assaneo, Ripollés, et al., 2019). High compared to low synchronizers showed lower relative difference thresholds at higher rates, supporting a model that suggests a larger range of optimal temporal resolution in the high synchronizers. A possibility is that our findings reflect top-down temporal predictions from the motor system facilitating auditory processing. Previous research suggests that top-down predictions from the motor system can modulate auditory processing even during passive listening (Chen et al., 2008; Grahn & Rowe, 2013). We utilized the previously introduced behavioral spontaneous speech synchronization test (SSS-test) to distinguish individuals based on their spontaneous auditory-motor synchronization behavior (Assaneo, Ripollés, et al., 2019). The original study reported the test-retest reliability and replicability of the bimodal distribution of the synchronization behavior. Crucially, differences in functional and structural frontal-motor to auditory cortex connectivity have been shown to underly this group distinction, rendering it a suitable estimator of auditory-motor coupling strength. (Note that although the SSS-test measures auditory-motor synchronization at ~4.5 Hz, the relation to structural connectivity suggest that more general rate-independent auditory-motor interactions are reflected by the test outcome, see also: Assaneo et al., 2020.) Furthermore, the auditory-motor synchronization behavior most likely reflects an oscillatory mechanism, whereas high synchronizers have shown stronger perceptual facilitation likely related to top-down predictions from the motor system (Assaneo et al., 2020). Based on these previous findings, we propose that the increased rate discrimination sensitivity in high synchronizers reflects a facilitating effect of increased motor cortex recruitment in high compared to low synchronizers. Motor top-down effects particularly show a facilitating effect during demanding listening situations (Stokes et al., 2019; Wu et al., 2014; e.g. such as at “non-optimal” fast stimulation rates). Previously it has been proposed that motor top-down effects can affect processing in demanding listening situations by reducing the hemispheric right lateralization resulting in bilateral auditory cortex recruitment (Assaneo, Rimmele, et al., 2019). However, whether such a neuronal mechanism can account for the observed larger optimal range for rate discrimination in high synchronizers is unknown and requires further research.

An alternative explanation for our findings is that the differences in auditory temporal sensitivity in high and low synchronizers was due to population differences in auditory processing rather than differences in auditory-motor coupling. We cannot entirely exclude this explanation. However, previous evidence shows that the high/low synchronizer distinction comes with differences in functional and structural cortical auditory-motor interactions, rendering this “purely auditory processing” explanation unlikely. Furthermore, musical expertise might affect the temporal sensitivity for rate discrimination. For example, there is evidence for enhanced auditory-motor integration in musicians benefitting auditory processing (Du & Zatorre, 2017). Accordingly, we found that high and low synchronizers differed with respect to musical sophistication. However, the auditory-motor coupling accessed with the SSS-test (Assaneo, Ripollés, et al., 2019) correlated with interindividual differences in temporal sensitivity beyond musicality. Our findings suggest that the SSS-test provides a more specific estimate of auditory-motor synchronization compared to the measure of musical sophistication, which is likely a more indirect estimate of auditory-motor coupling.

In summary, we report a constant auditory temporal sensitivity for rate discrimination in the theta range, with a decrease in sensitivity for higher rates. Individuals with strong compared to weak synchronization of speech perception and production showed higher rate-discrimination sensitivity particularly at faster rates, indicating a larger range of optimal temporal processing. Our behavioral findings suggest constraints of auditory perception consistent with oscillatory theories, which propose neuronal populations in auditory cortex with preferred frequencies in the theta range. Furthermore, our data suggest a crucial role of auditory-motor coupling for enhancing temporal sensitivity by improving temporal resolution at higher stimulation rates.