Abstract
Humans excel at using sounds to make judgements about their immediate environment. In particular, timbre is an auditory attribute that conveys crucial information about the identity of a sound source, especially for music. While timbre has been primarily considered to occupy a multidimensional space, unravelling the acoustic correlates of timbre remains a challenge. Here we re-analyse 17 datasets from published studies between 1977 and 2016 and observe that original results are only partially replicable. We use a data-driven computational account to reveal the acoustic correlates of timbre. Human dissimilarity ratings are simulated with metrics learned on acoustic spectrotemporal modulation models inspired by cortical processing. We observe that timbre has both generic and experiment-specific acoustic correlates. These findings provide a broad overview of former studies on musical timbre and identify its relevant acoustic substrates according to biologically inspired models.
Data availability
The data that support the findings of this study are available from the corresponding author upon request and at https://github.com/EtienneTho/musical-timbre-studies.
Code availability
Custom code that supports the findings of this study is available from the corresponding author upon request and at https://github.com/EtienneTho/musical-timbre-studies.
Acknowledgements
This work was supported by grants from the Canadian Natural Sciences and Engineering Research Council awarded to S.M. (grant nos. RGPIN-2015-05280 and RGPAS 478121-15) and to P.D. (grant no. RGPIN-2018-05662), as well as a Canada Research Chair (grant nos. 950-223484 and 950-231872) awarded to S.M. E.T. was funded through ILCB/BLRI grant nos. ANR-16-CONV-0002 (ILCB) and ANR-11-LABX-0036 (BLRI) and the Excellence Initiative of Aix-Marseille University (A*MIDEX). B.C. was funded through an EU Marie Skłodowska-Curie fellowship (Project MIM, H2020-MSCA-IF-2014, grant agreement no. 659232). B.C. acknowledges STMS IRCAM-CNRS-Sorbonne Université in Paris, where he received support from a Marie Skłodowska-Curie research fellowship at the beginning of the project. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank the authors of the studies re-analysed for providing the stimuli and data from their experiments; G. Mestdagh, E. Ponsot and B. Morillon for helpful discussions on earlier versions of the manuscript; and M. Elhilali and D. Pressnitzer for help in the initial implementation of the optimization framework.
Author information
Contributions
E.T., B.C., P.D. and S.M. worked on the conceptualization and methodology, and also reviewed and edited the article. E.T. and B.C. worked on the software, formal analysis and investigation. E.T. conducted data curation, wrote the original draft and worked on the visualization. S.M. supervised the work, conducted project administration and obtained funding.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Primary Handling Editor: Marike Schiffer.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Details of the datasets.
Summary of the properties of the 17 datasets from the 8 different studies: dataset name (when applicable), number of stimuli in the dataset (Nb Sounds), fundamental frequency of the stimuli in Hz (f0), number and type of participants, type of sounds, and supplemental information when applicable.
Extended Data Fig. 2 Multi-Dimensional Scaling analysis.
Squared Spearman correlation (ρ²) of the log attack time (LAT) and spectral centroid (SC) values with the positions of stimuli along the first two dimensions of the timbre spaces, obtained using the same MDS method for all datasets. The full statistics are provided in Supplementary Table 1.
Extended Data Fig. 3 Replicability of the MDS-based analyses.
Squared Spearman correlations (ρ²) of LAT and SC with the perceptual dimensions, as reported in the original studies and as recomputed here with the same MDS parameters for all datasets (see Methods). For almost all datasets, the correlations reported in the original studies are lower than those computed in this meta-analysis. The full statistics are reported in Supplementary Table 1.
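As an illustration of the statistic named in these captions, the following is a minimal sketch of a squared Spearman correlation between an acoustic descriptor and stimulus coordinates along one MDS dimension. It is not the authors' code: the function names and the example values are hypothetical, ties are handled with average ranks, and both inputs are assumed to be non-constant.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho_squared(descriptor, coordinates):
    """Squared Spearman correlation: Pearson correlation of the ranks, squared."""
    rho = pearson(average_ranks(descriptor), average_ranks(coordinates))
    return rho ** 2


# Hypothetical values: one descriptor value and one MDS coordinate per sound.
descriptor_values = [0.1, 0.5, 0.3, 0.9]
dimension_coords = [-2.0, 0.2, -0.1, 3.0]
rho2 = spearman_rho_squared(descriptor_values, dimension_coords)
```

Because the statistic is squared, a descriptor that decreases monotonically along a dimension scores as highly as one that increases along it.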
Extended Data Fig. 4 Cross-validation of the metrics.
For each dataset, the metrics were cross-validated to test their generalizability within the dataset. Explained variances (r²) of the human ratings by the cross-validated metrics are presented for each dataset in four columns: the Training correlations (metrics fitted on N-1 sounds); the Testing correlations (metrics tested on the removed sound); the Internal consistency of the fitted metrics, measured as the correlation within the N(N-1)/2 pairs of metrics; and the average correlation (r²) between the metric Refitted on all sounds and those fitted on the N-1 subsets, which shows the extent to which the refitted metric differs from the cross-validated ones. The median cross-validated r² on the testing sets was 0.51. Within each dataset, the metrics are highly consistent across the N folds (r²: Mdn = 0.85) and correlate strongly with the metric fitted on the whole dataset (r²: Mdn = 0.92). These high correlations (last column) indicate that the metric fitted on all sounds can be used to perform the analyses.
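The fold structure described in this caption can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it assumes that the testing pairs of a fold are the pairs involving the removed sound, it leaves out the metric fitting itself, and the names leave_one_sound_out and r_squared are hypothetical.

```python
from itertools import combinations


def leave_one_sound_out(n_sounds):
    """Yield (held_out, train_pairs, test_pairs) for each of the N folds.

    Assumption: training pairs are the pairs among the N-1 remaining sounds,
    and testing pairs are the pairs that involve the held-out sound.
    """
    all_pairs = list(combinations(range(n_sounds), 2))
    for held_out in range(n_sounds):
        train_pairs = [p for p in all_pairs if held_out not in p]
        test_pairs = [p for p in all_pairs if held_out in p]
        yield held_out, train_pairs, test_pairs


def r_squared(predicted, observed):
    """Squared Pearson correlation between predicted and observed ratings."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov ** 2 / (var_p * var_o)


# With N = 5 sounds there are 5 folds; each fold trains on the
# (N-1)(N-2)/2 = 6 remaining pairs and tests on N-1 = 4 pairs.
folds = list(leave_one_sound_out(5))
```

In such a scheme, a metric would be fitted on the training pairs of each fold, and r_squared would then compare its predicted dissimilarities with the human ratings on the corresponding testing pairs.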
Supplementary information
Supplementary Information
Supplementary Figs. 1–18 and Supplementary Tables 1–19.
About this article
Cite this article
Thoret, E., Caramiaux, B., Depalle, P. et al. Learning metrics on spectrotemporal modulations reveals the perception of musical instrument timbre. Nat Hum Behav 5, 369–377 (2021). https://doi.org/10.1038/s41562-020-00987-5
DOI: https://doi.org/10.1038/s41562-020-00987-5