Learning metrics on spectrotemporal modulations reveals the perception of musical instrument timbre

Thoret, Etienne; Caramiaux, Baptiste; Depalle, Philippe; McAdams, Stephen

doi:10.1038/s41562-020-00987-5

Article
Published: 30 November 2020

Nature Human Behaviour volume 5, pages 369–377 (2021)

Abstract

Humans excel at using sounds to make judgements about their immediate environment. In particular, timbre is an auditory attribute that conveys crucial information about the identity of a sound source, especially for music. While timbre has been primarily considered to occupy a multidimensional space, unravelling the acoustic correlates of timbre remains a challenge. Here we re-analyse 17 datasets from published studies between 1977 and 2016 and observe that original results are only partially replicable. We use a data-driven computational account to reveal the acoustic correlates of timbre. Human dissimilarity ratings are simulated with metrics learned on acoustic spectrotemporal modulation models inspired by cortical processing. We observe that timbre has both generic and experiment-specific acoustic correlates. These findings provide a broad overview of former studies on musical timbre and identify its relevant acoustic substrates according to biologically inspired models.

**Fig. 1: Two different approaches to investigate the auditory perception of musical instrument timbre.**

**Fig. 2: Replicability of the MDS-based approach.**

**Fig. 3: Correspondence between fitted metrics and standard deviations of the stimuli.**

**Fig. 4: Generalizability of the metrics learned for the different datasets.**

Data availability

The data that support the findings of this study are available from the corresponding author upon request and at https://github.com/EtienneTho/musical-timbre-studies.

Code availability

Custom code that supports the findings of this study is available from the corresponding author upon request and at https://github.com/EtienneTho/musical-timbre-studies.

Acknowledgements

This work was supported by grants from the Canadian Natural Sciences and Engineering Research Council awarded to S.M. (grant nos. RGPIN-2015-05280 and RGPAS 478121-15) and to P.D. (RGPIN-2018-05662), as well as a Canada Research Chair (grant nos. 950-223484 and 950-231872) awarded to S.M. E.T. was funded through ILCB/BLRI grant nos. ANR-16-CONV-0002 (ILCB) and ANR-11-LABX-0036 (BLRI) and the Excellence Initiative of Aix-Marseille University (A*MIDEX). B.C. was funded through an EU Marie Skłodowska-Curie fellowship (Project MIM, H2020-MSCA-IF-2014, grant agreement no. 659232). B.C. acknowledges STMS IRCAM-CNRS-Sorbonne Université in Paris, where he received support from a Marie Skłodowska-Curie research fellowship at the beginning of the project. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank the authors of the studies re-analysed for providing the stimuli and data from their experiments; G. Mestdagh, E. Ponsot and B. Morillon for helpful discussions on earlier versions of the manuscript; and M. Elhilali and D. Pressnitzer for help in the initial implementation of the optimization framework.

Author information

Authors and Affiliations

Schulich School of Music, McGill University, Montreal, Canada
Etienne Thoret, Philippe Depalle & Stephen McAdams
Aix Marseille Univ, CNRS, PRISM, LIS, Marseille, France
Etienne Thoret
Institute of Language Communication and the Brain (ILCB), Marseille, France
Etienne Thoret
Université Paris-Saclay, CNRS, Inria, LRI, Gif-sur-Yvette, France
Baptiste Caramiaux

Contributions

E.T., B.C., P.D. and S.M. worked on the conceptualization and methodology, and also reviewed and edited the article. E.T. and B.C. worked on the software, formal analysis and the investigation. E.T. conducted data curation, wrote the original draft and worked on the visualization. S.M. supervised the work, conducted project administration and obtained funding.

Corresponding author

Correspondence to Etienne Thoret.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Primary Handling Editor: Marike Schiffer.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Details of the datasets.

Summary of the properties of the 17 datasets from the 8 different studies: dataset name (when applicable), number of stimuli in the dataset (Nb Sounds), fundamental frequency of the stimuli in Hz (f0), number and type of participants, type of sounds, and supplemental information when applicable.

Extended Data Fig. 2 Multi-Dimensional Scaling analysis.

Spearman correlation (ρ²) of the LAT and SC values with the positions of stimuli along the first two dimensions of the timbre spaces, using the same MDS method for all datasets. The full statistics are provided in Supplementary Table 1.
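As an illustration of the statistic named in this caption, the following is a minimal sketch of a squared Spearman correlation between a descriptor and stimulus positions along one MDS dimension. It is not the authors' code: the function names and the example values are hypothetical, and only the Python standard library is used.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho_squared(x, y):
    """Squared Spearman correlation: Pearson correlation of the ranks, squared."""
    return pearson(average_ranks(x), average_ranks(y)) ** 2


# Hypothetical values: one descriptor per stimulus and the stimulus
# coordinates along one dimension of a timbre space.
descriptor = [0.1, 0.4, 0.2, 0.9]
mds_dimension = [-1.2, 0.3, -0.5, 1.8]
print(spearman_rho_squared(descriptor, mds_dimension))
```

Because the value is squared, a perfectly decreasing relationship scores the same as a perfectly increasing one, which suits a comparison with MDS axes whose orientation is arbitrary.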

Extended Data Fig. 3 Replicability of the MDS-based analyses.

Spearman correlations (ρ²) of LAT and SC with perceptual dimensions reported in the original studies and determined with the same MDS parameters here (see Methods). For almost all datasets, the correlations reported in the original studies are lower than those computed in this meta-analysis. The full statistics are reported in Supplementary Table 1.

Extended Data Fig. 4 Cross-validation of the metrics.

For each dataset, the metrics were cross-validated to test their generalizability within the dataset. Explained variances (r²) of the human ratings by the cross-validated metrics are presented for each dataset: the Training correlations (fitted on the N-1 sounds); the Testing correlations (tested on the removed sound); the within correlation between the N*(N-1)/2 metric pairs, characterizing the Internal consistency of the fitted metrics in each dataset; and the average correlation (r²) between the metric Refitted on all sounds and those fitted on the N-1 subsets, showing the extent to which this metric differs from the cross-validated ones. The metrics were cross-validated with a median r² = 0.51 on the testing sets. For each dataset, they are highly consistent across the N folds (r²: Mdn = 0.85), and they strongly correlate with the metric fitted on the whole dataset (r²: Mdn = 0.92). For each dataset, the correlations between the metric refitted on all sounds and those fitted for the cross-validation (last column) are high, showing that the metric fitted on all sounds can be used to perform the analyses.
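The leave-one-sound-out scheme described in this caption can be sketched as follows. This is only one plausible reading of the caption, not the authors' implementation: the assumption is that each fold fits a metric on the dissimilarity pairs among the N-1 remaining sounds and tests it on the pairs involving the removed sound. The function names are hypothetical.

```python
from itertools import combinations


def leave_one_sound_out(n_sounds):
    """Yield (held_out, train_pairs, test_pairs) for each of the N folds.

    Pairs are index tuples (i, j) with i < j. Training pairs exclude the
    held-out sound; testing pairs are exactly those that include it.
    """
    all_pairs = list(combinations(range(n_sounds), 2))
    for held_out in range(n_sounds):
        train_pairs = [p for p in all_pairs if held_out not in p]
        test_pairs = [p for p in all_pairs if held_out in p]
        yield held_out, train_pairs, test_pairs


def count_metric_pairs(n_sounds):
    """Number of pairs among the N fold-specific metrics: N*(N-1)/2."""
    return n_sounds * (n_sounds - 1) // 2


# Example with a hypothetical dataset of 5 sounds.
for held_out, train_pairs, test_pairs in leave_one_sound_out(5):
    print(held_out, len(train_pairs), len(test_pairs))
print(count_metric_pairs(5))
```

With N sounds, each fold has (N-1)(N-2)/2 training pairs and N-1 testing pairs, and comparing the N fold-specific metrics with one another gives the N*(N-1)/2 metric pairs mentioned in the caption.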

Supplementary information

Supplementary Information

Supplementary Figs. 1–18 and Supplementary Tables 1–19.

Reporting Summary

About this article

Cite this article

Thoret, E., Caramiaux, B., Depalle, P. et al. Learning metrics on spectrotemporal modulations reveals the perception of musical instrument timbre. Nat Hum Behav 5, 369–377 (2021). https://doi.org/10.1038/s41562-020-00987-5

Received: 21 October 2019
Accepted: 18 September 2020
Published: 30 November 2020
Issue Date: March 2021
DOI: https://doi.org/10.1038/s41562-020-00987-5
