Emotion and expertise: how listeners with formal music training use cues to perceive emotion

Battcock, Aimee; Schutz, Michael

doi:10.1007/s00426-020-01467-1

Emotion and expertise: how listeners with formal music training use cues to perceive emotion

Original Article
Open access
Published: 29 January 2021

Volume 86, pages 66–86, (2022)
Cite this article

Download PDF

You have full access to this open access article

Psychological Research Aims and scope Submit manuscript

Emotion and expertise: how listeners with formal music training use cues to perceive emotion

Download PDF

4584 Accesses
11 Citations
22 Altmetric
2 Mentions
Explore all metrics

Abstract

Although studies of musical emotion often focus on the role of the composer and performer, the communicative process is also influenced by the listener’s musical background or experience. Given the equivocal nature of evidence regarding the effects of musical training, the role of listener expertise in conveyed musical emotion remains opaque. Here we examine emotional responses of musically trained listeners across two experiments using (1) eight measure excerpts, (2) musically resolved excerpts and compare them to responses collected from untrained listeners in Battcock and Schutz (2019). In each experiment 30 participants with six or more years of music training rated perceived emotion for 48 excerpts from Bach’s Well-Tempered Clavier (WTC) using scales of valence and arousal. Models of listener ratings predict more variance in trained vs. untrained listeners across both experiments. More importantly however, we observe a shift in cue weights related to training. Using commonality analysis and Fischer Z score comparisons as well as margin of error calculations, we show that timing and mode affect untrained listeners equally, whereas mode plays a significantly stronger role than timing for trained listeners. This is not to say the emotional messages are less well recognized by untrained listeners—simply that training appears to shift the relative weight of cues used in making evaluations. These results clarify music training’s potential impact on the specific effects of cues in conveying musical emotion.

The pleasantness of sensory dissonance is mediated by musical style and expertise

Article Open access 31 January 2019

Audiovisual integration in the McGurk effect is impervious to music training

Article Open access 08 February 2024

Cultural familiarity and musical expertise impact the pleasantness of consonance/dissonance but not its perceived tension

Article Open access 26 May 2020

Individual differences and musical training

The communication of musical emotion is both powerful and personal. Audiences bring their individual histories to the listening experience (Ladinig and Schellenberg 2012; Taruffi et al. 2017; Vuoskoski and Eerola 2011), responding differently to the same musical information due to differences in personality traits, experience and expertise or training. Musical training can influence the processing of musical structure (Koelsch et al. 2002; Sherwin and Sajda 2013)—including conveyed emotion. However, there is ongoing debate about whether musical training can be advantageous, with evidence both supporting (Castro and Lima 2014) and failing to demonstrate a clear training effect (Bigand et al. 2006). Here we contribute to ongoing discussion of the relationship between training and processing advantages/disadvantages by exploring a different yet complementary issue—how training affects the relative weighting of cues conveying emotion. To ensure broad relevance, we grounded this exploration in a set of well-known pieces for the piano routinely studied and performed around the world. Although this rich stimulus set poses certain analytical challenges, our application of statistical techniques borrowed from other fields allowed for a “deconstruction” of individual cue weights, affording new insight into a well-explored issue.

Evidence for training’s effect on emotion perception in music

Some evidence suggests training shapes abilities to recognize expressed emotion. For example, in investigating the role of musicality, emotional intelligence, and emotional contagion on listeners’ perception of emotion, Akkermans et al. (2018) used recordings of three different melodies created to express seven different emotions. Participants heard all seven expressions for each melody four times over 28 trials, and rated excerpts on Likert scales representing the seven affective adjectives. Musical training emerged as the only predictor to explain for participants’ decoding accuracy. These findings support the argument that musical training affords some perceptual benefits when assessing communicated emotion.

Training benefits are also found for older musicians in contrast to younger ones (Castro and Lima 2014). In that study participants rated expressed emotion of short polyphonic excerpts on four affective 10-point intensity scales. Years of music training correlated with emotion categorization accuracy, where the middle-aged (range 40–60 years) musicians performed more accurately than non-musicians. Participants’ responses for each emotion could be predicted by various combinations of measured structural cues including tempo, mode, pitch range, dissonance, and rhythmic irregularity. Older musicians’ responses were better predicted in the model compared to non-musicians, which may be related to training advantages in recognition accuracy. Interestingly, differences emerged in the predictive strengths of some cues for negatively valenced emotions, supporting the idea that musicians use cues differently to decode emotion compared to untrained listeners.

Furthermore, changes in mode and tempo affect how listeners with musical training rate perceived valence and arousal differently than those without training (Ramos et al. 2011). Participants with at least six years of formal training on least one instrument heard excerpts consisting of different mode (seven possible Greek modes selected) and tempo (three possible tempos selected) combinations and had to select one of four emotion categories representing the excerpt. The effect of the tempo manipulations on participants’ valence ratings was greater for musical experts and the effect of mode had been modulated by participants’ musical background for both valence and arousal ratings. The authors however, found only slight differences, where both groups exhibited high responsiveness to the experimental manipulations. It is possible however, that with more years of musical training musicians would become increasingly more sensitive to these differences.

Ambiguity in our understanding of training’s effect

Despite the literature suggesting effects of musical training on emotion perception, other evidence suggests untrained participants perform as just as well in tasks assessing accuracy and categorization within examples of music or prosody (Bigand et al. 2006; Juslin 1997; Trimmer and Cuddy 2008). As listeners gain musical knowledge from basic listening experience, it is possible music listening alone is sufficient to create ‘experienced’ listeners (Bigand and Poulin-Charronnat 2006). Although focused on induced emotions, work from Bigand et al. (2006) found emotional responses to music were only weakly influenced by expertise. In that study, participants grouped the emotions induced by excerpts of instrumental Western music similarly regardless of musical background. Interestingly, these findings occurred even though the selected stimuli included excerpts of great complexity, suggesting non-musicians are able to process subtle musical structures in Western music to discern emotion. Bigand and Poulin-Charronnat’s (2006) review highlights several studies covering a range of perceptual tasks including perceived tension and ability to anticipate musical events, which also fail to find a difference or advantage for those with musical training. However, it is unclear if there are additional, more recent studies finding a lack of training effects. This may reflect a potential publication bias to publish only significant findings (Mlinarić et al. 2017).

The effect of musical expertise remains opaque, given conflicting evidence regarding musical training’s effect (Akkermans et al. 2018; Castro and Lima 2014; Koelsch et al. 2002; Sherwin and Sajda 2013), or lack thereof (Bigand et al. 2006; Trimmer and Cuddy 2008). The current study asks participants to directly evaluate valence and arousal, unlike studies providing the possible discrete affect terms. Here we believe the dimensional measurement of emotion is a more reliable tool for rating excerpts that are less overt in their emotional message. This method is found to be more sensitive for ambiguous emotional content in music and shows higher inter-rater consistency for listener ratings of emotion (Eerola and Vuoskoski 2011).

Present study

Our primary motivation for this study comes from interest in interpreting our recent findings regarding emotional communication in Bach’s well-known set of piano pieces The Well-Tempered Clavier (Book 1). Perceptual ratings of those pieces have utility in identifying the specific contributions of cues such as timing, pitch height, and mode to emotional responses (Battcock and Schutz 2019). As part of that study, we examined differences in responses to excerpts cut to eight musical measure segments vs. “variable length” segments cut to end in locations aligned with the piece’s stated key. In other words, excerpts of varying length ensured they both started and ended in consistent modes. In an effort to maximize that study’s generalizability, we used listeners with minimal musical training. Analysis of that data raised important questions about whether more trained individuals would be more sensitive to these manipulations. This issue both complements previous research exploring trade-offs in cue weighting as a function of training, and extends inquiring to the use of complex, polyphonic stimuli frequently studied and performance around the world.

Our specific goal in these two new experiments is to compare the perceptual responses of musically trained listeners to previously collected responses of untrained listeners in an emotion perception task, building on past work using polyphonic stimuli (Castro and Lima 2014). We employ a dimensional approach to measuring emotion (Di Mauro et al. 2018; Russell 1980) in both musically trained and untrained individuals, with the goal of clarifying ongoing debate surrounding the effect of musical expertise on the decoding of emotional cues. This approach extends our previous work exploring the relationship between mode, pitch and timing (quantified as attack rate) and perceived emotion in Bach’s Well-Tempered Clavier (WTC)—a polyphonic 48-piece work balanced with respect to mode and widely performed and studied by musicians (Battcock and Schutz 2019). Using this stimulus set, we previously found timing information more important than mode—however that experiment used non-musicians, raising interesting questions about how training might alter the perceptual role of cues such as mode.

Research exploring the influence of musical training on perceived emotion often uses discrete models, where participants rate emotion on different affective adjective scales (Akkermans et al. 2018; Castro and Lima 2014; Gabrielsson and Juslin 1996). Although that method offers precision for the intended affective terms, it may exert priming effects for listeners. Unlike discrete models of emotion, the dimensional approach affords the ability to represent more variation in conveyed and perceived emotion (Eerola and Vuoskoski 2013). Thus, the ability to measure components of emotion on a fine-grained scale makes dimensional models better suited for detecting differences between trained and untrained listeners.

Specifically, our study involves comparing new data collected from trained musicians to previously collected data from ‘non musician’ participants with less than 1 year of musical training (Battcock and Schutz 2019). We assess these differences in two contexts (1) with excerpts from Bach’s WTC cut to be eight musical measures in length (2) using musically ‘resolved’ excerpts where each excerpt ends in the same nominal key as it started. The cues analyzed—attack rate (timing), mode and pitch height—represent three musical features proven to have a role in communicated musical emotion (Balkwill and Thompson 1999; Dalla Bella et al. 2001; Hevner 1935, 1937). Here, attack rate is chosen as our timing cue as it reflects both information about rhythmic structure as well as tempo. Further, we investigate the predictive weights of cues across participants with and without musical training to determine how expertise affects how listeners decode emotion in music.

Experiment 1 (eight measure excerpts)

Method

The following procedure and stimuli follow that of Battcock and Schutz (2019), the key aspects of which are summarized here. One exception was that these data were collected in two locations (sound attenuating booth as in the previous study, as well as a hotel meeting room). However testing equipment was consistent in both locations. The new studies also included the GoldSmith MSI following the presentation and responses to all 48 excerpts. Other procedure details followed Battcock and Schutz (2019) exactly, including the stimuli and numbers of participants.

Participants

To allow for the most direct comparison with our previous data, we recruited 30 participants for this experiment. Participants had \(\ge\) 6 years of formal musical training from McMaster University and attendees of the Ontario Music Educators Association’s General Assembly held in Hamilton, Ontario (25 females, ages M = 27.36, SD = 13.69, years of training M = 6.73 SD = 0.45). On average, participants scored in the 71st percentile of the overall General Sophistication score and in 79th percentile on the Musical Training subscale using the Goldsmiths Musical Sophistication Index (Gold-MSI) as based on norms reported by the Müllensiefen et al. (2013). Participants’ reported trained instruments included piano, voice, flute, guitar, violin, french horn and the drum and bass, with piano reported as the principle instrument for ~ 57% of participants. Participants either received course credit, or compensation for their participation or participated as volunteers. The experiment met ethics standards according to the McMaster University Research Ethics Board.

Musical stimuli

Our stimuli consisted of excerpts from all 48 pieces of Bach’s Well-Tempered Clavier (Book 1) as recorded by Friedrich Gulda (Bach 1973). Each excerpt contained the first eight musical measures of the pieces and featured a 2-second fade out starting at the ninth measure. Excerpts lasted 7–64 s in duration (M = 30.2 s, SD = 13.6).

Cue quantification

Pitch height information is calculated with an approach initially described by Huron et al. (2010) and later used by Poon and Schutz (2015). This involves summing duration-weighted pitch values within each measure and dividing by the sum of note durations within that measure. Attack rate calculations are based on the tempi chosen by Friedrich Gulda’s performance of the WTC—the recording used for this experiment. In addition, we re-calculated information as needed for experiment 2 (for excerpts of variable length rather than eight measures). We used attack rate rather than tempo, which is more sensitive to the combined effects of tempo and rhythmic structure. For example, Bach’s Ab Major Prelude has a higher tempo marking (108) than the Bb Major Prelude (76), yet its attack rate is considerably slower as its rhythmic structure involves fewer notes per measure (Schutz 2017). Pitch height values varied from 33.13–53.00 (M = 43.90, SD = 4.03) corresponding ~ F3 to ~ C#5, attack rate information for eight measure excerpts range 1.3–10.13 attacks per second (M = 4.91, SD = 2.18). We operationalized mode as the tonal center of the piece, as indicated by the denoted key signature of each score, coded dichotomously (0 = minor, 1 = Major).

Design and procedure

The experiment took place in two locations, the Ontario Music Educators Association (OMEA) general assembly held at the Sheraton in Hamilton, Ontario and McMaster University. Participants from the OMEA event filled out a consent form and completed the experiment in an isolated room. Following the consent form, participants from McMaster University completed the experiment in a sound-attenuating booth (IAC Acoustics, Winchester, US). For both testing locations, the experiment ran on PsychoPy (Peirce et al., 2019), a Python-based program on a 2014 MacBook Air (OS X 10.9.4). Participants heard stimuli at a consistent and comfortable listening level through Sennheiser HDA 200 headphones and provided responses using the MacBook’s trackpad.

Research assistants verbally instructed each participant to rate the perceived emotion after each excerpt using two scales: valence and arousal. The instructions explained valence as referring to how positive or negative the expressed emotion sounded, as rated on a scale from 1 (negative) to 7 (positive), arousal represented the energy of the emotion to be rated on a scale from 1 (low) to 100 (high). Participants had been encouraged to use to the full range of the scales and reminded to rate the emotion they heard and not the emotion they felt. Participants completed four practice trials before beginning the experiment, using recordings of the same album performed by Angela Hewitt (Bach 1998). Each participant listened to an individually randomized order of the 48 excerpts. Following the experiment, participants completed the Goldsmiths Musical Sophistication Index (Müllensiefen et al. 2014) and provided responses of familiarity to the musical stimuli (Appendix D).

Analyses

Regression analysis

We assessed our cues as potential predictors for mean ratings of valence and arousal using standard linear multiple regression analysis from the R Statistical Package. The Major mode is chosen as the reference level for mode, meaning the remaining level of our categorical variable (minor) is contrasted against it in the analysis. For mean ratings of valence, all three cues, attack rate, mode and pitch height emerged as significant predictors (Table 1). For mean ratings of arousal, only attack rate emerged as a significant predictor (Table 1).

Table 1 Regression model for normalized attack rate, mode, pitch height on valence and arousal ratings

Emotion and expertise: how listeners with formal music training use cues to perceive emotion

Abstract

Similar content being viewed by others

The pleasantness of sensory dissonance is mediated by musical style and expertise

Audiovisual integration in the McGurk effect is impervious to music training

Cultural familiarity and musical expertise impact the pleasantness of consonance/dissonance but not its perceived tension

Individual differences and musical training

Evidence for training’s effect on emotion perception in music

Ambiguity in our understanding of training’s effect

Present study

Experiment 1 (eight measure excerpts)

Method

Participants

Musical stimuli

Cue quantification

Design and procedure

Analyses

Regression analysis

Commonality analysis

Results

Regression analysis

Commonality analysis

Comparison to untrained listener data

Experiment 2 (musically resolved excerpts)

Method

Cue quantification

Results

Regression analysis

Commonality analysis

Comparison to non-expert data

Comparison between experiments 1 and 2

Potential effects of familiarity for listeners with music training

General discussion

Musical ‘expertise’ and perception/perceptual differences

Musical training and mode

Concluding thoughts

Notes

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Ethical approval

Informed consent

Additional information

Publisher's Note

Appendices

Appendix A

Appendix B

Appendix C

Appendix D

Rights and permissions

About this article

Cite this article

Share this article

Search

Navigation