Introduction

Anthropogenic noise is a form of human-induced rapid environmental change (Sih 2013) that imposes new selection pressure on organisms, especially on those that use sound to communicate. Indeed, many studies show that the acoustic communication of animals can be affected by noise pollution (reviewed by Brumm 2013; Morley et al. 2014; Wiley 2015; Brumm and Horn 2019), with most of the evidence coming from studies on birds (reviewed by Brumm and Zollinger 2013; Dooling and Blumenrath 2013; Slabbekoorn 2013).

It has often been observed that birds in cities sing at a higher pitch than their conspecifics at quieter rural or woodland sites (reviewed by Brumm and Zollinger 2013; Roca et al. 2016). Birds use their songs as mating signals, and thus, differences in the efficiency of signal transmission are likely to have fitness consequences (Catchpole and Slater 2008). Which spectral song components are elevated in noise-polluted areas, however, varies between species. Many studies reported increased minimum frequencies, e.g., in great tits Parus major (Slabbekoorn and den Boer-Visser 2006), house finches Carpodacus mexicanus (Bermúdez-Cuamatzin et al. 2009), silvereyes Zosterops lateralis (Potvin et al. 2011), and European robins Erithacus rubecula (Montague et al. 2013). Others found higher peak frequencies, e.g., in Eurasian blackbirds Turdus merula (Nemeth and Brumm 2009; Ripmeester et al. 2010), ash-throated flycatchers Myiarchus cinerascens (Francis et al. 2011), and black-capped chickadees Poecile atricapillus (Proppe et al. 2011). It has repeatedly been concluded that these changes are adaptations to anthropogenic low-frequency noise (e.g., Slabbekoorn and den Boer-Visser 2006; Hu and Cardoso 2010; Roca et al. 2016), but this notion is the subject of an ongoing debate (cf. Nemeth et al. 2013; Brumm and Bee 2016).

Several mechanisms have been proposed to explain how city birds could shift their song frequencies upwards, namely developmental plasticity, behavioral plasticity, and microevolutionary changes (Gil and Brumm 2014; Zollinger et al. 2017). The last of these refers to population-wide shifts in spectral song characters over several generations; that is, this explanation assumes that city environments select for higher song frequencies. Indeed, higher frequencies may be more audible in the din of urban traffic noise (Pohl et al. 2012), which is typically low in frequency, with most energy concentrated below 2 kHz (Can et al. 2010). The idea of noise-related microevolutionary shifts is a special case of the more general acoustic adaptation hypothesis, which argues that bird song features become adapted to the sound transmission characteristics of the environment (Morton 1975; Boncoraglio and Saino 2007; Phillips et al. 2020).

Behavioral plasticity refers to individual song adjustments in response to changes in the environment. A well-studied example of noise-dependent behavioral song plasticity is the Lombard effect, in which birds (just like mammals) regulate the amplitude of their vocalizations depending on environmental noise levels (reviewed by Brumm and Zollinger 2011). Similarly, it has been suspected that birds might also regulate their song pitch, i.e., that they temporarily shift their songs upwards when their song frequency is overlapped by noise (Brumm and Slabbekoorn 2005). However, the current evidence is mixed, with some studies supporting this hypothesis (Verzijden et al. 2010; Bermúdez-Cuamatzin et al. 2011; Goodwin and Podos 2013; LaZerte et al. 2016) while others do not (Grace and Anderson 2015; Potvin and MacDougall-Shackleton 2015; Zollinger et al. 2017; Rios-Chelen et al. 2018). Behavioral plasticity of song frequencies could either be a direct regulation to mitigate masking or a non-functional by-product of the birds’ attempts to sing louder in noise. In fact, the Lombard effect can be accompanied by a passive rise in vocal pitch that occurs irrespective of any release from masking (Lu and Cooke 2009). That this connection might explain the higher-pitched songs of city birds was suggested by Brumm and Naguib (2009), but recent experimental studies showed that this is not the case in either great tits or white-crowned sparrows (Zonotrichia leucophrys), which stay on pitch when singing and calling with increased amplitude in noise (Templeton et al. 2016; Derryberry et al. 2017; Zollinger et al. 2017). In non-songbirds, however, it appears that raised vocal frequencies in noise can be an epiphenomenon of the Lombard effect (Schuster et al. 2012).

Songbirds are special in that they acquire their songs through vocal learning (or, more accurately, through vocal production learning sensu Janik and Slater 2000), and because of this, their songs are thought to adjust particularly quickly to the acoustic properties of their habitat (Brumm and Naguib 2009; Rios-Chelen et al. 2012).

Song acquisition through vocal production learning usually follows a typical route in birds (Hultsch and Todt 2008). First, in the sensory phase, young birds listen to tutors and memorize their songs. Then, during the sensorimotor phase, tutees gradually match the characteristics of their vocalizations to the memorized tutor template. In a last step, the “song crystallization,” they advance to their full adult songs. If young birds during their sensory learning period are tasked with listening to tutors in an environment filled with low-frequency noise, they may therefore hear the higher frequency components better, and hence be more likely to learn those aspects of the tutor songs (Hansen 1979; Brumm 2006). Another reason why anthropogenic low-frequency noise may lead to increased song frequencies lies in the feedback loop between auditory perception and vocal production that is necessary for normal song development (Konishi 1964; Marler and Waser 1977). Birds need to hear themselves to control their vocal output, which is particularly important for juveniles to monitor their own song during the sensorimotor phase of vocal learning (Brainard and Doupe 2000). Low-frequency noise may interfere with this and so bias song development towards higher frequencies. Thus, anthropogenic noise in cities can potentially affect bird song learning in two ways, by partial masking of tutor songs and by impairing the self-assessment of tutees during vocal development, both of which could result in higher song pitches.

Intriguing as the vocal learning hypothesis may be, the small number of experiments that have tested it so far have yielded conflicting results. A study on white-crowned sparrows found that exposure to low-frequency noise during the vocal learning period led to the development of increased minimum and peak frequencies (Moseley et al. 2018). However, a similar effect could not be observed in either great tits (Zollinger et al. 2017) or zebra finches (Taeniopygia guttata) (Potvin et al. 2016). When evaluating this mixed evidence, one must consider that none of these previous studies used realistic urban noise but rather exposed birds to synthesized noise. While synthesized noises can be very useful to reliably mask certain frequencies, they may have little ecological relevance. Real urban noise typically fluctuates heavily in amplitude, whereas synthetic noises used in song learning experiments were constant (with the exception of one of two experiments by Potvin et al. (2016), which used traffic noise recordings mixed with various sound files downloaded from the internet).

Here we report findings from two separate song learning experiments with zebra finches that both used natural noise recordings from urban habitats, broadcast at realistic sound levels. While, to our knowledge, it has never been shown that free-ranging zebra finches adjust their song frequencies in anthropogenic noise, zebra finches are the main model for song learning (Tchernichovski et al. 2001; Brainard and Doupe 2002; Hultsch and Todt 2008) and their vocal ontogeny has been studied in detail under various developmental conditions (Zann and Cash 2008; Brumm et al. 2009; Boogert et al. 2018). Therefore, the species is most useful for studies addressing basic properties of auditory learning processes in birds, such as our experiments. If the higher pitch of city birds is the outcome of interference from urban noise during vocal learning, we would expect that birds would develop songs with higher frequencies when they are exposed to noise during their sensitive learning period. This effect is expected to be strongest in the minimum song frequencies because these are masked most heavily by typical low-frequency urban noise.

Material and methods

Urban noise recordings

Urban noise was recorded during April 2013 in the city of Munich, Germany in several bird habitats close to busy roads. The recordings were made with a sampling rate of 44.1 kHz using a solid-state recorder (Marantz PMD-660) with an omnidirectional microphone (Sennheiser ME 62). Fifty-two 5-min recordings were made along four- or six-lane roads. Thus, the recorded urban noise was mainly street traffic, but single sound files also contained pedestrian and bicyclist noises or airplanes flying overhead. As is typical for urban traffic noise, the main sound energy was concentrated at frequencies below 2 kHz (Fig. 1a) and the sound pressure level of the recordings varied considerably within the 5-min files (Fig. 1b). In the set of 52 traffic noise recordings, the average amplitude variation was 16.4 dB (range 13.1–20.3 dB).

Fig. 1 Characteristics of the experimental noise. (a) Averaged power spectrum of all 52 urban noise recordings used in experiments 1 and 2 (Hamming window, 8192-point FFT). (b) Amplitude curve of one exemplary 5-min noise recording that was used in both experiments. In this example, the maximum fluctuation of sound pressure levels was 16 dB (RMS, averaging time: 125 ms), which is the average value of maximum amplitude fluctuation across all 52 noise recordings that were used in experiments 1 and 2
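The amplitude fluctuation reported for the noise files can be illustrated with a minimal sketch. It assumes that fluctuation is taken as the difference between the highest and lowest RMS level across consecutive 125-ms windows; the function names and the synthetic test signal are hypothetical and do not reproduce the software used for the original measurements. Levels are expressed relative to digital full scale rather than as calibrated SPL, which does not affect a difference between two levels.

```python
import math

def rms_level_db(samples):
    """RMS level of a block of samples in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    # Guard against log10(0) for digitally silent blocks.
    return 10.0 * math.log10(max(mean_square, 1e-12))

def amplitude_fluctuation_db(samples, sample_rate=44100, window_s=0.125):
    """Difference between the loudest and quietest RMS window, in dB."""
    window = int(sample_rate * window_s)
    levels = [
        rms_level_db(samples[i:i + window])
        for i in range(0, len(samples) - window + 1, window)
    ]
    return max(levels) - min(levels)

# Synthetic illustration (not study data): a 1 kHz tone whose
# amplitude drops tenfold after one second.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(2 * rate)]
signal = [s if n < rate else 0.1 * s for n, s in enumerate(tone)]
fluctuation = amplitude_fluctuation_db(signal, sample_rate=rate)
print(round(fluctuation, 1))
```

A tenfold drop in amplitude corresponds to a 20 dB drop in level, so the synthetic signal gives a fluctuation of about 20 dB, the same order of magnitude as the values reported for the traffic recordings.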

Song learning experiments

Adult male zebra finches typically sing a short song motif of 0.5- to 1.5-s duration that is repeated several times to form a singing bout (Zann 1996). It has been shown that the sensitive period for song memorization in this species usually starts after 25 days post-hatch (dph) and that no memorization occurs before at least 17 dph (Braaten 2010). Adult song normally crystallizes at about 90 dph, and after that, it remains largely unchanged for the rest of the bird’s life. We used two experimental approaches to investigate the effect of chronic urban noise on the development of adult song frequencies. In experiment 1, we song-tutored young zebra finches individually with passive playback and controlled their auditory input during their entire song ontogeny. In experiment 2, young zebra finches learned their songs in a colony setting with several potential live tutors, which is less controlled but more similar to the natural situation. Learning from live tutors usually results in more accurate copies than passive tutoring with song playback (Derégnaucourt et al. 2013). All zebra finches used in the experiments were bred from 1- to 3-year-old adult birds of mixed ancestry from the colonies at the Max Planck Institute for Ornithology in Seewiesen, Germany.

Experiment 1: Individual song tutoring in sound boxes

We allowed pairs of adult zebra finches to breed in cages (42 × 120 cm and 42 cm high) in custom-made sound-shielded boxes (inner dimensions: 55 × 149 cm and 61 cm high). The boxes were equipped with a ventilation system, lighting set on a 12:12 light:dark cycle (lights on from 0700 to 1900 hours), and two loudspeakers (Kenwood KFC-1789ie), one mounted in each of the short sides of the sound box and facing the center of the cage. Each cage was provided with two perches, a nest box and ad libitum nesting materials, seeds, commercial finch egg food, oyster shell grit, cuttlebone, and water. In addition, birds were offered fresh vegetables and hard-boiled eggs twice a week throughout the experimental period.

At 8 dph, all young were genetically sexed and then swapped between pairs so that all experimental nests contained two male chicks and one or two female chicks. On 17 dph, the male partner of each pair was removed from the cage to prevent the young from learning his song. Instead, all young were tutored daily from 18 to 100 dph by playback of a particular zebra finch song type that had yielded high learning success in previous experiments. As playback files, we used three recordings (sample rate: 44.1 kHz) from the same male, one with five motifs, one with six, and one with eight. Six times per day (at 0800, 0900, 1100, 1500, 1600, and 1800 hours), 42 motifs were played back (four times the 5-motif file, once the 6-motif file, and twice the 8-motif file in random order). This playback procedure was run by a MATLAB script on a computer whose audio output fed into an Apart Champ 4 amplifier that was connected to the loudspeakers in the sound box. The tutor files were played back through only one of the two loudspeakers in the sound box at a time, alternating randomly between the left and the right speaker. The playback volume of the tutor files was set to a peak amplitude of 75 dB(A) SPL (re. 20 μPa), measured at the position of the perch closest to the loudspeaker. This playback amplitude corresponds to an adult male singing at a distance of about 50 cm (Brumm 2009; Ritschard and Brumm 2011).
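The tutoring schedule described above can be sketched as follows. This is only an illustration of the stated file counts and session times, not the MATLAB script used in the experiment; the file names are hypothetical, and choosing the loudspeaker independently for each file is an assumption, since the text does not state at which granularity the speakers alternated.

```python
import random

# Motifs per tutor file and plays per session, as stated in the text.
# The file names themselves are hypothetical placeholders.
FILE_MOTIFS = {"tutor_5_motifs.wav": 5, "tutor_6_motifs.wav": 6, "tutor_8_motifs.wav": 8}
PLAYS_PER_SESSION = {"tutor_5_motifs.wav": 4, "tutor_6_motifs.wav": 1, "tutor_8_motifs.wav": 2}
SESSION_TIMES = ["0800", "0900", "1100", "1500", "1600", "1800"]

def build_session(rng):
    """Return one session as a shuffled list of (file, speaker) pairs."""
    playlist = [name for name, count in PLAYS_PER_SESSION.items() for _ in range(count)]
    rng.shuffle(playlist)
    # Assumption: the speaker is drawn at random for every file.
    return [(name, rng.choice(["left", "right"])) for name in playlist]

def motifs_in(session):
    """Total number of motifs broadcast in one session."""
    return sum(FILE_MOTIFS[name] for name, _speaker in session)

rng = random.Random(1)  # fixed seed only to make the sketch repeatable
daily_schedule = {time: build_session(rng) for time in SESSION_TIMES}
motifs_per_day = sum(motifs_in(s) for s in daily_schedule.values())
print(motifs_per_day)  # 6 sessions x 42 motifs = 252
```

Each session contains 4 × 5 + 1 × 6 + 2 × 8 = 42 motifs, so a tutee was exposed to 252 tutor motifs per day under this schedule.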

In the NOISE treatment, the 52 5-min files with urban noise were played back constantly during the daylight hours in randomized order from 18 to 120 dph, using the same setup as for the tutor playback, with the exception that the noise was broadcast through both loudspeakers. The noise level ranged between 60 and 80 dB(A) SPL (at the position of the two perches) during the day. Nighttime playback (1900 to 0700 hours) consisted of randomized playbacks of an additional 40 noise recordings, which were less dense in the rate of passing traffic than the daytime recordings and were reduced in peak amplitudes, with playback levels ranging between 50 and 70 dB(A) SPL. In the CONTROL treatment, only the tutor songs were played back and no urban noise files. The noise level inside the silent sound-shielded boxes (i.e., when no playback was broadcast and excluding bird noises) ranged between 31 and 33 dB(A) SPL and was mainly due to the ventilation system. In total, we analyzed the songs of 21 males, eleven from the CONTROL group and ten from the NOISE treatment.

Experiment 2: Learning from live birds in a colony setting

In the second experiment, young birds were bred and raised by their parents in aviaries, in which colonies of eight pairs bred at the same time. Thus, the young could potentially learn their songs from several adult tutors, their own fathers, and the other seven males in the aviary. The young were part of previous studies on the effects of traffic noise on physiology and reproductive success (for details of the husbandry and experimental setup, see Dorado-Correa et al. (2018) and Zollinger et al. (2019)). In brief, the aviaries in the NOISE treatment were exposed to the same urban traffic noise recordings used in experiment 1 combined with 28 additional traffic noise files that were recorded in the same way as the other 52 but along larger roads (six- to eight-lane motorways) in the city of Munich. From 18 to 120 dph, the noise files were played back continuously, in randomized order, with peak levels ranging between 65 and 85 dB(A) SPL during the day. The mean amplitude fluctuation across the 80 urban noise files of the combined set was 15.1 dB (range: 9.8–20.3 dB). During the night, the same set of noise files was used as in experiment 1, broadcast with peak levels between 45 and 75 dB(A) SPL. According to published noise maps (Bayerisches Landesamt 2012), the noise treatment in experiment 1 mimicked average noise levels at busy streets in the city of Munich, whereas in experiment 2 it mimicked average peak levels.

In the CONTROL aviaries of experiment 2, the ambient noise (excluding bird sounds) varied between 40 and 48 dB(A) SPL, depending on the cycling of the ventilation system. After 120 dph, the offspring were transferred to a single-sex aviary (4.9 m × 2.6 m and 2.7 m high) where they were kept until song recording. In total, we analyzed the songs of 20 males, nine from the CONTROL group and eleven from the NOISE treatment. Each bird analyzed was from a different nest and no more than two birds came from the same aviary.

Song recording

The song of each male was digitally recorded in the sound boxes described above (with the noise switched off) after the tutees had reached maturity and crystallized their adult song type. Recordings were made with a sample rate of 44.1 kHz and 16-bit accuracy using a Monacor ECM-3005 microphone mounted vertically above the cage, whose signal was fed through a Pro Audio PR8E preamplifier into an external Edirol UA-101 soundcard that was connected to a computer running the acoustic recording and analysis software Sound Analysis Pro 2011 (Tchernichovski et al. 2000). The males from experiment 1 were recorded at an age of 120 days post-hatch and those of experiment 2 at 1609–1988 days post-hatch. To stimulate singing, the males were kept in the cage together with a female.

Song analysis

We randomly selected ten song motifs from each male, in which we measured peak frequency (i.e., the frequency of maximum amplitude), mean frequency (i.e., the 50th percentile of the spectral energy distribution), minimum and maximum frequency (at a threshold of 20 dB below the amplitude of the peak frequency), and the partial energy below 2 kHz (i.e., the proportion of song energy in the most heavily masked frequency band). All measurements were done automatically in power spectra (350-Hz high pass filter, Hamming window, bandwidth 1.7 Hz, resolution 0.7 Hz) using the acoustic analysis software Avisoft SASLab Pro (v. 5.2.14). Although the frequency measurements were done automatically by the software, we used a blinded protocol, in which the person conducting the measurements (YL) was not informed about the treatment of the experimental birds. Only after all acoustic measurements were completed were the treatments (NOISE or CONTROL) revealed to the analyst to conduct statistical tests. This was done to exclude any potential observer-expectancy bias (Traniello and Bakker 2015).
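The five spectral parameters defined above can be made concrete with a short sketch that operates on a power spectrum. This illustrates the definitions only and is not the algorithm implemented in Avisoft SASLab Pro; the function name, the dictionary keys, and the toy spectrum are hypothetical, and the 350-Hz high-pass filter and windowing steps are omitted.

```python
def song_spectral_measures(freqs_hz, power, threshold_db=-20.0, band_limit_hz=2000.0):
    """Summarize a power spectrum given as parallel lists of bin
    frequencies (Hz) and linear power values."""
    total = sum(power)
    peak_index = max(range(len(power)), key=lambda i: power[i])

    # Mean frequency: first bin at which cumulative energy reaches 50% of the total.
    cumulative = 0.0
    mean_freq = freqs_hz[-1]
    for f, p in zip(freqs_hz, power):
        cumulative += p
        if cumulative >= 0.5 * total:
            mean_freq = f
            break

    # Minimum and maximum frequency: outermost bins whose power lies
    # within threshold_db of the power at the peak frequency.
    floor = power[peak_index] * 10.0 ** (threshold_db / 10.0)
    above = [f for f, p in zip(freqs_hz, power) if p >= floor]

    # Partial energy in the band below band_limit_hz, as a percentage.
    low_band = sum(p for f, p in zip(freqs_hz, power) if f < band_limit_hz)
    return {
        "peak_hz": freqs_hz[peak_index],
        "mean_hz": mean_freq,
        "min_hz": min(above),
        "max_hz": max(above),
        "percent_below_band_limit": 100.0 * low_band / total,
    }

# Toy spectrum with 1-kHz bins (made-up values, not data from the study).
freqs = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 6000.0, 7000.0]
power = [0.5, 2.0, 20.0, 100.0, 30.0, 5.0, 0.2]
measures = song_spectral_measures(freqs, power)
print(measures)
```

In the toy spectrum, the bins at 1 and 7 kHz fall more than 20 dB below the peak and are therefore excluded from the minimum and maximum frequency, although they still count toward the total energy.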

Statistical analyses

We examined differences in peak frequency, mean frequency, maximum frequency, minimum frequency, and energy below 2 kHz with separate linear mixed models (LMM) in R (v. 3.6.1) using the package lme4 (Bates et al. 2015). In each model, the noise treatment was fitted as a fixed factor (CONTROL versus NOISE) and the individual as a random factor to account for the repeated measures design. We assessed the effect of noise exposure on song parameters by comparing models including treatment to null models using likelihood ratio tests with one degree of freedom.
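The model comparison step can be sketched in a few lines. The example assumes that the log-likelihoods of the null model and the model including treatment have already been obtained from maximum-likelihood fits (for instance from lme4); the function name and the numerical values are hypothetical and are not results from this study. It uses the fact that, for one degree of freedom, the chi-square survival function equals erfc(sqrt(x / 2)).

```python
import math

def likelihood_ratio_test_1df(loglik_null, loglik_full):
    """Chi-square statistic and p-value for nested models that differ
    by one parameter."""
    statistic = 2.0 * (loglik_full - loglik_null)
    statistic = max(statistic, 0.0)  # numerical noise can make it slightly negative
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical log-likelihoods, chosen only to show the calculation.
statistic, p_value = likelihood_ratio_test_1df(loglik_null=-1502.31, loglik_full=-1501.87)
print(round(statistic, 2), round(p_value, 3))
```

A small statistic, as in this made-up example, means that adding the treatment term barely improves the fit, which corresponds to a non-significant treatment effect.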

Results

The adult songs developed by the birds in experiment 1 differed noticeably in their spectral parameters, with average peak frequencies ranging between individuals from 3.4 to 5.3 kHz, maximum frequencies between 5.3 and 9.1 kHz, mean frequencies between 3.8 and 5.3 kHz, minimum frequencies between 0.4 and 1.6 kHz, and the proportion of sound energy below 2 kHz from 4.1 to 13.7%. The birds in experiment 2 also varied noticeably in their average song frequencies, with peak frequencies ranging between individuals from 3.4 to 4.9 kHz, maximum frequencies between 5.6 and 8.0 kHz, mean frequencies between 4.0 and 4.7 kHz, minimum frequencies between 0.4 and 1.8 kHz, and the proportion of sound energy below 2 kHz from 3.7 to 15.4%. However, the variation in spectral song parameters was not related to the experimental treatment (Fig. 2). In particular, we found no meaningful difference in minimum song frequencies between the NOISE and the CONTROL birds in the individually tutored males from experiment 1 (Table 1). Likewise, the noise-exposed birds from experiment 1 did not differ significantly from the controls in any of the other measured frequency parameters (Table 1). The same result was obtained in the birds from experiment 2, which learned their song in the colony setting: neither minimum frequency nor any of the other song parameters differed significantly between the NOISE and CONTROL groups (Table 1).

Fig. 2 The mean (± SD) spectral song characteristics of adult zebra finches that were exposed to chronic traffic noise during their sensitive vocal learning period (Noise) and control birds that were not exposed to traffic noise (Control). Experiment 1: individual song tutoring (N_Noise = 10 birds, N_Control = 11 birds); experiment 2: song learning in colony settings (N_Noise = 11 birds, N_Control = 9 birds). (a) Peak frequency, (b) maximum frequency, (c) mean frequency, (d) minimum frequency, (e) partial energy below 2 kHz

Table 1 Outcome of linear mixed effects models testing the effects of noise exposure during the vocal learning period on adult song parameters in zebra finches. Experiment 1: individual song tutoring (N_Noise = 10 birds, N_Control = 11 birds); experiment 2: song learning in colony settings (N_Noise = 11 birds, N_Control = 9 birds)

Discussion

We found that experimental exposure to realistic traffic noise during the sensorimotor learning period did not lead to increased song frequencies or any other systematic spectral differences in adult zebra finch song. It is known that there is substantial between-species variation in whether or how birds alter their song frequencies in response to urban noise (reviewed by Brumm and Zollinger 2013), and our results suggest that there is probably a similar degree of between-species variation in the mechanisms underlying observed changes.

Our results corroborate earlier studies on zebra finches and great tits, in which exposure to synthetic low-frequency noise during the sensorimotor period also did not give rise to increased adult song frequencies (Potvin et al. 2016; Zollinger et al. 2017). On the other hand, our data contrast with the results of Moseley et al. (2018), who found that artificial noise with the spectral shape of traffic noise did lead to the development of songs with increased peak frequencies in white-crowned sparrows. How can these divergent findings be explained? One possibility might be that these differences reflect species differences in the sensitivity of the auditory system to noise. The auditory thresholds of zebra finches are at least 15 dB higher than those of American sparrows; however, zebra finches have critical ratios (a measure of signal detection in masking noise) similar to those of American sparrows in the frequency bands across which both communicate (Okanoya and Dooling 1987). This suggests that both would be equally prone to auditory masking by traffic noise. A second potential explanation is that species differ in their responses to elevated background noise levels. Zebra finches differ from white-crowned sparrows and great tits in that they are not territorial, and thus, their song is more of a short-range signal. Hence, zebra finch song might be more robust in the face of noise because of the shorter transmission distance required. However, this cannot explain why noise exposure also did not lead to increased song frequencies in great tits (Zollinger et al. 2017), which are typical territorial birds that use their songs for long-range communication (Snijders et al. 2017).

The observed species differences could come, for example, from an underlying species-specific difference in the magnitude of ontogenetic plasticity. It could be that some species are unable to increase their song frequencies because of vocal production constraints, or because of stronger innate template rules in the brain that determine species-specific song. However, we are not aware of any published evidence that would suggest such a constraint. Another possible reason for the deviating results lies in the difference between the experimental noises. While the artificial noise in the experiment by Moseley et al. (2018) was constant in amplitude, we used real urban noise that fluctuated markedly. Urban noise typically varies in sound level, depending, for instance, on changes in street traffic density or vehicle type, so that periods of high-amplitude noise alternate with periods of comparably low amplitudes. In this context, it is important to note that as few as 20 song presentations are sufficient for young birds to successfully copy tutor song patterns (Hultsch and Todt 1989; Peters et al. 1992; Hultsch 1993). Therefore, if only a few renditions of a tutor song coincide with periods of comparably low noise levels in urban habitats, that model song can still be properly memorized by young birds even in noise that reaches high peak values at other times. In this regard, our results show that typical traffic noise in cities is not sufficient to interfere with vocal learning in such a way that birds such as the zebra finch develop higher pitched songs. This means that the lower frequency end of tutor songs is not sufficiently masked by fluctuating traffic noise to prevent copying by tutees, and that the auditory feedback loop necessary for normal song development is not considerably impaired by the noise. Therefore, we would maintain that processes other than vocal learning constraints lead to elevated song frequencies in city birds.

Before the question of which mechanism underlies urban song divergence can be resolved, it will be informative to have a closer look at the nature of the divergence itself. Although the story of city birds singing at higher minimum frequencies has now become a textbook example of behavioral responses to anthropogenic change (Morton 2017; Webster and Podos 2018), the phenomenon is less widely distributed than often assumed. There are two reasons for this: first, it is often overlooked that in the majority of studied bird species no increase of minimum frequencies in noisy areas was found (reviewed by Brumm and Zollinger 2013), and second, many studies that did report elevated minimum song frequencies in noise are flawed by measurement artifacts (cf. Zollinger et al. 2012; Brumm et al. 2017; Rios-Chelen et al. 2017). But even in the valid cases, the question remains whether minimum frequencies actually have relevance for bird communication (Nemeth and Brumm 2010). Considering that minimum song frequencies measured at a threshold of −20 dB from the peak frequency contain less than 1% of the total signal energy, their contribution in terms of signal transmission is probably negligible. On the other hand, the reported changes in more biologically meaningful song parameters, such as peak frequency, indicate that there is indeed a case to be made for urban song divergence. Examples are white-crowned sparrows (Luther and Derryberry 2012) and Eurasian blackbirds (Nemeth and Brumm 2009; Ripmeester et al. 2010), in which city populations show an increase in peak frequency of about 200–500 Hz. However, our results indicate that selective learning of high frequencies in urban noise cannot generally explain this pattern.

In conclusion, our findings do not support the hypothesis that developmental plasticity in response to typical urban noise leads to an increase of song frequencies. Of course, more studies using realistic traffic noise are needed before one can draw more general conclusions, but our study suggests that vocal learning is, at least, not a common mechanism used by songbirds to adjust their songs to urban noise. A similar picture is also emerging for the notion of behavioral plasticity, which has been tested more often than the vocal learning hypothesis, but the current evidence is still inconsistent. The idea of microevolutionary changes remains unchallenged to date, but—being a historical hypothesis—this one is also the most difficult to test and continues to be largely unaddressed. Given that the increases in song pitch that are typically observed in city populations are not very effective in reducing masking by traffic noise (Nemeth and Brumm 2010), we propose that environmental selection in cities probably favors birds that use high frequencies because they can produce these at particularly high amplitudes (Nemeth et al. 2013). In addition to environmental selection, the subsequent spread of these songs in the population will, then, be accelerated by cultural transmission.