Humans are unusual in the animal kingdom because we must hear other humans speak in order to learn to speak ourselves. In contrast, most species that vocalize produce innate, stereotyped vocalizations even when they are raised in isolation. A small handful of animal groups have been shown to learn their vocalizations the way humans do, including songbirds, parrots, hummingbirds, bats, cetaceans, and elephants. Because none of these vocal learners are particularly closely related to humans and many are not easy to study, vocal-learning birds have been studied the most. Studies have shown that although birds and humans are separated by hundreds of millions of years of evolution, they show striking parallels in the mechanisms and behavioral manifestations of their vocal learning abilities (see Jarvis, 2007, for review).

So far, the direct evidence for vocal learning in other species has come from highly artificial laboratory situations (e.g., comparing birds raised in acoustic isolation to birds raised with “tutor” vocalizations broadcast through a speaker). Evidence from more naturalistic settings has been much more indirect (e.g., based on regional dialects in vocalizations). Thus, although it was clear that songbirds are capable of vocal learning, it was not clear how much they use this ability under natural conditions.

Mennill et al. (2018) show direct evidence of vocal learning in the wild for the first time. The authors took songs of Savannah sparrows (Passerculus sandwichensis) from one population, artificially manipulated the songs, and then played them back to a geographically separated focal population. The artificial songs contained many distinct elements that did not exist in the focal population’s vocalizations. Then, by recording and analyzing the songs produced by the focal population, the authors were able to show that the focal population birds learned to produce the experimentally played-back songs.

The timing of the playback appeared to be highly relevant for learning. The authors presented some songs only in spring, some only in summer, and some during both seasons. Repeated exposure across both seasons appeared to be critical: songs presented in both spring and summer were performed by many more birds than artificial songs presented in spring only or summer only. Savannah sparrows are close-ended learners, learning to produce song only during their first year and then using that same song throughout life. Open-ended learners, in contrast, learn and adapt their songs even into adulthood. Now that there is evidence for close-ended vocal learning in the wild, it would be interesting to study an open-ended learner to see whether there are similar timing constraints.

Even just in the Savannah sparrow, there is a plethora of questions that can now be asked. For example, what are the minimum and maximum numbers of repetitions required for wild birds to choose an artificial song over a natural one? How far can the artificial songs deviate from the natural ones before they are no longer copied? Can this limit in deviation be extended over several generations (where the deviations continue to get larger)? The questions are endless. As a further example, the authors included many vocal elements in their artificial songs that were also used in the focal region. If they had used only novel elements, would the birds have recognized the artificial songs as being conspecific?

Podos (2018) expressed surprise that Mennill et al. were able to train wild birds with songs from loudspeakers at all. Podos’ surprise surprised me. In the wild, most of the time birds do not see each other but only hear each other. A function of acoustic, rather than visual, signals is to make communication possible between animals that cannot see each other. Sounds in the vocal range of birds can be reproduced accurately even with a small loudspeaker, and unless the bird investigates the sound source, it may be impossible for the bird to tell the difference.

Podos (2018) pointed to past laboratory studies showing that, given the choice between live tutors and acoustic-only tutors, birds prefer live tutors even if the live tutors are the wrong species. Given that the wild Savannah sparrows had many live tutors to rely on, they could easily have ignored the loudspeaker songs. However, the laboratory birds Podos mentions were often tested under very impoverished conditions. For example, when provided with visual and acoustic access to heterospecific birds but only acoustic access to conspecific birds, the white-crowned sparrow (Zonotrichia leucophrys) preferred learning heterospecific over conspecific song (Baptista & Petrinovich, 1984). But because these birds were hand raised in the laboratory, they did not have any experience identifying adult members of their species. Perhaps having access to live tutors allows birds to work out which features of vocalizations can be used to identify conspecifics, and only once this is established will birds use other vocalizations in the acoustic environment. In other words, it is possible that a minimal amount of live interaction with conspecifics is necessary in order to be able to identify conspecific vocalizations. This seems likely, because innate conspecific recognition would potentially put too much constraint on vocal variability through cultural evolution. In contrast to the laboratory birds, the wild birds have plenty of experience with their parents to learn what conspecifics sound like before they begin song learning themselves.

In fact, I expected the opposite of the conclusions of Podos (2018): that artificial songs might be preferred to natural ones. Firstly, the artificial songs were played about as often as the most frequent vocalizers in the species sing. In many species, more frequent singing is associated with greater dominance because only birds with energy to spare can afford to sing frequently despite the increased risk of predation. Secondly, the recorded songs were identical each time they were played, unlike naturally occurring songs, which can be less consistent. In many species, consistency is also a sign of dominance because high-quality males are able to perform songs more accurately. Finally, Mennill et al. (2018) mentioned that counter-singing is very important in this species. Counter-singing occurs when a male overlaps his song with that of another vocalizing male to express dominance. Because the recorded songs were played at arbitrary times, they may by chance have overlapped with other males' songs, which could also have influenced their perceived dominance. If repetition, consistency, or acoustic overlap contributed to the perceived dominance of the artificial songs, these songs may also have been perceived as particularly attractive by females. Mennill et al. pointed out that the males that produced artificial songs were just as likely to attract females as those that copied the natural songs. This suggests that the females of this species develop tastes similar to those of the males, perhaps also based on their acoustic experience. However, it would be interesting to study the development of female preference more directly.

In the end, both Podos (2018) and I were correct: there are likely both acoustic and non-acoustic factors at play here. The birds did not copy the artificial songs as much as would be expected by repetition alone. This may have been because these songs were “disembodied,” as Podos says, or it could also have been because they were a bit too deviant from the current norms in the focal population. Using songs that are more similar to normal songs within the population could control for this, but the reason Mennill et al. (2018) used deviant songs was so that it would be easy to identify songs that were copied from the loudspeakers. They analyzed the songs manually, but given current technological advances it may soon be possible to do much more fine-grained analysis using automated techniques.

Overall, Mennill et al.’s (2018) article opens the door to many questions of what leads to changes in acoustic performance of a species over generations. There are interesting parallels to the human literature as well. Research with humans suggests that those with the highest social status are most likely to lead linguistic sound change (Milroy & Milroy, 1985). Perhaps further investigation of vocal learning across generations of wild birds could also contribute to our understanding of how changes in word fads and dialects occur in humans.