Learning the sound inventory of a complex vocal skill via an intrinsic reward
Science Advances (IF 13.6), Pub Date: 2024-03-27, DOI: 10.1126/sciadv.adj3824
Hazem Toutounji, Anja T. Zai, Ofer Tchernichovski, Richard H. R. Hahnloser, Dina Lipkind

Reinforcement learning (RL) is thought to underlie the acquisition of vocal skills like birdsong and speech, where sounding like one’s “tutor” is rewarding. However, it is unclear which RL strategy generates the rich sound inventories of song or speech. We find that the standard actor-critic model of birdsong learning fails to explain juvenile zebra finches’ efficient learning of multiple syllables. When we instead replace the single actor with multiple independent actors that jointly maximize a common intrinsic reward, birds’ empirical learning trajectories are accurately reproduced. The influence of each actor (syllable) on the magnitude of the global reward is competitively determined by its acoustic similarity to the target syllables. This leads each actor to match the target it is closest to and, occasionally, to the competitive exclusion of an actor from the learning process (i.e., from the learned song). We propose that a competitive-cooperative multi-actor RL (MARL) algorithm is key to the efficient learning of the action inventory of a complex skill.
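The competitive-cooperative scheme in the abstract can be sketched in a few lines. This is not the authors’ implementation: the Gaussian similarity measure, the softmax-style competitive weighting, the 2-D “acoustic feature” space, and the hill-climbing update standing in for the actors’ RL rule are all illustrative assumptions. It only shows the core idea that each actor’s contribution to a single shared reward is weighted by how close it is to each target, so actors specialize on their nearest targets.

```python
import numpy as np

rng = np.random.default_rng(0)

def similarity(x, y):
    # Gaussian acoustic similarity between a produced syllable and a
    # target syllable (illustrative choice, not the paper's measure).
    return np.exp(-np.sum((x - y) ** 2))

def global_reward(actors, targets):
    # Each actor's influence on the shared reward is competitively
    # weighted by its relative similarity to every target, so the
    # closest target dominates that actor's contribution.
    total = 0.0
    for a in actors:
        sims = np.array([similarity(a, t) for t in targets])
        weights = sims / sims.sum()   # competitive assignment
        total += weights @ sims       # cooperative (summed) reward
    return total

# Two target syllables in a 2-D feature space; two actors start near the origin.
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
actors = [rng.normal(0.0, 0.3, 2), rng.normal(0.0, 0.3, 2)]

# Simple hill climbing on the common intrinsic reward: perturb one
# actor at a time and keep the change if the shared reward does not drop.
sigma = 0.05
for step in range(2000):
    i = rng.integers(len(actors))
    trial = [a.copy() for a in actors]
    trial[i] = trial[i] + rng.normal(0.0, sigma, 2)
    if global_reward(trial, targets) >= global_reward(actors, targets):
        actors = trial
```

After the loop, each actor has drifted toward whichever target it is acoustically closest to, even though all actors optimize the same scalar reward; the competitive weighting, not any per-actor objective, produces the specialization.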

Updated: 2024-03-28