Transflower: probabilistic autoregressive dance generation with multimodal attention
arXiv - CS - Graphics. Pub Date: 2021-06-25. DOI: arxiv-2106.13871
Authors: Guillermo Valle-Pérez, Gustav Eje Henter, Jonas Beskow, André Holzapfel, Pierre-Yves Oudeyer, Simon Alexanderson
Dance requires skillful composition of complex movements that follow
rhythmic, tonal and timbral features of music. Formally, generating dance
conditioned on a piece of music can be expressed as a problem of modelling a
high-dimensional continuous motion signal, conditioned on an audio signal. In
this work we make two contributions to tackle this problem. First, we present a
novel probabilistic autoregressive architecture that models the distribution
over future poses with a normalizing flow conditioned on previous poses as well
as music context, using a multimodal transformer encoder. Second, we introduce
the largest 3D dance-motion dataset to date, obtained with a variety of
motion-capture technologies and including both professional and casual
dancers. Using this dataset, we compare our new model against two baselines,
via objective metrics and a user study, and show that both the ability to model
a probability distribution and the ability to attend over a large motion
and music context are necessary to produce interesting, diverse, and realistic
dance that matches the music.
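
The first contribution can be illustrated with a toy example. The sketch below is not the authors' implementation. It assumes a single affine flow layer, and a linear conditioner over concatenated context stands in for the multimodal transformer encoder. The names `POSE_DIM`, `MUSIC_DIM`, `CONTEXT`, `conditioner`, and `generate` are hypothetical, and the weights are random rather than trained. It shows only the general pattern: each new pose is sampled from a flow whose parameters depend on previous poses and a window of music features.

```python
import math
import random

# All sizes below are hypothetical, chosen small for illustration.
POSE_DIM = 3    # values per pose (real skeletons have far more)
MUSIC_DIM = 2   # audio features per frame
CONTEXT = 4     # number of past poses / music frames used as context


def make_params(rng):
    """Random weights for a toy linear conditioner (not a trained model)."""
    in_dim = CONTEXT * (POSE_DIM + MUSIC_DIM)
    w_mu = [[rng.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(POSE_DIM)]
    w_ls = [[rng.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(POSE_DIM)]
    return w_mu, w_ls


def conditioner(params, past_poses, music_window):
    """Map context (poses + music) to a shift and a bounded log-scale."""
    w_mu, w_ls = params
    ctx = [v for frame in past_poses for v in frame]
    ctx += [v for frame in music_window for v in frame]
    mu = [sum(w * c for w, c in zip(row, ctx)) for row in w_mu]
    log_s = [math.tanh(sum(w * c for w, c in zip(row, ctx))) for row in w_ls]
    return mu, log_s


def flow_forward(z, mu, log_s):
    """Affine flow: latent sample z -> pose x."""
    return [m + math.exp(s) * zi for zi, m, s in zip(z, mu, log_s)]


def flow_inverse(x, mu, log_s):
    """Inverse affine flow: pose x -> latent z."""
    return [(xi - m) * math.exp(-s) for xi, m, s in zip(x, mu, log_s)]


def log_prob(x, mu, log_s):
    """Exact log-density of x via the change-of-variables formula."""
    z = flow_inverse(x, mu, log_s)
    base = sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)
    return base - sum(log_s)


def generate(params, seed_poses, music, rng):
    """Autoregressively sample one pose per music frame after the seed."""
    poses = list(seed_poses)
    for t in range(CONTEXT, len(music)):
        mu, log_s = conditioner(params, poses[-CONTEXT:], music[t - CONTEXT:t])
        z = [rng.gauss(0, 1) for _ in range(POSE_DIM)]
        poses.append(flow_forward(z, mu, log_s))
    return poses
```

Because each step draws a fresh latent sample, running `generate` twice on the same music gives different motion. A deterministic regressor cannot do this.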
Updated: 2021-06-29