Exploring biological motion perception in two-stream convolutional neural networks
Vision Research (IF 1.8), Pub Date: 2020-10-19, DOI: 10.1016/j.visres.2020.09.005
Yujia Peng 1 , Hannah Lee 1 , Tianmin Shu 2 , Hongjing Lu 3

Visual recognition of biological motion recruits form and motion processes supported by both dorsal and ventral pathways. This neural architecture inspired the two-stream convolutional neural network (CNN) model, which includes a spatial CNN to process appearance information in a sequence of image frames, a temporal CNN to process optical flow information, and a fusion network to integrate the features extracted by the two CNNs and make final decisions about action recognition. In five simulations, we compared the CNN model's performance with classical findings in biological motion perception. The CNNs trained with raw RGB action videos showed weak performance in recognizing point-light actions. Additional transfer training with actions shown in other display formats (e.g., skeletal) was necessary for CNNs to recognize point-light actions. The CNN models exhibited largely viewpoint-dependent recognition of actions, with a limited ability to generalize to viewpoints close to the training views. The CNNs predicted the inversion effect in the presence of global body configuration, but failed to predict the inversion effect driven solely by local motion signals. The CNNs provided a qualitative account of some behavioral results observed in human biological motion perception for fine discrimination tasks with noisy inputs, such as point-light actions with disrupted local motion signals, and walking actions with temporally misaligned motion cues. However, these successes are limited by the CNNs’ lack of adaptive integration for form and motion processes, and failure to incorporate specialized mechanisms (e.g., a life detector) as well as top-down influences on biological motion perception.
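The two-stream data flow described above (an appearance pathway, a motion pathway, and a fusion stage that makes the final decision) can be illustrated with a toy Python sketch. This is not the authors' model: the hand-written feature functions are hypothetical stand-ins for the trained spatial and temporal CNNs, frame differencing is an assumed crude proxy for optical flow, and the fusion weights are arbitrary illustrative numbers.

```python
import math


def _flatten(frame):
    """Turn a 2D frame (list of rows) into a flat list of pixel values."""
    return [pixel for row in frame for pixel in row]


def spatial_stream(frames):
    """Hypothetical stand-in for the spatial CNN: appearance statistics
    (mean and variance of intensity) averaged over all frames."""
    means, variances = [], []
    for frame in frames:
        pixels = _flatten(frame)
        mean = sum(pixels) / len(pixels)
        means.append(mean)
        variances.append(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    return [sum(means) / len(means), sum(variances) / len(variances)]


def temporal_stream(frames):
    """Hypothetical stand-in for the temporal CNN: frame-to-frame change
    (assumed proxy for optical flow), averaged over consecutive pairs."""
    if len(frames) < 2:
        raise ValueError("need at least two frames to measure motion")
    mean_changes, max_changes = [], []
    for prev, curr in zip(frames, frames[1:]):
        diffs = [abs(c - p) for p, c in zip(_flatten(prev), _flatten(curr))]
        mean_changes.append(sum(diffs) / len(diffs))
        max_changes.append(max(diffs))
    return [
        sum(mean_changes) / len(mean_changes),
        sum(max_changes) / len(max_changes),
    ]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(spatial_features, temporal_features, weights, biases):
    """Toy fusion stage: concatenate both feature vectors, apply one
    linear layer, and return class probabilities."""
    fused = spatial_features + temporal_features  # list concatenation
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# A single bright dot moving across a 2x2 image over three frames.
frames = [
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 0.0], [1.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]
# Arbitrary illustrative weights for two hypothetical action classes.
weights = [[0.5, 0.1, 1.0, 0.2], [-0.5, 0.3, -1.0, 0.4]]
biases = [0.0, 0.0]
probs = fuse_and_classify(
    spatial_stream(frames), temporal_stream(frames), weights, biases
)
```

In this sketch the appearance features are identical for every frame (one bright pixel out of four), so only the motion pathway carries information about how the dot moves. That loosely mirrors why point-light displays, which contain little form information in any single frame, are a demanding test for the appearance pathway.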




Updated: 2020-10-30