CNN-Based Facial Expression Recognition from Annotated RGB-D Images for Human–Robot Interaction
International Journal of Humanoid Robotics (IF 0.9). Pub Date: 2019-06-06. DOI: 10.1142/s0219843619410020
Jing Li 1,2, Yang Mi 1, Gongfa Li 2, Zhaojie Ju 3,4

Facial expression recognition has been widely used in human–computer interaction (HCI) systems. Over the years, researchers have proposed different feature descriptors, implemented different classification methods, and carried out a number of experiments on various datasets for automatic facial expression recognition. However, most of these approaches use 2D static images or 2D video sequences for the recognition task. The main limitation of 2D-based analysis is its sensitivity to variations in pose and illumination, which reduce recognition accuracy. An alternative is therefore to incorporate depth information acquired by a 3D sensor, since depth is invariant to both pose and illumination. In this paper, we present a two-stream convolutional neural network (CNN)-based facial expression recognition system and test it on our own RGB-D facial expression dataset, collected with a Microsoft Kinect for XBOX in unspontaneous (posed) scenarios; the Kinect was chosen because it is an inexpensive, portable device that captures both RGB and depth information. Our fully annotated dataset includes seven expressions (i.e., neutral, sadness, disgust, fear, happiness, anger, and surprise) for 15 subjects (9 males and 6 females) aged 20 to 25. The two individual CNNs are identical in architecture but do not share parameters. To combine the detection results produced by these two CNNs, we propose a late fusion approach. The experimental results demonstrate that the proposed two-stream network using RGB-D images outperforms networks using only RGB images or only depth images.
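The abstract fixes only the high-level design: two architecturally identical CNNs with unshared parameters, one per modality, combined by late fusion. The sketch below, in PyTorch, is a minimal reading of that design rather than the authors' implementation: the layer configuration, the softmax-averaging fusion rule, and the replication of the single-channel depth map to three channels (so both streams can stay truly identical) are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpressionCNN(nn.Module):
    """One stream: a small CNN producing 7-way expression logits.
    The exact layers are an assumption; the paper states only that
    both streams share one architecture, not what it is."""

    def __init__(self, num_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


class TwoStreamLateFusion(nn.Module):
    """Two architecturally identical streams with separate (unshared)
    parameters; their per-class probabilities are merged at the
    decision level, i.e., late fusion."""

    def __init__(self, num_classes=7):
        super().__init__()
        self.rgb_stream = ExpressionCNN(num_classes)
        self.depth_stream = ExpressionCNN(num_classes)  # same design, own weights

    def forward(self, rgb, depth):
        p_rgb = F.softmax(self.rgb_stream(rgb), dim=1)
        p_depth = F.softmax(self.depth_stream(depth), dim=1)
        return (p_rgb + p_depth) / 2  # assumed fusion rule: averaged probabilities


model = TwoStreamLateFusion()
rgb = torch.randn(1, 3, 128, 128)                        # RGB face crop
depth = torch.randn(1, 1, 128, 128).repeat(1, 3, 1, 1)   # depth replicated to 3 channels
probs = model(rgb, depth)
print(probs.argmax(dim=1))                               # predicted expression index (0..6)
```

Late fusion here means each stream is a complete classifier and only their per-class probabilities are merged; an early-fusion alternative would instead concatenate RGB and depth into a four-channel input to a single network, which is not what the paper describes.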

Updated: 2019-06-06