Automation of Recording in Smart Classrooms via Deep Learning and Bayesian Maximum a Posteriori Estimation of Instructor's Pose
IEEE Transactions on Industrial Informatics (IF 11.7), Pub Date: 7-24-2020, DOI: 10.1109/tii.2020.3011688
Mohammad Sayad Haghighi, Alireza Sheikhjafari, Ali Reza Jolfaei, Faezeh Farivar, Sahar Ahmadzadeh

The Internet of Things is making objects smarter and more autonomous. At the same time, online education is gaining momentum, and many universities now offer online degrees. Content preparation for such programs usually involves recording the classes. In this article, we introduce a deep learning-based camera management system as a substitute for the academic filming crew. The solution mainly consists of two cameras and a wearable gadget for the instructor. The fixed camera is used to detect the instructor's position and pose, and the pan-tilt-zoom (PTZ) camera does the filming. The proposed solution combines image processing and deep learning techniques. Face recognition and skeleton detection algorithms are used to detect the instructor's position. The main contribution, however, lies in applying deep learning to the instructor's skeleton detection and in postprocessing the deep network output to correct the pose detection results using a Bayesian maximum a posteriori (MAP) estimator. This estimator is defined on a Markov state machine. The pose detection result, along with the position information, is then used by the PTZ camera controller for filming. The proposed solution is implemented using OpenPose, a convolutional neural network for detecting body parts. Feeding a neural network pose classifier with 12 features extracted from the output of the deep network yields an accuracy of 89%. However, as we show, the Markov model and MAP estimator can improve the accuracy to as high as 95.5%.
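
To illustrate how a MAP estimator on a Markov state machine can correct isolated misclassifications, here is a minimal sketch. It is not the authors' implementation: the pose labels, the transition matrix, the classifier confusion matrix, and the uniform prior are all hypothetical values chosen for illustration, and Viterbi decoding is used as one common way to compute a MAP estimate over a Markov chain (the abstract does not say which formulation the paper uses).

```python
import math

# Hypothetical pose labels; the abstract does not list the actual states.
POSES = ["facing_class", "writing_on_board", "pointing"]

# Assumed Markov transition probabilities P(next | current): poses tend to persist.
TRANSITION = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]

# Assumed classifier confusion matrix P(observed label | true pose).
EMISSION = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]

# Assumed uniform prior over the initial pose.
PRIOR = [1 / 3, 1 / 3, 1 / 3]


def map_pose_sequence(observed):
    """Return the MAP pose sequence (as indices into POSES) given the
    per-frame classifier labels, using Viterbi decoding in the log domain."""
    if not observed:
        return []
    n = len(POSES)
    # Log-posterior score of the best path ending in each state at frame 0.
    score = [math.log(PRIOR[s]) + math.log(EMISSION[s][observed[0]])
             for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for obs in observed[1:]:
        new_score, pointers = [], []
        for s in range(n):
            best_prev = max(
                range(n),
                key=lambda p: score[p] + math.log(TRANSITION[p][s]),
            )
            pointers.append(best_prev)
            new_score.append(
                score[best_prev]
                + math.log(TRANSITION[best_prev][s])
                + math.log(EMISSION[s][obs])
            )
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    noisy = [0, 0, 1, 0, 0]  # a one-frame glitch in the classifier output
    print([POSES[i] for i in map_pose_sequence(noisy)])
```

With these assumed probabilities, a single-frame outlier is smoothed away because one misclassification is more likely than two rapid pose transitions, while a sustained change of label is kept as a genuine pose change.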

Updated: 2024-08-22