Multimodal Engagement Analysis from Facial Videos in the Classroom
arXiv - CS - Multimedia. Pub Date: 2021-01-11. DOI: arxiv-2101.04215
Ömer Sümer, Patricia Goldberg, Sidney D'Mello, Peter Gerjets, Ulrich Trautwein, Enkelejda Kasneci

Student engagement is a key construct for learning and teaching. While most of the literature has explored student engagement analysis in computer-based settings, this paper extends that focus to classroom instruction. To examine students' visual engagement in the classroom, we conducted a study using audiovisual recordings of classes at a secondary school over one and a half months, acquired continuous engagement labels per student (N=15) in repeated sessions, and explored computer vision methods to classify engagement levels from faces in the classroom. We trained deep embeddings for attentional and emotional features: Attention-Net for head pose estimation and Affect-Net for facial expression recognition. On top of both feature types, we additionally trained several engagement classifiers: Support Vector Machines, Random Forests, Multilayer Perceptrons, and Long Short-Term Memory networks. The best-performing engagement classifiers achieved AUCs of .620 and .720 in Grades 8 and 12, respectively. We further investigated fusion strategies and found that score-level fusion either improves on the individual engagement classifiers or is on par with the best-performing modality. We also investigated the effect of personalization and found that using only 60 seconds of person-specific data, selected by the margin uncertainty of the base classifier, yields an average AUC improvement of .084.
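
For illustration, the two method-level ideas in the abstract can be sketched in a few lines of Python: score-level fusion averages the per-class probabilities of the attention- and affect-based classifiers, and personalization keeps the frames on which the base classifier's top two class probabilities are closest together (smallest margin, i.e. highest uncertainty). This is a minimal sketch under stated assumptions; the function names, array shapes, and the one-frame-per-second rate are hypothetical, not the authors' implementation.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def fuse_scores(attention_proba, affect_proba, w=0.5):
        # Score-level fusion: weighted average of per-class probabilities
        # from the two modality-specific classifiers.
        return w * attention_proba + (1.0 - w) * affect_proba

    def margin_uncertainty(proba):
        # Margin = top-1 minus top-2 class probability per sample;
        # a small margin means the classifier is uncertain there.
        s = np.sort(proba, axis=1)
        return s[:, -1] - s[:, -2]

    def select_personalization_frames(base_clf, X_person, seconds=60, fps=1):
        # Pick the ~60 s of a student's frames where the generic (base)
        # classifier is least certain, to label for person-specific tuning.
        margins = margin_uncertainty(base_clf.predict_proba(X_person))
        n_frames = seconds * fps
        return np.argsort(margins)[:n_frames]  # smallest margins first

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(300, 16))          # toy per-frame features
        y = rng.integers(0, 2, size=300)        # toy engaged/disengaged labels
        clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
        fused = fuse_scores(clf.predict_proba(X), clf.predict_proba(X))
        idx = select_personalization_frames(clf, X)
        print(f"fused scores: {fused.shape}, frames selected: {len(idx)}")

In practice the fusion weight w would be tuned on validation data; with w=0.5 the fusion reduces to a plain average of the two modality scores.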

Updated: 2021-01-13