Fine-Grained Classroom Activity Detection from Audio with Neural Networks
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2021-07-29, DOI: arXiv-2107.14369
Eric Slyman, Chris Daw, Morgan Skrabut, Ana Usenko, Brian Hutchinson

Instructors are increasingly incorporating student-centered learning techniques in their classrooms to improve learning outcomes. In addition to lecture, these class sessions involve forms of individual and group work, and greater rates of student-instructor interaction. Quantifying classroom activity is a key element of accelerating the evaluation and refinement of innovative teaching practices, but manual annotation does not scale. In this manuscript, we present advances to the young application area of automatic classroom activity detection from audio. Using a university classroom corpus with nine activity labels (e.g., "lecture," "group work," "student question"), we propose and evaluate deep fully connected, convolutional, and recurrent neural network architectures, comparing the performance of mel-filterbank, OpenSmile, and self-supervised acoustic features. We compare 9-way classification performance with 5-way and 4-way simplifications of the task and assess two types of generalization: (1) new class sessions from previously seen instructors, and (2) previously unseen instructors. We obtain strong results on the new fine-grained task and state-of-the-art results on the 4-way task: our best model obtains frame-level error rates of 6.2%, 7.7% and 28.0% when generalizing to unseen instructors for the 4-way, 5-way, and 9-way classification tasks, respectively (relative reductions of 35.4%, 48.3% and 21.6% over a strong baseline). When estimating the aggregate time spent on classroom activities, our average root mean squared error is 1.64 minutes per class session, a 54.9% relative reduction over the baseline.
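The abstract reports two evaluation metrics: a frame-level error rate over per-frame activity labels, and a per-session root mean squared error on the aggregate time attributed to each activity. The sketch below illustrates how such metrics can be computed; the frame duration, number of classes, and toy labels are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

FRAME_SEC = 1.0  # assumed frame hop of 1 second (illustrative, not from the paper)
N_CLASSES = 4    # e.g., the 4-way simplification of the task

def frame_error_rate(y_true, y_pred):
    """Fraction of audio frames whose predicted activity label is wrong."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

def aggregate_time_rmse(y_true, y_pred, frame_sec=FRAME_SEC):
    """RMSE (in minutes) between true and predicted time spent per activity,
    obtained by counting frames per class and converting to minutes."""
    true_min = np.bincount(np.asarray(y_true), minlength=N_CLASSES) * frame_sec / 60.0
    pred_min = np.bincount(np.asarray(y_pred), minlength=N_CLASSES) * frame_sec / 60.0
    return float(np.sqrt(np.mean((true_min - pred_min) ** 2)))

# Toy session of six frames with one mislabeled frame:
y_true = [0, 0, 1, 1, 2, 3]
y_pred = [0, 0, 1, 2, 2, 3]
print(frame_error_rate(y_true, y_pred))   # one of six frames wrong
print(aggregate_time_rmse(y_true, y_pred))
```

Aggregating per-frame predictions into per-class minutes, as in `aggregate_time_rmse`, is one plausible way to realize the "time spent on classroom activities" estimate the abstract benchmarks at 1.64 minutes RMSE per session.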

Updated: 2021-08-02