Classification and Weakly Supervised Pain Localization using Multiple Segment Representation.
Image and Vision Computing (IF 4.2), Pub Date: 2014-03-04, DOI: 10.1016/j.imavis.2014.02.008
Karan Sikka, Abhinav Dhall, Marian Stewart Bartlett

Automatic pain recognition from videos is a vital clinical application and, owing to its spontaneous nature, poses interesting challenges to automatic facial expression recognition (AFER) research. Previous pain vs. no-pain systems have highlighted two major challenges: (1) ground truth is provided at the sequence level, so the presence or absence of the target expression in any given frame is unknown, and (2) the time point and duration of the pain expression event(s) in each video are unknown. To address these issues we propose a novel framework (referred to as MS-MIL) in which each sequence is represented as a bag containing multiple segments, and multiple instance learning (MIL) is employed to handle this weakly labeled data in the form of sequence-level ground truth. These segments are generated via multiple clusterings of a sequence or by running a multi-scale temporal scanning window, and are represented using a state-of-the-art Bag of Words (BoW) representation. This work extends the idea of detecting facial expressions through 'concept frames' to 'concept segments' and argues through extensive experiments that algorithms such as MIL are needed to reap the benefits of such a representation.
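The multi-scale temporal scanning window described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the function name, window scales, and stride ratio are illustrative choices; each returned index range would correspond to one segment (instance) placed in the video's bag before BoW encoding.

```python
# Sketch of multi-scale temporal scanning windows: a video of n_frames
# becomes a "bag" of overlapping frame-index ranges at several temporal
# scales. Scales and stride are illustrative assumptions.
from typing import List, Sequence


def multiscale_segments(n_frames: int,
                        scales: Sequence[int] = (8, 16, 32),
                        stride_ratio: float = 0.5) -> List[range]:
    """Return frame-index ranges for overlapping windows at each scale."""
    segments: List[range] = []
    for w in scales:
        if w > n_frames:
            continue  # skip scales longer than the video
        stride = max(1, int(w * stride_ratio))
        for start in range(0, n_frames - w + 1, stride):
            segments.append(range(start, start + w))
    return segments


# For a 20-frame clip with 8- and 16-frame windows at 50% overlap:
bag = multiscale_segments(20, scales=(8, 16), stride_ratio=0.5)
# four 8-frame segments (starts 0, 4, 8, 12) plus one 16-frame segment
```

Because windows overlap and span several scales, at least one segment is likely to cover a pain event of unknown onset and duration, which is the property MIL then exploits.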

The key advantages of our approach are: (1) joint detection and localization of painful frames using only sequence-level ground truth, (2) incorporation of temporal dynamics by representing the data not as individual frames but as segments, and (3) extraction of multiple segments, which is well suited to signals with uncertain temporal location and duration in the video. Extensive experiments on the UNBC-McMaster Shoulder Pain dataset highlight the effectiveness of the approach, achieving competitive results on both tasks of pain classification and localization in videos. We also empirically evaluate the contributions of the different components of MS-MIL. The paper also includes visualizations of the discriminative facial patches that our algorithm discovers to be important for pain detection, and relates them to Action Units that have been associated with pain expression. We conclude the paper by demonstrating that MS-MIL yields a significant improvement on another spontaneous facial expression dataset, the FEEDTUM dataset.
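The joint detection-and-localization property in advantage (1) follows from the standard MIL decision rule, which can be sketched in a few lines. This is a generic illustration under the usual MIL assumption (a bag is positive if at least one instance is positive), not the paper's exact classifier; the segment scores here are placeholders for the output of a learned instance-level model.

```python
# Minimal sketch of the MIL decision rule: a bag (video) is scored by the
# maximum score over its segment instances, and the arg-max segment gives
# the temporal localization of the expression event.
import numpy as np


def bag_score(instance_scores: np.ndarray):
    """Return (bag score, index of highest-scoring segment)."""
    idx = int(np.argmax(instance_scores))
    return float(instance_scores[idx]), idx


# Placeholder scores for four segments of one video:
scores = np.array([-0.7, 0.2, 1.3, -0.1])
s, where = bag_score(scores)
# the video is classified as "pain" if s exceeds the decision threshold,
# and segment `where` localizes the pain event in time
```

Because only the sequence-level label supervises training, the max operator lets the learner assign credit to the segment most responsible for the positive label, which is exactly how classification and localization are obtained jointly.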




Updated: 2014-03-04