Unifying frame rate and temporal dilations for improved remote pulse detection
Computer Vision and Image Understanding (IF 4.3), Pub Date: 2021-07-10, DOI: 10.1016/j.cviu.2021.103246
Jeremy Speth, Nathan Vance, Patrick Flynn, Kevin Bowyer, Adam Czajka

Remote photoplethysmography (rPPG) is the monitoring of blood volume pulse from a camera at a distance. 3-Dimensional Convolutional Neural Networks (3DCNNs) have shown promising performance on the rPPG task, although it is critical that we understand the impact of both video and model parameters. In this paper, we explore the effect of frame rate, temporal kernel width, and, more generally, temporal receptive field on the reliability of heart rate and waveform estimation carried out by 3DCNNs. We train and evaluate 32 3DCNNs with different temporal parameters on a new large-scale database for physiological monitoring in an interview scenario. We show that previous studies reporting null effects of frame rate changes on pulse estimators may no longer be valid when using CNNs, and that decreasing the frame rate may actually improve performance. In particular, we found that models trained on videos with frame rates as low as 12.9 frames per second (fps) perform better than those trained on videos recorded at a full 90 fps, perhaps because the temporal receptive fields become larger in the time dimension when the frame rate decreases. Using this insight, we propose RemotePulseNet, a novel 3DCNN architecture that exploits temporally dilated convolutions with increasing dilation rate to drastically increase the receptive field. We compare its performance with that of recent state-of-the-art pulse estimation methods, and show that both RemotePulseNet and the low frame rate 3DCNNs produce high-quality pulse signals from faces captured under a challenging interview scenario. The source code and instructions for obtaining a copy of the test data are made available with this paper.
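The link between frame rate, dilation, and temporal receptive field can be illustrated with a little arithmetic. The following is a minimal sketch, not the paper's code: it assumes a stack of stride-1 temporal convolutions with no pooling, and the layer count, kernel width of 5, and dilation schedule are hypothetical values chosen only for illustration. The function names are likewise made up for this example.

```python
def receptive_field_frames(kernel_widths, dilations):
    """Temporal receptive field, in frames, of stacked stride-1 convolutions.

    Each layer with kernel width k and dilation d widens the field by (k - 1) * d.
    """
    rf = 1
    for k, d in zip(kernel_widths, dilations):
        rf += (k - 1) * d
    return rf


def receptive_field_seconds(rf_frames, fps):
    """Approximate time covered by rf_frames consecutive frames at a given frame rate."""
    return rf_frames / fps


# Hypothetical 4-layer network with temporal kernel width 5 (assumed values).
kernels = [5, 5, 5, 5]

plain = receptive_field_frames(kernels, [1, 1, 1, 1])    # 17 frames
dilated = receptive_field_frames(kernels, [1, 2, 4, 8])  # 61 frames

for label, rf, fps in [
    ("no dilation, 90 fps", plain, 90.0),
    ("no dilation, 12.9 fps", plain, 12.9),
    ("increasing dilation, 90 fps", dilated, 90.0),
]:
    print(f"{label}: {rf} frames, about {receptive_field_seconds(rf, fps):.3f} s")
```

With the same layers, the field is a fixed number of frames, so lowering the frame rate stretches it over more seconds of video; increasing the dilation rate enlarges the field in frames without adding parameters. Both routes let the network see a longer stretch of the pulse waveform.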




Updated: 2021-07-18