A study on attention-based LSTM for abnormal behavior recognition with variable pooling
Image and Vision Computing (IF 4.2), Pub Date: 2021-02-04, DOI: 10.1016/j.imavis.2021.104120
Kai Zhou, Bei Hui, Junfeng Wang, Chunyu Wang, Tingting Wu

Behavior recognition is a well-known mobile computer vision technology. It has been used in many applications such as video surveillance, on-device motion detection, human-computer interaction, and sports video analysis. However, most existing works ignore depth and spatio-temporal information, which leads to over-fitting and inferior performance. Consequently, a novel framework for behavior recognition is proposed in this paper. In this framework, we propose a target depth estimation algorithm to calculate the 3D spatial position of the target and take this information as the input of the behavior recognition model. At the same time, in order to capture more spatio-temporal information and better handle long videos, we combine the idea of the attention mechanism and propose a skeleton behavior recognition model based on spatio-temporal convolution and attention-based LSTM (ST-CNN & ATT-LSTM). The deep spatial information is merged into each segment, and the model focuses on extracting key information, which is essential for improving behavior recognition performance. Meanwhile, we use a feature compression method based on variable pooling to solve the problem of inconsistent input sizes caused by multi-person behavior recognition, so that the network can flexibly recognize multi-person skeleton sequences. Finally, the proposed framework is evaluated on real-world surveillance video data, and the results indicate that it is superior to existing methods.
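To make the two core ideas of the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it applies a temporal attention layer on top of an LSTM that consumes per-segment skeleton features, and uses adaptive ("variable") pooling so that clips containing different numbers of people are compressed to a fixed-size input. All names (SkeletonAttLSTM, pooled_people, feat_dim, and the simple 1D convolution standing in for the ST-CNN part) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkeletonAttLSTM(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=256, num_classes=10, pooled_people=4):
        super().__init__()
        self.pooled_people = pooled_people          # fixed number of "person slots" after pooling
        # 1D conv over joints extracts spatial features per person and segment
        # (a stand-in for the spatio-temporal CNN described in the paper)
        self.spatial_conv = nn.Conv1d(in_channels=3, out_channels=feat_dim,
                                      kernel_size=3, padding=1)
        self.lstm = nn.LSTM(input_size=feat_dim * pooled_people,
                            hidden_size=hidden_dim, batch_first=True)
        self.att = nn.Linear(hidden_dim, 1)         # scalar attention score per time step
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, time, people, joints, 3); the people dimension varies between clips
        b, t, p, j, c = x.shape
        x = x.reshape(b * t * p, j, c).transpose(1, 2)            # (B*T*P, 3, joints)
        feats = F.relu(self.spatial_conv(x)).mean(dim=2)          # (B*T*P, feat_dim)
        feats = feats.reshape(b, t, p, -1).permute(0, 1, 3, 2)    # (B, T, feat_dim, P)
        # variable pooling: compress an arbitrary number of people to a fixed size
        feats = F.adaptive_max_pool1d(feats.reshape(b * t, -1, p), self.pooled_people)
        feats = feats.reshape(b, t, -1)                           # (B, T, feat_dim * pooled_people)
        h, _ = self.lstm(feats)                                   # (B, T, hidden_dim)
        weights = torch.softmax(self.att(h), dim=1)               # temporal attention weights
        context = (weights * h).sum(dim=1)                        # attention-pooled clip descriptor
        return self.fc(context)

# Usage: a batch of 2 clips, 16 segments, 5 people, 17 joints with (x, y, z) coordinates
model = SkeletonAttLSTM()
logits = model(torch.randn(2, 16, 5, 17, 3))
print(logits.shape)  # torch.Size([2, 10])
```

The key design point is that `adaptive_max_pool1d` absorbs the person dimension, so the same LSTM can be fed clips with any number of detected skeletons, which is the role the abstract attributes to variable-pooling feature compression.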




Updated: 2021-02-16