Cloud based scalable object recognition from video streams using orientation fusion and convolutional neural networks
Pattern Recognition (IF 7.5), Pub Date: 2021-07-27, DOI: 10.1016/j.patcog.2021.108207
Muhammad Usman Yaseen, Ashiq Anjum, Giancarlo Fortino, Antonio Liotta, Amir Hussain

Object recognition from live video streams comes with numerous challenges, such as variation in illumination conditions and poses. Convolutional neural networks (CNNs) have been widely used to perform intelligent visual object recognition. Yet CNNs still suffer from severe accuracy degradation, particularly on illumination-variant datasets. To address this problem, we propose a new CNN method based on orientation fusion for visual object recognition. The proposed cloud-based video analytics system pioneers the use of bi-dimensional empirical mode decomposition (BEMD) to split a video frame into intrinsic mode functions (IMFs). These IMFs then undergo the Riesz transform to produce monogenic object components, which are in turn used to train the CNNs. Past work has demonstrated how the object orientation component can be used to reach accuracy levels as high as 93%. Here we demonstrate how a feature-fusion strategy applied to the orientation components further improves visual recognition accuracy, to 97%. We also assess the scalability of our method with respect to both the number and the size of the video streams under scrutiny. We carry out extensive experimentation on the publicly available Yale dataset, as well as a self-generated video dataset, finding significant improvements (in both accuracy and scale) over AlexNet, LeNet and SE-ResNeXt, three of the most commonly used deep learning models for visual object recognition and classification.
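The orientation component at the heart of the method comes from the monogenic signal, obtained by applying the Riesz transform to an image (in the paper, to each IMF produced by BEMD). As a rough illustration of that step only, the sketch below computes the two Riesz components of a 2-D array in the frequency domain and derives the local orientation from them. This is a minimal sketch, not the authors' implementation: the BEMD stage is omitted (the transform is applied to the raw frame), and the function names `riesz_transform` and `orientation_component` are hypothetical.

```python
import numpy as np

def riesz_transform(img):
    """Approximate Riesz transform of a 2-D image via the frequency domain.
    Returns the two Riesz components (r1, r2), which together with the
    original image form the monogenic signal."""
    h, w = img.shape
    u = np.fft.fftfreq(h).reshape(-1, 1)   # vertical frequency grid
    v = np.fft.fftfreq(w).reshape(1, -1)   # horizontal frequency grid
    mag = np.sqrt(u ** 2 + v ** 2)
    mag[0, 0] = 1.0                        # avoid division by zero at DC
    F = np.fft.fft2(img)
    r1 = np.real(np.fft.ifft2(F * (-1j * u / mag)))
    r2 = np.real(np.fft.ifft2(F * (-1j * v / mag)))
    return r1, r2

def orientation_component(img):
    """Local orientation (in radians) of the monogenic signal, the feature
    the paper fuses across IMFs before CNN training."""
    r1, r2 = riesz_transform(img)
    return np.arctan2(r2, r1)
```

In the full pipeline, one such orientation map would be computed per IMF and the maps fused into the feature representation fed to the CNN.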




Updated: 2021-08-12