Distance based kernels for video tensors on product of Riemannian matrix manifolds, Journal of Visual Communication and Image Representation

Journal of Visual Communication and Image Representation (IF 2.6), Pub Date: 2021-02-05, DOI: 10.1016/j.jvcir.2021.103045
Krishan Sharma, Renu Rameshan

In this paper, we explore the inherent geometry of video tensors by modeling them as points in a product of Riemannian matrix manifolds. A video tensor is decomposed into three modes (factors) using the matrix unfolding operation, and each mode is represented as a point in the product space of a Grassmannian and a symmetric positive definite (SPD) matrix manifold. Hence, a video is represented as a point in the Cartesian product of three such product spaces. Because this representation is manifold-valued (non-Euclidean), directly applying state-of-the-art Euclidean machine learning algorithms leads to inferior results. To overcome this, we propose positive definite kernels that map points from the product manifold to a Hilbert space. The proposed kernel functions implicitly use the geodesic distance on the product manifold to obtain a similarity measure and generate a kernel Gram matrix. In addition, we obtain a discriminative feature representation for each manifold-valued point by diagonalizing the kernel Gram matrix. Classification is performed in a sparse framework. The proposed methodology is evaluated on three publicly available datasets for hand gesture, traffic signal and sign language recognition. Experiments on these datasets show that the proposed methodology compares favorably with state-of-the-art methods in classification accuracy.
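The kernel pipeline described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The mapping of each unfolding to a (Grassmann, SPD) pair is hypothetical: it takes the top-p left singular vectors U and the p×p matrix UᵀCU + εI, where C is the unfolding's scatter matrix. The sketch also swaps in the projection distance on the Grassmannian and the log-Euclidean distance on SPD matrices in place of the geodesic distances the paper refers to. Both distances are Euclidean after an embedding (U ↦ UUᵀ, S ↦ log S), so a Gaussian kernel on their positively weighted sum is positive definite. The per-factor weights, `gamma`, and the reading of "kernel Gram matrix diagonalization" as F = VΛ^{1/2} from K = VΛVᵀ are likewise assumptions. All function names (`unfold`, `mode_point`, `video_point`, `product_dist2`, `gram_matrix`, `gram_features`) are illustrative.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-k unfolding: returns an n_k x (product of other dims) matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_point(tensor, mode, p, eps=1e-6):
    """Map one unfolding X to a (Grassmann, SPD) pair.

    ASSUMPTION (hypothetical construction): U holds the top-p left singular
    vectors of X (a point on Gr(p, n_k)); S = U^T C U + eps*I with
    C = X X^T / m is a p x p SPD matrix.
    """
    X = unfold(tensor, mode)
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    U = U[:, :p]
    C = X @ X.T / X.shape[1]
    S = U.T @ C @ U + eps * np.eye(p)
    return U, S


def video_point(tensor, p=3):
    """A video becomes three (Grassmann, SPD) pairs, one per mode."""
    return [mode_point(tensor, k, p) for k in range(3)]


def grassmann_proj_dist2(U1, U2):
    """Squared projection distance ||U1 U1^T - U2 U2^T||_F^2 / 2 = p - ||U1^T U2||_F^2."""
    p = U1.shape[1]
    return max(p - np.linalg.norm(U1.T @ U2, "fro") ** 2, 0.0)


def spd_logm(S):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T


def log_euclidean_dist2(S1, S2):
    """Squared log-Euclidean distance ||log(S1) - log(S2)||_F^2."""
    return np.linalg.norm(spd_logm(S1) - spd_logm(S2), "fro") ** 2


def product_dist2(a, b, w_grass=1.0, w_spd=1.0):
    """Squared distance on the product of the three Gr x SPD factors:
    a weighted sum of per-factor squared distances (weights are assumptions)."""
    return sum(
        w_grass * grassmann_proj_dist2(Ua, Ub) + w_spd * log_euclidean_dist2(Sa, Sb)
        for (Ua, Sa), (Ub, Sb) in zip(a, b)
    )


def gram_matrix(points, gamma=0.1):
    """Gaussian kernel Gram matrix K_ij = exp(-gamma * d^2(x_i, x_j))."""
    n = len(points)
    K = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            K[i, j] = K[j, i] = np.exp(-gamma * product_dist2(points[i], points[j]))
    return K


def gram_features(K, tol=1e-10):
    """Diagonalize K = V diag(lam) V^T; rows of F = V[:, keep] * sqrt(lam[keep])
    are feature vectors with F @ F.T ~= K (tiny or negative eigenvalues dropped)."""
    lam, V = np.linalg.eigh(K)
    keep = lam > tol
    return V[:, keep] * np.sqrt(lam[keep])


# Toy usage on random "videos" (height x width x frames); the data is synthetic.
rng = np.random.default_rng(0)
videos = [rng.standard_normal((16, 16, 10)) for _ in range(4)]
points = [video_point(v, p=3) for v in videos]
K = gram_matrix(points, gamma=0.1)
F = gram_features(K)
print(K.shape, F.shape)
```

The final stage, classification in a sparse framework, is not shown. In this sketch, the rows of `F` would simply be handed to whatever sparse-representation classifier one chooses. Using distances that embed isometrically into a Euclidean space is what guarantees the Gram matrix stays positive semidefinite for every `gamma > 0`.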

Updated: 2021-02-10