Tensor Methods in Computer Vision and Deep Learning, Proceedings of the IEEE



Proceedings of the IEEE (IF 23.2), Pub Date: 2021-04-30, DOI: 10.1109/jproc.2021.3074329
Yannis Panagakis, Jean Kossaifi, Grigorios G. Chrysos, James Oldfield, Mihalis A. Nicolaou, Anima Anandkumar, Stefanos Zafeiriou

Tensors, or multidimensional arrays, are data structures that can naturally represent visual data of multiple dimensions. Inherently able to efficiently capture structured, latent semantic spaces and high-order interactions, tensors have a long history of application across a wide range of computer vision problems. With the paradigm shift to deep learning in computer vision, tensors have become even more fundamental. Indeed, essential ingredients of modern deep learning architectures, such as convolutions and attention mechanisms, can readily be viewed as tensor mappings. As a result, tensor methods are increasingly finding significant applications in deep learning, including the design of memory- and compute-efficient network architectures, improving robustness to random noise and adversarial attacks, and aiding the theoretical understanding of deep networks. This article provides an in-depth and practical review of tensors and tensor methods in the context of representation learning and deep learning, with a particular focus on visual data analysis and computer vision applications. Concretely, besides foundational work on tensor-based visual data analysis methods, we focus on recent developments that have led to the growing adoption of tensor methods, especially in deep learning architectures, and on their implications for computer vision applications. To help newcomers grasp these concepts quickly, we provide companion Python notebooks that cover key aspects of this article and implement them step by step with TensorLy.
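Two claims in the abstract, that a convolution is a tensor mapping and that tensor factorizations yield memory- and compute-efficient layers, can be illustrated with a small NumPy sketch. This is not code from the paper or from its TensorLy notebooks. It assumes a rank-R CP (canonical polyadic) parameterization of a 4-way convolution kernel, which is only one of several low-rank choices (Tucker and tensor-train are common alternatives), and every function name and shape below is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def cp_to_tensor(factors):
    """Rebuild a dense 4-way kernel from CP factor matrices A, B, C, D (each dim x R).

    Entry [o, i, k, l] = sum_r A[o, r] * B[i, r] * C[k, r] * D[l, r].
    """
    a, b, c, d = factors
    return np.einsum("or,ir,kr,lr->oikl", a, b, c, d)


def conv2d_valid(x, kernel):
    """'Valid' 2D cross-correlation, i.e. the usual deep-learning 'convolution'.

    x: (in_channels, H, W); kernel: (out_channels, in_channels, kh, kw).
    Returns an array of shape (out_channels, H - kh + 1, W - kw + 1).
    """
    _, _, kh, kw = kernel.shape
    # windows[i, h, w, k, l] == x[i, h + k, w + l]
    windows = sliding_window_view(x, (kh, kw), axis=(1, 2))
    # The layer is a tensor contraction over input channels and kernel offsets.
    return np.einsum("oikl,ihwkl->ohw", kernel, windows)


def conv2d_cp(x, factors):
    """Same layer, contracting with the CP factors directly (no dense kernel built)."""
    a, b, c, d = factors
    kh, kw = c.shape[0], d.shape[0]
    windows = sliding_window_view(x, (kh, kw), axis=(1, 2))
    # optimize=True lets NumPy pick a cheaper pairwise contraction order.
    return np.einsum("or,ir,kr,lr,ihwkl->ohw", a, b, c, d, windows, optimize=True)


# Hypothetical layer: (out_channels, in_channels, kh, kw) with CP rank 4.
shape = (16, 8, 3, 3)
rank = 4
rng = np.random.default_rng(0)
factors = [rng.standard_normal((dim, rank)) for dim in shape]
x = rng.standard_normal((8, 10, 10))

dense_params = int(np.prod(shape))  # 16 * 8 * 3 * 3
cp_params = rank * sum(shape)       # one (dim x rank) factor per mode
out_dense = conv2d_valid(x, cp_to_tensor(factors))
out_cp = conv2d_cp(x, factors)
print(out_dense.shape, dense_params, cp_params, np.allclose(out_dense, out_cp))
# → (16, 8, 8) 1152 120 True
```

Under these assumptions the CP form stores R(O + I + k_h + k_w) numbers instead of O·I·k_h·k_w, and contracting the factors directly avoids materializing the dense kernel. Whether this also saves compute in practice depends on the rank and on the contraction order NumPy selects.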


Updated: 2021-04-30
