Guest Editorial Introduction to the Special Section on Representation Learning for Visual Content Understanding, IEEE Transactions on Circuits and Systems for Video Technology

Guest Editorial Introduction to the Special Section on Representation Learning for Visual Content Understanding
IEEE Transactions on Circuits and Systems for Video Technology (IF 8.3), Pub Date: 2020-09-01, DOI: 10.1109/tcsvt.2020.3009095
Jiwen Lu, Yuxin Peng, Guo-Jun Qi, Jun Yu

Representation learning methods allow a system to automatically learn robust and discriminative features from raw data for given goals, and they play an important role in various visual content understanding applications, such as visual object segmentation, detection, tracking, recognition, and search. The performance of visual content understanding tasks depends heavily on the choice of data representation (or features) to which they are applied. Conventional feature representation methods usually employ transformations of the data that make it easier to extract useful information, such as the scale-invariant feature transform (SIFT), local binary patterns (LBP), and the histogram of oriented gradients (HOG). In recent years, deep learning techniques have been widely applied to learn data-driven representations from supervised annotations and have achieved great success in different visual content understanding tasks. Representative methods include ResNet for image classification, DeepFace for face recognition, and feature pyramid networks (FPNs) for object detection. Despite recent progress in deep representation learning with large amounts of annotated data, how to effectively learn visual representations from limited data annotations still requires considerable effort. This special section focuses on data-effective representation learning methods for visual content understanding.

Updated: 2020-09-01