SPFTN: A Joint Learning Framework for Localizing and Segmenting Objects in Weakly Labeled Videos
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 2018-11-13, DOI: 10.1109/tpami.2018.2881114
Dingwen Zhang , Junwei Han , Le Yang , Dong Xu

Object localization and segmentation in weakly labeled videos are two interesting yet challenging tasks. Models for simultaneous object localization and segmentation have been explored in the conventional fully supervised setting to boost the performance of each task. However, no existing work has attempted to jointly learn object localization and segmentation models under weak supervision. To this end, we propose a joint learning framework called Self-Paced Fine-Tuning Network (SPFTN) for localizing and segmenting objects in weakly labeled videos. Jointly learning a deep model for object localization and segmentation under weak supervision is very challenging, as the learning process of each single task faces a serious ambiguity issue due to the lack of bounding-box or pixel-level supervision. To address this problem, the proposed deep SPFTN model is carefully designed with a novel multi-task self-paced learning objective, which leverages task-specific prior knowledge and the knowledge already captured by the model to infer confident training samples for each task. By aggregating the confident knowledge from each single task to mine reliable patterns and by learning deep feature representations for both tasks, the proposed framework addresses the ambiguity issue under weak supervision with simple optimization. Comprehensive experiments on the large-scale YouTube-Objects and DAVIS datasets demonstrate that the proposed approach achieves superior performance compared with other state-of-the-art methods and baseline networks/models.
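The self-paced idea the abstract describes can be illustrated with the classic self-paced learning recipe: at each round, only samples whose current loss falls below a pace parameter are treated as confident and used for the model update, and the pace is then relaxed so harder samples enter training later. The sketch below is a minimal, hypothetical illustration of that selection rule, not the paper's actual multi-task objective or network; all function names and numbers are assumptions.

```python
import numpy as np

def self_paced_select(losses, lam):
    """Classic self-paced weighting: a sample gets weight 1 if its
    current loss is below the pace parameter lam, else weight 0."""
    return (losses < lam).astype(float)

# Toy training loop (numbers are illustrative only).
rng = np.random.default_rng(0)
losses = rng.uniform(0.0, 1.0, size=10)  # stand-in for per-sample losses

lam = 0.3  # small initial pace: trust only the easiest samples
for step in range(3):
    weights = self_paced_select(losses, lam)
    # ...a model update using only the weighted (confident) samples
    # would go here, after which the losses would be recomputed...
    lam *= 1.5  # grow the pace so harder samples are admitted later
```

As `lam` grows, the confident set expands monotonically, which is how curriculum-style training moves from easy to hard examples without hard labels.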

Updated: 2024-08-22