Self-Supervision based Task-Specific Image Collection Summarization,arXiv - CS - Multimedia


Self-Supervision based Task-Specific Image Collection Summarization
arXiv - CS - Multimedia Pub Date : 2020-12-19 , DOI: arxiv-2012.10657
Anurag Singh, Deepak Kumar Sharma, Sudhir Kumar Sharma, Joel J. P. C. Rodrigues

Successful applications of deep learning (DL) require large amounts of annotated data. This often restricts the benefits of DL to businesses and individuals with large budgets for data collection and computation. Summarization offers a possible solution by creating much smaller representative datasets that allow real-time deep learning and analysis of big data, and thus democratize the use of DL. In this work, we explore a novel approach to task-specific image corpus summarization using semantic information and self-supervision. Our method uses a classification-based Wasserstein generative adversarial network (CLSWGAN) as a feature-generating network. The model also leverages rotational invariance as self-supervision, together with classification on another task. All these objectives are applied on top of features from ResNet34 to make them discriminative and robust. The model then generates a summary at inference time by using K-means clustering in the semantic embedding space. A further advantage of this model is therefore that it does not need to be retrained each time to obtain summaries of different lengths, which is an issue with current end-to-end models. We also test the efficacy of our model through rigorous qualitative and quantitative experiments.
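
The inference-time step described in the abstract, K-means clustering in an embedding space, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`kmeans`, `summarize`), the plain Lloyd-style K-means, the toy two-dimensional embeddings, and the choice of the image nearest each centroid as the cluster's representative are all assumptions made for illustration. In the paper's setting the embeddings would come from the trained feature network rather than being written by hand.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style K-means; returns (centroids, assignment)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: assignments no longer change
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


def summarize(embeddings, k, seed=0):
    """Return sorted indices of one representative image per cluster.

    Assumption: the representative is the image whose embedding lies
    closest to the cluster centroid.
    """
    centroids, assignment = kmeans(embeddings, k, seed=seed)
    summary = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:  # skip clusters that ended up empty
            nearest = min(
                members,
                key=lambda i: squared_distance(embeddings[i], centroids[c]),
            )
            summary.append(nearest)
    return sorted(summary)


# Toy embeddings (hypothetical): two well-separated groups of three images.
embeddings = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
]

# Summaries of different lengths reuse the same embeddings; only k changes.
print(summarize(embeddings, k=2))
print(summarize(embeddings, k=3))
```

Because the summary length enters only through `k` at clustering time, the same fixed embeddings can be reused for any requested length, which matches the abstract's point that no retraining is needed to change the summary size.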


Updated: 2020-12-22
