Scalable Hash From Triplet Loss Feature Aggregation For Video De-duplication
Journal of Visual Communication and Image Representation (IF 2.6). Pub Date: 2020-09-08. DOI: 10.1016/j.jvcir.2020.102908
Wei Jia , Li Li , Zhu Li , Shuai Zhao , Shan Liu

The produce–share–consume life cycle of video content creates a massive number of duplicate video segments, owing to variable-bit-rate representations and fragmentation in playback. The storage and communication inefficiency caused by these duplicates motivates researchers in both academia and industry to develop computationally efficient video de-duplication solutions for storage and CDN providers. Moreover, the increasing demand for high resolution and quality aggravates the heavy load on cluster storage and the limited bandwidth resources. Hence, video de-duplication in storage and transmission is becoming an important feature for video cloud storage and Content Delivery Network (CDN) service providers. Although optimizing multimedia de-duplication is clearly necessary, it is a challenging task: we should match as many duplicated videos as possible without removing any video by mistake. Current video de-duplication schemes mostly rely on URL-based solutions, which cannot handle non-cacheable content such as video, where the same piece of content may carry entirely different URL identifiers and fragmentation, and different quality representations further complicate the problem. In this paper, we propose a novel content-based video segment identification scheme that is invariant to the underlying codec and operational bit rates. It computes robust features from a triplet-loss deep learning network that captures the invariance of the same content under different coding tools and strategies, and a scalable hashing solution is developed based on Fisher Vector aggregation of the convolutional features from the triplet-loss network. Our simulation results demonstrate a significant improvement in large-scale video repository de-duplication compared with state-of-the-art methods.
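The two core ingredients of the proposed scheme — a triplet loss that pulls different encodings of the same content together, and a Fisher Vector aggregation of local features that is then binarized into a hash — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the network, the GMM training, and the exact hashing scheme are omitted; the GMM parameters (`weights`, `means`, `sigmas`) are assumed to be pretrained, and the sign-based binarization is a common simplification, used here only to show the shape of the pipeline.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss over a batch of embeddings (N, D):
    max(0, ||a - p||^2 - ||a - n||^2 + margin), averaged over the batch."""
    d_ap = np.sum((anchor - positive) ** 2, axis=1)
    d_an = np.sum((anchor - negative) ** 2, axis=1)
    return np.maximum(0.0, d_ap - d_an + margin).mean()

def fisher_vector(feats, weights, means, sigmas):
    """Mean-gradient Fisher Vector of local features (N, D) under a
    diagonal-covariance GMM with K components (parameters assumed pretrained)."""
    diff = feats[:, None, :] - means[None, :, :]            # (N, K, D)
    # posterior responsibilities gamma (N, K), computed in log space for stability
    log_p = -0.5 * np.sum((diff / sigmas) ** 2, axis=2)
    log_p += np.log(weights) - np.sum(np.log(sigmas), axis=1)
    gamma = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    gamma /= gamma.sum(axis=1, keepdims=True)
    # gradient w.r.t. the GMM means, aggregated over all local features
    fv = np.einsum('nk,nkd->kd', gamma, diff / sigmas)
    fv /= feats.shape[0] * np.sqrt(weights)[:, None]
    return fv.ravel()                                        # (K * D,)

def binary_hash(fv):
    """Sign-binarize a Fisher Vector into a compact bit vector (a simplification
    of the scalable hashing step, not the paper's exact scheme)."""
    return (fv > 0).astype(np.uint8)
```

In the full system the `feats` would be convolutional features extracted by the triplet-loss network from a video segment, so that two encodings of the same content yield nearby Fisher Vectors and hence matching or near-matching hashes, which can then be compared cheaply at repository scale.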




Updated: 2020-09-28