• arXiv.cs.CV Pub Date : 2021-01-21
Ruilong Li; Shan Yang; David A. Ross; Angjoo Kanazawa

In this paper, we present a transformer-based learning framework for 3D dance generation conditioned on music. We carefully design our network architecture and empirically study the keys for obtaining qualitatively pleasing results. The critical components include a deep cross-modal transformer, which learns the correlation between the music and dance motion well; and the full-attention with future-N

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Minh-Quan Dao; Vincent Frémont

Multi-object tracking (MOT) is an integral part of any autonomous driving pipeline because it produces the trajectories taken by other moving objects in the scene and helps predict their future motion. Thanks to the recent advances in 3D object detection enabled by deep learning, track-by-detection has become the dominant paradigm in 3D MOT. In this paradigm, a MOT system is essentially made

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Edwin Arkel Rios; Wen-Huang Cheng; Bo-Cheng Lai

In this work we tackle the challenging problem of anime character recognition. Anime refers to animation produced within Japan and to work derived from or inspired by it. For this purpose we present DAF:re (DanbooruAnimeFaces:revamped), a large-scale, crowd-sourced, long-tailed dataset with almost 500K images spread across more than 3000 classes. Additionally, we conduct experiments on DAF:re and similar

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Thomas Oberlin; Mathieu Verm

This paper proposes a new way of regularizing an inverse problem in imaging (e.g., deblurring or inpainting) by means of a deep generative neural network. Compared to end-to-end models, such approaches seem particularly interesting since the same network can be used for many different problems and experimental conditions, as long as the generative model is suited to the data. Previous works proposed

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Yingxue Pang; Jianxin Lin; Tao Qin; Zhibo Chen

Image-to-image translation (I2I) aims to transfer images from a source domain to a target domain while preserving the content representations. I2I has drawn increasing attention and made tremendous progress in recent years because of its wide range of applications in many computer vision and image processing problems, such as image synthesis, segmentation, style transfer, restoration, and pose estimation

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Jinhai Yang; Hua Yang

Crowd segmentation is a fundamental task serving as the basis of crowded scene analysis, and it is highly desirable to obtain refined pixel-level segmentation maps. However, it remains a challenging problem, as existing approaches either require dense pixel-level annotations to train deep learning models or merely produce rough segmentation maps from optical or particle flows with physical models.

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Sovan Biswas; Yaser Souri; Juergen Gall

In this paper, we propose an approach that spatially localizes the activities in a video frame where each person can perform multiple activities at the same time. Our approach takes the temporal scene context as well as the relations of the actions of detected persons into account. While the temporal context is modeled by a temporal recurrent neural network (RNN), the relations of the actions are modeled

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Sovan Biswas; Juergen Gall

Since collecting and annotating data for spatio-temporal action detection is very expensive, there is a need to learn approaches with less supervision. Weakly supervised approaches do not require any bounding box annotations and can be trained only from labels that indicate whether an action occurs in a video clip. Current approaches, however, cannot handle the case when there are multiple persons

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Megha Nawhal; Greg Mori

We introduce Activity Graph Transformer, an end-to-end learnable model for temporal action localization, that receives a video as input and directly predicts a set of action instances that appear in the video. Detecting and localizing action instances in untrimmed videos requires reasoning over multiple action instances in a video. The dominant paradigms in the literature process videos temporally

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Yunpeng Gong; Zhiyong Zeng

In order to make full use of the structural information of grayscale images and reduce the adverse impact of illumination variation on person re-identification (ReID), an effective data augmentation method is proposed in this paper, which includes Random Grayscale Transformation, Random Grayscale Patch Replacement and their combination. It is discovered that structural information has a significant effect

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Tian Zhang; Dongliang Chang; Zhanyu Ma; Jun Guo

Fine-grained visual classification aims to recognize images belonging to multiple sub-categories within the same category. It is a challenging task due to the inherently subtle variations among highly-confused categories. Most existing methods only take individual images as input, which may limit the ability of models to recognize contrastive clues from different images. In this paper, we propose an effective

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Mercedes Garcia-Salguero; Javier Gonzalez-Jimenez

The Relative Pose problem (RPp) for cameras aims to estimate the relative orientation and translation (pose) given a set of pair-wise feature correspondences between two central and calibrated cameras. The RPp is stated as an optimization problem where the squared, normalized epipolar error is minimized over the set of normalized essential matrices. In this work, we contribute an efficient and complete

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Hirokatsu Kataoka; Kazushige Okayasu; Asato Matsumoto; Eisuke Yamagata; Ryosuke Yamada; Nakamasa Inoue; Akio Nakamura; Yutaka Satoh

Is it possible to use convolutional neural networks pre-trained without any natural images to assist natural image understanding? The paper proposes a novel concept, Formula-driven Supervised Learning. We automatically generate image patterns and their category labels by assigning fractals, which are based on a natural law existing in the background knowledge of the real world. Theoretically, the use

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Chaoyou Fu; Yibo Hu; Xiang Wu; Hailin Shi; Tao Mei; Ran He

Visible-Infrared person re-identification (VI-ReID) aims at matching cross-modality pedestrian images, breaking through the limitation of single-modality person ReID in dark environments. In order to mitigate the impact of large modality discrepancy, existing works manually design various two-stream architectures to separately learn modality-specific and modality-sharable representations. Such a manual

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Nan Jiang; Kuiran Wang; Xiaoke Peng; Xuehui Yu; Qiang Wang; Junliang Xing; Guorong Li; Guodong Guo; Jian Zhao; Zhenjun Han

Unmanned Aerial Vehicles (UAVs) offer many applications in both commerce and recreation. Monitoring the operation status of UAVs is therefore crucially important. In this work, we consider the task of tracking UAVs, providing rich information such as location and trajectory. To facilitate research on this topic, we propose a dataset, Anti-UAV, with more than 300 video pairs containing over 580k

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Cong Wang; Yan Huang; Yuexian Zou; Yong Xu

In recent years, deep single-image dehazing models based on the Atmospheric Scattering Model (ASM) have achieved remarkable results. But the dehazing outputs of those models suffer from color shift. Analyzing the ASM shows that the atmospheric light factor (ALF) is set as a scalar, which indicates that the ALF is constant for the whole image. However, for images taken in the real world, the illumination is not uniformly

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Deesha Chavan; Dev Saad; Debarati B. Chakraborty

Predicting on-road abnormalities such as road accidents or traffic violations is a challenging task in traffic surveillance. If such predictions can be made in advance, much damage can be limited. In our work, we try to formulate a solution for automated collision prediction in traffic surveillance videos with computer vision and deep networks. It involves object detection, tracking, trajectory

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Enze Xie; Wenjia Wang; Wenhai Wang; Peize Sun; Hang Xu; Ding Liang; Ping Luo

This work presents a new fine-grained transparent object segmentation dataset, termed Trans10K-v2, extending Trans10K-v1, the first large-scale transparent object segmentation dataset. Unlike Trans10K-v1, which has only two limited categories, our new dataset has several appealing benefits. (1) It has 11 fine-grained categories of transparent objects, commonly occurring in the human domestic environment

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Debarati B. Chakraborty; Vinay Detania; Shah Parshv Jigneshkumar

This article defines new methods for unsupervised fire region segmentation and fire threat detection from video streams. Controlled fire serves a number of purposes for human civilization, but it can become a threat once its spread becomes uncontrolled. Many methods exist for fire region segmentation and fire/non-fire classification. But the approaches to determine the threat associated

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Bowen Li; Changhon Fu; Fangqiang Ding; Junjie Ye; Fuling Lin

Visual object tracking, a major interest in the image processing field, has facilitated numerous real-world applications. Among them, equipping unmanned aerial vehicles (UAVs) with real-time robust visual trackers for all-day aerial maneuver is currently attracting increasing attention and has remarkably broadened the scope of applications of object tracking. However, prior tracking

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Ravi Raj; Varad Bhatnagar; Aman Kumar Singh; Sneha Mane; Nilima Walde

A comparative study of various techniques that can be used for video summarization, i.e., video-to-video conversion, is presented along with their respective architectures, results, strengths and shortcomings. In all approaches, a lengthy video is converted into a shorter video that aims to capture all important events present in the original video. The definition of 'important event' may vary

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Xiangyu He; Qinghao Hu; Peisong Wang; Jian Cheng

Convolutional neural networks are able to learn realistic image priors from numerous training samples in low-level image generation and restoration. We show that, for high-level image recognition tasks, we can further reconstruct "realistic" images of each category by leveraging intrinsic Batch Normalization (BN) statistics without any training data. Inspired by the popular VAE/GAN methods, we regard

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Yuxiang Zhang; Sachin Mehta; Anat Caspi

Semantic segmentation aims to robustly predict coherent class labels for entire regions of an image. It is a scene understanding task that powers real-world applications (e.g., autonomous navigation). One important application, the use of imagery for automated semantic understanding of pedestrian environments, provides remote mapping of accessibility features in street environments. This application

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Zhongxia Zhang; Mingwen Wang

Finger vein recognition has drawn increasing attention as one of the most popular and promising biometrics due to its high distinguishing ability, security and non-invasive procedure. The main idea of traditional schemes is to directly extract features from finger vein images or patterns and then compare features to find the best match. However, the features extracted from images contain much redundant

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Ruimin Feng; Jiayi Zhao; He Wang; Baofeng Yang; Jie Feng; Yuting Shi; Ming Zhang; Chunlei Liu; Yuyao Zhang; Jie Zhuang; Hongjiang Wei

Quantitative susceptibility mapping (QSM) estimates the underlying tissue magnetic susceptibility from the MRI gradient-echo phase signal and has demonstrated great potential in quantifying tissue susceptibility in various brain diseases. However, the intrinsic ill-posed inverse problem relating the tissue phase to the underlying susceptibility distribution affects the accuracy for quantifying tissue

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Mustafa Hajij; Ghada Zamzmi; Fawwaz Batayneh

Topological Data Analysis (TDA) has emerged recently as a robust tool to extract and compare the structure of datasets. TDA identifies features in data such as connected components and holes and assigns a quantitative measure to these features. Several studies reported that topological features extracted by TDA tools provide unique information about the data, discover new insights, and determine which

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-20
Giovanna Menardi

Image segmentation aims at identifying regions of interest within an image, by grouping pixels according to their properties. This task resembles the statistical one of clustering, yet many standard clustering methods fail to meet the basic requirements of image segmentation: segment shapes are often biased toward predetermined shapes and their number is rarely determined automatically. Nonparametric

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-20
Wei Gong; Laila Khalid

Machine learning is completely changing the trends in the fashion industry. Every brand, big or small, is using machine learning techniques to improve its revenue, attract more customers and stay ahead of the trend. People are into fashion and they want to know what looks best and how they can improve their style and elevate their personality. Using deep learning technology and infusing it

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-20

This paper presents a method for text line segmentation of challenging historical manuscript images. These manuscript images contain narrow interline spaces with touching components, interpenetrating vowel signs and inconsistent font types and sizes. In addition, they contain curved, multi-skewed and multi-directed side note lines within a complex page layout. Therefore, bounding polygon labeling would

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Chao Li; Wenjian Huang; Xi Chen; Yiran Wei; Stephen J. Price; Carola-Bibiane Schönlieb

We present an Expectation-Maximization (EM) Regularized Deep Learning (EMReDL) model for weakly supervised tumor segmentation. The proposed framework was tailored to glioblastoma, a type of malignant tumor characterized by its diffuse infiltration into the surrounding brain tissue, which poses a significant challenge to treatment target and tumor burden estimation based on conventional structural

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Lang Huang; Chao Zhang; Hongyang Zhang

We propose self-adaptive training -- a unified training algorithm that dynamically calibrates and enhances the training process by model predictions without incurring extra computational cost -- to advance both supervised and self-supervised learning of deep neural networks. We analyze the training dynamics of deep networks on training data that are corrupted by, e.g., random noise and adversarial examples

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Jacson Rodrigues Correia-Silva; Rodrigo F. Berriel; Claudine Badue; Alberto F. De Souza; Thiago Oliveira-Santos

Convolutional neural networks have been successful lately, enabling companies to develop neural-based products, which demand an expensive process involving data acquisition and annotation, and model generation, usually requiring experts. With all these costs, companies are concerned about the security of their models against copies and deliver them as black-boxes accessed by APIs. Nonetheless, we argue

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Edward Stow; Riku Murai; Sajad Saeedi; Paul H. J. Kelly

Focal-plane Sensor-processors (FPSPs) are a camera technology that enables low power, high frame rate computation, making them suitable for edge computation. Unfortunately, these devices' limited instruction sets and registers make developing complex algorithms difficult. In this work, we present Cain - a compiler that targets SCAMP-5, a general-purpose FPSP - which generates code from multiple convolutional

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Andrew Brock; Soham De; Samuel L. Smith

Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs. Building on recent theoretical analyses of deep ResNets at initialization, we propose a simple set of analysis tools to characterize

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Thomas Pfeil

Deep neural networks usually have to be compressed and accelerated for use in low-power (e.g., mobile) devices. Recently, massively-parallel hardware accelerators were developed that offer high throughput and low latency at low power by utilizing in-memory computation. However, to exploit these benefits the computational graph of a neural network has to fit into the in-computation memory of

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Hongxiang Hao; Hanlin Mo; Hua Li

In this paper, we focus on removing the interference of motion blur by deriving motion blur invariants. Unlike earlier work, we don't restore any blurred image. Based on geometric moments and a mathematical model of motion blur, we prove that the geometric moments of a blurred image and of the original image are linearly related. Depending on this property, we can analyse whether an existing moment-based feature

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21
Ying Nie; Kai Han; Zhenhua Liu; An Xiao; Yiping Deng; Chunjing Xu; Yunhe Wang

Modern single image super-resolution (SISR) systems based on convolutional neural networks (CNNs) achieve impressive performance but require huge computational costs. The problem of feature redundancy is well studied in visual recognition tasks, but rarely discussed in SISR. Based on the observation that many features in SISR models are also similar to each other, we propose to use shift operation to

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-21

One of the problems of conventional visual quality evaluation criteria such as PSNR and MSE is the lack of appropriate standards based on the human visual system (HVS). They are calculated based on the difference of the corresponding pixels in the original and manipulated image. Hence, they practically do not provide a correct understanding of the image quality. Watermarking is an image processing

Updated: 2021-01-22
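
The pixel-difference criteria named in the entry above (MSE and PSNR) can be illustrated with a minimal sketch. It is not taken from the paper: images are assumed to be flat sequences of 8-bit intensities, and the names `mse`, `psnr` and `max_value` are chosen here for illustration.

```python
import math
from typing import Sequence


def mse(original: Sequence[float], distorted: Sequence[float]) -> float:
    """Mean squared error over corresponding pixels."""
    if len(original) != len(distorted) or not original:
        raise ValueError("images must be non-empty and of equal size")
    return sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)


def psnr(original: Sequence[float], distorted: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit pixels."""
    error = mse(original, distorted)
    if error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / error)


reference = [10, 20, 30, 40]
one_pixel_off = [10, 20, 30, 50]   # a single pixel changed by 10
all_pixels_off = [15, 25, 35, 45]  # every pixel changed by 5

# Both distortions give the same MSE, and therefore the same PSNR.
print(mse(reference, one_pixel_off), mse(reference, all_pixels_off))
```

Both distorted images score identically because the two measures depend only on per-pixel differences, which is the property the abstract points to.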
• arXiv.cs.CV Pub Date : 2021-01-21
Zhaowei Cai; Avinash Ravichandran; Subhransu Maji; Charless Fowlkes; Zhuowen Tu; Stefano Soatto

We present a plug-in replacement for batch normalization (BN) called exponential moving average normalization (EMAN), which improves the performance of existing student-teacher based self- and semi-supervised learning techniques. Unlike the standard BN, where the statistics are computed within each batch, EMAN, used in the teacher, updates its statistics by exponential moving average from the BN statistics

Updated: 2021-01-22
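
The update rule named in the entry above, an exponential moving average of normalization statistics, can be sketched generically. This is not the authors' implementation: the function name `ema_update`, the momentum value, and the choice of which statistics are passed in as `source_stats` are all assumptions made for illustration.

```python
from typing import List, Sequence


def ema_update(running_stats: Sequence[float],
               source_stats: Sequence[float],
               momentum: float = 0.99) -> List[float]:
    """Blend running statistics with newly observed ones.

    new = momentum * running + (1 - momentum) * source
    (momentum=0.99 is an illustrative default, not a value from the paper.)
    """
    if len(running_stats) != len(source_stats):
        raise ValueError("statistics must have the same length")
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    return [momentum * r + (1.0 - momentum) * s
            for r, s in zip(running_stats, source_stats)]


# Hypothetical per-channel means: the running values drift slowly
# toward the freshly observed ones.
running_mean = [0.0, 1.0]
observed_mean = [1.0, 1.0]
running_mean = ema_update(running_mean, observed_mean, momentum=0.9)
print(running_mean)
```

A momentum close to 1 makes the running statistics change slowly, so they do not depend on any single batch.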
• arXiv.cs.CV Pub Date : 2021-01-21
Suemin Lee; Ivan V. Bajić

Deep Neural Networks (DNNs) have become ubiquitous in medical image processing and analysis. Among them, U-Nets are very popular in various image segmentation tasks. Yet, little is known about how information flows through these networks and whether they are indeed properly designed for the tasks they are being proposed for. In this paper, we employ information-theoretic tools in order to gain insight

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-20
Lin Zhang; Tiziano Portenier; Orcun Goksel

Purpose. Given the high level of expertise required for navigation and interpretation of ultrasound images, computational simulations can facilitate the training of such skills in virtual reality. With ray-tracing based simulations, realistic ultrasound images can be generated. However, due to computational constraints for interactivity, image quality typically needs to be compromised. Methods. We

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-20
Balázs Maga

As the COVID-19 pandemic aggravated the excessive workload of doctors globally, the demand for computer aided methods in medical imaging analysis increased even further. Such tools can result in more robust diagnostic pipelines which are less prone to human errors. In our paper, we present a deep neural network which we refer to as Attention BCDU-Net, and apply it to the task of lung and heart segmentation

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-20
Vegard Antun; Matthew J. Colbrook; Anders C. Hansen

Deep learning (DL) has had unprecedented success and is now entering scientific computing with full force. However, DL suffers from a universal phenomenon: instability, despite universal approximating properties that often guarantee the existence of stable neural networks (NNs). We show the following paradox. There are basic well-conditioned problems in scientific computing where one can prove the

Updated: 2021-01-22
• arXiv.cs.CV Pub Date : 2021-01-19
Ammarah Farooq; Muhammad Awais; Josef Kittler; Syed Safwan Khalid

Cross-modal person re-identification (Re-ID) is critical for modern video surveillance systems. The key challenge is to align inter-modality representations according to semantic information present for a person and ignore background information. In this work, we present AXM-Net, a novel CNN based architecture designed for learning semantically aligned visual and textual representations. The underlying

Updated: 2021-01-21
• arXiv.cs.CV Pub Date : 2021-01-20
Achala Shakya; Mantosh Biswas; Mahesh Pal

SAR (VV and VH polarization) and optical data are widely used in image fusion to exploit their complementary information and to obtain a better-quality image (in terms of spatial and spectral features) for improved classification results. This paper uses anisotropic diffusion with PCA for the fusion of SAR and optical data and patch-based SVM Classification with LBP (LBP-PSVM). Fusion

Updated: 2021-01-21
• arXiv.cs.CV Pub Date : 2021-01-19
Huixiang Luo; Hao Cheng; Yuting Gao; Ke Li; Mengdan Zhang; Fanxu Meng; Xiaowei Guo; Feiyue Huang; Xing Sun

Conventional semi-supervised learning (SSL) methods, e.g., MixMatch, achieve great performance when both the labeled and unlabeled datasets are drawn from the same distribution. However, these methods often suffer severe performance degradation in a more realistic setting, where the unlabeled dataset contains out-of-distribution (OOD) samples. Recent approaches mitigate the negative influence of OOD samples

Updated: 2021-01-21
• arXiv.cs.CV Pub Date : 2021-01-20
Wentao Xie; Guanghui Ren; Si Liu

The video relation detection problem refers to the detection of relationships between different objects in videos, such as spatial relationships and action relationships. In this paper, we present video relation detection with trajectory-aware multi-modal features to solve this task. Considering the complexity of doing visual relation detection in videos, we decompose this task into three sub-tasks: object

Updated: 2021-01-21
• arXiv.cs.CV Pub Date : 2021-01-20
Yi-Fan Zhang; Weiqiang Ren; Zhang Zhang; Zhen Jia; Liang Wang; Tieniu Tan

In object detection, bounding box regression (BBR) is a crucial step that determines the object localization performance. However, we find that most previous loss functions for BBR have two main drawbacks: (i) Both $\ell_n$-norm and IOU-based loss functions are inefficient at depicting the objective of BBR, which leads to slow convergence and inaccurate regression results. (ii) Most of the loss functions

Updated: 2021-01-21
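
For readers unfamiliar with the IOU-based losses mentioned in the entry above, here is a minimal sketch of the plain IoU loss (1 - IoU) for axis-aligned boxes. It is the standard baseline, not the loss proposed in the paper; the `(x_min, y_min, x_max, y_max)` box layout and the function names are assumptions made for illustration.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def iou_loss(predicted: Box, target: Box) -> float:
    """Plain IoU loss: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - iou(predicted, target)


target = (0.0, 0.0, 2.0, 2.0)
print(iou_loss((1.0, 1.0, 3.0, 3.0), target))    # partial overlap
print(iou_loss((5.0, 5.0, 7.0, 7.0), target))    # no overlap
print(iou_loss((50.0, 50.0, 52.0, 52.0), target))  # no overlap, much farther away
```

The last two calls return the same value: once the boxes stop overlapping, the plain IoU loss no longer reflects how far apart they are.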
• arXiv.cs.CV Pub Date : 2021-01-20
Xiaopei Zhu; Xiao Li; Jianmin Li; Zheyao Wang; Xiaolin Hu

Thermal infrared detection systems play an important role in many areas such as night security, autonomous driving, and body temperature detection. They have the unique advantages of passive imaging, temperature sensitivity and penetration. But the security of these systems themselves has not been fully explored, which poses risks in applying these systems. We propose a physical attack method with

Updated: 2021-01-21
• arXiv.cs.CV Pub Date : 2021-01-20
Marrit Leenstra; Diego Marcos; Francesca Bovolo; Devis Tuia

While annotated images for change detection using satellite imagery are scarce and costly to obtain, there is a wealth of unlabeled images being generated every day. In order to leverage these data to learn an image representation more adequate for change detection, we explore methods that exploit the temporal consistency of Sentinel-2 time series to obtain a usable self-supervised learning signal

Updated: 2021-01-21
• arXiv.cs.CV Pub Date : 2021-01-20
Xiatian Zhu; Antoine Toisoul; Juan-Manuel Pérez-Rúa; Li Zhang; Brais Martinez; Tao Xiang

Few-shot action recognition aims to recognize action classes with few training samples. Most existing methods adopt a meta-learning approach with episodic training. In each episode, the few samples in a meta-training task are split into support and query sets. The former is used to build a classifier, which is then evaluated on the latter using a query-centered loss for model updating. There are however

Updated: 2021-01-21
• arXiv.cs.CV Pub Date : 2021-01-20
Fei Du; Bo Xu; Jiasheng Tang; Yuqi Zhang; Fan Wang; Hao Li

We extend the classical tracking-by-detection paradigm to this tracking-any-object task. Solid detection results are first extracted from the TAO dataset. Some state-of-the-art techniques like BAlanced-Group Softmax (BAGS) and DetectoRS are integrated during detection. Then we learned appearance features to represent any

Updated: 2021-01-21
• arXiv.cs.CV Pub Date : 2021-01-20
Zhonghao Zhang; Yipeng Liu; Xingyu Cao; Fei Wen; Ce Zhu

Deep learning has been applied to image compressive sensing (CS) for enhanced reconstruction performance. However, most existing deep learning methods train different models for different subsampling ratios, which brings additional hardware burden. In this paper, we develop a general framework named scalable deep compressive sensing (SDCS) for the scalable sampling and reconstruction (SSR) of all existing

Updated: 2021-01-21
• arXiv.cs.CV Pub Date : 2021-01-20
Zhangzi Zhu; Tianlei Wang; Hong Qu

Despite the fact that image captioning models have been able to generate impressive descriptions for a given image, challenges remain: (1) the controllability and diversity of existing models are still far from satisfactory; (2) models sometimes may produce extremely poor-quality captions. In this paper, two novel methods are introduced to solve the problems respectively. Specifically, for the former

Updated: 2021-01-21
• arXiv.cs.CV Pub Date : 2021-01-20
Yaoxin Zhuo; Baoxin Li

Federated Learning (FL) is a paradigm that aims to support loosely connected clients in learning a global model collaboratively with the help of a centralized server. The most popular FL algorithm is Federated Averaging (FedAvg), which is based on taking a weighted average of the client models, with the weights determined largely based on dataset sizes at the clients. In this paper, we propose a new

Updated: 2021-01-21
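
The FedAvg aggregation step described in the entry above, a weighted average of client models with weights set by client dataset sizes, can be sketched as follows. This covers only the baseline server-side averaging, not the new method the paper proposes; representing each model as a flat list of parameters and the name `fedavg` are assumptions made for illustration.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighting each by its dataset size."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one dataset size per client model")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total dataset size must be positive")
    dim = len(client_params[0])
    averaged = [0.0] * dim
    for params, size in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all client models must have the same shape")
        share = size / total  # this client's aggregation weight
        for i, value in enumerate(params):
            averaged[i] += share * value
    return averaged


# Two hypothetical clients: the second holds three times as much data,
# so the global model lands closer to its parameters.
global_model = fedavg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
print(global_model)
```

In a full training loop this aggregation would run once per communication round, after each client has trained locally on its own data.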
• arXiv.cs.CV Pub Date : 2021-01-20
Olga Moskvyak; Frederic Maire; Feras Dayoub; Mahsa Baktashmotlagh

Knowledge about the locations of keypoints of an object in an image can assist in fine-grained classification and identification tasks, particularly for the case of objects that exhibit large variations in poses that greatly influence their visual appearance, such as wild animals. However, supervised training of a keypoint detection network requires annotating a large image dataset for each animal

Updated: 2021-01-21
• arXiv.cs.CV Pub Date : 2021-01-20
Lin Mingbao; Ji Rongrong; Li Shaojie; Wang Yan; Wu Yongjian; Huang Feiyue; Ye Qixiang

Popular network pruning algorithms reduce redundant information by optimizing hand-crafted parametric models, which may lead to suboptimal performance and long filter-selection times. We innovatively introduce non-parametric modeling to simplify the algorithm design, resulting in an automatic and efficient pruning approach called EPruner. Inspired by the face recognition community, we use a message

Updated: 2021-01-21
• arXiv.cs.CV Pub Date : 2021-01-20
Zhi Chen; Ruihong Qiu; Sen Wang; Zi Huang; Jingjing Li; Zheng Zhang

Generalized Zero-Shot Learning (GZSL) aims to recognize images from both seen and unseen categories. Most GZSL methods typically learn to synthesize CNN visual features for the unseen classes by leveraging entire semantic information, e.g., tags and attributes, and the visual features of the seen classes. Within the visual features, we define two types of features that semantic-consistent and semantic-unrelated

Updated: 2021-01-21
• arXiv.cs.CV Pub Date : 2021-01-20
Ishan Dave; Rohit Gupta; Mamshad Nayeem Rizve; Mubarak Shah

Contrastive learning has nearly closed the gap between supervised and self-supervised learning of image representations. Existing extensions of contrastive learning to the domain of video data, however, rely on naive transposition of ideas from image-based methods and do not fully utilize the temporal dimension present in video. We develop a new temporal contrastive learning framework consisting of

Updated: 2021-01-21
• arXiv.cs.CV Pub Date : 2021-01-20
Long Chen; Junyu Dong; Huiyu Zhou

Underwater object detection is of great significance for various applications in underwater scenes. However, the class imbalance issue is still an unsolved bottleneck for current underwater object detection algorithms. It leads to large precision discrepancies among different classes, in that the dominant classes with more training data achieve higher detection precisions while the minority classes

Updated: 2021-01-21
Contents have been reproduced by permission of the publishers.
