Current journal: IEEE Transactions on Image Processing
  • Spaghetti Labeling: Directed Acyclic Graphs for Block-Based Connected Components Labeling.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-10-22
    Federico Bolelli,Stefano Allegretti,Lorenzo Baraldi,Costantino Grana

    Connected Components Labeling is an essential step of many Image Processing and Computer Vision tasks. Since the first proposal of a labeling algorithm, which dates back to the sixties, many approaches have optimized the computational load needed to label an image. In particular, the use of decision forests and state prediction has recently appeared as a valuable strategy to improve performance. However, due to the overhead of the manual construction of prediction states and the size of the resulting machine code, the application of these strategies has been restricted to small masks, thus ignoring the benefit of using a block-based approach. In this paper, we combine a block-based mask with state prediction and code compression: the resulting algorithm is modeled as a Directed Rooted Acyclic Graph with multiple entry points, which is automatically generated without manual intervention. When tested on synthetic and real datasets, in comparison with optimized implementations of state-of-the-art algorithms, the proposed approach shows superior performance, surpassing the results obtained by all compared approaches in all settings.
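    For context, the sketch below shows a minimal two-pass union-find labeling baseline in Python (assuming a binary NumPy image). It illustrates the task only, not the paper's Spaghetti algorithm, whose automatically generated decision DAG over 2x2 blocks replaces the per-pixel decisions made here.

    ```python
    import numpy as np

    def label_components(img):
        """Two-pass connected components labeling, 4-connectivity (baseline only)."""
        h, w = img.shape
        labels = np.zeros((h, w), dtype=np.int32)
        parent = [0]  # union-find forest; index 0 is the background dummy

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        def union(a, b):
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)

        # Pass 1: provisional labels plus equivalences between them.
        for y in range(h):
            for x in range(w):
                if not img[y, x]:
                    continue
                up = labels[y - 1, x] if y > 0 else 0
                left = labels[y, x - 1] if x > 0 else 0
                if up == 0 and left == 0:
                    parent.append(len(parent))      # fresh label
                    labels[y, x] = len(parent) - 1
                else:
                    labels[y, x] = min(up, left) if up and left else max(up, left)
                    if up and left:
                        union(up, left)

        # Pass 2: replace provisional labels by their root representatives.
        for y in range(h):
            for x in range(w):
                if labels[y, x]:
                    labels[y, x] = find(labels[y, x])
        return labels

    binary = np.array([[1, 1, 0, 0],
                       [0, 1, 0, 1],
                       [0, 0, 0, 1]], dtype=bool)
    print(label_components(binary))  # two components: labels 1 and 2
    ```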

    Updated: 2020-01-04
  • Learning Sparse and Identity-preserved Hidden Attributes for Person Re-identification.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-10-22
    Zheng Wang,Junjun Jiang,Yang Wu,Mang Ye,Xiang Bai,Shin'ichi Satoh

    Person re-identification (Re-ID) aims at matching person images captured in non-overlapping camera views. To represent person appearance, low-level visual features are sensitive to environmental changes, while high-level semantic attributes, such as "short-hair" or "long-hair", are relatively stable. Hence, researchers have started to design semantic attributes to reduce the visual ambiguity. However, training a prediction model for semantic attributes requires plenty of annotations, which are hard to obtain in practical large-scale applications. To alleviate the reliance on annotation efforts, we propose to incrementally generate Deep Hidden Attributes (DHAs) based on a baseline deep network, without requiring new annotations. In particular, we propose an auto-encoder model that can be plugged into any deep network to mine latent information in an unsupervised manner. To optimize the effectiveness of DHAs, we reform the auto-encoder model with an additional orthogonal generation module, along with identity-preserving and sparsity constraints. 1) Orthogonal generation: in order to make DHAs different from each other, Singular Value Decomposition (SVD) is introduced to generate DHAs orthogonally. 2) Identity-preserving constraint: the generated DHAs should be distinct enough to tell different persons apart, so we associate DHAs with person identities. 3) Sparsity constraint: to enhance the discriminability of DHAs, we also introduce a sparsity constraint to restrict the number of effective DHAs for each person. Experiments conducted on public datasets have validated the effectiveness of the proposed network. On two large-scale datasets, i.e., Market-1501 and DukeMTMC-reID, the proposed method outperforms the state-of-the-art methods.
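    As a rough illustration of the orthogonal generation step, the sketch below uses SVD to replace a hypothetical attribute projection matrix by the nearest matrix with orthonormal rows; where exactly this step sits inside the authors' auto-encoder is not specified here, and all names are illustrative.

    ```python
    import numpy as np

    def orthogonalize_rows(W):
        """Nearest matrix with orthonormal rows (Frobenius sense): W = U S Vt -> U Vt."""
        U, _, Vt = np.linalg.svd(W, full_matrices=False)
        return U @ Vt

    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 128))        # 8 hypothetical attribute directions
    W_orth = orthogonalize_rows(W)
    print(np.allclose(W_orth @ W_orth.T, np.eye(8)))  # True: mutually orthogonal
    ```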

    Updated: 2020-01-04
  • Variational Bayesian Blind Color Deconvolution of Histopathological Images.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-10-22
    Natalia Hidalgo-Gavira,Javier Mateos,Miguel Vega,Rafael Molina,Aggelos K Katsaggelos

    Most whole-slide histological images are stained with two or more chemical dyes. Slide stain separation or color deconvolution is a crucial step within the digital pathology workflow. In this paper, the blind color deconvolution problem is formulated within the Bayesian framework. Starting from a multi-stained histological image, our model takes into account both the spatial relations among the concentration image pixels and the similarity between a given reference color-vector matrix and the estimated one. Using Variational Bayes inference, three efficient new blind color deconvolution methods are proposed, which provide automated procedures to estimate all the model parameters in the problem. A comparison with classical and current state-of-the-art color deconvolution algorithms using real images has been carried out, demonstrating the superiority of the proposed approach.
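    As a point of reference, the classical non-blind deconvolution that these blind methods generalize can be sketched as follows; the Ruifrok-Johnston H&E stain vectors stand in for the kind of reference color-vector matrix the paper's prior is built around (values quoted from memory, so treat them as indicative).

    ```python
    import numpy as np

    # Reference stain matrix: columns are the OD vectors of hematoxylin and eosin.
    M_ref = np.array([[0.65, 0.07],
                      [0.70, 0.99],
                      [0.29, 0.11]])
    M_ref = M_ref / np.linalg.norm(M_ref, axis=0)   # unit-norm stain vectors

    def deconvolve(rgb, M=M_ref, i0=255.0):
        """Per-pixel stain concentrations of an (H, W, 3) image via least squares."""
        od = -np.log10(np.clip(rgb, 1, i0) / i0)        # optical density
        C = np.linalg.pinv(M) @ od.reshape(-1, 3).T     # unmix the two stains
        return C.T.reshape(rgb.shape[:2] + (2,))
    ```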

    Updated: 2020-01-04
  • Deep Adversarial Metric Learning.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : not available
    Yueqi Duan,Jiwen Lu,Wenzhao Zheng,Jie Zhou

    Learning an effective distance measurement between sample pairs plays an important role in visual analysis, where the training procedure largely relies on hard negative samples. However, hard negative samples usually account for a tiny minority of the training set, and may fail to fully describe the data distribution close to the decision boundary. In this paper, we present a deep adversarial metric learning (DAML) framework to generate synthetic hard negatives from the original negative samples, which is widely applicable to existing supervised deep metric learning algorithms. Different from existing sampling strategies which simply ignore numerous easy negatives, our DAML aims to exploit them by generating synthetic hard negatives adversarial to the learned metric as complements. We simultaneously train the feature embedding and the hard negative generator in an adversarial manner, so that adequate and targeted synthetic hard negatives are created to learn more precise distance metrics. As a single transformation may not be powerful enough to describe the global input space under the attack of the hard negative generator, we further propose a deep adversarial multi-metric learning (DAMML) method which learns multiple local transformations for a more complete description. We simultaneously exploit the collaborative and competitive relationships among multiple metrics, where the metrics display unity against the generator for effective distance measurement while also competing for more training data through a metric discriminator to avoid overlapping. Extensive experimental results on five benchmark datasets show that our DAML and DAMML effectively boost the performance of existing deep metric learning approaches through adversarial learning.
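    For contrast, the conventional hardest-negative mining baseline that DAML improves upon can be sketched as below (illustrative NumPy only); DAML instead feeds the easy negatives that this strategy discards to a generator that synthesizes hard negatives from them.

    ```python
    import numpy as np

    def batch_hard_triplet_loss(emb, labels, margin=0.2):
        """Triplet loss with hardest positive/negative mining within a batch."""
        d = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)  # pairwise distances
        same = labels[:, None] == labels[None, :]
        total, n = 0.0, 0
        for a in range(len(emb)):
            pos = same[a].copy()
            pos[a] = False                 # exclude the anchor itself
            neg = ~same[a]
            if not pos.any() or not neg.any():
                continue
            d_ap = d[a][pos].max()         # hardest positive
            d_an = d[a][neg].min()         # hardest (closest) negative
            total += max(0.0, d_ap - d_an + margin)
            n += 1
        return total / max(n, 1)
    ```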

    Updated: 2020-01-04
  • Combining Faster R-CNN and Model-Driven Clustering for Elongated Object Detection.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-10-28
    Fen Fang,Liyuan Li,Hongyuan Zhu,Joo-Hwee Lim

    While analyzing the performance of state-of-the-art R-CNN based generic object detectors, we find that the detection performance for objects with low object-region-percentages (ORPs) of the bounding boxes is much lower than the overall average. Elongated objects are examples. To address the problem of low ORPs for elongated object detection, we propose a hybrid approach which employs a Faster R-CNN to achieve robust detections of object parts, and a novel model-driven clustering algorithm to group the related partial detections and suppress false detections. First, we train a Faster R-CNN with partial region proposals of suitable and stable ORPs. Next, we introduce a deep CNN (DCNN) for orientation classification on the partial detections. Then, on the outputs of the Faster R-CNN and DCNN, the adaptive model-driven clustering algorithm first initializes a model of an elongated object with a data-driven process on local partial detections, and refines the model iteratively by model-driven clustering and data-driven model updating. By exploiting Faster R-CNN to produce robust partial detections and model-driven clustering to form a global representation, our method is able to generate a tight oriented bounding box for elongated object detection. We evaluate the effectiveness of our approach on two typical elongated objects in the COCO dataset, and on other typical elongated objects, including rigid objects (pens, screwdrivers and wrenches) and non-rigid objects (cracks). Experimental results show that, compared with the state-of-the-art approaches, our method achieves a large margin of improvement for both detection and localization of elongated objects in images.

    Updated: 2020-01-04
  • Semantic Image Segmentation by Scale-Adaptive Networks.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-10-28
    Zilong Huang,Chunyu Wang,Xinggang Wang,Wenyu Liu,Jingdong Wang

    Semantic image segmentation is an important yet unsolved problem. One of the major challenges is the large variability of the object scales. To tackle this scale problem, we propose a Scale-Adaptive Network (SAN) which consists of multiple branches with each one taking charge of the segmentation of the objects of a certain range of scales. Given an image, SAN first computes a dense scale map indicating the scale of each pixel which is automatically determined by the size of the enclosing object. Then the features of different branches are fused according to the scale map to generate the final segmentation map. To ensure that each branch indeed learns the features for a certain scale, we propose a scale-induced ground-truth map and enforce a scale-aware segmentation loss for the corresponding branch in addition to the final loss. Extensive experiments over the PASCAL-Person-Part, the PASCAL VOC 2012, and the Look into Person datasets demonstrate that our SAN can handle the large variability of the object scales and outperforms the state-of-the-art semantic segmentation methods.
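    The scale-map-weighted fusion idea can be sketched as follows, with a softmax-style soft assignment assumed for illustration; the weighting SAN actually learns is not reproduced here.

    ```python
    import numpy as np

    def fuse_branches(branch_feats, scale_map, centers):
        """Fuse per-scale branch features with weights from a dense scale map.

        branch_feats: list of K arrays of shape (C, H, W), one per branch.
        scale_map:    (H, W) estimated object scale at each pixel.
        centers:      the K scale values the branches are responsible for.
        """
        K = len(branch_feats)
        logits = -np.stack([(scale_map - c) ** 2 for c in centers])  # (K, H, W)
        w = np.exp(logits - logits.max(axis=0))                      # soft assignment
        w /= w.sum(axis=0)
        return sum(w[k] * branch_feats[k] for k in range(K))
    ```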

    Updated: 2020-01-04
  • Mask SSD: An Effective Single-stage Approach to Object Instance Segmentation.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : not available
    Hui Zhang,Yonglin Tian,Kunfeng Wang,Wensheng Zhang,Fei-Yue Wang

    We propose Mask SSD, an efficient and effective approach to address the challenging instance segmentation task. Based on a single-shot detector, Mask SSD detects all instances in an image and marks the pixels that belong to each instance. It consists of a detection subnetwork that predicts object categories and bounding box locations, and an instance-level segmentation subnetwork that generates the foreground mask for each instance. In the detection subnetwork, multi-scale and feedback features from different layers are used to better represent objects of various sizes and provide high-level semantic information. Then, we adopt an assistant classification network to guide per-class score prediction, which consists of objectness prior and category likelihood. The instance-level segmentation subnetwork outputs pixel-wise segmentation for each detection, taking the multi-scale and feedback features from different layers as input. These two subnetworks are jointly optimized by a multi-task loss function, which enables Mask SSD to directly predict detection and segmentation results. We conduct extensive experiments on PASCAL VOC, SBD, and MS COCO datasets to evaluate the performance of Mask SSD. Experimental results verify that, compared with state-of-the-art approaches, our proposed method achieves comparable precision with less speed overhead.

    Updated: 2020-01-04
  • Learning Latent Low-Rank and Sparse Embedding for Robust Image Feature Extraction.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-09-11
    Zhenwen Ren,Quansen Sun,Bin Wu,Xiaoqian Zhang,Wenzhu Yan

    To defy the curse of dimensionality, the inputs are always projected from the original high-dimensional space into a target low-dimensional space for feature extraction. However, due to the existence of noise and outliers, the feature extraction task for corrupted data is still a challenging problem. Recently, a robust method called low rank embedding (LRE) was proposed. Despite the success of LRE in experimental studies, it also has several disadvantages: 1) the learned projection cannot quantitatively interpret the importance of features; 2) LRE does not perform data reconstruction, so the features may not be capable of holding the main energy of the original "clean" data; 3) LRE explicitly transforms error into the target space; 4) LRE is an unsupervised method, which is only suitable for unsupervised scenarios. To address these problems, in this paper, we propose a novel method to exploit the latent discriminative features. In particular, we first utilize an orthogonal matrix to hold the main energy of the original data. Next, we introduce an ℓ1,1-norm term to encourage the features to be more compact, discriminative and interpretable. Then, we enforce a columnwise ℓ2,1-norm constraint on an error component to resist noise. Finally, we integrate a classification loss term into the objective function to fit supervised scenarios. Our method performs better than several state-of-the-art methods in terms of effectiveness and robustness, as demonstrated on six publicly available datasets.
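    For readers unfamiliar with the norms involved, the two penalties can be computed as below; the element-wise ℓ1,1-norm promotes compact features, while the column-wise ℓ2,1-norm drives whole columns of the error component to zero so that noise concentrated on a few samples is absorbed.

    ```python
    import numpy as np

    def l11_norm(W):
        """Element-wise l1,1-norm: promotes compact, interpretable features."""
        return np.abs(W).sum()

    def l21_norm_columnwise(E):
        """Column-wise l2,1-norm (sum of column l2-norms), as used on the
        error component to resist sample-specific noise."""
        return np.linalg.norm(E, axis=0).sum()
    ```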

    Updated: 2020-01-04
  • Convolutional Analysis Operator Learning: Acceleration and Convergence.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-09-05
    Il Yong Chun,Jeffrey A Fessler

    Convolutional operator learning is gaining attention in many signal processing and computer vision applications. Learning kernels has mostly relied on so-called patch-domain approaches that extract and store many overlapping patches across training signals. Due to memory demands, patch-domain methods have limitations when learning kernels from large datasets - particularly with multi-layered structures, e.g., convolutional neural networks - or when applying the learned kernels to high-dimensional signal recovery problems. The so-called convolution approach does not store many overlapping patches, and thus overcomes the memory problems particularly with careful algorithmic designs; it has been studied within the "synthesis" signal model, e.g., convolutional dictionary learning. This paper proposes a new convolutional analysis operator learning (CAOL) framework that learns an analysis sparsifying regularizer with the convolution perspective, and develops a new convergent Block Proximal Extrapolated Gradient method using a Majorizer (BPEG-M) to solve the corresponding block multi-nonconvex problems. To learn diverse filters within the CAOL framework, this paper introduces an orthogonality constraint that enforces a tight-frame filter condition, and a regularizer that promotes diversity between filters. Numerical experiments show that, with sharp majorizers, BPEG-M significantly accelerates the CAOL convergence rate compared to the state-of-the-art block proximal gradient (BPG) method. Numerical experiments for sparse-view computed tomography show that a convolutional sparsifying regularizer learned via CAOL significantly improves reconstruction quality compared to a conventional edge-preserving regularizer. Using more and wider kernels in a learned regularizer better preserves edges in reconstructed images.
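    The tight-frame filter condition can be illustrated by an SVD-based projection onto the constraint set, sketched below under the assumption that the K filters of size R are the columns of an R x K matrix D (K >= R) satisfying D Dᵀ = (1/R)·I; BPEG-M's majorized updates around this constraint are omitted.

    ```python
    import numpy as np

    def project_tight_frame(D):
        """Project D (R x K, K >= R) onto { D : D D^T = (1/R) I } via SVD."""
        R = D.shape[0]
        U, _, Vt = np.linalg.svd(D, full_matrices=False)
        return (U @ Vt) / np.sqrt(R)

    rng = np.random.default_rng(0)
    D = project_tight_frame(rng.normal(size=(9, 16)))  # e.g. sixteen 3x3 filters
    print(np.allclose(D @ D.T, np.eye(9) / 9))         # True: tight frame holds
    ```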

    Updated: 2020-01-04
  • Optical-Flow Based Nonlinear Weighted Prediction for SDR and Backward Compatible HDR Video Coding.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-10-16
    David Gommelet,Julien Le Tanou,Aline Roumy,Michael Ropert,Christine Guillemot

    Tone Mapping Operators (TMOs) designed for videos can be classified into two categories. In the first approach, TMOs are temporally filtered to reduce temporal artifacts and provide a Standard Dynamic Range (SDR) content with improved temporal consistency. This, however, does not improve the SDR coding Rate-Distortion (RD) performance. The second approach is to design the TMO with the goal of optimizing the SDR coding rate-distortion performance. This second category of methods may lead to SDR videos altering the artistic intent compared with the produced HDR content. In this paper, we combine the benefits of the two approaches by introducing new Weighted Prediction (WP) methods inside the HEVC SDR codec. As a first step, we demonstrate the interest of the WP methods compared to TMOs optimized for RD performance. Then we present the newly introduced WP algorithm and WP modes. The WP algorithm consists of performing a global motion compensation between frames using an optical flow, and the new modes are based on nonlinear functions, in contrast with the literature, which uses only linear functions. The contribution of each novelty is studied independently, and then they are all put in competition to maximize the RD performance. Tests were carried out for HDR backward-compatible compression and also for SDR compression alone. In both cases, the proposed WP methods improve the RD performance while maintaining the SDR temporal coherency.
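    A least-squares sketch of the weighted-prediction fit is given below: degree 1 corresponds to the classical linear HEVC WP (one weight plus one offset), while higher degrees stand in for the paper's nonlinear modes; the optical-flow global motion compensation that precedes the fit in the proposed algorithm is omitted.

    ```python
    import numpy as np

    def fit_wp(ref, cur, degree=1):
        """Fit cur ~ f(ref) between co-located pixels by least squares."""
        coeffs = np.polyfit(ref.ravel().astype(float),
                            cur.ravel().astype(float), degree)
        return np.poly1d(coeffs)   # callable prediction function

    # f = fit_wp(reference_frame, current_frame, degree=2)
    # predicted = f(reference_frame)
    ```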

    Updated: 2020-01-04
  • Discriminative and Uncorrelated Feature Selection with Constrained Spectral Analysis in Unsupervised Learning.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-02
    Xuelong Li,Han Zhang,Rui Zhang,Feiping Nie

    Existing unsupervised feature extraction methods frequently explore low-redundant features via an uncorrelated constraint. However, the constrained models might incur trivial solutions, due to the singularity of the scatter matrix triggered by high-dimensional data. In this paper, we propose a regularized regression model with a generalized uncorrelated constraint for feature selection, which leads to three merits: 1) exploring the low-redundant and discriminative features; 2) avoiding the trivial solutions; and 3) simplifying the optimization. Besides that, the local cluster structure is achieved via a novel constrained spectral analysis for unsupervised learning, where Must-Links and Cannot-Links are transformed into an intrinsic graph and a penalty graph respectively, rather than incorporated into a mixed affinity graph. Accordingly, a discriminative and uncorrelated feature selection with constrained spectral analysis (DUCFS) is proposed, adopting σ-norm regularization to interpolate between the F-norm and the ℓ2,1-norm. Due to the flexible gradient and global differentiability, our model converges fast. Extensive experiments on benchmark datasets against several state-of-the-art approaches verify the effectiveness of the proposed method.

    Updated: 2020-01-04
  • Face hallucination using cascaded super-resolution and identity priors.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-10-16
    Klemen Grm,Walter J Scheirer,Vitomir Struc

    In this paper we address the problem of hallucinating high-resolution facial images from low-resolution inputs at high magnification factors. We approach this task with convolutional neural networks (CNNs) and propose a novel (deep) face hallucination model that incorporates identity priors into the learning procedure. The model consists of two main parts: i) a cascaded super-resolution network that upscales the low-resolution facial images, and ii) an ensemble of face recognition models that act as identity priors for the super-resolution network during training. Different from most competing super-resolution techniques that rely on a single model for upscaling (even with large magnification factors), our network uses a cascade of multiple SR models that progressively upscale the low-resolution images in steps of 2×. This characteristic allows us to apply supervision signals (target appearances) at different resolutions and incorporate identity constraints at multiple scales. The proposed C-SRIP model (Cascaded Super Resolution with Identity Priors) is able to upscale (tiny) low-resolution images captured in unconstrained conditions and produce visually convincing results for diverse low-resolution inputs. We rigorously evaluate the proposed model on the Labeled Faces in the Wild (LFW), Helen and CelebA datasets and report superior performance compared to the existing state-of-the-art.
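    The progressive-upscaling structure can be sketched as follows, with spline interpolation standing in for each trained SR stage; the identity losses from the face recognition ensemble are indicated only in comments.

    ```python
    from scipy.ndimage import zoom

    def cascaded_upscale(lr, steps, sr_step=None):
        """Upscale an image by 2x per step, mimicking the C-SRIP cascade."""
        out = lr
        for _ in range(steps):
            out = sr_step(out) if sr_step else zoom(out, 2, order=3)
            # In the real model, supervision is applied to `out` at this
            # intermediate resolution, plus an identity prior from a face
            # recognition model in the ensemble.
        return out
    ```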

    Updated: 2020-01-04
  • Unsupervised Rotation Factorization in Restricted Boltzmann Machines.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-10-22
    Mario Valerio Giuffrida,Sotirios A Tsaftaris

    Finding suitable image representations for the task at hand is critical in computer vision. Different approaches extending the original Restricted Boltzmann Machine (RBM) model have recently been proposed to offer rotation-invariant feature learning. In this paper, we present a novel extended RBM that learns rotation-invariant features by explicitly factorizing out the rotation nuisance in 2D image inputs within an unsupervised framework. While the goal is to learn invariant features, our model infers an orientation per input image during training, using information related to the reconstruction error. The training process is regularised by a Kullback-Leibler divergence, offering stability and consistency. We use the γ-score, a measure that quantifies the amount of invariance, to mathematically and experimentally demonstrate that our approach indeed learns rotation-invariant features. We show that our method outperforms the current state-of-the-art RBM approaches for rotation-invariant feature learning on three different benchmark datasets, by measuring performance with the test accuracy of an SVM classifier. Our implementation is available at https://bitbucket.org/tuttoweb/rotinvrbm.

    Updated: 2020-01-04
  • Semi-Linearized Proximal Alternating Minimization for a Discrete Mumford-Shah Model.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-10-12
    Marion Foare,Nelly Pustelnik,Laurent Condat

    The Mumford-Shah model is a standard model in image segmentation, and due to its difficulty, many approximations have been proposed. The major interest of this functional is to enable joint image restoration and contour detection. In this work, we propose a general formulation of the discrete counterpart of the Mumford-Shah functional, adapted to nonsmooth penalizations and fitting the assumptions required by the Proximal Alternating Linearized Minimization (PALM) algorithm, with convergence guarantees. A second contribution aims to relax some assumptions on the involved functionals and derive a novel Semi-Linearized Proximal Alternating Minimization (SL-PAM) algorithm, with proven convergence. We compare the performance of the algorithm with several nonsmooth penalizations, for Gaussian and Poisson denoising, image restoration and RGB-color denoising. We compare the results with state-of-the-art convex relaxations of the Mumford-Shah functional and a discrete version of the Ambrosio-Tortorelli functional. We show that the SL-PAM algorithm is faster than the original PALM algorithm and leads to competitive denoising, restoration and segmentation results.

    Updated: 2020-01-04
  • A Deep Learning Reconstruction Framework for Differential Phase-Contrast Computed Tomography with Incomplete Data.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : not available
    Jian Fu,Jianbing Dong,Feng Zhao

    Differential phase-contrast computed tomography (DPC-CT) is a powerful analysis tool for soft-tissue and low-atomic-number samples. Limited by practical implementation conditions, DPC-CT frequently has to deal with incomplete projections. Conventional reconstruction algorithms face difficulty when given incomplete data. They usually involve complicated parameter selection operations, which are sensitive to noise and are time-consuming. In this paper, we report a new deep learning reconstruction framework for incomplete-data DPC-CT. It involves the tight coupling of a deep learning neural network and the DPC-CT reconstruction algorithm in the domain of DPC projection sinograms. The estimated result is not an image corrupted by incomplete-data artifacts, but a complete phase-contrast projection sinogram. Once trained, this framework is fixed and can be used to reconstruct the final DPC-CT images for a given incomplete projection sinogram. Taking sparse-view, limited-view and missing-view DPC-CT as examples, this framework is validated and demonstrated with synthetic and experimental data sets. Compared with other methods, our framework achieves the best imaging quality at a faster speed and with fewer parameters. This work supports the application of state-of-the-art deep learning theory in the field of DPC-CT.

    Updated: 2020-01-04
  • Super Diffusion for Salient Object Detection.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-26
    Peng Jiang,Zhiyi Pan,Changhe Tu,Nuno Vasconcelos,Baoquan Chen,Jingliang Peng

    One major branch of salient object detection methods is diffusion-based: a graph model is constructed on a given image and seed saliency values are diffused to the whole graph by a diffusion matrix. While the performance of these methods is sensitive to the specific feature spaces and scales used to define the diffusion matrix, little work has been published to systematically promote the robustness and accuracy of salient object detection under the generic mechanism of diffusion. In this work, we first present a novel view of the working mechanism of the diffusion process based on mathematical analysis, which reveals that the diffusion process is actually computing the similarity of nodes with respect to the seeds based on diffusion maps. Following this analysis, we propose super diffusion, a novel inclusive learning-based framework for salient object detection, which achieves optimal and robust performance by integrating a large pool of feature spaces, scales and even features originally computed for non-diffusion-based salient object detection. A closed-form solution of the optimal parameters for the integration is determined through supervised learning. At the local level, we propose to promote each individual diffusion before the integration. Our mathematical analysis reveals the close relationship between saliency diffusion and spectral clustering. Based on this, we propose to re-synthesize each individual diffusion matrix from the most discriminative eigenvectors and the constant eigenvector (for saliency normalization). The proposed framework is implemented and experimented on prevalently used benchmark datasets, consistently leading to state-of-the-art performance.
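    The kind of diffusion being analyzed can be illustrated with the classic manifold-ranking formulation below; the paper's contribution, learning to integrate a large pool of such diffusion matrices and re-synthesizing each from its most discriminative eigenvectors, is not reproduced here.

    ```python
    import numpy as np

    def diffuse_saliency(W, seeds, alpha=0.99):
        """Diffuse seed saliency over a graph with affinity matrix W:
        f = (D - alpha * W)^(-1) y, with D the degree matrix."""
        D = np.diag(W.sum(axis=1))
        return np.linalg.solve(D - alpha * W, seeds)
    ```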

    Updated: 2019-11-01
  • The Structure Transfer Machine Theory and Applications.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-26
    Baochang Zhang,Wankou Yang,Ze Wang,Lian Zhuo,Jungong Han,Xiantong Zhen

    Representation learning is a fundamental but challenging problem, especially when the distribution of data is unknown. In this paper, we propose a new representation learning method, named the Structure Transfer Machine (STM), which enables the feature learning process to converge to the representation expectation in a probabilistic way. We theoretically show that such an expected value of the representation (the mean) is achievable if the manifold structure can be transferred from the data space to the feature space. The resulting structure regularization term, named the manifold loss, is incorporated into the loss function of the typical deep learning pipeline. The STM architecture is constructed to enforce the learned deep representation to satisfy the intrinsic manifold structure of the data, which results in robust features that suit various application scenarios, such as digit recognition, image classification and object tracking. Compared with state-of-the-art CNN architectures, we achieve better results on several commonly used public benchmarks.

    Updated: 2019-11-01
  • Discriminative Residual Analysis for Image Set Classification with Posture and Age Variations.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-26
    Chuan-Xian Ren,You-Wei Luo,Xiao-Lin Xu,Dao-Qing Dai,Hong Yan

    Image set recognition has been widely applied in many practical problems like real-time video retrieval and image captioning tasks. Due to its superior performance, it has grown into a significant topic in recent years. However, images with complicated variations, e.g., postures and human ages, are difficult to address, as these variations are continuous and gradual with respect to image appearance. Consequently, the crucial point of image set recognition is to mine the intrinsic connection or structural information from the image batches with variations. In this work, a Discriminant Residual Analysis (DRA) method is proposed to improve the classification performance by discovering discriminant features in related and unrelated groups. Specifically, DRA attempts to obtain a powerful projection which casts the residual representations into a discriminant subspace. Such a projection subspace is expected to magnify the useful information of the input space as much as possible, so that the relation between the training set and the test set described by the given metric or distance will be more precise in the discriminant subspace. We also propose a nonfeasance strategy, which defines another approach to construct the unrelated groups and helps to further reduce the cost of sampling errors. Two regularization approaches are used to deal with the probable small sample size problem. Extensive experiments are conducted on benchmark databases, and the results show the superiority and efficiency of the new methods.

    Updated: 2019-11-01
  • MAVA: Multi-level Adaptive Visual-textual Alignment by Cross-media Bi-attention Mechanism.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-26
    Yuxin Peng,Jinwei Qi,Yunkan Zhuo

    The rapidly developing information technology leads to fast growth of visual and textual contents, and it comes with huge challenges to correlate and perform cross-media retrieval between images and sentences. Existing methods mainly explore cross-media correlation either from global-level instances, i.e., whole images and sentences, or from local-level fine-grained patches, i.e., discriminative image regions and key words, which ignores the complementary information from the relations between local-level fine-grained patches. Naturally, relation understanding is highly important for learning cross-media correlation. People focus not only on the alignment between discriminative image regions and key words, but also on their relations lying in the visual and textual context. Therefore, in this paper, we propose the Multi-level Adaptive Visual-textual Alignment (MAVA) approach with the following contributions. First, we propose a cross-media multi-pathway fine-grained network to extract not only the local fine-grained patches, i.e., discriminative image regions and key words, but also visual relations between image regions as well as textual relations from the context of sentences, which contain complementary information to exploit fine-grained characteristics within different media types. Second, we propose a visual-textual bi-attention mechanism to distinguish the fine-grained information with different saliency from both local and relation levels, which can provide more discriminative hints for correlation learning. Third, we propose cross-media multi-level adaptive alignment to explore global, local and relation alignments. An adaptive alignment strategy is further proposed to enhance the matched pairs of different media types, and discard misalignments adaptively to learn more precise cross-media correlation. Extensive experiments are conducted on image-sentence matching on two widely-used cross-media datasets, namely Flickr-30K and MS-COCO, compared with 10 state-of-the-art methods, which fully verifies the effectiveness of our proposed MAVA approach.

    Updated: 2019-11-01
  • Context-Interactive CNN for Person Re-Identification.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-22
    Wenfeng Song,Shuai Li,Tao Chang,Aimin Hao,Qinping Zhao,Hong Qin

    Despite growing progress in recent years, cross-scenario person re-identification remains challenging, mainly because pedestrians are commonly surrounded by highly complex environmental contexts. In reality, the human perception mechanism can adaptively find proper contextualized spatial-temporal clues for pedestrian recognition. However, conventional methods fall short in adaptively leveraging long-term spatial-temporal information due to ever-increasing computational cost. Moreover, CNN-based deep learning methods are hard to optimize due to the non-differentiable property of the built-in context search operation. To ameliorate this, this paper proposes a novel Context-Interactive CNN (CI-CNN) to dynamically find both spatial and temporal contexts by embedding multi-task Reinforcement Learning (MTRL). The CI-CNN streamlines the multi-task reinforcement learning by using an actor-critic agent to capture the temporal-spatial context simultaneously, comprising a context-policy network and a context-critic network. The former network learns policies to determine the optimal spatial context region and temporal sequence range. Based on the inferred temporal-spatial cues, the latter focuses on the identification task and provides feedback for the policy network. Thus, CI-CNN can simultaneously zoom in/out the perception field in the spatial and temporal domains for context interaction with the environment. By fostering the collaborative interaction between the person and context, our method achieves outstanding performance on various public benchmarks, which confirms the rationality of our hypothesis and verifies the effectiveness of the CI-CNN framework.

    Updated: 2019-11-01
  • Latent Elastic-Net Transfer Learning.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-22
    Na Han,Jigang Wu,Xiaozhao Fang,Shengli Xie,Shanhua Zhan,Kan Xie,Xuelong Li

    Subspace learning based transfer learning methods commonly find a common subspace where the discrepancy between the source and target domains is reduced. The final classification is also performed in this subspace. However, the minimum discrepancy does not guarantee the best classification performance, and thus the common subspace may not be the most discriminative. In this paper, we propose a latent elastic-net transfer learning (LET) method that simultaneously learns a latent subspace and a discriminative subspace. Specifically, the data from different domains can be well interlaced in the latent subspace by minimizing the Maximum Mean Discrepancy (MMD). Since the latent subspace decouples inputs and outputs, a more compact data representation is obtained for discriminative subspace learning. Based on the latent subspace, we further propose a low-rank constraint based matrix elastic-net regression to learn another subspace in which the intrinsic intra-class structure correlations of data from different domains are well captured. In doing so, a better discriminative alignment is guaranteed and thus LET finally learns another discriminative subspace for classification. Experiments on visual domain adaptation tasks show the superiority of the proposed LET method.
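    The MMD term used to interlace the two domains in the latent subspace can be sketched as follows (an RBF kernel is assumed; the low-rank matrix elastic-net regression built on top of the subspace is omitted).

    ```python
    import numpy as np

    def mmd_rbf(X, Y, gamma=1.0):
        """Squared Maximum Mean Discrepancy between samples X and Y."""
        def k(A, B):
            d2 = ((A[:, None] - B[None, :]) ** 2).sum(-1)  # pairwise sq. dists
            return np.exp(-gamma * d2)
        return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()
    ```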

    Updated: 2019-11-01
  • A Multi-domain and Multi-modal Representation Disentangler for Cross-Domain Image Manipulation and Classification.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-22
    Fu-En Yang,Jing-Cheng Chang,Chung-Chi Tsai,Yu-Chiang Frank Wang

    Learning interpretable data representation has been an active research topic in deep learning and computer vision. While representation disentanglement is an effective technique for addressing this task, existing works cannot easily handle problems in which manipulating and recognizing data across multiple domains is desirable. In this paper, we present a unified network architecture, the Multi-domain and Multi-modal Representation Disentangler (M2RD), with the goal of learning a domain-invariant content representation with the associated domain-specific representation observed. By advancing adversarial learning and disentanglement techniques, the proposed model is able to perform continuous image manipulation across data domains with multiple modalities. More importantly, the resulting domain-invariant feature representation can be applied to unsupervised domain adaptation. Finally, our quantitative and qualitative results confirm the effectiveness and robustness of the proposed model over state-of-the-art methods on the above tasks.

    Updated: 2019-11-01
  • Adaptive Sample-level Graph Combination for Partial Multiview Clustering.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-22
    Liu Yang,Chenyang Shen,Qinghua Hu,Liping Jing,Yingbo Li

    Multiview clustering explores complementary information among distinct views to enhance clustering performance under the assumption that all samples have complete information in all available views. However, this assumption does not hold in many real applications, where the information of some samples in one or more views may be missing, leading to partial multiview clustering problems. In this case, significant performance degeneration is usually observed. A collection of partial multiview clustering algorithms has been proposed to address this issue and most treat all different views equally during clustering. In fact, because different views provide features collected from different angles/feature spaces, they might play different roles in the clustering process. With the diversity of different views considered, in this study, a novel adaptive method is proposed for partial multiview clustering by automatically adjusting the contributions of different views. The samples are divided into complete and incomplete sets, while a joint learning mechanism is established to facilitate the connection between them and thereby improve clustering performance. More specifically, the method is characterized by a joint optimization model comprising two terms. The first term mines the underlying cluster structure from both complete and incomplete samples by adaptively updating their importance in all available views. The second term is designed to group all data with the aid of the cluster structure modeled in the first term. These two terms seamlessly integrate the complementary information among multiple views and enhance the performance of partial multiview clustering. Experimental results on real-world datasets illustrate the effectiveness and efficiency of our proposed method.

    Updated: 2019-11-01
  • Semi-Supervised Image Dehazing.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-22
    Lerenhan Li,Yunlong Dong,Wenqi Ren,Jinshan Pan,Changxin Gao,Nong Sang,Ming-Hsuan Yang

    We present an effective semi-supervised learning algorithm for single image dehazing. The proposed algorithm applies a deep Convolutional Neural Network (CNN) containing a supervised learning branch and an unsupervised learning branch. In the supervised branch, the deep neural network is constrained by the supervised loss functions, which are mean squared, perceptual, and adversarial losses. In the unsupervised branch, we exploit the properties of clean images via sparsity of dark channel and gradient priors to constrain the network. We train the proposed network on both the synthetic data and real-world images in an end-to-end manner. Our analysis shows that the proposed semi-supervised learning algorithm is not limited to synthetic training datasets and can be generalized well to real-world images. Extensive experimental results demonstrate that the proposed algorithm performs favorably against the state-of-the-art single image dehazing algorithms on both benchmark datasets and real-world images.

    Updated: 2019-11-01
  • Color Channel Compensation (3C): A fundamental pre-processing step for image enhancement.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-22
    Codruta O Ancuti,Cosmin Ancuti,Christophe De Vleeschouwer,Mateu Sbert

    This article introduces a novel solution to improve image enhancement in terms of color appearance. Our approach, called Color Channel Compensation (3C), overcomes artifacts resulting from the severely non-uniform color spectrum distribution encountered in images captured under hazy night-time conditions, underwater, or under non-uniform artificial illumination. Our solution is founded on the observation that, under such adverse conditions, the information contained in at least one color channel is close to completely lost, making the traditional enhancing techniques subject to noise and color shifting. In those cases, our pre-processing method proposes to reconstruct the lost channel based on the opponent color channel. Our algorithm subtracts a local mean from each opponent color pixel. Thereby, it partly recovers the lost color from the two colors (red-green or blue-yellow) involved in the opponent color channel. The proposed approach, whilst simple, is shown to consistently improve the outcome of conventional restoration methods. To prove the utility of our 3C operator, we provide an extensive qualitative and quantitative evaluation for white balancing, image dehazing, and underwater enhancement applications.
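    A rough sketch of the compensation idea follows. It uses a global mean, in the spirit of the authors' earlier red-channel compensation for underwater white balancing, whereas the actual 3C operator subtracts local means over the opponent pairs (red-green, blue-yellow); the alpha parameter is an assumption.

    ```python
    def compensate_channel(weak, strong, alpha=1.0):
        """Rebuild an attenuated channel from its opponent channel.

        weak, strong: (H, W) float arrays in [0, 1]. The correction is
        largest where the weak channel lost the most signal.
        """
        gap = strong.mean() - weak.mean()          # average loss of the channel
        return weak + alpha * gap * (1 - weak) * strong

    # r_comp = compensate_channel(img[..., 0], img[..., 1])  # red from green
    ```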

    Updated: 2019-11-01
  • Phase asymmetry ultrasound despeckling with fractional anisotropic diffusion and total variation.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-22
    Kunqiang Mei,Bin Hu,Baowei Fei,Binjie Qin

    We propose an ultrasound speckle filtering method that not only preserves various edge features but also filters tissue-dependent complex speckle noise in ultrasound images. The key idea is to detect these various edges using a phase congruence-based edge significance measure called phase asymmetry (PAS), which is invariant to the intensity amplitude of edges and takes the value 0 in non-edge smooth regions and 1 at an ideal step edge, while taking intermediate values at slowly varying ramp edges. By leveraging the PAS metric in designing weighting coefficients to maintain a balance between fractional-order anisotropic diffusion and total variation (TV) filters in the TV cost function, we propose a new fractional TV framework that both achieves the best despeckling performance with ramp edge preservation and reduces the staircase effect produced by integer-order filters. Then, we exploit the PAS metric in designing a new fractional-order diffusion coefficient to properly preserve low-contrast edges in diffusion filtering. Finally, different from fixed fractional-order diffusion filters, an adaptive fractional order is introduced based on the PAS metric to enhance various weak edges in the spatially transitional areas between objects. The proposed fractional TV model is minimized using the gradient descent method to obtain the final denoised image. Experimental results and a real application to ultrasound breast image segmentation show that the proposed method outperforms other state-of-the-art ultrasound despeckling filters for both speckle reduction and feature preservation in terms of visual evaluation and quantitative indices. The best feature-similarity scores reach 0.867, 0.844 and 0.834 under three different noise levels, while the best breast ultrasound segmentation accuracies in terms of the mean and median Dice similarity coefficient are 96.25% and 96.15%, respectively.

    Updated: 2019-11-01
  • Unsupervised Deep Contrast Enhancement with Power Constraint for OLED Displays.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-22
    Yong-Goo Shin,Seung Park,Yoon-Jae Yeo,Min-Jae Yoo,Sung-Jea Ko

    Various power-constrained contrast enhancement (PCCE) techniques have been applied to organic light emitting diode (OLED) displays to reduce the power demands of the display while preserving image quality. In this paper, we propose a new deep learning-based PCCE scheme that constrains the power consumption of OLED displays while enhancing the contrast of the displayed image. In the proposed method, the power consumption is constrained by simply reducing the brightness by a certain ratio, whereas the perceived visual quality is preserved as much as possible by enhancing the contrast of the image using a convolutional neural network (CNN). Furthermore, our CNN can learn the PCCE technique without a reference image by unsupervised learning. Experimental results show that the proposed method is superior to conventional ones in terms of image quality assessment metrics such as the visual saliency-induced index (VSI) and the measure of enhancement (EME).

    Updated: 2019-11-01
  • Deep Guided Learning for Fast Multi-Exposure Image Fusion.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-22
    Kede Ma,Zhengfang Duanmu,Hanwei Zhu,Yuming Fang,Zhou Wang

    We propose a fast multi-exposure image fusion (MEF) method, namely MEF-Net, for static image sequences of arbitrary spatial resolution and exposure number. We first feed a low-resolution version of the input sequence to a fully convolutional network for weight map prediction. We then jointly upsample the weight maps using a guided filter. The final image is computed by a weighted fusion. Unlike conventional MEF methods, MEF-Net is trained end-to-end by optimizing the perceptually calibrated MEF structural similarity (MEF-SSIM) index over a database of training sequences at full resolution. Across an independent set of test sequences, we find that the optimized MEF-Net achieves consistent improvement in visual quality for most sequences, and runs 10 to 1000 times faster than state-of-the-art methods. The code is made publicly available.
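    The predict-at-low-resolution, fuse-at-full-resolution structure can be sketched as below, with plain spline upsampling standing in for the guided-filter upsampling MEF-Net uses; a grayscale stack is assumed.

    ```python
    import numpy as np
    from scipy.ndimage import zoom

    def fuse_exposures(seq, low_res_weights):
        """seq: (K, H, W) exposure stack; low_res_weights: (K, h, w) maps."""
        K, H, W = seq.shape
        w = np.stack([zoom(m, (H / m.shape[0], W / m.shape[1]), order=1)
                      for m in low_res_weights])   # upsample each weight map
        w = np.clip(w, 1e-6, None)
        w /= w.sum(axis=0, keepdims=True)          # weights sum to 1 per pixel
        return (w * seq).sum(axis=0)               # weighted fusion
    ```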

    Updated: 2019-11-01
  • MonoFENet: Monocular 3D Object Detection with Feature Enhancement Networks.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-15
    Wentao Bao,Bin Xu,Zhenzhong Chen

    Monocular 3D object detection has the merit of low cost and can serve as an auxiliary module for autonomous driving systems, and it has attracted growing attention in recent years. In this paper, we present a monocular 3D object detection method with feature enhancement networks, which we call MonoFENet. Specifically, with the disparity estimated from the input monocular image, the features of both the 2D and 3D streams can be enhanced and utilized for accurate 3D localization. For the 2D stream, the input image is used to generate 2D region proposals as well as to extract appearance features. For the 3D stream, the estimated disparity is transformed into a 3D dense point cloud, which is then enhanced by the associated front view maps. With the RoI Mean Pooling layer, 3D geometric features of RoI point clouds are further enhanced by the proposed point feature enhancement (PointFE) network. The region-wise features of the image and point cloud are fused for the final 2D and 3D bounding box regression. The experimental results on the KITTI benchmark reveal that our method can achieve state-of-the-art performance for monocular 3D object detection.

    Updated: 2019-11-01
  • A Context Knowledge Map Guided Coarse-to-fine Action Recognition.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-15
    Yanli Ji,Yue Zhan,Yang Yang,Xing Xu,Fumin Shen,Heng Tao Shen

    Human actions involve a wide variety and a large number of categories, which leads to a big challenge in action recognition. However, according to similarities in human body poses, scenes and interactive objects, human actions can be grouped into semantic groups, e.g., sports, cooking, etc. Therefore, in this paper, we propose a novel approach which recognizes human actions from coarse to fine. Taking full advantage of contributions from high-level semantic contexts, a context knowledge map guided recognition method is designed to realize the coarse-to-fine procedure. In the approach, we define semantic contexts with interactive objects, scenes and body motions in action videos, and build a context knowledge map to automatically define coarse-grained groups. Then fine-grained classifiers are proposed to realize accurate action recognition. The coarse-to-fine procedure narrows the action categories in the target classifiers, which is beneficial to improving recognition performance. We evaluate the proposed approach on the CCV, HMDB-51, and UCF-101 databases. Experiments verify its effectiveness: on average, it improves recognition precision by more than 5% over current approaches. Compared with the state-of-the-art, it also obtains outstanding performance, achieving accuracies of 93.1%, 95.4% and 74.5% on the CCV, UCF-101 and HMDB-51 databases, respectively.

    Updated: 2019-11-01
  • PaDNet: Pan-Density Crowd Counting.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-15
    Yukun Tian,Yiming Lei,Junping Zhang,James Z Wang

    Crowd counting is a highly challenging problem in computer vision and machine learning. Most previous methods have focused on consistent density crowds, i.e., either a sparse or a dense crowd, meaning they performed well in global estimation while neglecting local accuracy. To make crowd counting more useful in the real world, we propose a new perspective, named pan-density crowd counting, which aims to count people in varying density crowds. Specifically, we propose the Pan-Density Network (PaDNet), which is composed of the following critical components. First, the Density-Aware Network (DAN) contains multiple subnetworks pretrained on scenarios with different densities. This module is capable of capturing pan-density information. Second, the Feature Enhancement Layer (FEL) effectively captures the global and local contextual features and generates a weight for each density-specific feature. Third, the Feature Fusion Network (FFN) embeds spatial context and fuses these density-specific features. Further, the metrics Patch MAE (PMAE) and Patch RMSE (PRMSE) are proposed to better evaluate performance on global and local estimation. Extensive experiments on four crowd counting benchmark datasets, ShanghaiTech, UCF-CC-50, UCSD, and UCF-QNRF, indicate that PaDNet achieves state-of-the-art recognition performance and high robustness in pan-density crowd counting.
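    Sketched from the abstract's description, the patch-level metrics might be computed as below over a grid split of the estimated and ground-truth density maps; the exact patch layout used in the paper may differ.

    ```python
    import numpy as np

    def patch_counting_errors(est, gt, grid=4):
        """Return (PMAE, PRMSE) over a grid x grid split of the density maps."""
        ph, pw = est.shape[0] // grid, est.shape[1] // grid
        errs = []
        for i in range(grid):
            for j in range(grid):
                block = np.s_[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
                errs.append(est[block].sum() - gt[block].sum())  # count error
        errs = np.array(errs)
        return np.abs(errs).mean(), np.sqrt((errs ** 2).mean())
    ```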

    Updated: 2019-11-01
  • High-order Feature Learning for Multi-atlas based Label Fusion: Application to Brain Segmentation with MRI.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-15
    Liang Sun,Wei Shao,Mingliang Wang,Daoqiang Zhang,Mingxia Liu

    Multi-atlas based segmentation methods have shown their effectiveness in brain regions-of-interest (ROIs) segmentation, by propagating labels from multiple atlases to a target image based on the similarity between patches in the target image and multiple atlas images. Most of the existing multi-atlas based methods use image intensity features to calculate the similarity between a pair of image patches for label fusion. In particular, using only low-level image intensity features cannot adequately characterize the complex appearance patterns (e.g., the high-order relationship between voxels within a patch) of brain magnetic resonance (MR) images. To address this issue, this paper develops a high-order feature learning framework for multi-atlas based label fusion, where high-order features of image patches are extracted and fused for segmenting ROIs of structural brain MR images. Specifically, an unsupervised feature learning method (i.e., the means-covariances restricted Boltzmann machine, mcRBM) is employed to learn high-order features (i.e., mean and covariance features) of patches in brain MR images. Then, a group-fused sparsity dictionary learning method is proposed to jointly calculate the voting weights for label fusion, based on the learned high-order and the original image intensity features. The proposed method is compared with several state-of-the-art label fusion methods on the ADNI, NIREP and LONI-LPBA40 datasets. The Dice ratios achieved by our method are 88.30%, 88.83%, 79.54% and 81.02% on the left and right hippocampus across the ADNI, NIREP and LONI-LPBA40 datasets, while the best Dice ratios yielded by the other methods are 86.51%, 87.39%, 78.48% and 79.65% on the three datasets, respectively.
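    For orientation, conventional similarity-weighted label fusion for a single target voxel is sketched below with generic patch features; the paper replaces raw intensity features by mcRBM mean/covariance features and learns the voting weights jointly via group-fused sparse dictionary learning.

    ```python
    import numpy as np

    def fuse_labels(target_feat, atlas_feats, atlas_labels, sigma=1.0):
        """target_feat: (d,); atlas_feats: (N, d); atlas_labels: (N,)."""
        d2 = ((atlas_feats - target_feat) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2 * sigma ** 2))         # patch-similarity weights
        votes = {}
        for lbl, wi in zip(atlas_labels, w):
            votes[lbl] = votes.get(lbl, 0.0) + wi
        return max(votes, key=votes.get)           # weighted majority vote
    ```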

    Updated: 2019-11-01
  • Unsupervised Single Image Dehazing Using Dark Channel Prior Loss.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-15
    Alona Golts,Daniel Freedman,Michael Elad

    Single image dehazing is a critical stage in many modern-day autonomous vision applications. Early prior-based methods often involved a time-consuming minimization of a hand-crafted energy function. Recent learning-based approaches utilize the representational power of deep neural networks (DNNs) to learn the underlying transformation between hazy and clear images. Due to inherent limitations in collecting matching clear and hazy images, these methods resort to training on synthetic data, constructed from indoor images and corresponding depth information. This may result in a possible domain shift when treating outdoor scenes. We propose a completely unsupervised method of training via minimization of the well-known Dark Channel Prior (DCP) energy function. Instead of feeding the network with synthetic data, we solely use real-world outdoor images and tune the network's parameters by directly minimizing the DCP. Although our "Deep DCP" technique can be regarded as a fast approximator of DCP, it actually improves its results significantly. This suggests an additional regularization obtained via the network and learning process. Experiments show that our method performs on par with large-scale supervised methods.
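    The dark channel at the heart of this training loss can be computed as below; the differentiable relaxation the network actually minimizes is not reproduced here.

    ```python
    import numpy as np
    from scipy.ndimage import minimum_filter

    def dark_channel(img, patch=15):
        """Dark channel of an (H, W, 3) image in [0, 1]: per-pixel RGB minimum
        followed by a patch-wise minimum. Near zero on haze-free outdoor images
        (the Dark Channel Prior), so its magnitude can serve as a loss."""
        return minimum_filter(img.min(axis=2), size=patch)

    # dcp_loss = dark_channel(dehazed_output).mean()   # smaller is better
    ```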

    Updated: 2019-11-01
  • Fast online 3D reconstruction of dynamic scenes from individual single-photon detection events.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-15
    Yoann Altmann,Stephen McLaughlin,Michael E Davies

    In this paper, we present an algorithm for online 3D reconstruction of dynamic scenes using individual times of arrival (ToA) of photons recorded by single-photon detector arrays. One of the main challenges in 3D imaging using single-photon Lidar is the integration time required to build ToA histograms and reliably reconstruct 3D profiles in the presence of non-negligible ambient illumination. This long integration time also prevents the analysis of rapid dynamic scenes using existing techniques. We propose a new method which does not rely on the construction of ToA histograms but allows, for the first time, individual detection events to be processed online, in a parallel manner in different pixels, while accounting for the intrinsic spatiotemporal structure of dynamic scenes. Adopting a Bayesian approach, a model is constructed to capture the dynamics of the 3D profile, and an approximate inference scheme based on assumed density filtering is proposed, yielding a fast and robust reconstruction algorithm able to efficiently process the thousands to millions of frames usually recorded using single-photon detectors. The performance of the proposed method, which is able to process hundreds of frames per second, is assessed using a series of experiments conducted with static and dynamic 3D scenes, and the results obtained pave the way to a new family of real-time 3D reconstruction solutions.

    Updated: 2019-11-01
  • Group-Group Loss Based Global-Regional Feature Learning for Vehicle Re-Identification.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-15
    Xiaobin Liu,Shiliang Zhang,Xiaoyu Wang,Qi Tian

    Vehicle Re-Identification (Re-ID) is challenging because vehicles of the same model commonly show similar appearance. We tackle this challenge by proposing a Global-Regional Feature (GRF) that depicts extra local details to enhance discrimination power in addition to the global context. It is motivated by the observation that vehicles of the same color, maker, and model can be distinguished by their regional differences, e.g., the decorations on the windshields. To accelerate GRF learning and promote its discrimination power, we propose a Group-Group Loss (GGL) to optimize the distance within and across vehicle image groups. Different from the siamese or triplet loss, GGL is directly computed on image groups rather than individual sample pairs or triplets. By avoiding traversing numerous sample combinations, GGL makes the model training easier and more efficient. These two contributions distinguish this work from previous methods on the vehicle Re-ID task, which commonly learn global features with the triplet loss or its variants. We evaluate our methods on two large-scale vehicle Re-ID datasets, i.e., VeRi and VehicleID. Experimental results show our methods achieve promising performance in comparison with recent works.

    Updated: 2019-11-01
  • Class-specific Reconstruction Transfer Learning for Visual Recognition Across Domains.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-13
    Shanshan Wang,Lei Zhang,Wangmeng Zuo,Bob Zhang

    Subspace learning and reconstruction have been widely explored in recent transfer learning work. Generally, specially designed projection and reconstruction transfer functions are sought to bridge multiple domains for heterogeneous knowledge sharing. However, we argue that existing subspace-reconstruction-based domain adaptation algorithms neglect the class prior, so that the learned transfer function is biased, especially when some classes suffer from data scarcity. Different from those previous methods, in this paper we propose a novel class-wise reconstruction-based adaptation method called Class-specific Reconstruction Transfer Learning (CRTL), which optimizes a well-modeled transfer loss function by fully exploiting intra-class dependency and inter-class independency. The merits of CRTL are three-fold. 1) Using a class-specific reconstruction matrix to align the source domain with the target domain fully exploits the class prior in modeling the domain distribution consistency, which benefits cross-domain classification. 2) Furthermore, to keep the intrinsic relationship between data and labels after feature augmentation, a projected Hilbert-Schmidt Independence Criterion (pHSIC), which measures the dependency between data and labels, is proposed for the first time in the transfer learning community by mapping the data from the raw space to an RKHS. 3) In addition, by imposing low-rank and sparse constraints on the class-specific reconstruction coefficient matrix, the global and local data structure that contributes to domain correlation can be effectively preserved. Extensive experiments on challenging benchmark datasets demonstrate the superiority of the proposed method over state-of-the-art representation-based domain adaptation methods. The demo code is available at https://github.com/wangshanshanCQU/CRTL.

    Updated: 2019-11-01
  • Improved Techniques for Adversarial Discriminative Domain Adaptation.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-13
    Aaron Chadha,Yiannis Andreopoulos

    Adversarial discriminative domain adaptation (ADDA) is an efficient framework for unsupervised domain adaptation in image classification, where the source and target domains are assumed to have the same classes, but no labels are available for the target domain. While ADDA has already achieved better training efficiency and competitive accuracy on image classification in comparison to other adversarial-based methods, we investigate whether we can improve its performance with a new framework and new loss formulations. Following the framework of semi-supervised GANs, we first extend the discriminator output over the source classes, in order to model the joint distribution over domain and task. We thus leverage the distribution over the source encoder posteriors (which is fixed during adversarial training) and propose maximum mean discrepancy (MMD) and reconstruction-based loss functions for aligning the target encoder distribution to the source domain. We compare and provide a comprehensive analysis of how our framework and loss formulations extend beyond simple multi-class extensions of ADDA and other discriminative variants of semi-supervised GANs. In addition, we introduce various forms of regularization for stabilizing training, including treating the discriminator as a denoising autoencoder and regularizing the target encoder with source examples to reduce overfitting under a contraction mapping (i.e., when the target per-class distributions are contracting during alignment with the source). Finally, we validate our framework on standard datasets like MNIST, USPS, SVHN, MNIST-M and Office-31. We additionally examine how the proposed framework benefits recognition problems based on sensing modalities that lack training data. This is realized by introducing and evaluating on a neuromorphic vision sensing (NVS) sign language recognition dataset, where the source domain constitutes emulated neuromorphic spike events converted from conventional pixel-based video and the target domain is experimental (real) spike events from an NVS camera. Our results on all datasets show that our proposal is both simple and efficient, as it competes with or outperforms the state-of-the-art in unsupervised domain adaptation, such as DIFA and MCDDA, whilst offering lower complexity than other recent adversarial methods.
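
    For reference, the MMD alignment term mentioned above can be estimated from two feature batches as below; the RBF kernel, the single bandwidth, and the biased estimator are simplifying assumptions.

        import numpy as np

        def rbf(x, y, gamma):
            d2 = (np.sum(x ** 2, 1)[:, None] + np.sum(y ** 2, 1)[None, :]
                  - 2.0 * x @ y.T)
            return np.exp(-gamma * d2)

        def mmd2(source, target, gamma=1.0):
            """Biased squared MMD between two (n, d) feature batches."""
            return (rbf(source, source, gamma).mean()
                    + rbf(target, target, gamma).mean()
                    - 2.0 * rbf(source, target, gamma).mean())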

    Updated: 2019-11-01
  • Reconstruction of Binary Shapes from Blurred Images via Hankel-structured Low-rank Matrix Recovery.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-13
    Saeed Razavikia,Arash Amini,Sajad Daei

    With the dominance of digital imaging systems, we are often dealing with discrete-domain samples of an analog image. Due to physical limitations, all imaging devices apply a blurring kernel on the input image before taking samples to form the output pixels. In this paper, we focus on the reconstruction of binary shape images from a few blurred samples. This problem has applications in medical imaging, shape processing, and image segmentation. Our method relies on representing the analog shape image in a discrete grid much finer than the sampling grid. We formulate the problem as the recovery of a rank-r matrix that is formed by a Hankel structure on the pixels. We further propose efficient ADMM-based algorithms to recover the low-rank matrix in both noiseless and noisy settings. We also analytically investigate the number of required samples for successful recovery in the noiseless case. For this purpose, we study the problem in the random sampling framework, and show that with O(r log^4(n1 n2)) random samples (where the size of the image is assumed to be n1 × n2) we can guarantee perfect reconstruction with high probability under mild conditions. We further prove the robustness of the proposed recovery in the noisy setting by showing that the reconstruction error is bounded when the input noise is bounded. Simulation results confirm that our proposed method outperforms conventional total variation minimization in the noiseless setting.
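
    The Hankel lifting at the heart of the formulation can be illustrated in a few lines: sliding windows of the fine-grid image are flattened into columns, so a shape with few degrees of freedom produces a structured matrix of low rank. The window size and the half-plane test shape are arbitrary illustrative choices.

        import numpy as np

        def hankelize(img, k1, k2):
            """Lift an (n1, n2) image into a (k1*k2, #windows) Hankel-structured
            matrix whose columns are flattened k1 x k2 sliding windows."""
            n1, n2 = img.shape
            cols = [img[i:i + k1, j:j + k2].ravel()
                    for i in range(n1 - k1 + 1) for j in range(n2 - k2 + 1)]
            return np.stack(cols, axis=1)

        # A simple binary shape (a half-plane) yields a strongly rank-deficient lift:
        shape = np.tile((np.arange(32) > 12).astype(float)[:, None], (1, 32))
        print(np.linalg.matrix_rank(hankelize(shape, 8, 8)))  # far below 64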

    Updated: 2019-11-01
  • Distilling Channels for Efficient Deep Tracking.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-13
    Shiming Ge,Zhao Luo,Chunhui Zhang,Yingying Hua,Dacheng Tao

    Deep trackers have proven successful in visual tracking. Typically, these trackers employ optimally pre-trained deep networks to represent all diverse objects with multi-channel features from some fixed layers. The deep networks employed are usually trained to extract rich knowledge from massive data used in object classification, so they are capable of representing generic objects very well. However, these networks are too complex to represent a specific moving object, leading to poor generalization as well as high computational and memory costs. This paper presents a novel and general framework termed channel distillation to facilitate deep trackers. To validate the effectiveness of channel distillation, we take the discriminative correlation filter (DCF) and ECO as examples. We demonstrate that an integrated formulation can turn feature compression, response map generation, and model update into a unified energy minimization problem to adaptively select informative feature channels that improve the efficacy of tracking moving objects on the fly. Channel distillation can accurately extract good channels, alleviating the influence of noisy channels and generally reducing the number of channels, as well as adaptively generalizing to different channels and networks. The resulting deep tracker is accurate, fast, and has low memory requirements. Extensive experimental evaluations on popular benchmarks clearly demonstrate the effectiveness and generalizability of our framework.

    Updated: 2019-11-01
  • Accurate Transmission Estimation for Removing Haze and Noise from a Single Image.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-13
    Qingbo Wu,Jingang Zhang,Wenqi Ren,Wangmeng Zuo,Xiaochun Cao

    Image noise usually causes depth-dependent visual artifacts in single image dehazing. Most existing dehazing methods exploit a two-step strategy in the restoration, which inevitably leads to inaccurate transmission maps and low-quality scene radiance for noisy and hazy inputs. To address these problems, we present a novel variational model for joint recovery of the transmission map and the scene radiance from a single image. In the model, we propose a transmission-aware non-local regularization to avoid noise amplification by adaptively suppressing noise and preserving fine details in the recovered image. Meanwhile, to improve the accuracy of transmission estimation, we introduce a semantic-guided regularization to smooth out the transmission map while preserving depth discontinuities at the boundaries of different objects. Furthermore, we design an alternating scheme to jointly optimize the transmission map and the scene radiance as well as the segmentation map. Extensive experiments on synthetic and real-world data demonstrate that the proposed algorithm performs favorably against state-of-the-art dehazing methods on noisy and hazy images.

    Updated: 2019-11-01
  • Tensor Multi-task Learning for Person Re-identification.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-07
    Zhizhong Zhang,Yuan Xie,Wensheng Zhang,Yongqiang Tang,Qi Tian

    This paper presents a tensor multi-task model for person re-identification (Re-ID). Due to discrepancies among cameras, our approach regards Re-ID from multiple cameras as different but related classification tasks, each task corresponding to a specific camera. In each task, we treat person identification as a one-vs-all linear classification problem, where one classifier is associated with a specific person. By assembling all classifiers into a task-specific projection matrix, the proposed method utilizes all the matrices to form a tensor structure and jointly trains all the tasks in a uniform tensor space. In this space, by assuming that the features of the same person under different cameras are generated from a latent subspace and that different identities under the same perspective share similar patterns, the high-order correlations, not only across different tasks but also within a certain task, can be captured by utilizing a new type of low-rank tensor constraint. Therefore, the learned classifiers transform the original feature vector into the latent space, where feature distributions across cameras can be well aligned. Moreover, this model can incorporate multiple visual features to boost performance, and it is easily extended to the unsupervised setting. Extensive experiments and comparisons with recent Re-ID methods manifest the competitive performance of our method.

    Updated: 2019-11-01
  • Super-Resolution Phase Retrieval from Designed Coded Diffraction Patterns.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-02
    Jorge Bacca,Samuel Pinilla,Henry Arguello

    Super-resolution phase retrieval is an inverse problem that appears in diffractive optical imaging (DOI) and consists of estimating a high-resolution image from low-resolution phaseless measurements. DOI has three diffraction zones where the data can be acquired, known as the near, middle, and far fields. Recent works have studied super-resolution phase retrieval under a setup that records coded diffraction patterns at the near and far fields. However, the attainable resolution of the image is mainly governed by the sensor characteristics, whose cost increases in proportion to the resolution. Also, these methodologies lack theoretical analysis. Hence, this work derives super-resolution models from low-resolution coded phaseless measurements at any diffraction zone in which, in contrast to prior contributions, the attainable resolution of the image is determined by the resolution of the coded aperture. For the proposed models, the existence of a unique solution (up to a global unimodular constant) is guaranteed with high probability, which can be increased by designing the coded aperture. Therefore, a strategy that designs the spatial distribution of the coded aperture is developed. Additionally, a super-resolution phase retrieval algorithm that minimizes a smoothed nonconvex least-squares objective function is proposed. The method first approximates the image by a spectral algorithm, which is then refined based upon a sequence of alternate steps. Simulation results show that the proposed algorithm outperforms state-of-the-art methods in reconstructing the high-resolution image. In addition, the reconstruction quality using designed coded apertures is higher than that of non-designed ensembles.

    Updated: 2019-11-01
  • Local-Adaptive Image Alignment Based on Triangular Facet Approximation.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-02
    Jing Li,Baosong Deng,Rongfu Tang,Zhengming Wang,Ye Yan

    Accurate and efficient image alignment is a core problem in panoramic stitching research. This paper proposes a local-adaptive image alignment method based on triangular facet approximation, which directly manipulates the matching data in camera coordinates and is therefore independent of the camera's imaging model. A more robust planar transformation model is proposed and extended to be local-adaptive by combining it with two weighting strategies. By approximating the scene as a combination of adjacent triangular facets, planar and spherical triangulation strategies are introduced to more efficiently align normal and fisheye images, respectively. The efficiency of the proposed method is verified both qualitatively and quantitatively through comparative experiments on several challenging cases.

    Updated: 2019-11-01
  • Low-Rank Approximation via Generalized Reweighted Iterative Nuclear and Frobenius Norms.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-02
    Yan Huang,Guisheng Liao,Yijian Xiang,Lei Zhang,Jie Li,Arye Nehorai

    The low-rank approximation problem has recently attracted wide attention due to its excellent performance in real-world applications such as image restoration, traffic monitoring, and face recognition. Compared with the classic nuclear norm, the Schatten-p norm provides a closer approximation, better restraining the singular values in practical real-world applications. However, Schatten-p norm minimization is a challenging non-convex, non-smooth, and non-Lipschitz problem. In this paper, inspired by the reweighted ℓ1 and ℓ2 norms for compressive sensing, the generalized iterative reweighted nuclear norm (GIRNN) and the generalized iterative reweighted Frobenius norm (GIRFN) algorithms are proposed to approximate Schatten-p norm minimization. With the proposed algorithms, the problem becomes more tractable and closed-form solutions are derived for the iteratively reweighted subproblems. In addition, we prove that both proposed algorithms converge at a linear rate to a bounded optimum. Numerical experiments on practical matrix completion (MC), robust principal component analysis (RPCA), and image decomposition problems validate the superior performance of both algorithms over several common state-of-the-art methods.
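
    A single iteration of such a reweighted scheme can be sketched as weighted singular value thresholding, where weights derived from the current singular values emulate the Schatten-p penalty. The values of p, tau, and eps below are illustrative, and this is a generic reweighting step rather than the exact GIRNN/GIRFN update.

        import numpy as np

        def reweighted_svt(X, tau=1.0, p=0.5, eps=1e-6):
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            w = p * (s + eps) ** (p - 1.0)        # larger weights for smaller singular
            s_new = np.maximum(s - tau * w, 0.0)  # values, mimicking the Schatten-p norm
            return U @ np.diag(s_new) @ Vt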

    Updated: 2019-11-01
  • Repeated Look-up Tables.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-11-02
    Erik Reinhard,Elena Garces,Jurgen Stauder

    Efficient hardware implementations routinely approximate mathematical functions with look-up tables (LUTs), while keeping the error of the approximation under control. For a certain class of commonly occurring 1D functions, namely monotonically increasing or decreasing functions, we found that it is possible to approximate such functions by repeated application of a very low-resolution 1D look-up table. There are many advantages to cascading multiple identical LUTs, including the promise of a very simple hardware design and the use of standard linear interpolation. Further, the complexity associated with unequal bin sizes can be avoided. We show that for realistic applications, including gamma correction, high dynamic range encoding and decoding curves, as well as tone mapping and inverse tone mapping applications, multiple cascaded look-up tables can reduce the approximation error by more than 50% compared to a single look-up table with the same total memory footprint.
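
    Gamma correction makes the idea easy to demonstrate, since the k-th functional root of x^γ is simply x^(γ/k): one coarse LUT applied k times with ordinary linear interpolation reproduces the full curve. The LUT size, the value of k, and the error comparison below are illustrative choices, not the paper's experimental setup.

        import numpy as np

        def make_lut(gamma, k, bins):
            xs = np.linspace(0.0, 1.0, bins)
            return xs, xs ** (gamma / k)     # each pass applies the k-th root curve

        def apply_repeated(x, xs, ys, k):
            for _ in range(k):
                x = np.interp(x, xs, ys)     # standard linear interpolation
            return x

        x = np.linspace(0.0, 1.0, 1001)
        target = x ** 2.2
        xs, ys = make_lut(2.2, k=3, bins=16)    # three passes through a 16-bin LUT
        xs1, ys1 = make_lut(2.2, k=1, bins=48)  # one pass, same total memory
        err_cascade = np.abs(apply_repeated(x, xs, ys, 3) - target).max()
        err_single = np.abs(np.interp(x, xs1, ys1) - target).max()
        print(err_cascade, err_single)  # compare errors at equal memory footprint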

    Updated: 2019-11-01
  • Low cost gaze estimation: knowledge-based solutions.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-10-22
    Ion Martinikorena,Andoni Larumbe-Bergera,Mikel Ariz,Sonia Porta,Rafael Cabeza,Arantxa Villanueva

    Eye tracking in low-resolution scenarios is not yet a completely solved issue. The possibility of using eye tracking on a mobile gadget is a challenging objective that would permit this technology to spread to unexplored fields. In this paper, a knowledge-based approach is presented to solve gaze estimation in low-resolution settings. Understanding the high-resolution paradigm permits us to propose alternative models for gaze estimation. In this manner, three models are presented: a geometrical model, an interpolation model, and a compound model, as solutions for gaze estimation in remote low-resolution systems. Since this work considers head position essential for improving gaze accuracy, a method for head pose estimation is also proposed. The methods are validated on the I2Head database, which combines head and gaze data. The experimental validation of the models demonstrates their sensitivity to image processing inaccuracies, critical in the case of the geometrical model. Static and extreme-movement scenarios are analyzed, showing the higher robustness of the compound and geometrical models in the presence of user displacement. Accuracy values of about 3° have been obtained, increasing to values close to 5° in extreme displacement settings, results fully comparable with the state-of-the-art.

    Updated: 2019-11-01
  • Deep Active Shape Model for Robust Object Fitting.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : null
    Daniela O Medley,Carlos Santiago,Jacinto C Nascimento

    Object recognition and localization remain very challenging problems, despite recent advances in deep learning (DL) approaches, especially for objects with varying shapes and appearances. Statistical models, such as the Active Shape Model (ASM), rely on a parametric model of the object, allowing prior knowledge about shape and appearance to be easily incorporated in a principled way. To take advantage of these benefits, this paper proposes a new ASM framework that addresses two tasks: (i) comparing the performance of several image features used to extract observations from an input image; and (ii) improving the performance of the model fitting by relying on a probabilistic framework that allows the use of multiple observations and is robust to the presence of outliers. The goal in (i) is to maximize the quality of the observations by exploring a wide set of handcrafted features (HOG, SIFT, and texture templates) and more recent DL-based features. Regarding (ii), we use the Generalized Expectation-Maximization algorithm to deal with outliers and to extend the fitting process to multiple observations. The proposed framework is evaluated in the context of facial landmark fitting and the segmentation of the endocardium of the left ventricle in cardiac magnetic resonance volumes. We experimentally observe that the proposed approach is robust not only to outliers, but also to adverse initialization conditions and to large search regions (from which the observations are extracted). Furthermore, the results of the proposed combination of the ASM with DL-based features are competitive with more recent DL approaches (e.g. FCN [1], U-Net [2] and CNN Cascade [3]), showing that it is possible to combine the benefits of statistical models and DL into a new deep ASM probabilistic framework.

    Updated: 2019-11-01
  • Attended End-to-end Architecture for Age Estimation from Facial Expression Videos.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : null
    Wenjie Pei,Hamdi Dibeklioglu,Tadas Baltrusaitis,David M J Tax

    The main challenges of age estimation from facial expression videos lie not only in the modeling of the static facial appearance, but also in the capturing of the temporal facial dynamics. Traditional techniques for this problem focus on constructing handcrafted features to explore the discriminative information contained in facial appearance and dynamics separately, relying on sophisticated feature refinement and framework design. In this paper, we present an end-to-end architecture for age estimation, called the Spatially-Indexed Attention Model (SIAM), which is able to simultaneously learn both the appearance and dynamics of age from raw videos of facial expressions. Specifically, we employ convolutional neural networks to extract effective latent appearance representations and feed them into recurrent networks to model the temporal dynamics. More importantly, we propose to leverage attention models for salience detection in both the spatial domain, for each single image, and the temporal domain, for the whole video. We design a specific spatially-indexed attention mechanism among the convolutional layers to extract the salient facial regions in each individual image, and a temporal attention layer to assign attention weights to each frame. This two-pronged approach not only improves the performance by allowing the model to focus on informative frames and facial areas, but also offers an interpretable correspondence between the spatial facial regions and temporal frames on one hand, and the task of age estimation on the other. We demonstrate the strong performance of our model in experiments on a large, gender-balanced database of 400 subjects with ages spanning from 8 to 76 years. Experiments reveal that our model exhibits significant superiority over the state-of-the-art methods given sufficient training data.

    Updated: 2019-11-01
  • BMAN: Bidirectional Multi-scale Aggregation Networks for Abnormal Event Detection.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : null
    Sangmin Lee,Hak Gu Kim,Yong Man Ro

    Abnormal event detection is an important task in video surveillance systems. In this paper, we propose novel bidirectional multi-scale aggregation networks (BMAN) for abnormal event detection. The proposed BMAN learns the spatiotemporal patterns of normal events in order to detect deviations from the learned normal patterns as abnormalities. The BMAN consists of two main parts: an inter-frame predictor and an appearance-motion joint detector. The inter-frame predictor is devised to encode normal patterns, generating an inter-frame using attention-based bidirectional multi-scale aggregation. With this feature aggregation, robustness to object scale variations and complex motions is achieved in normal pattern encoding. Based on the encoded normal patterns, abnormal events are detected by the appearance-motion joint detector, in which both the appearance and motion characteristics of scenes are considered. Comprehensive experiments are performed, and the results show that the proposed method outperforms the existing state-of-the-art methods. The resulting abnormal event detection is interpretable on the visual basis of where the detected events occur. Further, we validate the effectiveness of the proposed network designs by conducting an ablation study and feature visualization.

    Updated: 2019-11-01
  • Fast Single Image Dehazing Using Saturation Based Transmission Map Estimation.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : null
    Se Eun Kim,Tae Hee Park,Il Kyu Eom

    Single image dehazing has been a challenging problem because of its ill-posed nature, and numerous efforts have been made in the field of haze removal. This paper proposes a simple, fast, and powerful algorithm for haze removal. The medium transmission is derived as a function of the saturation of the scene radiance only, and the saturation of the scene radiance is estimated using a simple stretching method. A different medium transmission can be estimated for each pixel because this method does not assume that transmission is constant within a small patch. Furthermore, this paper presents a color-veil removal algorithm based on the white balance technique, which is useful for images with fine or yellow dust. The proposed algorithm requires no training, priors, or refinement process. The simulation results show that the proposed dehazing scheme outperforms state-of-the-art dehazing approaches in terms of both computational complexity and dehazing efficiency.
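
    To see how saturation alone can determine transmission, note that the haze model I = tJ + (1-t)A with white atmospheric light A = 1 yields t = 1 - V(1 - S_I/S_J), where V and S_I are the HSV value and saturation of the hazy pixel and S_J is the unknown scene saturation. The sketch below uses this identity with a simple linear stretch as a stand-in for the paper's saturation-stretching estimate; the stretch and the clipping limits are illustrative assumptions.

        import numpy as np

        def transmission_from_saturation(img, s_max=0.95):
            """img: HxWx3 float RGB in [0, 1]; returns a per-pixel transmission map."""
            v = img.max(axis=2)                              # HSV value
            s = (v - img.min(axis=2)) / np.maximum(v, 1e-6)  # HSV saturation
            s_j = np.clip(s / max(s.max(), 1e-6) * s_max, 1e-3, 1.0)  # stretched guess
            t = 1.0 - v * (1.0 - s / s_j)
            return np.clip(t, 0.05, 1.0)

        # With A = 1, the scene radiance is then recovered pixel-wise as
        # J = (I - (1 - t)) / t.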

    Updated: 2019-11-01
  • Two-Dimensional Quaternion Sparse Discriminant Analysis.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : null
    Xiaolin Xiao,Yongyong Chen,Yue-Jiao Gong,Yicong Zhou

    Linear discriminant analysis has been incorporated with various representations and measurements for dimension reduction and feature extraction. In this paper, we propose two-dimensional quaternion sparse discriminant analysis (2D-QSDA), which meets the requirements of representing RGB and RGB-D images. 2D-QSDA advances in three aspects: 1) by including sparse regularization, 2D-QSDA relies only on the important variables and thus shows good generalization ability to out-of-sample data unseen during the training phase; 2) benefiting from the quaternion representation, 2D-QSDA well preserves the high-order correlation among different image channels and provides a unified approach to extract features from RGB and RGB-D images; 3) the spatial structure of the input images is retained via the matrix-based processing. We tackle the constrained trace ratio problem of 2D-QSDA by solving a corresponding constrained trace difference problem, which is then transformed into a quaternion sparse regression (QSR) model. Afterward, we reformulate the QSR model into an equivalent complex form to avoid processing the complicated structure of quaternions. A nested iterative algorithm is designed to learn the solution of 2D-QSDA in the complex space, and this solution is then converted back to the quaternion domain. To improve the separability of 2D-QSDA, we further propose 2D-QSDAw, which uses weighted pairwise between-class distances. Extensive experiments on RGB and RGB-D databases demonstrate the effectiveness of 2D-QSDA and 2D-QSDAw compared with peer competitors.

    Updated: 2019-11-01
  • PML-LocNet: Improving Object Localization with Prior-induced Multi-view Learning Network.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : null
    Xiaopeng Zhang,Yang Yang,Hongkai Xiong,Jiashi Feng

    This paper introduces a new model for Weakly Supervised Object Localization (WSOL) problems, where only image-level supervision is provided. The key to solving such problems is to infer the object locations accurately. Previous methods usually model the missing object locations as latent variables, and alternate between updating their estimates and learning a detector accordingly. However, the performance of such alternating optimization is sensitive to the quality of the initial latent variables, and the resulting localization model is prone to overfitting to improper localizations. To address these issues, we develop a Prior-induced Multi-view Learning Localization Network (PML-LocNet), which exploits both view diversity and sample diversity to improve object localization. In particular, view diversity is imposed by a two-phase multi-view learning strategy, with which the complementarity among features learned from different views and the consensus among instances localized from each view are leveraged to benefit localization. Sample diversity is pursued by harnessing coarse-to-fine priors at both the image and instance levels. With these priors, more emphasis is placed on reliable samples and the contributions of unreliable ones are decreased, so that the intrinsic characteristics of each sample can be exploited to make the model more robust during network learning. PML-LocNet can be easily combined with existing WSOL models to further improve localization accuracy. Its effectiveness has been demonstrated experimentally. Notably, it achieves 69.3% CorLoc and 50.4% mAP on PASCAL VOC 2007, surpassing the state of the art by a large margin.

    Updated: 2019-11-01
  • Robust Low-Rank Tensor Minimization via a New Tensor Spectral k-Support Norm.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2019-10-22
    Jian Lou,Yiu-Ming Cheung

    Recently, based on a new tensor algebraic framework for third-order tensors, the tensor singular value decomposition (t-SVD) and its associated tubal rank definition have shed new light on low-rank tensor modeling. Its applications to robust image/video recovery and background modeling show promising performance due to its superior capability in modeling cross-channel/frame information. Under the t-SVD framework, we propose a new tensor norm called the tensor spectral k-support norm (TSP-k), obtained by an alternative convex relaxation. As an interpolation between the existing tensor nuclear norm (TNN) and the tensor Frobenius norm (TFN), it is able to simultaneously drive minor singular values to zero to induce low-rankness, and to capture more global information for better preserving the intrinsic structure. We provide the proximal operator and the polar operator for the TSP-k norm as key optimization blocks, along with two showcase optimization algorithms for medium- and large-size tensors. Experiments on synthetic, image, and video datasets of medium and large sizes all verify the superiority of the TSP-k norm and the effectiveness of both optimization methods in comparison with the existing counterparts.

    Updated: 2019-11-01
  • Diverse expected gradient active learning for relative attributes.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2014-07-02
    Xinge You,Ruxin Wang,Dacheng Tao

    The use of relative attributes for semantic understanding of images and videos is a promising way to improve communication between humans and machines. However, it is extremely labor- and time-consuming to define multiple attributes for each instance in large amounts of data. One option is to incorporate active learning, so that informative samples can be actively discovered and then labeled. However, most existing active-learning methods select samples one at a time (serial mode), and may therefore lose efficiency when learning multiple attributes. In this paper, we propose a batch-mode active-learning method, called diverse expected gradient active learning. This method integrates an informativeness analysis and a diversity analysis to form a diverse batch of queries. Specifically, the informativeness analysis employs the expected pairwise gradient length as a measure of informativeness, while the diversity analysis enforces a constraint on the proposed diverse gradient angle. Since simultaneous optimization of these two parts is intractable, we utilize a two-step procedure to obtain the diverse batch of queries. A heuristic method is also introduced to suppress imbalanced multiclass distributions. Empirical evaluations on three different databases demonstrate the effectiveness and efficiency of the proposed approach.
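
    The two-step procedure can be sketched as follows: rank candidates by an informativeness score (standing in for the expected pairwise gradient length), then greedily admit a candidate only if its gradient direction differs sufficiently from everything already in the batch. The scores, gradient estimates, and angle threshold are illustrative inputs, not the paper's exact quantities.

        import numpy as np

        def select_batch(scores, grads, batch_size, min_angle_deg=30.0):
            """scores: (n,) informativeness; grads: (n, d) gradient estimates."""
            order = np.argsort(-scores)                  # most informative first
            unit = grads / (np.linalg.norm(grads, axis=1, keepdims=True) + 1e-12)
            cos_max = np.cos(np.radians(min_angle_deg))
            batch = []
            for i in order:
                # Keep i only if its direction differs enough from the batch.
                if all(abs(float(unit[i] @ unit[j])) < cos_max for j in batch):
                    batch.append(i)
                if len(batch) == batch_size:
                    break
            return batch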

    Updated: 2019-11-01
  • An analysis and method for contrast enhancement turbulence mitigation.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2014-07-02
    Kristofor B Gibson,Truong Q Nguyen

    Common problems for imaging in the atmosphere are fog and atmospheric turbulence. Over the years, many researchers have provided insight into the physics of either fog or turbulence, but not both. Most recently, researchers have proposed methods to remove fog in images fast enough for real-time processing. Additionally, methods have been proposed by other researchers that address the atmospheric turbulence problem. In this paper, we provide an analysis that incorporates both physics models: 1) fog and 2) turbulence. We observe how contrast enhancements (fog removal) can affect image alignment and image averaging. We present a new joint contrast enhancement and turbulence mitigation (CETM) method that utilizes estimations from the contrast enhancement algorithm to improve the turbulence removal algorithm. We provide a new turbulence-mitigation objective metric that measures temporal consistency. Finally, we design the CETM to be efficient so that it can operate in fractions of a second for near real-time applications.

    Updated: 2019-11-01
  • CSMMI: class-specific maximization of mutual information for action and gesture recognition.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2014-07-02
    Jun Wan,Vassilis Athitsos,Pat Jangyodsuk,Hugo Jair Escalante,Qiuqi Ruan,Isabelle Guyon

    In this paper, we propose a novel approach called class-specific maximization of mutual information (CSMMI) using a submodular method, which aims at learning a compact and discriminative dictionary for each class. Unlike traditional dictionary-based algorithms, which typically learn a shared dictionary for all classes, we unify the intra-class and inter-class mutual information (MI) into a single objective function to optimize class-specific dictionaries. The objective function has two aims: 1) maximizing the MI between dictionary items within a specific class (intrinsic structure) and 2) minimizing the MI between the dictionary items in a given class and those of the other classes (extrinsic structure). We significantly reduce the computational complexity of CSMMI by introducing a novel submodular method, which is one of the important contributions of this paper. This paper also contributes a state-of-the-art end-to-end system for action and gesture recognition incorporating CSMMI, comprising feature extraction, learning an initial dictionary for each class by sparse coding, CSMMI via submodularity, and classification based on reconstruction errors. We performed extensive experiments on synthetic data and eight benchmark data sets. Our experimental results show that CSMMI outperforms shared-dictionary methods and that our end-to-end system is competitive with other state-of-the-art approaches.

    Updated: 2019-11-01
  • A distributed Canny edge detector: algorithm and FPGA implementation.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2014-07-02
    Qian Xu,Srenivas Varadarajan,Chaitali Chakrabarti,Lina J Karam

    The Canny edge detector is one of the most widely used edge detection algorithms due to its superior performance. Unfortunately, not only is it computationally more intensive than other edge detection algorithms, but it also has a higher latency because it is based on frame-level statistics. In this paper, we propose a mechanism to implement the Canny algorithm at the block level without any loss in edge detection performance compared with the original frame-level Canny algorithm. Directly applying the original Canny algorithm at the block level leads to excessive edges in smooth regions and to the loss of significant edges in high-detail regions, since the original Canny computes the high and low thresholds based on frame-level statistics. To solve this problem, we present a distributed Canny edge detection algorithm that adaptively computes the edge detection thresholds based on the block type and the local distribution of the gradients in the image block. In addition, the new algorithm uses a nonuniform gradient magnitude histogram to compute block-based hysteresis thresholds. The resulting block-based algorithm has a significantly reduced latency and can be easily integrated with other block-based image codecs. It is capable of supporting fast edge detection of images and videos with high resolutions, including full-HD, since the latency is now a function of the block size instead of the frame size. In addition, quantitative conformance evaluations and subjective tests show that the edge detection performance of the proposed algorithm is better than that of the original frame-based algorithm, especially when noise is present in the images. Finally, this algorithm is implemented using a 32-computing-engine architecture and is synthesized on the Xilinx Virtex-5 FPGA. The synthesized architecture takes only 0.721 ms (including the SRAM READ/WRITE time and the computation time) to detect the edges of 512 × 512 images in the USC SIPI database when clocked at 100 MHz, and is faster than existing FPGA and GPU implementations.
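
    A toy version of the block-level idea is shown below: hysteresis thresholds are chosen per block from that block's own gradient statistics rather than frame statistics. The simple percentile rule stands in for the paper's block-type classification and nonuniform-histogram computation, and OpenCV's Canny is used for the remaining stages.

        import cv2
        import numpy as np

        def block_canny(gray, block=64, hi_pct=80):
            """gray: uint8 grayscale image; returns a block-wise edge map."""
            edges = np.zeros_like(gray)
            h, w = gray.shape
            for y in range(0, h, block):
                for x in range(0, w, block):
                    tile = gray[y:y + block, x:x + block]
                    gx = cv2.Sobel(tile, cv2.CV_32F, 1, 0)
                    gy = cv2.Sobel(tile, cv2.CV_32F, 0, 1)
                    mag = np.hypot(gx, gy)
                    hi = max(float(np.percentile(mag, hi_pct)), 1.0)
                    # Per-block hysteresis thresholds instead of frame-level ones.
                    edges[y:y + block, x:x + block] = cv2.Canny(tile, 0.4 * hi, hi)
            return edges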

    Updated: 2019-11-01
  • Reversible symmetric nonexpansive convolution: an effective image boundary processing for M-channel lifting-based linear-phase filter banks.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2014-06-12
    Taizo Suzuki,Masaaki Ikehara

    We present an effective image boundary processing scheme for M-channel (M ∈ ℕ, M ≥ 2) lifting-based linear-phase filter banks that are applied to unified lossy and lossless image compression (coding), i.e., lossy-to-lossless image coding. The reversible symmetric extension we propose is achieved by manipulating the building blocks on the image boundary and restoring the symmetry of each building block that is lost due to rounding error in each lifting step. In addition, complexity is reduced by extending nonexpansive convolution to what we call reversible symmetric nonexpansive convolution, because the number of input signals does not increase even temporarily. Our method not only achieves reversible boundary processing, but is also comparable with irreversible symmetric extension in lossy image coding and outperforms periodic extension in lossy-to-lossless image coding.
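
    The reversibility that rounding-based lifting provides, and the nonlinearity that breaks the building blocks' symmetry, are both visible in a minimal two-channel predict/update pair (not the paper's M-channel bank). The sketch uses periodic extension via np.roll purely for brevity; a reversible symmetric alternative is exactly what the paper contributes.

        import numpy as np

        def lift_forward(x):
            even, odd = x[0::2].astype(int), x[1::2].astype(int)
            d = odd - np.round(0.5 * (even + np.roll(even, -1))).astype(int)  # predict
            s = even + np.round(0.25 * (d + np.roll(d, 1))).astype(int)       # update
            return s, d

        def lift_inverse(s, d):
            # Undo the steps in reverse order with the same rounded quantities.
            even = s - np.round(0.25 * (d + np.roll(d, 1))).astype(int)
            odd = d + np.round(0.5 * (even + np.roll(even, -1))).astype(int)
            x = np.empty(2 * len(s), dtype=int)
            x[0::2], x[1::2] = even, odd
            return x

        x = np.random.randint(0, 256, 16)
        assert np.array_equal(lift_inverse(*lift_forward(x)), x)  # exact reversibility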

    Updated: 2019-11-01
  • A sequential framework for image change detection.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2014-05-13
    Andrew J Lingg,Edmund Zelnio,Fred Garber,Brian D Rigling

    We present a sequential framework for change detection. This framework allows us to use multiple images from reference and mission passes of a scene of interest in order to improve detection performance. It includes a change statistic that is easily updated when additional data becomes available. Detection performance using this statistic is predictable when the reference and image data are drawn from known distributions. We verify our performance prediction by simulation. Additionally, we show that detection performance improves with additional measurements on a set of synthetic aperture radar images and a set of visible images with unknown probability distributions.
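
    An easily-updated statistic of the kind described can be illustrated with per-pixel running sums, which let a Gaussian likelihood-ratio statistic be refreshed in constant time per new mission image; the Gaussian model and the specific statistic below are illustrative assumptions, not the paper's.

        import numpy as np

        class SequentialChange:
            def __init__(self, ref_mean, ref_var):
                self.mu0, self.v0 = ref_mean, ref_var   # from reference-pass images
                self.n, self.s1, self.s2 = 0, 0.0, 0.0

            def update(self, img):
                """Accumulate one mission image; return the per-pixel statistic."""
                self.n += 1
                self.s1 += img
                self.s2 += img ** 2
                m = self.s1 / self.n
                v = np.maximum(self.s2 / self.n - m ** 2, 1e-6)
                # Log-likelihood ratio of "changed" (fitted mission statistics)
                # versus "unchanged" (reference statistics), per pixel.
                return 0.5 * self.n * (np.log(self.v0 / v)
                                       + (v + (m - self.mu0) ** 2) / self.v0 - 1.0)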

    Updated: 2019-11-01
  • A probabilistic graph-based framework for plug-and-play multi-cue visual tracking.
    IEEE Trans. Image Process. (IF 6.790) Pub Date : 2014-05-13
    Shimrit Feldman-Haber,Yosi Keller

    In this paper, we propose a novel approach for integrating multiple tracking cues within a unified probabilistic graph-based Markov random field (MRF) representation. We show how to integrate temporal and spatial cues encoded by unary and pairwise probabilistic potentials. As the inference of such high-order MRF models is known to be NP-hard, we propose an efficient spectral relaxation-based inference scheme. The proposed scheme is exemplified by applying it to a mixture of five tracking cues, and is shown to be applicable to wider sets of cues. This paves the way for a modular plug-and-play tracking framework that can be easily adapted to diverse tracking scenarios. The proposed scheme is experimentally shown to compare favorably with contemporary state-of-the-art schemes, and provides accurate tracking results.

    Updated: 2019-11-01
Contents have been reproduced by permission of the publishers.