
-
JNMR: Joint Non-Linear Motion Regression for Video Frame Interpolation IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-19 Meiqin Liu, Chenming Xu, Chao Yao, Chunyu Lin, Yao Zhao
Video frame interpolation (VFI) aims to generate predictive frames by motion-warping from bidirectional references. Most examples of VFI utilize spatiotemporal semantic information to realize motion estimation and interpolation. However, due to variable acceleration, irregular movement trajectories, and camera movement in real-world cases, they are not sufficient to deal with non-linear middle frame
-
Deep Multi-Exposure Image Fusion for Dynamic Scenes IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-19 Xiao Tan, Huaian Chen, Rui Zhang, Qihan Wang, Yan Kan, Jinjin Zheng, Yi Jin, Enhong Chen
Recently, learning-based multi-exposure fusion (MEF) methods have made significant improvements. However, these methods mainly focus on static scenes and are prone to generate ghosting artifacts when tackling a more common scenario, i.e., the input images include motion, due to the lack of a benchmark dataset and solution for dynamic scenes. In this paper, we fill this gap by creating an MEF dataset
-
Plug-and-Play Priors for Multi-Shot Compressive Hyperspectral Imaging IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-19 Ting Xie, Licheng Liu, Lina Zhuang
Multi-shot coded aperture snapshot spectral imaging (CASSI) uses multiple measurement snapshots to encode the three-dimensional hyperspectral image (HSI). Increasing the number of snapshots will multiply the number of measurements, making the CASSI system more appropriate for spatially detailed or spectrally rich scenes. However, the reconstruction algorithms still face the challenge of being ineffective
-
Salient Object Detection in Optical Remote Sensing Images Driven by Transformer IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-18 Gongyang Li, Zhen Bai, Zhi Liu, Xinpeng Zhang, Haibin Ling
Existing methods for Salient Object Detection in Optical Remote Sensing Images (ORSI-SOD) mainly adopt Convolutional Neural Networks (CNNs) as the backbone, such as VGG and ResNet. Since CNNs can only extract features within certain receptive fields, most ORSI-SOD methods generally follow the local-to-contextual paradigm. In this paper, we propose a novel Global Extraction Local Exploration Network
-
Broad Spectrum Image Deblurring via an Adaptive Super-Network IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-18 Qiucheng Wu, Yifan Jiang, Junru Wu, Victor Kulikov, Vidit Goel, Nikita Orlov, Humphrey Shi, Zhangyang Wang, Shiyu Chang
In blurry images, the degree of image blurs may vary drastically due to different factors, such as varying speeds of shaking cameras and moving objects, as well as defects of the camera lens. However, current end-to-end models failed to explicitly take into account such diversity of blurs. This unawareness compromises the specialization at each blur level, yielding sub-optimal deblurred images as well
-
Semi-Supervised Crowd Counting via Multiple Representation Learning IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-13 Xing Wei, Yunfeng Qiu, Zhiheng Ma, Xiaopeng Hong, Yihong Gong
There has been a growing interest in counting crowds through computer vision and machine learning techniques in recent years. Although significant progress has been made, most existing methods heavily rely on fully-supervised learning and require a lot of labeled data. To alleviate the reliance, we focus on the semi-supervised learning paradigm. Usually, crowd counting is converted to a density
-
Robust Cross-Domain Pseudo-Labeling and Contrastive Learning for Unsupervised Domain Adaptation NIR-VIS Face Recognition IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-12 Yiming Yang, Weipeng Hu, Haiqi Lin, Haifeng Hu
Near-infrared and visible face recognition (NIR-VIS) is attracting increasing attention because of the need to achieve face recognition in low-light conditions to enable 24-hour secure retrieval. However, annotating identity labels for a large number of heterogeneous face images is time-consuming and expensive, which limits the application of the NIR-VIS face recognition system to larger scale real-world
-
Hyperspectral Meets Optical Flow: Spectral Flow Extraction for Hyperspectral Image Classification IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-12 Bing Liu, Yifan Sun, Anzhu Yu, Zhixiang Xue, Xibing Zuo
Hyperspectral image (HSI) classification has always been recognised as a difficult task. It is therefore a research hotspot in remote sensing image processing and analysis, and a number of studies have been conducted to better extract spectral and spatial features. This study aimed to track the variation of the spectrum in hyperspectral images from a sequential data perspective to obtain more distinguishable
-
Cycle-Consistent Weakly Supervised Visual Grounding With Individual and Contextual Representations IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-11 Ruisong Zhang, Chuang Wang, Cheng-Lin Liu
Visual grounding, aiming to align image regions with textual queries, is a fundamental task for cross-modal learning. We study the weakly supervised visual grounding, where only image-text pairs at a coarse-grained level are available. Due to the lack of fine-grained correspondence information, existing approaches often encounter matching ambiguity. To overcome this challenge, we introduce the cycle
-
CONVIQT: Contrastive Video Quality Estimator IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-07 Pavan C. Madhusudana, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms. Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner. Distortion type identification and degradation level determination is employed as an auxiliary task to train a deep learning model containing a deep Convolutional Neural
-
Multiview Clustering by Consensus Spectral Rotation Fusion IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-07 Jie Chen, Hua Mao, Dezhong Peng, Changqing Zhang, Xi Peng
Multiview clustering (MVC) aims to partition data into different groups by taking full advantage of the complementary information from multiple views. Most existing MVC methods fuse information of multiple views at the raw data level. They may suffer from performance degradation due to the redundant information contained in the raw data. Graph learning-based methods often heavily depend on one specific
-
Improving Embedding Generalization in Few-Shot Learning With Instance Neighbor Constraints IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-05 Zhenyu Zhou, Lei Luo, Qing Liao, Xinwang Liu, En Zhu
Recently, metric-based meta-learning methods have been effectively applied to few-shot image classification. These methods classify images based on the relationship between samples in an embedding space, avoiding over-fitting that can occur when training classifiers with limited samples. However, finding an embedding space with good generalization properties remains a challenge. Our work highlights
-
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-05 Sheng Zhou, Dan Guo, Jia Li, Xun Yang, Meng Wang
Text-based visual question answering (TextVQA) faces the significant challenge of avoiding redundant relational inference. To be specific, a large number of detected objects and optical character recognition (OCR) tokens result in rich visual relationships. Existing works take all visual relationships into account for answer prediction. However, there are three observations: (1) a single subject in
-
Dual Level Adaptive Weighting for Cloth-Changing Person Re-Identification IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-05 Fangyi Liu, Mang Ye, Bo Du
For the long-term person re-identification (ReID) task, pedestrians are likely to change clothes, which poses a key challenge in overcoming drastic appearance variations caused by these cloth changes. However, analyzing how cloth changes influence identity-invariant representation learning is difficult. In this context, varying cloth-changed samples are not adaptively utilized, and their effects on
-
Experts Collaboration Learning for Continual Multi-Modal Reasoning IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-05 Li Xu, Jun Liu
Multi-modal reasoning, which aims to capture logical and causal structures in visual content and associate them with cues from other modality inputs (e.g., texts) to perform various types of reasoning, is an important research topic in artificial intelligence (AI). Existing works for multi-modal reasoning mainly exploit offline learning, where the training samples of all types of reasoning tasks are
-
Translation, Association and Augmentation: Learning Cross-Modality Re-Identification From Single-Modality Annotation IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-05 Bin Yang, Jun Chen, Xianzheng Ma, Mang Ye
Daytime visible modality (RGB) and night-time infrared (IR) modality person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem. However, training a cross-modality ReID model requires plenty of cross-modality (visible-infrared) identity labels that are more expensive than single-modality person ReID. To alleviate this issue, this paper studies unsupervised domain
-
Double Auto-Weighted Tensor Robust Principal Component Analysis IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-05 Yulong Wang, Kit Ian Kou, Hong Chen, Yuan Yan Tang, Luoqing Li
Tensor Robust Principal Component Analysis (TRPCA), which aims to recover the low-rank and sparse components from their sum, has drawn intensive interest in recent years. Most existing TRPCA methods adopt the tensor nuclear norm (TNN) and the tensor $\ell_{1}$ norm as the regularization terms for the low-rank and sparse components, respectively. However, TNN treats each singular value of the low-rank
-
HybrUR: A Hybrid Physical-Neural Solution for Unsupervised Underwater Image Restoration IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-09-01 Shuaizheng Yan, Xingyu Chen, Zhengxing Wu, Min Tan, Junzhi Yu
Robust vision restoration of underwater images remains a challenge. Owing to the lack of well-matched underwater and in-air images, unsupervised methods based on the cyclic generative adversarial framework have been widely investigated in recent years. However, when using an end-to-end unsupervised approach with only unpaired image data, mode collapse could occur, and the color correction of the restored
-
Efficient Human Vision Inspired Action Recognition Using Adaptive Spatiotemporal Sampling IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-31 Khoi-Nguyen C. Mac, Minh N. Do, Minh P. Vo
Adaptive sampling that exploits the spatiotemporal redundancy in videos is critical for always-on action recognition on wearable devices with limited computing and battery resources. The commonly used fixed sampling strategy is not context-aware and may under-sample the visual content, and thus adversely impacts both computation efficiency and accuracy. Inspired by the concepts of foveal vision and
-
Test-Time Adaptation for Optical Flow Estimation Using Motion Vectors IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-31 Seyed Mehdi Ayyoubzadeh, Wentao Liu, Irina Kezele, Yuanhao Yu, Xiaolin Wu, Yang Wang, Tang Jin
Due to the prohibitive cost as well as technical challenges in annotating ground-truth optical flow for large-scale realistic video datasets, the existing deep learning models for optical flow estimation mostly rely on synthetic data for training, which in turn may lead to significant performance degradation under test-data distribution shift in real-world environments. In this work, we propose the
-
B2C-AFM: Bi-Directional Co-Temporal and Cross-Spatial Attention Fusion Model for Human Action Recognition IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-30 Fangtai Guo, Tianlei Jin, Shiqiang Zhu, Xiangming Xi, Wen Wang, Qiwei Meng, Wei Song, Jiakai Zhu
Human Action Recognition serves as a driving engine of many human-computer interaction applications. Most current research focuses on improving the model generalization by integrating multiple homogeneous modalities, including RGB images, human poses, and optical flows. Furthermore, contextual interactions and out-of-context sign languages have been validated to depend on scene category and human per se
-
Deep-Based Film Grain Removal and Synthesis IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-30 Zoubida Ameur, Wassim Hamidouche, Edouard François, Miloš Radosavljević, Daniel Menard, Claire-Hélène Demarty
In this paper, deep learning-based techniques for film grain removal and synthesis that can be applied in video coding are proposed. Film grain is inherent in analog film content because of the physical process of capturing images and video on film. It can also be present in digital content where it is purposely added to reflect the era of analog film and to evoke certain emotions in the viewer or
-
Zero-Shot Camouflaged Object Detection IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-29 Haoran Li, Chun-Mei Feng, Yong Xu, Tao Zhou, Lina Yao, Xiaojun Chang
The goal of camouflaged object detection (COD) is to detect objects that are visually embedded in their surroundings. Existing COD methods only focus on detecting camouflaged objects from seen classes, while they suffer from performance degradation when detecting unseen classes. However, in a real-world scenario, collecting sufficient data for seen classes is extremely difficult and labeling them requires
-
Collaborative Contrastive Refining for Weakly Supervised Person Search IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-29 Chengyou Jia, Minnan Luo, Caixia Yan, Linchao Zhu, Xiaojun Chang, Qinghua Zheng
Weakly supervised person search involves training a model with only bounding box annotations, without human-annotated identities. Clustering algorithms are commonly used to assign pseudo-labels to facilitate this task. However, inaccurate pseudo-labels and imbalanced identity distributions can result in severe label and sample noise. In this work, we propose a novel Collaborative Contrastive Refining
-
Brain Network Analysis of Schizophrenia Patients Based on Hypergraph Signal Processing IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-28 Xiaoying Song, Ke Wu, Li Chai
Since high-order relationships among multiple brain regions-of-interests (ROIs) are helpful to explore the pathogenesis of neurological diseases more deeply, hypergraph-based brain networks are more suitable for brain science research. Unlike the existing hypergraph based brain network (brain hypernetwork), where hyperedges containing the same number of ROIs are assumed to have equal weights (to some
-
MDF-Net: A Multi-Scale Dynamic Fusion Network for Breast Tumor Segmentation of Ultrasound Images IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-28 Wenbo Qi, H. C. Wu, S. C. Chan
Breast tumor segmentation of ultrasound images provides valuable information about tumors for early detection and diagnosis. Accurate segmentation is challenging due to low image contrast between areas of interest, speckle noise, and large inter-subject variations in tumor shape and size. This paper proposes a novel Multi-scale Dynamic Fusion Network (MDF-Net) for breast ultrasound tumor segmentation
-
4D LUT: Learnable Context-Aware 4D Lookup Table for Image Enhancement IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-22 Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian
Image enhancement aims at improving the aesthetic visual quality of photos by retouching the color and tone, and is an essential technology for professional digital photography. In recent years, deep learning-based image enhancement algorithms have achieved promising performance and attracted increasing attention. However, typical efforts attempt to construct a uniform enhancer for all pixels’ color transformation
-
Rethinking Cross-Domain Pedestrian Detection: A Background-Focused Distribution Alignment Framework for Instance-Free One-Stage Detectors IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-25 Yancheng Cai, Bo Zhang, Baopu Li, Tao Chen, Hongliang Yan, Jingdong Zhang, Jiahao Xu
Cross-domain pedestrian detection aims to generalize pedestrian detectors from one label-rich domain to another label-scarce domain, which is crucial for various real-world applications. Most recent works focus on domain alignment to train domain-adaptive detectors either at the instance level or image level. From a practical point of view, one-stage detectors are faster. Therefore, we concentrate
-
Automated Learning for Deformable Medical Image Registration by Jointly Optimizing Network Architectures and Objective Functions IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-25 Xin Fan, Zi Li, Ziyang Li, Xiaolin Wang, Risheng Liu, Zhongxuan Luo, Hao Huang
Deformable image registration plays a critical role in various tasks of medical image analysis. A successful registration algorithm, either derived from conventional energy optimization or deep networks, requires tremendous efforts from computer experts to well design registration energy or to carefully tune network architectures with respect to medical data available for a given registration task/scenario
-
Randomized Spectrum Transformations for Adapting Object Detector in Unseen Domains IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-24 Lei Zhang, Lingyun Qin, Mingjun Xu, Weijie Chen, Shiliang Pu, Wensheng Zhang
We propose Meta Learning on Randomized Transformations (MLRT) to learn domain-invariant object detectors. Domain generalization is the problem of learning an invariant model from multiple source domains that can generalize well to unseen target domains. This problem is overlooked in the object detection field, where it is formally named domain generalizable object detection (DGOD). Moreover, existing
-
Masked Embedding Modeling With Rapid Domain Adjustment for Few-Shot Image Classification IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-24 Reece Walsh, Islam Osman, Mohamed S. Shehata
In few-shot classification, performing well on a testing dataset is a challenging task due to the restricted amount of labelled data available and the unknown distribution. Many previously proposed techniques rely on prototypical representations of the support set in order to classify a query set. Although this approach works well with a large, in-domain support set, accuracy suffers when transitioning
-
DisAVR: Disentangled Adaptive Visual Reasoning Network for Diagram Question Answering IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-24 Yaxian Wang, Bifan Wei, Jun Liu, Lingling Zhang, Jiaxin Wang, Qianying Wang
Diagram Question Answering (DQA) aims to correctly answer questions about given diagrams, which demands an interplay of good diagram understanding and effective reasoning. However, the same appearance of objects in diagrams can express different semantics. This kind of visual semantic ambiguity problem makes it challenging to represent diagrams sufficiently for better understanding. Moreover, since
-
Learning Resolution-Adaptive Representations for Cross-Resolution Person Re-Identification IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-23 Lin Yuanbo Wu, Lingqiao Liu, Yang Wang, Zheng Zhang, Farid Boussaid, Mohammed Bennamoun, Xianghua Xie
Cross-resolution person re-identification (CRReID) is a challenging and practical problem that involves matching low-resolution (LR) query identity images against high-resolution (HR) gallery images. Query images often suffer from resolution degradation due to the different capturing conditions from real-world cameras. State-of-the-art solutions for CRReID either learn a resolution-invariant representation
-
BLPSeg: Balance the Label Preference in Scribble-Supervised Semantic Segmentation IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-21 Yude Wang, Jie Zhang, Meina Kan, Shiguang Shan, Xilin Chen
Scribble-supervised semantic segmentation is an appealing weakly supervised technique with low labeling cost. Existing approaches mainly consider diffusing the labeled region of scribble by low-level feature similarity to narrow the supervision gap between scribble labels and mask labels. In this study, we observe an annotation bias between scribble and object mask, i.e., label workers tend to scribble
-
Condition-Adaptive Graph Convolution Learning for Skeleton-Based Gait Recognition IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-21 Xiaohu Huang, Xinggang Wang, Zhidianqiu Jin, Bo Yang, Botao He, Bin Feng, Wenyu Liu
Graph convolutional networks have been widely applied in skeleton-based gait recognition. A key challenge in this task is to distinguish the individual walking styles of different subjects across various views. Existing state-of-the-art methods employ uniform convolutions to extract features from diverse sequences and ignore the effects of viewpoint changes. To overcome these limitations, we propose
-
Space-Time Super-Resolution for Light Field Videos IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-21 Zeyu Xiao, Zhen Cheng, Zhiwei Xiong
Light field (LF) cameras suffer from a fundamental trade-off between spatial and angular resolutions. Additionally, due to the significant amount of data that needs to be recorded, the Lytro ILLUM, a modern LF camera, can only capture three frames per second. In this paper, we consider space-time super-resolution (SR) for LF videos, aiming at generating high-resolution and high-frame-rate LF videos
-
Multi-Biometric Unified Network for Cloth-Changing Person Re-Identification IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-15 Guoqing Zhang, Jie Liu, Yuhao Chen, Yuhui Zheng, Hongwei Zhang
Person re-identification (re-ID) aims to match the same person across different cameras. However, most existing re-ID methods assume that people wear the same clothes in different views, which limits their performance in identifying target pedestrians who change clothes. Cloth-changing re-ID is a quite challenging problem as clothes occupying a large number of pixels in an image becomes invalid or even
-
FsaNet: Frequency Self-Attention for Semantic Segmentation IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-18 Fengyu Zhang, Ashkan Panahi, Guangjun Gao
Considering the spectral properties of images, we propose a new self-attention mechanism with highly reduced computational complexity, up to a linear rate. To better preserve edges while promoting similarity within objects, we propose individualized processes over different frequency bands. In particular, we study a case where the process is merely over low-frequency components. By ablation study,
-
TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-11 Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian
Video frame interpolation (VFI) aims to synthesize an intermediate frame between two consecutive frames. State-of-the-art approaches usually adopt a two-step solution, which includes 1) generating locally-warped pixels by calculating the optical flow based on pre-defined motion patterns (e.g., uniform motion, symmetric motion), 2) blending the warped pixels to form a full frame through deep neural
-
Sketch-Segformer: Transformer-Based Segmentation for Figurative and Creative Sketches IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-10 Yixiao Zheng, Jiyang Xie, Aneeshan Sain, Yi-Zhe Song, Zhanyu Ma
Sketch is a well-researched topic in the vision community by now. Sketch semantic segmentation in particular, serves as a fundamental step towards finer-level sketch interpretation. Recent works use various means of extracting discriminative features from sketches and have achieved considerable improvements on segmentation accuracy. Common approaches for this include attending to the sketch-image as
-
Multi-View Diffusion Process for Spectral Clustering and Image Retrieval IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-10 Qilin Li, Senjian An, Ling Li, Wanquan Liu, Yanda Shao
This paper presents a novel approach to multi-view graph learning that combines weight learning and graph learning in an alternating optimization framework. Multi-view graph learning refers to the problem of constructing a unified affinity graph using heterogeneous sources of data representation, which is a popular technique in many learning systems where no prior knowledge of data distribution is
-
Efficient Layer Compression Without Pruning IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-10 Jie Wu, Dingshun Zhu, Leyuan Fang, Yue Deng, Zhun Zhong
Network pruning is one of the chief means for improving the computational efficiency of Deep Neural Networks (DNNs). Pruning-based methods generally discard network kernels, channels, or layers, which however inevitably will disrupt original well-learned network correlation and thus lead to performance degeneration. In this work, we propose an Efficient Layer Compression (ELC) approach to efficiently
-
What is the Real Need for Scene Text Removal? Exploring the Background Integrity and Erasure Exhaustivity Properties IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-09 Yuxin Wang, Hongtao Xie, Zixiao Wang, Yadong Qu, Yongdong Zhang
As a crucial application in privacy protection, scene text removal (STR) has received considerable attention in recent years. However, existing approaches that coarsely erase text from images ignore two important properties: the background texture integrity (BI) and the text erasure exhaustivity (EE). These two properties directly determine the erasure performance, and how to maintain them in a single
-
Reducing Vision-Answer Biases for Multiple-Choice VQA IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-09 Xi Zhang, Feifei Zhang, Changsheng Xu
Multiple-choice visual question answering (VQA) is a challenging task due to the requirement of thorough multimodal understanding and complicated inter-modality relationship reasoning. To solve the challenge, previous approaches usually resort to different multimodal interaction modules. Despite their effectiveness, we find that existing methods may exploit a newly discovered bias (vision-answer bias)
-
Secure Outsourced SIFT: Accurate and Efficient Privacy-Preserving Image SIFT Feature Extraction IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-09 Xiang Liu, Xueli Zhao, Zhihua Xia, Qian Feng, Peipeng Yu, Jian Weng
Cloud computing has become an important IT infrastructure in the big data era; more and more users are motivated to outsource the storage and computation tasks to the cloud server for convenient services. However, privacy has become the biggest concern, and tasks are expected to be processed in a privacy-preserving manner. This paper proposes a secure SIFT feature extraction scheme with better integrity
-
Entropic Descent Archetypal Analysis for Blind Hyperspectral Unmixing IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-08 Alexandre Zouaoui, Gedeon Muhawenayo, Behnood Rasti, Jocelyn Chanussot, Julien Mairal
In this paper, we introduce a new algorithm based on archetypal analysis for blind hyperspectral unmixing, assuming linear mixing of endmembers. Archetypal analysis is a natural formulation for this task. This method does not require the presence of pure pixels (i.e., pixels containing a single material) but instead represents endmembers as convex combinations of a few pixels present in the original
-
ICE: Implicit Coordinate Encoder for Multiple Image Neural Representation IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-07 Fernando Rivas-Manzaneque, Angela Ribeiro, Orlando Avila-García
In recent years, implicit neural representations (INR) have shown their great potential to solve many computer graphics and computer vision problems. With this technique, signals such as 2D images or 3D shapes can be fit by training multi-layer perceptrons (MLP) on continuous functions, providing many advantages over conventional discrete representations. Despite being considered a promising approach
-
Unsupervised Low-Light Video Enhancement With Spatial-Temporal Co-Attention Transformer IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-07 Xiaoqian Lv, Shengping Zhang, Chenyang Wang, Weigang Zhang, Hongxun Yao, Qingming Huang
Existing low-light video enhancement methods are dominated by Convolutional Neural Networks (CNNs) that are trained in a supervised manner. Due to the difficulty of collecting paired dynamic low/normal-light videos in real-world scenes, they are usually trained on synthetic, static, and uniform motion videos, which undermines their generalization to real-world scenes. Additionally, these methods typically
-
Doppler and Pair-Wise Optical Flow Constrained 3D Motion Compensation for 3D Ultrasound Imaging IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-04 Yinran Chen, Zichen Zhuang, Jianwen Luo, Xiongbiao Luo
Volumetric (3D) ultrasound imaging using a 2D matrix array probe is increasingly developed for various clinical procedures. However, 3D ultrasound imaging suffers from motion artifacts due to tissue motions and a relatively low frame rate. Current Doppler-based motion compensation (MoCo) methods only allow 1D compensation in the in-range dimension. In this work, we propose a new 3D-MoCo framework that
-
Composed Image Retrieval via Cross Relation Network With Hierarchical Aggregation Transformer IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-02 Qu Yang, Mang Ye, Zhaohui Cai, Kehua Su, Bo Du
Composing Text and Image to Image Retrieval (CTI-IR) aims at finding the target image, which matches the query image visually along with the query text semantically. However, existing works ignore the fact that the reference text usually serves multiple functions, e.g., modification and auxiliary. To address this issue, we put forth a unified solution, namely Hierarchical Aggregation Transformer incorporated
-
Unmixing Guided Unsupervised Network for RGB Spectral Super-Resolution IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-01 Qiaoying Qu, Bin Pan, Xia Xu, Tao Li, Zhenwei Shi
Spectral super-resolution has attracted research attention recently, which aims to generate hyperspectral images from RGB images. However, most of the existing spectral super-resolution algorithms work in a supervised manner, requiring pairwise data for training, which is difficult to obtain. In this paper, we propose an Unmixing Guided Unsupervised Network (UnGUN), which does not require pairwise
-
Acquiring 360° Light Field by a Moving Dual-Fisheye Camera IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-01 I-Chan Lo, Homer H. Chen
In this paper, we propose an efficient deep learning pipeline for light field acquisition using a back-to-back dual-fisheye camera. The proposed pipeline generates a light field from a sequence of 360° raw images captured by the dual-fisheye camera. It has three main components: a convolutional network (CNN) that enforces a spatiotemporal consistency constraint on the subviews of the 360° light field
-
VTAE: Variational Transformer Autoencoder With Manifolds Learning IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-01 Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, Dacheng Tao, Xuelong Li
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables and these models use a non-linear function (generator) to map latent samples into the data space. On the other hand, the non-linearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor
-
Adversarial Dense Contrastive Learning for Semi-Supervised Semantic Segmentation IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-01 Ying Wang, Ziwei Xuan, Chiuman Ho, Guo-Jun Qi
Semi-supervised dense prediction tasks, such as semantic segmentation, can be greatly improved through the use of contrastive learning. However, this approach presents two key challenges: selecting informative negative samples from a highly redundant pool and implementing effective data augmentation. To address these challenges, we present an adversarial contrastive learning method specifically for
-
MuTrans: Multiple Transformers for Fusing Feature Pyramid on 2D and 3D Object Detection IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-01 Bangquan Xie, Liang Yang, Ailin Wei, Xiaoxiong Weng, Bing Li
One of the major components of the neural network, the feature pyramid plays a vital part in perception tasks, like object detection in autonomous driving. But it is a challenge to fuse multi-level and multi-sensor feature pyramids for object detection. This paper proposes a simple yet effective framework named MuTrans (Multiple Transformers) to fuse feature pyramid in single-stream 2D detector or
-
Spatially Varying Prior Learning for Blind Hyperspectral Image Fusion IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-01 Junwei Xu, Fangfang Wu, Xin Li, Weisheng Dong, Tao Huang, Guangming Shi
In recent years, researchers have become more interested in hyperspectral image fusion (HIF) as a potential alternative to expensive high-resolution hyperspectral imaging systems. HIF aims to recover a high-resolution hyperspectral image (HR-HSI) from a low-resolution hyperspectral image (LR-HSI) and a high-spatial-resolution multispectral image (HR-MSI). It is generally assumed that degeneration
-
Hierarchical Belief Propagation on Image Segmentation Pyramid IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-01 Tingman Yan, Xilian Yang, Genke Yang, Qunfei Zhao
The Markov random field (MRF) for stereo matching can be solved using belief propagation (BP). However, the solution space grows significantly with the introduction of high-resolution stereo images and 3D plane labels, making the traditional BP algorithms impractical in inference time and convergence. We present an accurate and efficient hierarchical BP framework using the representation of the image
-
SVCNet: Scribble-Based Video Colorization Network With Temporal Aggregation IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-08-01 Yuzhi Zhao, Lai-Man Po, Kangcheng Liu, Xuehui Wang, Wing-Yin Yu, Pengfei Xian, Yujia Zhang, Mengyang Liu
In this paper, we propose a scribble-based video colorization network with temporal aggregation called SVCNet. It can colorize monochrome videos based on different user-given color scribbles. It addresses three common issues in the scribble-based video colorization area: colorization vividness, temporal consistency, and color bleeding. To improve the colorization quality and strengthen the temporal
-
MagConv: Mask-Guided Convolution for Image Inpainting IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-07-28 Xuexin Yu, Long Xu, Jia Li, Xiangyang Ji
Standard convolution applied to image inpainting leads to color discrepancy and blurriness because it treats valid and invalid (hole) regions identically, a problem that was partially addressed by partial convolution (PConv). In PConv, a binary (hard) mask is maintained as an indicator of valid and invalid pixels, so that the two kinds of pixels are treated differently. However, it cannot describe
-
Decode-MOT: How Can We Hurdle Frames to Go Beyond Tracking-by-Detection? IEEE Trans. Image Process. (IF 10.6) Pub Date : 2023-07-28 Seong-Ho Lee, Dae-Hyeon Park, Seung-Hwan Bae
The speed of tracking-by-detection (TBD) depends greatly on how many times the detector is run, because detection is the most expensive operation in TBD. In many practical cases, however, multi-object tracking (MOT) can be achieved with tracking-by-motion (TBM) alone. This is a possible solution without much loss of MOT accuracy when the variations of object cardinality and motions are small within