End-to-end dense video grounding via parallel regression Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-07 Fengyuan Shi, Weilin Huang, Limin Wang
Video grounding aims to localize the corresponding moment in an untrimmed video given a sentence description. Existing methods often address this task in an indirect “one-to-many” way, i.e., predicting more than one proposal for one sentence description, by casting it as a propose-and-match or fusion-and-detection problem. Solving these surrogate problems often requires sophisticated label assignment
-
FAM: Improving columnar vision transformer with feature attention mechanism Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-03-06 Lan Huang, Xingyu Bai, Jia Zeng, Mengqiang Yu, Wei Pang, Kangping Wang
Vision Transformer has garnered outstanding performance in visual tasks due to its capability for global modeling of image information. However, during the self-attention computation of image tokens, a common issue of attention map homogenization arises, impacting the final performance of the model as attention maps propagate through feature maps layer by layer. In this research, we propose a token-based
-
Background no more: Action recognition across domains by causal interventions Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-28 Sarah Rastegar, Hazel Doughty, Cees G.M. Snoek
We aim to recognize actions under an appearance distribution shift between a source training domain and a target test domain. To enable such video domain generalization, our key idea is to intervene on the action to remove the confounding effect of the domain background on the class label using causal inference. Towards this, we propose to learn a causally debiased model on a source domain that intervenes
-
Simple contrastive learning in a self-supervised manner for robust visual question answering Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-27 Shuwen Yang, Luwei Xiao, Xingjiao Wu, Junjie Xu, Linlin Wang, Liang He
Recent observations have revealed that Visual Question Answering models are susceptible to learning the spurious correlations formed by dataset biases, i.e., the language priors, instead of the intended solution. For instance, given a question and a relative image, some VQA systems are prone to provide the frequently occurring answer in the dataset while disregarding the image content. Such a preferred
-
Learning key lines for multi-object tracking Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-20 Yi-Fan Li, Hong-Bing Ji, Xi Chen, Yong-Liang Yang, Yu-Kun Lai
Most online multi-object tracking methods utilize bounding boxes and center points inherited from detectors as the base models to represent targets. Limited performance is obtained with these base models alone for tracking. Complex networks are generally applied on top to extract high-level discriminative features such as appearance embeddings and motion predictions for data association. However, the
-
SpATr: MoCap 3D human action recognition based on spiral auto-encoder and transformer network Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-17 Hamza Bouzid, Lahoucine Ballihi
Recent technological advancements have significantly expanded the potential of human action recognition through harnessing the power of 3D data. This data provides a richer understanding of actions, including depth information that enables more accurate analysis of spatial and temporal characteristics. In this context, we study the challenge of 3D human action recognition. Unlike prior methods that
-
Combinational sign language recognition Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-17 Liqing Gao, Wei Feng, Fan Lyu, Liang Wan
Traditional Sign Language Recognition (SLR) suffers from the scale limitation of SL datasets, which may lead to over-fitting in narrow contexts and applications. In this paper, to solve this problem, we propose, for the first time, a Combinational Sign Language Recognition (CombSLR) framework, which can serve as an augmentation to extend existing datasets by combining continuous videos (called Template)
-
MAEDAY: MAE for few- and zero-shot AnomalY-Detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-16 Eli Schwartz, Assaf Arbelle, Leonid Karlinsky, Sivan Harary, Florian Scheidegger, Sivan Doveh, Raja Giryes
We propose using Masked Auto-Encoder (MAE), a transformer model trained in a self-supervised manner on image inpainting, for anomaly detection (AD), assuming that anomalous regions are harder to reconstruct than normal ones. MAEDAY is the first image-reconstruction-based anomaly detection method that utilizes a pre-trained model, enabling its use for Few-Shot Anomaly Detection (FSAD). We also show the
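The reconstruction-error idea behind MAEDAY can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the deterministic checkerboard mask stands in for MAE's random patch masking, and the constant-output `reconstruct` function stands in for a pre-trained inpainting model that has only seen normal data.

```python
import numpy as np

def anomaly_map(image, reconstruct):
    """Score anomalies by reconstruction error on masked pixels.

    `reconstruct` stands in for an inpainting model trained on normal
    data only; regions it cannot reproduce receive high scores. A simple
    checkerboard mask replaces MAE's random patch masking here.
    """
    mask = (np.indices(image.shape).sum(axis=0) % 2).astype(bool)
    recon = reconstruct(np.where(mask, 0.0, image))  # inpaint the masked input
    return mask * (image - recon) ** 2               # error only where masked

# Toy reconstructor that has only ever seen flat "normal" texture.
reconstruct = lambda x: np.full_like(x, 0.5)

img = np.full((8, 8), 0.5)
img[2:4, 2:4] = 1.0                      # inject an anomalous bright patch
scores = anomaly_map(img, reconstruct)
print(scores[2:4, 2:4].mean() > scores[5:, 5:].mean())  # anomaly scores higher
```

Because the scoring only needs a frozen reconstructor, the same recipe works with zero or few normal examples, which is the few-/zero-shot angle of the abstract.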
-
Quantifying model uncertainty for semantic segmentation of Fluorine-19 MRI using stochastic gradient MCMC Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-16 Masoumeh Javanbakhat, Ludger Starke, Sonia Waiczies, Christoph Lippert
Fluorine-19 (¹⁹F) MRI is an emerging theranostic tool for studying diseases and treatments simultaneously, particularly in challenging neuroinflammatory conditions. However, the low signal-to-noise ratio (SNR) of ¹⁹F MRI necessitates computational methods to reliably detect ¹⁹F signal regions and segment these from the background. In this study, we demonstrate that Bayesian fully convolutional neural networks
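The Bayesian idea here, quantifying uncertainty from posterior weight samples, can be sketched in a few lines. This is a generic predictive-entropy computation, not the paper's SG-MCMC sampler: the samples below are hard-coded stand-ins for segmentation networks whose weights were drawn from the posterior.

```python
import numpy as np

def predictive_entropy(prob_samples):
    """Model uncertainty from posterior samples.

    `prob_samples` has shape (S, H, W, C): per-pixel class probabilities
    from S networks whose weights were drawn (e.g. by stochastic gradient
    MCMC) from the posterior. The predictive distribution is the sample
    mean; its entropy flags pixels the ensemble disagrees on.
    """
    mean = prob_samples.mean(axis=0)                     # (H, W, C)
    return -(mean * np.log(mean + 1e-12)).sum(axis=-1)   # (H, W)

# Two posterior samples over a 1x2 "image", 2 classes (signal/background).
samples = np.array([
    [[[0.9, 0.1], [0.9, 0.1]]],   # sample 1
    [[[0.9, 0.1], [0.1, 0.9]]],   # sample 2: disagrees on pixel 2
])
ent = predictive_entropy(samples)
print(ent[0, 1] > ent[0, 0])      # disagreement -> higher uncertainty
```

In a low-SNR setting like ¹⁹F MRI, such per-pixel uncertainty maps indicate where the segmentation should not be trusted.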
-
Human-Scene Network: A novel baseline with self-rectifying loss for weakly supervised video anomaly detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-15 Snehashis Majhi, Rui Dai, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, François Brémond
Video anomaly detection in surveillance systems with only video-level labels is challenging. This is due to (i) the complex integration of a large variety of scenarios, including human- and scene-based anomalies characterized by subtle or sharp spatio-temporal cues in real-world videos, and (ii) non-optimal optimization between normal and anomaly instances under weak supervision. In this paper, we
-
Exploring using jigsaw puzzles for out-of-distribution detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-15 Yeonguk Yu, Sungho Shin, Minhwan Ko, Kyoobin Lee
Out-of-distribution (OOD) detection is a binary classification problem: deciding whether given data comes from outside the training distribution. Previous studies proposed outlier exposure (OE), which trains the model on an outlier dataset designed to represent potential future OOD data, thereby enhancing OOD detection performance. However, obtaining an outlier dataset representing all possible future OOD data can
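A common formulation of the outlier-exposure objective the abstract refers to can be sketched as follows. This is the generic OE loss (cross-entropy on in-distribution data plus a term pushing outlier predictions toward the uniform distribution), not this paper's jigsaw-based variant; the logits and `lam` weight are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def oe_loss(logits_in, labels_in, logits_out, lam=0.5):
    """Outlier-exposure objective: standard cross-entropy on labelled
    in-distribution samples plus cross-entropy of outlier predictions
    against uniform targets, so the model learns to be unsure on OOD data."""
    p_in = softmax(logits_in)
    ce = -np.log(p_in[np.arange(len(labels_in)), labels_in] + 1e-12).mean()
    p_out = softmax(logits_out)
    to_uniform = -np.log(p_out + 1e-12).mean()   # CE vs. uniform distribution
    return ce + lam * to_uniform

logits_in = np.array([[4.0, 0.0], [0.0, 4.0]])
labels_in = np.array([0, 1])
confident_out = np.array([[4.0, 0.0]])   # overconfident on an outlier: penalised
uniform_out = np.array([[0.0, 0.0]])     # appropriately unsure on an outlier
print(oe_loss(logits_in, labels_in, confident_out)
      > oe_loss(logits_in, labels_in, uniform_out))
```

The abstract's point is that curating the outlier set (`logits_out` above) for all future OOD data is impractical, which motivates generating surrogates such as jigsaw-shuffled images.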
-
Domain generalized federated learning for Person Re-identification Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-15 Fangyi Liu, Mang Ye, Bo Du
-
Survey on fast dense video segmentation techniques Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-14 Quentin Monnier, Tania Pouli, Kidiyo Kpalma
Semantic segmentation aims at classifying image pixels according to given categories. Deep learning approaches have proven to be very effective for this task. However, extensions to video content are more challenging, typically requiring more complex architectures, given the temporal constraints and the additional data that video introduces. At the same time, video applications tend to require real-time
-
Rethink arbitrary style transfer with transformer and contrastive learning Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-10 Zhanjie Zhang, Jiakai Sun, Guangyuan Li, Lei Zhao, Quanwei Zhang, Zehua Lan, Haolin Yin, Wei Xing, Huaizhong Lin, Zhiwen Zuo
Arbitrary style transfer holds widespread attention in research and boasts numerous practical applications. The existing methods, which either employ cross-attention to incorporate deep style attributes into content attributes or use adaptive normalization to adjust content features, fail to generate high-quality stylized images. In this paper, we introduce an innovative technique to improve the quality
-
Transformer-based assignment decision network for multiple object tracking Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-09 Athena Psalta, Vasileios Tsironis, Konstantinos Karantzalos
Data association is a crucial component for any multiple object tracking (MOT) method that follows the tracking-by-detection paradigm. To generate complete trajectories, such methods employ a data association process to establish assignments between detections and existing targets during each timestep. Recent data association approaches try to solve either a multi-dimensional linear assignment task
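The linear assignment step described above can be sketched for toy sizes with an exhaustive solver. This is not the paper's Transformer-based decision network, just the underlying matching problem it learns to solve; real trackers use the Hungarian algorithm or, as here, a learned matcher.

```python
import numpy as np
from itertools import permutations

def assign(cost):
    """Exhaustively solve the per-timestep assignment in tracking-by-
    detection: match each existing target (row) to one detection
    (column) so the total cost is minimal. Fine for toy sizes only."""
    n = cost.shape[0]
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i, p[i]] for i in range(n)))
    return list(best)

# Cost = 1 - IoU between 3 tracked targets and 3 new detections.
cost = np.array([
    [0.1, 0.9, 0.8],
    [0.9, 0.2, 0.7],
    [0.8, 0.7, 0.1],
])
print(assign(cost))   # target i matched to detection assign(cost)[i]
```

With the diagonal-dominant cost matrix above, each target keeps its own detection, i.e. the assignment is `[0, 1, 2]`.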
-
Re-scoring using image-language similarity for few-shot object detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-08 Min Jae Jung, Seung Dae Han, Joohee Kim
Few-shot object detection, which focuses on detecting novel objects with few labels, is an emerging challenge in the community. Recent studies show that adapting a pre-trained model or modifying the loss function can improve performance. In this paper, we explore leveraging the power of Contrastive Language-Image Pre-training (CLIP) and a hard negative classification loss in a low-data setting. Specifically
-
Video Frame-wise Explanation Driven Contrastive Learning for Procedural Text Generation Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-08 Zhihao Wang, Lin Li, Zhongwei Xie, Chuanbo Liu
Procedural text generation from visual observation of instructional videos, such as assembling, biochemical experiments, and cooking, is an essential task for scene understanding and real-world applications. The major difference from general captioning tasks is two-fold: it has a flow of material combination in instructional steps, and the materials change their state through action-involved manipulations
-
Attention-based multimodal image matching Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-07 Aviad Moreshet, Yosi Keller
We propose a method for matching multimodal image patches using a multiscale Transformer-Encoder that focuses on the feature maps of a Siamese CNN. It effectively combines multiscale image embeddings while improving task-specific and appearance-invariant image cues. We also introduce a residual attention architecture that allows for end-to-end training by using a residual connection. To the best of
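The Siamese matching principle in this abstract can be illustrated with a minimal sketch. The random linear projection `W` below is a stand-in for the shared CNN/Transformer encoder, and the noise model for "another modality" is illustrative; the point is only that both inputs pass through identical weights and are compared by cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))   # shared "Siamese" weights (toy encoder)

def embed(patch):
    """One branch of a Siamese encoder: both inputs pass through the
    SAME weights, so matching patches land close in embedding space."""
    v = W @ patch.ravel()
    return v / np.linalg.norm(v)

def match_score(patch_a, patch_b):
    return float(embed(patch_a) @ embed(patch_b))   # cosine similarity

base = rng.standard_normal((8, 8))
same_scene = base + 0.05 * rng.standard_normal((8, 8))  # same content, other modality
other = rng.standard_normal((8, 8))                     # unrelated patch
print(match_score(base, same_scene) > match_score(base, other))
```

The paper's contribution is making `embed` multiscale and attention-based so the score stays high across modality gaps; the comparison step is unchanged.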
-
Self-supervised multi-scale semantic consistency regularization for unsupervised image-to-image translation Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-07 Heng Zhang, Yi-Jun Yang, Wei Zeng
Unsupervised image-to-image translation aims to learn a domain mapping function that preserves the semantics of an input image while adapting its style to target domains without paired data. However, if there is a large semantic mismatch between the source and target domains, current methods often suffer from semantics distortion. Based on dense self-supervised representation learning, a novel Multi-Scale
-
Simplifying open-set video domain adaptation with contrastive learning Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-07 Giacomo Zara, Victor Guilherme Turrisi da Costa, Subhankar Roy, Paolo Rota, Elisa Ricci
In an effort to reduce annotation costs in action recognition, unsupervised video domain adaptation methods have been proposed that aim to adapt a predictive model from a labelled dataset (i.e., source domain) to an unlabelled dataset (i.e., target domain). In this work we address a more realistic scenario, called open-set video domain adaptation (OUVDA), where the target dataset contains “unknown”
-
Revisiting coarse-to-fine strategy for low-light image enhancement with deep decomposition guided training Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-06 Hai Jiang, Yang Ren, Songchen Han
Previous coarse-to-fine strategies typically spend equal effort on feature extraction and feature reconstruction, gradually improving the brightness of images from bottom to top, so that computational resources are not used efficiently for restoration. In this paper, we propose a new deep framework for Robust and Fast Low-Light Image Enhancement, dubbed RFLLIE. Specifically, we first use a lightweight
-
GMC: A general framework of multi-stage context learning and utilization for visual detection tasks Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-05 Xuan Wang, Hao Tang, Zhigang Zhu
Various kinds of contextual information have been employed by many approaches for visual detection tasks. However, most existing approaches only focus on specific context for specific tasks. In this paper, we propose GMC, a general framework for multi-stage context learning and utilization, with various deep network architectures for various visual detection tasks. The GMC framework encompasses three
-
Towards efficient image and video style transfer via distillation and learnable feature transformation Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-02 Jing Huo, Meihao Kong, Wenbin Li, Jing Wu, Yu-Kun Lai, Yang Gao
Despite the recent rapid development of neural style transfer, existing style transfer methods are still somewhat inefficient or have a large model size, which limits their application on computational resource limited devices. The major problem lies in that they usually adopt a pre-trained VGG-19 backbone which is relatively large or the feature transformation module is computationally heavy. To address
-
Enhancing video anomaly detection with learnable memory network: A new approach to memory-based auto-encoders Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-02 Zhiqiang Wang, Xiaojing Gu, Xingsheng Gu, Jingyu Hu
The aim of video anomaly detection is to detect anomalous events in a video sequence. In an unsupervised setting, enhancing detection accuracy hinges on the ability to learn normal features during the training phase and subsequently generate large errors when abnormal video frames are encountered during the testing phase. The transformer is an innovative neural network that utilizes a self-attention
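The memory-based auto-encoder mechanism this abstract builds on can be sketched as follows. This shows the generic memory-addressing step (soft attention over stored "normal" prototypes), not the paper's learnable memory network; the prototypes and latents below are toy values.

```python
import numpy as np

def memory_read(query, memory, temperature=0.1):
    """Memory addressing in a memory-augmented auto-encoder: the encoder's
    latent is replaced by a soft combination of stored normal prototypes,
    so anomalous latents cannot be represented well and yield large
    reconstruction errors at test time."""
    sims = memory @ query / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(query) + 1e-12)
    w = np.exp(sims / temperature)           # sharpen addressing weights
    w /= w.sum()
    return w @ memory                        # latent rebuilt from prototypes

memory = np.array([[1.0, 0.0], [0.0, 1.0]])  # two "normal" prototypes
normal_z = np.array([0.9, 0.1])              # close to a stored prototype
abnormal_z = np.array([-1.0, -1.0])          # unlike anything stored
err = lambda z: np.linalg.norm(z - memory_read(z, memory))
print(err(abnormal_z) > err(normal_z))
```

During training only normal frames populate `memory`; at test time the gap between a frame's latent and its memory reconstruction serves as the anomaly score.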
-
Deep parametric Retinex decomposition model for low-light image enhancement Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-02 Xiaofang Li, Weiwei Wang, Xiangchu Feng, Min Li
Images captured under low light conditions often suffer from various degradations. The Retinex models are highly effective in enhancing low-light images. The analytical optimization models are interpretable but inflexible to various scenes. The data-driven learning models are flexible to various scenes but less interpretable. To reconcile the advantages of both, we propose a parametric Retinex model
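The Retinex decomposition the abstract relies on, I = R ∘ L (reflectance times illumination), can be sketched with a classical hand-crafted recipe. The paper learns a parametric decomposition; here, as an illustrative stand-in, illumination is crudely estimated as the channel-wise maximum and brightened with a gamma curve.

```python
import numpy as np

def retinex_enhance(img, gamma=0.4, eps=1e-6):
    """Classic Retinex recipe behind many low-light enhancers: decompose
    the image into reflectance R and illumination L (I = R * L), brighten
    only L with a gamma curve, then recompose. Deep Retinex methods
    replace this hand-crafted decomposition with a learned one."""
    L = img.max(axis=-1, keepdims=True)      # crude illumination estimate
    R = img / (L + eps)                      # reflectance
    return np.clip(R * L ** gamma, 0.0, 1.0)

dark = np.full((2, 2, 3), 0.05)              # under-exposed gray image
out = retinex_enhance(dark)
print(float(out.mean()) > float(dark.mean()))   # brighter after enhancement
```

Because only L is modified, colors (carried by R) are preserved while brightness is lifted, which is the main appeal of Retinex-based enhancement.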
-
MERLIN-Seg: Self-supervised despeckling for label-efficient semantic segmentation Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-02-02 Emanuele Dalsasso, Clément Rambour, Nicolas Trouvé, Nicolas Thome
-
Space time recurrent memory network Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-30 Hung Nguyen, Chanho Kim, Fuxin Li
-
Temporal adaptive feature pyramid network for action detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-24 Xuezhi Xiang, Hang Yin, Yulong Qiao, Abdulmotaleb El Saddik
-
CPRNC: Channels pruning via reverse neuron crowding for model compression Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-23 Pingfan Wu, Hengyi Huang, Han Sun, Dong Liang, Ningzhong Liu
Channel pruning is an efficient technique for model compression, removing redundant parts of a convolutional neural network with minor degradation in classification accuracy. Previous criteria of channel pruning ignore neurons’ intrinsic relationship and the high correlation with input samples. Inspired by the visual crowding phenomenon in neuroscience, this paper presents a novel channel pruning method
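The channel-pruning pipeline the abstract describes can be sketched with a generic importance criterion. The L1-norm score below is the classic baseline, not CPRNC's crowding-based criterion; the tensor shapes are illustrative.

```python
import numpy as np

def prune_channels(weights, keep_ratio=0.5):
    """Channel pruning skeleton: rank the output channels of a conv layer
    by an importance score and keep only the top fraction. The score here
    is the simple per-channel L1 norm; CPRNC replaces it with a criterion
    based on reverse neuron crowding."""
    scores = np.abs(weights).sum(axis=(1, 2, 3))     # one score per out-channel
    k = max(1, int(len(scores) * keep_ratio))
    keep = np.sort(np.argsort(scores)[::-1][:k])     # indices of channels to keep
    return keep, weights[keep]

w = np.zeros((4, 3, 3, 3))                           # (out, in, kH, kW)
w[1] = 1.0
w[3] = 0.5                                           # channels 0 and 2 are "dead"
keep, pruned = prune_channels(w, keep_ratio=0.5)
print(keep.tolist(), pruned.shape)                   # kept channels, smaller layer
```

Whatever the criterion, the output is a structurally smaller layer, which is why channel pruning (unlike unstructured weight pruning) gives real speed-ups on standard hardware.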
-
Unsupervised deep learning of foreground objects from low-rank and sparse dataset Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-18 Keita Takeda, Tomoya Sakai
-
On the coherency of quantitative evaluation of visual explanations Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-18 Benjamin Vandersmissen, José Oramas
Recent years have shown an increased development of methods for justifying the predictions of neural networks through visual explanations. These explanations usually take the form of heatmaps which assign a saliency (or relevance) value to each pixel of the input image that expresses how relevant the pixel is for the prediction of a label. Complementing this development, evaluation methods have been
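One of the quantitative evaluation methods this abstract alludes to is the deletion metric, which can be sketched directly. The quadrant-brightness "model" below is a toy stand-in for a classifier; a faithful heatmap makes the model's score collapse quickly as its most salient pixels are removed.

```python
import numpy as np

def deletion_curve(image, saliency, score_fn, steps=4):
    """Deletion check for a saliency heatmap: remove pixels from most to
    least salient and record the model score after each step; a faithful
    explanation yields a fast drop (low area under the curve)."""
    order = np.argsort(saliency.ravel())[::-1]       # most salient first
    img = image.ravel().copy()
    scores = [score_fn(img)]
    for chunk in np.array_split(order, steps):
        img[chunk] = 0.0                             # "delete" these pixels
        scores.append(score_fn(img))
    return np.array(scores)

# Toy "model": score = brightness of the top-left quadrant.
def score_fn(flat):
    return flat.reshape(4, 4)[:2, :2].mean()

image = np.ones((4, 4))
good = np.zeros((4, 4)); good[:2, :2] = 1.0          # heatmap pointing at the evidence
bad = 1.0 - good                                     # heatmap pointing elsewhere
print(deletion_curve(image, good, score_fn).sum()
      < deletion_curve(image, bad, score_fn).sum())
```

The paper's concern is whether such metrics agree with each other and with human judgment, i.e. the coherency of exactly this kind of score.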
-
Hierarchical compositional representations for few-shot action recognition Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-13 Changzhen Li, Jie Zhang, Shuzhe Wu, Xin Jin, Shiguang Shan
Recently action recognition has received more and more attention for its comprehensive and practical applications in intelligent surveillance and human–computer interaction. However, few-shot action recognition has not been well explored and remains challenging because of data scarcity. In this paper, we propose a novel hierarchical compositional representations (HCR) learning approach for few-shot
-
Semantic-aware Transformer for shadow detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-18 Kai Zhou, Jing-Long Fang, Wen Wu, Yan-Li Shao, Xing-Qi Wang, Dan Wei
Shadow detection is significant for scene understanding. Ambiguities in a shadow image, such as shadow-like non-shadow regions and shadow regions with non-shadow patterns, are still very challenging for prevalent CNN-based methods. This work attempts to alleviate this problem from a new perspective of shape semantics, and then proposes a Semantic-aware Transformer (SaT) in a multi-task learning manner
-
A novel slime mold algorithm for grayscale and color image contrast enhancement Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-16 Guoyuan Ma, Xiaofeng Yue, Juan Zhu, Zeyuan Liu, Zongheng Zhang, Yuan Zhou, Chang Li
Image enhancement is a key step in image pre-processing. To address the low quality and poor visual effect of images captured under low-illumination conditions, this paper proposes an image enhancement method based on a slime mold algorithm with a hyperbolic oscillation factor and quadratic interpolation (SSMA), which dynamically adjusts the grayscale curve via an incomplete beta function. The new strategy mainly
-
GradPaint: Gradient-guided inpainting with diffusion models Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-17 Asya Grechka, Guillaume Couairon, Matthieu Cord
Denoising Diffusion Probabilistic Models (DDPMs) have recently achieved remarkable results in conditional and unconditional image generation. The pre-trained models can be adapted without further training to different downstream tasks, by guiding their iterative denoising process at inference time to satisfy additional constraints. For the specific task of image inpainting, the current guiding mechanism
-
Fourier analysis on robustness of graph convolutional neural networks for skeleton-based action recognition Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-11 Nariki Tanaka, Hiroshi Kera, Kazuhiko Kawamoto
Using Fourier analysis, we explore the robustness and vulnerability of graph convolutional neural networks (GCNs) for skeleton-based action recognition. We adopt a joint Fourier transform (JFT), a combination of the graph Fourier transform (GFT) and the discrete Fourier transform (DFT), to examine the robustness of adversarially-trained GCNs against adversarial attacks and common corruptions. Experimental
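The joint Fourier transform (JFT) named in this abstract combines two standard transforms, which can be sketched directly. The 3-joint chain graph and constant signal below are illustrative; the GFT basis comes from the eigenvectors of the graph Laplacian, as is standard.

```python
import numpy as np

def joint_fourier_transform(x, adj):
    """Joint Fourier transform of a skeleton sequence x of shape (T, N):
    a graph Fourier transform over the N joints (Laplacian eigenbasis)
    combined with a discrete Fourier transform over the T frames."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj                          # combinatorial graph Laplacian
    _, U = np.linalg.eigh(lap)               # GFT basis (Laplacian eigenvectors)
    gft = x @ U                              # GFT over joints, per frame
    return np.fft.fft(gft, axis=0)           # DFT over time

# 3-joint chain graph, 4-frame constant signal.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
x = np.ones((4, 3))
spec = joint_fourier_transform(x, adj)
# A temporally constant signal has all its energy at temporal frequency 0.
print(np.allclose(spec[1:], 0))
```

Inspecting where attacks and corruptions concentrate energy in this joint spatial-temporal spectrum is what lets the paper separate robust from vulnerable frequency bands.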
-
Enhancing image-based facial expression recognition through muscle activation-based facial feature extraction Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-11 Manuel A. Solis-Arrazola, Raul E. Sanchez-Yañez, Carlos H. Garcia-Capulin, Horacio Rostro-Gonzalez
This article introduces a non-intrusive method to estimate facial muscle activity from images, diverging from conventional electrode-based approaches. Our methodology capitalizes on an inclusive set of features encompassing a diverse range of facial muscles, often overlooked in research, thus significantly expanding the scope of analyzing muscle activity within facial expressions. Our method is based
-
PPformer: Using pixel-wise and patch-wise cross-attention for low-light image enhancement Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-15 Jiachen Dang, Yong Zhong, Xiaolin Qin
Recently, transformer-based methods have shown strong competition compared to CNN-based methods on the low-light image enhancement task, by employing the self-attention for feature extraction. Transformer-based methods perform well in modeling long-range pixel dependencies, which are essential for low-light image enhancement to achieve better lighting, natural colors, and higher contrast. However,
-
Lmser-pix2seq: Learning stable sketch representations for sketch healing Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-15 Tengjie Li, Sicong Zang, Shikui Tu, Lei Xu
Sketch healing aims to recreate a complete sketch from the corrupted one. Sketches are abstract and sparse, making it difficult for neural networks to learn high-quality representations of sketches that include colors, textures, and other details. This presents a significant challenge for sketch healing. The features extracted from the corrupted sketch may be inconsistent with the ones from the corresponding
-
TECD_Attention: Texture-enhanced and cross-domain attention modeling for visual place recognition Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-13 Zhenyu Li, Zhenbiao Dong
Visual place recognition (VPR) is a challenging task for visual computing in the field of robot navigation. However, most existing methods fail to learn the most salient features of place images with simple CNN features or popular Transformer features, due to the inconsistency problem commonly existing in VPR datasets, which limits the robustness and interpretability of the model. In addition, existing
-
Local to global purification strategy to realize collaborative camouflaged object detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-13 Jinghui Tong, Yaqiu Bi, Cong Zhang, Hongbo Bi, Ye Yuan
The purpose of camouflaged object detection is to detect objects in images that are not easily perceived by human eyes. To address the low recognition performance and unsatisfactory texture information extraction of current camouflaged object detection algorithms in complex environments, we propose to improve accuracy by simultaneously detecting a group of images containing the same
-
LandmarkBreaker: A proactive method to obstruct DeepFakes via disrupting facial landmark extraction Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-13 Yuezun Li, Pu Sun, Honggang Qi, Siwei Lyu
The recent development of Deep Neural Networks (DNN) has significantly increased the realism of AI-synthesized faces, with the most notable examples being the DeepFakes. In particular, DeepFake can synthesize the face of the target subject from the face of another subject, while retaining the same face attributes. With the increased number of social media portals, DeepFake videos rapidly spread through
-
Emerging image generation with flexible control of perceived difficulty Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-10 Jingmeng Li, Hui Wei, Surun Yang, Lukang Fu
Emerging images (EI) are two-tone and contain a number of discrete speckles. If certain speckles are appropriately organized together, we will perceive a meaningful object, which reflects the closed-loop information processing of human visual cognition. EIs hold significant application value. They can be used in studies of perceptual organization in cognitive psychology. Additionally, they can also
-
DFNet-Trans: An end-to-end multibranching network for depth estimation for transparent objects Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-06 Xiangyin Meng, Jie Wen, Yang Li, Chenlong Wang, Jingzhen Zhang
Transparent objects play a vital role in modern industries and find widespread applications across various engineering scenarios. However, capturing accurate depth maps of transparent objects remains challenging due to their reflective and refractive properties, which pose difficulties for most commercial-grade optical sensors. In this paper, we propose a novel depth estimation method called DFNet-Trans
-
Transformer with large convolution kernel decoder network for salient object detection in optical remote sensing images Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-04 Pengwei Dong, Bo Wang, Runmin Cong, Hai-Han Sun, Chongyi Li
Although salient object detection in optical remote sensing images (ORSI-SOD) has made great strides in recent years, it is still a very challenging topic due to the various scales and shapes of objects, cluttered backgrounds, and diverse imaging orientations. Most previous deep learning-based methods fail to effectively capture local and global features, resulting in ambiguous localization and semantic
-
Sparse graph matching network for temporal language localization in videos Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2023-12-29 Guangli Wu, Tongjie Xu, Jing Zhang
Temporal language localization in videos aims to retrieve the moment that best matches the text description in the untrimmed video using the query text. Existing methods using graph convolutional networks have been effective in feature representation and cross-modal interaction, but the existing methods do not consider the sparsity constraint of the graph when constructing the graph structure, which
-
Efficient cross-information fusion decoder for semantic segmentation Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-03 Songyang Zhang, Ge Ren, Xiaoxi Zeng, Liang Zhang, Kailun Du, Gege Liu, Hong Lin
For fine-scale prediction tasks such as semantic segmentation, existing segmentation models cannot support detailed segmentation due to the difficulty of assigning deep feature semantics generated by the encoder to shallow features, thus making the segmentation of details ambiguous in semantic segmentation scenarios. In addition, high-precision models often require large quantities of computational
-
Joint learning of foreground, background and edge for salient object detection Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-03 Qin Wu, Pengcheng Zhu, Zhilei Chai, Guodong Guo
Although significant progress has been made in saliency detection, predicting saliency remains challenging when the scene is complex, especially when salient and non-salient regions are similar or salient objects have intricate contours. Previous advanced methods rarely explored learning in the background of images. In fact, background and foreground of an image contain complementary information. In
-
Indoor Synthetic Data Generation: A Systematic Review Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2024-01-02 Hannah Schieber, Kubilay Can Demir, Constantin Kleinbeck, Seung Hee Yang, Daniel Roth
Objective: Deep learning-based object recognition, 6D pose estimation, and semantic scene understanding require a large amount of training data to achieve generalization. Time-consuming annotation processes, privacy, and security aspects lead to a scarcity of real-world datasets. To overcome this lack of data, synthetic data generation has been proposed, including multiple facets in the area of domain
-
Towards adversarial robustness verification of no-reference image- and video-quality metrics Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2023-12-30 Ekaterina Shumitskaya, Anastasia Antsiferova, Dmitriy Vatolin
In this paper, we propose a new method of analysing the stability of modern deep image- and video-quality metrics under different adversarial attacks. The stability analysis of quality metrics is becoming important because the majority of metrics nowadays employ neural networks. Unlike traditional quality metrics based on natural scene statistics or other hand-crafted features, learning-based methods are
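The kind of probe this abstract describes can be sketched with a one-step sign attack on a differentiable score. The variance-based "metric" and its analytic gradient below are toy stand-ins for a learned no-reference quality metric; the paper attacks real neural metrics.

```python
import numpy as np

def fgsm_on_metric(img, metric_grad, eps=1e-3):
    """One-step sign attack used to probe no-reference quality metrics:
    nudge each pixel along the sign of the metric's gradient, producing a
    visually near-identical image whose predicted quality changes. A
    stable metric should resist such perturbations."""
    return np.clip(img + eps * np.sign(metric_grad(img)), 0.0, 1.0)

# Toy differentiable "metric": contrast measured as pixel variance.
metric = lambda x: x.var()
metric_grad = lambda x: 2 * (x - x.mean()) / x.size   # d var / d x

img = np.linspace(0.3, 0.7, 16).reshape(4, 4)
adv = fgsm_on_metric(img, metric_grad)
# Score rises although every pixel moved by at most eps.
print(metric(adv) > metric(img))
```

A metric whose score can be inflated by such imperceptible perturbations is unreliable for benchmarking codecs or enhancement methods, which is the paper's motivation.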
-
IGMG: Instance-guided multi-granularity for domain generalizable person re-identification Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2023-12-29 Amran Bhuiyan, Jimmy Xiangji Huang, Aijun An
-
Hybrid AI for panoptic segmentation: An informed deep learning approach with integration of prior spatial relationships knowledge Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2023-12-29 Fatima Ezzahra Benkirane, Nathan Crombez, Vincent Hilaire, Yassine Ruichek
-
Adaptive Locally-Aligned Transformer for low-light video enhancement Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2023-12-29 Yiwen Cao, Yukun Su, Jingliang Deng, Yu Zhang, Qingyao Wu
Low-light enhancement is a crucial task in computer vision that aims to enhance under-exposed inputs. While state-of-the-art static single-image enhancement methods have made remarkable progress, few attempts have explored the spatio-temporal sequence problem in low-light video enhancement. In this paper, we propose a simple yet highly effective method, termed Adaptive Locally-Aligned Transformer
-
Twin-SegNet: Dynamically coupled complementary segmentation networks for generalized medical image segmentation Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2023-12-28 Shahed Ahmed, Md. Kamrul Hasan
Image segmentation is deemed an important task in biomedicine, often required for proper diagnosis and prognosis of many diseases. Deep learning (DL) based segmentation methods have received considerable attention in recent years due to the increasing availability of clinical datasets. Many novel ideas have been proposed over the years driving progress in the field of automatic segmentation research
-
Multimodel fore-/background alignment for seam-based parallax-tolerant image stitching Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2023-12-28 Zhihao Zhang, Jie He, Mouquan Shen, Jiantao Shi, Xianqiang Yang
Image stitching with large parallax is a challenging computer vision problem. Although existing seam-based approaches were proposed to achieve pleasing results, issues like object dislocation, disappearance, and duplication can still occur. In this paper, to alleviate these problems, we propose a novel seam-based parallax-tolerant image stitching method, which relies on accurately aligning background
-
Learning feature contexts by transformer and CNN hybrid deep network for weakly supervised person search Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2023-12-22 Ning Lv, Xuezhi Xiang, Xinyao Wang, Yulong Qiao, Abdulmotaleb El Saddik
-
MLGPnet: Multi-granularity neural network for 3D shape recognition using pyramid data Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2023-12-14 Zekun Li, Hock Soon Seah, Baolong Guo, Muli Yang
This paper presents a Multi-Granularity 3D shape recognition network comprising point-granularity, line-granularity, and Pyramid-granularity networks, as well as multi-granularity convolutional layers (MLGPnet). The network takes pyramid data with high-level features generated from mesh data as input. The point-granularity, line-granularity, and pyramid-granularity networks respectively generate features
-
Semantic manipulation through the lens of Geometric Algebra Comput. Vis. Image Underst. (IF 4.5) Pub Date : 2023-12-12 Raphael dos S. Evangelista, Andre Luiz da S. Pereira, Rogério Ferreira de Moraes, Leandro A.F. Fernandes