-
Integration of ultrasound and mammogram for multimodal classification of breast cancer using hybrid residual neural network and machine learning Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-12 Kushangi Atrey, Bikesh Kumar Singh, Narendra Kuber Bodhey
Breast cancer (BC) is one of the leading causes of mortality in women worldwide. Early detection and classification of the tumor allow proper treatment and improve patients' chances of survival. In this article, we propose a hybrid residual neural network (ResNet) and machine learning framework and integrate the features of both mammography (MG) and ultrasound (US) images to perform the multimodal
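The multimodal fusion idea in the abstract above can be sketched generically: deep features from each modality are concatenated and passed to a classical classifier. The snippet below is an illustrative stand-in, not the paper's method — random vectors replace real ResNet features, all dimensions and statistics are assumptions, and a nearest-centroid rule replaces the unspecified machine-learning classifier.

```python
import numpy as np

# Late feature fusion of two imaging modalities (generic sketch).
rng = np.random.default_rng(1)

def fuse(mg_feat, us_feat):
    """Concatenate mammography and ultrasound feature vectors."""
    return np.concatenate([mg_feat, us_feat], axis=-1)

# Toy training data: two classes with 16-D fused features.
benign = fuse(rng.normal(0.0, 1.0, (20, 8)), rng.normal(0.0, 1.0, (20, 8)))
malign = fuse(rng.normal(3.0, 1.0, (20, 8)), rng.normal(3.0, 1.0, (20, 8)))
centroids = np.stack([benign.mean(axis=0), malign.mean(axis=0)])

def classify(fused):
    """Nearest-centroid stand-in for the paper's ML classifier."""
    return int(np.argmin(np.linalg.norm(centroids - fused, axis=1)))

sample = fuse(np.full(8, 3.0), np.full(8, 3.0))
print(classify(sample))  # 1 with these toy statistics
```

In the real framework each `fuse` input would come from a ResNet backbone applied to the corresponding modality, and the classifier would be trained rather than centroid-based.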
-
Authenticating and securing healthcare records: A deep learning-based zero watermarking approach Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-12 Ashima Anand, Jatin Bedi, Ashutosh Aggarwal, Muhammad Attique Khan, Imad Rida
Security in medical records is critical to patient privacy and confidentiality. Digital Patient Records (DPR) hold sensitive information that can reveal a patient's health status and history. Their unauthorized access or exposure can lead to severe consequences, including identity theft, discrimination, and medical malpractice. Therefore, ensuring proper security measures is critical in protecting
-
Camouflaged object detection via cross-level refinement and interaction network Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-11 Yanliang Ge, Junchao Ren, Qiao Zhang, Min He, Hongbo Bi, Cong Zhang
Camouflaged object detection (COD) focuses on detecting objects that seamlessly blend into their surroundings. Camouflaged objects pose a substantial challenge in the realm of computer vision due to various factors, including occlusion, limited illumination, and diminutive dimensions. In this paper, we propose a cross-level refinement and interaction network (CRI-Net) to capture camouflaged
-
RGB road scene material segmentation Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-11 Sudong Cai, Ryosuke Wakaki, Shohei Nobuhara, Ko Nishino
We introduce RGB road scene material segmentation, i.e., per-pixel segmentation of materials in real-world driving views with pure RGB images, as a novel computer vision task by building a benchmark dataset and by deriving a new method. Our dataset, KITTI-Materials, is based on the well-established KITTI dataset and consists of 1000 frames covering 24 different road scenes of urban/suburban landscapes
-
MRFormer: Multiscale retractable transformer for medical image progressive denoising via noise level estimation Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-11 Can Bai, Xianjun Han
Clear medical images are important for auxiliary diagnoses, but the images generated by various medical devices inevitably contain considerable noise. Although various models have been proposed for denoising, these methods ignore the fact that different types of medical images have different noise levels, which leads to unsatisfactory test results. In addition, collecting many medical images for training
-
Multi-axis interactive multidimensional attention network for vehicle re-identification Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-06 Xiyu Pang, Yanli Zheng, Xiushan Nie, Yilong Yin, Xi Li
Learning fine-grained discriminative information is essential to address the challenges of small inter-class differences and large intra-class differences in vehicle re-identification (Re-ID). Attention mechanisms are often used to capture important global information in images rather than fine-grained discriminative information. Studies have shown that the multi-axis interaction of information can
-
Arbitrary 3D stylization of radiance fields Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-06 Sijia Zhang, Ting Liu, Zhuoyuan Li, Yi Sun
3D Stylization that creates stylized multi-view images is quite challenging, as it requires not only generating images which align with the desired style but also maintaining consistency across different perspectives. Most previous image style transfer methods focus on the 2D image domain and stylize each view independently, suffering from multi-view inconsistency. To tackle this challenging problem
-
An improved skin lesion detection solution using multi-step preprocessing features and NASNet transfer learning model Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-05 Abdulaziz Altamimi, Fadwa Alrowais, Hanen Karamti, Muhammad Umer, Lucia Cascone, Imran Ashraf
Computer-aided diagnosis has shown its potential for accurate detection of various diseases such as skin lesions. Skin lesion detection has been recognized as a challenging task since manual identification through visual analysis of images can be inefficient, tedious, and error-prone. Although automatic diagnosis approaches are used to overcome this challenge, it is crucial to address problems such as variations
-
Robust visual tracking via modified Harris hawks optimization Image Vis. Comput. (IF 4.7) Pub Date : 2024-03-01 Yuqi Xiao, Yongjun Wu
Due to its outstanding efficiency and high precision, Harris hawks optimization (HHO for short) is suitable for solving the problem of visual target tracking under conditions of occlusion, deformation, rotation and in other complicated tracking scenes. A visual target tracker based on HHO is proposed in this study. To further promote the efficiency and stability of the standard HHO method and reduce
-
Image captioning: Semantic selection unit with stacked residual attention Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-28 Lifei Song, Fei Li, Ying Wang, Yu Liu, Yuanhua Wang, Shiming Xiang
Semantic information and attention mechanism play important roles in the task of image captioning. Semantic information can strengthen the relationship between images and languages, while attention operation can steer the relevant regions spatially in the image. However, in most current works, semantic attributes are always confined to be learned from pairs of images and sentences, which ignore to
-
Improved YOLOv7 models based on modulated deformable convolution and swin transformer for object detection in fisheye images Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-28 Jie Zhou, Degang Yang, Tingting Song, Yichen Ye, Xin Zhang, Yingze Song
Thanks to its wide field of view, the fisheye camera captures much more visual information and is thus widely used in computer vision. However, fisheye images often require projection before they can be used for object detection. Meanwhile, the projection introduces distortion in fisheye images, and the discontinuous image edges leave objects incomplete. Fisheye images are characterized
-
C2F: An effective coarse-to-fine network for video summarization Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-28 Ye Jin, Xiaoyan Tian, Zhao Zhang, Peng Liu, Xianglong Tang
The objective of video summarization is to develop a concise and condensed summary that accurately captures the original video content. Current supervised video summarization methods consider the task a sequence-to-sequence problem. However, modeling the order of long videos presents three challenges: (1) capturing both local and global relationships simultaneously is challenging; (2)
-
Multi-object tracking with adaptive measurement noise and information fusion Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-27 Xi Huang, Yinwei Zhan
Multi-object tracking (MOT) is a challenging task in computer vision that aims to estimate the trajectories of multiple objects in a video sequence. Observation-Centric SORT (OCSORT) is a pure motion-based MOT algorithm that uses the Kalman filter as the motion model and three observation-centric techniques: Re-Update, Momentum and Recovery, to enhance the data association. However, OCSORT is limited
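The Kalman-filter motion model the abstract refers to can be sketched in its textbook constant-velocity form. This is a generic illustration, not the OC-SORT implementation: the 1-D state layout and the noise matrices are assumptions chosen for readability.

```python
import numpy as np

# Textbook 1-D constant-velocity Kalman filter, the kind of motion model
# SORT-style trackers build on. Matrices and noise levels are illustrative.
F = np.array([[1.0, 1.0], [0.0, 1.0]])  # state transition: position += velocity
H = np.array([[1.0, 0.0]])              # we observe position only
Q = np.eye(2) * 1e-2                    # process noise (assumed)
R = np.array([[1e-1]])                  # measurement noise (assumed)

x = np.array([[0.0], [1.0]])            # state: position 0, velocity 1
P = np.eye(2)                           # state covariance

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    y = z - H @ x                       # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    return x + K @ y, (np.eye(2) - K @ H) @ P

x, P = predict(x, P)                    # predicted position is now 1.0
x, P = update(x, P, np.array([[1.2]]))  # blend prediction with a measurement
```

A tracker runs `predict` once per frame and `update` whenever a detection is associated with the track; OC-SORT's Re-Update, Momentum, and Recovery techniques modify this basic loop rather than replace it.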
-
Gaze analysis: A survey on its applications Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-26 Carmen Bisogni, Michele Nappi, Genoveffa Tortora, Alberto Del Bimbo
The examination of ocular movements has a wide range of applications due to recent developments in sensors that are now able to collect this biometric. This type of investigation is known as “gaze analysis”. Gaze has successfully been used to examine a subject's physical and mental status in the past. As a result, over the last few decades, a large and diverse amount of literature on this subject has
-
Attention guided multi-level feature aggregation network for camouflaged object detection Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-24 Anzhi Wang, Chunhong Ren, Shuang Zhao, Shibiao Mu
Camouflaged object detection (COD) aims to identify objects that are visually blended into their highly similar surroundings, which is an extremely complex and challenging visual task in real-world scenarios, and has recently attracted increasing research interest in the field of computer vision due to its valuable applications. The existing deep learning based methods of COD have the following problems:
-
Audio-visual saliency prediction with multisensory perception and integration Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-23 Jiawei Xie, Zhi Liu, Gongyang Li, Yingjie Song
Audio-visual saliency prediction (AVSP) is a task that aims to model human attention patterns in the perception of auditory and visual scenes. Given the challenges associated with perceiving and combining multi-modal saliency features from videos, this paper presents a multi-sensory framework for AVSP. This framework is designed to extract audio, motion and image saliency features and integrate them
-
BPMB: BayesCNNs with perturbed multi-branch structure for robust facial expression recognition Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-23 Shuaishi Liu, Dongxu Zhao, Zhongbo Sun, Yuekun Chen
The wild Facial Expression Recognition (FER) task has been a long-standing challenge due to the various forms of uncertainty that exist in expression data. When expression data is fed into a convolutional neural network (CNN), the model's estimated parameters also become uncertain. This uncertainty gives rise to concerns regarding the reliability of the recognition results. To quantify these uncertainties and
-
Non-probability sampling network based on anomaly pedestrian trajectory discrimination for pedestrian trajectory prediction Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-22 Quankai Liu, Haifeng Sang, Jinyu Wang, Wangxing Chen, Yulong Liu
Pedestrian trajectory prediction in first-person view is an important support for achieving fully automated driving in cities. However, existing pedestrian trajectory prediction methods still have significant shortcomings in terms of pedestrian trajectory diversity, dynamic scene constraints, and dependence on long-term trajectory prediction. We propose a non-probability sampling network based on
-
Foreground and background separated image style transfer with a single text condition Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-21 Yue Yu, Jianming Wang, Nengli Li
Traditional image-based style transfer requires additional reference style images, making it less user-friendly. Text-based methods are more convenient but suffer from issues like slow generation, unclear content, and poor quality. In this work, we propose a new style transfer method, SA2-CS (short for Semantic-Aware and Salient Attention CLIPStyler), which is based on the Contrastive Language-Image Pre-training
-
Explicit knowledge transfer of graph-based correlation distillation and diversity data hallucination for few-shot object detection Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-21 Meng Wang, Yang Wang, Haipeng Liu
-
PTET: A progressive token exchanging transformer for infrared and visible image fusion Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-21 Jun Huang, Ziang Chen, Yong Ma, Fan Fan, Linfeng Tang, Xinyu Xiang
Integrating complementary information from different modalities is one of the key challenges in image fusion. Most of the existing deep learning-based methods still rely on a one-off fusion layer to integrate the features extracted from two modalities into one. Such an information interaction pattern only considers significant feature integration but neglects the removal of hazardous information that
-
CVAD-GAN: Constrained video anomaly detection via generative adversarial network Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-19 Rituraj Singh, Anikeit Sethi, Krishanu Saini, Sumeet Saurav, Aruna Tiwari, Sanjay Singh
Automatic detection of abnormal behavior in video sequences is a fundamental and challenging problem for intelligent video surveillance systems. However, the existing state-of-the-art Video Anomaly Detection (VAD) methods are computationally expensive and lack the desired robustness in real-world scenarios. The contemporary VAD methods cannot detect the fundamental features absent during training,
-
POSER: POsed vs spontaneous emotion recognition using fractal encoding Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-19 Carmen Bisogni, Lucia Cascone, Michele Nappi, Chiara Pero
Emotion recognition from facial expressions is a fundamental human ability that can be harnessed and transferred to machines. The ability to differentiate between spontaneous and posed emotions holds significant importance in various domains, including behavioral biometrics, forensics, and security. This paper introduces a novel method, called POsed vs Spontaneous Emotion Recognition (POSER), which
-
Multiple object detection and tracking from drone videos based on GM-YOLO and multi-tracker Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-19 Yubin Yuan, Yiquan Wu, Langyue Zhao, Huixian Chen, Yao Zhang
Multiple object tracking in drone videos is a vital vision task with broad application prospects, but most trackers use spatial or appearance cues alone to associate detections. Our proposed Multi-Tracker uses a novel similarity measure that combines position and appearance information. We designed the GM-YOLO network to provide high-quality detections as input to Multi-Tracker. We add a Coordinate Attention
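A similarity measure that mixes position and appearance, as the abstract describes, is commonly built as a weighted sum of box overlap and feature similarity. The sketch below is a generic, hypothetical version: the weight `alpha` and both terms are assumptions, not Multi-Tracker's actual measure.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def cosine(u, v):
    """Appearance similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity(box_a, box_b, feat_a, feat_b, alpha=0.5):
    # alpha trades off position (IoU) against appearance (cosine); assumed.
    return alpha * iou(box_a, box_b) + (1 - alpha) * cosine(feat_a, feat_b)

s = similarity((0, 0, 10, 10), (0, 0, 10, 10), np.ones(4), np.ones(4))
print(s)  # 1.0 for identical boxes and features
```

In an association step, this score would fill a cost matrix between tracks and detections before Hungarian matching.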
-
Multi-depth branch network for efficient image super-resolution Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-18 Huiyuan Tian, Li Zhang, Shijian Li, Min Yao, Gang Pan
A longstanding challenge in Super-Resolution (SR) is how to efficiently enhance high-frequency details in Low-Resolution (LR) images while maintaining semantic coherence. This is particularly crucial in practical applications where SR models are often deployed on low-power devices. To address this issue, we propose an innovative asymmetric SR architecture featuring Multi-Depth Branch Module (MDBM)
-
EMNet: Edge-guided multi-level network for salient object detection in low-light images Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-18 Lianghu Jing, Bo Wang
Salient object detection (SOD) has achieved remarkable performance in well-lit scenes. However, when generalized to low-light scenes, the performance of SOD shows a significant decrease owing to more challenging conditions such as weak brightness, low contrast, and a poor signal-to-noise ratio. To address this issue, we propose a novel edge-guided multi-level network (EMNet) for SOD in low-light images
-
ECT: Fine-grained edge detection with learned cause tokens Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-17 Shaocong Xu, Xiaoxue Chen, Yuhang Zheng, Guyue Zhou, Yurong Chen, Hongbin Zha, Hao Zhao
In this study, we tackle the challenging fine-grained edge detection task, which refers to predicting specific edges caused by reflectance, illumination, normal, and depth changes, respectively. Prior methods exploit multi-scale convolutional networks, which are limited in three aspects: (1) Convolutions are local operators, while identifying the cause of edge formation requires looking at far-away pixels
-
Feature decoupling and interaction network for defending against adversarial examples Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-17 Wang Weidong, Li Zhi, Liu Shuaiwei, Zhang Li, Yang Jin, Wang Yi
Recently, it was found that deep neural networks (DNNs) are susceptible to adversarial input perturbations. Most defense strategies adopt the denoising method based on preprocessing, which mitigates the impacts of adversarial perturbations on DNNs by learning the distributions of nonadversarial datasets and projecting adversarial inputs into the learned nonadversarial manifolds. However, existing defense
-
Integrating prior knowledge into a bibranch pyramid network for medical image segmentation Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-17 Xianjun Han, Tiantian Li, Can Bai, Hongyu Yang
Medical image segmentation is crucial for obtaining accurate diagnoses, and while convolutional neural network (CNN)-based methods have made strides in recent years, they struggle with modeling long-range dependencies. Transformer-based methods improve this task but require more computational resources. The segment anything model (SAM) can generate pixel-level segmentation results for natural images
-
Gated contextual transformer network for multi-modal retinal image clinical description generation Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-15 Nagur Shareef Shaik, Teja Krishna Cherukuri
Generating semantically meaningful and coherent clinical description for the diagnosis of retinal images has been a challenging task for both Computer Vision and Natural Language Processing domains. This is mainly due to the fact that the clinical descriptions generated by the language model are completely dependent on the type of retinal image representations learned by the vision model. This work
-
Remote sensing scene classification using multi-domain semantic high-order network Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-15 Yuanyuan Lu, Yanhui Zhu, Hao Feng, Yang Liu
Recently, convolutional neural networks (CNNs), which obtain powerful deep features in an end-to-end manner, have achieved powerful performance in remote sensing scene classification. However, the average or maximum pooling operations defined in the spatial domain and coarser-resolution features with high levels cannot extract reliable features and clear boundaries for small-scale targets in remote
-
Multi-branch residual image semantic segmentation combined with inverse weight gated-control Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-15 Haicheng Qu, Xiaona Wang, Ying Wang, Yao Chen
The loss of pixel-level information in the multi-class segmentation task based on the U-net model results in unclear boundaries and low semantic segmentation accuracy. Aiming at this, a deep multi-branch residual Unet (IWG-MRUN) with fused inverse weight gated-control is proposed to improve the quality of image semantic segmentation. Specifically, we first introduce a deep multi-branch residual module
-
Computer vision and deep learning meet plankton: Milestones and future directions Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-13 Massimiliano Ciranni, Vittorio Murino, Francesca Odone, Vito Paolo Pastore
Planktonic organisms play a pivotal role within aquatic ecosystems, serving as the foundation of the aquatic food chain while also playing a critical role in climate regulation and the production of oxygen.
-
Deep hybrid manifold for image set classification Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-12 Xianhua Zeng, Jueqiu Guo, Yifan Wei, Yang Zhuo
The exponential growth of the data volume of image sets, which contain more information than a single image, has attracted increasing attention from researchers. Image set data are often described as covariance matrices or linear subspaces, and the unique geometries they span are symmetric positive definite (SPD) manifolds and Grassmann manifolds, respectively.
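The covariance-matrix description of an image set mentioned above is easy to illustrate: stack the per-image feature vectors, take their covariance, and after a small regularization the result is a point on the SPD manifold. The features below are random stand-ins for real image features; the regularization constant is an assumption.

```python
import numpy as np

# Represent an image set (50 images, 8-D features each) by its covariance
# matrix, a standard SPD-manifold descriptor. Features are synthetic.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 8))

C = np.cov(features, rowvar=False)   # 8x8 covariance descriptor
C += 1e-6 * np.eye(8)                # regularize to guarantee strict SPD

# SPD check: symmetric with strictly positive eigenvalues.
assert np.allclose(C, C.T)
assert np.all(np.linalg.eigvalsh(C) > 0)
```

Distances between such descriptors are then computed with SPD-aware metrics (e.g., log-Euclidean) rather than the plain Euclidean distance, which is what makes the manifold geometry relevant.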
-
Person search over security video surveillance systems using deep learning methods: A review Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-12 S. Irene, A. John Prakash, V. Rhymend Uthariaraj
Person search has become one of the most critical and challenging applications in today's video surveillance systems. It helps in locating a person in surveillance videos, which is plausible only with advanced deep learning models, large-scale datasets, and high-compute-power GPUs. This survey features an exhaustive analysis of deep learning-based person search through image, textual, and attribute-based
-
Deep learning methods for single camera based clinical in-bed movement action recognition Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-07 Tamás Karácsony, László Attila Jeni, Fernando De la Torre, João Paulo Silva Cunha
Many clinical applications involve in-bed patient activity monitoring, from intensive care and neuro-critical infirmary, to semiology-based epileptic seizure diagnosis support or sleep monitoring at home, which require accurate recognition of in-bed movement actions from video streams.
-
Point-level feature learning based on vision transformer for occluded person re-identification Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-07 Hua Gao, Chenchen Hu, Guang Han, Jiafa Mao, Wei Huang, Qiu Guan
Person re-identification is challenging due to the presence of variations in pose and occlusion, which significantly impact the matching of visual features across different camera views and pose considerable difficulty for accurate person re-identification. This paper proposes a novel method for occluded person re-identification by introducing point-level feature learning based on vision transformers
-
A Point-2s reinforcement learning biomimetic model for estimating and analyzing human 3D motion posture Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-06 Yubo Wang
With the rapid progress of computer vision and artificial intelligence, accurately estimating and analyzing human body movements and postures has always been a highly active research field. However, current methods still have some shortcomings in accurately estimating the pose of 3D human movements. This study aims to propose an effective method to accurately estimate the motion posture of 3D
-
Recent advances in deterministic human motion prediction: A review Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-04 Tenghao Deng, Yan Sun
In recent years, the rapid advancement of deep learning and the advent of extensive human motion datasets have significantly enhanced the prominence of human motion prediction technology. This article presents an overview of prevalent model architectures within this field, critically examining their advantages and drawbacks. It methodically reviews recent research breakthroughs, offering in-depth analyses
-
Shadow detection using a cross-attentional dual-decoder network with self-supervised image reconstruction features Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-02 Ruben Fernandez-Beltran, Angélica Guzmán-Ponce, Rafael Fernandez, Jian Kang, Ginés García-Mateos
-
Feature attention fusion network for occluded person re-identification Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-01 Xuyao Zhuang, Dan Wei, Danyang Liang, Lei Jiang
Occluded Person re-identification (ReID) is a person retrieval task which aims to match occluded person images with the holistic image. In this paper, we propose a novel framework by using person key-points estimation and attention mechanism based on occluded person Re-identification, which is used to get discriminative features and robust alignment. We use a CNN backbone and a key-points estimation
-
R2-trans: Fine-grained visual categorization with redundancy reduction Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-01 Shuo Ye, Shujian Yu, Yu Wang, Xinge You
Fine-grained visual categorization (FGVC) aims to discriminate similar subcategories, whose main challenge is the large intraclass diversities and subtle inter-class differences. Existing FGVC methods usually select discriminant regions found by a trained model, which is prone to neglect other potential discriminant information. On the other hand, the massive interactions between the sequence of image
-
The impact of introducing textual semantics on item instance retrieval with highly similar appearance: An empirical study Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-01 Bo Li, Jiansheng Zhu, Linlin Dai, Hui Jing, Zhizheng Huang
Feature representation plays an important role in image instance retrieval (IIR). In practical applications, we find that items of different categories but with highly similar appearance easily become the objects of incorrect retrieval. Our analysis suggests that extracting features from the appearance dimension alone may cause objects with similar appearance to have smaller similarity distances in feature space
-
Depth awakens: A depth-perceptual attention fusion network for RGB-D camouflaged object detection Image Vis. Comput. (IF 4.7) Pub Date : 2024-02-01 Xinran Liu, Lin Qi, Yuxuan Song, Qi Wen
Camouflaged object detection (COD) presents a persistent challenge in accurately identifying objects that seamlessly blend into their surroundings. However, most existing COD models overlook the fact that visual systems operate within a genuine 3D environment. The scene depth inherent in a single 2D image provides rich spatial clues that can assist in the detection of camouflaged objects. Therefore
-
Noisy label facial expression recognition via face-specific label distribution learning Image Vis. Comput. (IF 4.7) Pub Date : 2024-01-28 Hyunuk Shin, Bokyeung Lee, Bonhwa Ku, Hanseok Ko
-
Multi-view daily action recognition based on Hooke balanced matrix and broad learning system Image Vis. Comput. (IF 4.7) Pub Date : 2024-01-26 Zhigang Liu, Bingshuo Lu, Yin Wu, Chunlei Gao
Daily action recognition is a challenging task in computer vision, and so multi-layer methods have been proposed recently. However, the feature concatenation strategy in multi-view clustering can be regarded as equal-scale feature fusion and ignores the information differences between views. To deal with this problem, we first propose the multi-view feature fusion strategy, which constructs Hooke balanced
-
Drone-NeRF: Efficient NeRF based 3D scene reconstruction for large-scale drone survey Image Vis. Comput. (IF 4.7) Pub Date : 2024-01-26 Zhihao Jia, Bing Wang, Changhao Chen
Neural rendering has garnered substantial attention owing to its capacity for creating realistic 3D scenes. However, its applicability to extensive scenes remains challenging, with limitations in effectiveness. In this work, we propose the Drone-NeRF framework to enhance the efficient reconstruction of unbounded large-scale scenes suited for drone oblique photography using Neural Radiance Fields (NeRF)
-
Three dimensional tracking of rigid objects in motion using 2D optical flows Image Vis. Comput. (IF 4.7) Pub Date : 2024-01-20 Ramesh Marikhu, Matthew N. Dailey, Mongkol Ekpanyapong
-
A novel approach for breast cancer detection using optimized ensemble learning framework and XAI Image Vis. Comput. (IF 4.7) Pub Date : 2024-01-20 Raafat M. Munshi, Lucia Cascone, Nazik Alturki, Oumaima Saidani, Amal Alshardan, Muhammad Umer
Breast cancer (BC) is a common and highly lethal ailment. It stands as the second leading contributor to cancer-related deaths in women worldwide. The timely identification of this condition is of utmost importance in mitigating mortality rates. This research paper presents a novel framework for the precise identification of BC, utilising a combination of image and numerical data features with explainable
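The ensemble idea underlying frameworks like the one above can be illustrated with plain majority voting over base classifiers. The toy threshold "models" below are stand-ins; the paper's optimized ensemble and its XAI components are not reproduced here.

```python
from collections import Counter

# Three stand-in base classifiers mapping a scalar score to a binary label.
def model_a(x): return int(x > 0.5)
def model_b(x): return int(x > 0.4)
def model_c(x): return int(x > 0.7)

def ensemble_predict(x, models=(model_a, model_b, model_c)):
    """Hard majority vote over the base classifiers' predictions."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

print(ensemble_predict(0.6))  # 1: two of the three models vote positive
```

Real ensembles replace the threshold functions with trained models (e.g., gradient boosting, random forests) and often weight the votes; the voting logic itself stays this simple.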
-
Enhancing lung abnormalities diagnosis using hybrid DCNN-ViT-GRU model with explainable AI: A deep learning approach Image Vis. Comput. (IF 4.7) Pub Date : 2024-01-22 Md Khairul Islam, Md Mahbubur Rahman, Md Shahin Ali, S.M. Mahim, Md Sipon Miah
In this study, we propose a novel approach called DCNN-ViT-GRU, which combines deep Convolutional Neural Networks (CNNs) with Gated Recurrent Units (GRUs) and the Vision Transformer (ViT) model for the accurate detection and classification of lung abnormalities. By leveraging the strengths of both CNNs and the ViT model, our architecture automatically extracts meaningful features from lung images,
-
LSTPNet: Long short-term perception network for dynamic facial expression recognition in the wild Image Vis. Comput. (IF 4.7) Pub Date : 2024-01-20 Chengcheng Lu, Yiben Jiang, Keren Fu, Qijun Zhao, Hongyu Yang
In-the-wild dynamic facial expression recognition (DFER) is a very challenging task, and previous methods based on convolutional neural networks (CNNs), recurrent neural networks (RNNs), or Transformers emphasize the extraction of either short-term temporal information or long-term temporal information from facial video sequences. Different from existing methods, this paper proposes a long short-term
-
Weakly supervised point cloud semantic segmentation with the fusion of heterogeneous network features Image Vis. Comput. (IF 4.7) Pub Date : 2024-01-22 Yingchun Niu, Jianqin Yin
Weakly supervised point cloud segmentation has emerged as a prominent research area to address the problem of manual annotation costs. A crucial challenge in weakly supervised point cloud segmentation is the implicit augmentation of the total amount of supervision signals. In this article, we propose a novel method that utilizes the fusion of features from different networks to enhance the supervision
-
Image-based human re-identification: Which covariates are actually (the most) important? Image Vis. Comput. (IF 4.7) Pub Date : 2024-01-20 Kailash Hambarde, Hugo Proença
Human re-identification (re-ID) is nowadays among the most popular topics in computer vision, due to the increasing importance given to safety/security in modern societies. Being expected to run in totally uncontrolled data acquisition settings (e.g., visual surveillance), automated re-ID not only depends on various factors that may occur in non-controlled data acquisition settings, but - most importantly
-
EESSO: Exploiting Extreme and Smooth Signals via Omni-frequency learning for Text-based Person Retrieval Image Vis. Comput. (IF 4.7) Pub Date : 2024-01-20 Jingyi Xue, Zijie Wang, Guan-Nan Dong, Aichun Zhu
-
Variation-aware semantic image synthesis Image Vis. Comput. (IF 4.7) Pub Date : 2024-01-18 Mingle Xu, Jaehwan Lee, Sook Yoon, Hyongsuk Kim, Dong Sun Park
Semantic image synthesis (SIS) aims to produce photorealistic images aligning to given conditional semantic layout and has witnessed a significant improvement in recent years. Although the diversity in image-level has been discussed heavily, class-level mode collapse widely exists in current algorithms. Therefore, we declare a new requirement for SIS to achieve more photorealistic images, variation-aware
-
Speech driven video editing via an audio-conditioned diffusion model Image Vis. Comput. (IF 4.7) Pub Date : 2024-01-14 Dan Bigioi, Shubhajit Basak, Michał Stypułkowski, Maciej Zieba, Hugh Jordan, Rachel McDonnell, Peter Corcoran
Taking inspiration from recent developments in visual generative tasks using diffusion models, we propose a method for end-to-end speech-driven video editing using a denoising diffusion model. Given a video of a talking person, and a separate auditory speech recording, the lip and jaw motions are re-synchronised without relying on intermediate structural representations such as facial landmarks or
-
A deep learning-based illumination transform for devignetting photographs of dermatological lesions Image Vis. Comput. (IF 4.7) Pub Date : 2024-01-12 Vipin Venugopal, Malaya Kumar Nath, Justin Joseph, M. Vipin Das
-
One-shot lip-based biometric authentication: Extending behavioral features with authentication phrase information Image Vis. Comput. (IF 4.7) Pub Date : 2024-01-08 Brando Koch, Ratko Grbić
Lip-based biometric authentication (LBBA) is an authentication method based on a person's lip movements during speech in the form of video data. LBBA can utilize both physical and behavioral characteristics of lip movements without requiring any additional sensory equipment apart from an RGB camera. Current approaches employ deep siamese neural networks trained with one-shot learning to generate embedding