-
Cluster adaptation networks for unsupervised domain adaptation Image Vis. Comput. (IF 3.103) Pub Date : 2021-02-12 Qiang Zhou; Wen’an Zhou; Shirui Wang
Domain adaptation is an important technology for transferring source-domain knowledge to new, unseen target domains. Recently, domain adaptation models have been applied to learn domain-invariant representations by minimizing distribution distances or by adversarial training in the feature space. However, existing adversarial domain adaptation methods fail to preserve the data structure in the feature space
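As a point of reference for the "minimizing distribution distance" strategy mentioned above, a common choice is the maximum mean discrepancy (MMD) between source and target features. The NumPy sketch below is only a generic illustration of that idea (the function names, kernel choice, and batch sizes are invented here); it is not the cluster adaptation network proposed in the paper.

    import numpy as np

    def rbf_kernel(a, b, gamma=1.0):
        # Pairwise RBF kernel values k(x, y) = exp(-gamma * ||x - y||^2).
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def mmd2(source, target, gamma=1.0):
        # Biased estimate of the squared maximum mean discrepancy between
        # source-domain and target-domain feature batches.
        k_ss = rbf_kernel(source, source, gamma).mean()
        k_tt = rbf_kernel(target, target, gamma).mean()
        k_st = rbf_kernel(source, target, gamma).mean()
        return k_ss + k_tt - 2.0 * k_st

    # Toy example: 128-d features from a shared encoder for two domains.
    src = np.random.randn(64, 128)
    tgt = np.random.randn(64, 128) + 0.5
    print(mmd2(src, tgt))

Minimizing such a distance with respect to the encoder parameters is one standard way of encouraging domain-invariant representations.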
-
Certifiable relative pose estimation Image Vis. Comput. (IF 3.103) Pub Date : 2021-02-24 Mercedes Garcia-Salguero; Jesus Briales; Javier Gonzalez-Jimenez
-
A survey of iris datasets Image Vis. Comput. (IF 3.103) Pub Date : 2021-01-23 Lubos Omelina; Jozef Goga; Jarmila Pavlovicova; Milos Oravec; Bart Jansen
-
Multi-stream slowFast graph convolutional networks for skeleton-based action recognition Image Vis. Comput. (IF 3.103) Pub Date : 2021-02-23 Ning Sun; Ling Leng; Jixin Liu; Guang Han
Recently, many efforts have been made to model spatial–temporal features from the human skeleton for action recognition using graph convolutional networks (GCN). A skeleton sequence can precisely represent human pose with a small number of joints, yet there are still many redundancies across the sequence in terms of temporal dependency. In order to improve the effectiveness of spatial–temporal
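For readers unfamiliar with the GCN building block referenced above, a single spatial graph convolution over skeleton joints amounts to normalized neighborhood aggregation followed by a linear projection. The sketch below is a generic illustration with a toy joint count and random weights, not the multi-stream SlowFast model itself.

    import numpy as np

    def graph_conv(x, adj, weight):
        # x: (num_joints, in_dim) joint features for one frame.
        # adj: (num_joints, num_joints) skeleton adjacency; weight: (in_dim, out_dim).
        a_hat = adj + np.eye(adj.shape[0])              # add self-loops
        d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(1)))
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt        # symmetric normalization
        return a_norm @ x @ weight                      # aggregate neighbors, then project

    # Toy example: 5 joints connected in a chain, 3-d inputs, 8-d outputs.
    adj = np.zeros((5, 5))
    for i in range(4):
        adj[i, i + 1] = adj[i + 1, i] = 1
    print(graph_conv(np.random.randn(5, 3), adj, np.random.randn(3, 8)).shape)  # (5, 8)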
-
WRGPruner: A new model pruning solution for tiny salient object detection Image Vis. Comput. (IF 3.103) Pub Date : 2021-02-20 Fengwei Jia; Xuan Wang; Jian Guan; Huale Li; Chen Qiu; Shuhan Qi
Model pruning is one of the predominant model compression techniques for decreasing demands on computing power and memory footprint. However, most existing pruning methods target overly broad application areas, which results in sub-optimal solutions for the specific difficulties of salient object detection tasks. In this paper, we propose a novel solution, dubbed
-
Boundary graph convolutional network for temporal action detection Image Vis. Comput. (IF 3.103) Pub Date : 2021-02-20 Yaosen Chen; Bing Guo; Yan Shen; Wei Wang; Weichen Lu; Xinhua Suo
-
Knowledge distillation methods for efficient unsupervised adaptation across multiple domains Image Vis. Comput. (IF 3.103) Pub Date : 2021-01-06 Le Thanh Nguyen-Meidine; Atif Belal; Madhu Kiran; Jose Dolz; Louis-Antoine Blais-Morin; Eric Granger
-
2D progressive fusion module for action recognition Image Vis. Comput. (IF 3.103) Pub Date : 2021-02-18 Zhongwei Shen; Xiao-Jun Wu; Josef Kittler
Network convergence and recognition accuracy are essential issues when applying Convolutional Neural Networks (CNN) to human action recognition. Most deep learning methods neglect model convergence when striving to improve abstraction capability, thus degrading performance sharply when computing resources are limited. To mitigate this problem, we propose a structure named 2D Progressive
-
Generative adversarial networks and their application to 3D face generation: A survey Image Vis. Comput. (IF 3.103) Pub Date : 2021-02-18 Mukhiddin Toshpulatov; Wookey Lee; Suan Lee
Generative adversarial networks (GANs) have been extensively studied in recent years and have been used to address several problems in the fields of image generation and computer vision. Despite significant advancements in computer vision, applying GANs to real-world problems such as 3D face generation remains a challenge. Owing to the proliferation of fake images generated by GANs, it is important
-
Triangulate geometric constraint combined with visual-flow fusion network for accurate 6DoF pose estimation Image Vis. Comput. (IF 3.103) Pub Date : 2021-02-17 Zhihong Jiang; Xin Wang; Xiao Huang; Hui Li
Estimating the 6D object pose from a monocular RGB image is a challenging task in computer vision, which produces false positives under the influence of occlusion or cluttered environments. In addition, the prediction of translation is affected by changes in image size. In this work, we present a novel two-stage method, TGCPose6D, for robust 6DoF object pose estimation, which is composed of 2D
-
A study on attention-based LSTM for abnormal behavior recognition with variable pooling Image Vis. Comput. (IF 3.103) Pub Date : 2021-02-04 Kai Zhou; Bei Hui; Junfeng Wang; Chunyu Wang; Tingting Wu
Behavior recognition is a well-known computer vision technology for mobile platforms. It has been used in many applications such as video surveillance, motion detection on devices, human-computer interaction, and sports video. However, most existing works ignore depth and spatio-temporal information, resulting in over-fitting and inferior performance. Consequently, a novel framework for
-
Interactive multi-scale feature representation enhancement for small object detection Image Vis. Comput. (IF 3.103) Pub Date : 2021-02-06 Qiyuan Zheng; Ying Chen
In the field of object detection, there is a wide gap between the performance on small objects and that on medium and large objects. Some studies show that this gap is due to the contradiction between the classification-based backbone and localization. Although reducing the feature map size is beneficial for the extraction of abstract features, it causes the loss of detailed features in the localization
-
Beyond modality alignment: Learning part-level representation for visible-infrared person re-identification Image Vis. Comput. (IF 3.103) Pub Date : 2021-02-04 Peng Zhang; Qiang Wu; Xunxiang Yao; Jingsong Xu
Visible-Infrared person re-IDentification (VI-reID) aims to automatically retrieve pedestrians of interest exposed to sensors in different modalities, such as a visible camera vs. an infrared sensor. It struggles to learn both modality-invariant and discriminant representations. Unfortunately, existing VI-reID work mainly focuses on tackling the modality difference, while fine-grained level discriminant
-
Image captioning via proximal policy optimization Image Vis. Comput. (IF 3.103) Pub Date : 2021-02-08 Le Zhang; Yanshuo Zhang; Xin Zhao; Zexiao Zou
Image captioning is the task of generating captions for images in natural language. Training typically consists of two phases: first minimizing the XE (cross-entropy) loss, and then applying RL (reinforcement learning) over CIDEr scores. Although there are many innovations in neural architectures, fewer works address the RL phase. Motivated by one recent state-of-the-art architecture, X-Transformer
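For context on the RL phase, PPO optimizes a clipped surrogate objective over importance ratios between the new and old caption policies; a common choice of advantage in captioning is a CIDEr score minus a baseline. The sketch below only illustrates that objective with made-up numbers and is not the paper's training loop.

    import numpy as np

    def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
        # Clipped surrogate: take the minimum of the unclipped and clipped
        # importance-weighted advantage, then negate it for minimization.
        ratio = np.exp(logp_new - logp_old)
        unclipped = ratio * advantages
        clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
        return -np.minimum(unclipped, clipped).mean()

    # Toy example: per-caption advantages, e.g. CIDEr(sample) - CIDEr(baseline).
    logp_old = np.log(np.array([0.20, 0.50, 0.10]))
    logp_new = np.log(np.array([0.25, 0.45, 0.12]))
    adv = np.array([0.8, -0.3, 0.5])
    print(ppo_clip_loss(logp_new, logp_old, adv))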
-
A Tibetan Thangka data set and relative tasks Image Vis. Comput. (IF 3.103) Pub Date : 2021-02-16 Yanchun Ma; Yongjian Liu; Qing Xie; Shengwu Xiong; Lihua Bai; Anshu Hu
A high-quality data set is the cornerstone of current data-driven machine learning models and plays an important role in promoting the development of various application areas. At present, image analysis and processing techniques have become intensively involved in the tasks of inheriting and protecting cultural resources. However, there are currently few effective image data sets for the traditional
-
Whether normalized or not? Towards more robust iris recognition using dynamic programming Image Vis. Comput. (IF 3.103) Pub Date : 2021-01-30 Yifeng Chen; Cheng Wu; Yiming Wang
-
Weighted boxes fusion: Ensembling boxes from different object detection models Image Vis. Comput. (IF 3.103) Pub Date : 2021-02-03 Roman Solovyev; Weimin Wang; Tatiana Gabruseva
Object detection is a crucial task in computer vision systems with a wide range of applications in autonomous driving, medical imaging, retail, security, face recognition, robotics, and others. Nowadays, neural network-based models are used to localize and classify instances of objects of particular classes. When real-time inference is not required, ensembles of models help to achieve better results
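The core idea of weighted boxes fusion, in contrast to the suppression used in NMS, is to average the coordinates of overlapping boxes weighted by their confidences. The heavily simplified sketch below fuses a single cluster of boxes assumed to have already been grouped by IoU; it illustrates the principle only and is not the full published algorithm.

    import numpy as np

    def fuse_cluster(boxes, scores):
        # boxes: (n, 4) as [x1, y1, x2, y2] from different models, assumed to
        # overlap sufficiently (e.g., pairwise IoU above a threshold).
        # scores: (n,) confidence of each box.
        w = scores / scores.sum()
        fused_box = (boxes * w[:, None]).sum(axis=0)   # confidence-weighted coordinates
        fused_score = scores.mean()                    # simple average of confidences
        return fused_box, fused_score

    boxes = np.array([[10, 10, 50, 50], [12, 8, 52, 49], [9, 11, 48, 51]], dtype=float)
    scores = np.array([0.90, 0.75, 0.60])
    print(fuse_cluster(boxes, scores))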
-
Multi-source material image optimized selection based multi-option composition Image Vis. Comput. (IF 3.103) Pub Date : 2021-02-03 Hao Wu; Ding An; Xiaoyu Zhu; Zhiyi Zhang; Guodong Fan; Zhen Hua
Image composition aims to composite a material region into a target image. Using this technique, more images could be made interactive, as millions of images are created daily in modern life. However, it is difficult for the majority of traditional composition methods to retrieve realistic, semantically valid material images. Also, even if the minority of traditional composition methods
-
Novel features for art movement classification of portrait paintings Image Vis. Comput. (IF 3.103) Pub Date : 2021-02-09 Shao Liu; Jiaqi Yang; Sos S. Agaian; Changhe Yuan
The increasing availability of extensive digitized fine art collections opens up new research directions. In particular, correctly identifying the artistic style or art movement of paintings is crucial for large artistic database indexing, painter authentication, and mobile recognition of painters. Even though the application of CNNs to artwork classification has improved performance dramatically
-
Improving eye movement biometrics in low frame rate eye-tracking devices using periocular and eye blinking features Image Vis. Comput. (IF 3.103) Pub Date : 2021-02-06 Sherif Nagib Abbas Seha; Dimitrios Hatzinakos; Ali Shahidi Zandi; Felix J.E. Comeau
-
Collaborative knowledge distillation for incomplete multi-view action prediction Image Vis. Comput. (IF 3.103) Pub Date : 2021-01-21 Deepak Kumar; Chetan Kumar; Ming Shao
Predicting future actions is key to visual understanding, surveillance, and human behavior analysis. Current methods for video-based prediction primarily use single-view data, while in the real world multiple cameras and the videos they produce are readily available, which may potentially benefit action prediction tasks. However, this brings up a new challenge: subjects in the videos are more
-
Multi-information-based convolutional neural network with attention mechanism for pedestrian trajectory prediction Image Vis. Comput. (IF 3.103) Pub Date : 2021-01-23 Ruiping Wang; Yong Cui; Xiao Song; Kai Chen; Hong Fang
Predicting pedestrian trajectories is useful in many applications, such as autonomous driving and unmanned vehicles. However, it is a challenging task because of the complexity of the interactions among pedestrians and with the environment. Most existing works employ long short-term memory networks to learn pedestrian behaviors, but their prediction accuracy is not good, and their computing speed is relatively
-
A pooling-based feature pyramid network for salient object detection Image Vis. Comput. (IF 3.103) Pub Date : 2021-01-08 Caijuan Shi; Weiming Zhang; Changyu Duan; Houru Chen
How to effectively utilize and fuse deep features has become a critical point for salient object detection. Most existing methods adopt convolutional features based on U-shaped structures and fuse multi-scale convolutional features without fully considering the different characteristics of high-level and low-level features. Furthermore, existing salient object detection methods
-
Optokinetic response for mobile device biometric liveness assessment Image Vis. Comput. (IF 3.103) Pub Date : 2021-01-10 Jesse Lowe; Reza Derakhshani
As a practical pursuit of quantified uniqueness, biometrics explores the parameters that make us who we are and provides the tools we need to secure the integrity of that identity. In our culture of constant connectivity, an increasing reliance on biometrically secured mobile devices is transforming them into a target for bad actors. While no system will ever prevent all forms of intrusion, even state
-
Motion saliency based multi-stream multiplier ResNets for action recognition Image Vis. Comput. (IF 3.103) Pub Date : 2021-01-11 Ming Zong; Ruili Wang; Xiubo Chen; Zhe Chen; Yuanhao Gong
In this paper, we propose Motion Saliency based multi-stream Multiplier ResNets (MSM-ResNets) for action recognition. The proposed MSM-ResNets model consists of three interactive streams: the appearance stream, the motion stream, and the motion saliency stream. Similar to conventional two-stream CNN models, the appearance stream and motion stream are responsible for capturing the appearance information and
-
Tracking fiducial markers with discriminative correlation filters Image Vis. Comput. (IF 3.103) Pub Date : 2020-12-31 Francisco J. Romero-Ramirez; Rafael Muñoz-Salinas; Rafael Medina-Carnicer
In the last few years, squared fiducial markers have become a popular and efficient tool to solve monocular localization and tracking problems at a very low cost. Nevertheless, marker detection is affected by noise and blur: small camera movements may cause image blurriness that prevents marker detection. The contribution of this paper is two-fold. First, it proposes a novel approach for estimating
-
An unsupervised domain adaptation scheme for single-stage artwork recognition in cultural sites Image Vis. Comput. (IF 3.103) Pub Date : 2021-01-09 Giovanni Pasqualino; Antonino Furnari; Giovanni Signorello; Giovanni Maria Farinella
Recognizing artworks in a cultural site using images acquired from the user's point of view (First Person Vision) allows building interesting applications for both visitors and site managers. However, current object detection algorithms working in fully supervised settings need to be trained with large quantities of labeled data, whose collection requires a lot of time and high costs in order
-
Clothing generation by multi-modal embedding: A compatibility matrix-regularized GAN model Image Vis. Comput. (IF 3.103) Pub Date : 2021-01-06 Linlin Liu; Haijun Zhang; Dongliang Zhou
Clothing compatibility learning has gained increasing research attention due to the fact that a properly coordinated outfit can represent personality and improve an individual's appearance greatly. In this paper, we propose a Compatibility Matrix-Regularized Generative Adversarial Network (CMRGAN) for compatible item generation. In particular, we utilize a multi-modal embedding to transform the image
-
ScPnP: A non-iterative scale compensation solution for PnP problems Image Vis. Comput. (IF 3.103) Pub Date : 2020-12-09 Chengzhe Meng; Weiwei Xu
This paper presents an accurate non-iterative method for the Perspective-n-Point (PnP) problem. Our main idea is to mitigate scale bias by multiplying an independent inverse average depth variable onto the object space error. The introduced variable is of order 2 in the objective function, and the optimality conditions constitute a polynomial system with three third-order unknowns and one first-order unknown
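One plausible reading of the scale compensation described above, stated under the usual object-space-error formulation (and hedged, since the paper's exact objective and normalization may differ), is:

\[
\min_{R,\,t,\,\beta}\ \sum_{i=1}^{n} \left\| \beta\,(I - V_i)\,(R\,p_i + t) \right\|^2,
\qquad V_i = \frac{v_i v_i^{\top}}{v_i^{\top} v_i},
\]

where $p_i$ are the known 3D points, $v_i$ the corresponding normalized image rays, $V_i$ the line-of-sight projection matrices, $(R, t)$ the pose, and $\beta$ the independent inverse-average-depth variable that rescales the residuals.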
-
Point cloud classification with deep normalized Reeb graph convolution Image Vis. Comput. (IF 3.103) Pub Date : 2020-12-13 Weiming Wang; Yang You; Wenhai Liu; Cewu Lu
Recently, plenty of deep learning methods have been proposed to handle point clouds. Almost all of them take the entire point cloud as input and ignore the information redundancy lying in point clouds. This paper addresses this problem by extracting the Reeb graph from point clouds, which is a much more informative and compact representation, and then filtering the graph with deep graph convolution
-
Projection-dependent input processing for 3D object recognition in human robot interaction systems Image Vis. Comput. (IF 3.103) Pub Date : 2020-12-11 P.S. Febin Sheron; K.P. Sridhar; S. Baskar; P. Mohamed Shakeel
Human-Robot Interaction (HRI) provides assisted services in different real-time applications. Robotic systems identify objects through digital visualization, wherein a three-dimensional (3D) image is converted to a plane-based projection. The projection is analyzed using the coordinates and identification points for recognizing the object. In such a conversion process, the misidentification of
-
Unsupervised face Frontalization for pose-invariant face recognition Image Vis. Comput. (IF 3.103) Pub Date : 2020-12-13 Yanfei Liu; Junhua Chen
Face frontalization aims to normalize profile faces to frontal ones for pose-invariant face recognition. Current works have achieved promising results in face frontalization by using deep learning techniques. However, training deep models of face frontalization usually needs paired training data which is undoubtedly costly and time-consuming to acquire. To address this issue, we propose a Pose Conditional
-
Pixel-wise ordinal classification for salient object grading Image Vis. Comput. (IF 3.103) Pub Date : 2020-12-08 Yanzhu Liu; Yanan Wang; Adams Wai Kin Kong
Driven by business intelligence applications for rating the attractiveness of products in shops, a new problem, salient object grading, is studied in this paper. In computer vision, plenty of salient object detection approaches have been proposed, while most existing studies detect objects in a binary manner: salient or not. This paper focuses on a new problem setting that requires detecting all salient objects
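A standard way to pose per-pixel ordinal prediction, which may or may not match the head used in this paper, is to replace a K-way softmax with K-1 cumulative binary classifiers and decode the grade as the number of thresholds passed. A minimal sketch with toy shapes:

    import numpy as np

    def decode_ordinal(probs, threshold=0.5):
        # probs: (H, W, K-1) per-pixel probabilities that the saliency grade
        # exceeds level k; the decoded grade is the count of passed thresholds.
        return (probs > threshold).sum(axis=-1)

    probs = np.random.rand(4, 4, 3)      # toy 4x4 map with K = 4 grades
    print(decode_ordinal(probs))         # values in {0, 1, 2, 3}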
-
A Survey on Object Detection for the Internet of Multimedia Things (IoMT) using Deep Learning and Event-based Middleware: Approaches, Challenges, and Future Directions Image Vis. Comput. (IF 3.103) Pub Date : 2020-12-29 Asra Aslam; Edward Curry
An enormous number of sensing devices (scalar or multimedia) collect and generate information (in the form of events) over the Internet of Things (IoT). Present research on the IoT mainly focuses on processing scalar sensor data events and barely considers the challenges posed by multimedia-based events. In this paper, we systematically review the existing solutions available for the Internet of Multimedia
-
Cuepervision: self-supervised learning for continuous domain adaptation without catastrophic forgetting Image Vis. Comput. (IF 3.103) Pub Date : 2020-12-05 Mark Schutera; Frank M. Hafner; Jochen Abhau; Veit Hagenmeyer; Ralf Mikut; Markus Reischl
Perception systems, to a large extent, rely on neural networks. Commonly, the training of neural networks uses a finite amount of data. The usual assumption is that an appropriate training dataset is available, which covers all relevant domains. This abstract will follow the example of different lighting conditions in autonomous driving scenarios. In real-world datasets, a single source domain, such
-
A framework of human action recognition using length control features fusion and weighted entropy-variances based feature selection Image Vis. Comput. (IF 3.103) Pub Date : 2020-12-10 Farhat Afza; Muhammad Attique Khan; Muhammad Sharif; Seifedine Kadry; Gunasekaran Manogaran; Tanzila Saba; Imran Ashraf; Robertas Damaševičius
-
Attention-guided aggregation stereo matching network Image Vis. Comput. (IF 3.103) Pub Date : 2020-12-10 Yaru Zhang; Yaqian Li; Chao Wu; Bin Liu
Existing stereo matching networks based on deep learning lack multi-level and multi-module attention and integration for feature information. Therefore, we propose an attention-guided aggregation stereo matching network to encode and integrate information multiple times. Specifically, we design a residual network based on the 2D channel attention block to adaptively calibrate weight response, improving
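The 2D channel attention block mentioned above reads like squeeze-and-excitation-style channel reweighting: globally pool each channel, pass the result through a small bottleneck, and use a sigmoid gate to recalibrate the feature map. The sketch below shows that generic pattern with invented dimensions, not the paper's exact block.

    import numpy as np

    def channel_attention(feat, w1, w2):
        # feat: (C, H, W); w1: (C, C//r) and w2: (C//r, C) form a bottleneck MLP.
        squeeze = feat.mean(axis=(1, 2))                 # global average pool per channel
        hidden = np.maximum(squeeze @ w1, 0.0)           # ReLU
        gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))      # sigmoid channel weights
        return feat * gate[:, None, None]                # recalibrate channel responses

    feat = np.random.randn(16, 8, 8)
    print(channel_attention(feat, np.random.randn(16, 4), np.random.randn(4, 16)).shape)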
-
ReMOT: A model-agnostic refinement for multiple object tracking Image Vis. Comput. (IF 3.103) Pub Date : 2020-12-13 Fan Yang; Xin Chang; Sakriani Sakti; Yang Wu; Satoshi Nakamura
Although refinement is commonly used in visual tasks to improve pre-obtained results, it has not been studied for Multiple Object Tracking (MOT) tasks. This could be attributed to two reasons: i) it has not been explored what kinds of errors should — and could — be reduced in MOT refinement; ii) the refinement target, namely, the tracklets, are intertwined and interactive in a 3D spatio-temporal space
-
A comprehensive review on deep learning-based methods for video anomaly detection Image Vis. Comput. (IF 3.103) Pub Date : 2020-11-30 Rashmiranjan Nayak; Umesh Chandra Pati; Santos Kumar Das
Video surveillance systems are popular and used in public places such as market places, shopping malls, hospitals, banks, streets, education institutions, city administrative offices, and smart cities to enhance the safety of public lives and assets. Most of the time, the timely and accurate detection of video anomalies is the main objective of security applications. The video anomalies such as anomalous
-
Video-based person re-identification by intra-frame and inter-frame graph neural network Image Vis. Comput. (IF 3.103) Pub Date : 2020-11-28 Guiqing Liu; Jinzhao Wu
In the past few years, video-based person re-identification (Re-ID) has attracted growing research attention. The crucial problem for this task is how to learn robust video feature representations, which can weaken the influence of factors such as occlusion, illumination, and background. A great deal of previous work utilizes spatio-temporal information to represent pedestrian videos, but the correlations
-
Efficient pedestrian detection in top-view fisheye images using compositions of perspective view patches Image Vis. Comput. (IF 3.103) Pub Date : 2020-11-11 Sheng-Ho Chiang; Tsaipei Wang; Yi-Fu Chen
Pedestrian detection in images is a topic that has been studied extensively, but existing detectors designed for perspective images do not perform as successfully on images taken with top-view fisheye cameras, mainly due to the orientation variation of people in such images. In our proposed approach, several perspective views are generated from a fisheye image and then concatenated to form a composite
-
Improved generative adversarial network and its application in image oil painting style transfer Image Vis. Comput. (IF 3.103) Pub Date : 2020-12-07 Yuan Liu
In view of the difficulty of training image oil painting style transfer and reconstruction algorithms based on generative adversarial networks, where the loss gradients of the generator and discriminator vanish, this paper proposes an improved generative adversarial network based on a gradient penalty and constructs a total variation loss function to carry out research on image oil painting
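The two ingredients named in this abstract, a gradient penalty on the discriminator and a total variation term, are both standard components; the PyTorch-style sketch below shows generic versions of them (a WGAN-GP-style penalty and a simple TV term on NCHW images) and is not the paper's code.

    import torch

    def gradient_penalty(discriminator, real, fake):
        # Penalize deviations of the discriminator's gradient norm from 1 on
        # random interpolations between real and generated images (WGAN-GP style).
        alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
        interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
        d_out = discriminator(interp)
        grads = torch.autograd.grad(outputs=d_out, inputs=interp,
                                    grad_outputs=torch.ones_like(d_out),
                                    create_graph=True)[0]
        return ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()

    def total_variation(img):
        # Mean absolute difference between neighbouring pixels (NCHW layout),
        # which encourages spatially smooth generated paintings.
        dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()
        dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
        return dh + dw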
-
Crowd density detection method based on crowd gathering mode and multi-column convolutional neural network Image Vis. Comput. (IF 3.103) Pub Date : 2020-12-05 Liu Bai; Cheng Wu; Feng Xie; Yiming Wang
-
Bias alleviating generative adversarial network for generalized zero-shot classification Image Vis. Comput. (IF 3.103) Pub Date : 2020-11-28 Xiao Li; Min Fang; Haikun Li
Generalized zero-shot classification predicts the labels of test images coming from seen or unseen classes. The task is difficult because of the bias problem, that is, unseen samples are easily misclassified into seen classes. Many methods have handled the problem by training a generative adversarial network (GAN) to generate fake samples. However, the GAN model trained with seen samples
-
Industrial visual perception technology in Smart City Image Vis. Comput. (IF 3.103) Pub Date : 2020-11-12 Zhihan Lv; Dongliang Chen
In order to study the application effect and function of industrial visual perception technology in smart cities, an image processing and quality evaluation system was constructed using convolutional neural network (CNN) and Internet of Things (IoT) technology. The system was simulated, and then the quality of the images and video obtained using industrial visual perception technology was
-
I-SOCIAL-DB: A labeled database of images collected from websites and social media for Iris recognition Image Vis. Comput. (IF 3.103) Pub Date : 2020-11-03 Ruggero Donida Labati; Angelo Genovese; Vincenzo Piuri; Fabio Scotti; Sarvesh Vishwakarma
-
Vehicle re-identification based on unsupervised local area detection and view discrimination Image Vis. Comput. (IF 3.103) Pub Date : 2020-09-19 Yuefeng Wang; Huadong Li; Ying Wei; Chuyuan Wang; Lin Wang
Vehicle re-identification is an important part of intelligent transportation. Although much work has been done on this subject in recent years, vehicle re-identification is still a challenging task due to obvious illumination changes, high inter-class similarity, and great changes across different views. As discriminative local areas and vehicle view information are the key to improving the
-
Dependable information processing method for reliable human-robot interactions in smart city applications Image Vis. Comput. (IF 3.103) Pub Date : 2020-10-07 Zafer Al-Makhadmeh; Amr Tolba
Human-robot interaction (HRI) is a multidisciplinary area that consists of several technologies that are used to create various smart city applications. The knowledge gain and analysis of the smart city environment improves response time. This paper introduces the dependable information processing (DIP) method for handling multi-attribute environmental information in a smart city application. Information
-
Deep learning-based object detection in low-altitude UAV datasets: A survey Image Vis. Comput. (IF 3.103) Pub Date : 2020-10-11 Payal Mittal; Raman Singh; Akashdeep Sharma
Deep learning-based object detection solutions emerging from computer vision have attracted considerable attention in recent years. The growing UAV market and interest in potential applications such as surveillance, visual navigation, object detection, and sensor-based obstacle avoidance planning hold great promise in the area of deep learning. Object detection algorithms implemented in
-
Cross-database and cross-attack Iris presentation attack detection using micro stripes analyses Image Vis. Comput. (IF 3.103) Pub Date : 2020-10-29 Meiling Fang; Naser Damer; Fadi Boutros; Florian Kirchbuchner; Arjan Kuijper
With the widespread use of mobile devices, iris recognition systems encounter more challenges, such as vulnerability to presentation attacks, which Presentation Attack Detection (PAD) aims to counter. Recent works have pointed out contact lens attacks, especially images captured in uncontrolled environments, as a hard task for iris PAD. In this paper, we propose a novel framework for detecting iris presentation attacks that especially
-
Multimodal image fusion based on point-wise mutual information Image Vis. Comput. (IF 3.103) Pub Date : 2020-10-19 Donghao Shen; Masoumeh Zareapoor; Jie Yang
Multimodal image fusion aims to generate a fused image from different signals captured by multimodal sensors. Although the images obtained by multimodal sensors have different appearances, the information included in these images might be redundant and noisy. In previous studies, the fusion rules and the properties guiding how to merge features from multiple images are relatively
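Point-wise mutual information between co-located intensities of two registered modality images can be estimated from a joint histogram. The sketch below is only a generic PMI estimator with an arbitrary bin count, not the fusion rule proposed in the paper.

    import numpy as np

    def pointwise_mutual_information(img_a, img_b, bins=32):
        # PMI(a, b) = log p(a, b) - log p(a) - log p(b), estimated from the
        # joint histogram of co-located pixel intensities.
        joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
        p_joint = joint / joint.sum()
        p_a = p_joint.sum(axis=1, keepdims=True)
        p_b = p_joint.sum(axis=0, keepdims=True)
        with np.errstate(divide='ignore', invalid='ignore'):
            pmi = np.log(p_joint) - np.log(p_a) - np.log(p_b)
        return np.nan_to_num(pmi, neginf=0.0)            # (bins, bins) PMI table

    a = np.random.rand(64, 64)
    b = 0.7 * a + 0.3 * np.random.rand(64, 64)
    print(pointwise_mutual_information(a, b).shape)      # (32, 32)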
-
CAM: A fine-grained vehicle model recognition method based on visual attention model Image Vis. Comput. (IF 3.103) Pub Date : 2020-10-02 Ye Yu; Longdao Xu; Wei Jia; Wenjia Zhu; Yunxiang Fu; Qiang Lu
Vehicle model recognition (VMR) is a typical fine-grained classification task in computer vision. To improve the representation power of classical CNN networks for this special task, we focus on enhancing the subtle difference of features and their spatial encoding based on the attention mechanism, and then propose a novel architectural unit, which we term the “convolutional attention model” (CAM)
-
A survey of micro-expression recognition Image Vis. Comput. (IF 3.103) Pub Date : 2020-10-17 Ling Zhou; Xiuyan Shao; Qirong Mao
The limited capacity to recognize micro-expressions with subtle and rapid motion changes is a long-standing problem that presents a unique challenge for expression recognition systems and even for humans. The problem regarding micro-expression is less covered by research when compared to macro-expression. Nevertheless, micro-expression recognition (MER) is imperative to exploit the full potential of
-
Infrared and visible image fusion via global variable consensus Image Vis. Comput. (IF 3.103) Pub Date : 2020-09-30 Donghao Shen; Masoumeh Zareapoor; Jie Yang
In this paper, we propose an infrared and visible image fusion framework based on the consensus problem. Most current infrared and visible image fusion models aim to transfer only one characteristic of each source domain to the final fusion result. This mechanism limits the performance of fusion algorithms under different conditions. We present a general fusion framework to solve the global
-
Deep multimodal fusion for semantic image segmentation: A survey Image Vis. Comput. (IF 3.103) Pub Date : 2020-10-07 Yifei Zhang; Désiré Sidibé; Olivier Morel; Fabrice Mériaudeau
Recent advances in deep learning have shown excellent performance in various scene understanding tasks. However, in some complex environments or under challenging conditions, it is necessary to employ multiple modalities that provide complementary information on the same scene. A variety of studies have demonstrated that deep multimodal fusion for semantic image segmentation achieves significant performance
-
Facial expression recognition using human machine interaction and multi-modal visualization analysis for healthcare applications Image Vis. Comput. (IF 3.103) Pub Date : 2020-10-07 Torki Altameem; Ayman Altameem
The application of computer vision (CV) in healthcare is closely tied to wireless and communication technology. CV methods are incorporated into healthcare to provide programmed interactions for patient monitoring. Such systems require the analysis and detection of visual information from patient images. In this paper, a multi-modal visualization analysis (MMVA) method
-
Optimization of face recognition algorithm based on deep learning multi feature fusion driven by big data Image Vis. Comput. (IF 3.103) Pub Date : 2020-09-18 Yinghui Zhu; Yuzhen Jiang
Today, with the rapid development of science and technology, the era of big data has arrived and triggered reforms in all walks of life. Face recognition is a biometric recognition method that is contactless, non-mandatory, friendly, and unobtrusive, and it has good application prospects in the fields of national security and social security. With the deepening of the research
-
Synergetic reconstruction from 2D pose and 3D motion for wide-space multi-person video motion capture in the wild Image Vis. Comput. (IF 3.103) Pub Date : 2020-09-28 Takuya Ohashi; Yosuke Ikegami; Yoshihiko Nakamura
Although many studies have investigated markerless motion capture, the technology has not been applied to real sports or concerts. In this paper, we propose a markerless motion capture method with spatiotemporal accuracy and smoothness from multiple cameras in wide-space and multi-person environments. The proposed method predicts each person's 3D pose and determines the bounding box of multi-camera
-
Synthetic guided domain adaptive and edge aware network for crowd counting Image Vis. Comput. (IF 3.103) Pub Date : 2020-09-28 Zhijie Cao; Pourya Shamsolmoali; Jie Yang
Crowd counting is an important surveillance application and receives significant attention from the computer vision community. Most current methods treat crowd counting as density map estimation and use a Fully Convolutional Network (FCN) for prediction. The mainstream framework is to predict density maps and sum them up to get the number of people. In such methods, the main
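The "predict a density map, then sum it" pipeline mentioned above reduces counting to an integral over the predicted map; the minimal illustration below (with a fabricated map) shows just that step, not the proposed domain-adaptive, edge-aware network.

    import numpy as np

    def count_from_density(density_map):
        # Each pixel holds the expected fraction of a person centered there,
        # so the crowd count is the sum over the whole predicted map.
        return float(density_map.sum())

    density = np.zeros((120, 160))
    density[40:43, 60:63] = 1.0 / 9.0   # one person spread over a small blob
    print(count_from_density(density))  # ~1.0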
-
R4 Det: Refined single-stage detector with feature recursion and refinement for rotating object detection in aerial images Image Vis. Comput. (IF 3.103) Pub Date : 2020-09-30 Peng Sun; Yongbin Zheng; Zongtan Zhou; Wanying Xu; Qiang Ren
The detection of objects with multiple orientations and scales in aerial images is receiving increasing attention because of numerous useful applications in computer vision, image understanding, satellite remote sensing, and surveillance. However, such detection can be exceedingly challenging because of the bird's-eye view, multi-scale rotating objects with large aspect ratios, dense distributions, and
Contents have been reproduced by permission of the publishers.