Current journal: Computer Vision and Image Understanding
  • Learning a confidence measure in the disparity domain from O(1) features
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2020-01-18
    Matteo Poggi; Fabio Tosi; Stefano Mattoccia

    Depth sensing is of paramount importance for countless applications, and stereo represents a popular, effective and cheap solution for this purpose. As highlighted by recent works, uncertainty estimation can be a powerful cue to improve stereo accuracy. Most confidence measures rely on features, mainly extracted from the cost volume, that are fed to a random forest or a convolutional neural network trained to estimate match uncertainty. In contrast, we propose a novel strategy for confidence estimation based on features computed in the disparity domain, making our proposal suited for any stereo system, including COTS devices, and computable in constant time. We exhaustively assess the performance of our proposals, referred to as O1 and O2, on the KITTI and Middlebury datasets with three popular and different stereo algorithms (CENSUS, MC-CNN and SGM), as well as a deep stereo network (PSM-Net). We also evaluate how well confidence measures generalize to different environments/datasets.
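The abstract's key point is that confidence can come from the disparity map alone, with per-pixel cost independent of window size. As a rough illustration (not the authors' actual O1/O2 features), a confidence score can be derived from local disparity variance; the window-based NumPy version below trades the O(1) integral-image trick for brevity:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def disparity_confidence(disparity, size=5):
    """Toy disparity-domain confidence: low local disparity variance
    -> high confidence. (A true O(1)-per-pixel variant would compute
    the windowed mean/variance with integral images.)"""
    win = sliding_window_view(disparity.astype(np.float64), (size, size))
    var = win.var(axis=(-2, -1))          # (H-size+1, W-size+1)
    return 1.0 / (1.0 + var)              # in (0, 1]

# Confidence drops where the disparity map is noisy.
rng = np.random.default_rng(0)
smooth = np.tile(np.arange(20.0), (20, 1))          # clean slanted surface
noisy = smooth + rng.normal(0.0, 2.0, smooth.shape)
```

Any cost-volume-free measure of this kind works with black-box (COTS) stereo devices, which only expose the final disparity map.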

  • A progressive learning framework based on single-instance annotation for weakly supervised object detection
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2020-01-15
    Ming Zhang; Bing Zeng

    Fully-supervised object detection (FSOD) and weakly-supervised object detection (WSOD) are two extremes in the field of object detection. The former relies entirely on detailed bounding-box annotations, while the latter discards them completely. To balance these two extremes, we propose to make use of so-called single-instance annotations, i.e., all images that contain only a single object are labelled with the corresponding bounding-boxes. Using such instance annotations of the simplest images, we propose a progressive learning framework that integrates image-level learning, single-instance learning, and multi-instance learning into an end-to-end network. Specifically, our framework is composed of three parallel streams that share a proposal feature extractor. The first stream is supervised by image-level annotations, which provide global information of all training data for the shared feature extractor. The second stream is supervised by single-instance annotations to bridge the feature learning gap between the image level and the instance level. To further learn from complex images, we propose an overlap-based instance mining algorithm to mine pseudo multi-instance annotations from the detection results of the second stream, and use them to supervise the third stream. Our method achieves a trade-off between detection accuracy and annotation cost. Extensive experiments on the PASCAL VOC and MS-COCO datasets demonstrate the effectiveness of our proposed method, implying that a few single-instance annotations can improve the detection performance of WSOD significantly (by more than 10%) and reduce the average annotation cost of FSOD greatly (by more than 5 times).
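An overlap-based mining step of the kind described can be sketched as greedy selection by detection score with an IoU gate; the threshold value and the greedy rule here are illustrative, not taken from the paper:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def mine_pseudo_instances(boxes, scores, overlap_thr=0.3):
    """Greedy overlap-based mining: keep the highest-scoring box, then
    add further boxes only if they overlap every kept box below the
    threshold, so they plausibly cover a *different* object."""
    order = np.argsort(scores)[::-1]
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < overlap_thr for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]
```

The mined boxes then act as pseudo multi-instance annotations for the third stream.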

  • Triplanar convolution with shared 2D kernels for 3D classification and shape retrieval
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2020-01-15
    Eu Young Kim; Seung Yeon Shin; Soochahn Lee; Kyong Joon Lee; Kyoung Ho Lee; Kyoung Mu Lee

    Increasing the depth of Convolutional Neural Networks (CNNs) has been recognized to provide better generalization performance. However, in the case of 3D CNNs, stacking layers increases the number of learnable parameters linearly, making them more prone to learning redundant features. In this paper, we propose a novel 3D CNN structure that learns shared 2D triplanar features viewed from the three orthogonal planes, which we term S3PNet. Due to the reduced dimension of the convolutions, the proposed S3PNet is able to learn 3D representations with substantially fewer learnable parameters. Experimental evaluations show that the combination of 2D representations on the different orthogonal views learned through the S3PNet is sufficient and effective for 3D representation, with results outperforming current methods based on fully 3D CNNs. We support this with extensive evaluations on widely used 3D data sources in computer vision: CAD models, LiDAR point clouds, RGB-D images, and 3D Computed Tomography scans. Experiments further demonstrate that S3PNet has better generalization capability for smaller training sets, and learns kernels with less redundancy than those learned by 3D CNNs.
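The core idea, one 2D kernel shared across the three orthogonal planes of a volume, can be sketched naively in NumPy (this illustrates the operation and the parameter saving, not the actual S3PNet layer):

```python
import numpy as np

def conv2d_valid(img, k):
    """Plain 'valid' 2D cross-correlation."""
    kh, kw = k.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def triplanar_conv(vol, k):
    """Apply ONE shared 2D kernel in the three orthogonal planes of a
    (d, h, w) volume and sum the responses, cropped to a common shape.
    A shared k x k kernel costs k^2 parameters vs k^3 for a 3D kernel."""
    c = k.shape[0] // 2  # crop margin so the three responses align
    d, h, w = vol.shape
    xy = np.stack([conv2d_valid(vol[z], k) for z in range(d)], axis=0)
    xz = np.stack([conv2d_valid(vol[:, y], k) for y in range(h)], axis=1)
    yz = np.stack([conv2d_valid(vol[:, :, x], k) for x in range(w)], axis=2)
    return xy[c:-c] + xz[:, c:-c] + yz[:, :, c:-c]
```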

  • Monocular human pose estimation: A survey of deep learning-based methods
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2020-01-07
    Yucheng Chen; Yingli Tian; Mingyi He

    Vision-based monocular human pose estimation, one of the most fundamental and challenging problems in computer vision, aims to obtain the posture of the human body from input images or video sequences. Recent developments in deep learning techniques have brought significant progress and remarkable breakthroughs to the field of human pose estimation. This survey extensively reviews the recent deep learning-based 2D and 3D human pose estimation methods published since 2014. This paper summarizes the challenges, main frameworks, benchmark datasets, evaluation metrics, and performance comparisons, and discusses some promising future research directions.

  • Updated: 2020-01-04
  • Photometric camera characterization from a single image with invariance to light intensity and vignetting
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2019-12-13
    Pedro M.C. Rodrigues; João P. Barreto; Michel Antunes

    Photometric characterization of a camera entails describing how the camera transforms the light reaching its sensors into an image and how this image can be defined in a standard color space. Although the research in this area has been extensive, the current literature lacks practical methods designed for cameras operating under near light. There are two major application scenarios considered in this paper that would benefit from this type of approach. First, camera rigs for minimally-invasive procedures cannot be calibrated in the operating room with current methods. This comes from the fact that existing approaches need multiple images, assume uniform lighting, and/or use over-simplistic camera models, which do not allow for the calibration of near-light setups in a fast and reliable way. The second scenario refers to the calibration of cellphone cameras, which currently cannot be calibrated at close range with a single image, especially if the flash is used, as there would be non-uniform lighting on the scene. In this work, we describe a method to characterize cameras from a single image of a known target. This enables both geometric and photometric calibrations to be performed on-the-fly without making assumptions about the vignetting or the spatial properties of the light. The presented method showed good repeatability and color accuracy even when compared to multiple-image approaches. Applications to laparoscopic cameras, generic cameras (such as cellphone cameras), and cameras other than trichromatic are shown to be viable.
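One ingredient of such a characterization, mapping measured colours of a known target into a standard colour space, can be sketched as a least-squares fit. The paper's full model additionally handles vignetting and near-light fall-off; this fragment, with synthetic patch data, is only illustrative:

```python
import numpy as np

def fit_color_matrix(measured, reference):
    """Least-squares 3x3 colour-correction matrix X such that
    reference ≈ measured @ X, from RGB values of known target patches."""
    X, *_ = np.linalg.lstsq(measured, reference, rcond=None)
    return X

# Synthetic check: recover a known mixing matrix from 24 'patches'.
rng = np.random.default_rng(1)
A = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.80, 0.15],
              [0.00, 0.20, 0.80]])
measured = rng.uniform(0.0, 1.0, size=(24, 3))   # camera-space RGB
reference = measured @ A                         # standard-space RGB
```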

  • Learning feature aggregation in temporal domain for re-identification
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2019-11-28
    Jakub Špaňhel; Jakub Sochor; Roman Juránek; Petr Dobeš; Vojtěch Bartl; Adam Herout

    Person re-identification is a standard and established problem in the computer vision community. In recent years, vehicle re-identification has also been getting more attention. In this paper, we focus on both tasks and propose a method for aggregating features in the temporal domain, as it is common to have multiple observations of the same object. The aggregation weights different elements of the feature vectors by different weights and is trained in an end-to-end manner by a Siamese network. The experimental results show that our method outperforms other existing methods for feature aggregation in the temporal domain on both vehicle and person re-identification tasks. Furthermore, to push research in vehicle re-identification further, we introduce a novel dataset, CarsReId74k. The dataset is not limited to frontal/rear viewpoints. It contains 17,681 unique vehicles, 73,976 observed tracks, and 277,236 positive pairs, and was captured by 66 cameras from various angles.
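Element-wise temporal weighting of this kind can be sketched as a softmax over the time axis. In the paper the weights come from a learned branch trained with the Siamese objective; here they are plain inputs:

```python
import numpy as np

def aggregate(features, weight_logits):
    """features: (T, D) descriptors of T observations of one object;
    weight_logits: (T, D) element-wise scores. Softmax over the temporal
    axis yields per-element weights, then a weighted sum gives a single
    (D,) track descriptor."""
    w = np.exp(weight_logits - weight_logits.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)
    return (w * features).sum(axis=0)
```

With uniform logits this degenerates to average pooling, a common baseline for track-level re-identification.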

  • Cascade multi-head attention networks for action recognition
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2020-01-02
    Jiaze Wang; Xiaojiang Peng; Yu Qiao

    Long-term temporal information yields crucial cues for video action understanding. Previous works typically rely on sequential models, such as recurrent networks, memory units, segmental models, or self-attention mechanisms, to integrate local temporal features for long-term temporal modeling. Recurrent or memory networks record temporal patterns (or relations) with memory units, which have proved difficult at capturing long-term information in machine translation. Self-attention mechanisms directly aggregate all local information with attention weights, which is more straightforward and efficient than the former. However, the attention weights from self-attention ignore the relations between local information and global information, which may lead to unreliable attention. To this end, we propose a new attention network architecture, termed Cascade multi-head ATtention Network (CATNet), which constructs video representations with two levels of attention, namely multi-head local self-attentions and relation-based global attentions. Starting from the segment features generated by backbone networks, CATNet first learns multiple attention weights for each segment to capture the importance of local features in a self-attention manner. With the local attention weights, CATNet integrates local features into several global representations, and then learns the second-level attention for the global information in a relation-based manner. Extensive experiments on Kinetics, HMDB51, and UCF101 show that our CATNet boosts the baseline network by a large margin. With only RGB information, we achieve 75.8%, 75.2%, and 96.0% on these three datasets respectively, which is comparable or superior to the state of the art.
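The first-level pooling, per-head attention weights over segment features, can be sketched as below; the scoring vectors stand in for learned parameters, and the second, relation-based level is omitted:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention_pool(segments, W_heads):
    """segments: (T, D) per-segment features; W_heads: (H, D), one
    scoring vector per head. Each head attends over the T segments and
    pools them into one (D,) global representation -> output (H, D)."""
    scores = W_heads @ segments.T        # (H, T)
    attn = softmax(scores, axis=1)       # each head's weights sum to 1
    return attn @ segments               # (H, D)
```

With zero scoring vectors every head reduces to mean pooling over segments, which is the usual segment-averaging baseline the attention is meant to improve on.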

  • Graph-matching-based correspondence search for nonrigid point cloud registration
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2020-01-02
    Seunggyu Chang; Chanho Ahn; Minsik Lee; Songhwai Oh

    Nonrigid registration finds transformations to fit a source point cloud/mesh to a target point cloud/mesh. Most nonrigid registration algorithms consist of two steps: finding correspondences and optimization. Of these, finding correspondences plays an important role in registration performance. However, when two point clouds have large displacements, it is hard to find correct correspondences, and an algorithm often fails to find correct transformations. In this paper, we propose a novel graph-matching-based correspondence search for nonrigid registration and a corresponding optimization method for finding the transformation to complete nonrigid registration. Considering global connectivity as well as local similarity in the correspondence search, the proposed method finds good correspondences according to semantics and consequently finds correct transformations even when the motion is large. Our algorithm is experimentally validated on human body and animal datasets, which verifies that it is capable of finding correct transformations to fit a source to a target.

  • Guess where? Actor-supervision for spatiotemporal action localization
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2019-12-09
    Victor Escorcia; Cuong D. Dao; Mihir Jain; Bernard Ghanem; Cees Snoek

    This paper addresses the problem of spatiotemporal localization of actions in videos. Compared to leading approaches, which all learn to localize based on carefully annotated boxes on training video frames, we adhere to a solution only requiring video class labels. We introduce an actor-supervised architecture that exploits the inherent compositionality of actions in terms of actor transformations, to localize actions. We make two contributions. First, we propose actor proposals derived from a detector for human and non-human actors intended for images, which are linked over time by Siamese similarity matching to account for actor deformations. Second, we propose an actor-based attention mechanism enabling localization from action class labels and actor proposals. It exploits a new actor pooling operation and is end-to-end trainable. Experiments on four action datasets show actor supervision is state-of-the-art for action localization from video class labels and is even competitive to some box-supervised alternatives.

  • Graph convolutional neural network for multi-scale feature learning
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2019-12-02
    Michael Edwards; Xianghua Xie; Robert I. Palmer; Gary K.L. Tam; Rob Alcock; Carl Roobottom
  • Momental directional patterns for dynamic texture recognition
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2019-12-02
    Thanh Tuan Nguyen; Thanh Phuong Nguyen; Frédéric Bouchara; Xuan Son Nguyen

    Understanding the chaotic motions of dynamic textures (DTs) is a challenging problem of video representation for different tasks in computer vision. This paper presents a new approach for efficient DT representation by addressing the following novel concepts. First, a model of moment volumes is introduced as an effective pre-processing technique for enriching the robust and discriminative information of dynamic voxels at low computational cost. Second, two important extensions of the Local Derivative Pattern operator are proposed to improve its performance in capturing directional features. Third, we present a new framework, called Momental Directional Patterns, that takes into account the advantages of filtering and local-feature-based approaches to form effective DT descriptors. Furthermore, motivated by convolutional neural networks, the proposed framework is boosted by utilizing more global features extracted from max-pooling videos to improve the discrimination power of the descriptors. Our proposal is verified on benchmark datasets, i.e., UCLA, DynTex, and DynTex++, for the DT classification task. The experimental results substantiate the interest of our method.

  • Human Visual System vs Convolution Neural Networks in food recognition task: An empirical comparison
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2019-11-30
    Pedro Furtado; Manuel Caldeira; Pedro Martins

    Automated food recognition from a food plate is useful for smartphone-based applications promoting healthy lifestyles and for automated carbohydrate counting, e.g. targeted at type I diabetic patients, but the variation in appearance of food items makes it a difficult task. Convolution Neural Networks (CNNs) rose to prominence in recent years, and they will enable those applications if they are able to match Human Visual System (HVS) accuracy, at least in meal classification. In this work we run an experimental comparison of accuracy between CNNs and the HVS based on a simple meal recognition task. We set up a survey for humans with two phases, training and testing, and also give the food dataset to state-of-the-art CNNs. The results, considering some relevant variations in the setup, allow us to reach conclusions regarding the comparison, characteristics and limitations of CNNs, which are relevant for future improvements.

  • Comparison of monocular depth estimation methods using geometrically relevant metrics on the IBims-1 dataset
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2019-11-26
    Tobias Koch; Lukas Liebel; Marco Körner; Friedrich Fraundorfer
  • An efficient EM-ICP algorithm for non-linear registration of large 3D point sets
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2019-11-12
    Benoit Combès; Sylvain Prima

    In this paper, we present a new method for non-linear pairwise registration of 3D point sets. In this method, we consider the points of the first set as draws from a Gaussian mixture model whose centres are the displaced points of the second set. Next, we perform a maximum a posteriori estimation of the parameters of this model (which include the unknown transformation) using the expectation–maximisation (EM) algorithm. Compared to other methods using the same “EM-ICP” framework, we propose four key modifications leading to an efficient algorithm allowing for fast registration of large 3D point sets: (1) truncation of the cost function; (2) symmetrisation of the point-to-point correspondences; (3) specification of priors on these correspondences using differential geometry; (4) efficient encoding of deformations using RKHS theory and Fourier analysis. We evaluate the added value of these modifications and compare our method to the state-of-the-art CPD algorithm on real and simulated data.
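The E-step of such a model computes soft point-to-point correspondences. A minimal sketch, without the paper's truncation, symmetrisation, or priors:

```python
import numpy as np

def e_step(points, centres, sigma):
    """E-step of an EM-ICP-style model: points of the first set are
    treated as draws from an isotropic GMM whose centres are the
    (displaced) points of the second set. Returns the (N, M) posterior
    match probabilities (responsibilities)."""
    d2 = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(-1)  # (N, M)
    log_p = -d2 / (2.0 * sigma ** 2)
    log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)
```

The M-step would then re-estimate the transformation from these soft correspondences; the paper's truncation makes this E-step affordable for large point sets by discarding negligible responsibilities.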

  • Simultaneous compression and quantization: A joint approach for efficient unsupervised hashing
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2019-11-12
    Tuan Hoang; Thanh-Toan Do; Huu Le; Dang-Khoa Le-Tan; Ngai-Man Cheung

    For unsupervised data-dependent hashing, the two most important requirements are to preserve similarity in the low-dimensional feature space and to minimize the binary quantization loss. A well-established hashing approach is Iterative Quantization (ITQ), which addresses these two requirements in separate steps. In this paper, we revisit the ITQ approach and propose novel formulations and algorithms for the problem. Specifically, we propose a novel approach, named Simultaneous Compression and Quantization (SCQ), to jointly learn to compress (reduce dimensionality) and binarize input data in a single formulation under a strict orthogonal constraint. With this approach, we introduce a loss function and its relaxed version, termed Orthonormal Encoder (OnE) and Orthogonal Encoder (OgE) respectively, which involve challenging binary and orthogonal constraints. We propose to attack the optimization using novel algorithms based on recent advances in the cyclic coordinate descent approach. Comprehensive experiments on unsupervised image retrieval demonstrate that our proposed methods consistently outperform other state-of-the-art hashing methods. Notably, our proposed methods outperform recent deep neural network and GAN-based hashing methods in accuracy, while being very computationally efficient.
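The binary quantization loss being minimized can be written down directly. A sketch with a generic projection matrix; the actual joint optimization of the projection under the orthogonality constraint (OnE/OgE) is the paper's contribution and is not reproduced here:

```python
import numpy as np

def binarize(X, W):
    """Compress data X with a projection W, then quantize by sign, in
    the ITQ spirit. Returns codes in {-1, +1} and the binary
    quantization loss ||sign(XW) - XW||_F^2 that SCQ minimizes jointly
    with the similarity-preserving compression."""
    V = X @ W                  # low-dimensional embedding
    B = np.sign(V)
    B[B == 0] = 1              # break ties deterministically
    loss = np.sum((B - V) ** 2)
    return B, loss
```

The loss is zero exactly when the projected data already lie on the binary vertices, which is why rotating/choosing W to shrink it yields better hash codes.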

  • An Entropic Optimal Transport loss for learning deep neural networks under label noise in remote sensing images
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2019-11-06
    Bharath Bhushan Damodaran; Rémi Flamary; Vivien Seguy; Nicolas Courty

    Deep neural networks have become established as a powerful tool for large-scale supervised classification tasks. The state-of-the-art performance of deep neural networks is conditioned on the availability of a large number of accurately labeled samples. In practice, collecting large-scale, accurately labeled datasets is a challenging and tedious task in most scenarios of remote sensing image analysis, so cheap surrogate procedures are employed to label the dataset. Deep neural networks trained on such datasets with inaccurate labels easily overfit to the noisy training labels, which degrades the performance of the classification tasks drastically. To mitigate this effect, we propose an original solution based on entropic optimal transport. It allows learning, in an end-to-end fashion, deep neural networks that are, to some extent, robust to inaccurately labeled samples. We empirically demonstrate its effectiveness on several remote sensing datasets, where both scene-level and pixel-based hyperspectral images are considered for classification. Our method proves to be highly tolerant to significant amounts of label noise and achieves favorable results against state-of-the-art methods.
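Entropic optimal transport is typically computed with Sinkhorn iterations. A minimal sketch of the transport computation itself; coupling it to the network training as a loss, as the paper does, is omitted:

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iter=200):
    """Entropic optimal transport via Sinkhorn iterations: returns the
    transport plan between histograms a and b under cost matrix C,
    with entropic regularization strength eps."""
    K = np.exp(-C / eps)           # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):        # alternate marginal scalings
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]
```

Smaller `eps` approaches unregularized OT at the cost of slower convergence; the entropic term is what makes the loss smooth and usable for end-to-end training.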

  • CRF with deep class embedding for large scale classification
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2019-11-06
    Eran Goldman; Jacob Goldberger

    This paper presents a novel deep learning architecture for classifying structured objects in ultrafine-grained datasets, where classes may not be clearly distinguishable by their appearance but rather by their context. We model sequences of images as linear-chain CRFs, and jointly learn the parameters from both local-visual features and neighboring class information. The visual features are learned by convolutional layers, whereas class-structure information is reparametrized by factorizing the CRF pairwise potential matrix. This forms a context-based semantic similarity space, learned alongside the visual similarities, and dramatically increases the learning capacity of contextual information. This new parametrization, however, forms a highly nonlinear objective function which is challenging to optimize. To overcome this, we develop a novel surrogate likelihood which allows for a local likelihood approximation of the original CRF with integrated batch-normalization. This model overcomes the difficulty existing CRF methods have in thoroughly learning contextual relationships when the number of classes is large and the data are sparse. The performance of the proposed method is illustrated on a huge dataset that contains images of retail-store product displays, and shows significantly improved results compared to linear CRF parametrization, unnormalized likelihood optimization, and RNN modeling. We also show improved results on a standard OCR dataset.
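Factorizing the pairwise potential amounts to giving each class low-dimensional embeddings whose inner products are the transition scores. A sketch of the parameter saving; the sizes are illustrative, not the paper's:

```python
import numpy as np

# Factorize the K x K CRF pairwise potential as A = U @ V.T, i.e.
# A[i, j] = <u_i, v_j>: each class gets two d-dimensional embeddings,
# and the parameter count drops from K*K to 2*K*d.
K, d = 1000, 32
rng = np.random.default_rng(0)
U = rng.normal(size=(K, d))   # embedding of the previous class in the chain
V = rng.normal(size=(K, d))   # embedding of the next class
A = U @ V.T                   # full potential, formed here only for checking

n_factored = U.size + V.size  # 2*K*d parameters
n_full = K * K                # dense pairwise matrix
```

Because classes now share the embedding space, transition statistics observed for one class pair inform similar pairs, which is what helps when data are sparse.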

  • CompactNets: Compact Hierarchical Compositional Networks for Visual Recognition
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2019-10-28
    Hans Lobel; René Vidal; Alvaro Soto

    CNN-based models currently provide state-of-the-art performance in image categorization tasks. While these methods are powerful in terms of representational capacity, they are generally not conceived with explicit means to control complexity. This might lead to scenarios where resources are used in a non-optimal manner, increasing the number of unspecialized or repeated neurons and overfitting to data. In this work we propose CompactNets, a new approach to visual recognition that learns a hierarchy of shared, discriminative, specialized, and compact representations. CompactNets naturally capture the notion of compositional compactness, a characterization of complexity in compositional models that consists of using the smallest number of patterns to build a suitable visual representation. We employ a structural regularizer with group-sparse terms in the objective function that induces, in each layer, an efficient and effective use of elements from the layer below. In particular, this allows groups of top-level features to be specialized based on category information. We evaluate CompactNets on the ILSVRC12 dataset, obtaining compact representations and competitive performance, using an order of magnitude fewer parameters than common CNN-based approaches. We show that CompactNets are able to outperform other group-sparse-based approaches in terms of performance and compactness. Finally, transfer-learning experiments on small-scale datasets demonstrate high generalization power, providing remarkable categorization performance with respect to alternative approaches.
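A group-sparse structural term of this kind is the familiar l2,1 penalty over parameter groups; zeroing an entire group removes the corresponding lower-layer pattern. A minimal sketch with a hypothetical group layout:

```python
import numpy as np

def group_sparse_penalty(W, groups, lam=1.0):
    """Group-sparse (l2,1) regularizer: lam times the sum of the l2
    norms of the parameter groups. Unlike a plain l1 penalty, it drives
    WHOLE groups to zero, pruning unused lower-layer patterns."""
    return lam * sum(np.linalg.norm(W[g]) for g in groups)
```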

  • Visual tracking in video sequences based on biologically inspired mechanisms
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2018-10-26
    Alireza Sokhandan; Amirhassan Monadjemi

    Visual tracking is the process of locating one or more objects based on their appearance. The high variation in the conditions and states of a moving object, and the presence of challenges such as background clutter, illumination variation, and occlusion, make this problem extremely complex and a robust algorithm hard to achieve. However, unlike machine vision, biological vision conducts the task of visual tracking well even in the worst conditions. Consequently, taking into account the superior performance of biological vision in visual tracking, this paper introduces a biologically inspired visual tracking algorithm. The proposed algorithm draws on the task-driven recognition procedure of the primary layers of the ventral pathway and on visual cortex mechanisms including spatial–temporal processing, motion perception, attention, and saliency to track a single object in a video sequence. For this purpose, a set of low-level features including oriented edges, color, and motion information (inspired by layer V1) is extracted from the target area, and, based on the discrimination rate that each feature achieves against the background (inspired by the saliency mechanism), a subset of these features is employed to generate the appearance model and identify the target location. Moreover, by memorizing shape and motion information (inspired by short-term memory), scale variation and occlusion are handled. The experimental results showed that the proposed algorithm handles most visual tracking challenges well, achieves high precision in target localization, and runs in real time.

  • A novel algebraic solution to the perspective-three-line pose problem
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2018-09-15
    Ping Wang; Guili Xu; Yuehua Cheng

    In this work, we present a novel algebraic solution to the perspective-three-line (P3L) problem of determining the position and attitude of a calibrated camera from features of three known reference lines. Unlike other methods, the proposed method uses an intermediate camera frame F and an intermediate world frame E, with sparse known line coordinates, facilitating formulation of the P3L problem. Additionally, the rotation matrix between frame E and frame F is parameterized using its orthogonality, and a closed-form solution for the P3L pose problem is then obtained through subsequent substitutions. This algebraic method makes the derivation easier to follow and significantly improves performance. The experimental results show that the proposed method offers numerical stability, accuracy and efficiency comparable to or better than those of state-of-the-art methods.

  • Descriptor extraction based on a multilayer dictionary architecture for classification of natural images
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2018-08-29
    Stefen Chan Wai Tim; Michele Rombaut; Denis Pellerin; Anuvabh Dutt

    This paper presents a descriptor extraction method for image classification based on a multilayer structure of dictionaries. We propose to learn an architecture of discriminative dictionaries for classification in a supervised framework using a patch-level approach. This method combines many layers of sparse coding and pooling in order to reduce the dimension of the problem. The supervised learning of dictionary atoms allows them to be specialized for a classification task. The method has been tested on known datasets of natural images such as MNIST, CIFAR-10 and STL, in various conditions, especially when the size of the training set is limited, and in a transfer learning application. The results are also compared with those obtained with Convolutional Neural Networks (CNNs) of similar complexity in terms of number of layers and processing pipeline.

  • Identifying motion pathways in highly crowded scenes: A non-parametric tracklet clustering approach
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2018-08-24
    Allam S. Hassanein; Mohamed E. Hussein; Walid Gomaa; Yasushi Makihara; Yasushi Yagi

    Many approaches that address the analysis of crowded scenes rely on using short trajectory fragments, also known as tracklets, of moving objects to identify motion pathways. Typically, such approaches aim at defining meaningful relationships among tracklets. However, defining these relationships and incorporating them in a crowded scene analysis framework is a challenge. In this article, we introduce a robust approach to identifying motion pathways based on tracklet clustering. We formulate a novel measure, inspired by line geometry, to capture the pairwise similarities between tracklets. For tracklet clustering, the recent distance dependent Chinese restaurant process (DD-CRP) model is adapted to use the estimated pairwise tracklet similarities. The motion pathways are identified based on two hierarchical levels of DD-CRP clustering such that the output clusters correspond to the pathways of moving objects in the crowded scene. Moreover, we extend our DD-CRP clustering adaptation to incorporate the source and sink gate probabilities for each tracklet as a high-level semantic prior for improving clustering performance. For qualitative evaluation, we propose a robust pathway matching metric, based on the chi-square distance, that accounts for both spatial coverage and motion orientation in the matched pathways. Our experimental evaluation on multiple crowded scene datasets, principally, the challenging Grand Central Station dataset, demonstrates the state-of-the-art performance of our approach. Finally, we demonstrate the task of motion abnormality detection, both at the tracklet and frame levels, against the normal motion patterns encountered in the motion pathways identified by our method, with competent quantitative performance on multiple datasets.

  • Spatial transformation and registration of brain images using elastically deformable models.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 1997-05-01
    C Davatzikos

    The development of algorithms for the spatial transformation and registration of tomographic brain images is a key issue in several clinical and basic science medical applications, including computer-aided neurosurgery, functional image analysis, and morphometrics. This paper describes a technique for the spatial transformation of brain images, which is based on elastically deformable models. A deformable surface algorithm is used to find a parametric representation of the outer cortical surface and then to define a map between corresponding cortical regions in two brain images. Based on the resulting map, a three-dimensional elastic warping transformation is then determined, which brings two images into register. This transformation models images as inhomogeneous elastic objects which are deformed into registration with each other by external force fields. The elastic properties of the images can vary from one region to the other, allowing more variable brain regions, such as the ventricles, to deform more freely than less variable ones. Finally, the framework of prestrained elasticity is used to model structural irregularities, and in particular the ventricular expansion occurring with aging or diseases, and the growth of tumors. Performance measurements are obtained using magnetic resonance images.

  • Expressive visual text-to-speech as an assistive technology for individuals with autism spectrum conditions.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2016-07-05
    S A Cassidy,B Stenger,L Van Dongen,K Yanagisawa,R Anderson,V Wan,S Baron-Cohen,R Cipolla

    Adults with Autism Spectrum Conditions (ASC) experience marked difficulties in recognising the emotions of others and responding appropriately. The clinical characteristics of ASC mean that face-to-face or group interventions may not be appropriate for this clinical group. This article explores the potential of a new interactive technology, converting text to emotionally expressive speech, to improve emotion processing ability and attention to faces in adults with ASC. We demonstrate a method for generating a near-videorealistic avatar (XpressiveTalk), which can produce a video of a face uttering inputted text in a large variety of emotional tones. We then demonstrate that general-population adults can correctly recognize the emotions portrayed by XpressiveTalk. Adults with ASC are significantly less accurate than controls, but still above chance levels, at inferring emotions from XpressiveTalk. Both groups are significantly more accurate when inferring sad emotions from XpressiveTalk compared to the original actress, and rate these expressions as significantly more preferred and realistic. The potential applications of XpressiveTalk as an assistive technology for adults with ASC are discussed.

  • Tensor scale: An analytic approach with efficient computation and applications.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2012-10-01
    Ziyue Xu,Punam K Saha,Soura Dasgupta

    Scale is a widely used notion in computer vision and image understanding that evolved in the form of scale-space theory where the key idea is to represent and analyze an image at various resolutions. Recently, we introduced a notion of local morphometric scale referred to as "tensor scale" using an ellipsoidal model that yields a unified representation of structure size, orientation and anisotropy. In the previous work, tensor scale was described using a 2-D algorithmic approach and a precise analytic definition was missing. Also, the application of tensor scale in 3-D using the previous framework is not practical due to high computational complexity. In this paper, an analytic definition of tensor scale is formulated for n-dimensional (n-D) images that captures local structure size, orientation and anisotropy. Also, an efficient computational solution in 2- and 3-D using several novel differential geometric approaches is presented and the accuracy of results is experimentally examined. Also, a matrix representation of tensor scale is derived facilitating several operations including tensor field smoothing to capture larger contextual knowledge. Finally, the applications of tensor scale in image filtering and n-linear interpolation are presented and the performance of their results is examined in comparison with respective state-of-the-art methods. Specifically, the performance of tensor scale based image filtering is compared with gradient and Weickert's structure tensor based diffusive filtering algorithms. Also, the performance of tensor scale based n-linear interpolation is evaluated in comparison with standard n-linear and windowed-sinc interpolation methods.
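    The ellipsoidal intuition behind tensor scale can be illustrated with a local second-moment (covariance) analysis: eigenvalues act as squared semi-axis scales, eigenvectors give orientation, and a ratio of extreme scales gives anisotropy. This is a minimal illustrative stand-in, not the paper's analytic definition:

```python
import numpy as np

def ellipsoid_descriptor(points):
    """Covariance-based ellipsoid fit to a local point cloud: eigenvalues
    give squared semi-axis scales, eigenvectors the orientation, and the
    ratio of extreme scales an anisotropy index in [0, 1]."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    cov = centered.T @ centered / len(pts)
    evals, evecs = np.linalg.eigh(cov)          # eigenvalues ascending
    scales = np.sqrt(np.maximum(evals, 0.0))    # semi-axis lengths
    anisotropy = 1.0 - scales[0] / scales[-1] if scales[-1] > 0 else 0.0
    return scales, evecs, anisotropy
```

    An elongated neighbourhood yields an anisotropy near 1, an isotropic one near 0; orientation is read off the dominant eigenvector.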

  • Robust measurement of individual localized changes to the aging hippocampus.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2014-08-19
    Jing Xie,Evan Fletcher,Baljeet Singh,Owen Carmichael

    Alzheimer's Disease (AD) is characterized by a stereotypical spatial pattern of hippocampus (HP) atrophy over time, but reliable and precise measurement of localized longitudinal change to individual HP in AD has been elusive. We present a method for quantifying subject-specific spatial patterns of longitudinal HP change that aligns serial HP surface pairs together, cuts slices off the ends of the HP that were not shared in the two delineations being aligned, estimates weighted correspondences between baseline and follow-up HP, and finds a concise set of localized spatial change patterns that explains HP changes while down-weighting HP surface points whose estimated changes are biologically implausible. We tested our method on a synthetic HP change dataset as well as a set of 320 real elderly HP measured at 1-year intervals. Our results suggest that the proposed steps reduce the amount of implausible HP changes indicated among individual HP, increase the strength of association between HP change and cognitive function related to AD, and enhance the estimation of reliable spatially-localized HP change patterns.

  • Interactive object modelling based on piecewise planar surface patches.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2014-02-11
    Johann Prankl,Michael Zillich,Markus Vincze

    Detecting elements such as planes in 3D is essential to describe objects for applications such as robotics and augmented reality. While plane estimation is well studied, table-top scenes exhibit a large number of planes and methods often lock onto a dominant plane or do not estimate 3D object structure but only homographies of individual planes. In this paper we introduce MDL to the problem of incrementally detecting multiple planar patches in a scene using tracked interest points in image sequences. Planar patches are reconstructed and stored in a keyframe-based graph structure. In case different motions occur, separate object hypotheses are modelled from currently visible patches and patches seen in previous frames. We evaluate our approach on a standard data set published by the Visual Geometry Group at the University of Oxford [24] and on our own data set containing table-top scenes. Results indicate that our approach significantly improves over the state-of-the-art algorithms.

  • 2D/3D Image Registration using Regression Learning.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2013-09-24
    Chen-Rui Chou,Brandon Frederick,Gig Mageras,Sha Chang,Stephen Pizer

    In computer vision and image analysis, image registration between 2D projections and a 3D image that achieves high accuracy and near real-time computation is challenging. In this paper, we propose a novel method that can rapidly detect an object's 3D rigid motion or deformation from a 2D projection image or a small set thereof. The method is called CLARET (Correction via Limited-Angle Residues in External Beam Therapy) and consists of two stages: registration preceded by shape space and regression learning. In the registration stage, linear operators are used to iteratively estimate the motion/deformation parameters based on the current intensity residue between the target projection(s) and the digitally reconstructed radiograph(s) (DRRs) of the estimated 3D image. The method determines the linear operators via a two-step learning process. First, it builds a low-order parametric model of the image region's motion/deformation shape space from its prior 3D images. Second, using learning-time samples produced from the 3D images, it formulates the relationships between the model parameters and the co-varying 2D projection intensity residues by multi-scale linear regressions. The calculated multi-scale regression matrices yield the coarse-to-fine linear operators used in estimating the model parameters from the 2D projection intensity residues in the registration. The method's application to Image-guided Radiation Therapy (IGRT) requires only a few seconds and yields good results in localizing a tumor under rigid motion in the head and neck and under respiratory deformation in the lung, using one treatment-time imaging 2D projection or a small set thereof.
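    The core of the learning stage, regressing from projection intensity residues back to model parameters, can be sketched as a ridge regression. The function names and the single-scale simplification below are illustrative, not CLARET's actual implementation:

```python
import numpy as np

def learn_linear_operator(params, residues, lam=1e-6):
    """Ridge regression mapping 2D projection intensity residues back to
    motion/deformation model parameters: solve for M minimizing
    ||params - residues @ M||^2 + lam * ||M||^2 over training samples."""
    R = np.asarray(residues, dtype=float)   # (n_samples, n_residue_dims)
    P = np.asarray(params, dtype=float)     # (n_samples, n_params)
    return np.linalg.solve(R.T @ R + lam * np.eye(R.shape[1]), R.T @ P)

def estimate_parameters(M, residue):
    """Registration-time use: one matrix product per iteration/scale."""
    return np.asarray(residue, dtype=float) @ M
```

    At registration time each iteration then costs only one matrix-vector product, which is what makes near real-time operation plausible.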

  • Particle Filters and Occlusion Handling for Rigid 2D-3D Pose Tracking.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2013-09-24
    Jehoon Lee,Romeil Sandhu,Allen Tannenbaum

    In this paper, we address the problem of 2D-3D pose estimation. Specifically, we propose an approach to jointly track a rigid object in a 2D image sequence and to estimate its pose (position and orientation) in 3D space. We revisit a joint 2D segmentation/3D pose estimation technique, and then extend the framework by incorporating a particle filter to robustly track the object in a challenging environment, and by developing an occlusion detection and handling scheme to continuously track the object in the presence of occlusions. In particular, we focus on partial occlusions that prevent the tracker from extracting exact region properties of the object, which play a pivotal role for region-based tracking methods in maintaining the track. To this end, the choice of how to invoke the objective functional is made dynamically online based on the degree of dependencies between predictions and measurements of the system in accordance with the degree of occlusion and the variation of the object's pose. This scheme provides the robustness to deal with occlusions of an obstacle with different statistical properties from that of the object of interest. Experimental results demonstrate the practical applicability and robustness of the proposed method in several challenging scenarios.
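    The generic bootstrap particle filter cycle that such a tracker builds on can be sketched as follows (a one-dimensional toy, not the paper's pose-space filter):

```python
import numpy as np

def particle_filter_step(particles, weights, transition, likelihood, rng):
    """One predict-update-resample cycle of a bootstrap particle filter.
    `transition` propagates a particle through the motion model and
    `likelihood` scores it against the current measurement."""
    particles = np.array([transition(p, rng) for p in particles])
    weights = weights * np.array([likelihood(p) for p in particles])
    weights = weights / weights.sum()
    # resample to counter weight degeneracy
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

    After a few cycles the particle cloud concentrates around the state that best explains the measurements, which is what lets the tracker coast through frames where the measurement is corrupted by occlusion.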

  • Ricci Flow-based Spherical Parameterization and Surface Registration.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2013-09-11
    X Chen,H He,G Zou,X Zhang,X Gu,J Hua

    This paper presents an improved Euclidean Ricci flow method for spherical parameterization. We subsequently invent a scale space processing built upon Ricci energy to extract robust surface features for accurate surface registration. Since our method is based on the proposed Euclidean Ricci flow, it inherits the properties of Ricci flow such as conformality, robustness and intrinsicalness, facilitating efficient and effective surface mapping. Compared with other surface registration methods using curvature or sulci pattern, our method demonstrates a significant improvement for surface registration. In addition, Ricci energy can capture local differences for surface analysis as shown in the experiments and applications.

  • Simultaneous Segmentation of Prostatic Zones Using Active Appearance Models With Multiple Coupled Levelsets.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2013-09-03
    Robert Toth,Justin Ribault,John Gentile,Dan Sperling,Anant Madabhushi

    In this work we present an improvement to the popular Active Appearance Model (AAM) algorithm, that we call the Multiple-Levelset AAM (MLA). The MLA can simultaneously segment multiple objects, and makes use of multiple levelsets, rather than anatomical landmarks, to define the shapes. AAMs traditionally define the shape of each object using a set of anatomical landmarks. However, landmarks can be difficult to identify, and AAMs traditionally only allow for segmentation of a single object of interest. The MLA, which is a landmark independent AAM, allows for levelsets of multiple objects to be determined and allows for them to be coupled with image intensities. This gives the MLA the flexibility to simultaneously segment multiple objects of interest in a new image. In this work we apply the MLA to segment the prostate capsule, the prostate peripheral zone (PZ), and the prostate central gland (CG), from a set of 40 endorectal, T2-weighted MRI images. The MLA system we employ in this work leverages a hierarchical segmentation framework, so constructed as to exploit domain specific attributes, by utilizing a given prostate segmentation to help drive the segmentations of the CG and PZ, which are embedded within the prostate. Our coupled MLA scheme yielded mean Dice accuracy values of .81, .79 and .68 for the prostate, CG, and PZ, respectively using a leave-one-out cross validation scheme over 40 patient studies. When only considering the midgland of the prostate, the mean DSC values were .89, .84, and .76 for the prostate, CG, and PZ respectively.

  • GC-ASM: Synergistic Integration of Graph-Cut and Active Shape Model Strategies for Medical Image Segmentation.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2013-04-16
    Xinjian Chen,Jayaram K Udupa,Abass Alavi,Drew A Torigian

    Image segmentation methods may be classified into two categories: purely image based and model based. Each of these two classes has its own advantages and disadvantages. In this paper, we propose a novel synergistic combination of the image based graph-cut (GC) method with the model based ASM method to arrive at the GC-ASM method for medical image segmentation. A multi-object GC cost function is proposed which effectively integrates the ASM shape information into the GC framework. The proposed method consists of two phases: model building and segmentation. In the model building phase, the ASM model is built and the parameters of the GC are estimated. The segmentation phase consists of two main steps: initialization (recognition) and delineation. For initialization, an automatic method is proposed which estimates the pose (translation, orientation, and scale) of the model, and obtains a rough segmentation result which also provides the shape information for the GC method. For delineation, an iterative GC-ASM algorithm is proposed which performs finer delineation based on the initialization results. The proposed methods are implemented to operate on 2D images and evaluated on clinical chest CT, abdominal CT, and foot MRI data sets. The results show the following: (a) An overall delineation accuracy of TPVF > 96%, FPVF < 0.6% can be achieved via GC-ASM for different objects, modalities, and body regions. (b) GC-ASM improves over ASM in accuracy and in the precision of the search region. (c) GC-ASM requires far fewer landmarks (about 1/3 of ASM) than ASM. (d) GC-ASM achieves full automation in the segmentation step compared to GC which requires seed specification and improves on the accuracy of GC. (e) One disadvantage of GC-ASM is its increased computational expense owing to the iterative nature of the algorithm.

  • Text Extraction from Scene Images by Character Appearance and Structure Modeling.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2013-01-15
    Chucai Yi,Yingli Tian

    In this paper, we propose a novel algorithm to detect text information from natural scene images. Scene text classification and detection are still open research topics. Our proposed algorithm is able to model both character appearance and structure to generate representative and discriminative text descriptors. The contributions of this paper include three aspects: 1) a new character appearance model by a structure correlation algorithm which extracts discriminative appearance features from detected interest points of character samples; 2) a new text descriptor based on structons and correlatons, which model character structure by structure differences among character samples and structure component co-occurrence; and 3) a new text region localization method by combining color decomposition, character contour refinement, and string line alignment to localize character candidates and refine detected text regions. We perform three groups of experiments to evaluate the effectiveness of our proposed algorithm, including text classification, text detection, and character identification. The evaluation results on benchmark datasets demonstrate that our algorithm achieves the state-of-the-art performance on scene text classification and detection, and significantly outperforms the existing algorithms for character identification.

  • A Multiple Object Geometric Deformable Model for Image Segmentation.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2013-01-15
    John A Bogovic,Jerry L Prince,Pierre-Louis Bazin

    Deformable models are widely used for image segmentation, most commonly to find single objects within an image. Although several methods have been proposed to segment multiple objects using deformable models, substantial limitations in their utility remain. This paper presents a multiple object segmentation method using a novel and efficient object representation for both two and three dimensions. The new framework guarantees object relationships and topology, prevents overlaps and gaps, enables boundary-specific speeds, and has a computationally efficient evolution scheme that is largely independent of the number of objects. Maintaining object relationships and straightforward use of object-specific and boundary-specific smoothing and advection forces enables the segmentation of objects with multiple compartments, a critical capability in the parcellation of organs in medical imaging. Comparing the new framework with previous approaches shows its superior performance and scalability.

  • Optimal-Flow Minimum-Cost Correspondence Assignment in Particle Flow Tracking.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2011-07-02
    Alexandre Matov,Marcus M Edvall,Ge Yang,Gaudenz Danuser

    A diversity of tracking problems exists in which cohorts of densely packed particles move in an organized fashion; however, the stability of individual particles within the cohort is low. Moreover, the flows of cohorts can regionally overlap. Together, these conditions yield a complex tracking scenario that cannot be addressed by optical flow techniques that assume piecewise coherent flows, or by multiparticle tracking techniques that suffer from the local ambiguity in particle assignment. Here, we propose a graph-based assignment of particles in three consecutive frames to recover from image sequences the instantaneous organized motion of groups of particles, i.e. flows. The algorithm makes no a priori assumptions on the fraction of particles participating in organized movement, as this number continuously alters with the evolution of the flow fields in time. Graph-based assignment methods generally maximize the number of acceptable particle assignments between consecutive frames and only then minimize the association cost. In dense and unstable particle flow fields this approach produces many false positives. The approach proposed here avoids this by solving a multi-objective optimization problem in which the number of assignments is maximized while their total association cost is minimized at the same time. The method is validated on standard benchmark data for particle tracking. In addition, we demonstrate its application to live cell microscopy where several large molecular populations with different behaviors are tracked.
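    For contrast with the multi-objective formulation, a conventional single-objective frame-to-frame linking step might look like the sketch below: a distance-gated minimum-cost (Hungarian) assignment. The function name and gating constant are illustrative, not the paper's method:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def link_particles(frame_a, frame_b, max_dist):
    """Min-cost particle linking between two frames on pairwise Euclidean
    distances; links longer than `max_dist` are forbidden via a large
    sentinel cost and dropped from the result."""
    A = np.asarray(frame_a, dtype=float)
    B = np.asarray(frame_b, dtype=float)
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    BIG = 1e6
    gated = np.where(cost <= max_dist, cost, BIG)
    rows, cols = linear_sum_assignment(gated)
    return [(int(r), int(c)) for r, c in zip(rows, cols) if gated[r, c] < BIG]
```

    This is exactly the "minimize cost among acceptable links" baseline that the abstract argues produces false positives in dense, unstable flows; the paper's contribution is to trade off the number of links against their total cost instead.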

  • A framework for comparing different image segmentation methods and its use in studying equivalences between level set and fuzzy connectedness frameworks.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2011-03-29
    Krzysztof Chris Ciesielski,Jayaram K Udupa

    In the current vast image segmentation literature, there seems to be considerable redundancy among algorithms, while there is a serious lack of methods that would allow their theoretical comparison to establish their similarity, equivalence, or distinctness. In this paper, we make an attempt to fill this gap. To accomplish this goal, we argue that: (1) every digital segmentation algorithm [Formula: see text] should have a well defined continuous counterpart [Formula: see text], referred to as its model, which constitutes an asymptotic of [Formula: see text] when image resolution goes to infinity; (2) the equality of two such models [Formula: see text] and [Formula: see text] establishes a theoretical (asymptotic) equivalence of their digital counterparts [Formula: see text] and [Formula: see text]. Such a comparison is of full theoretical value only when, for each involved algorithm [Formula: see text], its model [Formula: see text] is proved to be an asymptotic of [Formula: see text]. So far, such proofs do not appear anywhere in the literature, even in the case of algorithms introduced as digitizations of continuous models, like level set segmentation algorithms. The main goal of this article is to explore a line of investigation for formally pairing the digital segmentation algorithms with their asymptotic models, justifying such relations with mathematical proofs, and using the results to compare the segmentation algorithms in this general theoretical framework. As a first step towards this general goal, we prove here that the gradient based thresholding model [Formula: see text] is the asymptotic for the fuzzy connectedness Udupa and Samarasekera segmentation algorithm used with gradient based affinity [Formula: see text]. 
We also argue that, in a sense, [Formula: see text] is the asymptotic for the original front propagation level set algorithm of Malladi, Sethian, and Vemuri, thus establishing a theoretical equivalence between these two specific algorithms. Experimental evidence of this last equivalence is also provided.

  • Intensity Standardization Simplifies Brain MR Image Segmentation.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2010-02-18
    Ying Zhuge,Jayaram K Udupa

    Typically, brain MR images present significant intensity variation across patients and scanners. Consequently, training a classifier on a set of images and using it subsequently for brain segmentation may yield poor results. Adaptive iterative methods usually need to be employed to account for the variations of the particular scan. These methods are complicated, difficult to implement and often involve significant computational costs. In this paper, a simple, non-iterative method is proposed for brain MR image segmentation. Two preprocessing techniques, namely intensity inhomogeneity correction, and more importantly MR image intensity standardization, used prior to segmentation, play a vital role in making the MR image intensities have a tissue-specific numeric meaning, which leads us to a very simple brain tissue segmentation strategy. Vectorial scale-based fuzzy connectedness and certain morphological operations are utilized first to generate the brain intracranial mask. The fuzzy membership value of each voxel within the intracranial mask for each brain tissue is then estimated. Finally, a maximum likelihood criterion with spatial constraints taken into account is utilized in classifying all voxels in the intracranial mask into different brain tissue groups. A set of inhomogeneity corrected and intensity standardized images is utilized as a training data set. We introduce two methods to estimate fuzzy membership values. In the first method, called SMG (for simple membership based on a gaussian model), the fuzzy membership value is estimated by fitting a multivariate Gaussian model to the intensity distribution of each brain tissue whose mean intensity vector and covariance matrix are estimated and fixed from the training data sets. The second method, called SMH (for simple membership based on a histogram), estimates fuzzy membership value directly via the intensity distribution of each brain tissue obtained from the training data sets. 
We present several studies to evaluate the performance of these two methods based on 10 clinical MR images of normal subjects and 10 clinical MR images of Multiple Sclerosis (MS) patients. A quantitative comparison indicates that both methods have overall better accuracy than the k-nearest neighbors (kNN) method, and have much better efficiency than the Finite Mixture (FM) model based Expectation-Maximization (EM) method. Accuracy is similar for our methods and EM method for the normal subject data sets, but much better for our methods for the patient data sets.
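    The SMG idea, membership from per-tissue multivariate Gaussians fitted once on standardized training images, can be sketched as follows (the normalization to a sum of one and other details are illustrative):

```python
import numpy as np

def gaussian_membership(x, means, covs):
    """Normalized per-tissue membership of one voxel's intensity vector x
    under multivariate Gaussians (one mean vector and covariance matrix
    per tissue, estimated and fixed from training data)."""
    x = np.asarray(x, dtype=float)
    dens = []
    for mu, cov in zip(means, covs):
        mu = np.asarray(mu, dtype=float)
        cov = np.asarray(cov, dtype=float)
        d = x - mu
        norm = np.sqrt((2.0 * np.pi) ** x.size * np.linalg.det(cov))
        dens.append(np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm)
    dens = np.array(dens)
    return dens / dens.sum()
```

    Because intensity standardization gives each tissue a fixed numeric range, these Gaussians can be estimated once from training scans and reused on new patients without per-scan adaptation, which is the point of the non-iterative design.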

  • Volumetric Video Compression for Interactive Playback.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2004-12-01
    Bong-Soo Sohn,Chandrajit Bajaj,Vinay Siddavanahalli

    We develop a volumetric video system which supports interactive browsing of compressed time-varying volumetric features (significant isosurfaces and interval volumes). Since the size of even one volumetric frame in a time-varying 3D data set is very large, transmission and on-line reconstruction are the main bottlenecks for interactive remote visualization of time-varying volume and surface data. We describe a compression scheme for encoding time-varying volumetric features in a unified way, which allows for on-line reconstruction and rendering. To increase the run-time decompression speed and compression ratio, we decompose the volume into small blocks and encode only the significant blocks that contribute to the isosurfaces and interval volumes. The results show that our compression scheme achieves high compression ratio with fast reconstruction, which is effective for client-side rendering of time-varying volumetric features.
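    The block-culling idea, encoding only bricks that an isosurface can pass through, can be sketched as below. The block size and the min/max straddle test are illustrative of the idea, not the paper's codec:

```python
import numpy as np

def significant_blocks(volume, isovalue, block=8):
    """Partition a volume into block^3 bricks and keep only those whose
    value range straddles the isovalue, i.e. the only bricks an
    isosurface can intersect; the rest need not be encoded or sent."""
    V = np.asarray(volume)
    kept = {}
    for z in range(0, V.shape[0], block):
        for y in range(0, V.shape[1], block):
            for x in range(0, V.shape[2], block):
                brick = V[z:z + block, y:y + block, x:x + block]
                if brick.min() <= isovalue <= brick.max():
                    kept[(z, y, x)] = brick
    return kept
```

    Per-brick min/max values are cheap to precompute per frame, so the client can skip most of a time-varying volume before any decompression work.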

  • Linguistic Summarization of Video for Fall Detection Using Voxel Person and Fuzzy Logic.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2010-01-05
    Derek Anderson,Robert H Luke,James M Keller,Marjorie Skubic,Marilyn Rantz,Myra Aud

    In this paper, we present a method for recognizing human activity from linguistic summarizations of temporal fuzzy inference curves representing the states of a three-dimensional object called voxel person. A hierarchy of fuzzy logic is used, where the output from each level is summarized and fed into the next level. We present a two level model for fall detection. The first level infers the states of the person at each image. The second level operates on linguistic summarizations of voxel person's states and inference regarding activity is performed. The rules used for fall detection were designed under the supervision of nurses to ensure that they reflect the manner in which elders perform these activities. The proposed framework is extremely flexible. Rules can be modified, added, or removed, allowing for per-resident customization based on knowledge about their cognitive and physical ability.
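    The flavor of such rules can be sketched with piecewise-linear memberships and min as the fuzzy AND. All thresholds below are invented for illustration and are not taken from the nurse-designed rule base:

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal fuzzy membership: 0 outside [a, d], 1 on [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def ramp_up(x, a, b):
    """'At least' membership: 0 below a, 1 above b, linear in between."""
    return 0.0 if x <= a else 1.0 if x >= b else (x - a) / (b - a)

def fall_rule(centroid_height_m, low_duration_s):
    """Toy rule in the spirit of the hierarchy: IF voxel person's
    centroid is low AND the low state persists THEN fall.
    min() is the standard fuzzy AND."""
    low = trapezoid(centroid_height_m, -0.1, 0.0, 0.3, 0.6)
    persistent = ramp_up(low_duration_s, 1.0, 3.0)
    return min(low, persistent)
```

    Because each rule is an explicit membership function, per-resident customization amounts to editing these curves rather than retraining a model.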

  • Limited view CT reconstruction and segmentation via constrained metric labeling.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2009-10-06
    Vikas Singh,Lopamudra Mukherjee,Petru M Dinu,Jinhui Xu,Kenneth R Hoffmann

    This paper proposes a new discrete optimization framework for tomographic reconstruction and segmentation of CT volumes when only a few projection views are available. The problem has important clinical applications in coronary angiographic imaging. We first show that the limited view reconstruction and segmentation problem can be formulated as a "constrained" version of the metric labeling problem. This lays the groundwork for a linear programming framework that brings metric labeling classification and classical algebraic tomographic reconstruction (ART) together in a unified model. If the imaged volume is known to be comprised of a finite set of attenuation coefficients (a realistic assumption), given a regular limited view reconstruction, we view it as a task of voxel reassignment, subject to maximally maintaining consistency with the input reconstruction and the objective of ART simultaneously. The approach can reliably reconstruct (or segment) volumes with several multiple-contrast objects. We present evaluations using experiments on cone beam computed tomography.

  • Trajectory Fusion for Three-dimensional Volume Reconstruction.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2009-04-04
    Sang-Chul Lee,Peter Bajcsy

    We address the 3D volume reconstruction problem from depth adjacent sub-volumes acquired by a confocal laser scanning microscope (CLSM). Our goal is to align the sub-volumes by estimating a set of optimal global transformations that preserve morphological continuity of medical structures, e.g., blood vessels, in the reconstructed 3D volume. We approach the problem by learning morphological characteristics of structures of interest in each sub-volume to understand global alignment transformations. Based on the observations of morphology, sub-volumes are aligned by connecting the morphological features at the sub-volume boundaries by minimizing morphological discontinuity. To minimize the discontinuity, we introduce three morphological discontinuity metrics: discontinuity magnitude at sub-volume boundary points, and overall and junction discontinuity residuals after polynomial curve fitting to multiple aligned sub-volumes. The proposed techniques have been applied to the problem of aligning CLSM sub-volumes acquired from four consecutive physical cross sections. Our experimental results demonstrated significant improvements of morphological smoothness of medical structures in comparison with the results obtained by feature matching at the sub-volume boundaries. The experimental results were evaluated by visual inspection and by quantifying morphological discontinuity metrics.

  • Computer-based System for the Virtual-Endoscopic Guidance of Bronchoscopy.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2008-11-04
    J P Helferty,A J Sherbondy,A P Kiraly,W E Higgins

    The standard procedure for diagnosing lung cancer involves two stages: three-dimensional (3D) computed-tomography (CT) image assessment, followed by interventional bronchoscopy. In general, the physician has no link between the 3D CT image assessment results and the follow-on bronchoscopy. Thus, the physician essentially performs bronchoscopic biopsy of suspect cancer sites blindly. We have devised a computer-based system that greatly augments the physician's vision during bronchoscopy. The system uses techniques from computer graphics and computer vision to enable detailed 3D CT procedure planning and follow-on image-guided bronchoscopy. The procedure plan is directly linked to the bronchoscope procedure, through a live registration and fusion of the 3D CT data and bronchoscopic video. During a procedure, the system provides many visual tools, fused CT-video data, and quantitative distance measures; this gives the physician considerable visual feedback on how to maneuver the bronchoscope and where to insert the biopsy needle. Central to the system is a CT-video registration technique, based on normalized mutual information. Several sets of results verify the efficacy of the registration technique. In addition, we present a series of test results for the complete system for phantoms, animals, and human lung-cancer patients. The results indicate that not only is the variation in skill level between different physicians greatly reduced by the system over the standard procedure, but that biopsy effectiveness increases.

  • Iterative Relative Fuzzy Connectedness for Multiple Objects with Multiple Seeds.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2008-09-05
    Krzysztof Chris Ciesielski,Jayaram K Udupa,Punam K Saha,Ying Zhuge

    In this paper we present a new theory and an algorithm for image segmentation based on a strength of connectedness between every pair of image elements. The object definition used in the segmentation algorithm utilizes the notion of iterative relative fuzzy connectedness, IRFC. In previously published research, the IRFC theory was developed only for the case when the segmentation was involved with just two segments, an object and a background, and each of the segments was indicated by a single seed. (See Udupa, Saha, Lotufo [15] and Saha, Udupa [14].) Our theory, which solves a problem of Udupa and Saha from [13], allows simultaneous segmentation involving an arbitrary number of objects. Moreover, each segment can be indicated by more than one seed, which is often more natural and easier than a single seed object identification. The first iteration step of the IRFC algorithm gives a segmentation known as relative fuzzy connectedness, RFC, segmentation. Thus, the IRFC technique is an extension of the RFC method. Although the RFC theory, due to Saha and Udupa [19], is developed in the multi object/multi seed framework, the theoretical results presented here are considerably more delicate in nature and do not use the results from [19]. On the other hand, the theoretical results from [19] are immediate consequences of the results presented here. Moreover, the new framework not only subsumes previous fuzzy connectedness descriptions but also sheds new light on them. Thus, there are fundamental theoretical advances made in this paper. We present examples of segmentations obtained via our IRFC based algorithm in the multi object/multi seed environment, and compare it with the results obtained with the RFC based algorithm. Our results indicate that, in many situations, IRFC outperforms RFC, but there also exist instances where the gain in performance is negligible.
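    The underlying fuzzy connectedness primitive, where a path's strength is the minimum affinity along it and a voxel's connectedness is the maximum strength over all paths from the seeds, can be computed with a max-min variant of Dijkstra's algorithm. This sketches only that primitive; the iterative relative competition between multiple objects is omitted:

```python
import heapq

def fuzzy_connectedness(affinity, seeds, n):
    """Connectedness map kappa on a graph of n nodes: kappa(v) is the
    maximum, over paths from any seed to v, of the minimum affinity
    along the path. `affinity` maps undirected (u, v) pairs to [0, 1]."""
    adj = {u: [] for u in range(n)}
    for (u, v), a in affinity.items():
        adj[u].append((v, a))
        adj[v].append((u, a))
    kappa = {v: 0.0 for v in range(n)}
    heap = []
    for s in seeds:
        kappa[s] = 1.0
        heapq.heappush(heap, (-1.0, s))
    while heap:
        neg, u = heapq.heappop(heap)
        if -neg < kappa[u]:
            continue  # stale queue entry
        for v, a in adj[u]:
            strength = min(kappa[u], a)
            if strength > kappa[v]:
                kappa[v] = strength
                heapq.heappush(heap, (-strength, v))
    return kappa
```

    In the relative and iterative variants, each object's seeds run this competition against the others and a voxel joins the object to which it is most strongly connected.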

  • Simultaneous Tumor Segmentation, Image Restoration, and Blur Kernel Estimation in PET Using Multiple Regularizations.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2017-06-13
    Laquan Li,Jian Wang,Wei Lu,Shan Tan

    Accurate tumor segmentation from PET images is crucial in many radiation oncology applications. Among others, partial volume effect (PVE) is recognized as one of the most important factors degrading imaging quality and segmentation accuracy in PET. Taking into account that image restoration and tumor segmentation are tightly coupled and can promote each other, we proposed a variational method to solve both problems simultaneously in this study. The proposed method integrated total variation (TV) semi-blind deconvolution and Mumford-Shah segmentation with multiple regularizations. Unlike many existing energy minimization methods using either TV or L2 regularization, the proposed method employed TV regularization over tumor edges to preserve edge information, and L2 regularization inside tumor regions to preserve the smooth change of the metabolic uptake in a PET image. The blur kernel was modeled as an anisotropic Gaussian to address the resolution difference between the transverse and axial directions commonly seen in a clinical PET scanner. The energy functional was rephrased using the Γ-convergence approximation and was iteratively optimized using the alternating minimization (AM) algorithm. The performance of the proposed method was validated on a physical phantom and two clinical datasets with non-Hodgkin's lymphoma and esophageal cancer, respectively. Experimental results demonstrated that the proposed method had high performance for simultaneous image restoration, tumor segmentation and scanner blur kernel estimation. In particular, the recovery coefficients (RC) of the restored images of the proposed method in the phantom study were close to 1, indicating effective recovery of the original images from blur; for segmentation, the proposed method achieved average dice similarity indexes (DSIs) of 0.79 and 0.80 for the two clinical datasets, respectively; and the relative errors of the estimated blur kernel widths were less than 19% in the transverse direction and 7% in the axial direction.
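    The anisotropic blur model mentioned above can be made concrete: a 3D Gaussian point-spread function with one width in the transverse (x, y) plane and another along the axial (z) direction. A small sketch, where the kernel radius and the normalization convention are our own illustrative choices rather than values from the paper:

    ```python
    import numpy as np

    def anisotropic_gaussian_kernel(sigma_trans, sigma_axial, radius):
        """3D Gaussian PSF with a transverse width sigma_trans shared by
        x and y, and a separate axial width sigma_axial along z."""
        ax = np.arange(-radius, radius + 1)
        z, y, x = np.meshgrid(ax, ax, ax, indexing="ij")
        k = np.exp(-(x**2 + y**2) / (2 * sigma_trans**2)
                   - z**2 / (2 * sigma_axial**2))
        return k / k.sum()  # normalize so blurring preserves total intensity
    ```

    In an AM scheme of the kind the abstract describes, the two widths would be the kernel parameters re-estimated in the blur-update step while the restored image and segmentation are held fixed.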

  • Statistical Shape Model for Manifold Regularization: Gleason grading of prostate histology.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2013-07-28
    Rachel Sparks,Anant Madabhushi

    Gleason patterns of prostate cancer histopathology, characterized primarily by morphological and architectural attributes of histological structures (glands and nuclei), have been found to be highly correlated with disease aggressiveness and patient outcome. Gleason patterns 4 and 5 are highly correlated with more aggressive disease and poorer patient outcome, while Gleason patterns 1-3 tend to reflect more favorable patient outcome. Because Gleason grading is done manually by a pathologist visually examining glass (or digital) slides, subtle morphologic and architectural differences of histological attributes, in addition to other factors, may result in grading errors and hence cause high inter-observer variability. Recently, some researchers have proposed computerized decision support systems to automatically grade Gleason patterns by using features pertaining to nuclear architecture, gland morphology, as well as tissue texture. Automated characterization of gland morphology has been shown to distinguish between intermediate Gleason patterns 3 and 4 with high accuracy. Manifold learning (ML) schemes attempt to generate a low dimensional manifold representation of a higher dimensional feature space while simultaneously preserving nonlinear relationships between object instances. Classification can then be performed in the low dimensional space with high accuracy. However, ML is sensitive to the samples contained in the dataset; changes in the dataset may alter the manifold structure. In this paper we present a manifold regularization technique to constrain the low dimensional manifold to a specific range of possible manifold shapes, the range being determined via a statistical shape model of manifolds (SSMM). In this work we demonstrate applications of the SSMM in (1) identifying samples on the manifold which contain noise, defined as those samples which deviate from the SSMM, and (2) accurate out-of-sample extrapolation (OSE) of newly acquired samples onto a manifold constrained by the SSMM. We demonstrate these applications of the SSMM in the context of distinguishing between Gleason patterns 3 and 4 using glandular morphologic features in a prostate histopathology dataset of 58 patient studies. Identifying and eliminating noisy samples from the manifold via the SSMM results in a statistically significant improvement in area under the receiver operator characteristic curve (AUC), 0.832 ± 0.048 with removal of noisy samples compared to an AUC of 0.779 ± 0.075 without removal of samples. The use of the SSMM for OSE of newly acquired glands also shows statistically significant improvement in AUC, 0.834 ± 0.051 with the SSMM compared to 0.779 ± 0.054 without the SSMM. Similar results were observed for the synthetic Swiss Roll and Helix datasets.

  • Shape Matching and Registration by Data-driven EM.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2008-03-01
    Zhuowen Tu,Songfeng Zheng,Alan Yuille

    In this paper, we present an efficient and robust algorithm for shape matching, registration, and detection. The task is to geometrically transform a source shape to fit a target shape. The measure of similarity is defined in terms of the amount of transformation required. The shapes are represented by sparse-point or continuous-contour representations depending on the form of the data. We formulate the problem as probabilistic inference using a generative model and the EM algorithm. However, this algorithm has problems with initialization and with computing the E-step. To address these problems, we define a discriminative model which makes use of shape features. This gives a hybrid algorithm which combines the generative and discriminative models. The resulting algorithm is very fast, due to the effectiveness of shape features for solving correspondence, typically requiring only four iterations; the convergence time of the algorithm is under a second. We demonstrate the effectiveness of the algorithm by testing it on standard datasets, such as MPEG7, for shape matching and by applying it to a range of matching, registration, and foreground/background segmentation problems.
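    The generative EM loop described above can be sketched as alternating soft correspondence (E-step) and transform refitting (M-step). The toy below fits an affine map with Gaussian responsibilities; it omits the paper's discriminative shape-feature model, which is what supplies the initialization and the fast E-step, and all parameter values here are illustrative:

    ```python
    import numpy as np

    def em_affine_match(source, target, iters=4, sigma=0.2):
        """Toy EM point matching: E-step computes soft correspondences
        (Gaussian responsibilities), M-step refits an affine map by
        least squares against the responsibility-weighted targets."""
        A, t = np.eye(2), np.zeros(2)
        for _ in range(iters):
            moved = source @ A.T + t
            # E-step: responsibility of each target point for each source point
            d2 = ((moved[:, None, :] - target[None, :, :]) ** 2).sum(-1)
            w = np.exp(-d2 / (2 * sigma**2))
            w /= w.sum(axis=1, keepdims=True)
            virtual = w @ target  # soft (expected) target for each source point
            # M-step: least-squares affine fit, source -> virtual targets
            X = np.hstack([source, np.ones((len(source), 1))])
            sol, *_ = np.linalg.lstsq(X, virtual, rcond=None)
            A, t = sol[:2].T, sol[2]
        return A, t
    ```

    With a good initialization the soft correspondences sharpen quickly, which is consistent with the small iteration count the abstract reports for the full hybrid algorithm.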

  • Contour based object detection using part bundles.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2010-07-01
    ChengEn Lu,Nagesh Adluru,Haibin Ling,Guangxi Zhu,Longin Jan Latecki

    In this paper we propose a novel framework for contour-based object detection in cluttered environments. Given a contour model for a class of objects, it is first decomposed into fragments hierarchically. Then, we group these fragments into part bundles, where a part bundle can contain overlapping fragments. Given a new image with a set of edge fragments, we develop an efficient voting method using local shape similarity between part bundles and edge fragments that generates high-quality candidate part configurations. We then use global shape similarity between the part configurations and the model contour to find the optimal configuration. Furthermore, we show that appearance information can be used to improve detection for objects with distinctive texture when the model contour does not sufficiently capture the deformation of the objects.
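    The local-to-global voting step can be illustrated in a generalized-Hough style: each edge fragment matched to a model part votes for the object's reference point, weighted by its local shape similarity. This toy assumes the (position, offset, similarity) matches are precomputed and omits the paper's part-bundle matching and global-shape verification:

    ```python
    import numpy as np

    def hough_votes(matches, shape):
        """Accumulate object-center votes from matched fragments.

        Each match is (fragment_position, model_offset_to_center, similarity);
        the fragment votes for fragment_position + offset with weight equal
        to its local shape similarity. Returns the highest-scoring center.
        """
        acc = np.zeros(shape)
        for (fy, fx), (oy, ox), sim in matches:
            cy, cx = fy + oy, fx + ox
            if 0 <= cy < shape[0] and 0 <= cx < shape[1]:
                acc[cy, cx] += sim
        return np.unravel_index(acc.argmax(), acc.shape)
    ```

    In the full method, peaks in such an accumulator would only nominate candidate configurations, which are then re-scored by global shape similarity to the model contour.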

  • Modeling 4D Pathological Changes by Leveraging Normative Models.
    Comput. Vis. Image Underst. (IF 2.645) Pub Date : 2016-11-08
    Bo Wang,Marcel Prastawa,Andrei Irimia,Avishek Saha,Wei Liu,S Y Matthew Goh,Paul M Vespa,John D Van Horn,Guido Gerig

    With the increasing use of efficient multimodal 3D imaging, clinicians are able to access longitudinal imaging to stage pathological diseases, to monitor the efficacy of therapeutic interventions, or to assess and quantify rehabilitation efforts. Analysis of such four-dimensional (4D) image data presenting pathologies, including disappearing and newly appearing lesions, represents a significant challenge due to the presence of complex spatio-temporal changes. Image analysis methods for such 4D image data have to include not only a concept for joint segmentation of 3D datasets to account for inherent correlations of subject-specific repeated scans but also a mechanism to account for large deformations and the destruction and formation of lesions (e.g., edema, bleeding) due to underlying physiological processes associated with damage, intervention, and recovery. In this paper, we propose a novel joint segmentation-registration framework to tackle the inherent problem of image registration in the presence of objects not present in all images of the time series. Our methodology models 4D changes in pathological anatomy across time and also provides an explicit mapping of a healthy normative template to a subject's image data with pathologies. Since atlas-moderated segmentation methods cannot explain the appearance and location of pathological structures that are not represented in the template atlas, the new framework provides different options for initialization via a supervised learning approach, iterative semi-supervised active learning, and also transfer learning, which results in a fully automatic 4D segmentation method. We demonstrate the effectiveness of our novel approach with synthetic experiments and a 4D multimodal MRI dataset of severe traumatic brain injury (TBI), including validation via comparison to expert segmentations. Moreover, the proposed methodology is generic with regard to different clinical applications requiring quantitative analysis of 4D imaging representing spatio-temporal changes of pathologies.

Contents have been reproduced by permission of the publishers.
NYU Shanghai, William Glover