arXiv - CS - Graphics
  • Manifold Approximation by Moving Least-Squares Projection (MMLS)
    arXiv.cs.GR Pub Date : 2016-06-22
    Barak Sober; David Levin

    To avoid the curse of dimensionality frequently encountered in Big Data analysis, linear and nonlinear dimension-reduction techniques have developed rapidly in recent years. These techniques (sometimes referred to as manifold learning) assume that the scattered input data lies on a lower-dimensional manifold, so the high-dimensionality problem can be overcome by learning the lower-dimensional behavior. However, in real-life applications, data is often very noisy. In this work, we propose a method to approximate $\mathcal{M}$, a $d$-dimensional $C^{m+1}$ smooth submanifold of $\mathbb{R}^n$ ($d \ll n$), based upon noisy scattered data points (i.e., a data cloud). We assume that the data points are located "near" the lower-dimensional manifold and suggest a non-linear moving least-squares projection onto an approximating $d$-dimensional manifold. Under some mild assumptions, the resulting approximant is shown to be infinitely smooth and of high approximation order (i.e., $O(h^{m+1})$, where $h$ is the fill distance and $m$ is the degree of the local polynomial approximation). The method presented here assumes no analytic knowledge of the approximated manifold, and the approximation algorithm is linear in the large dimension $n$. Furthermore, the approximating manifold can serve as a framework for performing operations directly on the high-dimensional data in a computationally efficient manner. This way, the preparatory step of dimension reduction, which induces distortions in the data, can be avoided altogether.
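
    As a toy illustration of the moving least-squares idea (not the authors' full algorithm), the sketch below projects a noisy point in $\mathbb{R}^2$ toward a 1-dimensional point cloud by fitting a weighted local line (weighted PCA) and projecting onto it; the Gaussian weight width `h` plays the role of the fill distance.

```python
import math
import random

def mls_project(q, points, h):
    """One moving least-squares step: fit a weighted local line near the
    query point q and orthogonally project q onto it."""
    # Gaussian weights centred at q; h plays the role of the fill distance.
    w = [math.exp(-((p[0] - q[0])**2 + (p[1] - q[1])**2) / h**2) for p in points]
    W = sum(w)
    # Weighted centroid = origin of the local coordinate system.
    cx = sum(wi * p[0] for wi, p in zip(w, points)) / W
    cy = sum(wi * p[1] for wi, p in zip(w, points)) / W
    # Weighted 2x2 covariance; its leading eigenvector is the local tangent.
    sxx = sum(wi * (p[0] - cx)**2 for wi, p in zip(w, points)) / W
    sxy = sum(wi * (p[0] - cx) * (p[1] - cy) for wi, p in zip(w, points)) / W
    syy = sum(wi * (p[1] - cy)**2 for wi, p in zip(w, points)) / W
    # Closed-form principal direction of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    tx, ty = math.cos(theta), math.sin(theta)
    # Orthogonal projection of q onto the local tangent line.
    t = (q[0] - cx) * tx + (q[1] - cy) * ty
    return (cx + t * tx, cy + t * ty)

# Noisy samples of the line y = 0, a 1-D manifold in R^2.
random.seed(0)
pts = [(x / 10.0, 0.02 * random.uniform(-1, 1)) for x in range(-20, 21)]
proj = mls_project((0.5, 0.3), pts, h=0.5)  # pulled back toward the manifold
```

The full method iterates such local fits with polynomial (degree-$m$) rather than linear approximations, which is where the $O(h^{m+1})$ order comes from.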

  • Fast quasi-conformal regional flattening of the left atrium
    arXiv.cs.GR Pub Date : 2018-11-16
    Marta Nuñez-Garcia; Gabriel Bernardino; Francisco Alarcón; Gala Caixal; Lluís Mont; Oscar Camara; Constantine Butakoff

    Two-dimensional representation of 3D anatomical structures is a simple and intuitive way of analysing patient information across populations and image modalities. It also allows convenient visualizations that can be included in clinical reports for a fast overview of the whole structure. While the cardiac ventricles, especially the left ventricle, have an established standard representation (e.g. the bull's eye plot), the 2D depiction of the left atrium (LA) is challenging due to its sub-structural complexity, including the pulmonary veins (PV) and the left atrial appendage (LAA). Quasi-conformal flattening techniques, successfully applied to cardiac ventricles, require additional constraints in the case of the LA to place the PVs and LAA at the same 2D locations across cases. Some registration-based methods have been proposed, but 3D (or 2D) surface registration is time-consuming and prone to errors when the geometries differ substantially. We propose a novel atrial flattening methodology in which a quasi-conformal 2D map of the LA is obtained quickly and without registration-related errors. In our approach, the LA is divided into 5 regions which are then mapped to their two-dimensional analogues. A dataset of 67 human left atria from magnetic resonance images (MRI) was studied to derive a population-based 2D LA template representing the averaged relative locations of the PVs and LAA. The clinical application of the proposed methodology is illustrated on different use cases, including the integration of MRI and electroanatomical data.
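
    A minimal relative of such flattening is a harmonic (Tutte-style) map: pin the mesh boundary to a circle and solve for each interior vertex as the average of its neighbours. The sketch below does this for a hypothetical grid patch; it is illustrative only and omits the quasi-conformal weighting and the five-region constraints of the paper.

```python
import math

def tutte_flatten(n=5, iters=500):
    """Flatten an n-by-n grid patch to the unit disk: pin the boundary
    vertices on a circle and relax each interior vertex to the average of
    its neighbours (a uniform-weight harmonic map)."""
    def nbrs(i, j):
        return [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= i + di < n and 0 <= j + dj < n]
    # Boundary vertices in order, pinned to the unit circle.
    boundary = ([(0, j) for j in range(n)] + [(i, n - 1) for i in range(1, n)]
                + [(n - 1, j) for j in range(n - 2, -1, -1)]
                + [(i, 0) for i in range(n - 2, 0, -1)])
    pos = {}
    for k, v in enumerate(boundary):
        a = 2 * math.pi * k / len(boundary)
        pos[v] = (math.cos(a), math.sin(a))
    interior = [(i, j) for i in range(1, n - 1) for j in range(1, n - 1)]
    for v in interior:
        pos[v] = (0.0, 0.0)
    # Jacobi-style relaxation toward the harmonic solution.
    for _ in range(iters):
        new = {v: pos[v] for v in boundary}
        for v in interior:
            ns = nbrs(*v)
            new[v] = (sum(pos[u][0] for u in ns) / len(ns),
                      sum(pos[u][1] for u in ns) / len(ns))
        pos = new
    return pos

flat = tutte_flatten()  # interior vertices land strictly inside the disk
```

In the paper's setting, each of the 5 LA regions would get its own target 2D region (from the population template) instead of a single disk.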

  • Everybody's Talkin': Let Me Talk as You Want
    arXiv.cs.GR Pub Date : 2020-01-15
    Linsen Song; Wayne Wu; Chen Qian; Ran He; Chen Change Loy

    We present a method to edit target portrait footage by taking a sequence of audio as input to synthesize a photo-realistic video. This method is unique because it is highly dynamic: it does not assume a person-specific rendering network, yet it is capable of translating arbitrary source audio into arbitrary video output. Instead of learning a highly heterogeneous and nonlinear mapping from audio to video directly, we first factorize each target video frame into orthogonal parameter spaces, i.e., expression, geometry, and pose, via monocular 3D face reconstruction. Next, a recurrent network is introduced to translate source audio into expression parameters that are primarily related to the audio content. The audio-translated expression parameters are then used to synthesize a photo-realistic human subject in each video frame, with the movement of the mouth region precisely mapped to the source audio. The geometry and pose parameters of the target human portrait are retained, thereby preserving the context of the original video footage. Finally, we introduce a novel video rendering network and a dynamic programming method to construct a temporally coherent and photo-realistic video. Extensive experiments demonstrate the superiority of our method over existing approaches. Our method is end-to-end learnable and robust to voice variations in the source audio.

  • HLO: Half-kernel Laplacian Operator for Surface Smoothing
    arXiv.cs.GR Pub Date : 2019-05-12
    Wei Pan; Xuequan Lu; Yuanhao Gong; Wenming Tang; Jun Liu; Ying He; Guoping Qiu

    This paper presents a simple yet effective method for feature-preserving surface smoothing. Through analyzing the differential property of surfaces, we show that the conventional discrete Laplacian operator with uniform weights is not applicable at feature points, where the surface is non-differentiable and the second-order derivatives do not exist. To overcome this difficulty, we propose a Half-kernel Laplacian Operator (HLO) as an alternative to the conventional Laplacian. Given a vertex v, HLO first finds all pairs of its neighboring vertices and divides each pair into two subsets (called half windows); it then computes the uniform Laplacians of all such subsets and projects the computed Laplacians onto the full-window uniform Laplacian to alleviate flipping and degeneration. The half window with the least regularization energy is then chosen for v. We develop an iterative approach that applies HLO for surface denoising. Our method is conceptually simple and easy to use because it has a single parameter, i.e., the number of iterations for updating vertices. We show that our method preserves features better than the popular uniform-Laplacian-based denoising and significantly alleviates the shrinkage artifact. Extensive experimental results demonstrate that HLO is better than or comparable to state-of-the-art techniques both qualitatively and quantitatively, and that it is particularly good at handling meshes with high noise. We will make our source code publicly available.
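
    The half-window idea is easiest to see in one dimension. The sketch below is a 1-D polyline analogue (not the paper's mesh operator): each vertex compares the Laplacian of its left half-window against its right half-window and moves using whichever has the least regularization energy, so a vertex next to a sharp step is only averaged with points on its own side of the step.

```python
def hlo_smooth(ys, iters=10, step=0.5):
    """1-D analogue of a half-kernel Laplacian smoother on a polyline:
    choose, per vertex, the half-window (left or right neighbour) whose
    update energy is smallest, which preserves sharp features."""
    ys = list(ys)
    for _ in range(iters):
        new = list(ys)
        for i in range(1, len(ys) - 1):
            left = ys[i - 1] - ys[i]    # Laplacian using the left half-window
            right = ys[i + 1] - ys[i]   # Laplacian using the right half-window
            # Pick the half-window with the least regularization energy.
            lap = left if abs(left) < abs(right) else right
            new[i] = ys[i] + step * lap
        ys = new
    return ys

# A noisy step: a full-window uniform Laplacian would round the jump,
# while the half-kernel choice keeps it sharp.
noisy = [0.0, 0.02, -0.01, 0.01, 1.0, 0.99, 1.02, 1.0]
smoothed = hlo_smooth(noisy)
```

On a mesh, the same selection runs over half-windows of the vertex one-ring, with the extra projection step described above to avoid flips.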

  • On Demand Solid Texture Synthesis Using Deep 3D Networks
    arXiv.cs.GR Pub Date : 2020-01-13
    Jorge Gutierrez; Julien Rabin; Bruno Galerne; Thomas Hurtut

    This paper describes a novel approach for on-demand volumetric texture synthesis based on a deep learning framework that allows for the generation of high-quality 3D data at interactive rates. Based on a few example images of textures, a generative network is trained to synthesize coherent portions of solid textures of arbitrary sizes that reproduce the visual characteristics of the examples along some directions. To cope with the memory limitations and computational complexity inherent to both high-resolution and 3D processing on the GPU, only 2D textures referred to as "slices" are generated during the training stage. These synthetic textures are compared to exemplar images via a perceptual loss function based on a pre-trained deep network. The proposed network is very light (less than 100k parameters), therefore it only requires modest training time (i.e. a few hours) and is capable of very fast generation (around a second for $256^3$ voxels) on a single GPU. Integrated with a spatially seeded PRNG, the proposed generator network directly returns an RGB value given a set of 3D coordinates. The synthesized volumes have good visual results that are at least equivalent to state-of-the-art patch-based approaches. They are naturally seamlessly tileable and can be fully generated in parallel.
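
    The "spatially seeded PRNG" property - the same 3D coordinate always yields the same value, so voxels can be evaluated on demand and in parallel - can be sketched with a simple coordinate hash. This is an illustrative stand-in (the hash constants are a common folklore choice, not the paper's), with the learned generator network omitted.

```python
def hash3(x, y, z, seed=0):
    """Deterministic integer hash of 3-D lattice coordinates: the same
    (x, y, z, seed) always gives the same value, so any voxel of an
    unbounded volume can be evaluated on demand, in any order."""
    h = (x * 73856093) ^ (y * 19349663) ^ (z * 83492791) ^ (seed * 2654435761)
    h = (h ^ (h >> 13)) * 1274126177
    return (h ^ (h >> 16)) & 0xFFFFFFFF

def voxel_rgb(x, y, z, seed=0):
    """Map 3-D integer coordinates straight to an RGB triple in [0, 1]."""
    return tuple(hash3(x, y, z, seed + c) / 0xFFFFFFFF for c in range(3))

c1 = voxel_rgb(10, 20, 30)
c2 = voxel_rgb(10, 20, 30)  # identical: evaluation order does not matter
```

In the paper, such a seed field feeds the generator network, which turns it into a coherent texture rather than independent noise.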

  • Neural Human Video Rendering: Joint Learning of Dynamic Textures and Rendering-to-Video Translation
    arXiv.cs.GR Pub Date : 2020-01-14
    Lingjie Liu; Weipeng Xu; Marc Habermann; Michael Zollhoefer; Florian Bernard; Hyeongwoo Kim; Wenping Wang; Christian Theobalt

    Synthesizing realistic videos of humans using neural networks has been a popular alternative to the conventional graphics-based rendering pipeline due to its high efficiency. Existing works typically formulate this as an image-to-image translation problem in 2D screen space, which leads to artifacts such as over-smoothing, missing body parts, and temporal instability of fine-scale detail, such as pose-dependent wrinkles in the clothing. In this paper, we propose a novel human video synthesis method that addresses these limiting factors by explicitly disentangling the learning of time-coherent fine-scale details from the embedding of the human in 2D screen space. More specifically, our method relies on the combination of two convolutional neural networks (CNNs). Given the pose information, the first CNN predicts a dynamic texture map that contains time-coherent high-frequency details, and the second CNN conditions the generation of the final video on the temporally coherent output of the first CNN. We demonstrate several applications of our approach, such as human reenactment and novel view synthesis from monocular video, where we show significant improvement over the state of the art both qualitatively and quantitatively.

  • Unsupervised K-modal Styled Content Generation
    arXiv.cs.GR Pub Date : 2020-01-10
    Omry Sendik; Dani Lischinski; Daniel Cohen-Or

    The emergence of generative models based on deep neural networks has recently enabled the automatic generation of massive amounts of graphical content, both in 2D and in 3D. Generative Adversarial Networks (GANs) and style control mechanisms, such as Adaptive Instance Normalization (AdaIN), have proved particularly effective in this context, culminating in the state-of-the-art StyleGAN architecture. While such models are able to learn diverse distributions, provided a sufficiently large training set, they are not well-suited for scenarios where the distribution of the training data exhibits multi-modal behavior. In such cases, reshaping a uniform or normal distribution over the latent space into a complex multi-modal distribution in the data domain is challenging, and the quality of the generated samples may suffer as a result. Furthermore, the different modes are entangled with the other attributes of the data, and thus mode transitions cannot be well controlled via continuous style parameters. In this paper, we introduce uMM-GAN, a novel architecture designed to better model such multi-modal distributions in an unsupervised fashion. Building upon the StyleGAN architecture, our network learns multiple modes, in a completely unsupervised manner, and combines them using a set of learned weights. Quite strikingly, we show that this approach is capable of homing in on the natural modes in the training set, and effectively approximates the complex distribution as a superposition of multiple simple ones. We demonstrate that uMM-GAN copes better with multi-modal distributions, while at the same time disentangling the modes from their style, thereby providing an independent degree of control over the generated content.
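
    The "superposition of multiple simple distributions combined by learned weights" idea can be sketched without any GAN machinery. Below, a hypothetical 1-D multi-modal distribution is approximated as a softmax-weighted mixture of Gaussians; in uMM-GAN the simple components are learned generators rather than Gaussians, but the weighting mechanism is analogous.

```python
import math
import random

def sample_mixture(means, logits, n=1000, sigma=0.1, seed=0):
    """Sample from a superposition of simple (Gaussian) modes whose
    weights come from a softmax over learned logits."""
    rng = random.Random(seed)
    # Softmax turns logits into normalized mode weights.
    z = max(logits)
    ws = [math.exp(l - z) for l in logits]
    total = sum(ws)
    ws = [w / total for w in ws]
    samples = []
    for _ in range(n):
        r, k = rng.random(), 0
        while r > ws[k]:      # pick a mode with probability = its weight
            r -= ws[k]
            k += 1
        samples.append(rng.gauss(means[k], sigma))
    return samples

# Two well-separated modes with equal learned weights.
xs = sample_mixture(means=[-2.0, 2.0], logits=[0.0, 0.0])
```

Because the mode index is explicit, mode identity stays disentangled from the per-mode (style) variation - the property the paper exploits for independent control.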

  • Efficient 3D Reconstruction and Streaming for Group-Scale Multi-Client Live Telepresence
    arXiv.cs.GR Pub Date : 2019-08-08
    Patrick Stotko; Stefan Krumpen; Michael Weinmann; Reinhard Klein

    Sharing live telepresence experiences for teleconferencing or remote collaboration has received increasing interest with the recent progress in capturing and AR/VR technology. Whereas impressive telepresence systems have been built on top of on-the-fly scene capture, data transmission, and visualization, these systems are restricted to immersing a single user or a small number of users in the respective scenarios. In this paper, we direct our attention to immersing significantly larger groups of people in live-captured scenes, as required in education, entertainment, or collaboration scenarios. For this purpose, rather than abandoning previous approaches, we present a range of optimizations of the involved reconstruction and streaming components that allow the immersion of a group of more than 24 users within the same scene - about a factor of 6 more than in previous work - without introducing further latency or changing the involved consumer hardware setup. We demonstrate that our optimized system is capable of generating high-quality scene reconstructions as well as providing an immersive viewing experience to a large group of people within these live-captured scenes.

  • OO-VR: NUMA Friendly Object-Oriented VR Rendering Framework For Future NUMA-Based Multi-GPU Systems
    arXiv.cs.GR Pub Date : 2020-01-08
    Chenhao Xie; Xin Fu; Mingsong Chen; Shuaiwen Leon Song

    With its strong computation capability, a NUMA-based multi-GPU system is a promising candidate for providing sustainable and scalable performance for Virtual Reality (VR). However, when the entire multi-GPU system is viewed as a single GPU, the data locality in VR rendering is ignored during workload distribution, leading to a tremendous number of remote memory accesses among GPU modules (GPMs). By conducting comprehensive characterizations of different kinds of parallel rendering frameworks, we observe that distributing each rendering object along with its required data to a GPM can reduce inter-GPM memory accesses. However, this object-level rendering still faces two major challenges in a NUMA-based multi-GPU system: (1) the large data locality between the left and right views of the same object and the data sharing among different objects remain unexploited, and (2) workloads become unbalanced under software-level distribution and composition mechanisms. To tackle these challenges, we propose an object-oriented VR rendering framework (OO-VR) that conducts software and hardware co-optimization to provide a NUMA-friendly solution for VR multi-view rendering in NUMA-based multi-GPU systems. We first propose an object-oriented VR programming model to exploit the data sharing between the two views of the same object and group objects into batches based on their texture sharing levels. Then, we design an object-aware runtime batch distribution engine and a distributed hardware composition unit to achieve balanced workloads among GPMs. Finally, evaluations on our VR-featured simulator show that OO-VR provides a 1.58x overall performance improvement and a 76% inter-GPM memory traffic reduction over state-of-the-art multi-GPU systems. In addition, OO-VR provides NUMA-friendly performance scalability for future larger multi-GPU scenarios with ever-increasing asymmetric bandwidth between local and remote memory.
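
    The load-balancing half of this design can be illustrated with a classic greedy scheduler. The sketch below is not OO-VR's engine; it is a longest-processing-time (LPT) assignment of object batches (with hypothetical per-batch rendering costs) to GPMs, which is the standard baseline such a runtime distribution engine improves upon.

```python
def distribute_batches(batch_costs, num_gpms):
    """Greedy LPT assignment: sort object batches by estimated rendering
    cost, then always hand the next batch to the least-loaded GPM."""
    loads = [0.0] * num_gpms
    assignment = [[] for _ in range(num_gpms)]
    for batch, cost in sorted(enumerate(batch_costs), key=lambda bc: -bc[1]):
        g = min(range(num_gpms), key=lambda i: loads[i])  # least-loaded GPM
        loads[g] += cost
        assignment[g].append(batch)
    return assignment, loads

# Six object batches with made-up costs, distributed across two GPMs.
assignment, loads = distribute_batches([7, 5, 4, 3, 3, 2], 2)
```

OO-VR additionally groups objects by texture sharing before distribution, so that batches placed on the same GPM also share data - something a cost-only scheduler like this one ignores.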

  • Deep Learning for Free-Hand Sketch: A Survey
    arXiv.cs.GR Pub Date : 2020-01-08
    Peng Xu

    Free-hand sketches are highly hieroglyphic and illustrative, and have been widely used by humans to depict objects or stories from ancient times to the present. The recent prevalence of touchscreen devices has made sketch creation much easier than ever and has consequently made sketch-oriented applications increasingly popular. The prosperity of deep learning has also immensely promoted research on free-hand sketch. This paper presents a comprehensive survey of deep learning techniques oriented at free-hand sketch. The main contents of this survey include: (i) The intrinsic traits and domain-unique challenges of free-hand sketch are discussed, to clarify the essential differences between free-hand sketch and other data modalities, e.g., natural photos. (ii) The development of the free-hand sketch community in the deep learning era is reviewed by surveying existing datasets, research topics, and state-of-the-art methods via a detailed taxonomy. (iii) Moreover, the bottlenecks, open problems, and potential research directions of this community are discussed to promote future work.

  • Digesting the Elephant -- Experiences with Interactive Production Quality Path Tracing of the Moana Island Scene
    arXiv.cs.GR Pub Date : 2020-01-08
    Ingo Wald; Bruce Cherniak; Will Usher; Carson Brownlee; Attila Afra; Johannes Guenther; Jefferson Amstutz; Tim Rowley; Valerio Pascucci; Chris R Johnson; Jim Jeffers

    New algorithmic and hardware developments over the past two decades have enabled interactive ray tracing of small to modestly sized scenes, and the technique is finding growing popularity in scientific visualization and games. However, interactive ray tracing has not been as widely explored in the context of production film rendering, where the complexity of the models and, from a practical standpoint, their unavailability to the wider research community have posed significant challenges. The recent release of the Disney Moana Island Scene has made one such model available to the community for experimentation. In this paper, we detail the challenges this scene poses to an interactive ray tracer, and the solutions we have employed and developed to enable interactive path tracing of the scene with full geometric and shading detail, with the goal of providing insight and guidance to other researchers.

  • Interactive Visualisation of Hierarchical Quantitative Data: An Evaluation
    arXiv.cs.GR Pub Date : 2019-08-04
    Linda Woodburn; Yalong Yang; Kim Marriott

    We compared three common visualisations for hierarchical quantitative data - treemaps, icicle plots, and sunburst charts - as well as a semicircular variant of sunburst charts we call the sundown chart. In a pilot study, we found that the sunburst chart was least preferred. In a controlled study with 12 participants, we compared treemaps, icicle plots, and sundown charts. The treemap was the least preferred and showed slower performance on a basic navigation task and slower performance and lower accuracy in hierarchy-understanding tasks. The icicle plot and sundown chart had similar performance, with a slight user preference for the icicle plot.

  • MW-GAN: Multi-Warping GAN for Caricature Generation with Multi-Style Geometric Exaggeration
    arXiv.cs.GR Pub Date : 2020-01-07
    Haodi Hou; Jing Huo; Jing Wu; Yu-Kun Lai; Yang Gao

    Given an input face photo, the goal of caricature generation is to produce stylized, exaggerated caricatures that share the same identity as the photo. It requires simultaneous style transfer and shape exaggeration with rich diversity, while preserving the identity of the input. To address this challenging problem, we propose a novel framework called Multi-Warping GAN (MW-GAN), comprising a style network and a geometric network designed to conduct style transfer and geometric exaggeration, respectively. We bridge the gap between the style and the landmarks of an image with corresponding latent code spaces through a dual-way design, so as to generate caricatures with arbitrary styles and geometric exaggeration, which can be specified either through random sampling of latent codes or from a given caricature sample. Besides, we apply an identity-preserving loss to both the image space and the landmark space, leading to a great improvement in the quality of the generated caricatures. Experiments show that caricatures generated by MW-GAN have better quality than those of existing methods.

  • CvxNets: Learnable Convex Decomposition
    arXiv.cs.GR Pub Date : 2019-09-12
    Boyang Deng; Kyle Genova; Soroosh Yazdani; Sofien Bouaziz; Geoffrey Hinton; Andrea Tagliasacchi

    Any solid object can be decomposed into a collection of convex polytopes (in short, convexes). When a small number of convexes are used, such a decomposition can be thought of as a piecewise approximation of the geometry. This decomposition is fundamental to real-time physics simulation in computer graphics, where it creates a unified representation of dynamic geometry for collision detection. A convex object also has the property of being simultaneously an explicit and implicit representation: one can interpret it explicitly as a mesh derived by computing the vertices of a convex hull, or implicitly as the collection of half-space constraints or support functions. Their implicit representation makes them particularly well suited for neural network training, as they abstract away from the topology of the geometry they need to represent. We introduce a network architecture to represent a low-dimensional family of convexes. This family is automatically derived via an auto-encoding process. We investigate the applications of this architecture including automatic convex decomposition, image to 3D reconstruction, and part-based shape retrieval.
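
    The half-space view of a convex is easy to make concrete. A point is inside the convex iff every constraint $n_i \cdot x - d_i \le 0$ holds, i.e. iff $\max_i (n_i \cdot x - d_i) \le 0$; replacing the hard max with a LogSumExp gives a smooth, differentiable indicator of the kind suited to network training. A minimal sketch (the `sharpness` constant is an assumption, not a value from the paper):

```python
import math

def convex_indicator(x, halfspaces, sharpness=20.0):
    """Smooth implicit function of a convex given half-space constraints
    (n, d) meaning n.x - d <= 0. Negative inside, positive outside."""
    vals = [sum(ni * xi for ni, xi in zip(n, x)) - d for n, d in halfspaces]
    # LogSumExp = smooth, differentiable approximation of max(vals).
    m = max(vals)
    return m + math.log(sum(math.exp(sharpness * (v - m)) for v in vals)) / sharpness

# The unit square as the intersection of four half-spaces.
square = [((1, 0), 1.0), ((-1, 0), 1.0), ((0, 1), 1.0), ((0, -1), 1.0)]
inside = convex_indicator((0.0, 0.0), square)
outside = convex_indicator((2.0, 0.0), square)
```

A network then only has to predict the parameters $(n_i, d_i)$ of each convex; the same constraints read back out as an explicit mesh via the convex hull of the constraint intersection.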

  • Inverse Rendering Techniques for Physically Grounded Image Editing
    arXiv.cs.GR Pub Date : 2019-12-25
    Kevin Karsch

    From a single picture of a scene, people can typically grasp the spatial layout immediately and even make good guesses about material properties and where light is coming from to illuminate the scene. For example, we can reliably tell which objects occlude others, what an object is made of and its rough shape, which regions are illuminated or in shadow, and so on. It is interesting how little is known about our ability to make these determinations; as such, we are still not able to robustly "teach" computers to make the same high-level observations as people. This document presents algorithms for understanding intrinsic scene properties from single images. The goal of these inverse rendering techniques is to estimate the configurations of scene elements (geometry, materials, luminaires, camera parameters, etc.) using only the information visible in an image. Such algorithms have applications in robotics and computer graphics. One such application is physically grounded image editing: photo editing made easier by leveraging knowledge of the physical space. These applications allow sophisticated editing operations to be performed in a matter of seconds, enabling the seamless addition, removal, or relocation of objects in images.

  • Painting Many Pasts: Synthesizing Time Lapse Videos of Paintings
    arXiv.cs.GR Pub Date : 2020-01-04
    Amy Zhao; Guha Balakrishnan; Kathleen M. Lewis; Frédo Durand; John V. Guttag; Adrian V. Dalca

    We introduce a new video synthesis task: synthesizing time lapse videos depicting how a given painting might have been created. Artists paint using unique combinations of brushes, strokes, colors, and layers. There are often many possible ways to create a given painting. Our goal is to learn to capture this rich range of possibilities. Creating distributions of long-term videos is a challenge for learning-based video synthesis methods. We present a probabilistic model that, given a single image of a completed painting, recurrently synthesizes steps of the painting process. We implement this model as a convolutional neural network, and introduce a training scheme to facilitate learning from a limited dataset of painting time lapses. We demonstrate that this model can be used to sample many time steps, enabling long-term stochastic video synthesis. We evaluate our method on digital and watercolor paintings collected from video websites, and show that human raters find our synthesized videos to be similar to time lapses produced by real artists.

  • TCM-ICP: Transformation Compatibility Measure for Registering Multiple LIDAR Scans
    arXiv.cs.GR Pub Date : 2020-01-04
    Aby Thomas; Adarsh Sunilkumar; Shankar Shylesh; Aby Abahai T.; Subhasree Methirumangalath; Dong Chen; Jiju Peethambaran

    Rigid registration of multi-view and multi-platform LiDAR scans is a fundamental problem in 3D mapping, robotic navigation, and large-scale urban modeling applications. Data acquisition with LiDAR sensors involves scanning multiple areas from different points of view, thus generating partially overlapping point clouds of real-world scenes. Traditionally, the ICP (Iterative Closest Point) algorithm is used to register the acquired point clouds into a single point cloud that captures the scanned scene. Conventional ICP suffers from local minima and often needs a coarse initial alignment to converge to the optimum. In this work, we present an algorithm for registering multiple, overlapping LiDAR scans. We introduce a geometric metric called the Transformation Compatibility Measure (TCM), which aids in choosing the most similar point clouds for registration in each iteration of the algorithm. The LiDAR scan most similar to the reference scan is then transformed using the simplex technique. An optimization of the transformation using gradient descent and simulated annealing is then applied to improve the resulting registration. We evaluate the proposed algorithm on four different real-world scenes, and experimental results show that its registration performance is comparable or superior to traditionally used registration methods. Further, the algorithm achieves superior registration results even when dealing with outliers.
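
    For readers unfamiliar with the baseline, classic ICP alternates two steps: pair each source point with its nearest target point, then recover the best-fit rigid motion in closed form and apply it. The sketch below is that baseline in 2-D (with a closed-form Kabsch-style rotation fit), not the paper's TCM method.

```python
import math

def icp_2d(source, target, iters=20):
    """Classic 2-D ICP: nearest-neighbour correspondences followed by a
    closed-form best-fit rotation + translation, repeated."""
    src = [list(p) for p in source]
    for _ in range(iters):
        # 1. Nearest-neighbour correspondences (brute force).
        pairs = [(p, min(target, key=lambda q: (q[0]-p[0])**2 + (q[1]-p[1])**2))
                 for p in src]
        # 2. Closed-form best rigid transform between the paired sets.
        m = len(pairs)
        mx = sum(p[0] for p, _ in pairs) / m; my = sum(p[1] for p, _ in pairs) / m
        nx = sum(q[0] for _, q in pairs) / m; ny = sum(q[1] for _, q in pairs) / m
        a = sum((p[0]-mx)*(q[1]-ny) - (p[1]-my)*(q[0]-nx) for p, q in pairs)
        b = sum((p[0]-mx)*(q[0]-nx) + (p[1]-my)*(q[1]-ny) for p, q in pairs)
        th = math.atan2(a, b)  # optimal rotation angle
        c, s = math.cos(th), math.sin(th)
        src = [[c*(p[0]-mx) - s*(p[1]-my) + nx, s*(p[0]-mx) + c*(p[1]-my) + ny]
               for p in src]
    return src

# Target: three points; source: the same points rotated 30 degrees and shifted.
target = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
th = math.radians(30)
source = [(math.cos(th)*x - math.sin(th)*y + 0.3,
           math.sin(th)*x + math.cos(th)*y + 0.2) for x, y in target]
aligned = icp_2d(source, target)
```

The local-minima issue mentioned above shows up when the initial correspondences are wrong; the paper's TCM metric addresses the related question of which scan pairs are worth registering at each iteration.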

  • Can NetGAN be improved on short random walks?
    arXiv.cs.GR Pub Date : 2019-05-13
    Amir Jalilifard; Vinicius Caridá; Alex Mansano; Rogers Cristo

    Graphs are useful structures that can model several important real-world problems. Recently, learning on graphs has drawn considerable attention, leading to the proposal of new methods for learning these data structures. One of these studies produced NetGAN, a new approach for generating graphs via random walks. Although NetGAN has shown promising results in terms of accuracy on graph generation and link prediction tasks, the choice of the vertices from which it starts random walks can lead to inconsistent and highly variable results, especially when the walks are short. As an alternative to random starting points, this study establishes a new method for initializing random walks from a set of dense vertices. We propose estimating the importance of a node based on the inverse of its influence over all vertices of its neighborhood, measured through random walks of different lengths. The proposed method achieves significantly better accuracy, lower variance, and fewer outliers.
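
    The intuition - short walks started in dense regions cover more of the graph than walks started at peripheral vertices - can be checked with a few lines of code. This is an illustrative scoring of starting vertices on a toy graph, not the paper's exact importance estimator.

```python
import random

def walk(adj, start, length, rng):
    """One uniform random walk on an adjacency-list graph."""
    v, visited = start, {start}
    for _ in range(length):
        v = rng.choice(adj[v])
        visited.add(v)
    return visited

def coverage_scores(adj, walks=200, length=4, seed=0):
    """Score each vertex by the average number of distinct vertices that
    short random walks started there manage to reach."""
    rng = random.Random(seed)
    return {v: sum(len(walk(adj, v, length, rng)) for _ in range(walks)) / walks
            for v in adj}

# A dense 4-clique {0,1,2,3} with a pendant path 3-4-5.
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3],
       3: [0, 1, 2, 4], 4: [3, 5], 5: [4]}
scores = coverage_scores(adj)  # clique vertices score higher than vertex 5
```

Walks seeded at the clique vertices reach more distinct neighbours per step, which is exactly why dense starting vertices stabilize short-walk training.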

  • Lightform: Procedural Effects for Projected AR
    arXiv.cs.GR Pub Date : 2019-12-25
    Brittany Factura; Laura LaPerche; Phil Reyneri; Brett Jones; Kevin Karsch

    Projected augmented reality, also called projection mapping or video mapping, is a form of augmented reality that uses projected light to directly augment 3D surfaces, as opposed to using pass-through screens or headsets. The value of projected AR is its ability to add a layer of digital content directly onto physical objects or environments in a way that can be instantaneously viewed by multiple people, unencumbered by a screen or additional setup. Because projected AR typically involves projecting onto non-flat, textured objects (especially those that are conventionally not used as projection surfaces), the digital content needs to be mapped and aligned to precisely fit the physical scene to ensure a compelling experience. Current projected AR techniques require extensive calibration at the time of installation, which is not conducive to iteration or change, whether intentional (the scene is reconfigured) or not (the projector is bumped or settles). The workflows are undefined and fragmented, making projected AR confusing and difficult for many to approach. For example, a digital artist may have the software expertise to create AR content but could not complete an installation without experience in mounting, blending, and realigning projector(s); the converse is true for many A/V installation teams and professionals. Projection mapping has therefore been limited to high-end event productions, concerts, and films, because it requires expensive, complex tools and skilled teams ($100K+ budgets). Lightform provides a technology that makes projected AR approachable, practical, intelligent, and robust through integrated hardware and computer-vision software. Lightform unites a currently fragmented workflow into a single cohesive process that gives users an approachable and robust way to create and control projected AR experiences.

  • Animals in Virtual Environments
    arXiv.cs.GR Pub Date : 2019-12-30
    Hemal Naik; Renaud Bastien; Nassir Navab; Iain Couzin

    The core idea of an XR (VR/MR/AR) application is to digitally stimulate one or more sensory systems (e.g. visual, auditory, olfactory) of the human user in an interactive way to achieve an immersive experience. Since the early 2000s, biologists have been using Virtual Environments (VE) to investigate the mechanisms of behavior in non-human animals, including insects, fish, and mammals. VEs have become reliable tools for studying vision, cognition, and sensory-motor control in animals. In turn, the knowledge gained from studying such behaviors can be harnessed by researchers designing biologically inspired robots, smart sensors, and multi-agent artificial intelligence. VEs for animals are becoming a widely used application of XR technology, but such applications have not previously been reported in the technical literature related to XR. Biologists and computer scientists can benefit greatly from deepening interdisciplinary research in this emerging field, and together we can develop new methods for conducting fundamental research in behavioral sciences and engineering. To support our argument, we present this review, which provides an overview of animal behavior experiments conducted in virtual environments.

Contents have been reproduced by permission of the publishers.