Current journal: arXiv - CS - Mathematical Software
  • A compute-bound formulation of Galerkin model reduction for linear time-invariant dynamical systems
    arXiv.cs.MS Pub Date : 2020-09-24
    Francesco Rizzi; Eric J. Parish; Patrick J. Blonigan; John Tencer

    This work aims to advance computational methods for projection-based reduced order models (ROMs) of linear time-invariant (LTI) dynamical systems. For such systems, current practice relies on ROM formulations expressing the state as a rank-1 tensor (i.e., a vector), leading to computational kernels that are memory bandwidth bound and, therefore, ill-suited for scalable performance on modern many-core

  • Portable high-order finite element kernels I: Streaming Operations
    arXiv.cs.MS Pub Date : 2020-09-23
    Noel Chalmers; Tim Warburton

    This paper is devoted to the development of highly efficient kernels performing vector operations relevant in linear system solvers. In particular, we focus on the low arithmetic intensity operations (i.e., streaming operations) performed within the conjugate gradient iterative method, using the parameters specified in the CEED benchmark problems for high-order hexahedral finite elements. We propose

  • QR and LQ Decomposition Matrix Backpropagation Algorithms for Square, Wide, and Deep Matrices and Their Software Implementation
    arXiv.cs.MS Pub Date : 2020-09-19
    Denisa A. O. Roberts; Lucas R. Roberts

    This article presents matrix backpropagation algorithms for the QR decomposition of matrices $A_{m,n}$ that are either square ($m = n$), wide ($m < n$), or deep ($m > n$), with rank $k = \min(m, n)$. Furthermore, we derive a novel matrix backpropagation result for the LQ decomposition for deep input matrices. Differentiable QR decomposition offers a numerically stable, computationally efficient method to

  • HDGlab: An open-source implementation of the hybridisable discontinuous Galerkin method in MATLAB
    arXiv.cs.MS Pub Date : 2020-09-16
    Matteo Giacomini; Ruben Sevilla; Antonio Huerta

    This paper presents HDGlab, an open source MATLAB implementation of the hybridisable discontinuous Galerkin (HDG) method. The main goal is to provide a detailed description of both the HDG method for elliptic problems and its implementation available in HDGlab. Ultimately, this is expected to make this relatively new advanced discretisation method more accessible to the computational engineering community

  • Accelerating Domain Propagation: an Efficient GPU-Parallel Algorithm over Sparse Matrices
    arXiv.cs.MS Pub Date : 2020-09-16
    Boro Sofranac; Ambros Gleixner; Sebastian Pokutta

    Fast domain propagation of linear constraints has become a crucial component of today's best algorithms and solvers for mixed integer programming and pseudo-boolean optimization to achieve peak solving performance. Irregularities in the form of dynamic algorithmic behaviour, dependency structures, and sparsity patterns in the input data make efficient implementations of domain propagation on GPUs and

  • m-arcsinh: An Efficient and Reliable Function for SVM and MLP in scikit-learn
    arXiv.cs.MS Pub Date : 2020-09-16
    Luca Parisi

    This paper describes the 'm-arcsinh', a modified ('m-') version of the inverse hyperbolic sine function ('arcsinh'). Kernel and activation functions enable Machine Learning (ML)-based algorithms, such as Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP), to learn from data in a supervised manner. m-arcsinh, implemented in the open source Python library 'scikit-learn', is hereby presented

  • Dune-CurvedSurfaceGrid -- A Dune module for surface parametrization
    arXiv.cs.MS Pub Date : 2020-09-10
    Simon Praetorius; Florian Stenger

    In this paper we introduce and describe an implementation of curved surface geometries within the Dune framework for grid-based discretizations. Therefore, we employ the abstraction of geometries as local-functions bound to a grid element, and the abstraction of a grid as connectivity of elements together with a grid-function that can be localized to the elements to provide element local parametrizations

  • Performance Analysis of FEM Solvers on Practical Electromagnetic Problems
    arXiv.cs.MS Pub Date : 2020-09-04
    Gergely Máté Kiss; Jan Kaska; Roberto André Henrique de Oliveira; Olena Rubanenko; Balázs Tóth

    The paper presents a comparative analysis of different commercial and academic software. The comparison aims to examine how the integrated adaptive grid refinement methodologies can deal with challenging, electromagnetic-field related problems. For this comparison, two benchmark problems were examined in the paper. The first example is a solution of an L-shape domain like test problem, which has a

  • distr6: R6 Object-Oriented Probability Distributions Interface in R
    arXiv.cs.MS Pub Date : 2020-09-07
    Raphael Sonabend; Franz Kiraly

    distr6 is an object-oriented (OO) probability distributions interface leveraging the extensibility and scalability of R6, and the speed and efficiency of Rcpp. Over 50 probability distributions are currently implemented in the package with 'core' methods including density, distribution, and generating functions, and more 'exotic' ones including hazards and distribution function anti-derivatives. In

  • Introduction to Medical Image Registration with DeepReg, Between Old and New
    arXiv.cs.MS Pub Date : 2020-08-29
    N. Montana Brown; Y. Fu; S. U. Saeed; A. Casamitjana; Z. M. C. Baum; R. Delaunay; Q. Yang; A. Grimwood; Z. Min; E. Bonmati; T. Vercauteren; M. J. Clarkson; Y. Hu

    This document outlines a tutorial to get started with medical image registration using the open-source package DeepReg. The basic concepts of medical image registration are discussed, linking classical methods to newer methods using deep learning. Two iterative, classical algorithms using optimisation and one learning-based algorithm using deep learning are coded step-by-step using DeepReg utilities

  • A Survey of Singular Value Decomposition Methods for Distributed Tall/Skinny Data
    arXiv.cs.MS Pub Date : 2020-09-02
    Drew Schmidt

    The Singular Value Decomposition (SVD) is one of the most important matrix factorizations, enjoying a wide variety of applications across numerous domains. In statistics and data analysis, common applications of the SVD include Principal Components Analysis (PCA) and linear regression. Usually these applications arise on data that has far more rows than columns, so-called "tall/skinny"

  • TriCG and TriMR: Two Iterative Methods for Symmetric Quasi-Definite Systems
    arXiv.cs.MS Pub Date : 2020-08-28
    Alexis Montoison; Dominique Orban

    We introduce iterative methods named TriCG and TriMR for solving symmetric quasi-definite systems based on the orthogonal tridiagonalization process proposed by Saunders, Simon and Yip in 1988. TriCG and TriMR are tantamount to preconditioned block-CG and block-MINRES with two right-hand sides in which the two approximate solutions are summed at each iteration, but require less storage and work per

  • GPU-accelerating ImageJ Macro image processing workflows using CLIJ
    arXiv.cs.MS Pub Date : 2020-08-26
    Daniela Vorkel; Robert Haase

    This chapter introduces GPU-accelerated image processing in ImageJ/FIJI. The reader is expected to have some pre-existing knowledge of ImageJ Macro programming. Core concepts such as variables, for-loops, and functions are essential. The chapter provides basic guidelines for improved performance in typical image processing workflows. We present in a step-by-step tutorial how to translate a pre-existing

  • BSF-skeleton: user manual
    arXiv.cs.MS Pub Date : 2020-08-22
    Leonid B. Sokolinsky

    The BSF-skeleton is designed for creating parallel programs in C++ using the MPI library. The scope of the BSF-skeleton is cluster computing systems and iterative numerical algorithms of high computational complexity. The BSF-skeleton completely encapsulates all aspects that are associated with parallelizing a program on a cluster computing system. The source code of the BSF-skeleton is freely available

  • Transforming Probabilistic Programs for Model Checking
    arXiv.cs.MS Pub Date : 2020-08-21
    Ryan Bernstein; Matthijs Vákár; Jeannette Wing

    Probabilistic programming is perfectly suited to reliable and transparent data science, as it allows the user to specify their models in a high-level language without worrying about the complexities of how to fit the models. Static analysis of probabilistic programs presents even further opportunities for enabling a high-level style of programming, by automating time-consuming and error-prone tasks

  • Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse Linear Algebra Computations
    arXiv.cs.MS Pub Date : 2020-08-19
    Yuhsiang Mike Tsai; Terry Cojean; Hartwig Anzt

    GPU accelerators have become an important backbone for scientific high performance computing, and the performance advances obtained from adopting new GPU hardware are significant. In this paper we take a first look at NVIDIA's newest server line GPU, the A100 architecture part of the Ampere generation. Specifically, we assess its performance for sparse linear algebra operations that form the backbone

  • Elmer FEM-Dakota: A unified open-source computational framework for electromagnetics and data analytics
    arXiv.cs.MS Pub Date : 2020-08-16
    Anjali Sandip

    Open-source electromagnetic design software, Elmer FEM, was interfaced with data analytics toolkit, Dakota. Furthermore, the coupled software was validated against a benchmark test. The interface developed provides a unified open-source computational framework for electromagnetics and data analytics. Its key features include uncertainty quantification, surrogate modelling and parameter studies. This

  • PyMGRIT: A Python Package for the parallel-in-time method MGRIT
    arXiv.cs.MS Pub Date : 2020-08-12
    Jens Hahne; Stephanie Friedhoff; Matthias Bolten

    In this paper, we introduce the Python framework PyMGRIT, which implements the multigrid-reduction-in-time (MGRIT) algorithm for solving the (non-)linear systems arising from the discretization of time-dependent problems. The MGRIT algorithm is a reduction-based iterative method that allows parallel-in-time simulations, i.e., calculating multiple time steps simultaneously in a simulation, by using

  • Randomized Projection for Rank-Revealing Matrix Factorizations and Low-Rank Approximations
    arXiv.cs.MS Pub Date : 2020-08-10
    Jed A. Duersch; Ming Gu

    Rank-revealing matrix decompositions provide an essential tool in spectral analysis of matrices, including the Singular Value Decomposition (SVD) and related low-rank approximation techniques. QR with Column Pivoting (QRCP) is usually suitable for these purposes, but it can be much slower than the unpivoted QR algorithm. For large matrices, the difference in performance is due to increased communication

  • EagerPy: Writing Code That Works Natively with PyTorch, TensorFlow, JAX, and NumPy
    arXiv.cs.MS Pub Date : 2020-08-10
    Jonas Rauber; Matthias Bethge; Wieland Brendel

    EagerPy is a Python framework that lets you write code that automatically works natively with PyTorch, TensorFlow, JAX, and NumPy. Library developers no longer need to choose between supporting just one of these frameworks or reimplementing the library for each framework and dealing with code duplication. Users of such libraries can more easily switch frameworks without being locked in by a specific

  • A parallel structured divide-and-conquer algorithm for symmetric tridiagonal eigenvalue problems
    arXiv.cs.MS Pub Date : 2020-08-05
    Xia Liao; Shengguo Li; Yutong Lu; Jose E. Roman

    In this paper, a parallel structured divide-and-conquer (PSDC) eigensolver is proposed for symmetric tridiagonal matrices based on ScaLAPACK and a parallel structured matrix multiplication algorithm, called PSMMA. Computing the eigenvectors via matrix-matrix multiplications is the most computationally expensive part of the divide-and-conquer algorithm, and one of the matrices involved in such multiplications

  • Improved Time Warp Edit Distance -- A Parallel Dynamic Program in Linear Memory
    arXiv.cs.MS Pub Date : 2020-07-31
    Garrett Wright

    Edit Distance is a classic family of dynamic programming problems, among which Time Warp Edit Distance refines the problem with the notion of a metric and temporal elasticity. A novel Improved Time Warp Edit Distance algorithm that is both massively parallelizable and requiring only linear storage is presented. This method uses the procession of a three diagonal band to cover the original dynamic program
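
    As a point of reference for the "linear storage" claim, here is a minimal sketch of the classic Levenshtein edit distance computed with two rolling rows instead of the full dynamic-programming table. It is not the Time Warp Edit Distance and not the three-diagonal band scheme the abstract describes; the function name `edit_distance` is hypothetical and chosen only for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b using O(min(len)) memory."""
    # Keep the shorter sequence along the row so each row stays small.
    if len(b) > len(a):
        a, b = b, a
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, item_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if item_a == item_b else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr  # only the previous row is ever needed
    return prev[-1]


distance = edit_distance("kitten", "sitting")
```

    Only the previous row is retained, so memory grows with the shorter input rather than with the product of the two lengths.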

  • A new framework for the computation of Hessians
    arXiv.cs.MS Pub Date : 2020-07-29
    Robert M. Gower; Margarida P. Mello

    We investigate the computation of Hessian matrices via Automatic Differentiation, using a graph model and an algebraic model. The graph model reveals the inherent symmetries involved in calculating the Hessian. The algebraic model, based on Griewank and Walther's state transformations, synthesizes the calculation of the Hessian as a formula. These dual points of view, graphical and algebraic, lead

  • The ITensor Software Library for Tensor Network Calculations
    arXiv.cs.MS Pub Date : 2020-07-28
    Matthew Fishman; Steven R. White; E. Miles Stoudenmire

    ITensor is a system for programming tensor network calculations with an interface modeled on tensor diagram notation, which allows users to focus on the connectivity of a tensor network without manually bookkeeping tensor indices. The ITensor interface rules out common programming errors and enables rapid prototyping of tensor network algorithms. After discussing the philosophy behind the ITensor approach

  • multivar_horner: a python package for computing Horner factorisations of multivariate polynomials
    arXiv.cs.MS Pub Date : 2020-07-26
    Jannik Michelfeit (Technische Universität Dresden)

    Many applications in the sciences require numerically stable and computationally efficient evaluation of multivariate polynomials. Finding beneficial representations of polynomials, such as Horner factorisations, is therefore crucial. multivar_horner, the python package presented here, is the first open source software for computing multivariate Horner factorisations. This work briefly outlines the
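
    To make the idea of a Horner representation concrete, the following is a minimal sketch of Horner's scheme for a univariate polynomial. It does not use or reproduce the multivar_horner API, and it does not cover the multivariate case the package addresses; the function name `horner` is an assumption made for this illustration.

```python
def horner(coeffs, x):
    """Evaluate a polynomial at x with Horner's scheme.

    coeffs are ordered from the highest-degree term down to the constant.
    """
    result = 0
    for c in coeffs:
        # One multiplication and one addition per coefficient.
        result = result * x + c
    return result


# p(x) = 2x^3 - 6x^2 + 2x - 1 is evaluated as ((2x - 6)x + 2)x - 1.
value = horner([2, -6, 2, -1], 3)
```

    The nested form needs one multiplication per coefficient and avoids computing explicit powers of x.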

  • Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM
    arXiv.cs.MS Pub Date : 2020-07-26
    Zijing Gu

    We implemented and optimized matrix multiplications between dense and block-sparse matrices on CUDA. We leveraged TVM, a deep learning compiler, to explore the schedule space of the operation and generate efficient CUDA code. With the automatic parameter tuning in TVM, our cross-thread reduction based implementation achieved competitive or better performance compared with other state-of-the-art frameworks

  • An Adaptive Solver for Systems of Linear Equations
    arXiv.cs.MS Pub Date : 2020-07-22
    Conrad Sanderson; Ryan Curtin

    Computational implementations for solving systems of linear equations often rely on a one-size-fits-all approach based on LU decomposition of dense matrices stored in column-major format. Such solvers are typically implemented with the aid of the xGESV set of functions available in the low-level LAPACK software, with the aim of reducing development time by taking advantage of well-tested routines.

  • Tegula -- exploring a galaxy of two-dimensional periodic tilings
    arXiv.cs.MS Pub Date : 2020-07-21
    Rüdiger Zeller; Olaf Delgado Friedrichs; Daniel H. Huson

    Periodic tilings play a role in the decorative arts, in construction and in crystal structures. Combinatorial tiling theory allows the systematic generation, visualization and exploration of such tilings of the plane, sphere and hyperbolic plane, using advanced algorithms and software. Here we present a "galaxy" of tilings that consists of the set of all 2.4 billion different types of periodic tilings

  • Approaches to the implementation of generalized complex numbers in the Julia language
    arXiv.cs.MS Pub Date : 2020-07-19
    Migran N. Gevorkyan; Anna V. Korolkova; Dmitry S. Kulyabov

    In problems of mathematical physics, to study the structures of spaces using the Cayley-Klein models in theoretical calculations, the use of generalized complex numbers is required. In the case of computational experiments, such tasks require their high-quality implementation in a programming language. The proposed small implementation of generalized complex numbers in modern programming languages

  • Languages for modeling the RED active queue management algorithms: Modelica vs. Julia
    arXiv.cs.MS Pub Date : 2020-07-18
    Anna Maria Yu. Apreutesey; Anna V. Korolkova; Dmitry S. Kulyabov

    This work is devoted to the study of the capabilities of the Modelica and Julia programming languages for the implementation of a continuously discrete paradigm in modeling hybrid systems that contain both continuous and discrete aspects of behavior. A system consisting of an incoming stream that is processed according to the Transmission Control Protocol (TCP) and a router that processes traffic using

  • Accelerating Geometric Multigrid Preconditioning with Half-Precision Arithmetic on GPUs
    arXiv.cs.MS Pub Date : 2020-07-15
    Kyaw L. Oo; Andreas Vogel

    With the hardware support for half-precision arithmetic on NVIDIA V100 GPUs, high-performance computing applications can benefit from lower precision at appropriate spots to speed up the overall execution time. In this paper, we investigate a mixed-precision geometric multigrid method to solve large sparse systems of equations stemming from discretization of elliptic PDEs. While the final solution

  • Meta-analysis parameters computation: a Python approach to facilitate the crossing of experimental conditions
    arXiv.cs.MS Pub Date : 2020-07-13
    Flavien Quijoux; Charles Truong; Aliénor Vienne-Jumeau; Laurent Oudre; François Bertin-Hugault; Philippe Zawieja; Marie Lefevre; Pierre-Paul Vidal; Damien Ricard

    Meta-analysis is a data aggregation method that establishes an overall and objective level of evidence based on the results of several studies. It is necessary to maintain a high level of homogeneity in the aggregation of data collected from a systematic literature review. However, the current tools do not allow a cross-referencing of the experimental conditions that could explain the heterogeneity

  • A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic
    arXiv.cs.MS Pub Date : 2020-07-13
    Ahmad Abdelfattah; Hartwig Anzt; Erik G. Boman; Erin Carson; Terry Cojean; Jack Dongarra; Mark Gates; Thomas Grützmacher; Nicholas J. Higham; Sherry Li; Neil Lindquist; Yang Liu; Jennifer Loe; Piotr Luszczek; Pratik Nayak; Sri Pranesh; Siva Rajamanickam; Tobias Ribizel; Barry Smith; Kasia Swirydowicz; Stephen Thomas; Stanimire Tomov; Yaohung M. Tsai; Ichitaro Yamazaki; Ulrike Meier Yang

    Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the server-line products are increasingly featuring low-precision special function units, such as the NVIDIA tensor cores in ORNL's Summit supercomputer providing more than

  • fenicsR13: A Tensorial Mixed Finite Element Solver for the Linear R13 Equations Using the FEniCS Computing Platform
    arXiv.cs.MS Pub Date : 2020-07-12
    Lambert Theisen; Manuel Torrilhon

    We present a mixed finite element solver for the linearized R13 equations of non-equilibrium gas dynamics. The Python implementation builds upon the software tools provided by the FEniCS computing platform. We describe a new tensorial approach utilizing the extension capabilities of FEniCS's Unified Form Language (UFL) to define required differential operators for tensors above second degree. The presented

  • A Novel Approach to Generate Correctly Rounded Math Libraries for New Floating Point Representations
    arXiv.cs.MS Pub Date : 2020-07-09
    Jay P. Lim; Mridul Aanjaneya; John Gustafson; Santosh Nagarakatte

    Given the importance of floating-point (FP) performance in numerous domains, several new variants of FP and its alternatives have been proposed (e.g., Bfloat16, TensorFloat32, and Posits). These representations do not have correctly rounded math libraries. Further, the use of existing FP libraries for these new representations can produce incorrect results. This paper proposes a novel methodology for

  • ACORNS: An Easy-To-Use Code Generator for Gradients and Hessians
    arXiv.cs.MS Pub Date : 2020-07-09
    Deshana Desai; Etai Shuchatowitz; Zhongshi Jiang; Teseo Schneider; Daniele Panozzo

    The computation of first and second-order derivatives is a staple in many computing applications, ranging from machine learning to scientific computing. We propose an algorithm to automatically differentiate algorithms written in a subset of C99 code and its efficient implementation as a Python script. We demonstrate that our algorithm enables automatic, reliable, and efficient differentiation of common

  • Blends in Maple
    arXiv.cs.MS Pub Date : 2020-07-09
    Robert M. Corless; Erik Postma

    A blend of two Taylor series for the same smooth real- or complex-valued function of a single variable can be useful for approximation. We use an explicit formula for a two-point Hermite interpolational polynomial to construct such blends. We show a robust Maple implementation that can stably and efficiently evaluate blends using linear-cost Horner form, evaluate their derivatives to arbitrary order

  • Algorithmic differentiation of hyperbolic flow problems
    arXiv.cs.MS Pub Date : 2020-07-10
    Michael Herty; Jonathan Hüser; Uwe Naumann; Thomas Schilden; Wolfgang Schröder

    We are interested in the development of an algorithmic differentiation framework for computing approximations to tangent vectors to scalar and systems of hyperbolic partial differential equations. The main difficulty of such a numerical method is the presence of shock waves that are resolved by proposing a numerical discretization of the calculus introduced in Bressan and Marson [Rend. Sem. Mat. Univ

  • A Task-based Multi-shift QR/QZ Algorithm with Aggressive Early Deflation
    arXiv.cs.MS Pub Date : 2020-07-07
    Mirko Myllykoski

    The QR algorithm is one of the three phases in the process of computing the eigenvalues and the eigenvectors of a dense nonsymmetric matrix. This paper describes a task-based QR algorithm for reducing an upper Hessenberg matrix to real Schur form. The algorithm also supports generalized eigenvalue problems (QZ algorithm) but this paper focuses more on the standard case. The algorithm inherits previous

  • volesti: Volume Approximation and Sampling for Convex Polytopes in R
    arXiv.cs.MS Pub Date : 2020-07-03
    Apostolos Chalkis; Vissarion Fisikopoulos

    Sampling from high dimensional distributions and volume approximation of convex bodies are fundamental operations that appear in optimization, finance, engineering and machine learning. In this paper we present volesti, a C++ package with an R interface that provides efficient, scalable algorithms for volume estimation, uniform and Gaussian sampling from convex polytopes. volesti scales to hundreds

  • Sparse Approximate Multifrontal Factorization with Butterfly Compression for High Frequency Wave Equations
    arXiv.cs.MS Pub Date : 2020-07-01
    Yang Liu; Pieter Ghysels; Lisa Claus; Xiaoye Sherry Li

    We present a fast and approximate multifrontal solver for large-scale sparse linear systems arising from finite-difference, finite-volume or finite-element discretization of high-frequency wave equations. The proposed solver leverages the butterfly algorithm and its hierarchical matrix extension for compressing and factorizing large frontal matrices via graph-distance guided entry evaluation or randomized

  • Massively parallel 3D computation of the compressible Euler equations with an invariant-domain preserving second-order finite-element scheme
    arXiv.cs.MS Pub Date : 2020-06-30
    Matthias Maier; Martin Kronbichler

    We discuss the efficient implementation of a high-performance second-order colocation-type finite-element scheme for solving the compressible Euler equations of gas dynamics on unstructured meshes. The solver is based on the convex limiting technique introduced by Guermond et al. (SIAM J. Sci. Comput. 40, A3211--A3239, 2018). As such it is invariant-domain preserving, i.e., the solver maintains important

  • On Designing GPU Algorithms with Applications to Mesh Refinement
    arXiv.cs.MS Pub Date : 2020-07-01
    Zhenghai Chen; Tiow-Seng Tan; Hong-Yang Ong

    We present a set of rules to guide the design of GPU algorithms. These rules are grounded in the principle of reducing waste in GPU utilization to achieve good speedup. In accordance with these rules, we propose GPU algorithms for the 2D constrained, 3D constrained and 3D restricted Delaunay refinement problems. Our algorithms take a 2D planar straight line graph (PSLG) or 3D piecewise linear complex

  • SParSH-AMG: A library for hybrid CPU-GPU algebraic multigrid and preconditioned iterative methods
    arXiv.cs.MS Pub Date : 2020-06-30
    Sashikumaar Ganesan; Manan Shah

    Hybrid CPU-GPU algorithms for Algebraic Multigrid (AMG) methods that efficiently utilize both CPU and GPU resources are presented. In particular, a hybrid AMG framework is developed that focuses on minimal utilization of GPU memory with performance on par with GPU-only implementations. The hybrid AMG framework can be tuned to operate with significantly less GPU memory and, consequently, makes it possible to solve large

  • Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing
    arXiv.cs.MS Pub Date : 2020-06-30
    Hartwig Anzt; Terry Cojean; Goran Flegar; Fritz Goebel; Thomas Gruetzmacher; Pratik Nayak; Tobias Ribizel; Yu-Hsiang Tsai; Enrique S. Quintana-Orti

    In this paper, we present Ginkgo, a modern C++ math library for scientific high performance computing. While classical linear algebra libraries act on matrix and vector objects, Ginkgo's design principle abstracts all functionality as "linear operators", motivating the notation of a "linear operator algebra library". Ginkgo's current focus is oriented towards providing sparse linear algebra functionality

  • Hierarchical Jacobi Iteration for Structured Matrices on GPUs using Shared Memory
    arXiv.cs.MS Pub Date : 2020-06-30
    Mohammad Shafaet Islam; Qiqi Wang

    High fidelity scientific simulations modeling physical phenomena typically require solving large linear systems of equations which result from discretization of a partial differential equation (PDE) by some numerical method. This step often takes a vast amount of computational time to complete, and therefore presents a bottleneck in simulation work. Solving these linear systems efficiently requires
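
    The entry's title names Jacobi iteration. As background only, here is a minimal sketch of the classical, serial Jacobi sweep on a small dense system; it is not the hierarchical shared-memory GPU scheme of the paper. The function name `jacobi`, the fixed iteration count, and the 3x3 tridiagonal test matrix are all assumptions made for this illustration.

```python
def jacobi(A, b, iterations=200):
    """Approximate the solution of A x = b with plain Jacobi sweeps.

    Assumes A is square with nonzero diagonal entries; convergence is
    guaranteed, for example, when A is strictly diagonally dominant.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x_new = [0.0] * n
        for i in range(n):
            # Every update reads only the previous iterate, so all rows
            # of one sweep are independent of each other.
            off_diagonal = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x_new[i] = (b[i] - off_diagonal) / A[i][i]
        x = x_new
    return x


# Small tridiagonal system whose exact solution is [1, 2, 3].
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
solution = jacobi(A, b)
```

    Because each row update within a sweep is independent, the method maps naturally onto parallel hardware, which is the property that GPU implementations exploit.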

  • Adaptive SpMV/SpMSpV on GPUs for Input Vectors of Varied Sparsity
    arXiv.cs.MS Pub Date : 2020-06-30
    Min Li; Yulong Ao; Chao Yang

    Despite numerous efforts to optimize the performance of Sparse Matrix-Vector Multiplication (SpMV) on modern hardware architectures, little work has been done on its sparse counterpart, Sparse Matrix-Sparse Vector Multiplication (SpMSpV), let alone on handling input vectors of varied sparsity. The key challenge is that depending on the sparsity levels, distribution of data, and compute platform
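
    The difference between the two operations named in this abstract can be sketched in a few lines of plain Python. This is an illustrative, serial sketch under assumed storage formats (CSR for SpMV, CSC plus a dictionary-based sparse vector for SpMSpV); it is not the adaptive GPU method of the paper, and the function names `spmv_csr` and `spmspv_csc` are hypothetical.

```python
def spmv_csr(indptr, indices, data, x):
    """Sparse matrix times dense vector, matrix stored in CSR form."""
    y = []
    for row in range(len(indptr) - 1):
        acc = 0.0
        # Walk over the stored nonzeros of this row only.
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y.append(acc)
    return y


def spmspv_csc(indptr, indices, data, x_sparse):
    """Sparse matrix times sparse vector, matrix stored in CSC form.

    x_sparse maps column index -> nonzero value; the result maps
    row index -> accumulated value.
    """
    y = {}
    # Only the columns matching nonzeros of x are ever touched.
    for col, x_val in x_sparse.items():
        for k in range(indptr[col], indptr[col + 1]):
            row = indices[k]
            y[row] = y.get(row, 0.0) + data[k] * x_val
    return y


# The matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in both layouts.
csr = ([0, 2, 3, 5], [0, 2, 1, 0, 2], [1.0, 2.0, 3.0, 4.0, 5.0])
csc = ([0, 2, 3, 5], [0, 2, 1, 0, 2], [1.0, 4.0, 3.0, 2.0, 5.0])

dense_result = spmv_csr(*csr, [1.0, 0.0, 1.0])
sparse_result = spmspv_csc(*csc, {0: 1.0, 2: 1.0})
```

    In this sketch the SpMV loop visits every stored nonzero of the matrix, while the SpMSpV loop visits only the columns selected by the nonzeros of the input vector, which is why the better choice depends on how sparse that vector is.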

  • The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation in R
    arXiv.cs.MS Pub Date : 2020-06-27
    Xingguo Li; Tuo Zhao; Xiaoming Yuan; Han Liu

    This paper describes an R package named flare, which implements a family of new high dimensional regression methods (LAD Lasso, SQRT Lasso, $\ell_q$ Lasso, and Dantzig selector) and their extensions to sparse precision matrix estimation (TIGER and CLIME). These methods exploit different nonsmooth loss functions to gain modeling flexibility, estimation robustness, and tuning insensitiveness. The developed

  • Preparing Ginkgo for AMD GPUs -- A Testimonial on Porting CUDA Code to HIP
    arXiv.cs.MS Pub Date : 2020-06-25
    Yuhsiang M. Tsai (Karlsruhe Institute of Technology); Terry Cojean (Karlsruhe Institute of Technology); Tobias Ribizel (Karlsruhe Institute of Technology); Hartwig Anzt (Karlsruhe Institute of Technology; University of Tennessee, Innovative Computing Lab)

    With AMD reinforcing their ambition in the scientific high performance computing ecosystem, we extend the hardware scope of the Ginkgo linear algebra package to feature a HIP backend for AMD GPUs. In this paper, we report and discuss the porting effort from CUDA, the extension of the HIP framework to add missing features such as cooperative groups, the performance price of compiling HIP code for AMD

  • Index handling and assign optimization for Algorithmic Differentiation reuse index managers
    arXiv.cs.MS Pub Date : 2020-06-23
    Max Sagebaum; Johannes Blühdorn; Nicolas R. Gauger

    For operator overloading Algorithmic Differentiation tools, the identification of primal variables and adjoint variables is usually done via indices. Two common schemes exist for their management and distribution. The linear approach is easy to implement and supports memory optimization with respect to copy statements. On the other hand, the reuse approach requires more implementation effort but results

  • Robust and scalable h-adaptive aggregated unfitted finite elements for interface elliptic problems
    arXiv.cs.MS Pub Date : 2020-06-19
    Eric Neiva; Santiago Badia

    This work introduces a novel, fully robust and highly-scalable, $h$-adaptive aggregated unfitted finite element method for large-scale interface elliptic problems. The new method is based on a recent distributed-memory implementation of the aggregated finite element method atop a highly-scalable Cartesian forest-of-trees mesh engine. It follows the classical approach of weakly coupling nonmatching

  • Array Programming with NumPy
    arXiv.cs.MS Pub Date : 2020-06-18
    Charles R. Harris; K. Jarrod Millman; Stéfan J. van der Walt; Ralf Gommers; Pauli Virtanen; David Cournapeau; Eric Wieser; Julian Taylor; Sebastian Berg; Nathaniel J. Smith; Robert Kern; Matti Picus; Stephan Hoyer; Marten H. van Kerkwijk; Matthew Brett; Allan Haldane; Jaime Fernández del Río; Mark Wiebe; Pearu Peterson; Pierre Gérard-Marchant; Kevin Sheppard; Tyler Reddy; Warren Weckesser; Hameer Abbasi;

    Array programming provides a powerful, compact, expressive syntax for accessing, manipulating, and operating on data in vectors, matrices, and higher-dimensional arrays. NumPy is the primary array programming library for the Python language. It plays an essential role in research analysis pipelines in fields as diverse as physics, chemistry, astronomy, geoscience, biology, psychology, material science

  • Accelerating linear solvers for large-scale Stokes problems with C++ metaprogramming
    arXiv.cs.MS Pub Date : 2020-06-10
    Denis Demidov; Lin Mu; Bin Wang

    The ability to solve large sparse linear systems of equations is very important in modern numerical methods. Creating a solver with a user-friendly interface that can work in many specific scenarios is a challenging task. We describe the C++ programming techniques that can help in creating flexible and extensible programming interfaces for linear solvers. The approach is based on policy-based design and

  • On Computing the Kronecker Structure of Polynomial Matrices using Julia
    arXiv.cs.MS Pub Date : 2020-06-09
    Andreas Varga

    In this paper we discuss the mathematical background and the computational aspects which underlie the implementation of a collection of Julia functions in the MatrixPencils package for the determination of structural properties of polynomial matrices. We primarily focus on the computation of the finite and infinite spectral structures (e.g., eigenvalues, zeros, poles) as well as the left and right singular

  • The aggregated unfitted finite element method on parallel tree-based adaptive meshes
    arXiv.cs.MS Pub Date : 2020-06-09
    Santiago Badia; Alberto F. Martín; Eric Neiva; Francesc Verdugo

    In this work, we present an adaptive unfitted finite element scheme that combines the aggregated finite element method with parallel adaptive mesh refinement. We introduce a novel scalable distributed-memory implementation of the resulting scheme on locally-adapted Cartesian forest-of-trees meshes. We propose a two-step algorithm to construct the finite element space at hand that carefully mixes aggregation

  • AutoMat -- Automatic Differentiation for Generalized Standard Materials on GPUs
    arXiv.cs.MS Pub Date : 2020-06-08
    Johannes Blühdorn; Nicolas R. Gauger; Matthias Kabel

    We propose a universal method for the evaluation of generalized standard materials that greatly simplifies the material law implementation process. By means of automatic differentiation and a numerical integration scheme, AutoMat reduces the implementation effort to two potential functions. By moving AutoMat to the GPU, we close the performance gap to conventional evaluation routines and demonstrate

  • copent: Estimating Copula Entropy in R
    arXiv.cs.MS Pub Date : 2020-05-27
    Jian Ma

    Statistical independence and conditional independence are fundamental concepts in statistics and machine learning. Copula Entropy is a mathematical concept for measuring and testing multivariate statistical independence, and is also closely related to conditional independence or transfer entropy. It has been applied to solve several statistical or machine learning problems, including association discovery

  • Model Evidence with Fast Tree Based Quadrature
    arXiv.cs.MS Pub Date : 2020-05-22
    Thomas Foster; Chon Lok Lei; Martin Robinson; David Gavaghan; Ben Lambert

    High dimensional integration is essential to many areas of science, ranging from particle physics to Bayesian inference. Approximating these integrals is hard, due in part to the difficulty of locating and sampling from regions of the integration domain that make significant contributions to the overall integral. Here, we present a new algorithm called Tree Quadrature (TQ) that separates this sampling

  • SymJAX: symbolic CPU/GPU/TPU programming
    arXiv.cs.MS Pub Date : 2020-05-21
    Randall Balestriero

    SymJAX is a symbolic programming version of JAX that simplifies graph input/output/updates and provides additional functionalities for general machine learning and deep learning applications. From a user perspective, SymJAX provides an à la Theano experience with fast graph optimization/compilation and broad hardware support, along with Lasagne-like deep learning functionalities.

  • High-Performance GPU and CPU Signal Processing for a Reverse-GPS Wildlife Tracking System
    arXiv.cs.MS Pub Date : 2020-05-21
    Yaniv Rubinpur; Sivan Toledo

    We present robust high-performance implementations of signal-processing tasks performed by a high-throughput wildlife tracking system called ATLAS. The system tracks radio transmitters attached to wild animals by estimating the time of arrival of packets encoding known pseudo-random codes to receivers (base stations). Time-of-arrival estimation of wideband radio signals is computationally expensive

Contents have been reproduced by permission of the publishers.