• arXiv.cs.MS Pub Date : 2021-01-13
David A. Ham

This white paper highlights current limitations in the algebraic closure of the Unified Form Language (UFL). UFL currently represents forms over finite element spaces; however, finite element problems naturally result in objects in the dual to a finite element space, and in operators mapping between primal and dual finite element spaces. This document sketches the relevant mathematical areas and proposes changes

Updated: 2021-01-14
• arXiv.cs.MS Pub Date : 2021-01-13
Angelika Schwarz

Inverse iteration is known to be an effective method for computing eigenvectors corresponding to simple and well-separated eigenvalues. In the non-symmetric case, the solution of shifted Hessenberg systems is a central step. Existing inverse iteration solvers approach the solution of the shifted Hessenberg systems with either RQ or LU factorizations and, once factored, solve the corresponding systems

Updated: 2021-01-14
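A minimal pure-Python sketch of inverse iteration follows. It assumes a small dense real matrix and a real shift `sigma` near a simple eigenvalue. To keep it short, it solves each shifted system by Gaussian elimination with partial pivoting (LU-style) on a general dense matrix, not with the RQ or LU Hessenberg solvers the paper studies. `solve` and `inverse_iteration` are hypothetical names.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (LU-style)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy of [a | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))  # pivot row
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def inverse_iteration(a, sigma, iters=20):
    """Approximate the eigenvector for the eigenvalue of `a` closest to `sigma`."""
    n = len(a)
    shifted = [[a[i][j] - (sigma if i == j else 0.0) for j in range(n)] for i in range(n)]
    v = [1.0] * n
    for _ in range(iters):
        w = solve(shifted, v)          # one shifted solve per iteration
        norm = max(abs(t) for t in w)  # infinity-norm normalization
        v = [t / norm for t in w]
    return v
```

A practical solver would factor A − σI once and reuse the factors in every iteration. This sketch re-eliminates each time to stay short.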
• arXiv.cs.MS Pub Date : 2021-01-06
Neha R. Gupta (Duke University); Vittorio Orlandi (Duke University); Chia-Rui Chang (Harvard University); Tianyu Wang (Duke University); Marco Morucci (Duke University); Pritam Dey (Duke University); Thomas J. Howell (Duke University); Xian Sun (Duke University); Angikar Ghosal (Duke University); Sudeepa Roy (Duke University); Cynthia Rudin (Duke University); Alexander Volfovsky (Duke University)

dame-flame is a Python package for performing matching for observational causal inference on datasets containing discrete covariates. This package implements the Dynamic Almost Matching Exactly (DAME) and Fast Large-Scale Almost Matching Exactly (FLAME) algorithms, which match treatment and control units on subsets of the covariates. The resulting matched groups are interpretable, because the matches

Updated: 2021-01-07
• arXiv.cs.MS Pub Date : 2020-12-31
Emanuele Guidotti

The R package calculus implements C++ optimized functions for numerical and symbolic calculus, such as the Einstein summing convention, fast computation of the Levi-Civita symbol and generalized Kronecker delta, Taylor series expansion, multivariate Hermite polynomials, high-order derivatives, ordinary differential equations, differential operators and numerical integration in arbitrary orthogonal

Updated: 2021-01-05
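The Levi-Civita symbol mentioned above equals the sign of a permutation, and is 0 when an index repeats. The following is a minimal Python sketch of that definition, not the calculus package's R/C++ implementation. `levi_civita` is a hypothetical name. The sketch gets the sign by counting inversions.

```python
def levi_civita(*indices):
    """Sign of the permutation that sorts `indices`: +1 if even, -1 if odd, 0 if an index repeats."""
    if len(set(indices)) < len(indices):
        return 0
    # The parity of the number of inversions equals the parity of the permutation.
    inversions = sum(
        1
        for i in range(len(indices))
        for j in range(i + 1, len(indices))
        if indices[i] > indices[j]
    )
    return -1 if inversions % 2 else 1
```

Counting inversions costs O(n²). A cycle decomposition of the permutation gives the same sign in O(n).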
• arXiv.cs.MS Pub Date : 2020-12-28
Simon Dirckx; Daan Huybrechs; Karl Meerbergen

The discretisation of boundary integral equations for the scalar Helmholtz equation leads to large dense linear systems. Efficient boundary element methods (BEM), such as the fast multipole method (FMM) and H-matrix based methods, focus on structured low-rank approximations of subblocks in these systems. It is known that the ranks of these subblocks increase with the wavenumber. We explore a data-sparse

Updated: 2020-12-29
• arXiv.cs.MS Pub Date : 2020-12-22
Michael Lindner; Lucas Lincoln; Fenja Drauschke; Julia Monika Koulen; Hans Würfel; Anton Plietzsch; Frank Hellmann

NetworkDynamics.jl is an easy-to-use and computationally efficient package for working with heterogeneous dynamical systems on complex networks, written in Julia, a high-level, high-performance, dynamic programming language. By combining state-of-the-art solver algorithms from DifferentialEquations.jl with efficient data structures, NetworkDynamics.jl achieves top performance while supporting advanced

Updated: 2020-12-24
• arXiv.cs.MS Pub Date : 2020-12-22
Oylum Şeker; Neda Tanoumand; Merve Bodur

Digital Annealer (DA) is a computer architecture designed for tackling combinatorial optimization problems formulated as quadratic unconstrained binary optimization (QUBO) models. In this paper, we present the results of an extensive computational study to evaluate the performance of DA in a systematic way in comparison to multiple state-of-the-art solvers for different problem classes. We examine

Updated: 2020-12-24
• arXiv.cs.MS Pub Date : 2020-12-20
E. Theodore L. Omtzigt; Peter Gottschling; Mark Seligman; William Zorn

With the proliferation of embedded systems requiring intelligent behavior, custom number systems to optimize performance per Watt of the entire system become essential components for successful commercial products. We present the Universal Number Library, a high-performance number systems library that includes arbitrary integer, decimal, fixed-point, floating-point, and introduces two tapered floating-point

Updated: 2020-12-22
• arXiv.cs.MS Pub Date : 2020-12-15
Abhinav Gupta; Rajib Chowdhury; Anupam Chakrabarti; Timon Rabczuk

This paper presents a 55-line code written in Python for 2D and 3D topology optimization (TO) based on the open-source finite element computing software (FEniCS), equipped with various finite element tools and solvers. PETSc is used as the linear algebra back-end, which results in significantly less computational time than standard Python libraries. The code is designed based on the popular solid isotropic

Updated: 2020-12-16
• arXiv.cs.MS Pub Date : 2020-12-11
Jan Verschelde

Hardware double precision is often insufficient to solve large scientific problems accurately. Computing in higher precision defined by software causes significant computational overhead. The application of parallel algorithms compensates for this overhead. Newton's method to develop power series expansions of algebraic space curves is the use case for this application.

Updated: 2020-12-15
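Double-double arithmetic is one common form of software-defined higher precision. It represents a value as an unevaluated sum of two doubles. The sketch below illustrates where the overhead mentioned above comes from; it is not the author's code. It assumes IEEE 754 binary64 floats with round-to-nearest, which is Python's default, and builds on Knuth's error-free TwoSum. `two_sum` and `dd_add` are hypothetical names.

```python
def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(x, y):
    """Add two double-double numbers given as (hi, lo) pairs (simple, non-IEEE-strict variant)."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    # Renormalize so that hi carries the leading bits and lo the rounding error.
    return two_sum(s, e)
```

A single double-double addition costs roughly ten or more floating-point operations, compared with one for hardware doubles. That ratio is the kind of overhead that parallelism is meant to offset.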
• arXiv.cs.MS Pub Date : 2020-12-11
Markus Holzer; Martin Bauer; Ulrich Rüde

A high-performance implementation of a multiphase lattice Boltzmann method based on the conservative Allen-Cahn model supporting high-density ratios and high Reynolds numbers is presented. Metaprogramming techniques are used to generate optimized code for CPUs and GPUs automatically. The coupled model is specified in a high-level symbolic description and optimized through automatic transformations

Updated: 2020-12-14
• arXiv.cs.MS Pub Date : 2020-12-08
Jacob Montiel; Max Halford; Saulo Martiello Mastelini; Geoffrey Bolmier; Raphael Sourty; Robin Vaysse; Adil Zouitine; Heitor Murilo Gomes; Jesse Read; Talel Abdessalem; Albert Bifet

River is a machine learning library for dynamic data streams and continual learning. It provides multiple state-of-the-art learning methods, data generators/transformers, performance metrics and evaluators for different stream learning problems. It is the result of the merger of the two most popular packages for stream learning in Python: Creme and scikit-multiflow. River introduces a revamped architecture

Updated: 2020-12-10
• arXiv.cs.MS Pub Date : 2020-12-07
Matthew Francis-Landau

This paper introduces mFST, a new Python library for working with Finite-State Machines based on OpenFST. mFST is a thin wrapper for OpenFST and exposes all of OpenFST's methods for manipulating FSTs. Additionally, mFST is the only Python wrapper for OpenFST that exposes OpenFST's ability to define custom semirings. This makes mFST ideal for developing models that involve learning the weights on

Updated: 2020-12-08
• arXiv.cs.MS Pub Date : 2020-11-30
Seth Troisi

A new Combined Sieve algorithm is presented with cost proportional to the number of enumerated factors over a series of intervals. This algorithm achieves a significant speedup over a traditional sieve when handling many ([10^4, 10^7]) intervals concurrently. The speedup comes from a space-time tradeoff and a novel solution to a modular equation. In real-world tests, this new algorithm regularly

Updated: 2020-12-08
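For context, the traditional baseline for such an algorithm is a sieve over a single interval. Below is a minimal segmented sieve in Python. It crosses off multiples of small primes in [lo, hi). This is the classical method, not the Combined Sieve. `small_primes` and `primes_in_interval` are hypothetical names.

```python
import math


def small_primes(limit):
    """Sieve of Eratosthenes: return all primes <= limit."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    return [p for p in range(limit + 1) if is_prime[p]]


def primes_in_interval(lo, hi):
    """Return the primes in [lo, hi) by crossing off multiples of primes <= sqrt(hi - 1)."""
    lo = max(lo, 2)
    if hi <= lo:
        return []
    is_prime = [True] * (hi - lo)
    for p in small_primes(math.isqrt(hi - 1)):
        # First multiple of p inside the interval, never p itself.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for m in range(start, hi, p):
            is_prime[m - lo] = False
    return [lo + i for i, flag in enumerate(is_prime) if flag]
```

Sieving many intervals this way repeats the per-prime work for each interval. The abstract's speedup claim concerns handling many intervals concurrently.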
• arXiv.cs.MS Pub Date : 2020-12-04
Stephan Hageboeck

RooFit is a toolkit for statistical modelling and fitting, and together with RooStats it is used for measurements and statistical tests by most experiments in particle physics. For the past year, RooFit has been undergoing modernisation. In this talk, improvements already released with ROOT will be discussed, such as faster data loading, vectorised computations and more standard-like interfaces. These allow for speeding

Updated: 2020-12-07
• arXiv.cs.MS Pub Date : 2020-12-01
Ryan Krueger; Jesse Michael Han; Daniel Selsam

We present a method for automatically building diagrams for olympiad-level geometry problems and implement our approach in a new open-source software tool, the Geometry Model Builder (GMB). Central to our method is a new domain-specific language, the Geometry Model-Building Language (GMBL), for specifying geometry problems along with additional metadata useful for building diagrams. A GMBL program

Updated: 2020-12-07
• arXiv.cs.MS Pub Date : 2020-12-02
Jeremy M. Myers; Daniel M. Dunlavy; Keita Teranishi; D. S. Hollman

Tensor decomposition models play an increasingly important role in modern data science applications. One problem of particular interest is fitting a low-rank Canonical Polyadic (CP) tensor decomposition model when the tensor has sparse structure and the tensor elements are nonnegative count data. SparTen is a high-performance C++ library which computes a low-rank decomposition using different solvers:

Updated: 2020-12-04
• arXiv.cs.MS Pub Date : 2020-12-01
Shengguo Li; Xinzhe Wu; Jose E. Roman; Ziyang Yuan; Lizhi Cheng

In this paper, a Parallel Direct Eigensolver for Sequences of Hermitian Eigenvalue Problems with no tridiagonalization, denoted PDESHEP, is proposed; it combines direct methods with iterative methods. PDESHEP first reduces a Hermitian matrix to its banded form, then applies a spectrum slicing algorithm to the banded matrix, and finally computes the eigenvectors of the original

Updated: 2020-12-03
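Spectrum slicing relies on counting how many eigenvalues lie below a shift σ. For a symmetric tridiagonal matrix T, Sylvester's law of inertia gives that count as the number of negative pivots in an LDLᵀ factorization of T − σI. The sketch below assumes this simpler tridiagonal case; PDESHEP itself works on a banded form. `count_below` is a hypothetical name.

```python
def count_below(diag, off, sigma):
    """Number of eigenvalues of the symmetric tridiagonal matrix (diag, off) less than sigma.

    diag: main diagonal (length n); off: sub/super-diagonal (length n - 1).
    By Sylvester's law of inertia, the count equals the number of negative
    pivots d_i in the LDL^T factorization of T - sigma*I.
    """
    count = 0
    d = 1.0
    for i, a in enumerate(diag):
        b2 = off[i - 1] ** 2 if i > 0 else 0.0
        d = (a - sigma) - b2 / d
        if d == 0.0:
            d = 1e-300  # perturb an exactly zero pivot (a common safeguard)
        if d < 0.0:
            count += 1
    return count
```

The number of eigenvalues in a slice [s1, s2) is `count_below(diag, off, s2) - count_below(diag, off, s1)`. This lets independent processes each handle separate slices.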
• arXiv.cs.MS Pub Date : 2020-12-01
Santiago Badia; Manuel Caicedo; Alberto F. Martín; Javier Principe

We extend the unfitted $h$-adaptive Finite Element Method ($h$-AgFEM) on parallel tree-based adaptive meshes, recently developed for linear scalar elliptic problems, to handle nonlinear problems in solid mechanics. Leveraging $h$-AgFEM on locally-adapted, non-conforming, tree-based meshes, and its parallel distributed-memory implementation, we can tackle large-, multi-scale problems posed on complex

Updated: 2020-12-03
• arXiv.cs.MS Pub Date : 2020-11-30
André Greiner-Petter

In mathematics, LaTeX is the de facto standard for preparing documents, e.g., scientific publications. While some formulae are still developed using pen and paper, more complicated mathematical expressions are increasingly developed with computer algebra systems. Mathematical expressions are often manually transcribed to computer algebra systems. The goal of my doctoral thesis is to improve the efficiency

Updated: 2020-12-01
• arXiv.cs.MS Pub Date : 2020-11-25
Cody J. Balos; David J. Gardner; Carol S. Woodward; Daniel R. Reynolds

As part of the Exascale Computing Project (ECP), a recent focus of development efforts for the SUite of Nonlinear and DIfferential/ALgebraic equation Solvers (SUNDIALS) has been to enable GPU-accelerated time integration in scientific applications at extreme scales. This effort has resulted in several new GPU-enabled implementations of core SUNDIALS data structures, support for programming paradigms

Updated: 2020-12-01
• arXiv.cs.MS Pub Date : 2020-11-23
Emanuel H. Rubensson; Elias Rudberg; Anastasia Kruchinina; Anton G. Artemov

We present a C++ header-only parallel sparse matrix library, based on sparse quadtree representation of matrices using the Chunks and Tasks programming model. The library implements a number of sparse matrix algorithms for distributed memory parallelization that are able to dynamically exploit data locality to avoid movement of data. This is demonstrated for the example of block-sparse matrix-matrix

Updated: 2020-11-25
• arXiv.cs.MS Pub Date : 2020-11-19
Milinda Fernando; Hari Sundar

Numerical solutions of hyperbolic partial differential equations (PDEs) are ubiquitous in science and engineering. The method of lines is a popular approach to discretizing PDEs defined in spacetime, where space and time are discretized independently. When explicit timesteppers are used on adaptive grids, a global timestep size dictated by the finest grid spacing leads to inefficiencies in the coarser

Updated: 2020-11-25
• arXiv.cs.MS Pub Date : 2020-11-23
Ta-Chu Kao; Guillaume Hennequin

Sylvester, Lyapunov, and algebraic Riccati equations are the bread and butter of control theorists. They are used to compute infinite-horizon Gramians, solve optimal control problems in continuous or discrete time, and design observers. While popular numerical computing frameworks (e.g., scipy) provide efficient solvers for these equations, these solvers are still largely missing from most automatic

Updated: 2020-11-25
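For small problems, the Sylvester equation AX + XB = C can be solved by vectorization. With column-stacking, it becomes (I ⊗ A + Bᵀ ⊗ I) vec(X) = vec(C). The pure-Python sketch below does exactly that with a dense linear solve. It is O((mn)³) and only illustrative, unlike Bartels-Stewart-type solvers such as scipy's. `solve_dense` and `solve_sylvester_kron` are hypothetical names.

```python
def solve_dense(mat, rhs):
    """Gaussian elimination with partial pivoting on a copy of mat."""
    n = len(mat)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def solve_sylvester_kron(a, b, c):
    """Solve A X + X B = C for X (A: m x m, B: n x n, C: m x n) via vec(X)."""
    m, n = len(a), len(b)
    size = m * n
    # X[i][j] sits at index i + j*m of the column-stacked vector vec(X).
    k = [[0.0] * size for _ in range(size)]
    for j in range(n):
        for i in range(m):
            row = i + j * m
            for p in range(m):  # (I kron A) term: sum_p A[i][p] X[p][j]
                k[row][p + j * m] += a[i][p]
            for q in range(n):  # (B^T kron I) term: sum_q X[i][q] B[q][j]
                k[row][i + q * m] += b[q][j]
    rhs = [c[i][j] for j in range(n) for i in range(m)]
    v = solve_dense(k, rhs)
    return [[v[i + j * m] for j in range(n)] for i in range(m)]
```

The solution is unique when no eigenvalue of A equals the negative of an eigenvalue of B. Lyapunov equations are the special case B = Aᵀ.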
• arXiv.cs.MS Pub Date : 2020-11-20
Daoru Han; Xiaoming He; David Lund; Xu Zhang

This paper presents a recently developed particle simulation code package PIFE-PIC, which is a novel three-dimensional (3-D) Parallel Immersed-Finite-Element (IFE) Particle-in-Cell (PIC) simulation model for particle simulations of plasma-material interactions. This framework is based on the recently developed non-homogeneous electrostatic IFE-PIC algorithm, which is designed to handle complex plasma-material

Updated: 2020-11-23
• arXiv.cs.MS Pub Date : 2020-11-19
David J. Gardner; Daniel R. Reynolds; Carol S. Woodward; Cody J. Balos

In recent years, the SUite of Nonlinear and DIfferential/ALgebraic equation Solvers (SUNDIALS) has been redesigned to better enable the use of application-specific and third-party algebraic solvers and data structures. Throughout this work, we have adhered to specific guiding principles that minimized the impact to current users while providing maximum flexibility for later evolution of solvers and

Updated: 2020-11-23
• arXiv.cs.MS Pub Date : 2020-11-17
Terry Cojean; Yu-Hsiang "Mike" Tsai; Hartwig Anzt

The first associations to software sustainability might be the existence of a continuous integration (CI) framework; the existence of a testing framework composed of unit tests, integration tests, and end-to-end tests; and also the existence of software documentation. However, when asking what commonly deals the deathblow to a scientific software product, the answer is often the lack of platform and performance

Updated: 2020-11-19
• arXiv.cs.MS Pub Date : 2020-11-17
Andrei Nicolae

This work is a rigorous development of a complete and general-purpose deep learning framework from the ground up. The fundamental components of deep learning (automatic differentiation and gradient methods of optimizing multivariable scalar functions) are developed from elementary calculus and implemented in a sensible object-oriented approach using only Python and the NumPy library. Demonstrations

Updated: 2020-11-18
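A framework built from elementary calculus rests on reverse-mode automatic differentiation. The idea is to record each operation, then apply the chain rule backwards. A minimal scalar sketch follows. The `Var` class is hypothetical and is not the paper's code.

```python
import math


class Var:
    """A scalar node in a computation graph that supports reverse-mode differentiation."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # pairs of (parent node, local derivative)

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    __radd__ = __add__
    __rmul__ = __mul__

    def sin(self):
        return Var(math.sin(self.value), ((self, math.cos(self.value)),))

    def backward(self):
        """Accumulate d(self)/d(node) into node.grad for every node in the graph."""
        order, seen = [], set()

        def visit(node):  # post-order DFS: parents are appended before children
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):  # reverse topological order
            for parent, local in node.parents:
                parent.grad += local * node.grad
```

For example, for f = x·y + sin(x), `backward()` yields ∂f/∂x = y + cos(x) and ∂f/∂y = x. A gradient-descent optimizer then only needs these accumulated `grad` fields.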
• arXiv.cs.MS Pub Date : 2020-11-12
Nicola Bastianello

This paper introduces tvopt, a Python framework for prototyping and benchmarking time-varying (or online) optimization algorithms. The paper first describes the theoretical approach that informed the development of tvopt. Then it discusses the different components of the framework and their use for modeling and solving time-varying optimization problems. In particular, tvopt provides functionalities

Updated: 2020-11-17
• arXiv.cs.MS Pub Date : 2020-11-16

The complexity of Gröbner computations has inspired many improvements to Buchberger's algorithm over the years. Looking for further insights into the algorithm's performance, we offer a threaded implementation of classical Buchberger's algorithm in Macaulay2. The output of the main function of the package includes information about lineages of non-zero remainders that are added to the

Updated: 2020-11-17
• arXiv.cs.MS Pub Date : 2020-10-10
Marcel Van de Vel

We combine the design of two random number generators, Mersenne Twister and Xorgens, to obtain a new class of generators with heavy-weight characteristic polynomials (exceeded only by the WELL generators) and high speed (comparable with the originals). Tables with parameter combinations are included for state sizes ranging from 521 to 44497 bits and each of the word lengths

Updated: 2020-11-17
• arXiv.cs.MS Pub Date : 2020-11-16
Tom Gustafsson

This work describes a concise algorithm for the generation of triangular meshes with the help of standard adaptive finite element methods. We demonstrate that a generic adaptive finite element solver can be repurposed into a triangular mesh generator if a robust mesh smoothing algorithm is applied between the mesh refinement steps. We present an implementation of the mesh generator and demonstrate

Updated: 2020-11-17
• arXiv.cs.MS Pub Date : 2020-11-16
Chao Chen; Tianyu Liang; George Biros

We introduce a randomized algorithm, namely rchol, to construct an approximate Cholesky factorization for a given sparse Laplacian matrix (a.k.a. graph Laplacian). The (exact) Cholesky factorization for the matrix introduces a clique in the associated graph after eliminating every row/column. By randomization, rchol samples a subset of the edges in the clique. We prove rchol is breakdown

Updated: 2020-11-17
• arXiv.cs.MS Pub Date : 2020-11-09
Amit Sharma; Emre Kiciman

In addition to efficient statistical estimators of a treatment's effect, successful application of causal inference requires specifying assumptions about the mechanisms underlying observed data and testing whether they are valid, and to what extent. However, most libraries for causal inference focus only on the task of providing powerful statistical estimators. We describe DoWhy, an open-source Python

Updated: 2020-11-12
• arXiv.cs.MS Pub Date : 2020-11-08
Vasileios Gkolemis; Michael Gutmann

Bayesian inference is a principled framework for dealing with uncertainty. The practitioner can make an initial assumption about the physical phenomenon they want to model (prior belief), collect some data, and then adjust the initial assumption in light of the new evidence (posterior belief). Approximate Bayesian Computation (ABC) methods, also known as likelihood-free inference techniques, are

Updated: 2020-11-12
• arXiv.cs.MS Pub Date : 2020-11-03
Fredrik Johansson (LFANT)

Calcium is a C library for real and complex numbers in a form suitable for exact algebraic and symbolic computation. Numbers are represented as elements of fields $\mathbb{Q}(a_1,\ldots,a_n)$ where the extension numbers $a_k$ may be algebraic or transcendental. The system combines efficient field operations with automatic discovery and certification of algebraic relations, resulting in a practical

Updated: 2020-11-04
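Here is a much simpler illustration of exact arithmetic in an extension field. Elements of $\mathbb{Q}(\sqrt{2})$ can be stored as pairs of rationals (a, b) meaning a + b√2. The hypothetical `QSqrt2` class below uses Python's `fractions.Fraction`. Calcium itself is a C library that handles general towers of algebraic and transcendental extensions, so this only sketches the idea for one fixed quadratic extension.

```python
from fractions import Fraction


class QSqrt2:
    """Exact element a + b*sqrt(2) of the field Q(sqrt(2)), with a, b rational."""

    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)

    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b r)(c + d r) = (ac + 2bd) + (ad + bc) r, using r^2 = 2.
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def inverse(self):
        # Multiply by the conjugate: 1/(a + b r) = (a - b r) / (a^2 - 2 b^2).
        # The norm is nonzero unless a = b = 0, because sqrt(2) is irrational.
        norm = self.a * self.a - 2 * self.b * self.b
        return QSqrt2(self.a / norm, -self.b / norm)

    def __eq__(self, other):
        return self.a == other.a and self.b == other.b

    def __repr__(self):
        return f"{self.a} + {self.b}*sqrt(2)"
```

Equality here is a syntactic comparison of coefficients, which is exact because {1, √2} is a basis over $\mathbb{Q}$. With several extension numbers, deciding equality requires discovering relations between them, which is the problem the abstract highlights.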
• arXiv.cs.MS Pub Date : 2020-11-02
Richard Tran Mills; Mark F. Adams; Satish Balay; Jed Brown; Alp Dener; Matthew Knepley; Scott E. Kruger; Hannah Morgan; Todd Munson; Karl Rupp; Barry F. Smith; Stefano Zampini; Hong Zhang; Junchao Zhang

The Portable Extensible Toolkit for Scientific computation (PETSc) library delivers scalable solvers for nonlinear time-dependent differential and algebraic equations and for numerical optimization. The PETSc design for performance portability addresses fundamental GPU accelerator challenges and stresses flexibility and extensibility by separating the programming model used by the application from that

Updated: 2020-11-03
• arXiv.cs.MS Pub Date : 2020-11-02
Olivier Adjoua; Louis Lagardère; Luc-Henri Jolly; Arnaud Durocher; Thibaut Very; Isabelle Dupays; Zhi Wang; Théo Jaffrelot Inizan; Frédéric Célerse; Pengyu Ren; Jay Ponder; Jean-Philip Piquemal

We present the extension of the Tinker-HP package (Lagardère et al., Chem. Sci., 2018, 9, 956-972) to the use of Graphics Processing Unit (GPU) cards to accelerate molecular dynamics simulations using polarizable many-body force fields. The new high-performance module allows for an efficient use of single- and multi-GPU architectures ranging from research laboratories to modern pre-exascale supercomputer

Updated: 2020-11-03
• arXiv.cs.MS Pub Date : 2020-11-02
Léo Simpson; Patrick L. Combettes; Christian L. Müller

We introduce c-lasso, a Python package that enables sparse and robust linear regression and classification with linear equality constraints. The underlying statistical forward model is assumed to be of the following form: $y = X \beta + \sigma \epsilon \qquad \textrm{subject to} \qquad C\beta=0$ Here, $X \in \mathbb{R}^{n\times d}$ is a given design matrix and the vector $y \in \mathbb{R}^{n}$ is

Updated: 2020-11-03
• arXiv.cs.MS Pub Date : 2020-10-30
Luca Chiaraviglio; Simone Rossetti; Sara Saida; Stefania Bartoletti

According to a very popular belief, widespread among non-scientific communities, the exploitation of narrow beams, a.k.a. "pencil beamforming", will dramatically increase the exposure levels radiated by 5G Base Stations. To address this concern with a scientific approach, in this work we derive a simple yet meaningful model to evaluate the ElectroMagnetic Field (EMF) exposure from a set of 5G Base

Updated: 2020-11-02
• arXiv.cs.MS Pub Date : 2020-10-30
Michael Hoefnagel; Pierre-Alain Jacqmin; Zurab Janelidze

This paper is concerned with the problem of classifying left exact categories according to their 'matrix properties', a particular kind of category-theoretic property represented by integer matrices. We obtain an algorithm for deciding whether a conjunction of these matrix properties follows from another. Computer implementation of this algorithm allows one to peer into the complex structure of the poset

Updated: 2020-11-02
• arXiv.cs.MS Pub Date : 2020-10-30
Seyoon Ko; Hua Zhou; Jin Zhou; Joong-Ho Won

The demand for high-performance computing (HPC) is ever-increasing for everyday statistical computing purposes. The downside is that we need to write specialized code for each HPC environment. CPU-level parallelization needs to be explicitly coded for effective use of multiple nodes in cluster supercomputing environments. Acceleration via graphics processing units (GPUs) requires writing kernel code

Updated: 2020-11-02
• arXiv.cs.MS Pub Date : 2020-10-28
Derek Beaton (Rotman Research Institute, Baycrest Health Sciences)

The generalized singular value decomposition (GSVD, a.k.a. "SVD triplet", "duality diagram" approach) provides a unified strategy and basis to perform nearly all of the most common multivariate analyses (e.g., principal components, correspondence analysis, multidimensional scaling, canonical correlation, partial least squares). Though the GSVD is ubiquitous, powerful, and flexible, it has very few

Updated: 2020-10-30
• arXiv.cs.MS Pub Date : 2020-10-26
I. Hristov; R. Hristova; S. Dimova; P. Armyanov; N. Shegunov; I. Puzynin; T. Puzynina; Z. Sharipov; Z. Tukhliev

A hybrid MPI+OpenMP strategy for parallelizing the multiple precision Taylor series method is proposed, realized and tested. To parallelize the algorithm, we combine the MPI and OpenMP parallel technologies with the GMP library (GNU Multiple Precision library) and the tiny MPIGMP library. The details of the parallelization are explained on the paradigmatic model of the Lorenz system. We succeed in obtaining

Updated: 2020-10-30
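The Taylor series method for the Lorenz system x' = σ(y − x), y' = x(ρ − z) − y, z' = xy − βz computes the series coefficients by a recurrence. The nonlinear terms use Cauchy products. Below is a serial sketch that uses Python's `decimal` module as a stand-in for GMP. The precision, order, parameter values, and function names are assumptions, and none of the MPI/OpenMP parallelization is shown.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # working precision in significant digits (assumed value)

SIGMA, RHO, BETA = Decimal(10), Decimal(28), Decimal(8) / Decimal(3)


def horner(coeffs, h):
    """Evaluate sum_k coeffs[k] * h**k with Horner's rule."""
    acc = Decimal(0)
    for c in reversed(coeffs):
        acc = acc * h + c
    return acc


def taylor_step(x0, y0, z0, h, order=30):
    """Advance the Lorenz system by one step of size h with a truncated Taylor series."""
    x, y, z = [x0], [y0], [z0]  # Taylor coefficients of x(t), y(t), z(t)
    for k in range(order):
        # Cauchy products give the k-th coefficients of x*z and x*y.
        xz = sum(x[i] * z[k - i] for i in range(k + 1))
        xy = sum(x[i] * y[k - i] for i in range(k + 1))
        x.append(SIGMA * (y[k] - x[k]) / (k + 1))
        y.append((RHO * x[k] - xz - y[k]) / (k + 1))
        z.append((xy - BETA * z[k]) / (k + 1))
    return horner(x, h), horner(y, h), horner(z, h)
```

The Cauchy products dominate the cost, at O(order²) multiple-precision multiplications per step. That cost is what makes parallelizing the method worthwhile.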
• arXiv.cs.MS Pub Date : 2020-10-23
Xiaohui Wang; Ying Xiong; Yang Wei; Mingxuan Wang; Lei Li

LightSeq is a high performance inference library for sequence processing and generation implemented in CUDA. To the best of our knowledge, this is the first open-source inference library which fully supports highly efficient computation of modern NLP models such as BERT, GPT, Transformer, etc. This library is efficient, functional and convenient. A usage demo can be found here: https://github.com/bytedan

Updated: 2020-10-30
• arXiv.cs.MS Pub Date : 2020-10-27
Nick Brown

Computational fluid dynamics (CFD) is a ubiquitous technique central to much of computational simulation, such as that required by aircraft design. Solving the Poisson equation occurs frequently in CFD, and there are a number of possible approaches one may leverage. The dynamical core of the MONC atmospheric model is one example of CFD that requires solving the Poisson equation to determine pressure terms. Traditionally

Updated: 2020-10-30
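Jacobi iteration is one of the simplest iterative approaches to a Poisson problem, though not necessarily one used in MONC. The sketch below applies it to the 1D Poisson problem −u'' = f on (0, 1) with u(0) = u(1) = 0. It discretizes with the standard three-point finite-difference stencil. This is a toy: MONC's solvers are three-dimensional and are not reproduced here, and `jacobi_poisson_1d` is a hypothetical name.

```python
def jacobi_poisson_1d(f, n, iters=5000):
    """Solve -u'' = f on (0, 1), u(0) = u(1) = 0, on n interior points with Jacobi iteration.

    Discrete equations: (-u[i-1] + 2*u[i] - u[i+1]) / h**2 = f(x_i).
    """
    h = 1.0 / (n + 1)
    x = [(i + 1) * h for i in range(n)]
    rhs = [f(xi) for xi in x]
    u = [0.0] * n
    for _ in range(iters):
        # Each sweep uses only values from the previous iterate.
        new = [0.0] * n
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0        # boundary value u(0) = 0
            right = u[i + 1] if i < n - 1 else 0.0   # boundary value u(1) = 0
            new[i] = 0.5 * (left + right + h * h * rhs[i])
        u = new
    return x, u
```

Jacobi's convergence factor approaches 1 as the grid is refined. This slowdown is why multigrid or FFT-based solvers are preferred for large problems.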
• arXiv.cs.MS Pub Date : 2020-10-26
Nils Kohl; Ulrich Rüde

We employ textbook multigrid efficiency (TME), as introduced by Achi Brandt, to construct an asymptotically optimal monolithic multigrid solver for the Stokes system. The geometric multigrid solver builds upon the concept of hierarchical hybrid grids (HHG), which is extended to higher-order finite-element discretizations, and a corresponding matrix-free implementation. The computational cost of the

Updated: 2020-10-30
• arXiv.cs.MS Pub Date : 2020-10-20
George Bisbas; Fabio Luporini; Mathias Louboutin; Rhodri Nelson; Gerard Gorman; Paul H. J. Kelly

Stencil kernels dominate a range of scientific applications including seismic and medical imaging, image processing, and neural networks. Temporal blocking is a performance optimisation that aims to reduce the required memory bandwidth of stencil computations by re-using data from the cache for multiple time steps. It has already been shown to be beneficial for this class of algorithms. However, optimising

Updated: 2020-10-26
• arXiv.cs.MS Pub Date : 2020-10-16
Matthew Roughan

The polylogarithm function is one of a constellation of important mathematical functions. It has a long history, many connections to other special functions and series, and many applications, for instance in statistical physics. However, the practical aspects of its numerical evaluation have not received the kind of comprehensive treatment lavished on its siblings. Only a handful of formal publications

Updated: 2020-10-26
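For |z| < 1 the polylogarithm is defined by the series $\mathrm{Li}_s(z) = \sum_{k \ge 1} z^k / k^s$. Summing this series directly is the simplest evaluation method. The sketch below assumes real s ≥ 0 and real z with |z| < 1, and stops once a bound on the remaining tail falls below a tolerance. It deliberately ignores the harder regions (|z| near or above 1) that motivate dedicated algorithms. `polylog_series` is a hypothetical name.

```python
def polylog_series(s, z, tol=1e-16, max_terms=100000):
    """Li_s(z) = sum_{k>=1} z**k / k**s for real s >= 0 and real |z| < 1."""
    if not abs(z) < 1:
        raise ValueError("series requires |z| < 1")
    total = 0.0
    power = 1.0
    for k in range(1, max_terms + 1):
        power *= z  # z**k
        term = power / k ** s
        total += term
        # For s >= 0 the term magnitudes decrease, so the remaining tail
        # is at most |term| * |z| / (1 - |z|).
        if abs(term) * abs(z) / (1 - abs(z)) < tol:
            break
    return total
```

Near |z| = 1 the number of terms grows without bound, which is exactly where the dedicated algorithms come in.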
• arXiv.cs.MS Pub Date : 2020-10-17
Anas M. Al-Oraiqat; Alexander Y. Ivanov; Yuriy A. Ivanov

The problem of optimizing the rolling dynamics model is considered. Such a model provides safe movement at high frequency when interacting with the railway, and also allows the dynamic parameters to be evaluated when designing new locomotives and modernizing existing ones. The object of this research is a rail transport dynamic system model. The article's purpose is to increase the efficiency of the digital

Updated: 2020-10-20
• arXiv.cs.MS Pub Date : 2020-10-15
Sebastiano Vascon; Samuel Rota Bulò; Vittorio Murino; Marcello Pelillo

DSLib is an open-source implementation of the Dominant Set (DS) clustering algorithm written entirely in Matlab. The DS method is a graph-based clustering technique rooted in evolutionary game theory that has started to gain considerable interest in the computer science community. Thanks to its duality with game theory and its strict relation to the notion of maximal clique, it has been explored in several

Updated: 2020-10-17
• arXiv.cs.MS Pub Date : 2020-10-10
Liang Yuan; Hang Cao; Yunquan Zhang; Kun Li; Pengqi Lu; Yue Yue

Stencil computations represent a very common class of nested loops in scientific and engineering applications. Exploiting vector units in modern CPUs is crucial to achieving peak performance. Previous vectorization approaches often consider the data space, in particular the innermost unit-strided loop. This leads to the well-known data alignment conflict problem that vector loads are overlapped due to

Updated: 2020-10-13
• arXiv.cs.MS Pub Date : 2020-10-09
Christos Psarras; Lars Karlsson; Paolo Bientinesi

Tensor decompositions, such as CANDECOMP/PARAFAC (CP), are widely used in a variety of applications, such as chemometrics, signal processing, and machine learning. A broadly used method for computing such decompositions relies on the Alternating Least Squares (ALS) algorithm. When the number of components is small, regardless of its implementation, ALS exhibits low arithmetic intensity, which severely

Updated: 2020-10-12
• arXiv.cs.MS Pub Date : 2020-10-08
D. Groen; H. Arabnejad; V. Jancauskas; W. N. Edeling; F. Jansson; R. A. Richardson; J. Lakhlili; L. Veen; B. Bosak; P. Kopta; D. W. Wright; N. Monnieri; P. Karlshoefer; D. Suleimenova; R. Sinclair; M. Vassaux; A. Nikishova; M. Bieniek; O. O. Luk; M. Kulczewski; E. Raffin; D. Crommelin; O. Hoenen; D. P. Coster; T. Piontek; P. V. Coveney

We present the VECMA toolkit (VECMAtk), a flexible software environment for single and multiscale simulations that introduces directly applicable and reusable procedures for verification, validation (V&V), sensitivity analysis (SA) and uncertainty quantification (UQ). It enables users to verify key aspects of their applications, systematically compare and validate the simulation outputs against observational

Updated: 2020-10-11
• arXiv.cs.MS Pub Date : 2020-10-08
Thien Nguyen; Anthony Santana; Tyler Kharazi; Daniel Claudino; Hal Finkel; Alexander McCaskey

We present qcor - a language extension to C++ and compiler implementation that enables heterogeneous quantum-classical programming, compilation, and execution in a single-source context. Our work provides a first-of-its-kind C++ compiler enabling high-level quantum kernel (function) expression in a quantum-language agnostic manner, as well as a hardware-agnostic, retargetable compiler workflow targeting

Updated: 2020-10-11
• arXiv.cs.MS Pub Date : 2020-10-04
William S. Moses; Valentin Churavy

Applying differentiable programming techniques and machine learning algorithms to foreign programs requires developers to either rewrite their code in a machine learning framework, or otherwise provide derivatives of the foreign code. This paper presents Enzyme, a high-performance automatic differentiation (AD) compiler plugin for the LLVM compiler framework capable of synthesizing gradients of statically

Updated: 2020-10-06
• arXiv.cs.MS Pub Date : 2020-10-02
C. Palenzuela; B. Miñano; A. Arbona; C. Bona-Casas; C. Bona; J. Massó

Simflowny is an open platform which automatically generates efficient parallel code of scientific dynamical models for different simulation frameworks. Here we present major upgrades to this software to simultaneously support a fairly generic family of partial differential equations. These equations can be discretized using: (i) standard finite-difference for systems with derivatives up to any order

Updated: 2020-10-05
• arXiv.cs.MS Pub Date : 2020-10-01
Amir Shahmoradi; Fatemeh Bagheri; Joshua Alexander Osborne

ParaMonte::Python (standing for Parallel Monte Carlo in Python) is a serial and MPI-parallelized library of (Markov Chain) Monte Carlo (MCMC) routines for sampling mathematical objective functions, in particular, the posterior distributions of parameters in Bayesian modeling and analysis in data science, Machine Learning, and scientific inference in general. In addition to providing access to fast

Updated: 2020-10-05
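The basic building block of such MCMC samplers is the random-walk Metropolis algorithm. The serial sketch below targets a 1D unnormalized log-density. It is plain Metropolis with a fixed Gaussian proposal, not ParaMonte's own (adaptive) samplers. The function name, step size, and seed are assumptions.

```python
import math
import random


def metropolis(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis for a 1D target given by an unnormalized log-density."""
    rng = random.Random(seed)
    x, logp = x0, log_density(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        logp_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        samples.append(x)  # a rejected move repeats the current state
    return samples
```

Because only density ratios enter the acceptance test, the normalizing constant of a Bayesian posterior is never needed. That property is what makes the method suitable for the posterior sampling described above.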
• arXiv.cs.MS Pub Date : 2020-10-01
Simon Heybrock; Owen Arnold; Igor Gudich; Daniel Nixon; Neil Vaytet

Scipp is heavily inspired by the Python library xarray. It enriches raw NumPy-like multi-dimensional arrays of data by adding named dimensions and associated coordinates. Multiple arrays are combined into datasets. On top of this, scipp introduces (i) implicit handling of physical units, (ii) implicit propagation of uncertainties, (iii) support for histograms, i.e., bin-edge coordinate axes, which

Updated: 2020-10-02
• arXiv.cs.MS Pub Date : 2020-09-29
Orestis Zachariadis; Nitin Satpute; Juan Gómez-Luna; Joaquín Olivares

Sparse general matrix-matrix multiplication (spGEMM) is an essential component in many scientific and data analytics applications. However, the sparsity pattern of the input matrices and the interaction of their patterns make spGEMM challenging. Modern GPUs include Tensor Core Units (TCUs), which specialize in dense matrix multiplication. Our aim is to re-purpose TCUs for sparse matrices. The key idea

Updated: 2020-10-02
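For reference, the classic CPU formulation of spGEMM is Gustavson's row-by-row algorithm. It accumulates each output row in a sparse accumulator. The sketch below stores matrices as dicts of rows, a hypothetical simplified format rather than CSR on a GPU. It does not reflect the Tensor Core approach described above, and `spgemm` is a hypothetical name.

```python
def spgemm(a, b):
    """Multiply sparse matrices stored as {row: {col: value}} using Gustavson's algorithm."""
    c = {}
    for i, a_row in a.items():
        acc = {}  # sparse accumulator for row i of C
        for k, a_ik in a_row.items():
            # Row i of C is a combination of the rows of B selected by row i of A.
            for j, b_kj in b.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        # Drop explicit zeros produced by cancellation.
        row = {j: v for j, v in acc.items() if v != 0}
        if row:
            c[i] = row
    return c
```

The irregular, data-dependent access into rows of B is what makes the sparsity pattern, and the interaction of the two patterns, the main performance difficulty.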
Contents have been reproduced by permission of the publishers.
