Current journal: Scientific Programming
  • A Pattern-Based Software Testing Framework for Exploitability Evaluation of Metadata Corruption Vulnerabilities
    Sci. Program. (IF 0.963) Pub Date : 2020-09-27
    Fenglei Deng; Jian Wang; Bin Zhang; Chao Feng; Zhiyuan Jiang; Yunfei Su

    In recent years, increased attention is being given to software quality assurance and protection. With considerable verification and protection schemes proposed and deployed, today’s software unfortunately still fails to be protected from cyberattacks, especially in the presence of insecure organization of heap metadata. In this paper, we aim to explore whether heap metadata could be corrupted and

    Updated: 2020-09-28
  • Hybrid MPI and CUDA Parallelization for CFD Applications on Multi-GPU HPC Clusters
    Sci. Program. (IF 0.963) Pub Date : 2020-09-25
    Jianqi Lai; Hang Yu; Zhengyu Tian; Hua Li

    Graphics processing units (GPUs) have a strong floating-point capability and a high memory bandwidth in data parallelism and have been widely used in high-performance computing (HPC). Compute unified device architecture (CUDA) is used as a parallel computing platform and programming model for the GPU to reduce the complexity of programming. The programmable GPUs are becoming popular in computational

    Updated: 2020-09-25
  • Analyzing Influenza Virus Sequences using Binary Encoding Approach
    Sci. Program. (IF 0.963) Pub Date : 2012
    Ham Ching Lam; Srinand Sreevatsan; Daniel Boley

    Capturing mutation patterns of each individual influenza virus sequence is often challenging; in this paper, we demonstrated that using a binary encoding scheme coupled with a dimension reduction technique, we were able to capture the intrinsic mutation pattern of the virus. Our approach looks at the variance between sequences instead of the commonly used p-distance or Hamming distance. We first convert

    Updated: 2020-09-25
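The abstract above contrasts a variance-based analysis with the p-distance and the Hamming distance. As a rough illustration of those two baseline measures, and of what a binary encoding of a sequence could look like, here is a minimal Python sketch. The one-hot table `ONE_HOT` and all function names are assumptions made for illustration; the truncated abstract does not state the paper's actual encoding.

```python
# Hypothetical one-hot code: 4 bits per nucleotide (an assumption, not the paper's scheme).
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def encode(seq):
    """Turn a nucleotide string into a flat list of 0/1 values."""
    bits = []
    for base in seq.upper():
        bits.extend(ONE_HOT[base])
    return bits


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def p_distance(a, b):
    """Proportion of sites that differ between two aligned sequences."""
    return hamming(a, b) / len(a)


if __name__ == "__main__":
    s1, s2 = "ACGT", "ACGA"
    print(hamming(s1, s2), p_distance(s1, s2))
    print(hamming(encode(s1), encode(s2)))
```

With a one-hot code, each mismatching site changes exactly two bits, so the Hamming distance between the encoded vectors is twice the site-level Hamming distance.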
  • QR Factorization for the Cell Broadband Engine
    Sci. Program. (IF 0.963) Pub Date : 2009
    Jakub Kurzak; Jack Dongarra

    The QR factorization is one of the most important operations in dense linear algebra, offering a numerically stable method for solving linear systems of equations including overdetermined and underdetermined systems. Modern implementations of the QR factorization, such as the one in the LAPACK library, suffer from performance limitations due to the use of matrix–vector type operations in the phase

    Updated: 2020-09-25
  • Mining Low-Variance Biclusters to Discover Coregulation Modules in Sequencing Datasets
    Sci. Program. (IF 0.963) Pub Date : 2012
    Zhen Hu; Raj Bhatnagar

    High-throughput sequencing (CHIP-Seq) data exhibit binding events with possible binding locations and their strengths, followed by interpretation of the locations of peaks. Recent methods tend to summarize all CHIP-Seq peaks detected within a limited up and down region of each gene into one real-valued score in order to quantify the probability of regulation in a region. Applying subspace clustering

    Updated: 2020-09-25
  • A Framework for Low-Communication 1-D FFT
    Sci. Program. (IF 0.963) Pub Date : 2013
    Ping Tak Peter Tang; Jongsoo Park; Daehyun Kim; Vladimir Petrov

    In high-performance computing on distributed-memory systems, communication often represents a significant part of the overall execution time. The relative cost of communication will certainly continue to rise as compute-density growth follows the current technology and industry trends. Design of lower-communication alternatives to fundamental computational algorithms has become an important field of

    Updated: 2020-09-25
  • Efficient Backprojection-Based Synthetic Aperture Radar Computation with Many-Core Processors
    Sci. Program. (IF 0.963) Pub Date : 2013
    Jongsoo Park; Ping Tak Peter Tang; Mikhail Smelyanskiy; Daehyun Kim; Thomas Benson

    Tackling computationally challenging problems with high efficiency often requires the combination of algorithmic innovation, advanced architecture, and thorough exploitation of parallelism. We demonstrate this synergy through synthetic aperture radar (SAR) via backprojection, an image reconstruction method that can require hundreds of TFLOPS. Computation cost is significantly reduced by our new algorithm

    Updated: 2020-09-25
  • Reshaping Text Data for Efficient Processing on Amazon EC2
    Sci. Program. (IF 0.963) Pub Date : 2011
    Gabriela Turcu; Ian Foster; Svetlozar Nestorov

    Text analysis tools are nowadays required to process increasingly large corpora which are often organized as small files (abstracts, news articles, etc.). Cloud computing offers a convenient, on-demand, pay-as-you-go computing environment for solving such problems. We investigate provisioning on the Amazon EC2 cloud from the user perspective, attempting to provide a scheduling strategy that is both

    Updated: 2020-09-25
  • Experiences with Resource Provisioning for Scientific Workflows Using Corral
    Sci. Program. (IF 0.963) Pub Date : 2010
    Gideon Juve; Ewa Deelman; Karan Vahi; Gaurang Mehta

    The development of grid and workflow technologies has enabled complex, loosely coupled scientific applications to be executed on distributed resources. Many of these applications consist of large numbers of short-duration tasks whose runtimes are heavily influenced by delays in the execution environment. Such applications often perform poorly on the grid because of the large scheduling overheads commonly

    Updated: 2020-09-25
  • From Data to Knowledge to Discoveries: Artificial Intelligence and Scientific Workflows
    Sci. Program. (IF 0.963) Pub Date : 2009
    Yolanda Gil

    Scientific computing has entered a new era of scale and sharing with the arrival of cyberinfrastructure facilities for computational experimentation. A key emerging concept is scientific workflows, which provide a declarative representation of complex scientific applications that can be automatically managed and executed in distributed shared resources. In the coming decades, computational experimentation

    Updated: 2020-09-25
  • CaKernel – A Parallel Application Programming Framework for Heterogenous Computing Architectures
    Sci. Program. (IF 0.963) Pub Date : 2011
    Marek Blazewicz; Steven R. Brandt; Michal Kierzynka; Krzysztof Kurowski; Bogdan Ludwiczak; Jian Tao; Jan Weglarz

    With the recent advent of new heterogeneous computing architectures there is still a lack of parallel problem solving environments that can help scientists use hybrid supercomputers easily and efficiently. Many scientific simulations that use structured grids to solve partial differential equations in fact rely on stencil computations. Stencil computations have become crucial in solving many challenging

    Updated: 2020-09-25
  • Optimizing UPC Programs for Multi-Core Systems
    Sci. Program. (IF 0.963) Pub Date : 2010
    Yili Zheng

    The Partitioned Global Address Space (PGAS) model of Unified Parallel C (UPC) can help users express and manage application data locality on non-uniform memory access (NUMA) multi-core shared-memory systems to get good performance. First, we describe several UPC program optimization techniques that are important to achieving good performance on NUMA multi-core computers with examples and quantitative

    Updated: 2020-09-25
  • State-of-the-art in Heterogeneous Computing
    Sci. Program. (IF 0.963) Pub Date : 2010
    Andre R. Brodtkorb; Christopher Dyken; Trond R. Hagen; Jon M. Hjelmervik; Olaf O. Storaasli

    Node level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy and/or cost efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a good overview and

    Updated: 2020-09-25
  • Implementation of the Two-Point Angular Correlation Function on a High-Performance Reconfigurable Computer
    Sci. Program. (IF 0.963) Pub Date : 2009
    Volodymyr V. Kindratenko; Adam D. Myers; Robert J. Brunner

    We present a parallel implementation of an algorithm for calculating the two-point angular correlation function as applied in the field of computational cosmology. The algorithm has been specifically developed for a reconfigurable computer. Our implementation utilizes a microprocessor and two reconfigurable processors on a dual-MAP SRC-6 system. The two reconfigurable processors are used as two application-specific

    Updated: 2020-09-25
  • Special Issue: SC13 – The International Conference for High Performance Computing, Networking, Storage and Analysis
    Sci. Program. (IF 0.963) Pub Date : 2014
    William Gropp; Satoshi Matsuoka

    The technical papers program for SC13 received 449 submissions, of which 90 were selected for the program, giving an acceptance rate of 20%. A rigorous peer review process, including author rebuttals and a 1.5-day face-to-face program committee meeting, ensured that selected papers were the very best in our field. One of the tasks at the face-to-face meeting was also to select finalists for the best

    Updated: 2020-09-25
  • Efficient and Reliable Network Tomography in Heterogeneous Networks Using Bittorrent Broadcasts and Clustering Algorithms
    Sci. Program. (IF 0.963) Pub Date : 2013
    Kiril Dichev; Fergal Reid; Alexey Lastovetsky

    In the area of network performance and discovery, network tomography focuses on reconstructing network properties using only end-to-end measurements at the application layer. One challenging problem in network tomography is reconstructing available bandwidth along all links during multiple source/multiple destination transmissions. The traditional measurement procedures used for bandwidth tomography

    Updated: 2020-09-25
  • Acceleration of a CFD Code with a GPU
    Sci. Program. (IF 0.963) Pub Date : 2010
    Dennis C. Jespersen

    The Computational Fluid Dynamics code OVERFLOW includes as one of its solver options an algorithm which is a fairly small piece of code but which accounts for a significant portion of the total computational time. This paper studies some of the issues in accelerating this piece of code by using a Graphics Processing Unit (GPU). The algorithm needs to be modified to be suitable for a GPU and attention

    Updated: 2020-09-25
  • Biological Knowledge Discovery and Data Mining
    Sci. Program. (IF 0.963) Pub Date : 2012
    Mohammad Al Hasan; Jun Huan; Jake Chen; Mohammed J. Zaki

    This article has no abstract.

    Updated: 2020-09-25
  • Containment Domains: A Scalable, Efficient and Flexible Resilience Scheme for Exascale Systems
    Sci. Program. (IF 0.963) Pub Date : 2013
    Jinsuk Chung; Ikhwan Lee; Michael Sullivan; Jee Ho Ryoo; Dong Wan Kim; Doe Hyun Yoon; Larry Kaplan; Mattan Erez

    This paper describes and evaluates a scalable and efficient resilience scheme based on the concept of containment domains. Containment domains are a programming construct that enable applications to express resilience needs and to interact with the system to tune and specialize error detection, state preservation and restoration, and recovery schemes. Containment domains have weak transactional semantics

    Updated: 2020-09-25
  • McrEngine: A Scalable Checkpointing System Using Data-Aware Aggregation and Compression
    Sci. Program. (IF 0.963) Pub Date : 2013
    Tanzima Zerin Islam; Kathryn Mohror; Saurabh Bagchi; Adam Moody; Bronis R. de Supinski; Rudolf Eigenmann

    High performance computing (HPC) systems use checkpoint-restart to tolerate failures. Typically, applications store their states in checkpoints on a parallel file system (PFS). As applications scale up, checkpoint-restart incurs high overheads due to contention for PFS resources. The high overheads force large-scale applications to reduce checkpoint frequency, which means more compute time is lost

    Updated: 2020-09-25
  • A Computational Framework for Flood Risk Assessment in The Netherlands
    Sci. Program. (IF 0.963) Pub Date : 2010
    A.A. Markus; W.M.G. Courage; M.C.L.M. van Mierlo

    The safety of dikes in The Netherlands, located in the delta of the rivers Rhine, Meuse and Scheldt, has been the subject of debate for more than ten years. The safety (or flood risk) of a particular area may depend on the safety of other areas. This is referred to as effects of river system behaviour on flood risk (quantified as the estimated number of casualties and economic damage). A computational

    Updated: 2020-09-25
  • Automating Embedded Analysis Capabilities and Managing Software Complexity in Multiphysics Simulation, Part I: Template-Based Generic Programming
    Sci. Program. (IF 0.963) Pub Date : 2012
    Roger P. Pawlowski; Eric T. Phipps; Andrew G. Salinger

    An approach for incorporating embedded simulation and analysis capabilities in complex simulation codes through template-based generic programming is presented. This approach relies on templating and operator overloading within the C++ language to transform a given calculation into one that can compute a variety of additional quantities that are necessary for many state-of-the-art simulation and analysis

    Updated: 2020-09-25
  • MATLAB-Like Scripting of Java Scientific Libraries in ScalaLab
    Sci. Program. (IF 0.963) Pub Date : 2014
    Stergios Papadimitriou; Seferina Mavroudi; Kostas Theofilatos; Spiridon Likothanasis

    Although there are a lot of robust and effective scientific libraries in Java, the utilization of these libraries in pure Java is difficult and cumbersome, especially for the average scientist who lacks expertise in software development. We illustrate that ScalaLab presents an easier and more productive MATLAB-like front end. Also, the main strengths and weaknesses of the core Java libraries of ScalaLab

    Updated: 2020-09-25
  • Manycore Performance-Portability: Kokkos Multidimensional Array Library
    Sci. Program. (IF 0.963) Pub Date : 2012
    H. Carter Edwards; Daniel Sunderland; Vicki Porter; Chris Amsler; Sam Mish

    Large, complex scientific and engineering application codes have a significant investment in computational kernels to implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge in that these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos

    Updated: 2020-09-25
  • Evaluating Multicore Algorithms on the Unified Memory Model
    Sci. Program. (IF 0.963) Pub Date : 2009
    John E. Savage; Mohammad Zubair

    One of the challenges to achieving good performance on multicore architectures is the effective utilization of the underlying memory hierarchy. While this is an issue for single-core architectures, it is a critical problem for multicore chips. In this paper, we formulate the unified multicore model (UMM) to help understand the fundamental limits on cache performance on these architectures. The UMM

    Updated: 2020-09-25
  • Exploiting Fine-Grain Thread Parallelism on Multicore Architectures
    Sci. Program. (IF 0.963) Pub Date : 2009
    P.E. Hadjidoukas; G.Ch. Philos; V.V. Dimakopoulos

    In this work we present a runtime threading system which provides an efficient substrate for fine-grain parallelism, suitable for deployment in multicore platforms. Its architecture encompasses a number of optimizations that make it particularly effective in managing a large number of threads with low overheads. The runtime system has been integrated into an OpenMP implementation to allow for transparent

    Updated: 2020-09-25
  • Direction-Optimizing Breadth-First Search
    Sci. Program. (IF 0.963) Pub Date : 2013
    Scott Beamer; Krste Asanović; David Patterson

    Breadth-First Search is an important kernel used by many graph-processing applications. In many of these emerging applications of BFS, such as analyzing social networks, the input graphs are low-diameter and scale-free. We propose a hybrid approach that is advantageous for low-diameter graphs, which combines a conventional top-down algorithm along with a novel bottom-up algorithm. The bottom-up algorithm

    Updated: 2020-09-25
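The hybrid idea in the abstract above, switching between a top-down step and a bottom-up step, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the graph is undirected and stored as a dict of adjacency lists, and the switching rule (go bottom-up once the frontier exceeds a fraction `switch_fraction` of all vertices) is a hypothetical stand-in for whatever heuristic the paper uses.

```python
def hybrid_bfs(adj, source, switch_fraction=0.05):
    """Return BFS depths from `source` in an undirected graph.

    adj: dict mapping each vertex to a list of its neighbours.
    switch_fraction: hypothetical threshold; the bottom-up step is used
    when the frontier holds more than this fraction of all vertices.
    """
    depth = {source: 0}
    frontier = {source}
    level = 0
    while frontier:
        level += 1
        next_frontier = set()
        if len(frontier) > switch_fraction * len(adj):
            # Bottom-up: every unvisited vertex checks whether any of its
            # neighbours is in the current frontier and stops at the first hit.
            for v in adj:
                if v in depth:
                    continue
                if any(u in frontier for u in adj[v]):
                    depth[v] = level
                    next_frontier.add(v)
        else:
            # Top-down: every frontier vertex scans its neighbours.
            for u in frontier:
                for v in adj[u]:
                    if v not in depth:
                        depth[v] = level
                        next_frontier.add(v)
        frontier = next_frontier
    return depth


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
    print(hybrid_bfs(graph, 0))
```

Both steps assign the same depths; they differ only in how much work they do. The bottom-up step can stop scanning a vertex's neighbours at the first frontier hit, which tends to pay off when the frontier is large.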
  • MPI Runtime Error Detection with MUST: Advances in Deadlock Detection
    Sci. Program. (IF 0.963) Pub Date : 2013
    Tobias Hilbrich; Joachim Protze; Martin Schulz; Bronis R. de Supinski; Matthias S. Müller

    The widely used Message Passing Interface (MPI) is complex and rich. As a result, application developers require automated tools to avoid and to detect MPI programming errors. We present the Marmot Umpire Scalable Tool (MUST) that detects such errors with significantly increased scalability. We present improvements to our graph-based deadlock detection approach for MPI, which cover future MPI extensions

    Updated: 2020-09-25
  • High Performance Protein Sequence Database Scanning on the Cell Broadband Engine
    Sci. Program. (IF 0.963) Pub Date : 2009
    Adrianto Wirawan; Bertil Schmidt; Huiliang Zhang; Chee Keong Kwoh

    The enormous growth of biological sequence databases has caused bioinformatics to move rapidly towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing rapidly as well. The recent emergence of low cost parallel multicore accelerator technologies has made it possible to reduce execution times of many bioinformatics applications

    Updated: 2020-09-25
  • Solving PDEs with Intrepid
    Sci. Program. (IF 0.963) Pub Date : 2012
    P. Bochev; H.C. Edwards; R.C. Kirby; K. Peterson; D. Ridzal

    Intrepid is a Trilinos package for advanced discretizations of Partial Differential Equations (PDEs). The package provides a comprehensive set of tools for local, cell-based construction of a wide range of numerical methods for PDEs. This paper describes the mathematical ideas and software design principles incorporated in the package. We also provide representative examples showcasing the use of Intrepid

    Updated: 2020-09-25
  • Strong Scaling Analysis of a Parallel, Unstructured, Implicit Solver and the Influence of the Operating System Interference
    Sci. Program. (IF 0.963) Pub Date : 2009
    Onkar Sahni; Christopher D. Carothers; Mark S. Shephard; Kenneth E. Jansen

    PHASTA falls under the category of high-performance scientific computation codes designed for solving partial differential equations (PDEs). It is a massively parallel, unstructured, implicit solver with particular emphasis on fluid dynamics (CFD) applications. More specifically, PHASTA is a parallel, hierarchic, adaptive, stabilized, transient analysis code that effectively employs advanced anisotropic

    Updated: 2020-09-25
  • Implementation and Performance Modeling of Deterministic Particle Transport (Sweep3D) on the IBM Cell/B.E.
    Sci. Program. (IF 0.963) Pub Date : 2009
    Olaf Lubeck; Michael Lang; Ram Srinivasan; Greg Johnson

    The IBM Cell Broadband Engine (BE) is a novel multi-core chip with the potential for the demanding floating point performance that is required for high-fidelity scientific simulations. However, data movement within the chip can be a major challenge to realizing the benefits of the peak floating point rates. In this paper, we present the results of implementing Sweep3D on the Cell/B.E. using an intra-chip

    Updated: 2020-09-25
  • Concurrent Collections
    Sci. Program. (IF 0.963) Pub Date : 2010
    Zoran Budimlić; Michael Burke; Vincent Cavé; Kathleen Knobe; Geoff Lowney; Ryan Newton; Jens Palsberg; David Peixotto; Vivek Sarkar; Frank Schlimbach; Sağnak Taşırlar

    We introduce the Concurrent Collections (CnC) programming model. CnC supports flexible combinations of task and data parallelism while retaining determinism. CnC is implicitly parallel, with the user providing high-level operations along with semantic ordering constraints that together form a CnC graph. We formally describe the execution semantics of CnC and prove that the model guarantees deterministic

    Updated: 2020-09-25
  • Improving Accuracy for Matrix Multiplications on GPUs
    Sci. Program. (IF 0.963) Pub Date : 2011
    Matthew Badin; Lubomir Bic; Michael Dillencourt; Alexandru Nicolau

    Reproducibility of an experiment is a commonly used metric to determine its validity. Within scientific computing, this can become difficult due to the accumulation of floating point rounding errors in the numerical computation, greatly reducing the accuracy of the computation. Matrix multiplication is particularly susceptible to these rounding errors which is why there exist so many solutions, ranging

    Updated: 2020-09-25
  • Erratum
    Sci. Program. (IF 0.963) Pub Date : 2009
    Sandya S. Mannarswamy

    This article has no abstract.

    Updated: 2020-09-25
  • Implementation of Scientific Computing Applications on the Cell Broadband Engine
    Sci. Program. (IF 0.963) Pub Date : 2009
    Guochun Shi; Volodymyr V. Kindratenko; Ivan S. Ufimtsev; Todd J. Martinez; James C. Phillips; Steven A. Gottlieb

    The Cell Broadband Engine architecture is a revolutionary processor architecture well suited for many scientific codes. This paper reports on an effort to implement several traditional high-performance scientific computing applications on the Cell Broadband Engine processor, including molecular dynamics, quantum chromodynamics and quantum chemistry codes. The paper discusses data and code restructuring

    Updated: 2020-09-25
  • Storage QoS Provisioning for Execution Programming of Data-Intensive Applications
    Sci. Program. (IF 0.963) Pub Date : 2012
    Renata Słota

    In this paper a method for execution programming of data-intensive applications is presented. The method is based on storage Quality of Service (SQoS) provisioning. SQoS provisioning uses the semantic based storage monitoring based on a storage resources model and a storage performance management. Test results show the gain for the execution time when using the QStorMan toolkit which implements the

    Updated: 2020-09-25
  • Tpetra, and the Use of Generic Programming in Scientific Computing
    Sci. Program. (IF 0.963) Pub Date : 2012
    C.G. Baker; M.A. Heroux

    We present Tpetra, a Trilinos package for parallel linear algebra primitives implementing the Petra object model. We describe Tpetra's design, based on generic programming via C++ templated types and template metaprogramming. We discuss some benefits of this approach in the context of scientific computing, with illustrations consisting of code and notable empirical results.

    Updated: 2020-09-25
  • Python for Scientific Computing Education: Modeling of Queueing Systems
    Sci. Program. (IF 0.963) Pub Date : 2014
    Vladimiras Dolgopolovas; Valentina Dagienė; Saulius Minkevičius; Leonidas Sakalauskas

    In this paper, we present the methodology for the introduction to scientific computing based on model-centered learning. We propose multiphase queueing systems as a basis for learning objects. We use Python and parallel programming for implementing the models and present the computer code and results of stochastic simulations.

    Updated: 2020-09-25
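To give a feel for the kind of model the abstract above mentions, here is a minimal Python sketch of a multiphase (tandem) queue: each customer passes through a series of single-server FIFO phases with unlimited waiting room. The function name `tandem_departures`, the exponential rates and the seed are assumptions chosen for illustration; the paper's own code and parameters are not shown in this listing.

```python
import random


def tandem_departures(arrivals, services):
    """Departure time of each customer from the last phase.

    arrivals: non-decreasing arrival times at the first phase.
    services: services[i][j] is the service time of customer i in phase j.
    A customer starts phase j once it has left phase j-1 and the
    previous customer has left phase j.
    """
    n_phases = len(services[0]) if services else 0
    prev = [0.0] * n_phases  # previous customer's departure time per phase
    out = []
    for t_arrive, svc in zip(arrivals, services):
        ready = t_arrive
        cur = []
        for j in range(n_phases):
            start = max(ready, prev[j])
            ready = start + svc[j]
            cur.append(ready)
        prev = cur
        out.append(ready)
    return out


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    n, phases = 1000, 3
    t, arrivals = 0.0, []
    for _ in range(n):
        t += rng.expovariate(0.8)  # assumed arrival rate
        arrivals.append(t)
    services = [[rng.expovariate(1.0) for _ in range(phases)] for _ in range(n)]
    departures = tandem_departures(arrivals, services)
    mean_sojourn = sum(d - a for d, a in zip(departures, arrivals)) / n
    print(f"mean sojourn time: {mean_sojourn:.3f}")
```

Because each replication only needs its own random stream, independent runs of such a simulation can be distributed across processes, which is one plausible way parallel programming enters a study like this.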
  • Template Metaprogramming Techniques for Concept-Based Specialization
    Sci. Program. (IF 0.963) Pub Date : 2013
    Bruno Bachelet; Antoine Mahul; Loïc Yon

    In generic programming, software components are parameterized on types. When available, a static specialization mechanism allows selecting, for a given set of parameters, a more suitable version of a generic component than its primary version. The normal C++ template specialization mechanism is based on the type pattern of the parameters, which is not always the best way to guide the specialization

    Updated: 2020-09-25
  • A Divide and Conquer Strategy for Scaling Weather Simulations with Multiple Regions of Interest
    Sci. Program. (IF 0.963) Pub Date : 2013
    Preeti Malakar; Thomas George; Sameer Kumar; Rashmi Mittal; Vijay Natarajan; Yogish Sabharwal; Vaibhav Saxena; Sathish S. Vadhiyar

    Accurate and timely prediction of weather phenomena, such as hurricanes and flash floods, requires high-fidelity, compute-intensive simulations of multiple finer regions of interest within a coarse simulation domain. Current weather applications execute these nested simulations sequentially using all the available processors, which is sub-optimal due to their sub-linear scalability. In this work, we

    Updated: 2020-09-25
  • Assessing the Effects of Data Compression in Simulations Using Physically Motivated Metrics
    Sci. Program. (IF 0.963) Pub Date : 2014
    Daniel Laney; Steven Langer; Christopher Weber; Peter Lindstrom; Al Wegener

    This paper examines whether lossy compression can be used effectively in physics simulations as a possible strategy to combat the expected data-movement bottleneck in future high performance computing architectures. We show that, for the codes and simulations we tested, compression levels of 3–5X can be applied without causing significant changes to important physical quantities. Rather than applying

    Updated: 2020-09-25
  • Enabling Locality-Aware Computations in OpenMP
    Sci. Program. (IF 0.963) Pub Date : 2010
    Lei Huang; Haoqiang Jin; Liqi Yi; Barbara Chapman

    Locality of computation is key to obtaining high performance on a broad variety of parallel architectures and applications. It is moreover an essential component of strategies for energy-efficient computing. OpenMP is a widely available industry standard for shared memory programming. With the pervasive deployment of multi-core computers and the steady growth in core count, a productive programming

    Updated: 2020-09-25
  • Autonomic Management of Application Workflows on Hybrid Computing Infrastructure
    Sci. Program. (IF 0.963) Pub Date : 2011
    Hyunjoo Kim; Yaakoub el-Khamra; Ivan Rodero; Shantenu Jha; Manish Parashar

    In this paper, we present a programming and runtime framework that enables the autonomic management of complex application workflows on hybrid computing infrastructures. The framework is designed to address system and application heterogeneity and dynamics to ensure that application objectives and constraints are satisfied. The need for such autonomic system and application management is becoming critical

    Updated: 2020-09-25
  • A Programming Model Performance Study Using the NAS Parallel Benchmarks
    Sci. Program. (IF 0.963) Pub Date : 2010
    Hongzhang Shan; Filip Blagojević; Seung-Jai Min; Paul Hargrove; Haoqiang Jin; Karl Fuerlinger; Alice Koniges; Nicholas J. Wright

    Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool

    Updated: 2020-09-25
  • Implementing a Parallel Matrix Factorization Library on the Cell Broadband Engine
    Sci. Program. (IF 0.963) Pub Date : 2009
    B.C. Vishwas; Abhishek Gadia; Mainak Chaudhuri

    Matrix factorization (often called decomposition) is a frequently used kernel in a large number of applications ranging from linear solvers to data clustering and machine learning. The central contribution of this paper is a thorough performance study of four popular matrix factorization techniques, namely, LU, Cholesky, QR and SVD on the STI Cell broadband engine. The paper explores algorithmic

    Updated: 2020-09-25
  • Early Observations on the Performance of Windows Azure
    Sci. Program. (IF 0.963) Pub Date : 2011
    Zach Hill; Jie Li; Ming Mao; Arkaitz Ruiz-Alvarez; Marty Humphrey

    A significant open issue in cloud computing is the real performance of the infrastructure. Few, if any, cloud providers or technologies offer quantitative performance guarantees. Regardless of the potential advantages of the cloud in comparison to enterprise-deployed applications, cloud infrastructures may ultimately fail if deployed applications cannot predictably meet behavioral requirements. In

    Updated: 2020-09-25
  • CellSs: Scheduling Techniques to Better Exploit Memory Hierarchy
    Sci. Program. (IF 0.963) Pub Date : 2009
    Pieter Bellens; Josep M. Perez; Felipe Cabarcas; Alex Ramirez; Rosa M. Badia; Jesus Labarta

    Cell Superscalar's (CellSs) main goal is to provide a simple, flexible and easy programming approach for the Cell Broadband Engine (Cell/B.E.) that automatically exploits the inherent concurrency of the applications at a task level. The CellSs environment is based on a source-to-source compiler that translates annotated C or Fortran code and a runtime library tailored for the Cell/B.E. that takes care

    Updated: 2020-09-25
  • From Single- to Multi-Objective Auto-Tuning of Programs: Advantages and Implications
    Sci. Program. (IF 0.963) Pub Date : 2014
    Juan Durillo; Thomas Fahringer

    Automatic tuning (auto-tuning) of software has emerged in recent years as a promising method that tries to automatically adapt the behaviour of a program to attain different performance objectives on a given computing system. This method is gaining momentum due to the increasing complexity of modern multicore-based hardware architectures. Many solutions to auto-tuning have been explored ranging from

    Updated: 2020-09-25
  • 3D Seismic Imaging through Reverse-Time Migration on Homogeneous and Heterogeneous Multi-Core Processors
    Sci. Program. (IF 0.963) Pub Date : 2009
    Mauricio Araya-Polo; Félix Rubio; Raúl de la Cruz; Mauricio Hanzich; José María Cela; Daniele Paolo Scarpazza

    Reverse-Time Migration (RTM) is a state-of-the-art technique in seismic acoustic imaging, because of the quality and integrity of the images it provides. Oil and gas companies trust RTM with crucial decisions on multi-million-dollar drilling investments. But RTM requires vastly more computational power than its predecessor techniques, and this has somewhat hindered its practical success. On the other

    Updated: 2020-09-25
  • Mesh Algorithms for PDE with Sieve I: Mesh Distribution
    Sci. Program. (IF 0.963) Pub Date : 2009
    Matthew G. Knepley; Dmitry A. Karpeev

    We have developed a new programming framework, called Sieve, to support parallel numerical partial differential equation(s) (PDE) algorithms operating over distributed meshes. We have also developed a reference implementation of Sieve in C++ as a library of generic algorithms operating on distributed containers conforming to the Sieve interface. Sieve makes instances of the incidence relation, or arrows

    Updated: 2020-09-25
  • Special Issue: Selected Papers from Super Computing 2012
    Sci. Program. (IF 0.963) Pub Date : 2013
    Jeffrey S. Vetter; Padma Raghavan

    This article has no abstract.

    Updated: 2020-09-25
  • ePRO-MP: A Tool for Profiling and Optimizing Energy and Performance of Mobile Multiprocessor Applications
    Sci. Program. (IF 0.963) Pub Date : 2009
    Wonil Choi; Hyunhee Kim; Wook Song; Jiseok Song; Jihong Kim

    For mobile multiprocessor applications, achieving high performance with low energy consumption is a challenging task. In order to help programmers to meet these design requirements, system development tools play an important role. In this paper, we describe one such development tool, ePRO-MP, which profiles and optimizes both performance and energy consumption of multi-threaded applications running

    Updated: 2020-09-25
  • The Science DMZ: A Network Design Pattern for Data-Intensive Science
    Sci. Program. (IF 0.963) Pub Date : 2014
    Eli Dart; Lauren Rotman; Brian Tierney; Mary Hester; Jason Zurawski

    The ever-increasing scale of scientific data has become a significant challenge for researchers who rely on networks to interact with remote computing systems and transfer results to collaborators worldwide. Despite the availability of high-capacity connections, scientists struggle with inadequate cyberinfrastructure that cripples data transfer performance and impedes scientific progress. The Science

    Updated: 2020-09-25
  • A Parallel Ghosting Algorithm for The Flexible Distributed Mesh Database
    Sci. Program. (IF 0.963) Pub Date : 2013
    Misbah Mubarak; Seegyoung Seol; Qiukai Lu; Mark S. Shephard

    Critical to the scalability of parallel adaptive simulations are parallel control functions including load balancing, reduced inter-process communication and optimal data decomposition. In distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes which must be transmitted efficiently to avoid parallel performance degradation when the neighbors

    Updated: 2020-09-25
  • Special Issue: Exploring Languages for Expressing Medium to Massive On-Chip Parallelism
    Sci. Program. (IF 0.963) Pub Date : 2010
    Gabriele Jost; Alice Koniges

    This article has no abstract.

    Updated: 2020-09-25
  • Scalability of Parallel Scientific Applications on the Cloud
    Sci. Program. (IF 0.963) Pub Date : 2011
    Satish Narayana Srirama; Oleg Batrashev; Pelle Jakovits; Eero Vainikko

    Cloud computing, with its promise of virtually infinite resources, seems well suited to solving resource-greedy scientific computing problems. To study the effects of moving parallel scientific applications onto the cloud, we deployed several benchmark applications, such as matrix–vector operations, the NAS parallel benchmarks, and DOUG (Domain decomposition On Unstructured Grids), on the cloud. DOUG is

    Updated: 2020-09-25
  • Building High-Resolution Sky Images Using the Cell/B.E.
    Sci. Program. (IF 0.963) Pub Date : 2009
    Ana Lucia Varbanescu; Alexander S. van Amesfoort; Tim Cornwell; Ger van Diepen; Rob van Nieuwpoort; Bruce G. Elmegreen; Henk Sips

    The performance potential of the Cell/B.E., as well as its availability, has attracted a lot of attention from various high-performance computing (HPC) fields. While computation-intensive kernels have proved exceptionally well suited to running on the Cell, irregular data-intensive applications are usually considered poor matches. In this paper, we present our complete solution for enabling such

    Updated: 2020-09-25
  • ELASTIC: A Large Scale Dynamic Tuning Environment
    Sci. Program. (IF 0.963) Pub Date : 2014
    Andrea Martínez; Anna Sikora; Eduardo César; Joan Sorribes

    The spectacular growth in the number of cores in current supercomputers poses design challenges for the development of performance analysis and tuning tools. To be effective, such analysis and tuning tools must be scalable and be able to manage the dynamic behaviour of parallel applications. In this work, we present ELASTIC, an environment for dynamic tuning of large-scale parallel applications. To

    Updated: 2020-09-25
  • Design Considerations for a Flexible Multigrid Preconditioning Library
    Sci. Program. (IF 0.963) Pub Date : 2012
    Jérémie Gaidamour; Jonathan Hu; Chris Siefert; Ray Tuminaro

    MueLu is a library within the Trilinos software project [An overview of Trilinos, Technical Report SAND2003-2927, Sandia National Laboratories, 2003] and provides a framework for parallel multigrid preconditioning methods for large sparse linear systems. While providing efficient implementations of modern multigrid methods based on smoothed aggregation and energy minimization concepts, MueLu is designed

    Updated: 2020-09-25
Contents have been reproduced by permission of the publishers.