
Exascale models of stellar explosions: Quintessential multiphysics simulation. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-07-20
J. Austin Harris, Ran Chu, Sean M Couch, Anshu Dubey, Eirik Endeve, Antigoni Georgiadou, Rajeev Jain, Daniel Kasen, M P Laiu, O E B Messer, Jared O’Neal, Michael A Sandoval, Klaus Weide. The ExaStar project aims to deliver an efficient, versatile, and portable software ecosystem for multiphysics astrophysics simulations run on exascale machines. The code suite is a component-based multiphysics toolkit, built on the capabilities of current simulation codes (in particular Flash-X and Castro), and based on the massively parallel adaptive mesh refinement framework AMReX. It includes…

Unprecedented cloud resolution in a GPU-enabled full-physics atmospheric climate simulation on OLCF’s Summit supercomputer. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-07-16
Matthew R Norman, David A Bader, Christopher Eldred, Walter M Hannah, Benjamin R Hillman, Christopher R Jones, Jungmin M Lee, L R Leung, Isaac Lyngaas, Kyle G Pressel, Sarat Sreepathi, Mark A Taylor, Xingqiu Yuan. Clouds represent a key uncertainty in future climate projection. While explicit cloud resolution remains beyond our computational grasp for global climate, we can incorporate important cloud effects through a computational middle ground called the Multiscale Modeling Framework (MMF), also known as Super Parameterization. This algorithmic approach embeds high-resolution Cloud Resolving Models (CRMs)…

Enabling particle applications for exascale computing platforms. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-07-01
Susan M Mniszewski, James Belak, Jean-Luc Fattebert, Christian F A Negre, Stuart R Slattery, Adetokunbo A Adedoyin, Robert F Bird, Choongseok Chang, Guangye Chen, Stéphane Ethier, Shane Fogerty, Salman Habib, Christoph Junghans, Damien Lebrun-Grandié, Jamaludin Mohd-Yusof, Stan G Moore, Daniel Osei-Kuffuor, Steven J Plimpton, Adrian Pope, Samuel Temple Reeve, Lee Ricketson, Aaron Scheinberg, Amil Y… The Exascale Computing Project (ECP) is invested in co-design to assure that key applications are ready for exascale computing. Within ECP, the Co-design Center for Particle Applications (CoPA) is addressing challenges faced by particle-based applications across four “sub-motifs”: short-range particle–particle interactions (e.g., those which often dominate molecular dynamics (MD) and smoothed particle…

A survey of software implementations used by application codes in the Exascale Computing Project. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-06-25
Thomas M Evans, Andrew Siegel, Erik W Draeger, Jack Deslippe, Marianne M Francois, Timothy C Germann, William E Hart, Daniel F Martin. The US Department of Energy Office of Science and the National Nuclear Security Administration initiated the Exascale Computing Project (ECP) in 2016 to prepare mission-relevant applications and scientific software for the delivery of the exascale computers starting in 2023. The ECP currently supports 24 efforts directed at specific applications and six supporting co-design projects. These 24 application…

Multiphysics coupling in the Exascale Computing Project. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-06-23
Thomas M Evans, Julia C White. Multiphysics coupling presents a significant challenge in terms of both computational accuracy and performance. Achieving high performance on coupled simulations can be particularly challenging in a high-performance computing context. The US Department of Energy Exascale Computing Project has the mission to prepare mission-relevant applications for the delivery of the exascale computers starting in…

AMReX: Block-structured adaptive mesh refinement for multiphysics applications. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-06-12
Weiqun Zhang, Andrew Myers, Kevin Gott, Ann Almgren, John Bell. Block-structured adaptive mesh refinement (AMR) provides the basis for the temporal and spatial discretization strategy for a number of Exascale Computing Project applications in the areas of accelerator design, additive manufacturing, astrophysics, combustion, cosmology, multiphase flow, and wind plant modeling. AMReX is a software framework that provides a unified infrastructure with the functionality…
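The core block-structured AMR loop — flag cells where an error estimator trips, then cover the flagged region with rectangular refined patches — can be sketched in a few lines. This is a hypothetical 1D illustration of the idea only, not AMReX code; the function names and the jump-based estimator are our own choices.

```python
def tag_cells(u, threshold):
    """Flag cells whose neighbor-to-neighbor jump exceeds the threshold -
    a stand-in for the error estimator step of block-structured AMR."""
    return [i for i in range(len(u) - 1) if abs(u[i + 1] - u[i]) > threshold]

def cover_with_blocks(tags, block_size=4):
    """Group flagged cells into fixed-size refinement blocks, a 1D stand-in
    for the rectangular patches an AMR framework manages at the finer level."""
    blocks = sorted({t // block_size for t in tags})
    return [(b * block_size, (b + 1) * block_size) for b in blocks]

# A step profile: smooth everywhere except a sharp front at i = 9.
u = [0.0] * 10 + [1.0] * 10
tags = tag_cells(u, threshold=0.5)
print(tags)                     # [9] - only the front is flagged
print(cover_with_blocks(tags))  # [(8, 12)] - one refined block covers it
```

Refinement effort thus concentrates on the front while the smooth regions stay at the coarse resolution, which is the economy AMR buys.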

Online data analysis and reduction: An important co-design motif for extreme-scale computers. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-06-12
Ian Foster, Mark Ainsworth, Julie Bessac, Franck Cappello, Jong Choi, Sheng Di, Zichao Di, Ali M Gok, Hanqi Guo, Kevin A Huck, Christopher Kelly, Scott Klasky, Kerstin Kleese van Dam, Xin Liang, Kshitij Mehta, Manish Parashar, Tom Peterka, Line Pouchard, Tong Shu, Ozan Tugluk, Hubertus van Dam, Lipeng Wan, Matthew Wolf, Justin M Wozniak, Wei Xu, Igor Yakushin, Shinjae Yoo, Todd Munson. A growing disparity between supercomputer computation speeds and I/O rates means that it is rapidly becoming infeasible to analyze supercomputer application output only after that output has been written to a file system. Instead, data-generating applications must run concurrently with data reduction and/or analysis operations, with which they exchange information via high-speed methods such as inter-process…

Efficient exascale discretizations: High-order finite element methods. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-06-08
Tzanio Kolev, Paul Fischer, Misun Min, Jack Dongarra, Jed Brown, Veselin Dobrev, Tim Warburton, Stanimire Tomov, Mark S Shephard, Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Jean-Sylvain Camier, Noel Chalmers, Yohann Dudouit, Ali Karakus, Ian Karlin, Stefan Kerkemeier, Yu-Hsiang Lan, David Medina, Elia Merzari, Aleksandr Obabko, Will Pazner, Thilina Rathnayake, Cameron W Smith, Lukas Spies, Kasia… Efficient exploitation of exascale architectures requires rethinking of the numerical algorithms used in many large-scale applications. These architectures favor algorithms that expose ultra fine-grain parallelism and maximize the ratio of floating point operations to energy-intensive data movement. One of the few viable approaches to achieve high efficiency in the area of PDE discretizations on unstructured…

Coupling of regional geophysics and local soil-structure models in the EQSIM fault-to-structure earthquake simulation framework. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-05-25
David McCallen, Houjun Tang, Suiwen Wu, Eric Eckert, Junfei Huang, N Anders Petersson. Accurate understanding and quantification of the risk to critical infrastructure posed by future large earthquakes continues to be a very challenging problem. Earthquake phenomena are quite complex, and traditional approaches to predicting ground motions for future earthquake events have historically been empirically based, whereby measured ground motion data from historical earthquakes are homogenized…

The Exascale Framework for High Fidelity coupled Simulations (EFFIS): Enabling whole device modeling in fusion science. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-05-24
Eric Suchyta, Scott Klasky, Norbert Podhorszki, Matthew Wolf, Abolaji Adesoji, C S Chang, Jong Choi, Philip E Davis, Julien Dominski, Stéphane Ethier, Ian Foster, Kai Germaschewski, Berk Geveci, Chris Harris, Kevin A Huck, Qing Liu, Jeremy Logan, Kshitij Mehta, Gabriele Merlo, Shirley V Moore, Todd Munson, Manish Parashar, David Pugmire, Mark S Shephard, Cameron W Smith, Pradeep Subedi, Lipeng Wan… We present the Exascale Framework for High Fidelity coupled Simulations (EFFIS), a workflow and code coupling framework developed as part of the Whole Device Modeling Application (WDMApp) in the Exascale Computing Project. EFFIS consists of a library, command-line utilities, and a collection of runtime daemons. Together, these software products enable users to easily compose and execute workflows…

Parallel encryption of input and output data for HPC applications. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-05-18
Leigh Lapworth. A methodology for protecting confidential data sets on third-party HPC systems is reported. This is based on the NIST AES algorithm and supports the common ECB, CTR and CBC modes. The methodology is built on a flexible programming model that delegates management of the encryption key to the application code. The methodology also includes a fine-grain control over which arrays on the files are encrypted…
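Of the modes named above, CTR is the natural fit for parallel encryption: each block's keystream depends only on the key, the nonce, and the block counter, so blocks can be processed by independent ranks or threads. The toy sketch below shows only that CTR structure; it deliberately substitutes SHA-256 for AES as the keystream generator (Python's standard library has no AES), so it illustrates the parallelizable shape of the mode, not the paper's NIST-AES implementation.

```python
import hashlib

def ctr_keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Stand-in for AES-encrypting (nonce || counter); real CTR mode uses AES.
    return hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()

def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR each 32-byte block with its own keystream block.
    Blocks are independent, so workers can each take a disjoint range of them."""
    out = bytearray(len(data))
    for b in range(0, len(data), 32):
        ks = ctr_keystream_block(key, nonce, b // 32)
        for i, byte in enumerate(data[b:b + 32]):
            out[b + i] = byte ^ ks[i]
    return bytes(out)

key, nonce = b"secret key", b"nonce-01"
msg = b"confidential restart data, block after block"
ct = ctr_xor(key, nonce, msg)
assert ctr_xor(key, nonce, ct) == msg  # the same operation decrypts
```

Because encryption and decryption are the same XOR, a parallel reader can likewise decrypt only the array ranges it needs, which matches the fine-grain per-array control the abstract describes.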

A GPU-accelerated adaptive FSAI preconditioner for massively parallel simulations. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-05-17
Giovanni Isotton, Carlo Janna, Massimo Bernaschi. The solution of linear systems of equations is a central task in a number of scientific and engineering applications. In many cases the solution of linear systems may take most of the simulation time, thus representing a major bottleneck in the further development of scientific and technical software. For large-scale simulations, nowadays accounting for several millions or even billions of unknowns…

Demystifying asynchronous I/O interference in HPC applications. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-05-13
Shu-Mei Tseng, Bogdan Nicolae, Franck Cappello, Aparna Chandramowlishwaran. With increasing complexity of HPC workflows, data management services need to perform expensive I/O operations asynchronously in the background, aiming to overlap the I/O with the application runtime. However, this may cause interference due to competition for resources: CPU and memory/network bandwidth. The advent of multicore architectures has exacerbated this problem, as many I/O operations are issued…
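The overlap pattern the paper studies — compute keeps running while a background service drains pending writes — can be sketched with a worker thread and a queue. This is a minimal illustration of the pattern, not the paper's measurement setup; the interference they analyze arises precisely because the worker below competes with the "compute" loop for cores and bandwidth.

```python
import os
import queue
import tempfile
import threading

write_q = queue.Queue()  # pending (path, payload) write requests

def io_worker():
    # Background thread: drain write requests so compute never blocks on I/O.
    while True:
        item = write_q.get()
        if item is None:          # sentinel: shut down
            break
        path, payload = item
        with open(path, "wb") as f:
            f.write(payload)

outdir = tempfile.mkdtemp()
t = threading.Thread(target=io_worker)
t.start()

for step in range(3):
    data = bytes([step]) * 1024   # stand-in for one simulation step's output
    write_q.put((os.path.join(outdir, f"step{step}.bin"), data))
    # ... the next step's compute proceeds while the worker writes ...

write_q.put(None)                 # no more steps: stop the worker
t.join()
print(sorted(os.listdir(outdir)))  # ['step0.bin', 'step1.bin', 'step2.bin']
```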

Highly efficient lattice Boltzmann multiphase simulations of immiscible fluids at high-density ratios on CPUs and GPUs through code generation. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-05-13
Markus Holzer, Martin Bauer, Harald Köstler, Ulrich Rüde. A high-performance implementation of a multiphase lattice Boltzmann method based on the conservative Allen-Cahn model supporting high-density ratios and high Reynolds numbers is presented. Metaprogramming techniques are used to generate optimized code for CPUs and GPUs automatically. The coupled model is specified in a high-level symbolic description and optimized through automatic transformations…

Enabling rapid COVID-19 small molecule drug design through scalable deep learning of generative models. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-05-03
Sam Ade Jacobs, Tim Moon, Kevin McLoughlin, Derek Jones, David Hysom, Dong H Ahn, John Gyllenhaal, Pythagoras Watson, Felice C Lightstone, Jonathan E Allen, Ian Karlin, Brian Van Essen. We improved the quality and reduced the time to produce machine-learned models for use in small molecule antiviral design. Our globally asynchronous multi-level parallel training approach strong-scales to all of Sierra with up to 97.7% efficiency. We trained a novel, character-based Wasserstein autoencoder that produces a higher quality model trained on 1.613 billion compounds in 23 minutes while the…

AI-driven multiscale simulations illuminate mechanisms of SARS-CoV-2 spike dynamics. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-04-20
Lorenzo Casalino, Abigail C Dommer, Zied Gaieb, Emilia P Barros, Terra Sztain, Surl-Hee Ahn, Anda Trifan, Alexander Brace, Anthony T Bogetti, Austin Clyde, Heng Ma, Hyungro Lee, Matteo Turilli, Syma Khalid, Lillian T Chong, Carlos Simmerling, David J Hardy, Julio D C Maia, James C Phillips, Thorsten Kurth, Abraham C Stern, Lei Huang, John D McCalpin, Mahidhar Tatineni, Tom Gibbs, John E Stone, Shantenu… We develop a generalizable AI-driven workflow that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems. We use this workflow to investigate the mechanisms of infectivity of the SARS-CoV-2 spike protein, the main viral infection machinery. Our workflow enables more efficient investigation of spike dynamics in a variety of complex environments, including…

GPU-accelerated molecular dynamics: State-of-art software performance and porting from Nvidia CUDA to AMD HIP. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-04-19
Nikolay Kondratyuk, Vsevolod Nikolskiy, Daniil Pavlov, Vladimir Stegailov. Classical molecular dynamics (MD) calculations represent a significant part of the utilization time of high-performance computing systems. As usual, the efficiency of such calculations is based on an interplay of software and hardware that are nowadays moving to hybrid GPU-based technologies. Several well-developed open-source MD codes focused on GPUs differ both in their data management capabilities…

Introduction to the Special Issue related to the Power-Aware Computing Workshop 2019—PACO 2019. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-04-19
Peter Benner, Enrique Quintana-Ortí, Jens Saak. Power-awareness in high-performance scientific computing has gained increased interest due to its non-negligible contributions to carbon-dioxide emissions and thus to one of the main drivers of anthropogenic climate change. This is, for instance, recognized and popularized by the Green500 list, which ranks the supercomputers from the TOP500 list in terms of energy efficiency by measuring performance…

MFIX-Exa: A path toward exascale CFD-DEM simulations. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-04-16
Jordan Musser, Ann S Almgren, William D Fullmer, Oscar Antepara, John B Bell, Johannes Blaschke, Kevin Gott, Andrew Myers, Roberto Porcu, Deepak Rangarajan, Michele Rosso, Weiqun Zhang, Madhava Syamlal. MFIX-Exa is a computational fluid dynamics–discrete element model (CFD-DEM) code designed to run efficiently on current and next-generation supercomputing architectures. MFIX-Exa combines the CFD-DEM expertise embodied in the MFIX code, which was developed at NETL and is used widely in academia and industry, with the modern software framework AMReX, developed at LBNL. The fundamental physics models…

Increased space-parallelism via time-simultaneous Newton-multigrid methods for nonstationary nonlinear PDE problems. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-04-01
Jonas Dünnebacke, Stefan Turek, Christoph Lohmann, Andriy Sokolov, Peter Zajac. We discuss how “parallel-in-space & simultaneous-in-time” Newton-multigrid approaches can be designed to improve the scaling behavior of the spatial parallelism by reducing latency costs. The idea is to solve many time steps at once, thereby solving fewer but larger systems. These large systems are reordered and interpreted as a space-only problem, leading to a multigrid algorithm with semi-coarsening…

A runtime-based comparison of highly tuned lattice Boltzmann and finite difference solvers. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-04-01
Karl-Robert Wichmann, Martin Kronbichler, Rainald Löhner, Wolfgang A Wall. The aim of this work is a fair and unbiased comparison of a lattice Boltzmann method (LBM) against a finite difference method (FDM) for the simulation of fluid flows. Rather than reporting metrics such as floating point operation rates or memory throughput, our work considers the engineering quest of reaching a desired solution quality with the least computational effort. The specific lattice Boltzmann…

High-throughput virtual laboratory for drug discovery using massive datasets. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-03-23
Jens Glaser, Josh V Vermaas, David M Rogers, Jeff Larkin, Scott LeGrand, Swen Boehm, Matthew B Baker, Aaron Scheinberg, Andreas F Tillack, Mathialakan Thavappiragasam, Ada Sedova, Oscar Hernandez. Time-to-solution for structure-based screening of massive chemical databases for COVID-19 drug discovery has been decreased by an order of magnitude, and a virtual laboratory has been deployed at scale on up to 27,612 GPUs on the Summit supercomputer, allowing an average molecular docking of 19,028 compounds per second. Over one billion compounds were docked to two SARS-CoV-2 protein structures with…

A survey of numerical linear algebra methods utilizing mixed-precision arithmetic. Int. J. High Perform. Comput. Appl. (IF 1.956). Pub Date: 2021-03-19
Ahmad Abdelfattah, Hartwig Anzt, Erik G Boman, Erin Carson, Terry Cojean, Jack Dongarra, Alyson Fox, Mark Gates, Nicholas J Higham, Xiaoye S Li, Jennifer Loe, Piotr Luszczek, Srikara Pranesh, Siva Rajamanickam, Tobias Ribizel, Barry F Smith, Kasia Swirydowicz, Stephen Thomas, Stanimire Tomov, Yaohung M Tsai, Ulrike Meier Yang. The efficient utilization of mixed-precision numerical linear algebra algorithms can offer attractive acceleration to scientific computing applications. Especially with the hardware integration of low-precision special-function units designed for machine learning applications, the traditional numerical algorithms community urgently needs to reconsider the floating point formats used in the distinct…
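A canonical pattern in this area is mixed-precision iterative refinement: do the expensive solve in low precision, then compute residuals and corrections in high precision until double accuracy is recovered. The sketch below is our own toy 2x2 illustration of that pattern (not taken from the survey), with `struct`-based rounding standing in for float32 hardware.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python double to float32 - our stand-in 'low precision'."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve2_lowprec(A, b):
    """Solve a 2x2 system by Cramer's rule with every intermediate rounded
    to float32, mimicking a factorization done in low precision."""
    a, c = to_f32(A[0][0]), to_f32(A[0][1])
    d, e = to_f32(A[1][0]), to_f32(A[1][1])
    det = to_f32(a * e - c * d)
    return [to_f32(to_f32(e * b[0] - c * b[1]) / det),
            to_f32(to_f32(a * b[1] - d * b[0]) / det)]

def refine(A, b, iters=3):
    x = solve2_lowprec(A, b)                     # cheap low-precision solve
    for _ in range(iters):
        # Residual in double precision - the step that restores accuracy.
        r = [b[i] - (A[i][0] * x[0] + A[i][1] * x[1]) for i in range(2)]
        dx = solve2_lowprec(A, r)                # correction, low precision again
        x = [x[0] + dx[0], x[1] + dx[1]]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = refine(A, b)   # exact solution is (1/11, 7/11); x matches to ~double accuracy
```

For a well-conditioned system, each refinement pass multiplies the error by roughly the low-precision unit roundoff, so a handful of cheap iterations recovers full double-precision accuracy — the economics that make the hardware's low-precision units attractive.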

Accelerated execution via eager-release of dependencies in task-based workflows Int. J. High Perform. Comput. Appl. (IF 1.956) Pub Date: 2021-03-03
Hatem Elshazly, Francesc Lordan, Jorge Ejarque, Rosa M. Badia. Task-based programming models offer a flexible way to express the unstructured parallelism patterns of today's complex applications. This expressive capability is required to achieve the maximum possible performance for applications that are executed on distributed execution platforms. In current task-based workflows, tasks are launched for execution when their data dependencies are satisfied. However
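
The core idea can be sketched with plain threads: a producer task signals that its output data is committed before the task itself finishes, so a dependent task starts without waiting for the producer's trailing work. This is a hypothetical two-task illustration, not the paper's actual runtime or API.

```python
import threading
import time

results = {}
data_ready = threading.Event()
out = []

def producer():
    results["data"] = 42    # the successor's data dependency is now satisfied
    data_ready.set()        # eager release: unblock dependents right here,
    time.sleep(0.2)         # even though the task still has trailing work
                            # (cleanup, serialization) left to do

def consumer():
    data_ready.wait()       # waits for the *data*, not for the whole task
    out.append(results["data"] + 1)

t_prod = threading.Thread(target=producer)
t_cons = threading.Thread(target=consumer)
start = time.monotonic()
t_prod.start(); t_cons.start()
t_cons.join()
consumer_done_after = time.monotonic() - start  # well under producer's 0.2 s
t_prod.join()
```

With a late release (setting the event only after the sleep), the consumer would be delayed by the full task duration; the eager signal removes that serialization.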

Resilience and fault tolerance in high-performance computing for numerical weather and climate prediction Int. J. High Perform. Comput. Appl. (IF 1.956) Pub Date: 2021-02-08
Tommaso Benacchio, Luca Bonaventura, Mirco Altenbernd, Chris D Cantwell, Peter D Düben, Mike Gillard, Luc Giraud, Dominik Göddeke, Erwan Raffin, Keita Teranishi, Nils Wedi. Progress in numerical weather and climate prediction accuracy greatly depends on the growth of the available computing power. As the number of cores in top computing facilities pushes into the millions, the increased average frequency of hardware and software failures forces users to review their algorithms and systems in order to protect simulations from breakdown. This report surveys hardware, application-level

Selecting optimal SpMV realizations for GPUs via machine learning Int. J. High Perform. Comput. Appl. (IF 1.956) Pub Date: 2021-01-29
Ernesto Dufrechou, Pablo Ezzatti, Enrique S Quintana-Ortí. More than 10 years of research on the development of efficient GPU routines for the sparse matrix-vector product (SpMV) have led to several realizations, each with its own strengths and weaknesses. In this work, we review some of the most relevant efforts on the subject, evaluate a few prominent routines that are publicly available using more than 3000 matrices from different applications,
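
For readers unfamiliar with the kernel being tuned: SpMV over the common CSR storage format is a short doubly nested loop, and the GPU realizations compared in such studies differ mainly in how its iterations are mapped to threads. A minimal reference version (scalar loop, mirroring the "one thread per row" CSR-scalar mapping) might look like:

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for A in CSR format (values, column indices, row pointers).

    On a GPU, the outer loop over rows is what gets parallelized: one
    thread per row in CSR-scalar kernels, one warp per row in CSR-vector
    variants; other formats (ELL, COO, ...) reshape the data instead.
    """
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for jj in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[jj] * x[col_idx[jj]]
    return y
```

The per-row nonzero counts drive the load balance of each mapping, which is why the best realization depends on the matrix, and why a learned selector can pay off.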

Point-block incomplete LU preconditioning with asynchronous iterations on GPU for multiphysics problems Int. J. High Perform. Comput. Appl. (IF 1.942) Pub Date: 2020-12-28
Wenpeng Ma, Xiao-Chuan Cai. Point-block matrices arise naturally in multiphysics problems when all variables associated with a mesh point are ordered together, and they differ from general block matrices in that the blocks are so small that one can often invert some of the diagonal blocks explicitly. Motivated by the recent works of Chow and Patel and Chow et al., we propose an efficient incomplete LU (ILU) preconditioner
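
The observation the abstract leans on, that per-point diagonal blocks are small enough to invert explicitly, is easy to illustrate with the simplest block preconditioner, block Jacobi. This is only a sketch of that one ingredient, not the paper's point-block ILU:

```python
import numpy as np

def block_jacobi_apply(A, r, bs):
    """Apply z = M^{-1} r, where M keeps only the bs-by-bs diagonal blocks of A.

    Each block couples the variables of one mesh point, so bs is tiny
    (e.g. 3-5 in multiphysics problems) and an explicit inverse is cheap;
    on a GPU these small inversions are embarrassingly parallel.
    """
    n = A.shape[0]
    z = np.empty_like(r, dtype=float)
    for i in range(0, n, bs):
        blk = A[i:i+bs, i:i+bs]
        z[i:i+bs] = np.linalg.inv(blk) @ r[i:i+bs]  # explicit small inverse
    return z
```

An ILU preconditioner extends this by also keeping (approximate) off-diagonal coupling, which is where the asynchronous-iteration machinery of the paper comes in.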

Implicit Hari–Zimmermann algorithm for the generalized SVD on the GPUs Int. J. High Perform. Comput. Appl. (IF 1.956) Pub Date: 2020-12-10
Vedran Novaković, Sanja Singer. A parallel, blocked, one-sided Hari–Zimmermann algorithm for the generalized singular value decomposition (GSVD) of a real or a complex matrix pair (F, G) is proposed here, where F and G have the same number of columns and are both of full column rank. The algorithm targets either a single graphics processing unit (GPU) or a cluster of them, and performs all non-trivial computation exclusively on
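
To make "one-sided" concrete: such algorithms apply plane rotations to column pairs of the matrix itself until the columns are orthogonal, at which point the column norms are the singular values. The sketch below shows the ordinary-SVD version of this idea (one-sided Jacobi); the Hari–Zimmermann method generalizes the pairwise rotations to the pair (F, G), which this sketch does not attempt.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """SVD of a full-column-rank A via one-sided Jacobi rotations."""
    U = A.astype(float).copy()
    n = U.shape[1]
    V = np.eye(n)
    for _ in range(max_sweeps):
        off = 0.0  # largest remaining column-pair correlation this sweep
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = U[:, p] @ U[:, p]
                beta = U[:, q] @ U[:, q]
                gamma = U[:, p] @ U[:, q]
                off = max(off, abs(gamma) / np.sqrt(alpha * beta))
                if abs(gamma) < tol * np.sqrt(alpha * beta):
                    continue
                # Rotation angle that orthogonalizes columns p and q.
                zeta = (beta - alpha) / (2.0 * gamma)
                sgn = 1.0 if zeta >= 0 else -1.0
                t = sgn / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                # Apply the rotation to U and accumulate it in V.
                up, uq = U[:, p].copy(), U[:, q].copy()
                U[:, p], U[:, q] = c * up - s * uq, s * up + c * uq
                vp, vq = V[:, p].copy(), V[:, q].copy()
                V[:, p], V[:, q] = c * vp - s * vq, s * vp + c * vq
        if off < tol:
            break
    sigma = np.linalg.norm(U, axis=0)  # column norms = singular values
    return U / sigma, sigma, V         # A = U @ diag(sigma) @ V.T
```

Because each rotation touches only two columns, independent column pairs can be rotated concurrently, which is what makes this family of algorithms attractive on GPUs.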

Highly parallel boundary element method for solving extremely large, wide-area power-line models Int. J. High Perform. Comput. Appl. (IF 1.956) Pub Date: 2020-12-03
Ross Adelman. The electric and magnetic fields around power lines carry an immense amount of information about the power grid and can be used to improve stability, balance loads, conserve power, and reduce outages. To study this, an extremely large model of the transmission lines over a 70 km² tract of land near Washington, DC, has been built. The terrain was modeled accurately using 1 m-resolution LIDAR data. The 140-million-element

Designing a parallel Feel-the-Way clustering algorithm on HPC systems Int. J. High Perform. Comput. Appl. (IF 1.956) Pub Date: 2020-11-28
Weijian Zheng, Dali Wang, Fengguang Song. This paper introduces a new parallel clustering algorithm, named the Feel-the-Way clustering algorithm, that provides a better or equivalent convergence rate than traditional clustering methods by optimizing the synchronization and communication costs. Our algorithm design centers on how to optimize three factors simultaneously: reduced synchronizations, improved convergence rate, and retained same or

Guest editor’s note: Special issue on application performance optimization in the era of extreme heterogeneity Int. J. High Perform. Comput. Appl. (IF 1.942) Pub Date: 2020-11-28
Roman Wyrzykowski, Ewa Deelman. This special issue gathers revised and extended versions of selected papers presented at the 13th International Conference on Parallel Processing and Applied Mathematics (PPAM 2019), held September 8–11, 2019, in Bialystok, Poland (http://ppam.info). The conference continues a series of events started in 1994, when the first PPAM took place in Czestochowa. It has been held every 2 years