-
ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation Int. J. Parallel. Program (IF 1.5) Pub Date : 2024-02-26 Alessandro Ottaviano, Robert Balas, Giovanni Bambini, Antonio Del Vecchio, Maicol Ciani, Davide Rossi, Luca Benini, Andrea Bartolini
-
Investigating Methods for ASPmT-Based Design Space Exploration in Evolutionary Product Design Int. J. Parallel. Program (IF 1.5) Pub Date : 2024-02-24 Luise Müller, Philipp Wanko, Christian Haubelt, Torsten Schaub
-
Hardware-Aware Evolutionary Explainable Filter Pruning for Convolutional Neural Networks Int. J. Parallel. Program (IF 1.5) Pub Date : 2024-02-22 Christian Heidorn, Muhammad Sabih, Nicolai Meyerhöfer, Christian Schinabeck, Jürgen Teich, Frank Hannig
-
A Practical Approach for Employing Tensor Train Decomposition in Edge Devices Int. J. Parallel. Program (IF 1.5) Pub Date : 2024-02-16 Milad Kokhazadeh, Georgios Keramidas, Vasilios Kelefouras, Iakovos Stamoulis
-
Access Interval Prediction by Partial Matching for Tightly Coupled Memory Systems Int. J. Parallel. Program (IF 1.5) Pub Date : 2024-02-13 Viktor Razilov, Robert Wittig, Emil Matúš, Gerhard Fettweis
-
Accelerating Massively Distributed Deep Learning Through Efficient Pseudo-Synchronous Update Method Int. J. Parallel. Program (IF 1.5) Pub Date : 2023-11-13 Yingpeng Wen, Zhilin Qiu, Dongyu Zhang, Dan Huang, Nong Xiao, Liang Lin
-
A Hybrid Machine Learning Model for Code Optimization Int. J. Parallel. Program (IF 1.5) Pub Date : 2023-09-22 Yacine Hakimi, Riyadh Baghdadi, Yacine Challal
-
GPU-Based Algorithms for Processing the k Nearest-Neighbor Query on Spatial Data Using Partitioning and Concurrent Kernel Execution Int. J. Parallel. Program (IF 1.5) Pub Date : 2023-07-21 Polychronis Velentzas, Michael Vassilakopoulos, Antonio Corral, Christos Antonopoulos
-
Calculation of Distributed-Order Fractional Derivative on Tensor Cores-Enabled GPU Int. J. Parallel. Program (IF 1.5) Pub Date : 2023-07-10 Vsevolod Bohaienko
-
Partitioning-Aware Performance Modeling of Distributed Graph Processing Tasks Int. J. Parallel. Program (IF 1.5) Pub Date : 2023-05-05 Daniel Presser, Frank Siqueira
-
Accelerating OCaml Programs on FPGA Int. J. Parallel. Program (IF 1.5) Pub Date : 2023-01-24 Loïc Sylvestre, Emmanuel Chailloux, Jocelyn Sérot
-
Distributed Calculations with Algorithmic Skeletons for Heterogeneous Computing Environments Int. J. Parallel. Program (IF 1.5) Pub Date : 2023-01-07 Nina Herrmann, Herbert Kuchen
-
Declarative Data Flow in a Graph-Based Distributed Memory Runtime System Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-12-26 Fabian Knorr, Peter Thoman, Thomas Fahringer
-
SMSG: Profiling-Free Parallelism Modeling for Distributed Training of DNN Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-12-12 Haoran Wang, Thibaut Tachon, Chong Li, Sophie Robert, Sébastien Limet
-
A Fault-Model-Relevant Classification of Consensus Mechanisms for MPI and HPC Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-12-12 Grace Nansamba, Amani Altarawneh, Anthony Skjellum
-
Portable C++ Code that can Look and Feel Like Fortran Code with Yet Another Kernel Launcher (YAKL) Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-12-08 Matthew Norman, Isaac Lyngaas, Abhishek Bagusetty, Mark Berrill
-
Generic Exact Combinatorial Search at HPC Scale Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-12-07 Ruairidh MacGregor, Blair Archibald, Phil Trinder
-
Interruptible Nodes: Reducing Queueing Costs in Irregular Streaming Dataflow Applications on Wide-SIMD Architectures Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-12-05 Stephen Timcheck, Jeremy Buhler
-
Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-12-06 August Ernstsson, Dalvan Griebler, Christoph Kessler
-
Efficient High-Level Programming in Plain Java Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-12-05 Rui S. Silva, João L. Sobral
-
Distributed-Memory FastFlow Building Blocks Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-12-02 Nicolò Tonci, Massimo Torquati, Gabriele Mencagli, Marco Danelutto
-
Scaling the Maximum Flow Computation on GPUs Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-11-15 Jash Khatri, Arihant Samar, Bikash Behera, Rupesh Nasre
-
DSParLib: A C++ Template Library for Distributed Stream Parallelism Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-10-29 Júnior Löff, Renato B. Hoffmann, Ricardo Pieper, Dalvan Griebler, Luiz G. Fernandes
-
Parallelization of Swarm Intelligence Algorithms: Literature Review Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-08-10 Breno Augusto de Melo Menezes, Herbert Kuchen, Fernando Buarque de Lima Neto
-
Stencil Calculations with Algorithmic Skeletons for Heterogeneous Computing Environments Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-07-23 Nina Herrmann, Breno A. de Melo Menezes, Herbert Kuchen
-
A Scalable Similarity Join Algorithm Based on MapReduce and LSH Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-05-23 Sébastien Rivault, Mostafa Bamha, Sébastien Limet, Sophie Robert
-
A Methodology for Efficient Tile Size Selection for Affine Loop Kernels Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-05-23 Vasilios Kelefouras, Karim Djemame, Georgios Keramidas, Nikolaos Voros
-
The Celerity High-level API: C++20 for Accelerator Clusters Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-04-22 Peter Thoman, Florian Tischler, Philip Salzmann, Thomas Fahringer
-
Guest Editorial: Special Issue on 2020 IEEE International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS 2020) Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-04-01 Marc Reichenbach, Matthias Jung, Alex Orailoglu
-
A Quantitative Study of Locality in GPU Caches for Memory-Divergent Workloads Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-04-01 Sohan Lal, Bogaraju Sharatchandra Varma, Ben Juurlink
GPUs are capable of delivering peak performance in TFLOPs; however, peak performance is often difficult to achieve due to several performance bottlenecks. Memory divergence is one such bottleneck: it makes it harder to exploit locality, causes cache thrashing and high miss rates, and thereby impedes GPU performance. As data locality is crucial for performance, there have been several
-
Fine-Grained Power Modeling of Multicore Processors Using FFNNs Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-03-29 Mark Sagi, Nguyen Anh Vu Doan, Nael Fasfous, Thomas Wild, Andreas Herkersdorf
To minimize power consumption while maximizing performance, today’s multicore processors rely on fine-grained run-time dynamic power information—both in the time domain, e.g. \(\mu \)s to ms, and space domain, e.g. core-level. The state-of-the-art for deriving such power information is mainly based on predetermined power models which use linear modeling techniques to determine the core-performance/core-power
-
An Improved/Optimized Practical Non-Blocking PageRank Algorithm for Massive Graphs* Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-03-26 Hemalatha Eedi, Sahith Karra, Sathya Peri, Neha Ranabothu, Rahul Utkoor
The PageRank kernel is a standard benchmark addressing various graph processing and analytical problems. It serves as a standard for many graph analytics tasks and as a foundation for extracting graph features and predicting user ratings in recommendation systems. PageRank is an iterative algorithm that continuously updates the ranks of pages until they converge to a value. However
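The iterative update the abstract alludes to can be sketched in plain Python. The function name, damping factor, and convergence test below are illustrative assumptions, not details from the paper (whose contribution is a non-blocking parallel variant):

```python
# Minimal sketch of the classic iterative PageRank update; all names
# and parameter defaults here are illustrative, not the paper's.
def pagerank(adj, damping=0.85, tol=1e-6, max_iter=100):
    """adj: dict mapping node -> list of outgoing neighbors."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = adj[v]
            if out:
                share = damping * rank[v] / len(out)
                for w in out:
                    new[w] += share
            else:  # dangling node: spread its rank evenly
                for w in nodes:
                    new[w] += damping * rank[v] / n
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:  # ranks have converged
            break
    return rank
```

The loop repeats the rank redistribution until the total change falls below a tolerance, which is the convergence criterion the abstract refers to.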
-
AMAIX In-Depth: A Generic Analytical Model for Deep Learning Accelerators Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-03-24 Niko Zurstraßen, Lukas Jünger, Tim Kogel, Holger Keding, Rainer Leupers
In recent years, the growing popularity of Convolutional Neural Networks (CNNs) has driven the development of specialized hardware, so-called Deep Learning Accelerators (DLAs). The large market for DLAs and the huge number of papers published on DLA design show that there is currently no one-size-fits-all solution. Depending on the given optimization goals, such as power consumption or performance, there
-
A Deterministic Portable Parallel Pseudo-Random Number Generator for Pattern-Based Programming of Heterogeneous Parallel Systems Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-03-22 August Ernstsson, Nicolas Vandenbergen, Jörg Keller, Christoph Kessler
SkePU is a pattern-based high-level programming model for transparent program execution on heterogeneous parallel computing systems. A key feature of SkePU is that, in general, the selection of the execution platform for a skeleton-based function call need not be determined statically. On single-node systems, SkePU can select among CPU, multithreaded CPU, single or multi-GPU execution. Many scientific
-
DRAMSys4.0: An Open-Source Simulation Framework for In-depth DRAM Analyses Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-03-12 Lukas Steiner, Matthias Jung, Felipe S. Prado, Kirill Bykov, Norbert Wehn
The simulation of Dynamic Random Access Memories (DRAMs) at the system level requires highly accurate models due to their complex timing and power behavior. However, conventional cycle-accurate DRAM subsystem models often become a bottleneck for overall simulation speed. A promising alternative is simulators based on Transaction Level Modeling, which can be fast and accurate at the same time. In this
-
Energy-Efficient Partial-Duplication Task Mapping Under Multiple DVFS Schemes Int. J. Parallel. Program (IF 1.5) Pub Date : 2022-02-16 Minyu Cui, Angeliki Kritikakou, Lei Mo, Emmanuel Casseau
On multicore platforms, reliable task execution, as well as low energy consumption, are essential. Dynamic Voltage/Frequency Scaling (DVFS) is typically used for energy savings, but with a negative impact on reliability, especially when the applied frequency is low. Using high frequencies, required to meet reliability constraints, or replicating tasks increases energy consumption. To reduce energy
-
Accelerating Computation of Steiner Trees on GPUs Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-11-27 Rajesh Pandian Muniasamy, Rupesh Nasre, N. S. Narayanaswamy
The Steiner Tree Problem (STP) is a well studied graph theoretic problem. It computes a minimum-weighted tree of a given graph such that the tree spans a given subset of vertices called terminals. STP is NP-hard. Due to its wide applicability, it has been a challenge problem in the 11th DIMACS implementation challenge and the PACE 2018 challenge. Due to its importance, polynomial-time approximation
-
A Profile-Based AI-Assisted Dynamic Scheduling Approach for Heterogeneous Architectures Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-08-23 Tongsheng Geng, Marcos Amaris, Stéphane Zuckerman, Alfredo Goldman, Guang R. Gao, Jean-Luc Gaudiot
While heterogeneous architectures are increasingly popular in High Performance Computing systems, their effectiveness depends on how efficient the scheduler is at allocating workloads onto appropriate computing devices and how well communication and computation can be overlapped. With different types of resources integrated into one system, the complexity of the scheduler correspondingly increases. Moreover
-
Guest Editorial: Special issue on Network and Parallel Computing for Emerging Architectures and Applications Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-08-13 Guangming Tan, Guang R. Gao
-
Enhancing the Effectiveness of Inlining in Automatic Parallelization Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-08-06 Jichi Guo, Qing Yi, Kleanthis Psarris
The emergence of multi-core architectures makes it essential for optimizing compilers to automatically extract parallelism for large scientific applications composed of many subroutines residing in different files. Inlining is a well-known technique which can be used to erase procedural boundaries and enable more aggressive loop parallelization. However, conventional inlining cannot be applied to external
-
Statistical Analysis Based Intrusion Detection System for Ultra-High-Speed Software Defined Network Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-08-09 Talha Naqash, Sajjad Hussain Shah, Muhammad Najam Ul Islam
Internet users and internet services are increasing day by day, which increases internet traffic from petabytes to zettabytes at ultra-high speed. Different types of architectures are implemented to handle high-speed data traffic. The two-layer approach of the Software-Defined Network (SDN) architecture converts classical network architecture into a consistent, centrally controllable network architecture
-
Fortress Abstractions in X10 Framework Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-07-15 Anshu S. Anand, Karthik Sayani, R. K. Shyamasundar
Fortress provides a rich set of abstractions widely used in scientific computing. The use of such abstractions enhances the productivity of programmers and users. Boilerplate code is also used extensively in scientific computations. Keeping this in view, we embed Fortress abstractions in an X10 environment so that we can get better productivity without losing performance. In this paper, we transform
-
Restoration of Legacy Parallelism: Transforming Pthreads into Farm and Pipeline Patterns Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-06-10 Vladimir Janjic, Christopher Brown, Adam D. Barwell
Parallel patterns are a high-level programming paradigm that enables non-experts in parallelism to develop structured parallel programs that are maintainable, adaptive, and portable whilst achieving good performance on a variety of parallel systems. However, there still exists a large base of legacy-parallel code developed using ad-hoc methods and incorporating low-level parallel/concurrency libraries
-
Portable Node-Level Parallelism for the PGAS Model Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-06-05 Pascal Jungblut, Karl Fürlinger
The Partitioned Global Address Space (PGAS) programming model brings intuitive shared memory semantics to distributed memory systems. Even with an abstract and unifying virtual global address space, it is, however, challenging to use the full potential of different systems. Without explicit support by the implementation, node-local operations have to be optimized manually for each architecture. A goal
-
A Comparative Survey of Big Data Computing and HPC: From a Parallel Programming Model to a Cluster Architecture Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-05-26 Fei Yin, Feng Shi
With the rapid growth of artificial intelligence (AI), the Internet of Things (IoT) and big data, emerging applications that cross stacks with different techniques bring new challenges to parallel computing systems. These cross-stack functionalities require one system to possess multiple characteristics, such as the ability to process data under high throughput and low latency, the ability to carry
-
CCRP: Converging Credit-Based and Reactive Protocols in Datacenters Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-05-21 Yang Bai, Dinghuang Hu, Dezun Dong, Shan Huang, Xiangke Liao
As the link speed has grown steadily from 10 Gbps to 100 Gbps, high-speed data center networks (DCNs) require more efficient congestion management. Therefore, proactive transports, especially credit-based congestion control, nowadays have drawn much attention because of fast convergence, near-zero queueing and low latency. However, in real deployment scenarios, it is hard to guarantee one protocol
-
SkePU 3: Portable High-Level Programming of Heterogeneous Systems and HPC Clusters Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-05-19 August Ernstsson, Johan Ahlqvist, Stavroula Zouzoula, Christoph Kessler
We present the third generation of the C++-based open-source skeleton programming framework SkePU. Its main new features include new skeletons, new data container types, support for returning multiple objects from skeleton instances and user functions, support for specifying alternative platform-specific user functions to exploit e.g. custom SIMD instructions, generalized scheduling variants for the
-
A Parallel Skeleton for Divide-and-conquer Unbalanced and Deep Problems Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-05-14 Millán A. Martínez, Basilio B. Fraguela, José C. Cabaleiro
The Divide-and-conquer (D&C) pattern appears in a large number of problems and is highly suitable to exploit parallelism. This has led to much research on its easy and efficient application both in shared and distributed memory parallel systems. One of the most successful approaches explored in this area consists of expressing this pattern by means of parallel skeletons which automate and hide the
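The D&C pattern the abstract describes can be captured by a tiny generic skeleton. The sketch below is sequential and all names are hypothetical; it only illustrates the pattern that the paper's parallel skeleton automates and hides:

```python
# Generic divide-and-conquer skeleton (illustrative, sequential).
# The caller supplies the four problem-specific pieces; the skeleton
# owns the recursion structure.
def dac(problem, is_base, base_solve, divide, combine):
    if is_base(problem):
        return base_solve(problem)
    subresults = [dac(p, is_base, base_solve, divide, combine)
                  for p in divide(problem)]
    return combine(subresults)

# Example instantiation: merge sort expressed through the skeleton.
def merge_sort(xs):
    def merge(parts):
        left, right = parts
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]
    return dac(xs,
               is_base=lambda p: len(p) <= 1,
               base_solve=lambda p: list(p),
               divide=lambda p: (p[:len(p) // 2], p[len(p) // 2:]),
               combine=merge)
```

A parallel skeleton would evaluate the recursive calls concurrently, which is exactly where unbalanced and deep problems, the paper's focus, become challenging to schedule.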
-
On Single-Valuedness in Textually Aligned SPMD Programs Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-05-04 Frédéric Dabrowski
Single-valuedness is a property of an expression occurring in a SPMD program and states that concomitant evaluations of this expression lead to the same value at all processes. Although widely used, this property still lacks a formal definition, which is necessary to tackle the subtleties of the notion of concomitance. First, we propose such a definition in which the states of all processes can be
-
M-DRL: Deep Reinforcement Learning Based Coflow Traffic Scheduler with MLFQ Threshold Adaption Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-05-04 Tianba Chen, Wei Li, YuKang Sun, Yunchun Li
The coflow scheduling in data-parallel clusters can improve application-level communication performance. The existing coflow scheduling method without prior knowledge usually uses multi-level feedback queue (MLFQ) with fixed threshold parameters, which is insensitive to coflow traffic characteristics. Manual adjustment of the threshold parameters for different application scenarios often has long optimization
-
High-Level Parallel Ant Colony Optimization with Algorithmic Skeletons Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-04-29 Breno A. de Melo Menezes, Nina Herrmann, Herbert Kuchen, Fernando Buarque de Lima Neto
Parallel implementations of swarm intelligence algorithms such as the ant colony optimization (ACO) have been widely used to shorten the execution time when solving complex optimization problems. When aiming for a GPU environment, developing efficient parallel versions of such algorithms using CUDA can be a difficult and error-prone task even for experienced programmers. To overcome this issue, the
-
Fault-Tolerant and Unicast Performances of the Data Center Network HSDC Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-04-22 Hui Dong, Jianxi Fan, Baolei Cheng, Yan Wang, Jingya Zhou
In order to satisfy the rapidly increasing demand for data volume, large data center networks (DCNs) have been proposed. In 2019, Zhang et al. proposed a new highly scalable DCN architecture named HSDC (Highly Scalable Data Center Network), which can achieve greater incremental scalability. In this paper, we give the definition of the logical graph of HSDC, named \(H_n\), which can be treated as a
-
Accelerating DES and AES Algorithms for a Heterogeneous Many-core Processor Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-04-16 Biao Xing, DanDan Wang, Yongquan Yang, Zhiqiang Wei, Jiajing Wu, Cuihua He
Data security is the focus of information security. As a primary method, file encryption is adopted for ensuring data security. Encryption algorithms created to meet the Data Encryption Standard (DES) and the Advanced Encryption Standard (AES) are widely used in a variety of systems. These algorithms are computationally highly complex, thus, the efficiency of encrypting or decrypting large files can
-
Parallel Computation of Discrete Orthogonal Moment on Block Represented Images Using OpenMP Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-04-15 Iraklis M. Spiliotis, Charalampos Sitaridis, Michael P. Bekakos
Herein, a parallel implementation of Discrete Orthogonal Moments on block-represented images is investigated. Moments and moment functions have been used widely as features for image analysis and pattern recognition tasks. The main disadvantage of all moment sets is their high computational cost, which increases as higher-order moments are involved in the computations. In image block representation
-
Location-based and Time-aware Service Recommendation in Mobile Edge Computing Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-04-09 Mengshan Yu, Guisheng Fan, Huiqun Yu, Liang Chen
With the rapid development of Internet of Things, mobile edge computing which provides physical resources closer to end users has gained considerable popularity in academic and industrial field. As the number of edge server increases, accessing effective edge services fast is an urgent problem to be solved. In this paper, we mainly focus on the cold-start problem for service recommendation based on
-
DeeperThings: Fully Distributed CNN Inference on Resource-Constrained Edge Devices Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-04-07 Rafael Stahl, Alexander Hoffman, Daniel Mueller-Gritschneder, Andreas Gerstlauer, Ulf Schlichtmann
Performing inference of Convolutional Neural Networks (CNNs) on Internet of Things (IoT) edge devices ensures both privacy of input data and possible run time reductions when compared to a cloud solution. As most edge devices are memory- and compute-constrained, they cannot store and execute complex CNNs. Partitioning and distributing layer information across multiple edge devices to reduce the amount
-
A Configurable Hardware Architecture for Runtime Application of Network Calculus Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-04-02 Xiao Hu, Zhonghai Lu
Network Calculus has been a foundational theory for analyzing and ensuring Quality-of-Service (QoS) in a variety of networks, including Networks-on-Chip (NoCs). To fulfill dynamic QoS requirements of applications, runtime application of network calculus is essential. However, the primitive operations in network calculus, such as arrival curves, min-plus convolution, and min-plus deconvolution, are very
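Min-plus convolution, one of the primitive operations the abstract names, has a compact definition: (f ⊗ g)(t) = min over 0 ≤ s ≤ t of f(s) + g(t − s). A naive software sketch over sampled curves follows; the sampled representation and the function name are assumptions for illustration, not the paper's hardware architecture:

```python
# Naive min-plus convolution of two curves sampled at integer times.
# f and g are equal-length sequences with f[t] = f(t), g[t] = g(t).
def min_plus_conv(f, g):
    n = len(f)
    # (f ⊗ g)(t) = min_{0 <= s <= t} f(s) + g(t - s)
    return [min(f[s] + g[t - s] for s in range(t + 1))
            for t in range(n)]
```

The O(n²) cost of this direct evaluation is one reason a dedicated hardware architecture for runtime use, as the paper proposes, is attractive.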
-
Predicting the Soft Error Vulnerability of Parallel Applications Using Machine Learning Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-03-28 Işıl Öz, Sanem Arslan
With the widespread use of multicore systems with smaller transistor sizes, soft errors become an important issue for parallel program execution. Fault injection is a prevalent method to quantify the soft error rates of applications. However, it is very time-consuming to perform detailed fault injection experiments. Therefore, prediction-based techniques have been proposed to evaluate the
-
Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-03-26 Haonan Ji, Shibo Lu, Kaixi Hou, Hao Wang, Zhou Jin, Weifeng Liu, Brian Vinter
Segmented operations, such as segmented sum, segmented scan and segmented sort, are important building blocks for parallel irregular algorithms. In this work, we propose a new parallel primitive called segmented merge. Its function is to merge, in parallel, q sub-segments into p segments, both of possibly nonuniform lengths, which easily causes load-balancing and vectorization problems on massively
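The primitive's semantics can be illustrated with a short sequential sketch (a hypothetical Python rendering, not the paper's GPU implementation): each of the p segments merges its sorted sub-segments into one sorted segment, and on a GPU the per-segment merges would run in parallel.

```python
import heapq

# Sequential sketch of the segmented-merge semantics: each segment
# is a list of sorted sub-segments (possibly of nonuniform length),
# and each segment's sub-segments are merged into one sorted list.
def segmented_merge(segments):
    return [list(heapq.merge(*subsegs)) for subsegs in segments]
```

The nonuniform sub-segment lengths the abstract mentions are visible even here: each `heapq.merge` call does a different amount of work, which is precisely the load-balancing problem a massively parallel version must solve.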
-
Bounds Checking on GPU Int. J. Parallel. Program (IF 1.5) Pub Date : 2021-03-25 Troels Henriksen
We present a simple compilation strategy for safety-checking array indexing in high-level languages on GPUs. Our technique does not depend on hardware support for abnormal termination, and is designed to be efficient in the non-failing case. We rely on certain properties of array languages, namely the absence of arbitrary cross-thread communication, to ensure well-defined execution in the presence