A machine learning method to variable classification in OpenMP
Introduction
Multi-core processors have brought continuous increases in computing power and have become the mainstream architecture in fields as diverse as science, the internet, and entertainment [1]. Most applications today were written for a single core and cannot exploit the potential performance gains offered by multi-core technology. To make use of multi-core processors, programmers need to rewrite or translate serial programs into parallel ones. Several application programming interfaces have made it possible to write explicitly parallel programs, including the Message-Passing Interface (MPI) [2], [3], POSIX threads (Pthreads) [4], OpenMP [5], and CUDA [6]. Of these programming models, OpenMP is an industry standard for directive-based programming of shared-memory systems [7]. It has attracted significant attention because it is a simple and portable way to express parallelism in sequential applications. Several automatic tools have been created to generate OpenMP parallel programs from serial C/C++ programs, such as Pluto [8], Rose [9], [10], PPCG [11], Polly [12], and AutoPar-Clava [13]. To obtain the highest performance improvement, it is necessary to identify the right constructs for parallelizable program regions (e.g., worksharing loop or task) and to assign each variable in those regions the proper OpenMP data-sharing attribute (e.g., firstprivate, private, lastprivate, shared, or reduction clauses). In particular, as the target program grows, programmers must classify more variables by hand, which is time-consuming and error-prone. Automatic tools ease the parallelization process. However, these tools are built on different data-dependence analyses and variable classification rules. Many of them are limited in the program size they can handle, in the OpenMP scoping clauses they support, and in their handling of scalar and array reductions. We sought a cost-effective way to reduce the human effort and address these limitations.
Dynamically typed languages like Python and TypeScript have become increasingly popular. Type annotations play an important role in tasks such as code completion and static error catching. However, these annotations cannot be fully determined by compilers and are tedious to write by hand [14]. Neural network methods have been proposed to infer type annotations and reduce this human effort [14], [15], [16]. In this study, we aim to apply a neural network method to the variable classification task, achieving high accuracy and efficiency while reducing human effort. Our machine learning approach to variable classification in OpenMP is inspired by DeepTyper [15]. DeepTyper predicts variable and function type annotations by leveraging a corpus of tokens and types for JavaScript and TypeScript.
Fig. 1 shows a sample that illustrates our motivation. We target loop-level parallelism. Given a parallelizable serial source code, as illustrated in Fig. 1(a), we can parallelize its loop with OpenMP by placing one #pragma omp parallel for private (i) reduction (+:s) firstprivate (m,n) directive before the loop, as shown in Fig. 1(b). #pragma omp parallel for means that the following for-loop will be executed by multiple threads. private (i) declares i private, so each thread has its own copy of the variable. reduction (+:s) means that a private copy of s is created for each thread and initialized according to the operator. At the end of the parallel region, the original variable in the main thread is updated by combining its initial value with the final values of the private copies using the specified operator. firstprivate (m,n) has the effect of a private declaration and, in addition, initializes each thread's copies of m and n to the values of the corresponding variables in the main thread. We annotate each variable with its corresponding attribute and transform the parallel code into the code shown in Fig. 1(c).
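Since Fig. 1 is not reproduced here, the following C function is a minimal sketch of how the directive described above attaches to a loop. Only the directive and the variable names i, s, m, and n come from the text; the function name weighted_sum, the array a, and the loop body are hypothetical stand-ins chosen for illustration, not the code of Fig. 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sums m * a[i] over n elements.
   Only the OpenMP directive mirrors the one discussed in the text. */
double weighted_sum(const double *a, int n, double m)
{
    int i;
    double s = 0.0;

    assert(a != NULL && n >= 0);

    /* private(i):         every thread works on its own copy of i.
       reduction(+:s):     every thread accumulates into a private s that
                           starts at 0; the copies are added into the
                           original s when the loop finishes.
       firstprivate(m, n): every thread gets private copies of m and n,
                           initialized from their values before the loop. */
    #pragma omp parallel for private(i) reduction(+:s) firstprivate(m, n)
    for (i = 0; i < n; i++) {
        s += m * a[i];
    }

    return s;
}
```

Compiled with OpenMP enabled (for example, -fopenmp in GCC or Clang), the loop iterations are divided among threads. Without that flag the pragma is ignored and the function computes the same sum serially, up to floating-point rounding.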
Our objective is to demonstrate the general applicability of machine learning to the variable classification problem in OpenMP parallelization. We abstract variable classification as a type annotation inference task solved with neural networks. We take advantage of a learning-based method to infer the attributes rather than relying on explicit, manually designed rules. In summary, our contributions are as follows.
- (1) Formulation of the variable classification problem in OpenMP parallelization as a type annotation inference task. To make use of the data-driven method, we present a newly aligned corpus that pairs the tokens of the code with their corresponding commonly used types or OpenMP attributes.
- (2) A DeepTyper-based learning architecture that solves the variable classification task without manually designed rules or hand-crafted features. The method learns the attributes of variables in their contexts and uses an attention mechanism to focus on information about the data-sharing semantics.
- (3) Experimental results that show the potential of learning-based methods for solving variable classification problems. We demonstrate new solutions and tools for this task.
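As an illustration of the formulation in contribution (1), the sketch below pairs each token of one statement with a label, in the spirit of the aligned corpus. The statement, the struct and function names, and the placeholder label "O" for non-variable tokens are assumptions made for this example; the actual label vocabulary and tokenization of the corpus are not specified in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One aligned pair: a source token and the label attached to it. */
struct aligned_token {
    const char *token;
    const char *label;
};

/* Hypothetical alignment for the loop statement "s += m * a[i];".
   Attribute names follow the OpenMP clauses; "O" is an assumed
   placeholder for tokens that are not variables. */
static const struct aligned_token sample[] = {
    {"s", "reduction"}, {"+=", "O"}, {"m", "firstprivate"},
    {"*", "O"},         {"a", "shared"}, {"[", "O"},
    {"i", "private"},   {"]", "O"},  {";", "O"},
};

#define SAMPLE_LEN (sizeof sample / sizeof sample[0])

/* Return the label of the first occurrence of token, or NULL if absent. */
const char *label_of(const struct aligned_token *seq, size_t len,
                     const char *token)
{
    assert(seq != NULL && token != NULL);
    for (size_t k = 0; k < len; k++) {
        if (strcmp(seq[k].token, token) == 0) {
            return seq[k].label;
        }
    }
    return NULL;
}
```

Under this view, classifying variables amounts to predicting the label column from the token column, which is the same shape of problem as inferring type annotations for a token sequence.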
The remainder of this study is structured as follows. Section 2 summarizes related work on variable classification in OpenMP and type inference in software engineering. Section 3 introduces the learning-based architecture in detail. Section 4 presents the evaluation and discussion. Finally, Section 5 concludes the study.
Section snippets
Related work
OpenMP has attracted a great deal of attention due to its simplicity and portability in expressing the parallelism of sequential applications. Deciding where in a program to insert OpenMP directives with the proper data-sharing attributes is an important step toward obtaining the highest performance improvement. Existing studies on variable classification fall into two categories, static and dynamic. From the static perspective, Pluto can use the
Aligned corpus generation
For this study, we collect a source code dataset from widely used benchmarks, program applications, and math libraries to demonstrate the potential of the machine learning method. From it, we generate a newly aligned corpus that pairs the tokens of the code with their corresponding common types or data-sharing attributes in OpenMP parallelization. We then leverage this corpus to train the machine learning models using supervised learning techniques. As shown in Fig. 1, the goal of this study is to infer proper
Model architecture
Fig. 4 presents the high-level overview of our proposed learning-based architecture with an attention mechanism. We model the problem of variable classification as a type annotation problem: given a token-based input, our architecture predicts the data-sharing attributes of variables. We divide the architecture into two phases: (1) a preprocessing strategy, which represents the code as a sequence of tokens aligned with types or data-sharing attributes; (2) a DeepTyper-based learning process,
Evaluation
In this section, we evaluate the effectiveness of the learning-based method for classifying variables with data-sharing semantics in OpenMP parallelization. We first describe the datasets and then the experiment details. We then discuss and analyze the results in two phases. In the first phase, we assess the performance of our method on the generated datasets. In the second phase, we further present the effectiveness of our method on the real-world test suite of NAS
Limitation
When generating the dataset, it is unrealistic to manually parallelize programs with the correct OpenMP directives at the scale needed for a suitably large dataset. In addition, there is a lack of clear criteria for where to insert the correct directives. We met our batch data-processing demands with the help of Rose, which unifies the annotation criteria used to generate the dataset. Rose is an automatic parallelization tool and is based on the variable references for specific program analysis which makes
Conclusion
In this study, we used a learning-based method to classify variables in OpenMP parallelization. We modeled the variable classification problem as a type inference task and represented the source code as a sequence of tokens. Equipped with an attention mechanism, our method can attend to the parts of the code that contribute most to the data-sharing semantics, which boosts performance. Experimental results have shown that our method obtained better performance. Our findings demonstrate
CRediT authorship contribution statement
Yuanyuan Shen: Conceptualization, Methodology, Software, Validation, Writing – original draft. Manman Peng: Writing – original draft, Supervision, Project administration. Qiang Wu: Writing – review & editing. Renfa Li: Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
We would like to express our gratitude to the editor and all anonymous reviewers for their valuable and constructive comments which have helped to improve the quality of the paper. This work was supported by the National Key Research and Development Program of China (Grant Nos. 2017YFB0202901, 2017YFB0202905) and Special Funds for Construction of Innovative Provinces in Hunan Province of China (Grant Nos. 2020GK2006, 2020GK2007).
References (43)
- Chapter 1 - Why parallel computing?
- Towards parallelism detection of sequential programs with graph neural network. Future Gener. Comput. Syst. (2021)
- Adaptive blocks: A high performance data structure
- An adaptive MHD method for global space weather simulations. IEEE Trans. Plasma Sci. (2000)
- Pthreads Programming: A POSIX Standard for Better Multiprocessing (1996)
- OpenMP architecture review board: The OpenMP specification for parallel programming (2009)
- CUDA-lite: Reducing GPU programming complexity
- Automatic multilevel parallelization using OpenMP. Sci. Program. (2003)
- U. Bondhugula, A. Hartono, J. Ramanujam, P. Sadayappan, A practical automatic polyhedral parallelizer and locality...
- D. Quinlan, C. Liao, The ROSE source-to-source compiler infrastructure, in: Cetus Users and Compiler Infrastructure...
- Semantic-aware automatic parallelization of modern applications using high-level abstractions. Int. J. Parallel Program.
- Polyhedral parallel code generation for CUDA. ACM Trans. Archit. Code Optim. (TACO)
- Polly – performing polyhedral optimizations on a low-level intermediate representation. Parallel Process. Lett.
- LambdaNet: Probabilistic type inference using graph neural networks
- Predicting variable types in dynamically typed programming languages
- Efficient exploitation of parallelism on Pentium III and Pentium 4 processor-based systems
- Automatic construct selection and variable classification in OpenMP
- An OpenMP extension that supports thread-level speculation. IEEE Trans. Parallel Distrib. Syst.
- Bidirectional LSTM-CRF models for sequence tagging
Yuanyuan Shen received the bachelor’s degree in information and science technology from Changsha University of Science and Technology, China, in 2016. She is currently pursuing the Ph.D. degree at Hunan University, Changsha, China. Her research interests include computer architecture, program analysis, and machine learning.
Manman Peng received her bachelor’s, master’s, and Ph.D. degrees in computer science from Hunan University in 1985, 1988, and 2006, respectively. In 1988, she joined the College of Computer Science and Electronic Engineering. She is currently a Professor and doctoral supervisor. Her research interests include computer architecture and high-performance computing. Dr. Peng is a member of the Professional Committee of Computer Architecture of China Computer Federation.
Qiang Wu received the master’s degree in computer science from Tsinghua University in 2002 and the Ph.D. degree from the Imperial College of Technology in 2009. He is currently an Associate Professor at the College of Computer Science and Electronic Engineering. His main research interests are computer architecture, machine learning, and heterogeneous computing.
Renfa Li received the Ph.D. degree in electronic engineering from Huazhong University of Science and Technology, Wuhan, China, in 2002. He is a Professor and the Chair of Embedded and Cyber–Physical Systems with Hunan University, Changsha, China, where he is also the Chair of the Key Laboratory for Embedded and Cyber–Physical Systems. His major interests include computer architectures, embedded computing systems, cyber–physical systems, and Internet of Things. Prof. Li is a member of the council of CCF, a senior member of the IEEE, and ACM.