A machine learning method to variable classification in OpenMP

https://doi.org/10.1016/j.future.2022.10.010

Highlights

  • Formulation of the variable classification problem as a type annotation inference task.

  • Presentation of a new aligned corpus that addresses the lack of suitable datasets.

  • A neural network learning architecture with an attention mechanism for the classification problem.

  • Demonstration of the potential of data-driven methods for variable classification.

Abstract

OpenMP is a parallel computing framework that provides programmers with a set of directives and clauses for writing parallel applications. The most important task in adopting OpenMP is deciding which parallel pattern, with which associated clauses, to apply to an existing sequential program. Shared-memory parallelization is complicated by the many directives, each with a different role. Some tools have been developed to help programmers write parallel programs with OpenMP, but many of them are limited in the size of the programs they can analyze, in the OpenMP scoping they support, and in their handling of scalar and array reductions. Manually selecting clauses with the necessary data-sharing attributes is also error-prone. In this study, we target variable classification in OpenMP directives to exploit loop-level parallelism. We formulate variable classification as a type inference task solved by a machine learning method that learns the attributes of variables from their contexts and relations. We propose an aligned corpus of tokens and types for predicting the attributes of variables used inside the target loop, and we support the reduction clause whenever it is applicable. Experimental results indicate that our method is promising: it achieves high accuracy and is well suited to complex real-world programs.

Introduction

Multi-core processors have brought continuous increases in computing power and have become the mainstream architecture in fields as diverse as science, the internet, and entertainment [1]. Most existing applications, however, were written for single-core processors and cannot exploit the performance gains offered by multi-core technology. To make use of multi-core processors, programmers need to rewrite or translate serial programs into parallel ones. Several application programming interfaces make it possible to write explicitly parallel programs, including the Message-Passing Interface (MPI) [2], [3], POSIX threads (Pthreads) [4], OpenMP [5], and CUDA [6]. Of these programming models, OpenMP is an industry standard for directive-based programming of shared-memory systems [7]. It has attracted significant attention because it offers a simple and portable way to express parallelism in sequential applications. Several automatic tools, such as Pluto [8], Rose [9], [10], PPCG [11], Polly [12], and AutoPar-Clava [13], have been created to generate OpenMP parallel programs from serial C/C++ programs. To obtain the highest performance improvement, it is necessary to identify the right constructs for parallelizable program regions (e.g., a worksharing loop or a task) and to classify each variable in those regions into the proper OpenMP data-sharing clause (e.g., firstprivate, private, lastprivate, shared, or reduction). As the target program grows, programmers must spend more time and effort classifying more variables, which is time-consuming and error-prone. Automatic tools ease the parallelization process, but they are built on different data-dependence analyses and variable classification rules. Many of them are also limited in the program size they can handle, the OpenMP scoping clauses they support, and their handling of scalar and array reductions. We therefore sought a cost-effective way to reduce human effort and address these limitations.
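A small example makes the clause categories listed above concrete. The following C function is a hypothetical sketch and does not come from the paper. It shows two of those clauses, shared and lastprivate, and the function name fill_and_keep_last is illustrative. If the file is compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result.

```c
/* Hypothetical example of the shared and lastprivate clauses (not from the paper).
   Without OpenMP support the pragma is ignored and the loop runs serially. */
int fill_and_keep_last(int *b, int n)
{
    int i;
    int x = -1;

    /* b, n: shared, so all threads use the same pointer and loop bound;
       each iteration writes a distinct element b[i], so there is no race.
       x: lastprivate, so each thread assigns its own copy, and after the loop
       the original x holds the value from the sequentially last iteration. */
    #pragma omp parallel for shared(b, n) lastprivate(x)
    for (i = 0; i < n; i++) {
        x = 2 * i;
        b[i] = x;
    }
    return x;
}
```

With n = 5, the call returns 8, the value assigned in iteration i = 4, no matter which thread executed that iteration.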

Dynamically typed languages such as Python, and gradually typed languages such as TypeScript, have become increasingly popular. Type annotations play an important role in tasks such as code completion and static error detection. However, compilers cannot always determine these annotations, and writing them by hand is tedious [14]. Neural network methods have been proposed to infer type annotations and thereby reduce this human effort [14], [15], [16]. In this study, we aim to apply a neural network method to the variable classification task, with high accuracy and efficiency, to reduce human effort. Our machine learning approach to variable classification in OpenMP is inspired by DeepTyper [15], which predicts variable and function type annotations by learning from an aligned corpus of tokens and types in JavaScript and TypeScript.

Fig. 1 shows an example that illustrates our motivation. We target loop-level parallelism. Given the parallelizable serial code in Fig. 1(a), we can parallelize its loop with OpenMP by inserting the single directive #pragma omp parallel for private(i) reduction(+:s) firstprivate(m,n) before the loop, as shown in Fig. 1(b). The directive #pragma omp parallel for creates a team of threads and divides the iterations of the following for loop among them. The clause private(i) gives each thread its own uninitialized copy of i. The clause reduction(+:s) creates a private copy of s in each thread and initializes it to the identity value of the operator (0 for +). At the end of the loop, the original s is updated by combining its value before the loop with the final values of all private copies using +. The clause firstprivate(m,n) has the effect of a private clause. In addition, it initializes each thread's copies of m and n with the values the original variables hold when the directive is encountered. We then annotate each variable with its corresponding attribute, which transforms the parallel code into the form shown in Fig. 1(c).
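The following C function is a minimal sketch of a loop of the kind Fig. 1 describes, with a comment for each clause. The loop body and the name scaled_sum are assumptions for illustration. The paper's actual Fig. 1 code is not reproduced in this text.

```c
/* Hypothetical loop in the spirit of Fig. 1(b); not the paper's actual code. */
double scaled_sum(const double *a, int n, double m)
{
    int i;
    double s = 0.0;

    /* i: private, one uninitialized copy per thread (the loop variable).
       s: reduction(+), per-thread copies start at 0 and are added into s
          after the loop.
       m, n: firstprivate, per-thread copies initialized from the values
          they hold before the loop.
       a: not listed, so it is shared by default and only read here. */
    #pragma omp parallel for private(i) reduction(+:s) firstprivate(m, n)
    for (i = 0; i < n; i++) {
        s += m * a[i];
    }
    return s;
}
```

For a = {1, 2, 3, 4}, n = 4, and m = 2, the function returns 20. Without the reduction clause, the concurrent updates of s would be a data race. With the clause, each thread accumulates into its own copy, and the copies are combined once at the end. For general floating-point inputs, the combination order can vary, so results may differ in the last bits between runs.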

Our objective is to demonstrate the general applicability of machine learning to the variable classification problem in OpenMP parallelization. We abstract variable classification into a type annotation inference task solved with neural networks, and we use a learning-based method to infer the attributes rather than explicit, manually designed rules. In summary, our contributions are as follows.

  • (1) Formulation of the variable classification problem in OpenMP parallelization as a type annotation inference task. To enable a data-driven method, we present a new aligned corpus that pairs code tokens with their commonly used types or OpenMP data-sharing attributes.

  • (2) A DeepTyper-based learning architecture that solves the variable classification task without manually designed rules or hand-crafted features. The method learns the attributes of variables from their contexts and uses an attention mechanism to focus on information relevant to data-sharing semantics.

  • (3) Experimental results that show the potential of learning-based methods for variable classification, together with new solutions and tools for this task.

The remainder of this study is structured as follows. Section 2 summarizes related work on variable classification in OpenMP and type inference in software engineering. Section 3 describes the learning-based architecture in detail. Section 4 presents the evaluation and discussion. Finally, Section 5 concludes the study.

Section snippets

Related work

OpenMP has attracted a great deal of attention due to its simplicity and portability in expressing the parallelism of sequential applications. Deciding where in a program to insert OpenMP directives with the proper data-sharing attributes is essential for obtaining the highest performance improvement. Existing studies on variable classification fall into two categories: static and dynamic. From the static perspective, Pluto can use the …

Aligned corpus generation

For this study, we collect a source code dataset from widely used benchmarks, program applications, and math libraries to demonstrate the potential of the machine learning method. We then generate a new aligned corpus that pairs code tokens with their common types or data-sharing attributes in OpenMP parallelization, and we use this corpus to train machine learning models with supervised learning. As shown in Fig. 1, the goal of this study is to infer proper …
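One way to picture an entry of such an aligned corpus is a token sequence paired position by position with a label sequence. The C sketch below assumes that the code has already been split into tokens and that each variable's attribute is known from a directive. It gives each token the attribute of the matching variable and a placeholder label otherwise. The label set, the LBL_NONE placeholder, and align_tokens are hypothetical. They do not describe the paper's actual corpus format, which also covers common types.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical label set for illustration only. */
typedef enum {
    LBL_NONE,          /* placeholder: token is not a classified variable */
    LBL_PRIVATE,
    LBL_FIRSTPRIVATE,
    LBL_REDUCTION_ADD, /* reduction(+:...) */
    LBL_SHARED
} Label;

/* One variable and the attribute assigned to it by the directive. */
typedef struct {
    const char *name;
    Label label;
} VarAttr;

/* Writes one label per token: the attribute of the matching variable,
   or LBL_NONE for keywords, literals, operators and unlisted names. */
void align_tokens(const char *const *tokens, size_t ntok,
                  const VarAttr *vars, size_t nvars, Label *out)
{
    for (size_t t = 0; t < ntok; t++) {
        out[t] = LBL_NONE;
        for (size_t v = 0; v < nvars; v++) {
            if (strcmp(tokens[t], vars[v].name) == 0) {
                out[t] = vars[v].label;
                break;
            }
        }
    }
}
```

Consider the tokens of for (i = 0; i < n; i++) s += m * a[i]; with i private, s a + reduction, m and n firstprivate, and a shared. Every occurrence of i is labeled LBL_PRIVATE. The tokens for, =, 0, and the other non-variable tokens are labeled LBL_NONE.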

Model architecture

Fig. 4 presents a high-level overview of our proposed learning-based architecture with an attention mechanism. We model variable classification as a type annotation problem: given token-based input, the architecture predicts the data-sharing attributes of variables. The architecture has two phases: (1) a preprocessing strategy, which represents the code as a sequence of tokens aligned with types or data-sharing attributes; and (2) a DeepTyper-based learning process, …
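This excerpt does not describe the tokenizer used in phase (1). As an illustration only, the following C sketch is a minimal hand-written lexer. It splits a loop into identifier and number tokens and into one- or two-character punctuation tokens. MAX_TOK_LEN, tokenize, and the operator table are assumptions. A real preprocessing stage would also need to handle comments, string literals, floating-point literals, and preprocessor lines, which this sketch does not.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOK_LEN 32 /* hypothetical per-token buffer size */

/* Splits C source text into identifier/number tokens and one- or
   two-character punctuation tokens, skipping whitespace. Returns the
   number of tokens written (at most max_tokens). Tokens longer than
   MAX_TOK_LEN - 1 characters are truncated. */
size_t tokenize(const char *src, char tokens[][MAX_TOK_LEN], size_t max_tokens)
{
    static const char *two_char_ops[] = {
        "++", "--", "+=", "-=", "*=", "/=", "<=", ">=", "==", "!=", "&&", "||", "->"
    };
    size_t n = 0;
    const char *p = src;

    while (*p != '\0' && n < max_tokens) {
        if (isspace((unsigned char)*p)) {
            p++;
            continue;
        }
        size_t len = 1;
        if (isalnum((unsigned char)*p) || *p == '_') {
            /* identifier, keyword, or integer literal */
            while (isalnum((unsigned char)p[len]) || p[len] == '_')
                len++;
        } else {
            /* prefer a two-character operator when one matches */
            for (size_t k = 0; k < sizeof two_char_ops / sizeof two_char_ops[0]; k++) {
                if (strncmp(p, two_char_ops[k], 2) == 0) {
                    len = 2;
                    break;
                }
            }
        }
        size_t copy = len < MAX_TOK_LEN - 1 ? len : MAX_TOK_LEN - 1;
        memcpy(tokens[n], p, copy);
        tokens[n][copy] = '\0';
        n++;
        p += len;
    }
    return n;
}
```

Applied to for (i = 0; i < n; i++) s += m * a[i]; the sketch produces 22 tokens. It splits i++ into i and ++, and s += into s and +=.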

Evaluation

In this section, we evaluate the effectiveness of the learning-based method for classifying variables with data-sharing semantics in OpenMP parallelization. We first describe the datasets and then the experimental details. We then analyze the results in two phases. In the first, we assess the performance of our method on the generated datasets. In the second, we present its effectiveness on the real-world test suite of NAS …
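This excerpt does not state the evaluation metric. As an assumption, one common choice for DeepTyper-style token tagging is accuracy measured only over tokens that carry a gold label, so that the many unlabeled tokens do not inflate the score. The following C sketch computes that quantity. NONE_LABEL and labeled_token_accuracy are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical encoding: labels are small integers, and NONE_LABEL marks
   tokens that have no gold type or data-sharing attribute. */
#define NONE_LABEL 0

/* Accuracy over labeled tokens only: correct / labeled.
   Returns 0.0 when no token carries a gold label. */
double labeled_token_accuracy(const int *gold, const int *pred, size_t n)
{
    size_t labeled = 0, correct = 0;
    for (size_t t = 0; t < n; t++) {
        if (gold[t] == NONE_LABEL)
            continue; /* unlabeled tokens are not scored */
        labeled++;
        if (pred[t] == gold[t])
            correct++;
    }
    return labeled == 0 ? 0.0 : (double)correct / (double)labeled;
}
```

With gold labels {0, 1, 0, 2, 3, 0} and predictions {1, 1, 0, 2, 0, 0}, three tokens are scored and two are correct, so the accuracy is 2/3. The spurious prediction at position 0 is ignored.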

Limitation

When generating the dataset, it is unrealistic to parallelize programs manually with correct OpenMP directives to obtain a sufficiently large dataset, and there are no clear criteria for where to insert the correct directives. We met our batch data-processing needs with Rose, which applies uniform annotation criteria when generating the dataset. Rose is an automatic parallelization tool whose program analysis is based on variable references, which makes …

Conclusion

In this study, we used a learning-based method to classify variables in OpenMP parallelization. We modeled the variable classification problem as a type inference task and represented the source code as a sequence of tokens. Equipped with an attention mechanism, our method can focus on the parts of the code that contribute to data-sharing semantics, which boosts performance. Experimental results have shown that our method achieves better performance. Our findings demonstrate …

CRediT authorship contribution statement

Yuanyuan Shen: Conceptualization, Methodology, Software, Validation, Writing – original draft. Manman Peng: Writing – original draft, Supervision, Project administration. Qiang Wu: Writing – review & editing. Renfa Li: Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We would like to express our gratitude to the editor and all anonymous reviewers for their valuable and constructive comments, which have helped to improve the quality of the paper. This work was supported by the National Key Research and Development Program of China (Grant Nos. 2017YFB0202901, 2017YFB0202905) and Special Funds for Construction of Innovative Provinces in Hunan Province of China (Grant Nos. 2020GK2006, 2020GK2007).

Yuanyuan Shen received the bachelor’s degree in information and science technology from Changsha University of Science and Technology, China, in 2016. She is currently pursuing the Ph.D. degree at Hunan University, Changsha, China. Her research interests include computer architecture, program analysis, and machine learning.

References (43)

  • P.S. Pacheco, Chapter 1 - why parallel computing?
  • Y. Shen et al., Towards parallelism detection of sequential programs with graph neural network, Future Gener. Comput. Syst. (2021).
  • Q.F. Stout et al., Adaptive blocks: A high performance data structure.
  • D. De Zeeuw et al., An adaptive MHD method for global space weather simulations, IEEE Trans. Plasma Sci. (2000).
  • D.A. Buttlar et al., Pthreads Programming: A POSIX Standard for Better Multiprocessing (1996).
  • OpenMP Architecture Review Board, The OpenMP specification for parallel programming (2009).
  • S.-Z. Ueng et al., CUDA-lite: Reducing GPU programming complexity.
  • H. Jin et al., Automatic multilevel parallelization using OpenMP, Sci. Program. (2003).
  • U. Bondhugula, A. Hartono, J. Ramanujam, P. Sadayappan, A practical automatic polyhedral parallelizer and locality...
  • D. Quinlan, C. Liao, The ROSE source-to-source compiler infrastructure, in: Cetus Users and Compiler Infrastructure...
  • C. Liao et al., Semantic-aware automatic parallelization of modern applications using high-level abstractions, Int. J. Parallel Program. (2010).
  • S. Verdoolaege et al., Polyhedral parallel code generation for CUDA, ACM Trans. Archit. Code Optim. (TACO) (2013).
  • T. Grosser et al., Polly – performing polyhedral optimizations on a low-level intermediate representation, Parallel Process. Lett. (2012).
  • H. Arabnejad, J. Bispo, J.G. Barbosa, J.M. Cardoso, Autopar-clava: An automatic parallelization source-to-source tool...
  • J. Wei et al., Lambdanet: Probabilistic type inference using graph neural networks (2020).
  • V.J. Hellendoorn, C. Bird, E.T. Barr, M. Allamanis, Deep learning type inference, in: Proceedings of the 2018 26th Acm...
  • A. Jangda et al., Predicting variable types in dynamically typed programming languages (2019).
  • A. Bik et al., Efficient exploitation of parallelism on pentium III and pentium 4 processor-based systems (2001).
  • M. Norouzi et al., Automatic construct selection and variable classification in openmp.
  • S. Aldea et al., An OpenMP extension that supports thread-level speculation, IEEE Trans. Parallel Distrib. Syst. (2015).
  • Z. Huang et al., Bidirectional LSTM-CRF models for sequence tagging (2015).

Manman Peng received her bachelor’s, master’s, and Ph.D. degrees in computer science from Hunan University in 1985, 1988, and 2006, respectively. In 1988, she joined the College of Computer Science and Electronic Engineering. She is currently a Professor and doctoral supervisor. Her research interests include computer architecture and high-performance computing. Dr. Peng is a member of the Professional Committee of Computer Architecture of the China Computer Federation.

Qiang Wu received the master’s degree in computer science from Tsinghua University in 2002 and the Ph.D. degree from the Imperial College of Technology in 2009. He is currently an Associate Professor at the College of Computer Science and Electronic Engineering. His main research interests are computer architecture, machine learning, and heterogeneous computing.

Renfa Li received the Ph.D. degree in electronic engineering from Huazhong University of Science and Technology, Wuhan, China, in 2002. He is a Professor and the Chair of Embedded and Cyber–Physical Systems with Hunan University, Changsha, China, where he is also the Chair of the Key Laboratory for Embedded and Cyber–Physical Systems. His major interests include computer architectures, embedded computing systems, cyber–physical systems, and Internet of Things. Prof. Li is a member of the council of CCF and a senior member of the IEEE and ACM.
