Custom-tailored clone detection for IEC 61131-3 programming languages

https://doi.org/10.1016/j.jss.2021.111070

Highlights

  • A model-based approach for the detection of code clones in IEC 61131-3 programs.

  • Publicly available prototype implementation called Variability Analysis Toolkit.

  • Mutation-based benchmark for IEC 61131-3 clone detection.

  • Evaluation of the approach based on the PPU and xPPU case studies.

Abstract

Automated production systems (aPS) are highly customized systems that consist of hardware and software. Such aPS are controlled by a programmable logic controller (PLC), often in accordance with the IEC 61131-3 standard. The standard divides the system implementation into so-called program organization units (POUs) as the smallest software unit and comprises multiple textual (Structured Text (ST)) and graphical (Function Block Diagram (FBD), Ladder Diagram (LD), and Sequential Function Chart (SFC)) programming languages that can be arbitrarily nested.

A common practice during the development of such systems is reusing implementation artifacts by copying, pasting, and then modifying code. This approach is referred to as code cloning. It is used on a fine-granular level where a POU is cloned within a system variant. It is also applied on the coarse-granular system level, where the entire system is cloned and adapted to create a system variant, for example for another customer. This ad hoc practice for the development of variants is commonly referred to as clone-and-own. It allows the fast development of variants to meet varying customer requirements or altered regulatory guidelines. However, clone-and-own is a non-sustainable approach and does not scale with an increasing number of variants. It has a detrimental effect on the overall quality of a software system, such as the propagation of bugs to other variants, which harms maintenance.

To support the effective development and maintenance of such systems, a detailed code clone analysis is required. On the one hand, an analysis of code clones within a variant (i.e., clone detection in the classical sense) supports experts in refactoring the respective code into library components. On the other hand, an analysis of commonalities and differences between cloned variants (i.e., variability analysis) supports maintenance and further reuse and facilitates the migration of variants into a software product line (SPL).

In this paper, we present an approach for the automated detection of code clones within variants (intra variant clone detection) and between variants (inter variant clone detection) of IEC 61131-3 control software with arbitrary nesting of both textual and graphical languages. We provide an implementation of the approach in the Variability Analysis Toolkit (VAT) as a freely available prototype for the analysis of IEC 61131-3 programs. For the evaluation, we developed a meta-model-based mutation framework to measure our approach’s precision and recall. In addition, we evaluated our approach using the Pick and Place Unit (PPU) and Extended Pick and Place Unit (xPPU) scenarios. Results show the usefulness of intra and inter clone detection in the domain of automated production systems.

Introduction

During the evolution of software systems, code cloning is a common practice (Mondal et al., 2020) for reusing software artifacts. To cope with an increasing market for custom-tailored software systems, developers often follow a clone-and-own approach where existing variants are copied and altered to create new variants (Fischer et al., 2014). It is an unsustainable approach that reduces the overall software quality due to bug propagation, increases the maintenance effort, and hinders further reuse (Deissenboeck et al., 2010). In the field of clone detection, research focuses on high-level programming languages such as Java or C (Mondal et al., 2020, Ain et al., 2019, Roy and Cordy, 2007, Bellon et al., 2007). In the domain of automated production systems (aPS), code cloning is a common practice due to frequently changing products, customer requirements, and altered regulatory guidelines (Durdik et al., 2012, Legat et al., 2013).

The state-of-the-art programming languages for programmable logic controller software are defined in the IEC 61131-3 standard (International Electrotechnical Commission, 2009). It comprises five programming languages: the two textual languages Structured Text (ST) and Instruction List (IL), and the three graphical languages Sequential Function Chart (SFC), Ladder Diagram (LD), and Function Block Diagram (FBD). The standard allows the nesting of languages, such as using ST in FBD implementations. Control program developers can select the language that is best suited for a particular task, significantly increasing their productivity. Programs implemented according to IEC 61131-3 are divided into program organization units (POUs) as the smallest software unit in a program. Such systems are often reused by copying the whole system and then modifying it to create new and independent system variants (referred to as clone-and-own). Furthermore, developers often reuse single POUs within a system (referred to as classical code cloning), for example, the POU that controls a sorting conveyor, which can occur several times in a production system (Vogel-Heuser and Ocker, 2018, Bougouffa et al., 2019).

To restore the sustainable development of cloned system variants, they need to be re-engineered into a structured reuse approach, such as a software product line (SPL) (Northrop, 2002, Fischer et al., 2018). Therefore, a detailed analysis of system variants concerning code clones within a variant (intra clone detection) and commonalities and differences between cloned variants (inter clone detection) is essential. It serves as a first step to re-engineer system variants into an SPL (Breivold et al., 2008, Krueger, 2001) and to refactor code clones into reusable and configurable software artifacts such as library components (Vogel-Heuser et al., 2018).

We propose a fully customizable comparison approach for IEC 61131-3 to support the detection of clones within a variant (intra variant clone detection) and between variants (inter variant clone detection). This supports developers in tracing clones within and between variants, which helps them create reusable components within systems and migrate system variants into an SPL, respectively. Specifically, the contributions of this paper are as follows:

  • A model-based, fine-grained, and fully customizable approach for the detection of code clones within variants (intra clone detection) and analysis of commonalities and differences between cloned variants (inter clone detection) of IEC 61131-3 programs composed of arbitrarily nested sub languages.

  • Publicly available prototype implementation called Variability Analysis Toolkit (VAT), together with evaluation data and results.

  • A mutation framework for the evaluation of clone detection tools for IEC 61131-3 systems.

  • Detailed evaluation and analysis of the approach by applying it to a large clone data set created using the mutation framework, as well as to the PPU and xPPU case study systems.
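
The distinction between intra and inter clone detection can be illustrated with a minimal Python sketch. It is not the model-based comparison described in this paper: it swaps in a crude token-based Jaccard similarity on Structured Text snippets, and the POU names, code bodies, function names, and the 0.8 threshold are all hypothetical values chosen for illustration.

```python
from itertools import combinations

def tokens(st_code: str) -> set[str]:
    """Split ST code into a set of lowercase tokens (a very crude tokenizer)."""
    for ch in "();:=+-*/,":
        st_code = st_code.replace(ch, f" {ch} ")
    return set(st_code.lower().split())

def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two token sets (stand-in for a real metric)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def intra_clones(variant: dict[str, str], threshold: float = 0.8):
    """Compare every pair of POUs inside one variant."""
    result = []
    for p, q in combinations(sorted(variant), 2):
        score = similarity(variant[p], variant[q])
        if score >= threshold:
            result.append((p, q, score))
    return result

def inter_clones(variant_a: dict[str, str], variant_b: dict[str, str],
                 threshold: float = 0.8):
    """Compare every POU of one variant with every POU of another variant."""
    result = []
    for p in sorted(variant_a):
        for q in sorted(variant_b):
            score = similarity(variant_a[p], variant_b[q])
            if score >= threshold:
                result.append((p, q, score))
    return result

# Hypothetical variants: POU name -> Structured Text body.
variant_a = {
    "ConveyorLeft": "IF sensor THEN motor := TRUE; END_IF;",
    "ConveyorRight": "IF sensor THEN motor := TRUE; END_IF;",
    "Crane": "position := position + 1;",
}
variant_b = {
    "ConveyorLeft": "IF sensor THEN motor := TRUE; END_IF;",
    "Crane": "position := position + 2;",
}

print(intra_clones(variant_a))
print(inter_clones(variant_a, variant_b))
```

In this toy setting, the two conveyor POUs inside `variant_a` are reported as an intra clone pair, while the comparison across `variant_a` and `variant_b` shows which POUs were carried over unchanged and which were modified beyond the chosen threshold.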

The remainder of this paper is structured as follows: Section 2 provides relevant background on the IEC 61131-3 standard with the utilized programming languages and describes code clones and variability analysis. Section 3 presents our approach for detecting clones within and between variants. In Section 4, we explain the implementation of our approach as a tool called VAT. In Section 5, we evaluate our approach by performing qualitative and quantitative analyses. Finally, we discuss related work in Section 6 and conclude in Section 7.

Section snippets

Background

This section provides background on IEC 61131-3 control software, types of code clones, and variability analysis.

Clone detection approach

This section presents our approach for the detection of code clones in IEC 61131-3 control software. We first explain the general comparison approach and then each step in more detail in the following sections. Fig. 7 illustrates the process for the detection of code clones.

In the first step, the control software is parsed ①. The parsing process transforms a PLCOpenXML file into a model based on a set of meta-models. We created these meta-models as an abstraction of the IEC 61131-3 standard to
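
As a rough illustration of the parsing step, the following Python sketch reads a hand-written, heavily simplified PLCopen-XML-like document and collects its POUs into flat records. The sample document, the `Pou` record, and the `parse_pous` helper are assumptions made for this example; the actual meta-models are far richer, and real exports contain many more elements than shown here. The sketch ignores XML namespaces so that it does not depend on a particular schema version.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Simplified, hand-written sample (an assumption, not a real tool export).
SAMPLE = """<project xmlns="http://www.plcopen.org/xml/tc6_0201">
  <types><pous>
    <pou name="Conveyor" pouType="functionBlock">
      <body><ST><xhtml xmlns="http://www.w3.org/1999/xhtml">motor := sensor;</xhtml></ST></body>
    </pou>
    <pou name="Main" pouType="program">
      <body><FBD/></body>
    </pou>
  </pous></types>
</project>"""

@dataclass
class Pou:
    """Hypothetical flat stand-in for a POU model element."""
    name: str
    pou_type: str
    language: str
    text: str = ""

def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def parse_pous(xml_text: str) -> list[Pou]:
    """Collect every <pou> element with its implementation language."""
    root = ET.fromstring(xml_text)
    pous = []
    for elem in root.iter():
        if local(elem.tag) != "pou":
            continue
        body = next((c for c in elem if local(c.tag) == "body"), None)
        lang_elem = body[0] if body is not None and len(body) > 0 else None
        language = local(lang_elem.tag) if lang_elem is not None else "unknown"
        text = "".join(lang_elem.itertext()).strip() if lang_elem is not None else ""
        pous.append(Pou(elem.get("name", ""), elem.get("pouType", ""), language, text))
    return pous

for pou in parse_pous(SAMPLE):
    print(pou)
```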

Implementation

To evaluate our approach, we implemented it in a publicly available tool we call the Variability Analysis Toolkit (VAT).

Evaluation

We evaluated different aspects of our clone detection approach. The correctness of the results, measured in precision and recall, is crucial for detecting code clones within software variants and analyzing commonalities and differences between software variants. Otherwise, incorrectly matched elements inevitably compromise subsequent steps such as refactoring code clones into library components or consolidating a set of variants into an SPL. Thus, analyzing the results concerning their correctness
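
For readers unfamiliar with the two measures, the following Python sketch shows how precision and recall could be computed for a set of reported clone pairs against a set of expected pairs. It assumes that the expected pairs are known in advance, as is typically the case when clones are generated by mutation; the function name and the example pairs are hypothetical and do not reproduce the evaluation data of this paper.

```python
def precision_recall(reported: set, expected: set) -> tuple[float, float]:
    """Precision = correct reports / all reports; recall = correct reports / all expected."""
    true_positives = len(reported & expected)
    precision = true_positives / len(reported) if reported else 1.0
    recall = true_positives / len(expected) if expected else 1.0
    return precision, recall

def pair(a: str, b: str) -> frozenset:
    """Unordered clone pair, so (a, b) and (b, a) count as the same match."""
    return frozenset((a, b))

# Hypothetical ground truth: original POUs and the mutants derived from them.
expected = {
    pair("Conveyor", "Conveyor_m1"),
    pair("Conveyor", "Conveyor_m2"),
    pair("Crane", "Crane_m1"),
    pair("Stamp", "Stamp_m1"),
    pair("Stack", "Stack_m1"),
}

# Hypothetical tool output: three correct matches and one incorrect match.
reported = {
    pair("Conveyor_m1", "Conveyor"),
    pair("Conveyor", "Conveyor_m2"),
    pair("Crane", "Crane_m1"),
    pair("Crane", "Stamp"),
}

precision, recall = precision_recall(reported, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A low precision means that many reported matches are wrong, which would mislead a later refactoring; a low recall means that existing clones are missed and remain duplicated.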

Related work

Clone-and-own is a common and popular reuse strategy in the software development domain. Over the past decades, the interest in code clones has also been reflected in the wealth of existing research. In general, clone detection aims to reduce the maintenance effort of large software systems by tracing clones or transferring a software system into an SPL (Juergens et al., 2009). Both activities require a detailed analysis of the respective software systems. Most of the research focused on detecting code clones in

Conclusion and future work

With an increasing interest in variant variety for industrial products, variability has become a key factor of many software systems. In the domain of aPS and their control, software often remains in use for decades. To reduce such a system’s maintenance effort, the detection of clones and analysis of variability is crucial. On the one hand, code clones can be refactored into reusable artifacts such as library components. On the other hand, the variability analysis can support

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work has been supported by the DFG (German Research Foundation) (SCHA 1635/12-1) and (VO 937/31-1).

Kamil Rosiak is a research assistant at the Institute of Software Engineering and Automotive Informatics.

Before, he worked in the field of electronic intelligence and received his master’s degree in 2019. His research interests are on reverse-engineering of legacy software systems and the analysis of programming languages.

References (60)

  • Bougouffa, S., et al. Visualization of variability analysis of control software from industrial automation systems
  • Breivold, H., Larsson, S., Land, R., 2008. Migrating industrial systems towards software product lines: experiences and...
  • Deissenboeck, F., Hummel, B., Juergens, E., 2010. Code clone detection in practice. In: 2010 ACM/IEEE 32nd...
  • Deissenboeck, F., Hummel, B., Juergens, E., Pfaehler, M., Schaetz, B., 2010. Model clone detection in practice. In:...
  • Deissenboeck, F., et al. Clone detection in automotive model-based development
  • Durdik, Z., et al. Towards sustainability guidelines for long-living software systems
  • Duszynski, S., Knodel, J., Becker, M., 2011. Analyzing the source code of multiple software variants for reuse...
  • Fahimipirehgalin, M., et al. Similarity analysis of control software using graph mining
  • Fischer, J., et al. A qualitative study of variability management of control software for industrial automation systems
  • Fischer, S., Linsbauer, L., Lopez-Herrejon, R., Egyed, A., 2014. Enhancing clone-and-own with systematic reuse for...
  • Fischer, S., et al. Enhancing clone-and-own with systematic reuse for developing software variants
  • Fischer, S., et al. The ECCO tool: Extraction and composition for clone-and-own
  • Fischer, J., et al. Reengineering workflow for planned reuse of IEC 61131-3 legacy software
  • Harris, S. Simian-similarity analyser (2003)
  • HKJ. et al. Analysis of Industrial Control System Software to Detect Semantic Clones (2019)
  • Holthusen, S., et al. Automatische synthese von familienmodellen durch analyse von block-basierten funktionsmodellen
  • Holthusen, S., Wille, D., Legat, C., Beddig, S., Schaefer, I., Vogel-Heuser, B., 2014. Family model mining for function...
  • Hummel, B., Juergens, E., Steidl, D., 2011. Index-based model clone detection. In: Proceedings of the 5th International...
  • IEC. 61131-3: Programmable controllers – part 3: Programming languages
  • Programmable logic controllers – part 3: Programming languages (2009)

    Alexander Schlie graduated in computer science at the TU Braunschweig, Germany and received his M.Sc. in 2016.

    He works as a research assistant at the Institute of Software Engineering and Automotive Informatics.

    His research interests are on reverse-engineering variability information from legacy systems to allow for their restructuring and migration towards software product lines.

    Lukas Linsbauer is currently a postdoctoral researcher at the Institute of Software Engineering and Automotive Informatics at the Technical University of Braunschweig in Germany.

    His research interests include software product lines, traceability, and version control systems. He received his Ph.D. in Computer Science (Software Engineering) in 2016 from the Johannes Kepler University (JKU) in Linz (Austria) where he also spent time as a postdoctoral researcher at the Institute for Software Systems Engineering (ISSE) and the Christian Doppler Laboratory (CDL) for Monitoring and Evolution of Very-Large-Scale Software Systems (MEVSS).

    Birgit Vogel-Heuser is a Professor and Director of the Institute of Automation and Information Systems at the Technical University of Munich.

    Her main research interests are systems and software engineering, and modeling of distributed and reliable embedded systems for automation and automated production systems.

    Ina Schaefer is chair of the Institute of Software Engineering and Automotive Informatics at the Technische Universität Braunschweig.

    She received her Ph.D. degree from the TU Kaiserslautern and worked as a postdoc at the Chalmers University of Technology in Gothenburg, Sweden.

    Her research interests are verification and testing methods for variant-rich and evolving software systems.
