Learning software configuration spaces: A systematic literature review

https://doi.org/10.1016/j.jss.2021.111044

Abstract

Most modern software systems (operating systems like Linux or Android, Web browsers like Firefox or Chrome, video encoders like ffmpeg, x264 or VLC, mobile and cloud applications, etc.) are highly configurable. Hundreds of configuration options, features, or plugins can be combined, each potentially with distinct functionality and effects on execution time, security, energy consumption, etc. Due to the combinatorial explosion and the cost of executing software, exhaustively exploring the whole configuration space quickly becomes infeasible. Hence, numerous works have investigated the idea of learning the configuration space from a small sample of measured configurations. The pattern “sampling, measuring, learning” has emerged in the literature, with practical benefits for both software developers and end-users of configurable systems. In this systematic literature review, we report on the different application objectives (e.g., performance prediction, configuration optimization, constraint mining), use-cases, targeted software systems, and application domains. We review the various strategies employed to gather a representative and cost-effective sample. We describe automated software techniques used to measure functional and non-functional properties of configurations. We classify machine learning algorithms and how they relate to the pursued application. Finally, we also describe how researchers evaluate the quality of the learning process. The findings of this systematic review show that the application objective matters: a vast number of case studies reported in the literature are tied to particular domains or software systems. Yet, the huge variant space of configurable systems remains challenging and calls for further investigation of the synergies between artificial intelligence and software engineering.

Introduction

End-users, system administrators, software engineers, and scientists have at their disposal thousands of options (a.k.a. features or parameters) to configure various kinds of software systems in order to fit their functional and non-functional needs (execution time, output quality, security, energy consumption, etc.). It is now ubiquitous that software comes in many variants and is highly configurable through conditional compilations, command-line options, runtime parameters, configuration files, or plugins. Software product lines (SPLs), software generators, dynamic systems, self-adaptive systems, and variability-intensive systems are well studied in the literature and fall into this class of configurable software systems (Svahnberg et al., 2005, Pohl et al., 2005, Apel et al., 2013, Sayagh et al., 2018, Benavides et al., 2010, Cashman et al., 2018, Hallsteinsen et al., 2008, Morin et al., 2009).

From an abstract point of view, a software configuration is simply a combination of options’ values. Though customization is highly desirable, it introduces an enormous complexity due to the combinatorial explosion of possible variants. For example, the Linux kernel has 15,000+ options and most of them can have 3 values: “yes”, “no”, or “module”. Without considering the presence of constraints to avoid some combinations of options, there may be 3^15,000 possible variants of Linux — the estimated number of atoms in the universe is 10^80 and is already exceeded by the configuration space of just 300 Boolean options (2^300). Though Linux is an extreme case, many software systems or projects exhibit a very large configuration space; this brings several challenges.
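As a back-of-the-envelope check of these orders of magnitude, here is a minimal sketch in Python; the option counts are the rough figures quoted above, not exact Linux statistics:

```python
# Rough orders of magnitude for configuration-space sizes (illustrative only).
from math import log10

atoms_in_universe = 10 ** 80

# ~15,000 tri-state options ("yes", "no", "module"), ignoring constraints.
linux_variants_digits = 15_000 * log10(3)
print(f"Linux kernel: about 10^{linux_variants_digits:.0f} possible variants")

# 300 independent Boolean options already exceed the number of atoms in the universe.
print(2 ** 300 > atoms_in_universe)  # True
```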

On the one hand, developers struggle to maintain, understand, and test configuration spaces since they can hardly analyze or execute all possible variants. According to several studies (Halin et al., 2019, Sayagh et al., 2018), the flexibility brought by variability is expensive, as configuration failures represent one of the most common types of software failures. A configuration failure is an “undesired effect observed in the system’s delivered service” that may occur due to a specific combination of options’ values, i.e., a configuration (Mathur, 2008). On the other hand, end-users fear software variability and stick to default configurations (Xu et al., 2015, Zheng et al., 2007) that may be sub-optimal (e.g., the software system will run very slowly) or simply inadequate (e.g., the quality of the output will be unsuitable).

Since it is hardly possible to fully explore all software configurations, the use of machine learning techniques is a quite natural and appealing approach. The basic idea is to learn out of a sample of configurations’ observations and hopefully generalize to the whole configuration space. There are several applications ranging from performance prediction, configuration optimization, and software understanding to constraint mining (i.e., extraction of variability rules) – we will give a more exhaustive list in this literature review. For instance, end-users of x264 (a configurable video encoder) can estimate in advance the execution time of the command line x264 --no_cabac --no_fast_pskip --rc_lookahead 60 --ref 5 -o vid.264 vid.y4m (see Fig. 1), since a machine learning model has been crafted to predict the performance of configurations. End-users may want to use the fastest configuration or know all configurations that meet an objective (e.g., encoding time should be less than 10 s). Developers of x264 can be interested in understanding the effects of some options and how options interact.

For all these use-cases, a pattern has emerged in the scientific literature: “sampling, measuring, learning”. The basic principle is that a procedure learns out of a sample of configurations’ measurements (see Fig. 1). Specifically, many software configuration problems can be framed as statistical machine learning problems, under the condition that a sample of configurations’ observations is available. For example, the prediction of the performance of individual configurations can be formulated as a regression problem; appropriate learning algorithms (e.g., CART) can then be used to predict the performance of untested, new configurations. In this respect, it is worth noticing the dual use of the term feature in the software and machine learning fields: features either refer to software features (a.k.a. configuration options) or to the predictive variables a regressor relates to an outcome. A way to reconcile and visualize both is to consider a configuration matrix as depicted in Fig. 1. In a configuration matrix, each row describes a configuration together with the values of each feature and of the measured performance property. In the example of Fig. 1, the first configuration has the feature no_cabac set to False and the feature ref set to 9, while its encoding time is 3.1876 s. We can use a sample of configurations (i.e., a set of measured configurations) to train a machine learning model (a regressor) whose predictive variables are the command-line parameters of x264. Unmeasured configurations can then be predicted. Even for large-scale systems like the Linux kernel, the same process of “sampling, measuring, learning” can be followed (see, e.g. Acher et al., 2019a, Acher et al., 2019b). Some additional steps are worth exploring (like feature engineering prior to learning), and sampling and learning techniques should be adapted to scale to its complexity, but the general process remains applicable.
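To make the pattern concrete, the following minimal sketch trains a CART regressor (scikit-learn’s DecisionTreeRegressor) on a toy configuration matrix in the spirit of Fig. 1; the option values and encoding times are made up for illustration, and a realistic setting would use a much larger sample:

```python
# Minimal "sampling, measuring, learning" sketch: a CART regressor trained on
# a toy configuration matrix. Values are made up for illustration.
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Each row is a measured configuration: option values plus the observed performance.
sample = pd.DataFrame({
    "no_cabac":      [False, True,  False, True],
    "no_fast_pskip": [False, False, True,  True],
    "rc_lookahead":  [60,    40,    60,    20],
    "ref":           [9,     5,     1,     5],
    "encoding_time": [3.1876, 4.5, 2.9, 5.2],   # measured property, in seconds
})

X = sample.drop(columns="encoding_time").astype(float)  # features = options
y = sample["encoding_time"]                             # label = performance

model = DecisionTreeRegressor().fit(X, y)               # "learning" step

# Predict the performance of an unmeasured configuration.
new_config = pd.DataFrame([{"no_cabac": True, "no_fast_pskip": False,
                            "rc_lookahead": 60, "ref": 5}]).astype(float)
print(model.predict(new_config))
```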

Learning software configuration spaces is, however, not a pure machine learning problem, and there are a number of specific challenges to address at the intersection of software engineering and artificial intelligence. For instance, the sampling phase involves a number of difficult activities: (1) picking configurations that are valid and conform to constraints among options – one needs to resolve a satisfiability problem; (2) instrumenting the executions and observations of software for a variety of configurations – this can have a significant computational cost and is hard to engineer, especially when measuring non-functional aspects of software; (3) meanwhile, we expect the sample to be representative of the whole population of valid configurations, otherwise the learning algorithm may hardly generalize to the whole configuration space. The general problem is to find the right strategy to decrease the cost of labeling software configurations while minimizing prediction errors. From an empirical perspective, one can also wonder to what extent learning approaches are effective for real-world software systems across numerous domains.
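As an illustration of activity (1), the sketch below draws a uniform random sample of valid configurations by brute-force enumeration; the options and constraints are hypothetical, and real configurable systems with thousands of options require SAT/CSP solvers rather than enumeration:

```python
# Naive sampling of valid configurations under constraints (illustrative sketch).
# Real systems with thousands of options need SAT/CSP solvers instead of enumeration.
import itertools
import random

options = ["cabac", "fast_pskip", "lossless", "rc_lookahead_high"]  # hypothetical

def is_valid(cfg):
    # Hypothetical cross-option constraints, e.g. lossless encoding excludes CABAC.
    if cfg["lossless"] and cfg["cabac"]:
        return False
    if cfg["rc_lookahead_high"] and not cfg["fast_pskip"]:
        return False
    return True

all_configs = [dict(zip(options, values))
               for values in itertools.product([False, True], repeat=len(options))]
valid_configs = [c for c in all_configs if is_valid(c)]

random.seed(42)
to_measure = random.sample(valid_configs, k=5)  # configurations selected for measurement
print(f"{len(valid_configs)} valid configurations out of {len(all_configs)}")
```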

While several studies have covered different aspects of configurable systems over the last years, there has been no secondary study (such as a systematic literature review) that identifies and catalogs individual contributions on machine learning for configuration spaces. Thus, there is no clear consensus on what techniques are used to support the process, including which quantitative and qualitative properties are considered and how they can be measured and evaluated, as well as how to select a significant sample of configurations and what is an ideal sample size. This stresses the need for a secondary study that builds knowledge by combining findings from different approaches and presents a complete overview of the progress made in this field. To achieve this aim, we conduct a Systematic Literature Review (SLR) (Kitchenham and Charters, 2007) to identify, analyze, and interpret all available important research in this domain. We systematically review research papers in which the process of sampling, measuring, and learning configuration spaces occurs — more details about our research methodology are given in Section 2. Specifically, we aim to synthesize evidence to answer the following four research questions:

  • RQ1. What are the concrete applications of learning software configuration spaces?

  • RQ2. Which sampling methods and learning techniques are adopted when learning software configuration spaces?

  • RQ3. Which techniques are used to gather measurements of functional and non-functional properties of configurations?

  • RQ4. How are learning-based techniques validated?

To address RQ1, we analyze the application objective of each study (i.e., why learning-based techniques are applied). This allows us to assess whether the proposed approaches are applicable in practice. With respect to RQ2, we systematically investigate which sampling methods and learning techniques are used in the literature for exploring the SPL configuration space. With respect to RQ3, we give an in-depth view of how each study measures a sample of configurations. Finally, for RQ4, we identify which evaluation designs and metrics are used to validate the learning process.
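As an example of such an evaluation metric, many studies report the mean absolute percentage error (MAPE, also called mean relative error) of a performance model on a held-out set of configurations; the sketch below uses made-up values:

```python
# Mean absolute percentage error (MAPE) over a held-out set of configurations.
# Measured and predicted values below are made up for illustration.
def mape(measured, predicted):
    return 100 * sum(abs(m - p) / m for m, p in zip(measured, predicted)) / len(measured)

measured_times  = [3.19, 4.50, 2.90, 5.20]   # observed performance (e.g., seconds)
predicted_times = [3.05, 4.80, 3.10, 4.90]   # regressor predictions

print(f"MAPE = {mape(measured_times, predicted_times):.1f}%")
```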

By answering these questions, we make the following five contributions:

  • 1. We identified six main application areas: pure prediction, interpretability, optimization, dynamic configuration, evolution, and mining constraints.

  • 2. We provide a classification framework of the four main stages used for learning: Sampling, Measuring, Learning, and Validation.

  • 3. We describe 23 high-level sampling methods, 5 measurement strategies, 64 learning techniques, and 50 evaluation metrics used in the literature. As case studies, we identify 95 configurable systems covering several domains and both functional and non-functional properties. We relate and discuss the learning and validation techniques with regard to their application objective.

  • 4. We identify a set of open challenges faced by the current approaches, in order to guide researchers and practitioners in using and building appropriate solutions.

  • 5. We build a Web repository (Pereira et al., 2019) to make our SLR results publicly available for the purposes of reproducibility and extension.

Overall, the findings of this SLR reveal that there is a significant body of work specialized in learning software configurable systems, spanning a wide range of software technologies, application domains, and goals. There is a wide variety in the considered sampling and learning algorithms as well as in the evaluation process, mainly due to the considered subject systems and application objectives. Practitioners and researchers can benefit from the findings reported in this SLR as a reference when they select a learning technique for their own settings. To this end, this review provides a classification and catalog of specialized techniques in this field.

The rest of the paper is structured as follows. In Section 2, we describe the research protocol used to conduct the SLR. In Section 3, we categorize a sequence of key learning stages used by the ML state-of-the-art literature to explore highly configurable systems. In Section 4, we discuss the research questions. In Section 5, we discuss the current research themes in this field and present the open challenges that need attention in the future. In Section 6, we discuss the threats to the validity of our SLR. In Section 7, we describe similar secondary studies and indicate how our literature review differs from them. Finally, in Section 8, we present the conclusions of our work.

Section snippets

The review methodology

We followed the SLR guidelines by Kitchenham and Charters (2007) to systematically investigate the use of learning techniques for exploring the SPL configuration space. In this section, we present the SLR methodology, which covers two main phases: planning the review and conducting the review. The paper selection process is shown in Fig. 2. Next, we report the details of each phase so that readers can assess the rigor and completeness of the review and reproduce our findings.

Literature review pattern: Sampling, measuring, learning

Understanding how the system behavior varies across a large number of variants of a configurable system is essential to support end-users in choosing a desirable product (Apel et al., 2013). It is also useful for developers in charge of maintaining such software systems. In this context, machine learning-based techniques have been widely considered to predict configurations’ behavior and assist stakeholders in making informed decisions. Throughout our review effort, we have observed that such

Results and discussion of the research questions

In this section, we discuss the answers to our research questions defined in Section 1. In Section 4.1, we identify the main goal of the learning process (RQ1). Next, in Sections 4.2 (RQ2), 4.3 (RQ3), and 4.4 (RQ4), we analyze in detail how each study

Emerging research themes and open challenges

In the previous sections, we have given an overview of the state of the art in sampling, measuring, and learning software configuration spaces. We now discuss open challenges and their implications for research and practice.

There are a few reports of real-world adoption (Kolesnikov et al., 2018, Temple et al., 2016,

Threats to validity

This section discusses potential threats to validity that might have affected the results of the SLR. We faced similar threats to validity as any other SLR. The findings of this SLR may have been affected by bias in the selection of the primary studies, inaccuracy in the data extraction and in the classification of the primary studies, and incompleteness in defining the open challenges. Next, we summarize the main threats to the validity of our work and the strategies we have followed to

Related work

After the introduction of SLRs in software engineering in 2004, the number of published reviews in this field has grown significantly (Kitchenham et al., 2009). A broad SLR has been conducted by Heradio et al. (2016) to identify the most influential research topics in SPL, and how the interest in those topics has evolved over the years. Although these reviews are not directly related to ours, the high level of detail of their research methodology helped us structure and define our own

Conclusion

We presented a systematic literature review related to the use of learning techniques to analyze large software configuration spaces. We analyzed the literature in terms of a four-stage process: “sampling, measuring, learning, and validation” (see Section 3). Our contributions are fourfold. First, we identified the application of each approach, which can guide researchers and industrial practitioners when searching for an appropriate technique that fits their current needs. Second, we classified

CRediT authorship contribution statement

Juliana Alves Pereira: Conceptualization, Methodology, Writing – original draft, Data curation, Validation, Writing – review & editing. Mathieu Acher: Supervision, Conceptualization, Methodology, Writing – original draft, Data curation, Validation, Writing – review & editing. Hugo Martin: Conceptualization, Methodology, Writing – review & editing. Jean-Marc Jézéquel: Supervision, Writing – review & editing. Goetz Botterweck: Methodology, Validation, Writing – review & editing. Anthony

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was partially funded by the ANR-17-CE25-0010-01 VaryVary project and by Science Foundation Ireland grant 13/RC/2094. We would like to thank Paul Temple for his early comments on a draft of this article.


References (139)

  • Malhotra, R., 2015. A systematic review of machine learning techniques for software fault prediction. Appl. Soft Comput.
  • Ochoa, L., et al., 2018. A systematic literature review on the semi-automatic configuration of extended product lines. J. Syst. Softw.
  • Acher, M., et al., 2019. Learning from Thousands of Build Failures of Linux Kernel Configurations. Technical report.
  • Acher, M., et al., 2019. Learning Very Large Configuration Spaces: What Matters for Linux Kernel Sizes. Research report.
  • Acher, M., et al. Varylatex: Learning paper variants that meet constraints.
  • Akers, S.B., 1978. Binary decision diagrams. IEEE Trans. Comput.
  • Alipourfard, O., Liu, H.H., Chen, J., Venkataraman, S., Yu, M., Zhang, M., 2017. Cherrypick: Adaptively unearthing the...
  • Amand, B., et al. Towards learning-aided configuration in 3D printing: Feasibility study and application to defect prediction.
  • Apel, S., et al., 2013. Feature-Oriented Software Product Lines: Concepts and Implementation.
  • Ashouri, A.H., Killian, W., Cavazos, J., Palermo, G., Silvano, ...
  • Bak, K., et al., 2016. Clafer: Unifying class and feature modeling. Softw. Syst. Model.
  • Bao, L., et al. Autoconfig: Automatic configuration tuning for distributed message systems.
  • Benavides, D., et al. Automated reasoning on feature models.
  • Benavides, D., et al., 2010. Automated analysis of feature models 20 years later: A literature review. Inf. Syst.
  • Bosch, J., et al., 2020. Engineering AI systems: A research agenda.
  • Cashman, M., et al. Navigating the maze: The impact of configurability in bioinformatics software.
  • Chen, H., et al. Boosting the performance of computing systems through adaptive configuration tuning.
  • Couto, M., et al. Products go green: Worst-case energy consumption in software product lines.
  • Crawford, M., et al., 2015. Survey of review spam detection using machine learning techniques. J. Big Data.
  • Ding, Y., et al. Autotuning algorithmic choice for input sensitivity.
  • do Carmo Machado, I., et al., 2014. On strategies for testing software product lines: A systematic literature review. Inf. Softw. Technol.
  • Duarte, F., et al. Learning non-deterministic impact models for adaptation.
  • Eichelberger, H., et al. Using IVML to model the topology of big data processing pipelines.
  • El Afia, A., et al. Performance prediction using support vector machine for the configuration of optimization algorithms.
  • Etxeberria, L., et al. Performance-based selection of software and hardware features under parameter uncertainty.
  • Gargantini, A., et al. Combinatorial interaction testing for automated constraint repair.
  • Ghamizi, S., Cordy, M., Papadakis, M., Traon, ...
  • Grebhahn, A., et al., 2017. Performance-influence models of multigrid methods: A case study on triangular grids. Concurr. Comput.: Pract. Exper.
  • Grebhahn, A., et al., 2019. Predicting performance of software configurations: There is no silver bullet.
  • Guo, J., Czarnecki, K., Apel, S., Siegmund, N., Wasowski, A., 2013. Variability-aware performance prediction: A...
  • Guo, J., et al., 2017. Data-efficient performance learning for configurable systems. Empir. Softw. Eng.
  • Halin, A., et al., 2019. Test them all, is it worth it? Assessing configuration sampling on the JHipster web development stack. Empir. Softw. Eng.
  • Hall, M., et al., 2009. The WEKA data mining software: An update. ACM SIGKDD Explor. Newsl.
  • Hallsteinsen, S.O., et al., 2008. Dynamic software product lines. IEEE Comput.
  • Harman, M., et al. Search based software engineering for software product line engineering: A survey and directions for future work.
  • Hutter, F., et al. Sequential model-based optimization for general algorithm configuration.
  • Jamshidi, P., Cámara, J., Schmerl, B., Kästner, C., Garlan, ...
  • Jamshidi, P., et al. An uncertainty-aware approach to optimal configuration of stream processing systems.
  • Jamshidi, P., et al. Transfer learning for performance modeling of configurable systems: An exploratory analysis.
  • Jamshidi, P., et al. Learning to sample: Exploiting similarities across environments to learn performance models for configurable systems.


Juliana Alves Pereira is currently a Post-Doctoral researcher at PUC-Rio (Brazil). She was a researcher at the University of Rennes I (Inria/Irisa, France). Juliana received her Ph.D. degree with distinction in 2018 from the University of Magdeburg, Germany. Her research thrives to automate software engineering by combining methods from software analysis, machine learning, and meta-heuristic optimization. In recent years, she has published and revised research papers in premier software engineering conferences, symposiums, and journals. She is regularly presenting courses, tutorials, tools, and scientific results at national and international venues.

Mathieu Acher is Associate Professor at University of Rennes 1/Inria, France. His research focuses on reverse engineering, modeling, and learning variability of software-intensive systems, with contributions published at ASE, ESEC/FSE, SPLC, MODELS, and IJCAI, and in journals such as JSS and ESEM. He was PC co-chair of SPLC 2017 and will be PC co-chair of VaMoS 2020. He is currently leading a research project on machine learning and variability (https://varyvary.github.io/).

Hugo Martin is a Ph.D. Student at the University of Rennes 1, France. His research focuses on using interpretable machine learning to better understand configurable systems.

Dr. Jean-Marc Jézéquel is a Professor at the University of Rennes and Director of IRISA, one of the largest public research labs in Informatics in France. He is also head of research of the French Cyber-defense Excellence Cluster and the director of the Rennes Node of EIT Digital. In 2016, he received the Silver Medal from CNRS. His interests include model-driven software engineering for software product lines, and specifically component-based, dynamically adaptable systems with quality of service constraints, including security, reliability, performance, timeliness, etc. He is the author of 4 books and more than 250 publications in international journals and conferences. He was a member of the steering committees of the AOSD and MODELS conference series. He also served on the editorial boards of IEEE Computer, IEEE Transactions on Software Engineering, the Journal of Systems and Software, the Journal on Software and System Modeling, and the Journal of Object Technology. He received an engineering degree from Telecom Bretagne in 1986, and a Ph.D. degree in Computer Science from the University of Rennes, France, in 1989.

Goetz Botterweck is an Associate Professor in Computer Science at Trinity College Dublin, Ireland, and with Lero – the Irish Software Research Centre. Previously, he held positions at the University of Limerick, Ireland, as Lecturer in Computer Science and Senior Research Fellow. His research interests are model-driven software engineering, software evolution, and software product lines. Botterweck received a Ph.D. in computer science from the University of Koblenz. He was PC co-chair of SPLC 2015 and ICSR 2017.

Anthony Ventresque received his Ph.D. degree in Computer Science from the University of Nantes and INRIA, France, in 2008. Dr Ventresque is currently an Assistant Prof. in the School of Computer Science at University College Dublin, Ireland, and a Funded Investigator with Lero, the SFI Irish Software Research Centre. Previously, he held positions as Research Fellow at NTU, Singapore (2010–2011), UCD, Ireland (2012–2014), and IBM Research Dublin, Ireland (2014–2015).

Editor: Raffaela Mirandola.
