
Theoretical Computer Science

Volume 832, 6 September 2020, Pages 68-97

On invariance and linear convergence of evolution strategies with augmented Lagrangian constraint handling

https://doi.org/10.1016/j.tcs.2018.10.006

Abstract

In the context of numerical constrained optimization, we investigate stochastic algorithms, in particular evolution strategies, handling constraints via augmented Lagrangian approaches. In those approaches, the original constrained problem is turned into an unconstrained one, and the function optimized is an augmented Lagrangian whose parameters are adapted during the optimization. The use of an augmented Lagrangian, however, breaks a central invariance property of evolution strategies, namely invariance to strictly increasing transformations of the objective function. Nevertheless, we formalize that an evolution strategy with augmented Lagrangian constraint handling should preserve invariance to strictly increasing affine transformations of the objective function and to the scaling of the constraints—a subclass of strictly increasing transformations. We show that this invariance property is important for the linear convergence of these algorithms and show how the two properties are connected.

Introduction

Evolution strategies (ESs) are randomized (or stochastic) algorithms that are widely used in industry for solving real-world continuous optimization problems. Their success is due to their robustness and their ability to deal with a wide range of difficulties encountered in practice, such as non-separability, ill-conditioning, and multi-modality. They are also well-suited for black-box optimization, a common scenario in industry where the mathematical expression of the objective function—or the source code that computes it—is not available. The covariance matrix adaptation evolution strategy (CMA-ES) [15] is nowadays considered the state-of-the-art method and achieves linear convergence on a large class of functions when solving unconstrained optimization problems.

Linear convergence is a desirable property for an ES; it represents the fastest possible rate of convergence for a randomized algorithm. It has been widely investigated in the unconstrained case on comparison-based adaptive randomized algorithms [6], [7], [8], [9], [11], where the connection between linear convergence and invariance of the studied algorithms has been established.

On ESs for unconstrained optimization, linear convergence is commonly analyzed using a Markov chain approach that consists in finding an underlying homogeneous Markov chain with some “stability” properties, generally positivity and Harris-recurrence. If such a Markov chain exists, linear convergence can be deduced by applying a law of large numbers (LLN) for Markov chains. In [8], it is shown that the existence of a homogeneous Markov chain of interest stems from the invariance of the algorithm, namely invariance to strictly increasing transformations of the objective function, translation-invariance, and scale-invariance.

In this work, we study ESs for constrained optimization where the constraints are handled using an augmented Lagrangian approach. A general constrained optimization problem can be written as
$$\arg\min_{x} f(x) \quad \text{subject to} \quad g(x) \leq 0, \tag{1}$$
where $f:\mathbb{R}^n \to \mathbb{R}$ is the objective function and $g:\mathbb{R}^n \to \mathbb{R}^m$ is the constraint function. The notation $g(x) \leq 0$ in this case is equivalent to $g_i(x) \leq 0$, $i=1,\dots,m$, where $g(x) = (g_1(x),\dots,g_m(x))$ and $g_i:\mathbb{R}^n \to \mathbb{R}$, $i=1,\dots,m$. Augmented Lagrangian methods transform the initial constrained problem (1) into one or many unconstrained problems by defining a new function to minimize, the augmented Lagrangian. The use of an augmented Lagrangian, however, results in the loss of invariance to strictly increasing transformations of f, as well as of g. Yet, invariance to a subset of strictly increasing transformations can be achieved: namely, invariance to strictly increasing affine transformations of the objective function f and to the scaling of the constraint function g. We formulate that this invariance should be satisfied by an augmented Lagrangian ES. We explain how this property, along with translation-invariance and scale-invariance, relates to linear convergence of the algorithm by exhibiting a homogeneous Markov chain whose stability implies linear convergence.
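As a concrete illustration of problem (1), the sketch below (a hypothetical instance of ours, not taken from the paper) checks the component-wise meaning of the constraint $g(x) \leq 0$:

```python
import numpy as np

def is_feasible(x, g):
    """True iff g_i(x) <= 0 for all i = 1, ..., m, as in problem (1)."""
    return bool(np.all(g(x) <= 0.0))

# Hypothetical instance with m = 2 linear constraints: x_1 <= 1 and x_2 >= -2.
g = lambda x: np.array([x[0] - 1.0, -x[1] - 2.0])

is_feasible(np.array([0.5, 0.0]), g)   # True
is_feasible(np.array([1.5, 0.0]), g)   # False: x_1 > 1
```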

This paper is organized as follows: first, we give an overview of augmented Lagrangian methods in Section 2. Then, we present our algorithmic setting in Section 3: we describe a general framework for building augmented Lagrangian randomized algorithms from adaptive randomized algorithms for unconstrained optimization in Section 3.1, then we use this framework to instantiate a practical ES with adaptive augmented Lagrangian in Section 3.2 and a more general step-size adaptive algorithm with augmented Lagrangian in Section 3.3. In Section 4, we discuss important invariance properties for augmented Lagrangian methods. Section 5 is dedicated to the analysis: we start by showing that our general augmented Lagrangian step-size adaptive algorithm satisfies the previously defined invariance properties in Section 5.1. In Section 5.2, we give an overview of the Markov chain approach for analyzing linear convergence in the unconstrained case, then we apply the same approach to investigate linear convergence of our general algorithm. We show in particular how invariance makes it possible to achieve linear convergence on problems with linear constraints. We present our numerical results in Section 6 and provide a discussion in Section 7.

A preliminary version of this work was published in [5]. The focus was on identifying a homogeneous Markov chain for the general augmented Lagrangian algorithm we study, then deducing its linear convergence under sufficient stability conditions. The theoretical part of the original paper has been rewritten to a large extent in the present work. In particular, the construction of the Markov chain is illustrated on the particular case of convex quadratic objective functions for the sake of clarity. In addition, the present work investigates the invariance to transformations of the objective and constraint functions and its impact on linear convergence, while only transformations of the search space are discussed in the original work.

We denote $\mathbb{Z}_{\geq 0}$ the set of non-negative integers $\{0,1,\dots\}$ and $\mathbb{Z}_{>0}$ the set of positive integers $\{1,2,\dots\}$. We denote $\mathbb{R}_{\geq 0}$ the set of non-negative real numbers and $\mathbb{R}_{>0}$ the set of positive real numbers. We denote $[x]_i$ the $i$th entry of a vector $x$. For a matrix $M$, $[M]_{ij}$ denotes the entry in its $i$th row and $j$th column. We denote $\mathbf{0}$ the vector $(0,\dots,0) \in \mathbb{R}^n$ and $I_{n\times n} \in \mathbb{R}^{n\times n}$ the identity matrix. We denote $\mathcal{N}(\mathbf{0}, I_{n\times n})$ the multivariate normal distribution with mean $\mathbf{0}$ and covariance matrix $I_{n\times n}$. We refer to a multivariate normal variable with mean $\mathbf{0}$ and covariance matrix $I_{n\times n}$ as a standard multivariate normal variable in the remainder of the paper. We denote $\operatorname{Im}(f)$ the image of a function $f$ and $\circ$ the function composition operator. We denote $\odot$ the entrywise (Hadamard) product. For a vector $x=(x_1,\dots,x_k)$, $x^2$ denotes the vector $(x_1^2,\dots,x_k^2)$.


Augmented Lagrangian methods: overview and related work

Augmented Lagrangian (AL) methods are a family of constraint handling approaches. They were first introduced in [16], [20] as an alternative to penalty function methods, in particular quadratic penalty methods, whose convergence requires the penalty parameters to grow to infinity as the optimization progresses, thereby causing ill-conditioning [19].

Analogously to penalty function methods, AL methods proceed by transforming the constrained problem into one or many unconstrained optimization problems.
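One widely used augmented Lagrangian for inequality constraints is the Powell–Hestenes–Rockafellar (PHR) form. The sketch below implements it as an illustration (the paper's exact formulation may differ; the instance `f`, `g` at the end is hypothetical):

```python
import numpy as np

def augmented_lagrangian(x, f, g, gamma, omega):
    """PHR augmented Lagrangian for g(x) <= 0 (a common variant; the
    paper's exact formulation may differ).
    gamma: multiplier estimates (m,); omega: penalty factors (m,)."""
    gx = g(x)
    # Linear-plus-quadratic branch while gamma_i + omega_i * g_i(x) >= 0;
    # constant branch -gamma_i^2 / (2 omega_i) once the constraint is inactive.
    terms = np.where(
        gamma + omega * gx >= 0.0,
        gamma * gx + 0.5 * omega * gx**2,
        -gamma**2 / (2.0 * omega),
    )
    return f(x) + float(np.sum(terms))

# Hypothetical instance: f(x) = x^2 with the single constraint x - 1 <= 0.
f = lambda x: float(x @ x)
g = lambda x: np.array([x[0] - 1.0])
augmented_lagrangian(np.array([2.0]), f, g, np.zeros(1), np.ones(1))  # 4.5
```

With zero multipliers this reduces to a pure quadratic penalty on the constraint violation; nonzero multipliers shift the minimizer toward the constrained optimum without the penalty factors having to diverge.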

Algorithmic framework

Given an adaptive randomized algorithm for unconstrained optimization, it is possible to build an AL algorithm for constrained optimization by applying the general framework described in [4]. In the following, we extend this framework to the case of multiple constraints and use it to construct a practical $(\mu/\mu_w,\lambda)$-ES with adaptive AL, as well as a general AL adaptive randomized algorithm that includes the previous $(\mu/\mu_w,\lambda)$-ES as a particular case.
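The gist of such a construction — sample offspring, rank them on the augmented Lagrangian, recombine, then update the multipliers — can be sketched as follows. This is an illustrative simplification of ours: the paper's update rules for the step-size, multipliers, and penalty factors are more elaborate, and the PHR-type Lagrangian below is an assumed form.

```python
import numpy as np

def al_es_step(m, sigma, gamma, omega, f, g, lam=10, mu=5, rng=None):
    """One iteration of a sketched (mu/mu_w, lambda)-ES minimizing an
    augmented Lagrangian (illustrative only)."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(m)

    def AL(x):  # PHR-type augmented Lagrangian (assumed form)
        gx = np.maximum(g(x) + gamma / omega, 0.0)
        return f(x) + 0.5 * float(np.sum(omega * gx**2 - gamma**2 / omega))

    offspring = m + sigma * rng.normal(size=(lam, n))       # sample lambda candidates
    order = np.argsort([AL(x) for x in offspring])          # comparison-based ranking
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))     # recombination weights
    w /= w.sum()
    m_new = w @ offspring[order[:mu]]                       # weighted recombination
    gamma_new = np.maximum(gamma + omega * g(m_new), 0.0)   # multiplier update
    return m_new, gamma_new
```

Note that the candidates are compared only through their AL values, so the unconstrained machinery (sampling, ranking, recombination) is reused unchanged; only the function being ranked is replaced.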

Invariance and AL methods

Invariance is an important notion in science. From a mathematical optimization perspective, when an algorithm is invariant, its performance on a particular function generalizes to a whole class of functions. Comparison-based adaptive randomized algorithms for unconstrained optimization (see the definition in (8) and (9)) are inherently invariant to strictly increasing transformations of the objective function f [8]. This is a direct consequence of their definition, as these algorithms use the objective function only through comparisons of candidate solutions.
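This invariance is easy to verify numerically: since a strictly increasing transformation preserves the ordering of f-values, the ranking — and hence the selection — is unchanged. A minimal check, with an arbitrary transform of our choosing:

```python
import numpy as np

# Comparison-based algorithms see f only through the ranking of candidates,
# so any strictly increasing transformation h leaves selection unchanged.
rng = np.random.default_rng(0)
candidates = rng.normal(size=(10, 3))                  # 10 candidates in R^3
fvals = np.array([float(x @ x) for x in candidates])   # sphere f-values

h = lambda y: np.exp(y) + 3.0 * y                      # strictly increasing transform

assert np.array_equal(np.argsort(fvals), np.argsort(h(fvals)))  # same ranking
```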

Invariance and linear convergence

We illustrate here the connection between invariance and linear convergence via the analysis of the GSAR-AL. We first analyze the invariance of the algorithm, then we show how this invariance can be used to define a homogeneous Markov chain whose stability implies the linear convergence of the algorithm.

We conduct the Markov chain analysis on a particular case of problem (1) where the constraints are linear, i.e.

  • A5

    the constraint function $g:\mathbb{R}^n \to \mathbb{R}^m$ is defined as $g(x) = Ax + b$, where $A \in \mathbb{R}^{m\times n}$ is the matrix

Numerical results

We evaluate the $(\mu/\mu_w,\lambda)$-CSA$_{\text{off}}$-AL (Algorithm 1) on two linearly constrained convex quadratic functions: the sphere, $f_{\mathrm{sphere}}$, and the ellipsoid, $f_{\mathrm{ellipsoid}}$, with a moderate condition number. These functions are defined according to (27) by taking $H = I_{n\times n}$ for $f_{\mathrm{sphere}}$ and $H$ diagonal with diagonal elements $[H]_{ii} = \alpha^{\frac{i-1}{n-1}}$, $i=1,\dots,n$, for $f_{\mathrm{ellipsoid}}$, with a condition number $\alpha = 10$.
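The two test functions can be sketched as follows; we assume here the standard quadratic form $f(x) = (x - x_{\mathrm{opt}})^\top H (x - x_{\mathrm{opt}})$ for (27) (the paper's exact definition may include additional terms), with $x_{\mathrm{opt}} = (10,\dots,10)$ and $n \geq 2$:

```python
import numpy as np

def make_test_function(n, alpha):
    """Convex quadratic f(x) = (x - x_opt)^T H (x - x_opt) with H diagonal,
    [H]_ii = alpha^((i-1)/(n-1)); alpha = 1 gives the sphere. The exact
    form (27) of the paper is assumed; x_opt = (10, ..., 10), n >= 2."""
    x_opt = np.full(n, 10.0)
    d = alpha ** (np.arange(n) / (n - 1))   # diagonal of H, condition number alpha
    def f(x):
        z = x - x_opt
        return float(np.sum(d * z**2))
    return f

f_sphere = make_test_function(5, alpha=1.0)
f_ellipsoid = make_test_function(5, alpha=10.0)
f_sphere(np.full(5, 10.0))   # 0.0 at the optimum
```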

We choose $x_{\mathrm{opt}}$ to be at $(10,\dots,10)$ and construct the (active) linear constraints following the steps below:

  • 1.

    For the first

Discussion

We discussed throughout this work the connection between invariance and linear convergence of randomized adaptive algorithms for constrained optimization when the constraints are handled with an augmented Lagrangian approach.

We formalized invariance properties for augmented Lagrangian algorithms that are important to achieve linear convergence. We showed that although unconditional invariance to strictly increasing transformations of the objective and the constraint functions no longer holds, invariance to strictly increasing affine transformations of the objective function and to the scaling of the constraints can be preserved.

Acknowledgements

This work was supported by the PGMO Numerical Black-Box Optimization for Energy Applications (NumBER) project of the Fondation Mathématique Jacques Hadamard and by the grant ANR-2012-MONU-0009 (NumBBO) from the French National Research Agency.

References (21)

  • A. Auger, Convergence results for the (1,λ)-SA-ES using the theory of φ-irreducible Markov chains, Theoret. Comput. Sci. (2005)
  • A. Bienvenüe et al., Global convergence of evolution strategies in spherical problems: some simple proofs and difficulties, Theoret. Comput. Sci. (2003)
  • D.V. Arnold, Optimal weighted recombination
  • D.V. Arnold et al., Towards an augmented Lagrangian constraint handling approach for the (1+1)-ES
  • A. Atamna et al., Analysis of linear convergence of a (1+1)-ES with augmented Lagrangian constraint handling
  • A. Atamna et al., Augmented Lagrangian constraint handling for CMA-ES—case of a single linear constraint
  • A. Atamna et al., Linearly convergent evolution strategies via augmented Lagrangian constraint handling
  • A. Auger et al., Linear convergence on positively homogeneous functions of a comparison based step-size adaptive randomized search: the (1+1)-ES with generalized one-fifth success rule
  • A. Auger et al., Linear convergence of comparison-based step-size adaptive randomized search via stability of Markov chains, SIAM J. Optim. (2016)
  • E.G. Birgin et al., Global minimization using an augmented Lagrangian method with variable lower-level constraints, Math. Program. (2010)

