On invariance and linear convergence of evolution strategies with augmented Lagrangian constraint handling
Introduction
Evolution strategies (ESs) are randomized (or stochastic) algorithms that are widely used in industry for solving real-world continuous optimization problems. Their success is due to their robustness and their ability to deal with a wide range of difficulties encountered in practice such as non-separability, ill-conditioning, and multi-modality. They are also well-suited for black-box optimization, a common scenario in industry where the mathematical expression of the objective function—or the source code that computes it—is not available. The covariance matrix adaptation evolution strategy (CMA-ES) [15] is nowadays considered the state-of-the-art method and achieves linear convergence on a large class of functions when solving unconstrained optimization problems.
Linear convergence is a desirable property for an ES; it represents the fastest possible rate of convergence for a randomized algorithm. It has been widely investigated in the unconstrained case on comparison-based adaptive randomized algorithms [6], [7], [8], [9], [11], where the connection between linear convergence and invariance of the studied algorithms has been established.
On ESs for unconstrained optimization, linear convergence is commonly analyzed using a Markov chain approach that consists in finding an underlying homogeneous Markov chain with some “stability” properties, generally positivity and Harris-recurrence. If such a Markov chain exists, linear convergence can be deduced by applying a law of large numbers (LLN) for Markov chains. In [8], it is shown that the existence of a homogeneous Markov chain of interest stems from the invariance of the algorithm, namely invariance to strictly increasing transformations of the objective function, translation-invariance, and scale-invariance.
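To fix ideas, linear convergence of an ES with iterates X_t toward an optimum x⋆ is typically established by writing the log-progress as an average and applying the LLN; the following sketch uses generic notation, not taken verbatim from this paper:

```latex
\frac{1}{t}\ln\frac{\|X_t - x^\star\|}{\|X_0 - x^\star\|}
  = \frac{1}{t}\sum_{k=0}^{t-1}\ln\frac{\|X_{k+1} - x^\star\|}{\|X_k - x^\star\|}
  \;\xrightarrow[t\to\infty]{\text{a.s.}}\; -\mathrm{CR}, \qquad \mathrm{CR} > 0 .
```

If each summand can be expressed as a fixed function of a positive, Harris-recurrent Markov chain, the LLN for Markov chains applies and the limit −CR is the (negative of the) convergence rate.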
In this work, we study ESs for constrained optimization where the constraints are handled using an augmented Lagrangian approach. A general constrained optimization problem can be written as

minimize f(x), x ∈ ℝⁿ, subject to g(x) ≤ 0,  (1)

where f: ℝⁿ → ℝ is the objective function and g: ℝⁿ → ℝᵐ is the constraint function. The notation g(x) ≤ 0 is equivalent to gᵢ(x) ≤ 0 for i = 1, …, m, where gᵢ denotes the ith component of g. Augmented Lagrangian methods transform the initial constrained problem (1) into one or many unconstrained problems by defining a new function to minimize, the augmented Lagrangian. The use of an augmented Lagrangian, however, results in the loss of invariance to strictly increasing transformations of f, as well as of g. Yet, invariance to a subset of strictly increasing transformations can be achieved: namely, invariance to strictly increasing affine transformations of the objective function f and to the scaling of the constraint function g. We argue that an augmented Lagrangian ES should satisfy this invariance. We explain how this property, along with translation-invariance and scale-invariance, relates to linear convergence of the algorithm by exhibiting a homogeneous Markov chain whose stability implies linear convergence.
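For the inequality-constrained problem (1), a standard augmented Lagrangian is the Powell–Hestenes–Rockafellar form; the multipliers γ ∈ ℝᵐ and penalty parameters ω ∈ ℝᵐ with positive entries below are generic notation, given as a sketch rather than the exact function used later in this paper:

```latex
h(x; \gamma, \omega) \;=\; f(x) \;+\; \sum_{i=1}^{m}
\begin{cases}
\gamma_i\, g_i(x) + \dfrac{\omega_i}{2}\, g_i(x)^2 & \text{if } \gamma_i + \omega_i\, g_i(x) \ge 0,\\[4pt]
-\dfrac{\gamma_i^2}{2\,\omega_i} & \text{otherwise.}
\end{cases}
```

Replacing f by a strictly increasing transformation φ ∘ f does not change h by a monotone transformation of h, which is why invariance to arbitrary strictly increasing transformations is lost; affine transformations of f can, however, be absorbed by rescaling γ and ω.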
This paper is organized as follows: first, we give an overview of augmented Lagrangian methods in Section 2. Then, we present our algorithmic setting in Section 3: we describe a general framework for building augmented Lagrangian randomized algorithms from adaptive randomized algorithms for unconstrained optimization in Section 3.1, then we use this framework to instantiate a practical ES with adaptive augmented Lagrangian in Section 3.2 and a more general step-size adaptive algorithm with augmented Lagrangian in Section 3.3. In Section 4, we discuss important invariance properties for augmented Lagrangian methods. Section 5 is dedicated to the analysis: we start by showing that our general augmented Lagrangian step-size adaptive algorithm satisfies the previously defined invariance properties in Section 5.1. In Section 5.2, we give an overview of the Markov chain approach for analyzing linear convergence in the unconstrained case, then we apply the same approach to investigate linear convergence of our general algorithm. We show in particular how invariance makes it possible to achieve linear convergence on problems with linear constraints. We present our numerical results in Section 6 and provide a discussion in Section 7.
A preliminary version of this work was published in [5]. The focus was on identifying a homogeneous Markov chain for the general augmented Lagrangian algorithm we study, then deducing its linear convergence under sufficient stability conditions. The theoretical part of the original paper has been rewritten to a large extent in the present work. In particular, the construction of the Markov chain is illustrated on the particular case of convex quadratic objective functions for the sake of clarity. In addition, the present work investigates the invariance to transformations of the objective and constraint functions and its impact on linear convergence, while only transformations of the search space are discussed in the original work.
We denote by ℕ the set of non-negative integers and by ℕ₊ the set of positive integers, and by ℝ₊ the set of non-negative real numbers and ℝ₊₊ the set of positive real numbers. [x]ᵢ denotes the ith entry of a vector x. For a matrix M, [M]ᵢⱼ denotes the entry in its ith row and jth column. We denote by 0 the zero vector and by Iₙ the n × n identity matrix. N(0, C) denotes the multivariate normal distribution with mean 0 and covariance matrix C. We refer to a multivariate normal variable with mean 0 and covariance matrix Iₙ as a standard multivariate normal variable in the remainder of the paper. Im(f) denotes the image of a function f and ∘ the function composition operator. We denote ⊙ the entrywise (Hadamard) product. For a vector x, x² denotes the vector x ⊙ x of its squared entries.
Augmented Lagrangian methods: overview and related work
Augmented Lagrangian (AL) methods are a family of constraint handling approaches. They were first introduced in [16], [20] as an alternative to penalty function methods, in particular quadratic penalty methods, whose convergence requires the penalty parameters to grow to infinity as the optimization progresses, thereby causing ill-conditioning [19].
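For contrast, the quadratic penalty method solves a sequence of unconstrained problems of the form below (a generic sketch), with penalty parameters μₖ that must diverge for convergence to a feasible point:

```latex
\min_{x \in \mathbb{R}^n} \; f(x) + \frac{\mu_k}{2} \sum_{i=1}^{m} \max\bigl(0,\, g_i(x)\bigr)^2,
\qquad \mu_k \to \infty .
```

As μₖ grows, the penalized function becomes increasingly ill-conditioned near the constraint boundary; the multiplier term of AL methods allows convergence with bounded penalty parameters.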
Analogously to penalty function methods, AL methods proceed by transforming the constrained problem into one or many unconstrained optimization problems.
Algorithmic framework
Given an adaptive randomized algorithm for unconstrained optimization, it is possible to build an AL algorithm for constrained optimization by applying the general framework described in [4]. In the following, we extend this framework to the case of multiple constraints and use it to construct a practical -ES with adaptive AL, as well as a general AL adaptive randomized algorithm that includes the previous -ES as a particular case.
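The general construction (an unconstrained adaptive algorithm run on the augmented Lagrangian while the multipliers are adapted) can be illustrated with a minimal Python sketch. The (1+1)-ES with 1/5th success rule, the PHR form of the Lagrangian, and all parameter values below are illustrative choices, not the algorithm defined in Section 3:

```python
import numpy as np

def phr_lagrangian(x, f, g, gamma, omega):
    """Powell-Hestenes-Rockafellar augmented Lagrangian for g(x) <= 0."""
    gx = np.atleast_1d(g(x))
    term = np.where(gamma + omega * gx >= 0.0,
                    gamma * gx + 0.5 * omega * gx ** 2,
                    -gamma ** 2 / (2.0 * omega))
    return float(f(x)) + float(term.sum())

def al_es(f, g, x0, sigma0=1.0, outer=30, inner=200, omega=10.0, seed=3):
    """Outer loop: multiplier updates. Inner loop: (1+1)-ES with 1/5th
    success rule minimizing the augmented Lagrangian for fixed gamma."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    sigma = sigma0
    gamma = np.zeros_like(np.atleast_1d(g(x)), dtype=float)
    for _ in range(outer):
        for _ in range(inner):
            y = x + sigma * rng.standard_normal(x.shape)
            if phr_lagrangian(y, f, g, gamma, omega) <= \
               phr_lagrangian(x, f, g, gamma, omega):
                x, sigma = y, sigma * np.exp(0.8 / 3)   # success: grow step size
            else:
                sigma *= np.exp(-0.2 / 3)               # failure: shrink step size
        # projected dual ascent step on the multipliers
        gamma = np.maximum(0.0, gamma + omega * np.atleast_1d(g(x)))
    return x, gamma

# Toy problem: minimize ||x||^2 subject to 1 - x_1 <= 0.
# KKT gives the optimum (1, 0) with multiplier gamma* = 2.
f = lambda x: x @ x
g = lambda x: 1.0 - x[0]
x_opt, gamma_opt = al_es(f, g, x0=np.array([3.0, 2.0]))
```

On this toy problem the sketch approaches the constrained optimum while the multiplier converges to its KKT value, without the penalty parameter having to grow.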
Invariance and AL methods
Invariance is an important notion in science. From a mathematical optimization perspective, when an algorithm is invariant, its performance on a particular function generalizes to a whole class of functions. Comparison-based adaptive randomized algorithms for unconstrained optimization (see definition in (8) and (9)) are inherently invariant to strictly increasing transformations of the objective function f [8]. This is a direct consequence of their definition, as these algorithms use the objective function only through comparisons of candidate solutions.
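This inherent invariance is easy to verify numerically: applying a strictly increasing transformation such as exp to the objective values leaves the ranking of candidate solutions, and therefore every decision of a comparison-based algorithm, unchanged. A small illustrative sketch:

```python
import numpy as np

# Comparison-based selection uses only the ranking of candidate solutions,
# so a strictly increasing transformation of f leaves the ranking unchanged.
rng = np.random.default_rng(0)
candidates = rng.standard_normal((5, 3))          # 5 candidate solutions in R^3
f = lambda x: float(x @ x)                        # sphere function
values = np.array([f(x) for x in candidates])
ranking = np.argsort(values)
ranking_transformed = np.argsort(np.exp(values))  # exp is strictly increasing
assert (ranking == ranking_transformed).all()
```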
Invariance and linear convergence
We illustrate here the connection between invariance and linear convergence via the analysis of the GSAR-AL. We first analyze the invariance of the algorithm, then we show how this invariance can be used to define a homogeneous Markov chain whose stability implies the linear convergence of the algorithm.
We conduct the Markov chain analysis on a particular case of problem (1) where the constraints are linear, i.e.
- A5
the constraint function is defined as , where is the matrix
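The construction of the normalized chain can be sketched in generic notation (not taken verbatim from this paper): for a scale- and translation-invariant algorithm and an optimum x⋆ of the associated problem, one considers

```latex
Z_t \;=\; \frac{X_t - x^\star}{\sigma_t},
```

possibly augmented with suitably normalized multiplier and penalty components of the state. Invariance makes (Z_t) a time-homogeneous Markov chain on the considered problem class; positivity and Harris recurrence of (Z_t) then yield, via the LLN, linear convergence of X_t to x⋆ and of σ_t to 0.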
Numerical results
We evaluate the -CSAoff-AL (Algorithm 1) on two linearly constrained convex quadratic functions: the sphere, , and the ellipsoid, , with a moderate condition number. These functions are defined according to (27) by taking for and H diagonal with diagonal elements , , for and with a condition number .
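For concreteness, the two test functions can be sketched as follows; the dimension and condition number here are illustrative placeholders, not the values used in the actual experiments:

```python
import numpy as np

# Illustrative values only: the dimension and condition number used in
# the experiments of this paper are not reproduced here.
n, alpha = 10, 1e2

# Diagonal Hessian with log-uniformly spaced eigenvalues, a common way
# to build an ellipsoid test function with condition number alpha.
diag = alpha ** (np.arange(n) / (n - 1))
H = np.diag(diag)

sphere = lambda x: float(x @ x)          # f(x) = ||x||^2
ellipsoid = lambda x: float(x @ H @ x)   # f(x) = x^T H x

cond = diag.max() / diag.min()           # equals alpha by construction
```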
We choose to be at and construct the (active) linear constraints following the steps below:
- 1.
For the first
Discussion
We discussed throughout this work the connection between invariance and linear convergence of randomized adaptive algorithms for constrained optimization when the constraints are handled with an augmented Lagrangian approach.
We formalized invariance properties for augmented Lagrangian algorithms that are important to achieve linear convergence. We showed that although unconditional invariance to strictly increasing transformations of the objective and the constraint functions no longer holds for augmented Lagrangian algorithms, invariance to strictly increasing affine transformations of the objective function and to the scaling of the constraint function can still be achieved.
Acknowledgements
This work was supported by the PGMO Numerical Black-Box Optimization for Energy Applications (NumBER) project of the Fondation Mathématique Jacques Hadamard and by the grant ANR-2012-MONU-0009 (NumBBO) from the French National Research Agency.
References (21)
- Convergence results for the -SA-ES using the theory of φ-irreducible Markov chains, Theoret. Comput. Sci. (2005)
- Global convergence of evolution strategies in spherical problems: some simple proofs and difficulties, Theoret. Comput. Sci. (2003)
- Optimal weighted recombination
- Towards an augmented Lagrangian constraint handling approach for the -ES
- Analysis of linear convergence of a -ES with augmented Lagrangian constraint handling
- Augmented Lagrangian constraint handling for CMA-ES: case of a single linear constraint
- Linearly convergent evolution strategies via augmented Lagrangian constraint handling
- Linear convergence on positively homogeneous functions of a comparison based step-size adaptive randomized search: the ES with generalized one-fifth success rule
- Linear convergence of comparison-based step-size adaptive randomized search via stability of Markov chains, SIAM J. Optim. (2016)
- Global minimization using an augmented Lagrangian method with variable lower-level constraints, Math. Program. (2010)