1 Introduction

Meta-heuristics are very successful at finding good solutions for hard optimization problems in practice. However, due to the nature of such algorithms and the problems they are applied to, it is generally very difficult to derive performance guarantees, or to determine the number of steps it takes until an optimal solution is found. In the present work we propose a simple adaptation of the particle swarm optimization (PSO) algorithm introduced by Eberhart and Kennedy (1995) and Kennedy and Eberhart (1995) to optimization problems over discrete domains. Our proposed algorithm assumes very little about the problem structure and consequently, it works naturally for a large class of discrete domains. It is reasonable to expect from a meta-heuristic that it solves black-box versions of many tractable problems in expected polynomial time. We provide a formal analysis based on Markov chains and establish under which conditions our algorithm satisfies this basic requirement. More concretely, we consider two classical problems that are easy to solve in a non-black-box setting, namely the problem of sorting items by transpositions and the problem OneMax, which asks to maximize the number of ones in a bitstring. Our analysis gives precise information about the expected number of steps our algorithm takes in order to solve these two reference problems. Our runtime bounds are essentially tight with respect to the Markov process we use to model the behavior of the algorithm.

For practical purposes, a meta-heuristic should, in one way or another, incorporate the following two general strategies: (1) find an improving solution locally (often referred to as exploitation) and (2) move to unexplored parts of the search space (often referred to as exploration). The first strategy essentially leads the algorithm to a local optimum while the second one helps the algorithm to avoid getting stuck when it is close to a local optimum. For our proposed algorithm, as for many other meta-heuristics, the tradeoff between the two strategies can be conveniently set by an algorithm parameter. Our analysis shows that there is a sharp threshold with respect to this parameter, where the expected runtime of the algorithm on the reference problems turns from polynomial to exponential. Hence, we can maximize the algorithm’s ability to escape local optima while still maintaining polynomial runtime on the reference problems.

A key tool for the runtime analysis of meta-heuristics for optimization problems over discrete domains is the fitness level method pioneered by Wegener (2002). The basic idea is to consider the level sets of the objective function of a problem instance and to determine the expected number of steps an algorithm takes to move to a better level set. This approach has been used extensively in the study of so-called elitist \((1+1)\)-EAs (Wegener 2002; Droste et al. 2002; Giel and Wegener 2003; Sudholt 2013). These algorithms keep a single “current” solution and update this solution only if a better one is found. Our analysis of the proposed PSO algorithm also relies on the fitness level method. However, since our algorithm also considers non-improving solutions in order to escape local optima, a much more involved analysis is required in order to determine the time it takes to move to a better fitness level.

We will refer in the following by expected optimization time to the expected number of evaluations of the objective function an algorithm performs until an optimal solution is found. Before giving a precise statement of our results we provide some background information on the PSO algorithm as well as the two reference problems we consider.

Fig. 1
figure 1

Search space for the problem of OneMax on \(\lbrace 0,1\rbrace ^4\) by bitflips, i.e., the 4-dimensional hypercube. Two bitstrings xy are adjacent iff x and y differ in exactly one position

1.1 Particle swarm optimization

The PSO algorithm has been introduced by Eberhart and Kennedy (1995) and Kennedy and Eberhart (1995) and is inspired by the social interaction of bird flocks. Fields of successful application of PSO are, among many others, Biomedical Image Processing (Schwab et al. 2015; Wachowiak et al. 2004), Geosciences (Onwunalu and Durlofsky 2010), Agriculture (Yang et al. 2017), and Materials Science (Ramanathan et al. 2009). In the continuous setting, it is known that the algorithm converges to a local optimum under mild assumptions (Schmitt and Wanka 2015). The algorithm has been adapted to various discrete problems and several results are available, for instance for binary problems (Sudholt and Witt 2010) and the traveling salesperson problem (TSP) (Hoffmann et al. 2011).

A PSO algorithm manages a collection (called swarm) of particles. Each particle consists of an (admissible) solution together with a velocity vector. Additionally each individual particle knows the local attractor, which is the best solution found by that particle. Information between particles is shared via a common reference solution called global attractor, which is the best solution found so far by all particles. In each iteration of the algorithm, the solution of each particle is updated based on its relative position with respect to the attractors and some random perturbation. Algorithm parameters balance the influence of the attractors and the perturbation and hence give a tradeoff between the two general search strategies “exploration” and “exploitation”. Although PSO has originally been proposed to solve optimization problems over a—typically rectangular—domain \(X \subseteq {\mathbb {R}} ^n\), several authors have adapted PSO to discrete domains. This requires a fundamental reinterpretation of the PSO movement equation because corresponding mathematical operations of a vector space are typically lacking in the discrete setting. An early discrete PSO variant is the binary PSO presented in Kennedy and Eberhart (1997) for optimizing over \(X=\{0,1\}^n\) where velocities determine probabilities such that a bit is zero or one in the next iteration. A PSO variant for optimizing over general integral domains \(X=\{0,1,\ldots ,M-1\}^n\), \(M \in {\mathbb {N}} \), has been proposed in Veeramachaneni et al. (2007).

1.2 Problems and search spaces

In this section we briefly define the optimization problems for which we will study the performance of the proposed PSO algorithm. The problem OneMax asks for a binary string of length n that maximizes the function

$$\begin{aligned} {\text{O}}\textsc {ne}M\textsc {ax} (x_1,\ldots ,x_n) = {\sum _{i=1}^n x_i}, \end{aligned}$$

which counts the number of ones in a binary string. A more general version of this problem asks to minimize the Hamming distance to an unknown binary string of length n. The proposed algorithm works exactly the same on the more general problem since it is indifferent to the actual bit values and each bit is handled independently. Therefore, the performance of our algorithm on this more general version is equal to its performance on OneMax. The corresponding search space is the n-dimensional hypercube: Any binary string of length n is a (feasible) solution, and two solutions are adjacent iff they differ by exactly one bitflip. For \(n=4\), the search space is shown in Fig. 1. More generally, a pseudo-Boolean function is any function \(f : \{0, 1\}^n \rightarrow {\mathbb {R}} \).

By the sorting problem we refer to the task of arranging n items in non-decreasing order using transpositions. An (algebraic) transposition \(t=(i\,\,\, j)\) is the exchange of the entries at positions i and j. Therefore, the search space is the following (undirected) graph: the vertices are the permutations on \(\{1,2,\ldots ,n\}\) and two vertices xy are adjacent iff there is a transposition t such that \(x \circ t = y\). The objective function is the transposition distance to the identity permutation.Footnote 1 Figure 2 shows the search space for the problem of sorting items \(\{1, 2, 3, 4\}\) using transpositions. Any two permutations drawn in the same vertical layer have the same objective value.

Fig. 2
figure 2

Search space for the problem of sorting four items by transpositions. Two permutations xy on \(\{1,2,3,4\}\) are adjacent iff there is a transposition t such that \(x \circ t = y\)

The sorting problem and OneMax have a unique optimum. Furthermore, the value of the objective function is the distance to the unique optimal solution in the corresponding directed graph.

1.3 Our contribution

We propose a simple adaptation of the PSO algorithm to optimization problems over discrete domains. We refer to this algorithm as D-PSO. The algorithm works naturally on a large class of discrete problems, for instance optimization problems over bitstrings, integral domains, and permutations. The general task is to optimize a function \(f:X\rightarrow {\mathbb {R}} \), where X is a finite set of feasible solutions. Our assumptions on the problem structure are the following. We assume that the set X is the vertex set of a finite, strongly connected graph and for any solution \(x \in X\), we can sample a neighbor of x efficiently and uniformly. The D-PSO algorithm essentially explores this graph, looking for an optimal vertex. In our analysis, we assume at first a swarm size of one as in Mühlenthaler et al. (2017), similar to the analysis of EAs and ACO in Sudholt and Witt (2010). We refer to the corresponding specialization of D-PSO as OnePSO. Indeed, for a single particle we have only a single attractor and, as a consequence, a single parameter is sufficient to control the tradeoff between the moving towards the attractor and performing a random perturbation.

Table 1 Summary of upper and lower bounds on the expected time taken by the algorithm OnePSO to solve the sorting problem and OneMax for \(c \in [0,1]\)

Our main results are upper and lower bounds on the expected optimization time of the proposed OnePSO algorithm for solving the sorting problem and OneMax in a black-box setting summarized in Table 1. Certainly there are faster algorithms for the sorting problem or OneMax in a non-black-box setting, e.g., quicksort for the sorting problem. The upper bounds we prove for OnePSO naturally hold for D-PSO and the bounds are tight with respect to our Markov model. The algorithm parameter c determines the probability of making a move towards the attractor. Depending on the parameter c, we obtain a complete classification of the expected optimization time of OnePSO. For \(c=0\), OnePSO performs a random walk on the search space, and for \(c=1\), the algorithm behaves like randomized local search (see Papadimitriou et al. 1990 or Doerr and Neumann 2020 for results on local search variants). For \(c\in (1/2,1)\) the OnePSO behaves essentially like the \((1+1)\)-EA variants from Droste et al. (2002); Scharnow et al. (2004), since \((1+1)\)-EA variants perform in expectation a constant number of elementary mutations to obtain an improved solution and OnePSO with \(c\in (1/2,1)\) in expectation also performs a constant number of elementary mutations to find an improved solution, before it returns to the current best solution. Therefore, OnePSO in a sense generalizes the \((1+1)\)-EA algorithm since a parameter choice in \(c\in [0,1]\) supplies a broader range of behavior options than exploring solutions which are in expectation a constant number of elementary mutations away from the current best solution. If \(c<1\) then the OnePSO uses similar subroutines as the Metropolis algorithm (see Metropolis et al. 1953), but for OnePSO new positions are always accepted and guidance to good solutions is instead implemented by a “drift” to the best position found so far.

Indeed bounds on the expected optimization time for OneMax and upper bounds on the expected optimization time for the sorting problem of the OnePSO with parameter \(c\in (1/2,1]\) match the respective bounds on the expected optimization time for \((1+1)\)-EA variants from citeDJW:02,STW:04. We show that for \(c \in [1/2, 1)\), the expected optimization time of OnePSO for sorting and OneMax is polynomial and for \(c \in (0, 1/2)\), the expected optimization time is exponential. For c in the latter range we provide lower bounds on the base of the exponential expression by \(\alpha (c)\) for sorting and \(\beta (c)\) for OneMax such that \(1< \beta (c)< \alpha (c) < 6\) (see Fig. 5). Please note that \(\alpha \) and \(\beta \) have been significantly improved compared to the conference version (Mühlenthaler et al. 2017) and the upper bound on OneMax has also been reduced heavily to an exponential term with base \(\beta (c)\). This means that the lower and upper bound on the base of the exponential expression for the expected optimization time is equal. Note that for \(c=1/2\) the expected time it takes to visit the attractor again after moving away from it is maximal while keeping the expected optimization time polynomial, i.e., we have a phase transition at \(c=1/2\) such that for any \(c\le 1/2\) the expected optimization time is polynomial and for any \(c>1/2\) the expected optimization time is exponential in n. Hence, the parameter choice \(c=1/2\) maximizes the time until the attractor is visited again, i. e., the particles can explore the search space to the largest possible extent, provided that OneMax and the sorting problem are solved efficiently in a black-box setting.

In order to obtain the bounds shown in Table 1, we use a Markov model which captures the behavior of the OnePSO algorithm between two consecutive updates of the attractor. Depending on whether we derive upper or lower bounds on the expected optimization time, the Markov model is instantiated in a slightly different way. The relevant quantity we extract from the Markov model is the expected number of steps it takes until the OnePSO algorithm returns to the attractor. We determine \(\varTheta \)-bounds on the expected return time by an analysis of appropriate recurrence equations. Similar recurrences occur, for example, in the runtime analysis of randomized algorithms for the satisfiability problem (Papadimitriou 1991; Schöning 1999). Thus, our analysis of the Markov model presented in Sect. 3 may be of independent interest. For \(c > 1/2\), the recurrence equations can be solved using standard methods. For the parameter choice \(c \le 1/2\) however, we need to solve recurrence equations with non-constant coefficients in order to get sufficiently accurate bounds from the model. The gaps between upper and lower bounds on the expected optimization times shown in Table 1 result from choosing best-case or worst-case bounds on the transition probabilities in the Markov model, which are specific to the optimization problem. Since our bounds on the transition probabilities are essentially tight, we can hope to close the gap between the upper and lower bounds on the sorting problem (especially in the exponential case) only by using a more elaborate model.

Furthermore, based on Wald’s equation and the Blackwell-Girshick equation we obtain also the variance of the number of function evaluations needed to find an optimal solution with respect to the Markov model.

Upper bounds To obtain the upper bounds shown in Table 1 we use the established fitness level method (e.g., see Wegener 2002). We instantiate our Markov model such that improvements of the attractor are only accounted for if the current position is at the attractor. The main difficulty is to determine the expected number of steps needed to return to the attractor. We obtain this quantity from the analysis of the corresponding recurrences with constant and non-constant coefficients. Furthermore, we obtain by integration closed-form expressions of the expected number of steps it takes to return to the attractor after an unsuccessful attempt to improve the current best solution.

Lower bounds The runtime of the OnePSO algorithm is dominated by the time required for the last improvement of the attractor, after which the global optimum has been found. We again use the Markov model and observe that in this situation, the global optimum can be reached only when the Markov model is in a specific state. We argue that the optimal solution is included in a certain set \({\hat{Y}}\) of indistinguishable states. Therefore, in expectation, this set needs to be hit \(\varOmega (\vert {\hat{Y}}\vert )\) times until the optimum has been found. By evaluation of the return time to the attractor we also obtain bounds on the return time to the set \({\hat{Y}}\). Furthermore, for D-PSO with a constant number of particles, we give a lower bound of \(\varOmega (\log n)\) for optimizing a pseudo-Boolean function and for \(P = \mathrm poly(n)\) particles a stronger lower bound of \(\varOmega (P\cdot n)\) for the same task.

Open problems Finally, we conjecture that the expected optimization time of OnePSO for sorting n items is asymptotically equivalent to n! if the attractor is not used at all (\(c = 0\)). An equivalent statement is that a random walk on the set of permutations of n items using single transpositions as neighborhood relation asymptotically takes expected time n! to discover a fixed given permutation starting at a random position. We provide theoretical evidence for this conjecture in “Appendix A”. Furthermore, we conjecture stronger lower bounds for OnePSO for sorting n items for \(c > 0\) and provide evidence in “Appendix B”.

Extensions This article extends the conference paper (Mühlenthaler et al. 2017) as follows. We present D-PSO, a discrete PSO algorithm with multiple particles in Sect. 2 and show in Sect. 5.3 which results for OnePSO generalize to D-PSO. For Pseudo-Boolean functions, a new general lower bound is presented, which holds for D-PSO (see Sect. 5.4). Furthermore, we give a refined analysis in the case of exponential runtime which allows us to determine the exact base of the exponential expression (see Sect. 4.4). In addition to the analysis of the expected optimization time, we also consider the variance of the expected optimization time (see Sect. 4.5). We conjecture that the expected number of steps needed until the random walk on the space of permutations hits a fixed permutation is n!. We provide evidence for this conjecture in “Appendix A”. Finally, we present an approximate model of the algorithm with average transition probabilities to obtain the actual bounds where lower and upper bounds on the expected optimization time differ (see “Appendix B”). Also, some theorems have been extended to larger classes of functions, which may make them more useful in other settings.

1.4 Related work

Runtime results are available for several other meta-heuristics for optimization problems over discrete domains, for example evolutionary algorithms (EAs) (Droste et al. 2002; Giel and Wegener 2003; Wegener 2002; Antipov et al. 2018, 2019) and ant colony optimization (ACO) (Doerr et al. 2007; Neumann and Witt 2007; Sudholt and Thyssen 2012). Most of the results relevant to this work concern the binary PSO algorithm and the \((1+1)\)-EA algorithm. For the binary PSO, Sudholt and Witt (2008, 2010) provide various runtime results. For instance, they give general lower bound of \(\varOmega (n/\log n)\) for every function with a unique global optimum and a bound of \(\varTheta (n \log n)\) on the function OneMax. Note that the binary PSO studied in Sudholt and Witt (2010) has been designed for optimizing over \(\{0,1\}^n\) and it is different from our proposed D-PSO, which can be applied to a much wider range of discrete problems. Sudhold and Witt show the following bound for the binary PSO.

Theorem 1

(Sudholt and Witt 2010, Theorem) Under certain assumptions on the algorithm parameters, the expected optimization time of the binary PSO for optimizing \(f : \{0,1\}^n \rightarrow {\mathbb {R}} \) is \(O(mn\log n) + \sum _{i=1}^{m-1} 1/s_i\), where m is the number of level sets of f and \(s_i\) is a lower bound on the probability to move from level i to level \(i-1\).

Essentially, this result reflects the fact that the binary PSO converges to the attractor in expected time \(O(n\log n)\) unless the attractor has been updated meanwhile. This happens once for each fitness level. For OneMax, this result yields an expected optimization time of \(O(n^2 \log n)\). By a more careful analysis of the binary PSO on OneMax, the following improved bound is established:

Theorem 2

(Sudholt and Witt 2010, Theorem) The expected optimization time of the binary PSO with a single particle optimizing OneMax is \(O(n \log n)\).

The \((1+1)\)-EA considered in Scharnow et al. (2004) is reminiscent of stochastic hill climbing: in each iteration, a random solution is sampled and the current solution is replaced if and only if the solution is better. In order to escape local optima, the distance between the current solution and the new one is determined according to Poisson distributed random variables. Scharnow et al. (2004) provide bounds on the expected optimization time of a \((1+1)\)-EA sorting n items. They consider various choices of objective functions (e.g., Hamming distance, transposition distance, ...) as well as mutation operators (e.g., transpositions, reversing keys in a certain range, ...). A general lower bound of \(\varOmega (n^2)\) is proved, which holds for all permutation problems having objective functions with a unique optimum Scharnow et al. (2004, Theorem 1). The most relevant runtime result for a comparison with our D-PSO algorithm is the following.

Theorem 3

(Scharnow et al. 2004, Theorem 2/Theorem 4) The expected optimization time of the \((1+1)\)-EA for sorting n items is \(\varTheta (n^2 \log n)\) if the objective function is the transposition distance to the sorted sequence and mutations are transpositions.

The upper bound can be obtained by the fitness level method and a lower bound of \(\varOmega (k/n^2)\) on the probability of improvement when the current solution is at transposition distance k to the attractor. The lower bound follows from a similar argument. In addition, for determining the lower bound (Scharnow et al. 2004) consider the Hamming distance to evaluate the distance between the current position and the optimum although the algorithm still uses the transposition distance to decide which position is better. In the conference version (Mühlenthaler et al. 2017) we incorrectly claimed that the proof does not apply to this setting. Thanks to an anonymous reviewer we can correct this statement here.

In contrast to the \((1+1)\)-EA algorithm, the binary PSO studied in Sudholt and Witt 2010 allows for non-improving solutions, but it converges to the attractor exactly once per fitness level. After the convergence occurred, the binary PSO behaves essentially like the \((1+1)\)-EA.

Additionally, Raß et al. (2019) is based on a preliminary version of the present paper. There the authors applied the OnePSO to the single-source shortest path problem. For this purpose they extend the Markov model presented here by allowing self loops. They also used the bounds by integration which are proved in this work without repeating the proof. The following upper and lower bounds on the expected optimization time are given which are dependent on the algorithm parameter c specifying the probability of movement towards the attractor.

Theorem 4

(Raß et al. 2019, Theorem 5/Theorem 7) The expected optimization time T(n), to solve the single-source shortest path problem with n nodes is bounded by

$$\begin{aligned} T(n)={\left\{ \begin{array}{ll} O(n^3)&\text {if }c\in (\frac{1}{2},1]\\ O(n^{{7}/{2}})&\text {if }c=\frac{1}{2}\\ O(n^4\cdot \varphi (c)^n)&\text {if }c\in (0,\frac{1}{2})\\ \end{array}\right. } \end{aligned}$$

and

$$\begin{aligned} T(n)={\left\{ \begin{array}{ll} \varOmega (n^2)&\text {if }c\in (\frac{1}{2},1]\\ \varOmega (n^{{5}/{2}})&\text {if }c=\frac{1}{2}\\ \varOmega ( (\varphi (c)-\varepsilon )^n)&\text {if }c\in (0,\frac{1}{2})\\ \end{array}\right. } , \end{aligned}$$

where \(\varphi (c)=e^{-(1-2c)/(1-c)}\cdot (\frac{1-c}{c})\) and any arbitrarily small \(\varepsilon >0\).

1.5 Organization of the paper

In Sect. 2 we introduce the algorithm D-PSO for solving optimization problems over discrete domains. In Sect. 3 we provide a Markov model for the behavior of the algorithm OnePSO—a restriction of D-PSO to one particle—between two updates of the local attractor. Section 4 contains a comprehensive analysis of this Markov model. The results from this section are used in Sect. 5 in order to obtain the bounds on the expected optimization time for OnePSO shown in Table 1 as well as lower bounds for D-PSO on pseudo-Boolean functions. Section 6 contains some concluding remarks.

2 Discrete PSO algorithm

In this section we introduce the D-PSO algorithm, a PSO algorithm that optimizes functions over discrete domains. A simplified version of the algorithm that uses just a single particle will be referred to as OnePSO. Note that OnePSO is different from the 1-PSO studied in Sudholt and Witt (2010), which is tailored to optimization over bitstrings. The D-PSO and OnePSO algorithm sample items from a finite set X in order to determine some \(x^{*} \in X\) that minimizes a given objective function \(f:X \longrightarrow {\mathbb {R}} \). In order to have a discrete PSO that remains true to the principles of the original PSO for optimization in the domain \({\mathbb {R}} ^n\) from Eberhart and Kennedy (1995) and Kennedy and Eberhart (1995), we need some additional structure on X: For each \(x \in X\) we have a set of neighbors \({\mathscr {N}}_X(x)\). If the set X is clear from the context we may drop the subscript. The neighborhood structure induces a solution graph with nodes X and arcs \(\{xy \mid x, y \in X, y \in {\mathscr {N}}(x)\}\). The distance \(d (x, y)\) of solutions \(x, y \in X\) is the length of a shortest (directed) xy-path in this graph. We assume that the solution graph is strongly connected, so the PSO cannot get “trapped” in a particular strongly connected component. The search spaces of our reference problems in combination with the used neighborhood relationship satisfy this assumption.

figure a

The D-PSO algorithm performs the steps shown in Algorithm 1. The initial positions of the particles are chosen uniformly at random (u.a.r.) from the search space X. The parameter \(c_{loc}\) determines the importance of the local attractor \(l_i\) of each particle, which is the best solution a single particle has found so far, and the parameter \(c_{glob}\) determines the importance of the global attractor g, which is the best solution all particles have found so far. In each iteration each particle moves towards the local attractor with probability \(c_{loc}\), moves towards the global attractor with probability \(c_{glob}\) and otherwise move to a random neighbor. If \(l_i\) equals g then the particles still move only with probability \(c_{glob}\) to g. Note that the attractors \(l_i\) and g are updated in lines 21 and 24 whenever a strictly better solution has been found.

Alternatively, one could choose to update local and global attractors whenever the new position is at least as good as the position of the attractor (use \(\le \) instead of < in lines 21 and 24). The theorems presented here can also be carried over to this modified setting. Admittedly, for functions with plateaus this version potentially performs better, since a plateau is traversed easily by the modified algorithm. However, for the problems considered in this work there are no plateaus. Additionally, the probability to improve the objective function in the situation where position and attractor differ but have equal objective function value is higher than in the situation where the position is placed at the attractor.

At first glance, Algorithm 1 may not seem like an implementation of the PSO ideas, since we are choosing only a single attractor on each move or even make a random move whereas the classical PSO uses local and global attractor at each move, but looking at several consecutive iterations we retain the tendency of movement towards all the attractors. We consider the PSO to be an infinite process, so we do not give a termination criterion. We assume that sampling of \(x'\) in lines 11, 14 and 17 can be performed efficiently. This is the case for the neighborhood structures we consider.

The algorithm OnePSO is simply given by Algorithm 1 with a single particle, i.e., we have \(P = 1\). Note that there is only a single attractor in this case. Hence, OnePSO has just a single parameter \(c=c_{glob}\) that determines the probability of moving towards the (global) attractor g. In all other aspects it behaves like the D-PSO algorithm.

Fig. 3
figure 3

State diagram of the Markov model

3 Markov model of OnePSO

We present a simple Markov model that captures the behavior of the OnePSO algorithm between two consecutive updates of the attractor. This model has been presented already in Mühlenthaler et al. (2017) but we repeat it here to present a self contained overview on the presented approach. As an extension to Mühlenthaler et al. (2017) we also present how the variance can be computed. This is essential for experiments and if OnePSO and D-PSO are actually used. Using this model we can infer upper and lower bounds on the expected optimization time of the OnePSO algorithm on suitable discrete functions. For our analysis, we assume that the objective function \(f: X \rightarrow {\mathbb {R}} \) has the property that every local optimum is a global one. That is, the function is either constant or any non-optimum solution x has a neighbor \(y \in {\mathscr {N}}(x)\) such that \(f(x) > f(y)\). Functions with that property are called unimodal functions. Although this restriction certainly narrows down the class of objective functions to which our analysis applies, the class seems still appreciably large, e.g., it properly contains the class of functions with a unique global optimum and no further local optima.

Assume that the attractor \(g \in X\) is fixed and g is not a minimizer of f. Under which conditions can a new “best” solution be found? Certainly, if the current position x is equal to g, then, by the described structure of f we get an improvement with positive probability. If \(x \ne g\) then the attractor may still be improved. However, for the purpose of upper bounding the expected optimization time of the OnePSO we dismiss the possibility that the attractor is improved if \(x \ne g\). As a result, we obtain a reasonably simple Markov model of the OnePSO behavior. Quite surprisingly, using the same Markov model, we are also able to get good lower bounds on the expected optimization time of the OnePSO (see Sect. 5.2 for the details).

Recall that we think of the search space in terms of a strongly connected graph. Let n be the diameter of the search space X, i.e., the maximum distance of any two points in X. We partition the search space according to the distance to the attractor \(g\in X\). That is, for \(0 \le i \le n\), let \(X_i = \{ x \in X \mid d (x, g) = i\}\). Note that this partition does not depend on the objective function. If the search space is not symmetric as in our case it could also be possible that some \(X_i\) are empty, because the maximal distance to a specific solution in the search space could be less than the diameter. The model consists of \(n+1\) states \(S_0,S_1,\ldots ,S_n\). Being in state \(S_i\) indicates that the current solution x of the OnePSO is in \(X_i\).

For each \(x\in X_i\) we denote by \(p_x\) the transition probability from x to an element in \(X_{i-1}\). The probabilities \(p_x\) in turn depend on the parameter c, which is the probability that the OnePSO explicitly moves towards the attractor. If the current position x is in \(X_{i}\) and the algorithm moves towards the attractor, then the new position is in \(X_{i-1}\). On the other hand, if the PSO updates x to any neighbor chosen u.a.r. from \({\mathscr {N}}(x)\), then the new position is in \(X_{i-1}\) with probability \(|{\mathscr {N}}(x) \cap X_{i-1}|/|{\mathscr {N}}(x)|\). So we obtain the transition probability

$$\begin{aligned} p_x = c + (1-c) \cdot \frac{|X_{i-1} \cap {\mathscr {N}}(x)|}{|{\mathscr {N}}(x)|} . \end{aligned}$$

Remark 1

In this work we assume that the probability that we move from a position \(x\in X_i\) to an element in \(X_i\) is zero, i. e., if we move from a position x to a neighboring position \(x'\in {\mathscr {N}}(x)\) then the distance to the attractor does always change (\(d (x,g)\ne d (x',g)\)).

This assumption holds for both problems we investigate in Sect. 5. Nevertheless, an extensions allowing transitions inside a level \(X_i\) is possible and has been considered already in the article (Raß et al. 2019) which is based on an arXiv preprint of this article.

Using the assumption in Remark 1 and the fact that \(X_i\) is defined by distance to a fixed position g the probability of moving from x to an element in \(X_j\), \(j \notin \{i-1, i+1\}\), is zero. Consequently, the probability of moving from x to an element in \(X_{i+1}\) is then \(1-p_x\). Furthermore, if the OnePSO is at position \(x \in X_n\) then any move brings us closer to the reference solution; so \(p_x = 1\) in these cases.

Please note that \(p_{x}\) and \(p_{x'}\) can differ even if \(x,x'\) are contained in the same set \(X_i\). Therefore we do not necessarily obtain a Markov model if we use the states \(S_i\) and the transition probabilities \(p_i=p_x\) for some \(x\in X_i\) as this value is not necessarily equal for all \(x'\in X_i\).

Nevertheless, we can analyze Markov chains using bounds on the transition probabilities. To be more precise, we can use \(p_i:=\min _{x\in X_i}p_x\) as lower bound and \(p_i':=\max _{x\in X_i}p_x\) as upper bound on the transition probabilities in the direction to the attractor to obtain an upper bound and lower bound on the expected number of iterations until the distance to the attractor is decreased respectively. Figure 3 shows the state diagram of this model.

Definition 1

By \({{\mathscr {M}}} \left( (p_i)_{1 \le i \le n}\right) \) we denote an instance of the Markov model with states \(S_0,S_1,\ldots ,S_n\) and data \(p_i\in {\mathbb {R}} _{\ge 0}\), \(1 \le i \le n\). Suppose we are in state \(S_i\). Then we move to state \(S_{i-1}\) with probability

$$\begin{aligned} &1\quad \text {if } i = n\\ &\min \{ 1, p_i \}\quad \text {if } 1 \le i < n \end{aligned}$$

and otherwise we move to state \(S_{i+1}\).

Please note that the data does not need to be in [0, 1] but for the ease of presentation we will refer to them as probabilities. Furthermore, supposed we are in state \(S_{n}\) even if \(p_n\ne 1\) the probability of moving to \(S_{n-1}\) is 1. This notation allows us to succinctly specify the currently used Markov model, e.g., the model which is used to obtain upper bounds is described by \({{\mathscr {M}}} \left( (\min _{x\in X_i}p_x)_{1\le i\le n} \right) \).

Our goal is to determine the expected number of steps needed to hit a solution which is better than the attractor after starting in \(S_0\). Let \(p_g\) be the probability to improve the attractor if we are currently in state \(S_0\), hence at the attractor. Then the probability \(p_g\) depends on f and the choice of g. We have that \(p_g\) is positive since f is unimodal. In order to reach a better solution from \(S_0\) we need in expectation \(1/p_g\) tries. If we are unsuccessful in some try, then the OnePSO moves to \(S_1\). For upper bounds we can ignore the chance to improve the attractor through other states. Thus we need to determine the expected number of steps it takes until we can perform the next try, that is, the expected first hitting time for the state \(S_0\), starting in \(S_1\). The expected number \(h_i\) of steps needed to move from \(S_i\) to \(S_0\) is given by the following recurrence:

$$\begin{aligned} \begin{aligned} h_{n}&= 1 + h_{n-1} ,\quad h_0=0 \\ h_{i}&= 1 + p_i \cdot h_{i-1} + (1 - p_i)\cdot h_{i+1}, \quad 1 \le i < n. \end{aligned} \end{aligned}$$
(1)

4 Analysis of the Markov model

In this section we prove upper and lower bounds on the expected return time to the state \(S_0\) of the Markov model from Sect. 3. These bounds are of key importance for our runtime analysis in Sect. 5. For the lower bounds we also introduce a notion of indistinguishability of certain states of a Markov model.

In our analysis the probabilities \(p_i\) are generally not identical. If we assume that \(p_1 = p_2 = \cdots = p_n = p\), then we obtain a non-homogeneous recurrence of order two with constant coefficients. In this case, standard methods can be used to determine the expected time needed to move to the attractor state \(S_0\) as a function of n (Graham et al. 1994, Chap. 7). Note also that for \(p=1/k\) this is exactly the recurrence that occurs in the analysis of a randomized algorithm for k-SAT (Papadimitriou 1991; Schöning 1999; Mitzenmacher and Upfal 2005, pp. 160f.). If \(p_i\) has not necessarily identical values which are dependent on i, then the recurrence can in some cases be solved, see e.g., Graham et al. (1994, Chap. 7) and Petkovšek (1992). Here, due to the structure of the recurrence, we can use a more pedestrian approach, which is outlined in the next section.

4.1 Reformulation of the recurrence

We first present a useful reformulation of Recurrence (1). From this reformulation we will derive closed-form expressions and asymptotic properties of the return time to the attractor of the transition probabilities.

Let \(W_{i}\) be the number of steps needed to move from state \(S_i\) to state \(S_{i-1}\) and let \(H_i:=\mathrm{E}[W_i]\) be its expectation. Then \(H_i\) can be determined from \(H_{i+1}\) as follows: In expectation, we need \(1/p_i\) trials to get from \(S_i\) to \(S_{i-1}\), and each trial, except for the successful one, requires \(1+H_{i+1}\) steps. The successful trial requires only a single step, so \(H_{i}\) is captured by the following recurrence:

$$\begin{aligned} H_i&= \frac{1}{p_i} \left( 1 + H_{i+1} \right) - H_{i+1} =\frac{1}{p_i}+\frac{1-p_i}{p_i}\cdot H_{i+1} , \quad 1 \le i < n \end{aligned}$$
(2)
$$\begin{aligned} H_n&= 1. \end{aligned}$$
(3)

Another interpretation is the following: \(W_i\) is equal to one with probability \(p_i\), the direct step to \(S_{i-1}\), and with probability \(1-p_i\) it takes the current step which leads to state \(S_{i+1}\), then \(W_{i+1}\) steps to go from \(S_{i+1}\) back to \(S_i\) and then again \(W_i\) steps. For the expected value of \(W_i\) this interpretation leads to the formula

$$\begin{aligned} H_i=p_i + (1-p_i)\cdot (1+H_{i+1}+H_i), \end{aligned}$$

which is equivalent to Eq. 2 after solving for \(H_i\). Please note that the probabilities \(p_i\) are mostly determined by some function depending on n and i. Unfolding the recurrence specified in Eq. (2) k times, \(1 \le k \le n\), followed by some rearrangement of the terms yields

$$\begin{aligned} H_1=\sum _{i=1}^{k-1}\left( \frac{1}{p_i}\cdot \prod _{j=1}^{i-1}\frac{1-p_j}{p_j}\right) + H_k\cdot \prod _{j=1}^{k-1}\frac{1-p_j}{p_j}. \end{aligned}$$
(4)

Thus, for \(k=n\) we obtain the following expression for \(H_1\):

$$\begin{aligned} H_1 = \sum _{i=1}^n \left( \frac{1}{p_i} \cdot \prod _{j=1}^{i-1} \frac{1-p_j}{p_j}\right) - \prod _{j=1}^{n}\frac{1-p_j}{p_j}, \end{aligned}$$
(5)

where the second term is a correction term which is required whenever \(p_n < 1\) (see Definition 1) in order to satisfy the initial condition given in Eq. (3). Equation (5) has also been mentioned in Droste et al. (2001, Lemma 3) in the context of the analysis of randomized local search or in Kötzing and Krejca (2018, Theorem 3). \(H_k\) can be obtained analogously, which leads to

$$\begin{aligned} H_k = \sum _{i=k}^n \left( \frac{1}{p_i} \cdot \prod _{j=k}^{i-1} \frac{1-p_j}{p_j}\right) - \prod _{j=k}^{n}\frac{1-p_j}{p_j}. \end{aligned}$$
(6)

4.2 Identical transition probabilities

If the probabilities \(p_i = p\) for some constant \(p \in [0,1]\) and \(1 \le i < n\), then Recurrences (2) become linear recurrence equations with constant coefficients. Standard methods can be used to determine closed-form expressions for \(h_i\) and \(H_i\). However, we are mainly interested in \(H_1\) and are able to determine closed-form expressions directly from Eq. (5).

Theorem 5

Let \(0< p < 1\). Then the expected return time \(H_1\) to \(S_0\) is

$$\begin{aligned} H_1 = h_1 = {\left\{ \begin{array}{ll} \dfrac{1-2p \left( \dfrac{1-p}{p} \right) ^n }{2p-1} & \text {if } p \ne \frac{1}{2}\\ 2n-1 & \text {if }p = \frac{1}{2}. \end{array}\right. } \end{aligned}$$
(7)

Proof

By setting \(p_i = p\) in Eq. (5) and performing some rearrangements the theorem is proved. \(\square \)

It is easily verified that this expression for \(h_1\) satisfies Eq. (1). So, with \(p_i = p\) we have that the time it takes to return to the attractor is bounded from above by a constant, a linear function, or an exponential exponential function in n if \(p > 1/2\), \(p = 1/2\), or \(p < 1/2\), respectively.

For the case \(p=1/2\) one can obtain this result also from the Gambler’s Ruin Model by mirroring the state 0 at n which would result in termination if values 0 or 2n appear. Starting at value 1 results in a first hitting time of \(2n-1\) in this Gambler’s Ruin Model as stated in Theorem 5.

4.3 Non-identical transition probabilities

Motivated by the runtime analysis of OnePSO applied to optimization problems such as sorting and OneMax, we are particularly interested in the expected time it takes to improve the attractor if the probabilities \(p_i\) are slightly greater than 1/2. By slightly we mean \(p_i = 1/2 + i/(2A(n))\) which appears in the analysis of OnePSO optimizing OneMax and the sorting problem, or \(p_i=1/2+A(i)/(2A(n))\) which appears in the analysis of OnePSO optimizing the sorting problem, where \(A:{\mathbb {N}} \rightarrow {\mathbb {N}} \) is some non-decreasing function of n such that \(\lim _{n \rightarrow \infty } A(n) = \infty \). Recall from Definition 1 that if \(p_i>1\) then we move from state \(S_i\) to state \(S_{i-1}\) with probability 1. Clearly, in this setting we cannot hope for a recurrence with constant coefficients. Our goal in this section is to obtain the asymptotics of \(H_1\) as \(n \rightarrow \infty \) for \(A(n) = n\) and \(A(n) = \left( {\begin{array}{c}n\\ 2\end{array}}\right) \). We show that for \(p_i=1/2+i/(2\cdot A(n))\) and \(A(n) = n\) the return time to the attractor is \(\varTheta (\sqrt{n})\), while for \(A(n) = \left( {\begin{array}{c}n\\ 2\end{array}}\right) \) the return time is \(\varTheta (n)\).

Lemma 1

Let \(M={{\mathscr {M}}} ((1/2+i/(2n))_{1\le i\le n})\). Then

$$\begin{aligned} H_1 = \frac{4^n}{\left( {\begin{array}{c}2n\\ n\end{array}}\right) } - 1 \sim \sqrt{\pi n}=\varTheta (\sqrt{n}). \end{aligned}$$

Proof

We have \(p_n = 1\) so the correction term in Eq. (5) is zero. We rearrange the remaining terms of Eq. (5) and find that

$$\begin{aligned} H_1&= \sum _{i=1}^n \frac{2n}{n + i} \prod _{j=1}^{i-1} \frac{n-j}{n+j} = 2\, \\&\quad \sum _{i=1}^n \frac{n!\,n!}{(n-i)!\cdot (n+i)!} \overset{i'=n-i}{=} \frac{2}{\left( {\begin{array}{c}2n\\ n\end{array}}\right) } \\&\quad \sum _{i'=0}^{n-1} \left( {\begin{array}{c}2n\\ i'\end{array}}\right) = \frac{4^n}{\left( {\begin{array}{c}2n\\ n\end{array}}\right) } - 1. \end{aligned}$$

Applying the well-known relation

$$\begin{aligned} \frac{4^n}{\sqrt{\pi n}} \left( 1-\frac{1}{4n}\right)<\left( {\begin{array}{c}2n\\ n\end{array}}\right) < \frac{4^n}{\sqrt{\pi n}} \end{aligned}$$

(for an elegant derivation, see Hirschhorn 2015) finishes the proof. \(\square \)

This lemma can be generalized for linearly growing probabilities.

Theorem 6

Let \(M = {{\mathscr {M}}} \left( (p_i)_{1 \le i \le n}\right) \), where \(p_i = 1/2 + i /(2A(n))\).

Then \(H_1 = \varTheta (\min (\sqrt{A(n)},n))\) with respect to M.

Fig. 4
figure 4

State diagram of the Markov model M and its modified versions \({{\tilde{M}}}\) and \({{\hat{M}}}\) used in the proof of Theorem 6

Proof

First, we consider the case \(A(n) \le n^2\). Let \(n' = A(n)\), which is the smallest number such that \(p_{n'} = 1/2 + n'/(2A(n)) \ge 1\). First, assume that \(n' \le n\) and consider the “truncated” model \(M' = {{\mathscr {M}}} \left( (p_i)_{1 \le i \le n'} \right) \). Please note that there is actually no difference between M and \(M'\) because the removed states are never visited as \(p_{n'}\), the probability to move from \(S_{n'}\) to \(S_{n'-1}\), is already one and \(S_{n'+1}\) is never visited. Let \(H_1'\) be the expected time to reach state \(S_0\) starting at state \(S_1\) with respect to \(M'\). By Lemma 1 we have \(H_1'=\Theta (\sqrt{A(n)})\), which is by the construction of \(M'\) equal to \(H_1\). On the other hand, assume that \(n' > n\) and consider the “extended” model \({{\tilde{M}}} = {{\mathscr {M}}} \left( (p_i)_{1 \le i \le n'} \right) \). M and \({{\tilde{M}}}\) are visualized in Fig. 4a with omitted probability of worsening. Let \({{\tilde{H}}}_1\) be the expected time to reach state \(S_0\) starting at state \(S_1\) with respect to \({{\tilde{M}}}\). By Lemma 1 we have \({{\tilde{H}}}_1 = \Theta (\sqrt{A(n)})\) and since \({{\tilde{H}}}_1 \ge H_1\) we obtain \(H_1 = O(\sqrt{A(n)})\).

To obtain a lower bound on \(H_1\) for the case \(n' > n\) we consider the model \({{\hat{M}}} = {{\mathscr {M}}} \left( ({{\hat{p}}}_i)_{1\le i\le {{\hat{n}}}} \right) \), where \(\hat{n} = \min (n,\lfloor \sqrt{A(n)}\rfloor )\) and \({{\hat{p}}}_i = 1/2 + 1/(2{{\hat{n}}})\) for \(1 \le i < {{\hat{n}}}\), and \({{\hat{p}}}_{{{\hat{n}}}} = 1\). For \(1\le i < {{\hat{n}}}\) we have that \({{\hat{p}}}_i \ge p_i\) because \(1/{{\hat{n}}}={{\hat{n}}}/{{\hat{n}}}^2\ge \hat{n}/(A(n))\). A schematic representation of M and \({{\hat{M}}}\) can be found in Fig. 4b. Let \({{\hat{H}}}_1\) denote the expected time to reach state \(S_0\) starting at state \(S_1\) in \({{\hat{M}}}\). Since \({{\hat{p}}}_i \ge p_i\) for \(1 \le i \le {{\hat{n}}}\), \({{\hat{H}}}_1\) is a lower bound on \(H_1\). This fact is obvious as in this Markov model one can move only to neighboring states. Increasing the probability of moving towards the final state consequently decreases the probability of movement in the opposite direction and therefore the hitting time to reach the final state is in expectation reduced if the probabilities of moving towards the final state are increased. Since \(p := {{\hat{p}}}_i\) is constant for \(1 \le i < {{\hat{n}}}\) we get from Theorem 5 that

$$\begin{aligned} {{\hat{H}}}_1 = \frac{1-2p\left( \dfrac{1-p}{p}\right) ^{{{\hat{n}}}}}{2p-1}. \end{aligned}$$

Substituting \(p = 1/2 + 1/{(2 {{\hat{n}}})}\) gives

$$\begin{aligned} {{\hat{H}}}_1&= {{{\hat{n}}}}-({{{\hat{n}}}}+1)\cdot \left( \frac{{{{\hat{n}}}}-1}{{{{\hat{n}}}}+1}\right) ^{{{\hat{n}}}} = {{{\hat{n}}}}-\frac{({{{\hat{n}}}}+1)^2}{{{{\hat{n}}}}-1}\cdot \left( 1-\frac{2}{{{{\hat{n}}}}+1}\right) ^{{{{\hat{n}}}}+1} \\&\ge {{{\hat{n}}}}-\frac{({{{\hat{n}}}}+1)^2}{{{{\hat{n}}}}-1}\cdot \mathrm{e}^{-2} =\varOmega ({{{\hat{n}}}}). \end{aligned}$$

Therefore \(H_1=\varOmega ({{{\hat{n}}}})=\varOmega (\min (n,\sqrt{A(n)}))\).

It remains to show that the statement holds if \(A(n) > n^2\). In this case, \(H_1 = O(n)\) is obtained by setting \(p_i = 1/2\), which is a lower bound on the probabilities of moving towards \(S_0\), for \(1 \le i < n\) and invoking Theorem 5. On the other hand, setting \(\breve{A}(n) = n^2\) and using \(\breve{M}={{\mathscr {M}}} ((1/2+i/(2\breve{A}(n)))_{1 \le i \le n})\) gives a lower bound on \(H_1\), because \(1/2+i/(2\breve{A}(n))\) is an upper bound on \(1/2 + i/(2A(n))\). As discussed above, for the case \({A}(n) \le n^2\), the expected time to reach state \(S_0\) starting at state \(S_1\) in \(\breve{M}\) is \(\Omega (\min (n,\sqrt{\breve{A}(n)}))\). Therefore, \(H_1 = \Omega (\min (n,\sqrt{\breve{A}(n)}))=\Omega (n)\), which completes the proof. \(\square \)

For our application, the sorting problem, the following special case of Theorem 6 will be of interest:

Corollary 1

Let \(M = {{\mathscr {M}}} ((1/2(1+i/\left( {\begin{array}{c}n\\ 2\end{array}}\right) ))_{1 \le i \le n})\), then \(H_1=\varTheta (n)\).

We will now consider a slightly different class of instances of the Markov model in order to obtain a lower bound on the OnePSO runtime for sorting in Sect. 5. For this purpose we consider transition probabilities \(p_i\) that increase in the same order as the divisor A(n), which suits our fitness level analysis of the sorting problem. This class of models is relevant for the analysis of the best case behavior of the OnePSO algorithm for sorting n items (see Theorem 3). Although we will only make use of a lower bound on \(H_1\) in this setting later on, we give the following \(\Theta \)-bounds:

Theorem 7

Let \(M = {{\mathscr {M}}} \left( (\frac{1}{2}\left( 1+A(i)/A(n)\right) )_{1\le i\le n}\right) \) where \(A:{\mathbb {N}} \rightarrow {\mathbb {R}} ^+\) is a non-decreasing function and \(A(n)=\Theta (n^d)\) for some \(d> 0\). Then \(H_1 = \Theta (n^{d/(d+1)})\) with respect to M.

This theorem is a significant extension to Mühlenthaler et al. (2017, Theorem 5) which covers only the special case \(p_i=1/2\cdot (1+\left( {\begin{array}{c}i+1\\ 2\end{array}}\right) /\left( {\begin{array}{c}n\\ 2\end{array}}\right) ).\)

Proof

Consider the expression for \(H_1\) given in Eq. (4). Since \(p_i > 1/2\) for \(1 \le i \le n\) the products are at most 1 and \(1/p_i\) is at most 2. Therefore for any \(k\in \lbrace 1,\ldots ,n\rbrace : H_1\le 2k+H_{k+1}\). As \(H_{k+1}\) is the expected number of steps to move from state \(S_{k+1}\) to \(S_k\), the states \(S_0\) to \(S_{k-1}\) are irrelevant for the calculation of \(H_{k+1}\) since they are never visited in between. Therefore also probabilities \(p_1\) to \(p_{k}\) do not matter. We truncate the model to states \(S_{k},\ldots ,S_n\). For these states the minimal probability of moving towards the attractor is \(p_{k+1}\ge p_k\). Therefore we can set \(p_i=p_{k}\) for \(i\in \lbrace k+1,\ldots ,n\rbrace \) to get an upper bound on the return time. By reindexing the states we obtain the model \(\tilde{M}={{\mathscr {M}}} ((p_{k})_{1\le i \le n-k})\) and, because of the truncation and the decrease of probabilities, \({{\tilde{H}}}_1\) is an upper bound on \(H_{k+1}\), where \({{\tilde{H}}}_1\) is the expected number of steps to move from state \(S_1\) to \(S_0\) in model \({{\tilde{M}}}\). In \({{\tilde{M}}}\) we have the fixed probabilities \(p_{k}\) and can therefore apply Theorem 5 to determine \({{\tilde{H}}}_1\). Therefore

$$\begin{aligned} H_{k+1}\le {{\tilde{H}}}_1 =\frac{1-2p_{k}\left( \frac{1-p_{k}}{p_{k}}\right) ^{n-k}}{2p_{k}-1} \le \frac{1}{2p_{k} -1} =\frac{A(n)}{A(k)}. \end{aligned}$$

Altogether we have \(H_1\le 2k+\frac{A(n)}{A(k)}\). With \(k=n^{d/(d+1)}\), where d is the degree of A, we get

$$\begin{aligned} H_1&\le 2n^{\frac{d}{d+1}}+\frac{A(n)}{A(n^{d/(d+1)})} =\varTheta (n^{\frac{d}{d+1}})+\frac{\varTheta (n^d)}{\varTheta ((n^{d/(d+1)})^d)} \\&=\varTheta (n^{\frac{d}{d+1}})+\varTheta (n^{d-d^2/(d+1)}) =\varTheta (n^{\frac{d}{d+1}}), \end{aligned}$$

which certifies that \(H_1=O(n^{d/(d+1)})\). By using Eq. (4) we have the following lower bound on \(H_1\):

$$\begin{aligned} H_1&\ge \sum _{i=1}^{k} \frac{1}{p_i} \prod _{j=1}^{i-1} \frac{1-p_j}{p_j} \ge \sum _{i=1}^{k} \prod _{j=1}^{i-1} \frac{1-p_j}{p_j} \end{aligned}$$

Note: \(\prod _{j=1}^{i-1} \frac{1-p_j}{p_j}\) is monotonically decreasing as \(p_j\ge 1/2\).

$$\begin{aligned}&\ge k\prod _{j=1}^{k-1} \frac{1-p_j}{p_j} = k\prod _{j=1}^{k-1} \left( 1-\frac{2\cdot A(j)}{A(n)+A(j)}\right) \\&\ge k\left( 1-\sum _{j=1}^{k-1} \frac{2\cdot A(j)}{A(n)+A(j)}\right) \ge k\left( 1-\sum _{j=1}^{k-1} \frac{2\cdot A(j)}{A(n)}\right) \end{aligned}$$

Note: A(j) is non-decreasing.

$$\begin{aligned}&\ge k\left( 1-k \cdot \frac{2\cdot A(k)}{A(n)}\right) \end{aligned}$$

As this equation holds for any \(k\in \lbrace 1,\ldots ,n\rbrace \) we can choose \(k=g\cdot n^z\) with a not yet fixed constant \(g\in (0,1)\) and \(z=d/(d+1)\), \(z\in (0,1)\). Please note that \(g\cdot n^z\) tends to infinity if n tends to infinity and therefore asymptotic expressions can also be applied if \(g\cdot n^z\) is the argument. Please also note that k has to be an integer but errors can be captured by some \(\varTheta (1)\) expressions. Substituting k by \(g\cdot n^z\) in the previous inequality results in

$$\begin{aligned} H_1&\ge \lceil g\cdot n^{z}\rceil \left( 1-\lceil g\cdot n^{z}\rceil \cdot \frac{2\cdot A\left( \lceil g\cdot n^z\rceil \right) }{A(n)}\right) \\&= g\cdot n^{z}\cdot \Theta (1)\left( 1-g\cdot n^{z} \cdot \Theta (1)\cdot \frac{\Theta \left( (g\cdot n^z)^d\right) }{\Theta (n^d)}\right) \\ {}&= g\cdot n^{z}\cdot \Theta (1)\left( 1-g^{d+1}\cdot n^{z+d\cdot z-d}\cdot \Theta (1)\right) \\ \end{aligned}$$

Note: \(z+d\cdot z-d=z\cdot (d+1)-d=d-d=0\).

$$\begin{aligned}&= g\cdot n^{z} \cdot \Theta (1)\left( 1- g^{d+1}\cdot \Theta (1)\right) \end{aligned}$$

Note: The last \(\Theta (1)\) can be bounded from above by some constant \(c_{\Theta }\) for large n (by definition of \(\Theta \)). Choose \(g=\root d+1 \of {1/(2\cdot c_{\Theta })}\). Then the expression in parentheses of the last term \(\left( 1- g^{d+1}\cdot \Theta (1)\right) \) may be negative for small n but is at least 1/2 for large n which implies that it is in \(\varOmega (1)\).

$$\begin{aligned}&\ge g\cdot n^{z}\cdot \varOmega \left( 1\right) =\varOmega \left( n^{\frac{d}{d+1}}\right) . \end{aligned}$$

\(\square \)

It may be verified that

$$\begin{aligned} g=\root d+1 \of {\frac{\liminf \limits _{n\rightarrow \infty }(A(n)/n^d)}{\limsup \limits _{n\rightarrow \infty }(A(n)/n^d)}\cdot \frac{1}{4}\cdot (1-\varepsilon )}, \end{aligned}$$

is a suitable choice for any \(0< \varepsilon < 1\), to replace k by \(g\cdot n^z\) to obtain the previous inequalities. To see this, note that the first fraction of \(\liminf \) and \(\limsup \) counterbalances the fluctuation of A(k)/A(n) relative to its \(\varTheta \)-bound. Furthermore, the factor 1/4 compensates the factor 2 which is hidden by \(\Theta \) and supplies the desired factor of 1/2 to ensure that the expression in parentheses is positive and at least 1/2 for large n. Finally, the factor of \(1 - \varepsilon \) is needed for some tolerance, because without it g is only sufficiently small in the limitFootnote 2 but not necessarily for large n.

This theorem is a generalization of Lemma 1 and implies the \(\Theta \)-bound stated in that lemma by using \(A(n)=n\).

4.4 Bounds by integration

Reformulations (4) and (5) of the recurrence (1) given in Sect. 4.1 do not yield closed-form expressions of \(H_1\), the expected number of steps it takes to return to the attractor after an unsuccessful attempt to improve the current best solution. In this section we derive closed-form expressions for \(H_1\). In order to get rid of the sums and products in Eqs. (4) and (5), we use the following standard approaches. First, sums may be approximated by their integral. This approach works quite well if the summand/integrand is monotonic, which is true in our case. Second, products can be transformed to integrals by reformulating the product by the exponential function of the sum of logarithms. This approach is an extension to Mühlenthaler et al. (2017) as it is not present there at all.

Theorem 8

Let \(M={{\mathscr {M}}} ((p(i))_{1\le i\le n})\) and \(p:[0,n]\rightarrow (0,1]\) be a non-decreasing function assigning the probabilities in the model, then

$$\begin{aligned} &H_1=\Omega \left( {\mathrm {base}}(p,n)^n \right) ,\\ &H_1=O\left( n\cdot {\mathrm {base}}(p,n)^n\right) \quad \text {and}\\ &H_1=\varTheta ^*\left( {\mathrm {base}}(p,n)^n \right) \quad ,\text { where}\\ &{\mathrm {base}}(p,n)=\sup _{k\in [0,n]}\exp \left( \int _0^{\frac{ k}{n}}\ln \left( \frac{1-p(n\cdot x)}{p(n\cdot x)}\right) \mathrm {d}x\right) . \end{aligned}$$

The integral in \({\mathrm {base}}(p,n)\) is maximized by \(k=\inf \lbrace x\mid x\in [0,n]\wedge p(x)\ge 1/2\rbrace \) or \(k=n\) if the infimum is taken on the empty set.

Please note that we use \(\varTheta ^*\)-bounds in the sense that polynomial factors can be omitted in \(\varTheta \)-bounds. This is a similar notation as in the more common use case of \(O^*\) where polynomial factors can also be omitted for upper bounds.

Proof

p(i) is non-decreasing in i and has values in ]0, 1] and therefore \(\frac{1-p(i)}{p(i)}\) and also \(\tau (i):=\ln \left( \frac{1-p(i)}{p(i)}\right) \) are non-increasing as the numerator is non-increasing and the denominator is non-decreasing. In the following series of equations, let \(k \in [0, n)\). Using Eq. (4), we obtain

$$\begin{aligned} H_1\!\!\!= & {} \sum _{i=1}^{n-1}\left( \frac{1}{p(i)}\cdot \prod _{j=1}^{i-1}\frac{1-p(j)}{p(j)}\right) +H_n\cdot \prod _{j=1}^{n-1}\frac{1-p(j)}{p(j)} \\\ge & {} \sum _{i=1}^{n-1}\left( \prod _{j=1}^{i-1}\frac{1-p(j)}{p(j)}\right) \\\ge & {} \prod _{j=1}^{\lfloor k\rfloor -1}\frac{1-p(j)}{p(j)} =\exp \left( \sum _{j=1}^{\lfloor k\rfloor -1}\tau (j)\right) \\&\text {Note: } \tau (j)=\ln \left( (1-p(j))/p(j)\right) \text { is non-increasing.}\\\ge & {} \exp \left( \int _1^{\lfloor k\rfloor }\tau (x)\mathrm {d}x\right) = \exp \left( n\int _\frac{1}{n}^{\frac{\lfloor k\rfloor }{n}}\tau (n\cdot x)\mathrm {d}x\right) \\= & {} \exp \left( n\cdot \left( \int _0^{\frac{ k}{n}}\ln \left( \frac{1-p(n\cdot x)}{p(n\cdot x)}\right) \mathrm {d}x\right. \right. \\&\left. \left. - \int _0^{\frac{ 1}{n}}\ln \left( \frac{1-p(n\cdot x)}{p(n\cdot x)}\right) \mathrm {d}x - \int _{\frac{\lfloor k\rfloor }{n}}^{\frac{k}{n}}\ln \left( \frac{1-p(n\cdot x)}{p(n\cdot x)}\right) \mathrm {d}x\right) \right) \\\ge & {} \exp \left( n\cdot \left( \int _0^{\frac{ k}{n}}\ln \left( \frac{1-p(n\cdot x)}{p(n\cdot x)}\right) \mathrm {d}x -\frac{2}{n}\ln \left( \frac{1-p(0)}{p(0)}\right) \right) \right) \\= & {} \left( \frac{p(0)}{1-p(0)}\right) ^2\exp \left( \int _0^{\frac{ k}{n}}\ln \left( \frac{1-p(n\cdot x)}{p(n\cdot x)}\right) \mathrm {d}x\right) ^n. \end{aligned}$$

As k can be chosen arbitrarily we get the claimed lower bound for \(H_1\), because \(\frac{p(0)}{1-p(0)}\) is a constant. As any integral is a continuous function also the whole expression in the supremum is a continuous function and therefore \(k=n\) can be allowed in the supremum without changing the value. Quite similar steps lead to the upper bound for \(H_1\). We start with Eq. (5).

$$\begin{aligned} H_1&= \sum _{i=1}^{n}\left( \frac{1}{p(i)}\cdot \prod _{j=1}^{i-1}\frac{1-p(j)}{p(j)}\right) -\prod _{j=1}^{n}\frac{1-p(j)}{p(j)}\\&\le \sum _{i=1}^{n}\left( \frac{1}{p(1)}\cdot \prod _{j=1}^{i-1}\frac{1-p(j)}{p(j)}\right) \\&\le \frac{n}{p(1)}\cdot \max _{i\in \lbrace 1,\ldots ,n\rbrace }\left( \prod _{j=1}^{i-1}\frac{1-p(j)}{p(j)}\right) \\&=\frac{n}{p(1)}\cdot \max _{i\in \lbrace 1,\ldots ,n\rbrace } \exp \left( \sum _{j=1}^{i-1}\ln \left( \frac{1-p(j)}{p(j)}\right) \right) \\&\le \frac{n}{p(1)}\cdot \max _{i\in \lbrace 1,\ldots ,n\rbrace }\exp \left( \int _{0}^{i-1}\ln \left( \frac{1-p(x)}{p(x)}\right) \mathrm {d}x\right) \\ {}&\overset{{k=i-1}}{\le } \quad \frac{n}{p(1)}\cdot \sup _{k\in [0,n]}\exp \left( \int _{0}^{k}\ln \left( \frac{1-p(x)}{p(x)}\right) \mathrm {d}x\right) \\&\le \frac{n}{p(1)}\cdot \sup _{k\in [0,n]}\exp \left( \int _{0}^{\frac{k}{n}}\ln \left( \frac{1-p(n\cdot x)}{p(n\cdot x)}\right) \mathrm {d}x\right) ^n. \end{aligned}$$

This proves the claimed upper bound and as the base of the exponential part is equal for upper and lower bound we obtain the claimed \(\varTheta ^*\) bound.

The logarithm in \({\mathrm {base}}(p,n)\) is positive as long as \(1-p(n\cdot x)\ge p(n\cdot x)\). Therefore the integral is maximized if we use the smallest possible k (the infimum) which satisfies the condition \(1-p(k)\le p(k)\Leftrightarrow p(k)\ge \frac{1}{2}\). \(\square \)

\(p(n\cdot x)\) can in most cases be tightly bounded by a value independent of n. This is the case if for example \(p(i)=c+(1-c)i/n\), which we have for the model solving OneMax by OnePSO. The k which maximizes the integral in the expression of \({\mathrm {base}}(p,n)\) is usually obtained by solving the simple equation \(p(k)=1/2\).

Therefore the integral can be evaluated and the base of the exponential part of the runtime can be determined.

4.5 Variance of the improvement time

We show that the standard deviation of the return time is in the same order as the return time. Therefore in experiments the average of such return times can be measured such that a small relative error can be achieved. Also the variance of \(W_i\), the number of steps needed to move from state \(S_i\) to state \(S_{i-1}\), can be computed recursively. Let \(V_i:=\mathrm{Var}[W_i]\) be the variance of \(W_i\).

To evaluate this variance we need the expectation and variance of a random variable which is the sum of random variables where the number of summed up random variables is also a random variable. Such random variables appear in the Galton-Watson process (see Durrett 2010) from one generation to the next generation.

Lemma 2

Let T be a random variable with non-negative integer values and let \((Y_i)_{i\in {{\mathbb {N}}}}\) be independent identically distributed random variables \(Y_i\sim Y\) which are also independent of T. Additionally let \(Z=\sum _{i=1}^{T}Y_i\). Then \(\mathrm{E}[Z]=\mathrm{E}[T]\cdot \mathrm{E}[Y]\) and \(\mathrm{Var}[Z]=\mathrm{E}[T]\cdot \mathrm{Var}[Y]+E[Y]^2\cdot \mathrm{Var}[T]\).

The statement on expected values is also known as Wald’s equation and the statement on the variance is known as the Blackwell-Girshick equation. The Blackwell-Girshick equation can be obtained by application of the law of total variance:

$$\begin{aligned} \mathrm{Var}[Z]&=\mathrm{E}[\mathrm{Var}[Z|T]]+\mathrm{Var}[\mathrm{E}[Z|T]]=\mathrm{E}[T\mathrm{Var}[Y]]+\mathrm{Var}[T\mathrm{E}[Y]]\\&=\mathrm{E}[T]\mathrm{Var}[Y]+\mathrm{E}[Y]^2\mathrm{Var}[T]. \end{aligned}$$

Also \(W_i\) can be specified as a sum of random variables where the number of summed up random variables is also a random variable. If we are currently in state \(S_i\) we have some success probability to move to \(S_{i-1}\) in the next iteration. Therefore the number of trials in \(S_i\) until we move to \(S_{i-1}\) follows a geometric distribution. In case of failure we move to \(S_{i+1}\) and need additional \(W_{i+1}\) steps until we can make our next attempt to move to \(S_{i-1}\). Therefore

$$\begin{aligned} W_i=\sum _{j=1}^{T-1}({{\tilde{W}}}_{i+1,j}+1) + 1, \end{aligned}$$

where T is a random variable distributed according to a geometric distribution with success probability equal to the probability of moving to \(S_{i-1}\) from \(S_i\) and each \({{\tilde{W}}}_{i+1,j}\) is an independent copy of \(W_{i+1}\).

Theorem 9

$$\begin{aligned} \mathrm{Var}[W_i]=V_i&=\frac{1-p_i}{p_i}\cdot V_{i+1}+\frac{1-p_i}{p_i^2}\cdot (H_{i+1}+1)^2 \nonumber \\&=\frac{1-p_i}{p_i}\cdot V_{i+1}+\frac{1}{1-p_i}\cdot (H_{i}-1)^2\quad 1\le i<n \end{aligned}$$
(8)
$$\begin{aligned} \mathrm{Var}[W_n]=V_n&=0, \end{aligned}$$
(9)

where \(p_i\) is the probability of moving to \(S_{i-1}\) from \(S_i\).

Proof

\(W_n=\mathrm{E}[W_n]=H_n=1\Rightarrow \mathrm{Var}[W_n]=V_n=\mathrm{Var}[1]=0\).

Let T be a random variable distributed according to a geometric distribution with success probability \(p_i\) and let all \(({{\tilde{W}}}_{i+1,j})_{y\in {{\mathbb {N}}}}\) be independent copies of \(W_{i+1}\).

$$\begin{aligned} \mathrm{Var}[W_i]&=\mathrm{Var}[W_i-1] =\mathrm{Var}\left[ \sum _{j=1}^{T-1}({{\tilde{W}}}_{i+1,j}+1) \right] \\&\overset{{\text {Lemma}\,2}}{=}\quad \mathrm{E}[T-1]\cdot \mathrm{Var}[W_{i+1}+1]+\mathrm{E}[W_{i+1}+1]^2\cdot \mathrm{Var}[T-1]\\&=\frac{1-p_i}{p_i}\cdot \mathrm{Var}[W_{i+1}]+(\mathrm{E}[W_{i+1}]+1)^2\cdot \mathrm{Var}[T] \\&=\frac{1-p_i}{p_i}\cdot V_{i+1}+\frac{1-p_i}{p_i^2}\cdot (H_{i+1}+1)^2 \end{aligned}$$

Finally the rightmost expression of Eq. (8) is obtained by replacing \(H_{i+1}\) according to Eq. (2). \(\square \)

Therefore one can evaluate \(H_i\) by Eqs. (2) and (3) and then one can evaluate \(V_i\) by Eqs. (8) and (9).

Please note that \(V_i\) will always be in the same order as \(H_i^2\). If \((1-p_i)/p_i\) is less than one then the recursively needed values of \(V_{j}\) for \(j>i\) become less important and we have mainly \(H_i^2\) and if \((1-p_i)/p_i\) is greater than one then \(H_i^2\) is growing by at least \(((1-p_i)/p_i)^2\) [see Eq. (2)] which is the square of the growing factor of \(V_i\).

As \(V_i\) is in the same order as \(H_i^2\) we obtain by an arithmetic average of T evaluations of \(W_i\) a relative error of approximately \(1/\sqrt{T}\). This is indeed a relevant statistic if evaluations are performed and is consolidated in the following corollary.

Corollary 2

Let \({{\tilde{W}}}_{i,j}\sim W_i\) be independent random variables. Then

$$\begin{aligned} \mathrm{E}\left[ \frac{\vert \sum _{j=1}^{T}\frac{{{\tilde{W}}}_{i,j}}{T} -H_i\vert }{H_i}\right] =O\left( \frac{1}{\sqrt{T}}\right) . \end{aligned}$$

5 Runtime analysis of OnePSO and D-PSO

As mentioned above, we present a runtime analysis of OnePSO for two combinatorial problems, the sorting problem and OneMax. Our analysis is based on the fitness level method (Wegener 2002), in particular its application to the runtime analysis of a \((1+1)\)-EA for the sorting problem in Scharnow et al. (2004). Consider a (discrete) search space X and an objective function \(f:X \rightarrow {\mathbb {R}} \), where f assigns m distinct values \(f_1< f_2< \cdots < f_m\) on X. Let \(S_i \subseteq X\) be the set of solutions with value \(f_i\). Assuming that some algorithm \({\mathscr {A}}\) optimizing f on X leaves fitness level i at most once then the expected runtime of \({\mathscr {A}}\) is bounded from above by \(\sum _{i=1}^m {1}/{s_i}\), where \(s_i\) is a lower bound on the probability of \({\mathscr {A}}\) leaving \(S_i\). The method has also been applied successfully, e.g., in Sudholt and Witt (2010) to obtain bounds on the expected runtime of a binary PSO proposed in Kennedy and Eberhart (1997).

5.1 Upper bounds on the expected optimization time

Similar to Scharnow et al. (2004) and Sudholt and Witt (2010), we use the fitness-level method to prove upper bounds on the expected optimization time of the OnePSO for sorting and OneMax. In contrast to the former, we allow non-improving solutions and return to the attractor as often as needed in order to sample a neighbor of the attractor that belongs to a better fitness level. Therefore, the time needed to return to the attractor contributes a multiplicative term to the expected optimization time, which depends on the choice of the algorithm parameter c.

We first consider the sorting problem. The structure of the search space of the sorting problem has been discussed already in Scharnow et al. (2004) and a detailed analysis of its fitness levels is provided in Mühlenthaler et al. (2017). In the following lemma we bound the transition probabilities for the Markov model for the sorting problem. This allows us to bound the runtime of OnePSO for the sorting problem later on.

Lemma 3

For the sorting problem on n items, \(c = 1/2\) and \(x\in X_i\), the probability \(p_x\) that OnePSO moves from x to an element in \(X_{i-1}\) is bounded from below by \(p_i = \frac{1}{2} (1+ i/\left( {\begin{array}{c}n\\ 2\end{array}}\right) )\). Furthermore, this bound is tight.

Proof

The lower bound \(p_i\) on \(p_x\) can be obtained by

$$\begin{aligned} p_x = \left( c+(1-c)\frac{\vert {\mathscr {N}}(x)\cap X_{i-1}\vert }{\vert {\mathscr {N}}(x)\vert }\right) = \frac{1}{2}\left( 1+\frac{\vert {\mathscr {N}}(x)\cap X_{i-1}\vert }{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\right) \ge p_i . \end{aligned}$$

To show the above inequality, consider the attractor a and a permutation \(\tau \) such that \(x \circ \tau = a\). For each cycle of length k of \(\tau \), exactly \(k-1\) transpositions are needed to adjust the elements in this cycle and there are \(\left( {\begin{array}{c}k\\ 2\end{array}}\right) \ge k-1\) transpositions which decrease the transposition distance to the attractor a. Therefore the number of ways to decrease the transposition distance to a is bounded from below by the transposition distance to a. Hence, we have \(\vert {\mathscr {N}}(x)\cap X_{i-1}\vert \ge i\).

The lower bound is tight as it appears if only cycles of length two (or one) appear. In Mühlenthaler et al. (2017, Sect. 4) a more detailed discussion on improvement probabilities can be found. \(\square \)

Using Lemma 3 we prove the following bounds on the expected optimization time \(T_{{\mathrm{sort}}}(n)\) required by OnePSO for sorting n items by transpositions.

Theorem 10

(Mühlenthaler et al. 2017, Theorem 13) The expected optimization time \(T_{{\mathrm{sort}}}(n)\) of the OnePSO sorting n items is bounded from above by

$$\begin{aligned} T_{{\mathrm{sort}}}(n) = {\left\{ \begin{array}{ll} O(n^2 \log n ) & \text {if }c \in (\frac{1}{2}, 1] \\ O(n^{{3}} \log n) & \text {if }c = \frac{1}{2} \\ O\left( \left( \frac{1-c}{c} \right) ^n\cdot n^2\log n\right) & \text {if }c \in (0, \frac{1}{2}). \end{array}\right. } \end{aligned}$$

See Fig. 5 for a visualization of \(\frac{1-c}{c}\).

Proof

Consider the situation that the attractor has just been updated. Whenever the OnePSO fails to update the attractor in the next iteration it will take in expectation \(H_1\) iterations until the attractor is reached again and then it is improved with probability at least \(i/\left( {\begin{array}{c}n\\ 2\end{array}}\right) \). Again, if the OnePSO fails to improve the attractor we have to wait \(H_1\) steps, and so on. Since we do not consider the case that the attractor has been improved meanwhile, the general fitness level method yields an expected runtime of at most \(\sum _{i=1}^n((H_1+1)(1/s_i-1)+1) = H_1 \cdot O(n^2 \log n)\).

We now bound the expected return time \(H_1\). Let \(c \in (\frac{1}{2}, 1]\) and recall that \(p_i\) is the probability of moving from state \(S_i\) to state \(S_{i-1}\). Then \(1 \ge p_i> c > \frac{1}{2}\). Then the expression for \(H_1\) given in Theorem 5 is bounded from above by the constant \(1/(2c-1)\), so \(T_{{\mathrm{sort}}}(n)=O(n^2\log n)\). Now let \(c = \frac{1}{2}\), so \(p_i \ge \frac{1}{2} (1+ i/\left( {\begin{array}{c}n\\ 2\end{array}}\right) )\) by Lemma 3. Then, by Corollary 1, we have \(H_1 = O(n)\), so \(T_{{\mathrm{sort}}}(n)=O(n^3\log n)\). Finally, let \(c \in (0, \frac{1}{2})\). Then \(p_i> c > 0\), and by Theorem 5, \(H_1\) is bounded from above by

$$\begin{aligned} H_1 \le \frac{2c}{1-2c}\left( \frac{1-c}{c} \right) ^n = O\left( \left( \frac{1-c}{c}\right) ^n\right) , \end{aligned}$$

so \(T_{{\mathrm{sort}}}(n)=O\left( \left( \frac{1-c}{c} \right) ^n\cdot n^2\log n\right) \). \(\square \)

For \(c = 0\), OnePSO always moves to a uniformly drawn adjacent solution. Hence, the algorithm just behaves like a random walk on the search space. Hence, in this case, \(T_{{\mathrm{sort}}}(n)\) is the expected number of transpositions that need to be applied to a permutation in order to obtain a given permutation. We conjecture that \(T_{{\mathrm{sort}}}(n)\) has the following asymptotic behavior and provide theoretical evidence for this conjecture in the “Appendix A”.

Conjecture 1

\(T_{{\mathrm{sort}}}(n)\sim n!\) if \(c=0\).

Please note that the conjecture is actually only a conjecture on the upper bound as Theorem  supplies a proof that \(T_{{\mathrm{sort}}}(n)=\varOmega (n!)\) if \(c=0\).

Using a similar approach as in Theorem 10, we now bound the expected optimization time \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}\) of OnePSO for OneMax.

Theorem 11

The expected optimization time \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}\) of the OnePSO solving OneMax is bounded from above by

$$\begin{aligned} T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n) = {\left\{ \begin{array}{ll} O(n \log n) \text {if }c \in (\frac{1}{2}, 1],\\ O(n^\frac{3}{2} \log n) \text {if }c = \frac{1}{2},\\ O\left( \beta (c)^n\cdot n^2\log n\right) \text {if }c \in (0, \frac{1}{2}),\text { and}\\ O(2^n) \text {if }c = 0. \end{array}\right. } \end{aligned}$$

where \(\beta (c)=2^{{1}/({1-c})}\cdot (1-c)\cdot c^{{c}/({1-c})}\).

See Fig. 5 for a visualization of \(\beta (c)\).

Proof

The argument is along the lines of the proof of Theorem 10. We observe that on fitness level \(0 \le i \le n\) there are i bit flips that increase the number of ones in the current solution. Therefore, \(s_i = i/n\) and the fitness level method yields an expected runtime of at most \(\sum _{i=1}^n (H_1 + 1)(1/s_i-1)+1 = H_1 \cdot O(n \log n)\). The bounds on \(H_1\) for \(c > \frac{1}{2}\) are as in the proof of Theorem 10. For \(c = \frac{1}{2}\) we invoke Lemma 1 and have \(H_1 = O(\sqrt{n})\). For \(c <\frac{1}{2}\) we use Theorem 8. The probabilities in the Markov model for \(H_1\) are \(p_i=c+(1-c)i/n\) which can be continuously extended to the non-decreasing function \(p(i)=c+(1-c)i/n\). Here \(k=n\cdot \frac{1-2c}{2(1-c)}\) solves the equation \(p(k)=\frac{1}{2}\). Hence, we need the value of

$$\begin{aligned}&{\mathrm {base}}(p,n) =\exp \left( \int _0^\frac{1-2c}{2(1-c)}\ln \left( \frac{1-c-(1-c)\cdot x}{c+(1-c)\cdot x}\right) \mathrm {d}x\right) \nonumber \\&=\exp \left( \int _0^\frac{1-2c}{2(1-c)}\left( \ln \left( {1- x}\right) -\ln \left( {\frac{c}{1-c}+x}\right) \right) \mathrm {d}x\right) \nonumber \\&=\exp \left( (x-1)\ln (1-x)- \left. \left( \frac{c}{1-c}+x\right) \ln \left( \frac{c}{1-c}+x\right) \right| ^\frac{1-2c}{2(1-c)}_0\right) \nonumber \\&=2^{{1}/({1-c})}\cdot (1-c)\cdot c^{{c}/({1-c})}=\beta (c). \end{aligned}$$
(10)

Now Theorem 8 gives the upper bound \(H_1=O(n\cdot \beta (c)^n)\).

It remains to consider the case that \(c=0\). The claimed bound on \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}\) can be obtained by using the model \({{\mathscr {M}}} ((\frac{i}{n})_{1\le i\le n})\). Each state represents the distance to the optimal point. By Eq. (6) we have

$$\begin{aligned} H_k&=\sum _{i=k}^n\frac{n}{i}\prod _{j=k}^{i-1}\frac{n-j}{j} =\prod _{j=1}^{k-1}\frac{j}{n-j}\sum _{i=k}^n\frac{n}{i}\prod _{j=1}^{i-1}\frac{n-j}{j}\\&=\frac{1}{\left( {\begin{array}{c}n-1\\ k-1\end{array}}\right) }\sum _{i=k}^n\left( {\begin{array}{c}n\\ i\end{array}}\right) \le \frac{2^n}{\left( {\begin{array}{c}n-1\\ k-1\end{array}}\right) }. \end{aligned}$$

The maximal expected time to reach the optimal point is the sum of all \(H_k\):

$$\begin{aligned} T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n)\le \sum _{k=1}^n H_k \le \sum _{k=1}^n \frac{2^n}{\left( {\begin{array}{c}n-1\\ k-1\end{array}}\right) } =2^n\cdot \left( 2+O\left( \frac{1}{n}\right) \right) =O(2^n). \end{aligned}$$

\(\square \)

We remark that the upper bounds given in Theorem 11 for \(c\in [\frac{1}{2},1]\) were presented in Mühlenthaler et al. (2017, Theorem 14) and that the upper bound for \(c\in (0,\frac{1}{2})\) is newly obtained using the bounds-by-integration from Sect. 4.4 and the proof of the upper bound for \(c=0\) is also new compared to Mühlenthaler et al. (2017). Admittedly, the bound for \(c=0\) is already available in the context of randomized local search and can be found in Garnier et al. (1999). Furthermore, note that for \(c=\frac{1}{2}\) it is not sufficient to use the lower bound \(p_i \ge p_1=\frac{1}{2} + \frac{1}{2n}\) in order to obtain the runtime bound given in Theorem 11.

By repeatedly running OnePSO and applying Markov’s inequality for the analysis, an optimal solution is found with high probability so we have the following Corollary.

Corollary 3

If the OnePSO is repeated \(\lambda \cdot \log _2(n)\) times but each repetition is terminated after \(2\cdot T(n)\) iterations, where T(n) is the upper bound on the expected number of iterations to find the optimum specified in Theorems 10and 11with suitable constant factor, then OnePSO finds the optimal solution with high probability.

5.2 Lower bounds via indistinguishable states

In this section we will provide lower bounds on the expected optimization time of OnePSO that almost match our upper bounds given in Sect. 5.1. We will use the Markov model from Sect. 3 to obtain these lower bounds. The main difference to the previous section is that we restrict our attention to the last improvement of the attractor, which dominates the runtime, both for sorting and OneMax. We will introduce the useful notion of indistinguishability of certain states of a Markov chain. Note that our lower bounds are significantly improved compared to the conference version (Mühlenthaler et al. 2017) by using the newly introduced bounds-by-integration from Sect. 4.4.

5.2.1 Indistinguishable states

We now introduce a notion of indistinguishability of certain states of a Markov chain already presented in Mühlenthaler et al. (2017). We will later use this notion to prove lower bounds on the expected optimization time of OnePSO for sorting and OneMax as follows: We show that the optimum is contained in a set \({\hat{Y}}\) of indistinguishable states. Therefore, in expectation, the states \({\hat{Y}}\) have to be visited \(\varOmega (|{\hat{Y}}|)\) times to hit the optimum with positive constant probability.

Definition 2

(Indistinguishable states) Let M be a Markov process with a finite set Y of states and let \({\hat{Y}}\subseteq Y\). Furthermore, let \((Z_i)_{i\ge 0}\) be the sequence of visited states of M and let \(T=\min \lbrace t>0\mid Z_t\in {\hat{Y}}\rbrace \). Then \({\hat{Y}}\) is called indistinguishable with respect to M if

  1. 1.

    the initial state \(Z_0\) is uniformly distributed over \({\hat{Y}}\), i. e., for all \(y\in Y\):

    $$\begin{aligned} \Pr [Z_0=y]=\mathbb {1} _{y\in {\hat{Y}}}/\vert {\hat{Y}}\vert = {\left\{ \begin{array}{ll} 1/\vert {\hat{Y}}\vert &{} \text {if }y\in {\hat{Y}}\\ 0 &{} \text {if }y\not \in {\hat{Y}} . \end{array}\right. } \end{aligned}$$
  2. 2.

    and the probabilities to reach states in \({\hat{Y}}\) from states in \({\hat{Y}}\) are symmetric, i. e., for all \(y_1,y_2\in {\hat{Y}}\):

    $$\begin{aligned} \Pr [Z_T=y_2\mid Z_0=y_1]=\Pr [Z_T=y_1\mid Z_0=y_2] . \end{aligned}$$

Now we can prove a lower bound on the expected time for finding a specific state.

Theorem 12

Let M be a Markov process as in Definition 2and let \({\hat{Y}}\) be indistinguishable with respect to M. Let h(M) be a positive real value such that \(\mathrm{E}[T]\ge h(M)\), then the expected time to reach a fixed \(y\in {\hat{Y}}\) is bounded below by \(h(M)\cdot \Omega (\vert {\hat{Y}}\vert )\).

Proof

Let \(T_i\) be the stopping time when \({\hat{Y}}\) is visited the ith time.

$$\begin{aligned} T_i=\min \lbrace t\ge 0 \mid \vert \lbrace k\mid 0\le k \le t \wedge Z_k\in {\hat{Y}}\rbrace \vert \ge i\rbrace . \end{aligned}$$

With Statement 1 of Definition 2\(Z_0\) is uniformly distributed over \({\hat{Y}}\). Therefore \(T_1=0\) and \(T_2=T\). Statement 2 of Definition 2 implies that \(\Pr [Z_{T_i}=y]=\mathbb {1} _{y\in {\hat{Y}}}/\vert {\hat{Y}}\vert \) for all \(i\ge 1\) by the following induction. The base case for \(i=1\) and \(T_i=0\) is ensured by the Statement 1 of Definition 2. The induction hypothesis is \(\Pr [Z_{T_{i-1}}=y]=\mathbb {1} _{y\in {\hat{Y}}}/\vert {\hat{Y}}\vert \). The inductive step is verified by the following series of equations.

$$\begin{aligned} \Pr [Z_{T_i}=y]&= \sum _{{\hat{y}}\in {\hat{Y}}}\Pr [Z_{T_{i-1}}={\hat{y}}]\cdot \Pr [Z_{T_i}=y\mid Z_{T_{i-1}}={\hat{y}}] \\ {\mathop {=}\limits ^{{\text {ind. hyp.}}}}\quad&\sum _{{\hat{y}}\in {\hat{Y}}}1/\vert {\hat{Y}}\vert \cdot \Pr [Z_{T_i}=y\mid Z_{T_{i-1}}={\hat{y}}] \\ {\mathop {=}\limits ^{{\text {Definition}\ 2\, \text {St.2}}}}&1/\vert {\hat{Y}}\vert \cdot \sum _{{\hat{y}}\in {\hat{Y}}}\Pr [Z_{T_i}={\hat{y}}\mid Z_{T_{i-1}}=y] = 1/\vert {\hat{Y}}\vert . \end{aligned}$$

It follows that for all \(i>0\) the difference \(T_{i+1}-T_i\) of two consecutive stopping times has the same distribution as T and also

$$\begin{aligned} \mathrm{E}[T_{i+1}-T_{i}]=\mathrm{E}[T]\ge h(M). \end{aligned}$$

Now let \(y\in {\hat{Y}}\) be fixed. The probability that y is not reached within the first \(T_{\lfloor \vert {\hat{Y}}\vert /2\rfloor -1}\) steps is bounded from below through union bound by

$$\begin{aligned} 1-\Pr [Z_0=y]-\sum _{i=1}^{\lfloor \vert {\hat{Y}}\vert /2\rfloor -1}\Pr [Z_{T_i}=y]\ge 1/2 \end{aligned}$$

and therefore the expected time to reach the fixed \(y\in {\hat{Y}}\) is bounded from below by

$$\begin{aligned}\frac{1}{2}\cdot \mathrm{E}[T_{\lfloor \vert {\hat{Y}}\vert /2\rfloor -1}]&=\frac{1}{2}\cdot \sum _{i=2}^{\lfloor \vert {\hat{Y}}\vert /2\rfloor -1}\mathrm{E}[T_i-T_{i-1}]\\&\ge \frac{1}{2}\cdot \sum _{i=2}^{\lfloor \vert {\hat{Y}}\vert /2\rfloor -1}h(M)=h(M)\cdot \Omega (\vert {\hat{Y}}\vert ). \end{aligned}$$

\(\square \)

5.2.2 Lower bounds on the expected optimization time for sorting

In this section we consider the sorting problem. Our first goal is to provide lower bounds on the expected return time to the attractor for the parameter choice \(c\in (0,\frac{1}{2})\).

Lemma 4

Let \(c\in (0,\frac{1}{2})\). For the sorting problem on n items, assume that the attractor has transposition distance one to the identity permutation. Then the expected return time \(H_1\) to the attractor is bounded from below by \(\Omega (\alpha (c)^n)\), where

$$\begin{aligned} \alpha (c)=\left( \frac{1+\sqrt{\frac{1-2c}{2(1-c)}}}{1-\sqrt{\frac{1-2c}{2(1-c)}}}\right) \cdot \exp \left( -2\sqrt{\frac{c}{1-c}}\arctan \left( \sqrt{\frac{1-2c}{2c}}\right) \right) . \end{aligned}$$

See Fig. 5 for a visualization of \(\alpha (c)\).

Proof

The probability of decreasing the distance to the attractor in state \(S_i\) can be bounded from above by

$$\begin{aligned} p_{i-1} \le c+(1-c)\cdot \frac{\left( {\begin{array}{c}i\\ 2\end{array}}\right) }{\left( {\begin{array}{c}n\\ 2\end{array}}\right) } = c+(1-c)\cdot \frac{i(i-1)}{n(n-1)} \le c+(1-c)\cdot \frac{i^2}{n^2}. \end{aligned}$$

We increase all indices by one such that \({{\tilde{p}}}_i=p_{i-1}\) such that we have n states again. Please note that \(H_2=\varOmega (H_1)\). This can be obtained by the following equations while using Eq. 2 and the fact that \(H_1\ge \frac{1}{1-c}\) is true in this case

$$\begin{aligned} H_2 =\frac{p_1}{1-p_1}H_1-\frac{1}{1-p_1} \ge \frac{c}{1-c}H_1-\frac{1}{1-c-o(1)}=\varOmega (H_1). \end{aligned}$$

We use Theorem 8 to get a lower bound on \(H_1\) by using \(p(i)=c+(1-c)\cdot \frac{i^2}{n^2}\). Here \(k=n\cdot \sqrt{\frac{1-2c}{2(1-c)}}\) maximizes the integral, because it solves the equation \(p(k)=\frac{1}{2}\). An application of Theorem 8 supplies

$$\begin{aligned} H_1\!=\!\Omega \left( \exp \left( \int _0^{\sqrt{\frac{1-2c}{2(1-c)}}}\ln \left( \frac{1-c-(1-c)x^2}{c+(1-c)x^2}\right) \mathrm {d}x\right) ^{\!\!\!n}\right) . \end{aligned}$$

In the following we calculate the exact value of this integral. The integrand can be converted to the expression

$$\begin{aligned}&\ln \left( \frac{1-c-(1-c)x^2}{c+(1-c)x^2}\right) =\ln \left( \frac{1-x^2}{\frac{c}{1-c}+x^2}\right) \\&\quad =\ln (1-x^2)-\ln \left( \frac{c}{1-c}+x^2\right) . \end{aligned}$$

The indefinite integral of \(\ln (1-x^2)\) is

$$\begin{aligned} x\cdot \ln (1-x^2)-2x+\ln \left( \frac{1+x}{1-x}\right) . \end{aligned}$$

It can be evaluated for values \(x\in [0,1[\), but this is fine as \(0\le k/n<1\). Furthermore the indefinite integral of \(\ln \left( \frac{c}{1-c}+x^2\right) \) is

$$\begin{aligned} x\cdot \ln \left( \frac{c}{1-c}+x^2\right) -2x+2\sqrt{\frac{c}{1-c}}\arctan \left( {x}\cdot {\sqrt{\frac{1-c}{c}}}\right) , \end{aligned}$$

which can be evaluated for all values, because \(\frac{c}{1-c}\) is positive. The indefinite integral of the whole expression is obtained by the subtraction of both

$$\begin{aligned}&x\cdot \ln (1-x^2)+\ln \left( \frac{1+x}{1-x}\right) -x\cdot \ln \left( \frac{c}{1-c}+x^2\right) \\&\quad -2\sqrt{\frac{c}{1-c}}\arctan \left( {x}\cdot {\sqrt{\frac{1-c}{c}}}\right) \end{aligned}$$

and evaluation of the bounds \(k/n=\sqrt{\frac{1-2c}{2(1-c)}}\) and 0 results in

$$\begin{aligned}&\left[ \sqrt{\frac{1-2c}{2(1-c)}}\cdot \ln \left( 1-\frac{1-2c}{2(1-c)}\right) +\ln \left( \frac{1+\sqrt{\frac{1-2c}{2(1-c)}}}{1-\sqrt{\frac{1-2c}{2(1-c)}}}\right) \right. [0]\\&\qquad -\,\sqrt{\frac{1-2c}{2(1-c)}}\cdot \ln \left( \frac{c}{1-c}+\frac{1-2c}{2(1-c)}\right) [0]\\&\left. \qquad -\,2\sqrt{\frac{c}{1-c}}\arctan \left( \frac{\sqrt{\frac{1-2c}{2(1-c)}}}{\sqrt{\frac{c}{1-c}}}\right) \right] -\left[ 0+\ln (1)-0-0\right] \\&\quad =\sqrt{\frac{1-2c}{2(1-c)}}\cdot \ln \left( \frac{1}{2(1-c)}\right) +\ln \left( \frac{1+\sqrt{\frac{1-2c}{2(1-c)}}}{1-\sqrt{\frac{1-2c}{2(1-c)}}}\right) \\&\qquad -\,\sqrt{\frac{1-2c}{2(1-c)}}\cdot \ln \left( \frac{1}{2(1-c)}\right) -2\sqrt{\frac{c}{1-c}}\arctan \left( \sqrt{\frac{1-2c}{2c}}\right) \\&\quad =\ln \left( \frac{1+\sqrt{\frac{1-2c}{2(1-c)}}}{1-\sqrt{\frac{1-2c}{2(1-c)}}}\right) -2\sqrt{\frac{c}{1-c}}\arctan \left( \sqrt{\frac{1-2c}{2c}}\right) . \end{aligned}$$

An application of the \(\exp \) function on this result gives the claimed lower bound. \(\square \)

Fig. 5
figure 5

The functions \(\alpha (c)\), \(\beta (c)\) and \(\frac{1-c}{c}\) for \(c \in (0,\frac{1}{2})\)

This lower bound is the best possible bound which can be achieved with this model as the probability \(p_i = c+(1-c)\cdot {\left( {\begin{array}{c}i+1\\ 2\end{array}}\right) }/{\left( {\begin{array}{c}n\\ 2\end{array}}\right) }\) actually appears at distance i if the permutation transforming the current position to the attractor consists of one cycle of length \(i+1\) and the remaining permutation consists of singleton cycles. For this improvement probability the bound is \(\varTheta ^*(\alpha (c)^n)\).

The following theorem supplies lower bounds on the expected optimization time of OnePSO on the sorting problem.

Theorem 13

The expected optimization time \(T_{{\mathrm{sort}}}(n)\) of the OnePSO sorting n items is bounded from below by

$$\begin{aligned} T_{{\mathrm{sort}}}(n) = {\left\{ \begin{array}{ll} \varOmega (n^2) & \text {if }c \in (\frac{1}{2}, 1] \\ \varOmega (n^{\frac{8}{3}}) & \text {if }c = \frac{1}{2} \\ \varOmega \left( \alpha (c)^n\cdot n^2\right) & \text {if }c \in (0, \frac{1}{2})\\ \varOmega \left( n!\right) & \text {if }c =0 . \end{array}\right. } \end{aligned}$$

Proof

The situation where already the initial position is the optimum has probability 1/n!. As \(1-1/n!>1/2\) for \(n\ge 2\) we have the same \(\varOmega \) bound if we ignore this case. In all other cases we can consider the situation that the attractor has just been updated to a solution that has distance one to the optimum. Without loss of generality, we assume that the attractor is the identity permutation and the optimum is the transposition \((0\,1)\). The number of steps required for the next (hence final) improvement of the attractor is a lower bound on the expected optimization time for the OnePSO. We determine a lower bound on this number for various choices of c.

For all \(c \in (0,1]\) we apply Theorem 12. We use all permutations as set of states Y in the Markov process M. Let \({\hat{Y}}=X_1\) be the subset of states which are a single swap away from the attractor. Therefore the optimal solution is contained in \({\hat{Y}}\), but up to the point when the OnePSO reaches the optimal solution it is indistinguishable from all other permutations in \({\hat{Y}}\). We will immediately prove that \({\hat{Y}}\) is actually indistinguishable with respect to M. Initially the particle is situated on the attractor and after a single step it is situated at a permutation in \({\hat{Y}}\), where each permutation has equal probability. We use the permutation after the first step as the initial state of the Markov process \(Z_0\) and all other \(Z_i\) are the successive permutations. Therefore Statement 1 of Definition 2 is fulfilled. Let \(T=\min \lbrace t>0\mid Z_t\in {\hat{Y}}\rbrace \) the stopping time of Theorem 12. For each sequence of states \(Z_0,\ldots ,Z_T\) there is a one to one mapping to a sequence \({{\tilde{Z}}}_0=Z_T,{{\tilde{Z}}}_1,\ldots ,{{\tilde{Z}}}_{T-1},{{\tilde{Z}}}_{T}=Z_0\) which has equal probability to appear. The sequence \({{\tilde{Z}}}_0,\ldots ,\tilde{Z}_T\) is not the reversed sequence, because the forced steps would then lead to the wrong direction, but the sequence can be obtained by renaming the permutation indices. The renaming is possible because the permutations \(Z_0\) and \(Z_T\) are both single swaps. As this one to one mapping exists also the Statement 2 of Definition 2 is fulfilled. Finally we need a bound on the expectation of T. If we are in \(X_1={\hat{Y}}\) we can either go to the attractor by a forced move or random move and return to \(X_1\) in the next step or we can go to \(X_2\) by a random move and return to \(X_1\) in expectation after \(H_2\) steps. We have \(\mathrm{E}[T]=\left( c+(1-c)/\left( {\begin{array}{c}n\\ 2\end{array}}\right) \right) \cdot 2 + (1-c)\cdot \left( 1-1/\left( {\begin{array}{c}n\\ 2\end{array}}\right) \right) (1+H_2)=\varOmega (H_2)=:h(M)\). Theorem 12 provides the lower bound \(\Omega (\vert {\hat{Y}}\vert \cdot H_2)\) for the runtime to find the fixed permutation \((0,1)\in {\hat{Y}}\) which is the optimal solution. From Eq. 2 we get \(H_2=(p_1\cdot H_1-1)/(1-p_1)\ge (c\cdot H_1-1)/(1-c)\). As \(H_1=\Omega (n^{2/3})\) for \(c=\frac{1}{2}\) (see Theorem 7) and \(H_1=\Omega (\alpha (c)^n)\) for \(c\in (0,\frac{1}{2})\) (see Lemma 4) also \(H_2=\Omega (H_1)\) for \(c\in (0,\frac{1}{2}]\) which results in the lower bounds \(T_{{\mathrm{sort}}}(n)=\Omega (\vert {\hat{Y}}\vert \cdot H_1)=\Omega (\left( {\begin{array}{c}n\\ 2\end{array}}\right) \cdot n^{2/3})=\Omega (n^{8/3})\) for \(c=\frac{1}{2}\) and \(T_{{\mathrm{sort}}}(n)=\Omega (\vert {\hat{Y}}\vert \cdot H_1)=\Omega (\left( {\begin{array}{c}n\\ 2\end{array}}\right) \cdot \alpha (c)^n)=\Omega (n^2\cdot \alpha (c)^n)\) for \(c\in (0,\frac{1}{2})\). Trivially the return time to \(X_1\) in M can be bounded by 2, which results in the lower bound \(T_{\mathrm{sort}}(n)=\Omega (n^2)\) for the case \(c\in (\frac{1}{2},1]\).

The lower bound for \(c=0\) can be derived directly from the indistinguishability property: Let \({\hat{Y}} = Y\). It is readily verified that the initial state is uniformly distributed over \({\hat{Y}}\). Furthermore, any \({\hat{Y}}\)-\({\hat{Y}}\)-path can be reversed and has the same probability to occur. Therefore, Condition 2 of Definition 2 is satisfied and the lower runtime bound follows from Theorem 12 by choosing \(h(M) = 1\). \(\square \)

Beside that formally proved lower bounds we conjecture the following lower bounds on the expected optimization time of OnePSO for sorting n items.

Conjecture 2

$$\begin{aligned} T_{{\mathrm{sort}}}(n) = {\left\{ \begin{array}{ll} \varOmega (n^2) & \text {if }c \in (\frac{1}{2}, 1] \\ \varOmega (n^{3}) & \text {if }c = \frac{1}{2} \\ \varOmega \left( \left( \frac{1-c}{c}\right) ^n\cdot n^2\right)& \text {if }c \in (0, \frac{1}{2}). \end{array}\right. } \end{aligned}$$

Note that these lower bounds differ from our upper bounds given in Theorem 10 only by a \(\log \)-factor. Evidence supporting this conjecture is given in “Appendix B”. We obtain our theoretical evidence by considering the average probability to move towards the attractor, instead of upper and lower bounds as before.

5.2.3 Lower bounds on the expected optimization time for OneMax

First we provide a lower bound on the expected return time to the attractor.

Lemma 5

Let \(c\in (0,\frac{1}{2})\). For OneMax, assume that the attractor has Hamming distance one to the optimum \(1^n\). Then the expected return time \(H_1\) to the attractor is bounded from below by \(H_1=\Omega (\beta (c)^n)\), where

$$\begin{aligned} \beta (c)=2^{{1}/({1-c})}\cdot (1-c)\cdot c^{{c}/({1-c})}. \end{aligned}$$

See Fig. 5 for a visualization of \(\beta (c)\).

Proof

We use Theorem 8. The value \(\beta (c)\) is already calculated in Theorem 11 Eq. 10. \(\square \)

This result enables us to prove lower bounds on \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n)\).

Theorem 14

The expected optimization time \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n)\) of the OnePSO for solving OneMax is bounded from below by

$$\begin{aligned} T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n) = {\left\{ \begin{array}{ll} \varOmega (n \log n) \text {if }c \in (\frac{1}{2}, 1]\\ \varOmega (n^\frac{3}{2}) \text {if }c = \frac{1}{2}\\ \varOmega \left( \beta (c)^n\cdot n\right) \text {if }c \in (0, \frac{1}{2})\\ \varOmega \left( 2^n\right) \text {if }c =0 . \end{array}\right. } \end{aligned}$$

Proof

First, let \(c \in (\frac{1}{2}, 1]\). Then, with probability at least \(\frac{1}{2}\), the initial solution contains at least \(k = \lfloor n/2 \rfloor = \varOmega (n)\) zeros. Each zero is flipped to one with probability 1/n in a random move, and none of the k entries is set to one in a move towards the attractor. The expected time required to sample the k distinct bit flips is bounded from below by the expected time it takes to obtain all coupons in the following instance of the coupon collector’s problem: there are k coupons and each coupon is drawn independently with probability 1/k. The expected time to obtain all coupons is \(\varOmega (k \log k)\)  (Mitzenmacher and Upfal 2005, Sect. 5.4.1). It follows that the expected optimization time is \(\varOmega (n \log n)\) as claimed.

For \(c \in (0,\frac{1}{2}]\) we use the same approach as in the proof of Theorem . Also here the event that the initial solution is optimal can be ignored. Consider the situation that the attractor has just been updated to a solution that has distance one to the optimum. We use the set of all bit strings as set of states Y in the Markov process M. Let \({\hat{Y}}=X_1\) the subset of bit strings which is a single bit flip away from the attractor, hence \({\hat{Y}}\) contains the optimum. \(Z_i\) and T are instantiated as in the proof of Theorem . Therefore Statement 1 of Definition 2 is fulfilled. Again for each sequence of states \(Z_0,\ldots ,Z_T\) we have a one to one mapping to a sequence \(\tilde{Z}_0=Z_T,{{\tilde{Z}}}_1,\ldots ,{{\tilde{Z}}}_{T-1},{{\tilde{Z}}}_{T}=Z_0\) which has equal probability to appear. This sequence is again obtained by renaming the indices plus some bit changes according to the shape of the attractor. Hence also Statement 2 of Definition 2 is fulfilled. Hence \({\hat{Y}}\) is indistinguishable with respect to M. We obtain \(\mathrm{E}[T]=\Omega (H_2)=:h(M)\). Theorem 12 provides the lower bound \(\Omega (\vert {\hat{Y}}\vert \cdot H_2)\) for the runtime to find the optimal solution. From Eq. (2) we get \(H_2\ge (c\cdot H_1-1)/(1-c)\). As \(H_1=\Omega (n^{1/2})\) for \(c=\frac{1}{2}\) (see Theorem 6) and \(H_1=\Omega (\beta (c)^n)\) for \(c\in (0,\frac{1}{2})\) (see Lemma 5) also \(H_2=\Omega (H_1)\) for \(c\in (0,\frac{1}{2}]\) which results in the lower bounds \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n)=\Omega (\vert {\hat{Y}}\vert \cdot H_1)=\Omega (n\cdot n^{1/2})=\Omega (n^{3/2})\) for \(c=\frac{1}{2}\) and \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n)=\Omega (\vert {\hat{Y}}\vert \cdot H_1)=\Omega (n\cdot \beta (c)^n)=\Omega (n\cdot \beta (c)^n)\) for \(c\in (0,\frac{1}{2})\).

Again the lower bound for \(c=0\) can be obtained by the indistinguishability property. The proof for this case is identical to the corresponding part of Theorem . \(\square \)

Finally all runtime bounds claimed in Table 1 are justified and for this purpose all of the presented tools in Sect. 4 are used.

5.3 Bounds on the expected optimization time for D-PSO

The upper bounds on the runtime of OnePSO in Theorem  10 and Theorem 11 directly imply upper bounds for D-PSO. Recall that we denote by c the parameter of OnePSO and by \(T_{\mathrm{O\textsc {ne}M\textsc {ax}}} (n)\) and \(T_{{\mathrm{sort}}}(n)\) the expected optimization time of OnePSO for OneMax and sorting, respectively.

Corollary 4

Let \(T'_{\mathrm{O\textsc {ne}M\textsc {ax}}}\) and \(T'_{{\mathrm{sort}}}(n)\) be the expected optimization time of D-PSO for OneMax and sorting, respectively. If \(c = c_{glob}\), then \(T'_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n) = O(P\cdot T_{\mathrm{O\textsc {ne}M\textsc {ax}}}(n))\) and \(T'_{{\mathrm{sort}}}(n) =O(P\cdot T_{{\mathrm{sort}}}(n))\), where P is the number of particles.

Proof

In each trial to improve the value of the global attractor at least the particle which updated the global attractor has its local attractor at the same position as the global attractor. This particle behaves exactly like the single particle in OnePSO until the global attractor is improved. Therefore we have at most P times more objective function evaluations than OnePSO, where P is the number of particles. \(\square \)

One can refine this result by again looking on return times to an attractor.

If the global attractor equals the local attractor then this particle performs the same steps as all other particles having equal attractors. As all those particles perform optimization in parallel in expectation no additional objective function evaluations are made compared to the OnePSO.

For particles where the global attractor and the local attractor differ we can use previous arguments applied to the local attractor. With two different attractors alternating movements to the local and global attractor can cancel each other out. Therefore if (only) \(c_{loc}\) is fixed then for the worst case we can assume only \(c_{loc}\) as probability of moving towards the local attractor and \(1-c_{loc}\) as probability of moving away from the local attractor. This enables us to use Theorem 5 to calculate the expected time to reduce the distance to an attractor from one to zero. We denote the return time from Theorem 5 as \(\varPsi (n,p)\)

$$\begin{aligned} \varPsi (n,p):=\left. {\left\{ \begin{array}{ll} \dfrac{1-2p \left( \dfrac{1-p}{p} \right) ^n }{2p-1} & \text {if } p \ne \frac{1}{2}\\ 2n-1 & \text {otherwise} \end{array}\right. }\right\} ={\left\{ \begin{array}{ll} \varTheta (1) & \text {if }\frac{1}{2}<p\le 1\\ \varTheta (n) & \text {if }p = \frac{1}{2}\\ \varTheta \left( \left( \frac{1-p}{p} \right) ^n \right) & \text {if } 0<p < \frac{1}{2}. \end{array}\right. } \end{aligned}$$

If the position equals the local attractor and consequently differs from the global attractor the probability for improving the local attractor can be bounded from below by a positive constant. E.g., for the problem OneMax this constant is \(c_{glob}/2\) because for a move towards the global attractor for at least half of the differing bits the value of the global attractor equals the value of the optimal solution as the global attractor is at least as close to the optimum as the local attractor. Therefore the number of trials until the local attractor is improved is constant. As such an update occurs at most once for each particle and fitness level we obtain an additional summand of \(O(\varPsi (n,c_{loc})\cdot P\cdot n)\) instead of the factor P for the problems OneMax and the sorting problem.

In contrast to the upper bounds, the lower bounds for OnePSO do not apply for D-PSO for the following reason. The bottleneck used for the analysis of OnePSO is the very last improvement step. However, D-PSO may be faster at finding the last improvement because it may happen that the local and global attractor of a particle have both distance one to the optimum but are not equal. In this case, as described above, there is a constant probability of moving to the optimum if a particle is at one of the two attractors whereas for OnePSO the probability of moving towards the optimum if the particle is located at the attractor tends to zero for large n.

An analysis of experiments of D-PSO with small number of particles and OnePSO applied to the sorting problem and OneMax revealed only a small increase in the optimization time of D-PSO compared to OnePSO. This increase is way smaller than the factor P. For some parameter constellations also a significant decrease of the optimization time of D-PSO compared to OnePSO is achieved.

5.4 Lower bounds for pseudo-Boolean functions

Also for general pseudo-Boolean functions \(f:\lbrace 0,1\rbrace ^n\rightarrow {\mathbb {R}} \) we can prove lower bounds on the expected optimization time.

Theorem 15

If \(P=\varTheta (1)\), where P is the number of particles, then the expected optimization time of D-PSO optimizing pseudo-Boolean functions \((\lbrace 0,1\rbrace ^n\rightarrow {\mathbb {R}} \)) with a unique optimal position is in \(\varOmega (n\log (n))\).

Proof

If there are \(P=\varTheta (1)\) particles, then in expectation there are \(n/2^P=\varOmega (n)\) bits such that there is no particle where this bit of the optimal position equals the corresponding bit of the initial position. The expected optimization time is therefore bounded by the time that each such bit is flipped in a random move at least once. This subproblem corresponds to a coupon collectors problem and therefore we have the claimed lower bound of \(\varOmega (n\log (n))\). \(\square \)

For larger values of P we obtain an even higher lower bound by the following theorem.

Theorem 16

If \(P=O(n^k)\), where P is the number of particles and k is an arbitrary non-negative real value, then the expected optimization time of D-PSO optimizing pseudo-Boolean functions \((\lbrace 0,1\rbrace ^n\rightarrow {\mathbb {R}})\) with a unique optimal position is in \(\varOmega (n\cdot P)\).

Proof

To bound the probability to be at least some distance apart from the attractor after initialization we can use Chernoff bounds. For a fixed particle we can define

$$\begin{aligned} Y_i={\left\{ \begin{array}{ll} 1 & \begin{array}{l}\text {if the }i\text {th bit of the initial position differs from}\\ \text {the corresponding bit of the unique optimal position}\end{array}\\ 0 & \text {otherwise}. \end{array}\right. } \end{aligned}$$

Therefore \(Y=\sum _{i=1}^n Y_i\) is exactly the initial distance of the fixed particle to the unique optimal position. For each i we have that \(\Pr [Y_i=1]=\frac{1}{2}\) and \(\mathrm{E}[Y]=\frac{n}{2}\). By Chernoff bounds we obtain the lower bound

$$\begin{aligned} \Pr \left[ Y > \frac{n}{4}\right]&=1-\Pr \left[ Y\le \left( 1-\frac{1}{2}\right) \mathrm{E}[Y]\right] \ge 1-\exp \left( -\frac{\left( \frac{1}{2}\right) ^2\cdot \frac{n}{2}}{2}\right) \\&=1-\exp \left( -\frac{n}{16} \right) . \end{aligned}$$

The probability that the initial position of all P particles have distance at least \(\frac{n}{4}\) to the unique optimal position is the \(P\hbox {th}\) power of this probability and can be bounded from below for large n by

$$\begin{aligned} \left( 1-\exp \left( -\frac{n}{16} \right) \right) ^P \ge 1-P \cdot \exp \left( -\frac{n}{16} \right) \overset{n\ge 16\ln (2P)}{\ge } \frac{1}{2}. \end{aligned}$$

Please note that one can choose such an n as \(P=O(n^k)\) and \(16\cdot \ln (2\cdot {\mathrm{poly}}(n))=o(n)\) for any polynomial. If the distance of the positions of all particles is at least \(\frac{n}{4}\) then it takes at least \(\frac{n}{4}\) iterations until the optimal position can be reached as the distance can change only by one in each iteration. For each iteration P evaluations of the objective function are performed. Therefore we have at least \(\frac{n\cdot P}{4}\) objective function evaluations with probability at least \(\frac{1}{2}\) for large n which results in the claimed optimization time of \(\varOmega (n\cdot P)\). \(\square \)

This means if we choose , e.g., \(P=n^{10}\) we would have at least \(\varOmega (n^{11})\) function evaluations in expectation.

6 Conclusion

We propose a simple and general adaptation of the PSO algorithm for a broad class of discrete optimization problems. For one particle, we provide upper and lower bounds on its expected optimization time for the sorting problem and OneMax and generalize the upper bounds to D-PSO with arbitrary number of particles and we also prove lower bounds of D-PSO optimizing pseudo-Boolean functions. Depending on the parameter c, which is the probability of moving towards the attractor, the expected optimization time may be polynomial (\(c \ge 1/2\)) and exponential (\(c < 1/2\)), resp. The cornerstone of our analysis are \(\varTheta \)-bounds on the expected time it takes until the PSO returns to the attractor. Our analysis also provides the variance of this value. We analyze Markov chains and provide tools to evaluate expected return times for certain classes of transition probabilities. Additionally we establish a useful general property of indistinguishability of a Markov process for obtaining lower bounds on the expected first hitting time of a special state. Application of the presented tools on other Markov chains, often appearing in the analysis of randomized algorithms, would obviously be possible.

For future work, it would be interesting to see if the upper and lower bounds on the expected optimization time for OneMax given in Theorems 11 and 14 are valid for any linear function \(f : \{0,1\}^n \rightarrow {\mathbb {R}} \), \(f(x_1,x_2,\ldots ,x_n) = \sum _i w_i x_i\). Furthermore, we conjecture that the upper bounds on the sorting problem for \(c=0\) is n! and that the other proved upper bounds on the sorting problem are tight. Another direction for future work is to apply our technical tools to other meta-heuristics. In particular, our tools may be useful in the analysis of “non-elitist” meta-heuristics, for instance the Strong Selection Weak Mutation (SSWM) evolutionary regime introduced in Gillespie (1983) as an example of non-elitist algorithm.

Finally, it would be interesting to determine the return time to the state \(S_0\) in a more general Markov model \({{\mathscr {M}}} ((p_i)_{1 \le i \le n})\), where \(p_i = 1/2 + z(i, n)\) such that \(z(i, n) = {\text {poly}}(i) / {\text {poly}}(n)\), where the degrees of the polynomials differ, and z(in) is non-decreasing for \(1 \le i \le n\). This would generalize Theorems 6 and 7, and shed some light on the relation between z(in) and the return time to state \(S_0\). Here, we conjecture that for z(in) as defined above the return time is in \({\text {poly}}(n)\). Finally a proof for the claimed upper bound of O(n!) on the expected time to reach a specified permutation in the graph of permutations by an actual random walk searching for the optimum would be beneficial. To the best of our knowledge no proof exists so far.