1 Motivation

During the last decades, the log-optimal portfolio (LOP) has become increasingly important in portfolio theory. There is a significant number of publications related to the LOP or to the growth-optimal portfolio (GOP), with which it is often identified. A large collection of articles can be found in MacLean et al. (2011). For an overview of the subject, see Christensen (2005). In spite of a controversial debate (Merton and Samuelson 1974), it cannot be denied that the LOP has a number of attractive properties. For example, it is asymptotically optimal among all portfolios that share the same constraints on the portfolio weights (Cover and Thomas 1991, Chapter 15). Moreover, the LOP can be considered a discrete-time approximation of the GOP, which serves as a numéraire portfolio and thus plays a major role in financial mathematics (Karatzas and Kardaras 2007; Platen and Heath 2006). The GOP provides a link between financial mathematics, neoclassical finance, and financial econometrics (Frahm 2016). Hence, the LOP is of particular interest for a variety of reasons.

In this work, the statistical properties of LOP estimators are investigated. To the best of my knowledge, this has not yet been done in the literature. We will consider the standard estimator for the LOP, i.e., the best constant re-balanced portfolio (BCRP), and the mean-variance estimator (MVE), which is based on a quadratic approximation of log-returns. The question of whether the BCRP or the MVE outperforms other investment strategies is not discussed in this work; it seems to be well investigated in the literature. In particular, the BCRP and the MVE are not compared with one another in order to clarify whether maximizing the logarithmic utility or the mean-variance objective function is preferable (Hakansson 1971). Instead, the MVE is used only to approximate the BCRP. Correspondingly, the mean-variance optimal portfolio (MVOP) is not an end in itself. Here, it merely represents an approximation of the LOP.

The main conclusions of this work are as follows:

  (i) The MVE provides a very good approximation of the BCRP if re-balancing takes place on a daily basis. The numerical implementation of the MVE is straightforward, and the corresponding algorithm is very fast even in high dimensions.

  (ii) One typically overestimates the expected out-of-sample log-return on the BCRP and even the expected log-return on the LOP. Similar statements hold true for the expected out-of-sample performance of the MVE and the performance of the MVOP.

  (iii) The BCRP exists and is unique under mild regularity conditions. Moreover, it is strongly consistent, and so are the expected out-of-sample log-return on the BCRP and its in-sample average log-return. Similar results are obtained for the MVE.

  (iv) Although both the BCRP and the MVE are affected by short-selling constraints, they are \(\sqrt{n}\,\)-consistent. The asymptotic results can be used to construct hypothesis tests and confidence regions.

  (v) Due to the constraints on the portfolio weights, the asymptotic results are inaccurate in most practical applications. Nonetheless, a finite-sample correction exists. It substantially improves the large-sample approximation of the MVE (and thus of the BCRP).

  (vi) However, the impact of estimation risk that comes from estimating expected asset returns is tremendous in most real-life situations. This problem is so serious that estimating the LOP becomes a futile endeavour if we have no prediction power.

The rest of this work is organized as follows: Section 2 states the basic assumptions and explains the mathematical notation. Section 3 contains some elementary results and provides a simple characterization of the LOP. In Sect. 4, the small-sample and large-sample properties of the BCRP are derived, including its existence, uniqueness, and consistency. That section also contains the asymptotic distribution of the BCRP. The corresponding results for the MVE can be found in Sect. 5. In Sect. 6, some computational issues related to the BCRP are discussed and the finite-sample correction for the MVE is demonstrated. Section 7 concludes. Finally, the “Appendix” contains an important but quite tedious derivation.

2 Basic assumptions and notation

Throughout this work, \({\mathbb {N}}\) denotes the set of positive integers, i.e., \({\mathbb {N}}:=\big \{1,2,\ldots \big \}\), and the symbol “\(\log \)” stands for the natural logarithm. The symbol \({\varvec{0}}\) denotes a vector of zeros and \({\varvec{1}}\) is a vector of ones. The dimensions of \({\varvec{0}}\) and \({\varvec{1}}\) should always be clear from the context. Any tuple \(x=(x_1,x_2,\ldots ,x_d)\in \mathbb {R}^d\) is understood to be a column vector and \(x'=\big [x_1~x_2~\ldots ~x_d\big ]\) is the transpose of x. It is implicitly assumed that we have an underlying probability space \(\big (\Omega ,{\mathcal {A}},{\mathbb {P}}\big )\), where \(\Omega \) is some state space, \({\mathcal {A}}\) is a \(\sigma \)-algebra on \(\Omega \), and \({\mathbb {P}}\) is a probability measure on \({\mathcal {A}}\), which is often referred to as the physical or real-world probability measure in financial mathematics (Frahm 2016). A random quantity is a (measurable) real-valued function on \(\Omega \). As usual in probability theory, two random quantities are considered identical if and only if they coincide with probability 1. Analogously, any statement about a random quantity is meant to hold almost surely (a.s.). Hence, we can drop the additional remarks “\({\mathbb {P}}(\cdot )=1\)” and “a.s.” for convenience. For example, if X is a random variable, “\(X=x\)” with \(x\in \mathbb {R}\) means that \({\mathbb {P}}(X=x)=1\), and if Y is a random variable, too, then “\(X>Y\)” means that X is greater than Y with probability 1, etc. Further, if \(\big \{X_n\big \}_{n\in {\mathbb {N}}}\) is a random sequence, “\(X_n\rightarrow x\)” means that \(X_n\) converges a.s. to x as n tends to infinity. Thus, we may also drop the notation “\(n\rightarrow \infty \)”.

Consider an asset universe with one riskless asset and \(N\in {\mathbb {N}}\) risky assets. It is assumed that the assets are infinitely divisible, and market frictions are ignored. Let \(S_t=\big (S_{0t},S_{1t},\ldots ,S_{Nt}\big )\) be the vector of asset prices at time \(t=0,1,\ldots \,\), where \(S_{0t}\) denotes the price of the riskless asset. The unit of time is one trading day. It is assumed that \(S_{0t}=1\) for \(t=0,1,\ldots \) and that \(S_{i0}=1\) for \(i=1,2,\ldots ,N\). In the following, each statement that contains the index i or t is meant to be true for all i and t that are appropriate in the given context. The price process \(\big \{S_t\big \}_{t=0,1,\ldots }\) is assumed to be positive. The time-index set is always \(\big \{0,1,\ldots \big \}\) and thus the subscript in “\(\{\cdot \}_{t=0,1,\ldots }\)” will be omitted for notational convenience.

Let \(X_t:=S_t/S_{t-1}\) be the vector of price relatives after trading day t, where the division of \(S_t\) by \(S_{t-1}\) is understood to be componentwise. Any income from Asset i during Day t, e.g., interest or dividends, is considered part of the asset price \(S_{it}\). The portfolio weights of the risky assets are denoted by \(w_1,w_2,\ldots ,w_N\), whereas \(w_0\) is the weight of the riskless asset. Hence, \(w=(w_0,w_1,\ldots ,w_N)\in \mathbb {R}^{N+1}\) is a portfolio that consists of the riskless asset and N risky assets. Each single asset is considered a portfolio, i.e., a canonical vector in \(\mathbb {R}^{N+1}\). In order to distinguish the weights of the risky assets, \(w_1,w_2,\ldots ,w_N\), from the weight of the riskless asset, \(w_0\), the notation \({\tilde{w}}=(w_1,w_2,\ldots ,w_N)\in \mathbb {R}^N\) is used. This means that \(w=(w_0,{\tilde{w}})\). Analogously, \({\tilde{X}}_t\) indicates the risky part of \(X_t\), i.e., we have that \(X_t=(1,{\tilde{X}}_t)\). Finally, the return on Asset i after Day t is given by \(R_{it}:=X_{it}-1\) and \(R_t={\tilde{X}}_t-{\varvec{1}}=(R_{1t},R_{2t},\ldots ,R_{Nt})\) denotes the vector of risky asset returns. Since we assume that \(S_{0t}=1\) for \(t=0,1,\ldots \), the risk-free interest rate is implicitly zero. As explained below, this assumption is made without loss of generality.

Although the following terms will be defined later on, their notation is clarified here because it is used throughout this work: The symbol \(w^*=(w^*_0,w^*_1,\ldots ,w^*_N)\) denotes the LOP, whereas \(w^\star =(w^\star _0,w^\star _1,\ldots ,w^\star _N)\) is the MVOP. Note that the former superscript, “\(*\),” has 6 spikes, whereas the latter, “\(\star \),” consists of 5 spikes. This is meant to symbolize a key observation of this work, namely that the LOP and the MVOP are almost indistinguishable in most practical applications, which hopefully does not hold for the symbols themselves. The symbols \({\tilde{w}}^*=(w^*_1,w^*_2,\ldots ,w^*_N)\) and \({\tilde{w}}^\star =(w^\star _1,w^\star _2,\ldots ,w^\star _N)\) denote the “risky parts” of \(w^*\) and \(w^\star \), respectively. Further, \(w_i\) is the portfolio weight of Asset i. By contrast, \(w_n\) is an estimator for the portfolio w, where \(n\in {\mathbb {N}}\) is the number of observations.Footnote 1 Consequently, \(w_{in}\) is the estimator for the portfolio weight of Asset i.

At the end of each trading day, the investor re-balances his portfolio according to a constant vector of portfolio weights w satisfying the budget constraint \({\varvec{1}}'w=1\). The portfolio value at Day \(n\in {\mathbb {N}}\) amounts to \(V_{wn}:=\prod _{t=1}^n w'X_t\). The investment capital might vanish during some trading day if we do not impose any additional constraints on the portfolio weights. In fact, if the investor is allowed to enter short positions, the probability of going bankrupt, i.e., \(V_{wn}\le 0\), is positive unless additional assumptions about \(\big \{X_t\big \}\) are made, which is not done in this work. Hence, the portfolio w must be an element of the (unit) simplex

$$\begin{aligned} {\mathcal {S}} := \Big \{w\in \mathbb {R}^{N+1}\!:\,w\ge {\varvec{0}}~\wedge ~{\varvec{1}}'w=1\Big \}. \end{aligned}$$

The assumption that \(w\in {\mathcal {S}}\) is crucial. It guarantees that \(w'X_t>0\) so that \(V_{wn}>0\) for all \(n\in {\mathbb {N}}\). Hence, the log-value process \(\log V_{wn}=\sum _{t=1}^n\log w'X_t\) exists for all \(w\in {\mathcal {S}}\) and \(n\in {\mathbb {N}}\), where \(\log w'X_t\) is referred to as the log-return on the portfolio after Day t. The short-selling constraints are indispensable because otherwise \(\log w'X_t\) might not be defined.
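As a minimal sketch of these definitions, the NumPy snippet below computes the price relatives \(X_t=S_t/S_{t-1}\) and the log-value \(\log V_{wn}=\sum _{t=1}^n\log w'X_t\) of a constant re-balanced portfolio. The storage layout (one row per day, column 0 for the riskless asset) and the helper names `price_relatives` and `log_value` are illustrative assumptions, not part of the text; weights outside \({\mathcal {S}}\) are rejected, mirroring the short-selling constraint.

```python
import numpy as np


def price_relatives(S):
    """Rows of S are S_0, S_1, ..., S_T; returns the rows X_1, ..., X_T with X_t = S_t / S_{t-1}."""
    S = np.asarray(S, dtype=float)
    return S[1:] / S[:-1]  # componentwise division


def log_value(w, X):
    """log V_{wn} = sum_t log(w'X_t) for a constant re-balanced portfolio w in the simplex."""
    w = np.asarray(w, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("w must lie in the unit simplex (no short positions, weights sum to 1)")
    gross = X @ w  # w'X_t for t = 1, ..., n; positive because w >= 0 and X > 0
    return float(np.sum(np.log(gross)))


# Riskless asset (constant price 1) and one risky asset with S_0 = 1 (hypothetical prices).
S = np.array([[1.0, 1.0],
              [1.0, 1.1],
              [1.0, 0.99]])
X = price_relatives(S)                    # approximately [[1.0, 1.1], [1.0, 0.9]]
print(np.exp(log_value([0.5, 0.5], X)))   # V_{w2} = 1.05 * 0.95
```

Summing log-returns instead of multiplying gross returns keeps the computation numerically stable for long samples.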

As already mentioned above, we can assume without loss of generality that the risk-free interest rate is zero: Let \(r\equiv R_0>-\,1\) be the risk-free interest rate and \(Y_t\) the vector of relative prices with \(Y_{0t}=1+r\). Then we could use the discounted relative-price vector \(X_t:=Y_t/(1+r)\). The log-return on any portfolio \(w\in {\mathcal {S}}\) amounts to \(\log (w'Y_t)=\log (1+r)+\log (w'X_t)\). We can ignore the first term, \(\log (1+r)\), provided we are interested only in maximizing the expected log-return on the portfolio w, which is our main focus here. Throughout this work, we suppose that \(X_t\) contains the discounted relative prices but omit the word “discounted” for convenience.
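The reduction to a zero risk-free rate can be checked numerically. In the sketch below, the daily rate `r` and the simulated relative prices `Y` are hypothetical. Dividing by \(1+r\) makes the riskless relative price equal to 1, and \(\log (w'Y_t)\) differs from \(\log (w'X_t)\) only by the constant \(\log (1+r)\), which does not affect which portfolio maximizes the expected log-return.

```python
import numpy as np


def discount(Y, r):
    """Discounted relative prices X_t = Y_t / (1 + r); one row per day, column 0 = riskless asset."""
    if r <= -1.0:
        raise ValueError("the risk-free rate must satisfy r > -1")
    return np.asarray(Y, dtype=float) / (1.0 + r)


rng = np.random.default_rng(1)
r = 0.0002                                            # hypothetical daily risk-free rate
Y = np.column_stack([np.full(5, 1.0 + r),             # riskless relative price 1 + r
                     np.exp(rng.normal(0.0005, 0.01, size=(5, 2)))])
X = discount(Y, r)
w = np.array([0.2, 0.5, 0.3])                         # any portfolio in the simplex

lhs = np.log(Y @ w)                                   # log(w'Y_t)
rhs = np.log(1.0 + r) + np.log(X @ w)                 # log(1 + r) + log(w'X_t)
print(np.allclose(lhs, rhs), np.allclose(X[:, 0], 1.0))  # prints: True True
```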

Now, the following basic assumptions are made:

A1.:

The relative-price process \(\big \{X_t\big \}\) is strictly stationary,

A2.:

the expected value of \(\log w'X_t\) is finite for all \(w\in {\mathcal {S}}\), and

A3.:

\(w'_1X_t\) and \(w'_2X_t\) do not coincide for any \(w_1,w_2\in \mathbb {R}^{N+1}\) with \(w_1\ne w_2\).

A1 is a fundamental assumption in econometrics and implies that the elements of \(\big \{X_t\big \}\) are identically distributed, but it is not assumed that they are serially independent. Further, A2 guarantees that we can work with the quantity \({\mathbf {E}}(\log w'X_t)\), and A3 requires the relative prices to span \((0,\infty )^{N+1}\). It follows that no risky asset can be replicated by a linear (in particular, a convex) combination of other assets. More precisely, since \({\mathbb {P}}\big (w'_1X_t=w'_2X_t\big )\ne 1\) for all \(w_1,w_2\in \mathbb {R}^{N+1}\) with \(w_1\ne w_2\), it holds that \({\mathbb {P}}\big (w'X_t=0\big )\ne 1\) for all \(w\in \mathbb {R}^{N+1}\) with \(w\ne {\varvec{0}}\) and vice versa. Hence, it cannot happen that \({\tilde{w}}'R_t=c\in \mathbb {R}\) for any \({\tilde{w}}\in \mathbb {R}^N\) with \({\tilde{w}}\ne {\varvec{0}}\). Otherwise, we could define \(w_0:=-\,(c+{\varvec{1}}'{\tilde{w}})\) so that \(w'X_t=w_0+{\tilde{w}}'({\varvec{1}}+R_t)=-\,(c+{\varvec{1}}'{\tilde{w}})+({\varvec{1}}'{\tilde{w}}+c)=0\) with \(w\ne {\varvec{0}}\). Conversely, if there is no \({\tilde{w}}\in \mathbb {R}^N\) with \({\tilde{w}}\ne {\varvec{0}}\) such that \({\tilde{w}}'R_t=c\in \mathbb {R}\), then we cannot have that \(w'X_t=0\) with \(w\ne {\varvec{0}}\) because otherwise \({\tilde{w}}\ne {\varvec{0}}\) (if \({\tilde{w}}={\varvec{0}}\), then \(w'X_t=w_0=0\) and thus \(w={\varvec{0}}\)) and \({\tilde{w}}'R_t=-\,(w_0+{\varvec{1}}'{\tilde{w}})\in \mathbb {R}\).

To sum up, \({\mathbb {P}}\big (w'X_t=0\big )\ne 1\) for all \(w\ne {\varvec{0}}\) if and only if \({\mathbb {P}}\big ({\tilde{w}}'R_t=c\big )\ne 1\) for all \(c\in \mathbb {R}\) and \({\tilde{w}}\ne {\varvec{0}}\). How can we interpret this basic condition from an economic point of view? Suppose that \({\tilde{w}}'R_t=c\ne 0\). Then we have an arbitrage opportunity, which is not possible if the market is in equilibrium and the market participants are rational (Frahm 2018). By contrast, in the case that \({\tilde{w}}'R_t=0\), at least one risky asset is redundant. Without loss of generality, assume that \({\tilde{w}}_N\ne 0\). Then we can construct the portfolio \({\tilde{v}}:=-\,{\tilde{w}}/{\tilde{w}}_N\) of risky assets so that \({\tilde{v}}'R_t=0\) with \({\tilde{v}}_N=-\,1\), i.e., \(R_{Nt}={\tilde{v}}_1R_{1t}+{\tilde{v}}_2R_{2t}+\cdots +{\tilde{v}}_{N-1}R_{N-1,t}\). Now, we are able to replicate Asset N by a linear combination of all other assets, including the riskless asset, by using the portfolio \((1-{\varvec{1}}'({\tilde{v}}_1,{\tilde{v}}_2,\ldots ,{\tilde{v}}_{N-1}),{\tilde{v}}_1,{\tilde{v}}_2,\ldots ,{\tilde{v}}_{N-1})\in \mathbb {R}^N\), which satisfies the budget constraint. Hence, we can ignore Asset N and reduce the asset universe to \(N-1\) risky assets. The converse is also true. That is, if we are able to replicate a risky asset by a linear combination of all other assets, we must have that \({\tilde{w}}'R_t=0\) for some \({\tilde{w}}\ne {\varvec{0}}\).

Throughout this work, it is assumed that the dimension reduction has been made in advance. Note also that without the dimension reduction the covariance matrix of \(R_t\) would not be positive definite because \({\tilde{w}}'R_t=c\) implies that \({\tilde{w}}'{\mathbf {Var}}\big (R_t\big ){\tilde{w}}={\mathbf {Var}}\big ({\tilde{w}}'R_t\big )=0\) for \({\tilde{w}}\ne {\varvec{0}}\) and vice versa. Hence, A3 is also indispensable for statistical reasons. In fact, this assumption plays a major role in portfolio theory, where it is typically required that \({\mathbf {Var}}(R_t)>0\), i.e., that the covariance matrix of the risky asset returns is positive definite.
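The link between redundancy and a singular covariance matrix can be illustrated with simulated data. The sketch below (NumPy; the returns and the weights \(0.5,0.5\) are hypothetical) appends an asset whose return is a linear combination of two others. It shows that \({\tilde{w}}'R_t=0\) for \({\tilde{w}}=(0.5,0.5,-1)\), that the smallest eigenvalue of the sample covariance matrix collapses to numerical zero, and that the portfolio \((1-{\varvec{1}}'({\tilde{v}}_1,{\tilde{v}}_2),{\tilde{v}}_1,{\tilde{v}}_2)\) replicates the redundant asset.

```python
import numpy as np


def min_rel_eigenvalue(R):
    """Smallest eigenvalue of the sample covariance matrix of R (rows = days), relative to the largest."""
    eig = np.linalg.eigvalsh(np.cov(R, rowvar=False))  # eigenvalues in ascending order
    return eig[0] / eig[-1]


rng = np.random.default_rng(2)
R = rng.normal(0.0, 0.01, size=(500, 2))   # returns of two non-redundant risky assets
v = np.array([0.5, 0.5])                   # hypothetical replicating weights v_1, v_2
R_red = np.column_stack([R, R @ v])        # Asset 3 is redundant: R_3t = 0.5 R_1t + 0.5 R_2t

# w = (0.5, 0.5, -1) gives w'R_t = 0 for every t, so the covariance matrix is singular.
w = np.append(v, -1.0)
print(np.max(np.abs(R_red @ w)))           # zero up to rounding
print(min_rel_eigenvalue(R), min_rel_eigenvalue(R_red))

# Replicating portfolio (1 - 1'v, v_1, v_2) over the riskless asset and Assets 1 and 2.
replicating = np.concatenate([[1.0 - v.sum()], v])
X_reduced = np.column_stack([np.ones(len(R)), 1.0 + R])
print(np.allclose(X_reduced @ replicating, 1.0 + R_red[:, 2]))  # reproduces Asset 3's price relatives
```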

3 The log-optimal portfolio

Definition 1

A log-optimal portfolio is a portfolio \(w^*\!\in {\mathcal {S}}\) that maximizes the expected log-return, i.e.,

$$\begin{aligned} w^* \in {{\,\mathrm{arg\,max}\,}}_{w\in {\mathcal {S}}}~{\mathbf {E}}\big (\log w'X_t\big ). \end{aligned}$$

The LOP is often associated with the “Kelly criterion” (Kelly 1956). Its asymptotic optimality properties are elaborated by Breiman (1961), Bell and Cover (1980), and Algoet and Cover (1988).Footnote 2 Although it was originally studied in information theory, it has attracted growing interest in the finance community over the last decades. As already mentioned in Sect. 1, the LOP is sometimes referred to as the GOP. However, the GOP is typically studied in a continuous-time framework, whereas the LOP is based on a discrete-time setting.Footnote 3

The Lagrange function of the optimization problem given by Definition 1 is

$$\begin{aligned} {\mathcal {L}}(w,\kappa ,\lambda ) = -\,{\mathbf {E}}\big (\log w'X_t\big ) - \kappa 'w + \lambda ({\varvec{1}}'w-1) \end{aligned}$$

with \(\kappa =(\kappa _0,\kappa _1,\ldots ,\kappa _N)\ge {\varvec{0}}\) and \(\lambda \in \mathbb {R}\). The corresponding Karush-Kuhn-Tucker (KKT) conditions take a particularly simple form (Cover and Thomas 1991, Theorem 15.2.1). The following theorem also establishes the existence and uniqueness of the LOP.

Theorem 1

The LOP exists and is unique. It is characterized by \(w^*\!\in {\mathcal {S}}\) such that

$$\begin{aligned} {\mathbf {E}}\!\left( \frac{X_{it}}{w^{*\prime }X_t}\right) \left\{ \begin{array}{ll} = 1, &{}\quad w^*_i > 0 \\ \le 1, &{}\quad w^*_i = 0 \end{array}. \right. \end{aligned}$$

Proof

The simplex \({\mathcal {S}}\) is compact and convex. Further, the random variables \(v'X_t\) and \(w'X_t\) do not coincide for any \(v,w\in {\mathcal {S}}\) with \(v\ne w\). Hence, since the natural logarithm is strictly concave, for each \(0<\pi <1\) and \(v,w\in {\mathcal {S}}\) with \(v\ne w\) it holds that

$$\begin{aligned} \log \big [\pi v+(1-\pi )w\big ]'X_t= & {} \log \big (\pi v'X_t+(1-\pi )w'X_t\big ) \\\ge & {} \pi \log v'X_t + (1-\pi )\log w'X_t \end{aligned}$$

and

$$\begin{aligned} {\mathbb {P}}\Big (\log \big [\pi v+(1-\pi )w\big ]'X_t> \pi \log v'X_t + (1-\pi )\log w'X_t\Big ) > 0\,. \end{aligned}$$

This means that the objective function \(w\mapsto {\mathbf {E}}\big (\log w'X_t\big )\) is strictly concave, which implies that \(w^*\) exists and is unique. Further, the partial difference quotient

$$\begin{aligned} \frac{\log \big (w'x+\Delta w_ix_i\big )-\log w'x}{\Delta w_i}= & {} \frac{1}{\Delta w_i}\log \frac{w'x+\Delta w_ix_i}{w'x} \\= & {} \frac{1}{\Delta w_i}\log \left( 1+\frac{\Delta w_ix_i}{w'x}\right)> 0,\qquad \Delta w_i > 0, \end{aligned}$$

increases monotonically and tends to \(x_i/w'x>0\) as \(\Delta w_i\!\searrow 0\) for each \(x>{\varvec{0}}\). From the Monotone Convergence Theorem, we conclude that

$$\begin{aligned} \frac{\partial }{\partial w}\,{\mathbf {E}}\big (\log w'X_t\big ) = {\mathbf {E}}\!\left( \frac{X_t}{w'X_t}\right) . \end{aligned}$$

Hence, we have that \({\mathbf {E}}\big (X_t/w^{*\prime }X_t\big )=\lambda {\varvec{1}}-\kappa \) with \(w^*\!\in {\mathcal {S}}\), \(\lambda \in \mathbb {R}\), \(\kappa =(\kappa _0,\kappa _1,\ldots ,\kappa _N)\ge {\varvec{0}}\), and \(w^*_i\kappa _i=0\). From \(w^{*\prime }{\mathbf {E}}\big (X_t/w^{*\prime }X_t\big )={\mathbf {E}}\big (w^{*\prime }X_t/w^{*\prime }X_t\big )=1\), \(w^{*\prime }{\varvec{1}}=1\), and \(w^{*\prime }\kappa =0\), we conclude that \(\lambda =1\). Thus, we obtain

$$\begin{aligned} {\mathbf {E}}\left( \frac{X_t}{w^{*\prime }X_t}\right) = {\varvec{1}}-\kappa \,, \end{aligned}$$

which leads to the given expression in the theorem. \(\square \)
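Both steps of the proof, the gradient formula \(\partial {\mathbf {E}}(\log w'X_t)/\partial w={\mathbf {E}}(X_t/w'X_t)\) and the identity \(w'{\mathbf {E}}(X_t/w'X_t)=1\) that forces \(\lambda =1\), can be checked numerically if the expectation is replaced by an average over a simulated sample, i.e., if the empirical distribution is treated as the distribution of \(X_t\). The lognormal data below are hypothetical. The objective is evaluated at \(w\pm h\,e_i\), slightly off the simplex, which is harmless as long as \(w'X_t>0\).

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 1000, 3
# Hypothetical sample: column 0 is the riskless asset, columns 1..N are lognormal price relatives.
X = np.column_stack([np.ones(n), np.exp(rng.normal(0.0003, 0.01, size=(n, N)))])


def objective(w):
    """Sample analogue of E(log w'X_t)."""
    return np.mean(np.log(X @ w))


def gradient(w):
    """Sample analogue of E(X_t / w'X_t), the gradient derived in the proof."""
    return np.mean(X / (X @ w)[:, None], axis=0)


w = np.array([0.1, 0.2, 0.3, 0.4])
h = 1e-6
# Central finite differences along the canonical directions e_0, ..., e_N.
fd = np.array([(objective(w + h * e) - objective(w - h * e)) / (2.0 * h) for e in np.eye(N + 1)])

print(np.max(np.abs(fd - gradient(w))))   # small: the finite differences agree with the formula
print(w @ gradient(w))                    # 1 up to rounding, for any w with w'X_t > 0
```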

The short-selling constraint \(w^*_i\ge 0\) is binding (i.e., \(\kappa _i>0\)) if and only if \({\mathbf {E}}(X_{it}/w^{*\prime }X_t)<1\), whereas the partial derivative equals 1 whenever \(w^*_i>0\). If all (optimal) portfolio weights are positive, the solution of the optimization problem, \(w^*\), lies in the interior of \({\mathcal {S}}\), which is denoted by \({\mathcal {S}}^{\text{ o }}\). Since we have that

$$\begin{aligned} (w-w^*)'{\mathbf {E}}\!\left( \frac{X_t}{w^{*\prime }X_t}\right) = w'\underset{=\,{\varvec{1}}}{\underbrace{{\mathbf {E}}\!\left( \frac{X_t}{w^{*\prime }X_t}\right) }} - \underset{=1}{\underbrace{{\mathbf {E}}\!\left( \frac{w^{*\prime }X_t}{w^{*\prime }X_t}\right) }} = 1-1 = 0\,,\quad \forall \,w\in {\mathcal {S}}^{\text{ o }}, \end{aligned}$$

the directional derivative of the expected log-return at \(w^*\) vanishes in every direction \(w-w^*\), i.e., the expected log-return is stationary (to first order) under a local change of the portfolio weights.Footnote 4 This can be true even on the boundary of \({\mathcal {S}}\), \(\partial {\mathcal {S}}\), as long as \({\mathbf {E}}(X_t/w^{*\prime }X_t)={\varvec{1}}\). In this case, none of the short-selling constraints is binding. By contrast, if (at least) one partial derivative is less than 1, the corresponding portfolio weight must be zero, i.e., \(w^*\in \partial {\mathcal {S}}\). Then the expected log-return decreases after a local change of a portfolio weight whose constraint is binding. These basic considerations will be important later on when deriving the asymptotic properties of the LOP estimators.

4 The best constant re-balanced portfolio

Definition 2

A best constant re-balanced portfolio is a portfolio \(w^*_n\in {\mathcal {S}}\) that maximizes the in-sample average log-return, i.e.,

$$\begin{aligned} w^*_n \in {{\,\mathrm{arg\,max}\,}}_{w\in {\mathcal {S}}}~\frac{1}{n}\sum _{t=1}^n \log w'X_t\,. \end{aligned}$$
(1)

The relative prices contained in \(X_1,X_2,\ldots ,X_n\), except for the relative price 1 of the riskless asset, are nondegenerate random variables and so, in general, \(\frac{1}{n}\sum _{t=1}^n \log w'X_t\) is a nondegenerate random variable, too. This means that for each element of the state space, \(\omega \in \Omega \), and so for each realization of \(X_1,X_2,\ldots ,X_n\), we maximize \(\frac{1}{n}\sum _{t=1}^n \log w'X_t(\omega )\), which leads us to a particular realization, \(w^*_n(\omega )\), of a BCRP \(w^*_n\), which thus represents a random vector.

A BCRP can be considered an empirical version of the LOP. It is said to be the “best” constant re-balanced portfolio because \(w^*_n\) maximizes the final value after Period \(n\in {\mathbb {N}}\), i.e., \(V_{wn}\), over all constant re-balanced portfolios \(w\in {\mathcal {S}}\). However, the maximization is done in hindsight, i.e., after all asset prices have been revealed to the investor, and thus the BCRP is unknown in advance.

4.1 Small-sample properties

4.1.1 Existence and uniqueness

Let \(n\in {\mathbb {N}}\) be the number of price observations. The following additional assumption is made for statistical reasons:

A4.:

The sample of price relatives, i.e., \({\mathbf {X}}=\big [X_1~X_2~\cdots ~X_n\big ]\), has rank \(N+1\).

A4 can be considered an empirical version of A3: it implies that no risky asset is redundant in the sample. Since \({\mathbf {X}}\) has only n columns, A4 requires that the number of observations exceeds the number of risky assets, i.e., \(n>N\).
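A4 is easy to check for a given sample. The hypothetical helper `satisfies_a4` below compares the numerical rank of \({\mathbf {X}}\), computed by NumPy's SVD-based `matrix_rank` with its default tolerance, with \(N+1\). The simulated data are illustrative only.

```python
import numpy as np


def satisfies_a4(X):
    """X is the (N+1) x n matrix [X_1 X_2 ... X_n]; A4 requires that it has rank N+1."""
    X = np.asarray(X, dtype=float)
    return bool(np.linalg.matrix_rank(X) == X.shape[0])


def sample(N, n, rng):
    """Hypothetical sample: riskless relative price 1 in the first row, lognormal risky rows."""
    return np.vstack([np.ones(n), np.exp(rng.normal(0.0, 0.01, size=(N, n)))])


rng = np.random.default_rng(4)
N = 3
for n in (2, 3, 4, 10):
    print(n, satisfies_a4(sample(N, n, rng)))   # n <= N can never satisfy A4

# A redundant asset (Asset 3 = 0.5 Asset 1 + 0.5 Asset 2 in price relatives) violates A4 for any n.
X = sample(N, 10, rng)
X[3] = 0.5 * X[1] + 0.5 * X[2]
print(satisfies_a4(X))
```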

Theorem 2

The BCRP exists and is unique. It is characterized by \(w^*_n\in {\mathcal {S}}\) such that

$$\begin{aligned} \frac{1}{n}\sum _{t=1}^n \frac{X_{it}}{w^{*\prime }_nX_t} \left\{ \begin{array}{ll} = 1, &{}\quad w^*_{in} > 0 \\ \le 1, &{}\quad w^*_{in} = 0 \end{array}. \right. \end{aligned}$$

Proof

Since \(w\mapsto \frac{1}{n}\sum _{t=1}^n\log w'X_t\) is a continuous and concave objective function, Eq. 1 represents a convex optimization problem and, because the simplex is compact and convex, the BCRP exists. Since \({\mathbf {X}}\) has full rank, \(v\ne w\) implies \({\mathbf {X}}'v\ne {\mathbf {X}}'w\), i.e., two different portfolios \(v,w\in \mathbb {R}^{N+1}\) cannot produce the same sequence of portfolio relatives \(v'X_1,v'X_2,\ldots ,v'X_n\). Hence, the given objective function is strictly concave, which implies that the BCRP is unique. The rest of the proof follows by the arguments that are used in the proof of Theorem 1. \(\square \)

A simple numerical algorithm for computing the BCRP was developed by Cover (1984). We will come back to this point in Sect. 6.
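The following sketch implements the multiplicative fixed-point update commonly associated with Cover (1984), \(w_i\leftarrow w_i\,\frac{1}{n}\sum _{t=1}^n X_{it}/w'X_t\), and then checks the characterization of Theorem 2. Because \(w'\frac{1}{n}\sum _{t=1}^n X_t/w'X_t=1\) for every w, each iterate satisfies the budget constraint, and nonnegativity is preserved because the update is multiplicative. The stopping rule, tolerances, helper names, and simulated data are illustrative assumptions; this is not necessarily the implementation discussed in Sect. 6.

```python
import numpy as np


def bcrp_multiplicative(X, tol=1e-12, max_iter=200_000):
    """Approximate the BCRP for an n x (N+1) array X of price relatives (column 0 = riskless)."""
    d = X.shape[1]
    w = np.full(d, 1.0 / d)                          # start in the interior of the simplex
    for _ in range(max_iter):
        g = np.mean(X / (X @ w)[:, None], axis=0)    # (1/n) sum_t X_t / w'X_t
        w_new = w * g                                # stays in the simplex because w'g = 1
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w


def kkt_lhs(w, X):
    """Left-hand side of Theorem 2, (1/n) sum_t X_it / w'X_t, for every asset i."""
    return np.mean(X / (X @ w)[:, None], axis=0)


rng = np.random.default_rng(5)
n = 2000
drifts = np.array([0.01, 0.01, -0.02])               # hypothetical log-drifts of three risky assets
X = np.column_stack([np.ones(n), np.exp(rng.normal(drifts, 0.1, size=(n, 3)))])

w_hat = bcrp_multiplicative(X)
print(np.round(w_hat, 4))
print(np.round(kkt_lhs(w_hat, X), 6))   # about 1 where w_hat_i > 0, below 1 where w_hat_i is (numerically) 0
```

With daily data the averages \(\frac{1}{n}\sum _{t=1}^n X_{it}/w'X_t\) differ from 1 only slightly, so this iteration may need many steps; the tolerance and iteration cap above are illustrative.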

4.1.2 Finite-sample bias

Let \(w_n\) be a portfolio that is based only on the price observations that have been made up to Day n. A standard assumption of portfolio theory is that \(w_n\) is stochastically independent of \(R_{n+1}\) or, equivalently, of \(X_{n+1}\) (Frahm 2015). If \(w_n\) depended on \(R_{n+1}\), the decision of the investor at time n would be influenced by some asset returns at time \(n+1\) or, vice versa, his financial transactions would have an impact on forthcoming asset prices. In this case, he would be able to predict the future price evolution on the basis of past asset prices. This is typically ruled out in finance theory and, especially, in portfolio theory. Put another way, we assume that the investor has no prediction power. This basic assumption will also be elaborated on in Sect. 5.1.2.

For example, suppose that \(X_1,X_2,\ldots ,X_{n+1}\) are serially independent. Since \(w_n\) is a function of \(X_1,X_2,\ldots ,X_n\), the portfolio \(w_n\) does not depend on \(X_{n+1}\). However, the converse is not true: a function of \(X_1,X_2,\ldots ,X_n\) can be independent of \(X_{n+1}\) even if \(X_1,X_2,\ldots ,X_n\) themselves are not. Consider some (measurable) real-valued function f of some random variable \(\xi \). The fact that \(f(\xi )\) is independent of another random variable \(\zeta \) does not imply that \(\xi \) is independent of \(\zeta \). A trivial example is any constant function f. Another well-known and more sophisticated example is the case in which \(\xi _1,\xi _2,\ldots ,\xi _n\) are independent and identically normally distributed. Obviously, \(\xi _{n+1}:=\frac{1}{n}\sum _{t=1}^n\xi _t\) depends on \(\xi _1,\xi _2,\ldots ,\xi _n\), but it is known that \(f(\xi _1,\xi _2,\ldots ,\xi _n)=\sum _{t=1}^n (\xi _t-\xi _{n+1})^2\) is independent of \(\xi _{n+1}\).Footnote 5 Thus, although \(\xi _{n+1}\) depends on \(\xi _1,\xi _2,\ldots ,\xi _n\) and \(f(\xi _1,\xi _2,\ldots ,\xi _n)\) is a nondegenerate random variable rather than a constant, \(f(\xi _1,\xi _2,\ldots ,\xi _n)\) is still independent of \(\xi _{n+1}\). We conclude that, although \(X_1,X_2,\ldots ,X_{n+1}\) may be serially dependent, a (random) portfolio based on \(X_1,X_2,\ldots ,X_n\) need not depend on \(X_{n+1}\).
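The independence claim for normal samples can be made tangible with a small Monte Carlo sketch (NumPy; the sample size and number of replications are arbitrary choices). Zero sample correlations do not prove independence, but the output is consistent with it: the sample mean \(\xi _{n+1}\) is uncorrelated with the sum of squared deviations, and so is its square, although \(\xi _{n+1}\) is clearly correlated with \(\xi _1\) (the theoretical correlation is \(1/\sqrt{n}\)).

```python
import numpy as np

rng = np.random.default_rng(6)
reps, n = 200_000, 5
xi = rng.normal(0.0, 1.0, size=(reps, n))        # each row: xi_1, ..., xi_n i.i.d. N(0, 1)

mean = xi.mean(axis=1)                           # xi_{n+1} in the text
ssd = ((xi - mean[:, None]) ** 2).sum(axis=1)    # f(xi_1, ..., xi_n) = sum_t (xi_t - xi_{n+1})^2

print(np.corrcoef(mean, ssd)[0, 1])              # close to 0
print(np.corrcoef(mean**2, ssd)[0, 1])           # close to 0 as well, not only a linear effect
print(np.corrcoef(mean, xi[:, 0])[0, 1])         # about 1/sqrt(5): the mean depends on xi_1
```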

Hence, we can make the following additional assumptions:

A5.:

The BCRP \(w^*_n\) is stochastically independent of \(X_{n+1}\).

A6.:

The BCRP does not coincide with the LOP, i.e., \({\mathbb {P}}\big (w^*_n=w^*\big )\ne 1\).

A6 merely states that \(w^*_n=w^*\) holds with probability less than 1. This assumption is innocuous, since otherwise we would not have any estimation risk at all.

Let X be any positive random vector that has the same distribution as the vectors \(X_1,X_2,\ldots \) of price relatives and define the following quantities:

  • \(\varphi (w):={\mathbf {E}}\big (\log w'X\big )\),

  • \(\varphi _n(w):={\mathbf {E}}\big (\frac{1}{n}\sum _{t=1}^n\log w'X_t\big )\), and

  • \(\varphi _{n+1}(w):={\mathbf {E}}\big (\log w'X_{n+1}\big )\).

Hence, by substituting w with \(w^*\) or \(w^*_n\), respectively, we can see that

  • \(\varphi (w^*)\) is the expected log-return on the LOP,

  • \(\varphi _n(w^*_n)\) represents the expected in-sample average log-return on the BCRP, and

  • \(\varphi _{n+1}(w^*_n)\) denotes the expected out-of-sample log-return on the BCRP.

The investor cannot achieve \(\varphi (w^*)\) because the LOP is unknown to him. Instead, he maximizes the average log-return \(\frac{1}{n}\sum _{t=1}^n\log w^{*\prime }_nX_t\) in order to compute the BCRP. At the end of Day n he applies the BCRP and one day later he obtains the log-return \(\log w^{*\prime }_nX_{n+1}\). For this reason, \(\varphi _{n+1}(w^*_n)\) may be considered the basic performance measure for \(w^*_n\).

The following theorem describes why the BCRP might lead to wrong conclusions in real-life situations, especially if the number of observations, n, is small.

Theorem 3

\(\varphi _{n+1}(w^*_n)<\varphi (w^*)<\varphi _n(w^*_n)\)

Proof

By definition, \(w^*\) is the element of \({\mathcal {S}}\) that maximizes the expected log-return. Moreover, due to A5 and A6, and the fact that \(w^*\) is unique, we have that

$$\begin{aligned} {\mathbf {E}}\big (\log w'X_{n+1}\,|\,w^*_n=w\big ) = {\mathbf {E}}\big (\log w'X_{n+1}\big ) \le {\mathbf {E}}\big (\log w^{*\prime }X_{n+1}\big ) \end{aligned}$$

with probability 1 but \({\mathbf {E}}\big (\log w'X_{n+1}\big )<{\mathbf {E}}\big (\log w^{*\prime }X_{n+1}\big )\) with positive probability. From the Law of Total Expectation and the stationarity of \(\big \{X_t\big \}\) we conclude that

$$\begin{aligned} \varphi _{n+1}(w^*_n)= & {} {\mathbf {E}}\Big ({\mathbf {E}}\big (\log w^{*\prime }_nX_{n+1}\,|\,w^*_n\big )\Big ) \\< & {} {\mathbf {E}}\big (\log w^{*\prime }X_{n+1}\big ) = {\mathbf {E}}\big (\log w^{*\prime }X\big ) = \varphi (w^*)\,. \end{aligned}$$

Moreover, since \(w^*_n\) is unique and does not coincide with \(w^*\), we have that

$$\begin{aligned} {\mathbb {P}}\left( \frac{1}{n}\sum _{t=1}^n\log w^{*\prime }_nX_t\ge \frac{1}{n}\sum _{t=1}^n\log w^{*\prime }X_t\right) = 1 \end{aligned}$$

and

$$\begin{aligned} {\mathbb {P}}\left( \frac{1}{n}\sum _{t=1}^n\log w^{*\prime }_nX_t>\frac{1}{n}\sum _{t=1}^n\log w^{*\prime }X_t\right) > 0. \end{aligned}$$

This means that

$$\begin{aligned} \varphi _n(w^*_n) = {\mathbf {E}}\!\left( \frac{1}{n}\sum _{t=1}^n\log w^{*\prime }_nX_t\right) > {\mathbf {E}}\!\left( \frac{1}{n}\sum _{t=1}^n\log w^{*\prime }X_t\right) = {\mathbf {E}}\big (\log w^{*\prime }X\big ) = \varphi (w^*)\,. \end{aligned}$$

\(\square \)

Hence, the expected out-of-sample log-return on the BCRP, \(\varphi _{n+1}(w^*_n)\), is always lower than the expected log-return on the LOP, \(\varphi (w^*)\). Nonetheless, if the investor uses the in-sample average log-return \(\frac{1}{n}\sum _{t=1}^n\log w^{*\prime }_nX_t\), i.e., the maximum of \(\frac{1}{n}\sum _{t=1}^n\log w'X_t\), as an estimate, he typically overestimates not only \(\varphi _{n+1}(w^*_n)\) but even \(\varphi (w^*)\), since its expected value \(\varphi _n(w^*_n)\) exceeds both quantities. This phenomenon is not limited to the BCRP. It is a general problem of portfolio optimization (see, e.g., Frahm 2015; Frahm and Memmel 2010; Kan and Zhou 2007; Memmel 2004).
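Theorem 3 can be illustrated by simulation. The sketch below assumes a hypothetical i.i.d. lognormal model with one riskless and one risky asset, a deliberately large volatility, and only \(n=10\) observations. It approximates the BCRP on a grid of weights, computes \(\varphi (w)\) by Gauss-Hermite quadrature, and evaluates the out-of-sample performance as \(\varphi (w^*_n)\), which equals the conditional expectation of \(\log w^{*\prime }_nX_{n+1}\) given \(w^*_n\) when \(X_{n+1}\) is independent of the past. None of these choices comes from the text.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Hypothetical model: one riskless asset and one risky asset with X_1t = exp(m + s*Z_t), Z_t i.i.d. N(0, 1).
m, s = 0.01, 0.2                          # deliberately large volatility to make the effect visible
grid = np.linspace(0.0, 1.0, 1001)        # candidate weights w_1 of the risky asset (w_0 = 1 - w_1)

# phi(w) = E log(1 + w R) via Gauss-Hermite quadrature; the weights q integrate exp(-z^2/2).
z, q = hermegauss(60)
R_nodes = np.exp(m + s * z) - 1.0
phi_grid = np.log1p(grid[:, None] * R_nodes) @ q / np.sqrt(2.0 * np.pi)
phi_star = phi_grid.max()                 # phi(w*), with w* restricted to the grid


def simulate(n=10, reps=5000, seed=7):
    """Monte Carlo estimates of phi_{n+1}(w*_n) and phi_n(w*_n) for the grid BCRP."""
    rng = np.random.default_rng(seed)
    out_of_sample, in_sample = np.empty(reps), np.empty(reps)
    for r in range(reps):
        R = np.exp(m + s * rng.standard_normal(n)) - 1.0   # risky returns R_1, ..., R_n
        avg = np.log1p(grid[:, None] * R).mean(axis=1)     # (1/n) sum_t log w'X_t for each grid point
        k = np.argmax(avg)                                 # grid approximation of the BCRP
        in_sample[r] = avg[k]                              # in-sample average log-return
        out_of_sample[r] = phi_grid[k]                     # E(log w*_n'X_{n+1} | w*_n) under independence (A5)
    return out_of_sample.mean(), in_sample.mean()


oos, ins = simulate()
print(f"out-of-sample {oos:.5f} < LOP {phi_star:.5f} < in-sample {ins:.5f}")
```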

4.2 Large-sample properties

4.2.1 Consistency

For the subsequent analysis it is convenient to define the function \(x\mapsto f_w(x):=\log w'x\) for all \(w\in {\mathcal {S}}\) and \(x>{\varvec{0}}\) as well as the functions

$$\begin{aligned} w \mapsto M(w) := {\mathbf {E}}\big (f_w(X)\big )\qquad \text {and}\qquad w \mapsto M_n(w) := \frac{1}{n}\sum _{t=1}^n f_w(X_t) \end{aligned}$$

for all \(n\in {\mathbb {N}}\). We make the following statistical assumption, which is often used in the theory of empirical processes (see, e.g., van der Vaart 1998, Chapter 19):

A7.:

The family \({\mathcal {F}}=\big \{f_w\big \}_{w\in {\mathcal {S}}}\) is Glivenko-Cantelli, i.e.,

$$\begin{aligned} \sup _{w\in {\mathcal {S}}}|M_n(w)-M(w)| \rightarrow 0\,. \end{aligned}$$

Hence, the Strong Law of Large Numbers is required to hold for the sequence \(\big \{M_n(w)\big \}\) uniformly in \({\mathcal {S}}\). For example, according to van der Vaart (1998, p. 46), it is sufficient to guarantee that

  (i) w ranges over a compact set,

  (ii) the map \(w\mapsto f_w(x)\) is continuous for every \(x>{\varvec{0}}\), and

  (iii) the elements of \({\mathcal {F}}\) are dominated by an integrable function,

provided \(X_1,X_2,\ldots \) are serially independent.Footnote 6 The first two properties are clearly satisfied in our context. In order to see that the third property is satisfied, too, note that

$$\begin{aligned} -\,{\varvec{1}}'(\log x)^-\le w'\log x\le \log w'x\le \log {\varvec{1}}'x \end{aligned}$$

for all \(x>{\varvec{0}}\), where \((\log x)^-\) denotes the negative part of the vector \(\log x\). Hence, the function

$$\begin{aligned} x \mapsto g(x) = \max \Big \{{\varvec{1}}'(\log x)^-,\log {\varvec{1}}'x\Big \} \end{aligned}$$

dominates each \(f_w\). Since \({\mathbf {E}}\big (|\log w'X_t|\big )<\infty \) for all \(w\in {\mathcal {S}}\), in particular for the canonical vectors, we have that \({\mathbf {E}}\big (|\log X_{it}|\big )<\infty \) and thus \({\mathbf {E}}\big ({\varvec{1}}'(\log X_t)^-\big )<\infty \). Moreover, note that \({\mathbf {E}}\left( \log w'X_t\right) <\infty \) for the equally weighted portfolio \(w={\varvec{1}}/(N+1)\in {\mathcal {S}}\) and that \({\varvec{1}}'X_t>1\) because \(X_{0t}=1\), so that \(\log {\varvec{1}}'X_t>0\). Hence, we obtain

$$\begin{aligned} \infty > \log (N+1) + {\mathbf {E}}\!\left( \log \frac{{\varvec{1}}'X_t}{N+1}\right) = {\mathbf {E}}\!\left( \log (N+1)+\log \frac{{\varvec{1}}'X_t}{N+1}\right) = {\mathbf {E}}\big (\log {\varvec{1}}'X_t\big ). \end{aligned}$$

The maximum of two nonnegative and integrable random variables is also integrable. Thus, we conclude that \({\mathbf {E}}\big (g(X_t)\big )<\infty \), i.e., the dominating function g is integrable.
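The chain of inequalities and the domination \(|f_w|\le g\) can be spot-checked numerically. The sketch below draws hypothetical lognormal price relatives with \(x_0=1\) and random portfolios from \({\mathcal {S}}\) (via a Dirichlet distribution); the helper name `dominated` is illustrative, and a random check of this kind is no substitute for the argument above.

```python
import numpy as np


def dominated(x, w, tol=1e-12):
    """Check -1'(log x)^- <= w'log x <= log w'x <= log 1'x and |log w'x| <= g(x) for x > 0, w in S."""
    log_x = np.log(x)
    neg_part = np.maximum(-log_x, 0.0)          # (log x)^-, the negative part of log x
    f_w = np.log(w @ x)                          # f_w(x) = log w'x
    upper = np.log(x.sum())                      # log 1'x
    g = max(neg_part.sum(), upper)               # dominating function g(x)
    chain = (-neg_part.sum() <= w @ log_x + tol  # because 0 <= w_i <= 1
             and w @ log_x <= f_w + tol          # Jensen's inequality
             and f_w <= upper + tol)             # w'x <= 1'x because 0 <= w <= 1
    return bool(chain and abs(f_w) <= g + tol)


rng = np.random.default_rng(8)
N = 4
violations = 0
for _ in range(10_000):
    x = np.concatenate([[1.0], np.exp(rng.normal(0.0, 1.0, size=N))])  # x_0 = 1 for the riskless asset
    w = rng.dirichlet(np.ones(N + 1))                                  # random portfolio in the simplex
    if not dominated(x, w):
        violations += 1
print("violations:", violations)
```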

As mentioned at the beginning of Sect. 2, each statement that refers to a random quantity is meant to be true with probability 1. The next theorem asserts that the BCRP is strongly consistent for the LOP. This means that \(w^*_n\) converges almost surely to \(w^*\), which is simply denoted by “\(w^*_n\rightarrow w^*\),” i.e., without the additional remark “a.s.,” for convenience.

Theorem 4

\(w^*_n\rightarrow w^*\)

Proof

The BCRP \(w^*_n\) is an M-estimator whose criterion functions are given by M and \(M_n\). Let \(\varepsilon \) be any positive real number and \({\mathcal {P}}_\varepsilon :=\big \{w\in {\mathcal {S}}\!:\Vert w-w^*\Vert =\varepsilon \big \}\).Footnote 7 Since M is continuous and strictly concave and \({\mathcal {P}}_\varepsilon \) is compact, there exists some \(\delta >0\) such that \(M(w^*)-M(w)>\delta \) for all \(w\in {\mathcal {P}}_\varepsilon \). Now, since \({\mathcal {F}}\) is Glivenko-Cantelli, we can find a sufficiently large number \(m\in {\mathbb {N}}\) such that, for all natural numbers \(n\ge m\), \(|M_n(w^*)-M(w^*)|\le \delta /2\) and \(|M_n(w)-M(w)|\le \delta /2\) for all \(w\in {\mathcal {P}}_\varepsilon \). Thus, \(M_n(w)\le M(w)+\delta /2<M(w^*)-\delta /2\le M_n(w^*)\) for all \(w\in {\mathcal {P}}_\varepsilon \). Since \(M_n\) is concave, too, its maximizer cannot lie outside the ball of radius \(\varepsilon \) around \(w^*\): otherwise, the line segment between \(w^*\) and \(w^*_n\) would contain some \(w\in {\mathcal {P}}_\varepsilon \) with \(M_n(w)\ge \min \big \{M_n(w^*),M_n(w^*_n)\big \}=M_n(w^*)\). Hence, \(\Vert w^*_n-w^*\Vert <\varepsilon \) for all \(n\ge m\). This holds true for every \(\varepsilon >0\) and thus \(w^*_n\rightarrow w^*\). \(\square \)
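The BCRP is defined only implicitly as a maximizer of \(M_n\). One way to compute it, given here as a minimal sketch, is Cover’s (1984) multiplicative update \(w_i\leftarrow w_i\,\frac{1}{n}\sum _{t=1}^n X_{it}/w'X_t\), which keeps w in the simplex because the updated weights still sum to one. The toy sample is hypothetical: a riskless asset and one risky asset whose price relative is either 2 or 0.6, so that \(M_n\) equals \(\frac{1}{2}\log (1+b)+\frac{1}{2}\log (1-0.4b)\) for the risky weight b, which is maximized at \(b=0.75\). The function names are chosen for this illustration.

```python
import numpy as np


def bcrp_cover(X, iters=5_000, tol=1e-12):
    """Approximate the BCRP by Cover's multiplicative update.

    X is an (n, N+1) array of price relatives; column 0 is the riskless asset.
    """
    n, k = X.shape
    w = np.full(k, 1.0 / k)  # start at the equally weighted portfolio
    for _ in range(iters):
        # (1/n) sum_t X_t / (w'X_t): equals 1 in every positive component at the optimum
        grad = (X / (X @ w)[:, None]).mean(axis=0)
        w_new = w * grad  # the weights of w_new still sum to one
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w


def in_sample_log_return(w, X):
    """M_n(w): average log-return of the constant re-balanced portfolio w."""
    return np.mean(np.log(X @ w))


# Hypothetical toy sample: riskless asset (x_0 = 1) and one risky asset.
X = np.array([[1.0, 2.0],
              [1.0, 0.6]])
w_bcrp = bcrp_cover(X)
print(w_bcrp)  # approximately [0.25, 0.75]
```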

The next theorem asserts that the expected out-of-sample log-return on the BCRP converges to the expected log-return on the LOP.

Theorem 5

\(\varphi _{n+1}(w^*_n)\rightarrow \varphi (w^*)\)

Proof

Theorem 4 and the Continuous Mapping Theorem reveal that \(\log w^{*\prime }_nx\rightarrow \log w^{*\prime }x\) for all \(x>{\varvec{0}}\). Further, we already know that there exists an integrable function \(x\mapsto g(x)\) such that \(|f_w(x)|\le g(x)\) for all \(w\in {\mathcal {S}}\) and \(x>{\varvec{0}}\). Hence, by the Dominated Convergence Theorem, we obtain

$$\begin{aligned} \varphi _{n+1}(w^*_n) = {\mathbf {E}}\big (\log w^{*\prime }_nX_{n+1}\big ) \rightarrow {\mathbf {E}}\big (\log w^{*\prime }X_{n+1}\big ) = {\mathbf {E}}\big (\log w^{*\prime }X\big )= \varphi (w^*)\,. \end{aligned}$$

\(\square \)

Finally, the in-sample average log-return on the BCRP also converges to the expected log-return on the LOP as the number of observations grows to infinity.

Theorem 6

\(\frac{1}{n}\sum _{t=1}^n \log w^{*\prime }_nX_t\rightarrow \varphi (w^*)\)

Proof

The statement is equivalent to \(|M_n(w^*_n)-M(w^*)|\rightarrow 0\). By the triangle inequality, it suffices to demonstrate that

$$\begin{aligned} \big |M_n(w^*_n)-M(w^*_n)\big | \rightarrow 0\qquad \text {and}\qquad \big |M(w^*_n)-M(w^*)\big | \rightarrow 0. \end{aligned}$$

The former is an immediate consequence of A7. Moreover, the Dominated Convergence Theorem tells us that \(M(w_n)\rightarrow M(w)\) for every sequence \(\{w_n\}\) with \(w_n\in {\mathcal {S}}\) such that \(w_n\rightarrow w\in {\mathcal {S}}\). This means that M is continuous at each \(w\in {\mathcal {S}}\). Theorem 4 and the Continuous Mapping Theorem complete the proof. \(\square \)

4.2.2 Asymptotic distribution

In this section, the asymptotic distribution of \(\sqrt{n}\,\big (w^*_n-w^*\big )\) is established. This can be done for all components of \(w^*\) that are not bounded by \({\mathcal {S}}\), i.e., those with \({\mathbf {E}}\big (X_{it}/w^{*\prime }X_t\big )=1\). As already explained at the end of Sect. 3, every other component of \(w^*\) is bounded by the simplex. If \(w^*_i=0\) represents such a component, i.e., \({\mathbf {E}}\big (X_{it}/w^{*\prime }X_t\big )<1\), it is well-known that

$$\begin{aligned} \sqrt{n}\,(w^*_{in}-w^*_i) = \sqrt{n}\,w^*_{in} \overset{{\tiny {\text{ p }}}}{\rightarrow }0\,, \end{aligned}$$

i.e., \(w^*_{in}\) is superconsistent. However, not all components of the LOP can be affected by the given constraints on the portfolio weights. Indeed, we must have that \({\mathbf {E}}\big (X_{it}/w^{*\prime }X_t\big )=1\) for at least one asset because otherwise the KKT conditions given by Theorem 1 cannot be satisfied. Thus, we can reduce the asset universe until there is no portfolio weight that is bounded by \({\mathcal {S}}\). The riskless asset need not be part of the reduced asset universe. However, in order to avoid the trivial solution \(w^*_n=1\), there should be at least two remaining assets in the universe.

Hence, we assume that the given asset universe has been reduced such that \({\mathbf {E}}\big (X_t/w^{*\prime }X_t\big )={\varvec{1}}\). This guarantees that

$$\begin{aligned} (w-w^*)'\nabla M(w^*) = (w-w^*)'{\mathbf {E}}\!\left( \frac{X}{w^{*\prime }X}\right) = 0\,. \end{aligned}$$

Here, we have used that \({\varvec{1}}'(w-w^*)=0\) for all \(w\in {\mathcal {S}}\). Hence, the function M can be locally approximated at \(w^*\) by

$$\begin{aligned} M(w) = M(w^*) + \frac{1}{2}\,(w-w^*)'\nabla ^2M(w^*)(w-w^*) + o\big (\Vert w-w^*\Vert ^2\big ). \end{aligned}$$

From the Monotone Convergence Theorem we conclude that the Hessian is given by

$$\begin{aligned} \nabla ^2M(w^*) = -\,{\mathbf {E}}\!\left( \frac{XX'}{(w^{*\prime }X)^2}\right) . \end{aligned}$$

The following assumption implies that the Hessian is finite. Moreover, A3 guarantees that \(\nabla ^2M(w^*)\) is negative definite.

A8.:

The second moments of \(X_t/(w^{*\prime }X_t)\) are finite.

Further, we have to make the following assumptions:

A9.:

The function \(f_w\) can be locally approximated at \(w^*\) by

$$\begin{aligned} f_w(X_t) = f_{w^*}(X_t) + (w-w^*)'\left( \frac{X_t}{w^{*\prime }X_t}\right) + \Vert w-w^*\Vert \,r(X_t;w), \end{aligned}$$

where the process \(\big \{r(X_t;w)\big \}\) is stochastically equicontinuous. This means that for every \(\epsilon >0\) and \(\eta >0\) there exists a neighborhood \({\mathcal {U}}\) of \(w^*\) in the simplex \({\mathcal {S}}\) such that

$$\begin{aligned} \limsup _{n\rightarrow \infty }\,{\mathbb {P}}^*\left( \sup _{w\in {\mathcal {U}}} \left| \sqrt{n}\left( \frac{1}{n}\sum _{t=1}^nr(X_t;w)-{\mathbf {E}}\big (r(X;w)\big )\right) \right| >\eta \right) < \epsilon , \end{aligned}$$

where \({\mathbb {P}}^*\) is an outer measure associated with \({\mathbb {P}}\).

A10.:

We have that

$$\begin{aligned} \sqrt{n}\,\left( \frac{1}{n}\sum _{t=1}^n\frac{X_t}{w^{*\prime }X_t}-{\varvec{1}}\right) \rightsquigarrow {\mathcal {N}}\big ({\varvec{0}},A\big ). \end{aligned}$$

Footnote 8

A9 is a basic regularity condition, which guarantees that the remainder \(r(X_t;w)\) of the linear approximation becomes negligible as \(n\rightarrow \infty \). To be more precise, it requires that

$$\begin{aligned}&\sqrt{n}\big (M_n(w)-M(w)\big ) \approx \sqrt{n}\big (M_n(w^*)-M(w^*)\big ) + \sqrt{n}\,(w-w^*)'\!\left( \frac{1}{n}\sum _{t=1}^n\frac{X_t}{w^{*\prime }X_t}-{\varvec{1}}\right) \end{aligned}$$

if the sample size, n, is large and w is close to \(w^*\).Footnote 9 Further, A10 says that the process \(\big \{X_t/w^{*\prime }X_t\big \}\) satisfies the Central Limit Theorem. In particular, if the elements of \(\big \{X_t\big \}\) are serially independent, we obtain the asymptotic covariance matrix

$$\begin{aligned} A = {\mathbf {Var}}\!\left( \frac{X}{w^{*\prime }X}\right) = {\mathbf {E}}\!\left( \frac{XX'}{(w^{*\prime }X)^2}\right) - {\varvec{1}}{\varvec{1}}'. \end{aligned}$$
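For serially independent observations, both A and the Hessian can be estimated by plug-in moments of \(X_t/w^{*\prime }X_t\). The sketch below does this for a discrete distribution given by support points and probabilities (the function name is hypothetical). It treats a hypothetical two-point distribution as the population: a riskless asset plus a risky asset whose price relative is 2 or 0.6 with probability 1/2 each. Maximizing \(\frac{1}{2}\log (1+b)+\frac{1}{2}\log (1-0.4b)\) gives the LOP \(w^*=(0.25,0.75)\). The output also illustrates that A is singular: since \(w^{*\prime }X/w^{*\prime }X=1\), we have \(Aw^*={\varvec{0}}\).

```python
import numpy as np


def hessian_and_A(X, w, probs=None):
    """Return H = E[XX'/(w'X)^2], A = Var(X/(w'X)) and E[X/(w'X)] for a discrete distribution.

    Rows of X are support points; probs are their probabilities
    (uniform by default, i.e., plug-in sample moments).
    """
    n = X.shape[0]
    p = np.full(n, 1.0 / n) if probs is None else np.asarray(probs)
    Y = X / (X @ w)[:, None]            # rows: X_t / (w'X_t)
    H = (Y * p[:, None]).T @ Y          # E[Y Y'] = -(Hessian of M at w)
    mean_Y = p @ Y                      # E[Y]; the vector of ones at the LOP
    A = H - np.outer(mean_Y, mean_Y)    # Var(Y)
    return H, A, mean_Y


X = np.array([[1.0, 2.0],
              [1.0, 0.6]])
w_star = np.array([0.25, 0.75])
H, A, mean_Y = hessian_and_A(X, w_star)
print(mean_Y)       # [1. 1.] up to rounding
print(A @ w_star)   # [0. 0.] up to rounding
```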

Nonetheless, we could also take any form of serial dependence into account, provided the Central Limit Theorem expressed by A10 is satisfied. There exist many strong mixing conditions that guarantee that this theorem holds true for the process \(\big \{X_t/w^{*\prime }X_t\big \}\) (see, e.g., Bradley 2005).

Suppose that \(\Theta \subseteq \mathbb {R}^d\) is any parameter set and let \(\theta \in \Theta \) be the “true” parameter. The tangent cone at \(\theta \) is the set that we obtain after centering \(\Theta \) at \(\theta \), blowing it up by some factor \(\tau >0\), and taking the set limit for \(\tau \rightarrow \infty \) (Geyer 1994, p. 1993). In order to study the asymptotic behavior of a sequence \(\{\theta _n\}\) of global optimizers that converges to \(\theta \), it is crucial to guarantee that the parameter set \(\Theta \) is Chernoff regular (Geyer 1994), viz.

$$\begin{aligned} \liminf _{\tau \rightarrow \infty } \tau \big (\Theta -\theta \big ) = \limsup _{\tau \rightarrow \infty } \tau \big (\Theta -\theta \big ) =: \lim _{\tau \rightarrow \infty } \tau \big (\Theta -\theta \big )\,. \end{aligned}$$

In our context, the parameter \(\theta \) corresponds to \(w^*\!\in {\mathcal {S}}\), which represents the global solution of the convex optimization problem expressed by Definition 1. The simplex \({\mathcal {S}}\) is Chernoff regular and so let \({\mathcal {T}}_{{\mathcal {S}}}(w^*):=\lim _{\tau \rightarrow \infty }\tau \big ({\mathcal {S}}-w^*\big )\) be the tangent cone of the simplex at \(w^*\).Footnote 10

Consider any random vector \(Y\sim {\mathcal {N}}\big ({\varvec{0}},A\big )\) and define the function

$$\begin{aligned} \zeta \mapsto \Psi _Y(\zeta ) := \zeta 'Y - \frac{1}{2}\,\zeta '{\mathbf {E}}\!\left( \frac{XX'}{(w^{*\prime }X)^2}\right) \zeta \,,\qquad \zeta \in \mathbb {R}^{N+1}. \end{aligned}$$

The (unique) maximizer of \(\Psi _Y\) is denoted by

$$\begin{aligned} \zeta ^* := {{\,\mathrm{arg\,max}\,}}_{\zeta \in {\mathcal {T}}_{{\mathcal {S}}}(w^*)}~\Psi _Y(\zeta )\,. \end{aligned}$$
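If every component of \(w^*\) is strictly positive, the tangent cone \({\mathcal {T}}_{{\mathcal {S}}}(w^*)\) is the hyperplane \(\{\zeta :{\varvec{1}}'\zeta =0\}\), and \(\zeta ^*\) has a closed form. Writing \(H={\mathbf {E}}\big (XX'/(w^{*\prime }X)^2\big )\), the first-order condition \(Y-H\zeta =\nu {\varvec{1}}\) together with \({\varvec{1}}'\zeta =0\) gives \(\zeta ^*=H^{-1}(Y-\nu {\varvec{1}})\) with \(\nu ={\varvec{1}}'H^{-1}Y/{\varvec{1}}'H^{-1}{\varvec{1}}\). The sketch below covers only this interior case and uses a hypothetical two-point distribution (a riskless asset plus a risky asset whose price relative is 2 or 0.6 with probability 1/2 each, so that \(w^*=(0.25,0.75)\)). It computes \(\zeta ^*\) and draws from its distribution by Monte Carlo; the function names are illustrative.

```python
import numpy as np


def zeta_star(Y, H):
    """Maximize zeta'Y - 0.5 zeta'H zeta subject to 1'zeta = 0 (Lagrange multiplier nu)."""
    ones = np.ones(H.shape[0])
    Hinv_Y = np.linalg.solve(H, Y)
    Hinv_1 = np.linalg.solve(H, ones)
    nu = Hinv_Y.sum() / Hinv_1.sum()    # multiplier of the constraint 1'zeta = 0
    return Hinv_Y - nu * Hinv_1


def psi(zeta, Y, H):
    """The function Psi_Y from the text."""
    return zeta @ Y - 0.5 * zeta @ H @ zeta


X = np.array([[1.0, 2.0], [1.0, 0.6]])
w_star = np.array([0.25, 0.75])
Yt = X / (X @ w_star)[:, None]          # X / (w*'X) at both support points
H = Yt.T @ Yt / 2.0                     # E[XX'/(w*'X)^2]
A = H - np.ones((2, 2))                 # Var(X/(w*'X)), since E[X/(w*'X)] = 1

# Y ~ N(0, A) via a symmetric square root; A is singular because A w* = 0.
vals, vecs = np.linalg.eigh(A)
root = vecs * np.sqrt(np.clip(vals, 0.0, None))
rng = np.random.default_rng(1)
Y_draws = rng.standard_normal((10_000, 2)) @ root.T
zetas = np.array([zeta_star(y, H) for y in Y_draws])
print(zetas.mean(axis=0))               # close to 0 up to Monte Carlo error
print(np.cov(zetas, rowvar=False))      # Monte Carlo covariance of the limit law
```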

The following theorem describes the asymptotic behavior of the BCRP.

Theorem 7

We have that

$$\begin{aligned} \sqrt{n}\,\big (w^*_n-w^*\big )\rightsquigarrow {{\,\mathrm{arg\,max}\,}}_{\zeta \in {\mathcal {T}}_{{\mathcal {S}}}(w^*)}~\zeta '{\mathcal {N}}\big ({\varvec{0}},A\big )-\frac{1}{2}\,\zeta '{\mathbf {E}}\!\left( \frac{XX'}{(w^{*\prime }X)^2}\right) \zeta \,. \end{aligned}$$

Proof

The theorem asserts that \(\sqrt{n}\,\big (w^*_n-w^*\big )\rightsquigarrow \zeta ^*\), which is an immediate consequence of Theorem 4.4 in Geyer (1994). \(\square \)

Hence, if the sample size is large, \(\sqrt{n}\,\big (w^*_n-w^*\big )\) behaves essentially like the solution of a relatively simple quadratic optimization problem. In the case in which the elements of \(\big \{X_t\big \}\) are serially independent, we obtain

$$\begin{aligned} \sqrt{n}\,\big (w^*_n-w^*\big )\rightsquigarrow {{\,\mathrm{arg\,max}\,}}_{\zeta \in {\mathcal {T}}_{{\mathcal {S}}}(w^*)}~ \zeta '{\mathcal {N}}\left( {\varvec{0}},{\mathbf {Var}}\!\left( \frac{X}{w^{*\prime }X}\right) \right) -\frac{1}{2}\,\zeta '{\mathbf {E}}\!\left( \frac{XX'}{(w^{*\prime }X)^2}\right) \zeta \,. \end{aligned}$$

The following corollary establishes the long-run distribution of the log-return on the BCRP relative to the log-return on the LOP.

Corollary 1

$$\begin{aligned} \log \frac{V_{w^*_nn}}{V_{w^*n}} \rightsquigarrow \zeta ^{*\prime }{\mathcal {N}}\big ({\varvec{0}},A\big ) - \frac{1}{2}\,\zeta ^{*\prime }{\mathbf {E}}\!\left( \frac{XX'}{(w^{*\prime }X)^2}\right) \zeta ^*. \end{aligned}$$

Proof

Note that

$$\begin{aligned} M_n(w^*_n) = \frac{1}{n}\sum _{t=1}^n \log w^{*\prime }_nX_t\qquad \text {and}\qquad M_n(w^*) = \frac{1}{n}\sum _{t=1}^n \log w^{*\prime }X_t, \end{aligned}$$

i.e.,

$$\begin{aligned} n\big (M_n(w^*_n)-M_n(w^*)\big ) = \sum _{t=1}^n \log \frac{w^{*\prime }_nX_t}{w^{*\prime }X_t} = \log \frac{V_{w^*_nn}}{V_{w^*n}}. \end{aligned}$$

The rest of the proof follows from Theorem 4.4 in Geyer (1994). \(\square \)

This completes our analysis of the BCRP. In the next section we focus on the MVE and derive its corresponding statistical properties.

5 The mean-variance estimator

Consider some portfolio \(w\in {\mathcal {S}}\) and let \({\tilde{w}}=(w_1,w_2,\ldots ,w_N)\) be the “risky part” of that portfolio. The return on Asset i after Day t is given by \(R_{it}=X_{it}-1\) and so the return on w amounts to \(R_{wt}={\tilde{w}}'R_t\), where \(R_t=(R_{1t},R_{2t},\ldots ,R_{Nt})\) denotes the vector of risky asset returns.Footnote 11 The assumptions A1 to A10 shall still hold true. Now, we make the following additional assumption:

B1.:

The second moments of \(R_t\) are finite.

Let R be any random vector that has the same distribution as \(R_1,R_2,\ldots \,\). Define \(\mu :={\mathbf {E}}(R)\) and \(\Sigma :={\mathbf {E}}(RR')\). Note that the matrix \(\Sigma \) contains the second noncentral moments of the risky asset returns and thus it is not the covariance matrix of R. We already know that A3 guarantees that there cannot be any \({\tilde{w}}\in \mathbb {R}^N\) with \({\tilde{w}}\ne {\varvec{0}}\) such that \({\tilde{w}}'R=c\in \mathbb {R}\), i.e., \(\Sigma \) is positive definite.

Now, we may apply the quadratic approximation \(\log (1+r)\approx r-\frac{1}{2}r^2\) and come to the conclusion that

$$\begin{aligned} {\mathbf {E}}\big (\log (1+R_{wt})\big ) \approx {\mathbf {E}}\left( {\tilde{w}}'R_t-\frac{1}{2}\,({\tilde{w}}'R_t)^2\right) = {\tilde{w}}'\mu - \frac{1}{2}\,{\tilde{w}}'\Sigma \,{\tilde{w}}\,. \end{aligned}\tag{2}$$
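To get a feel for the quality of the quadratic approximation behind (2), the following sketch compares \(\log (1+r)\) with \(r-\frac{1}{2}r^2\) on a grid of returns between \(-5\%\) and \(5\%\), and then compares both sides of (2) for a single asset with simulated returns. The normal distribution with mean \(0.05\%\) and volatility \(1\%\) is a hypothetical choice for daily returns.

```python
import numpy as np

# Grid of hypothetical daily returns between -5% and +5%.
r = np.linspace(-0.05, 0.05, 1001)
err = np.abs(np.log1p(r) - (r - 0.5 * r**2))   # error of log(1+r) ~ r - r^2/2
print(err.max())                               # largest error, attained at r = -0.05

# Both sides of (2) for one asset with simulated returns (hypothetical parameters).
rng = np.random.default_rng(0)
R = rng.normal(0.0005, 0.01, size=100_000)
lhs = np.mean(np.log1p(R))                     # sample analogue of E(log(1 + R))
rhs = R.mean() - 0.5 * np.mean(R**2)           # sample analogue of mu - Sigma/2
print(lhs, rhs, lhs - rhs)
```

The error grows roughly like \(|r|^3/3\), so it stays tiny for typical daily returns.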

This section is built upon the observation that this approximation is very good in most practical applications. Hence, instead of maximizing the expected log-return, we can simply maximize the objective function \(w\mapsto {\tilde{w}}'\mu -\frac{1}{2}\,{\tilde{w}}'\Sigma \,{\tilde{w}}\).Footnote 12 In the following, this objective function is called “mean-variance” although \(\Sigma \) contains the second noncentral moments of R and so it does not coincide with the covariance matrix \({\mathbf {Var}}(R)\).

Definition 3

A mean-variance optimal portfolio is a portfolio \(w^\star \!\in {\mathcal {S}}\) that maximizes the mean-variance objective function, i.e.,

$$\begin{aligned} w^\star \in {{\,\mathrm{arg\,max}\,}}_{w\in {\mathcal {S}}}~{\tilde{w}}'\mu - \frac{1}{2}\,{\tilde{w}}'\Sigma \,{\tilde{w}}\,. \end{aligned}$$

Some important remarks may be appropriate at this point:

  • The vector \(w^\star \) is called mean-variance optimal, although \(\Sigma \) is not a covariance matrix. However, in most practical applications, \(\Sigma \) is close to \({\mathbf {Var}}(R)\) whenever R is a vector of daily asset returns.

  • We focus on the feasible set \({\mathcal {S}}\) only because \(w^\star \) serves as an approximation of the LOP. However, in general a mean-variance optimal portfolio need not be restricted to \({\mathcal {S}}\).

  • Under general (but quite technical) regularity conditions, the MVOP can be considered an approximation of the GOP (Karatzas and Kardaras 2007). Nonetheless, due to the reasons explained in Sect. 3, we should refrain from calling \(w^\star \) “GOP.”

Now, the Lagrange function of the optimization problem expressed by Definition 3 is

$$\begin{aligned} {\mathcal {L}}(w,\kappa ,\lambda ) = -\,{\tilde{w}}'\mu + \frac{1}{2}\,{\tilde{w}}'\Sigma \,{\tilde{w}} - \kappa 'w + \lambda ({\varvec{1}}'w-1). \end{aligned}$$

The following theorem is analogous to Theorem 2.

Theorem 8

The MVOP exists and is unique. It is characterized by \(w^\star \!\in {\mathcal {S}}\) such that the ith component of \(\mu -\Sigma \,{\tilde{w}}^\star \) is

$$\begin{aligned} \left\{ \begin{array}{ll} = \lambda , &{} w^\star _i>0 \\ \le \lambda , &{} w^\star _i=0 \\ \end{array},\qquad \lambda \ge 0\,. \right. \end{aligned}$$

Proof

The objective function

$$\begin{aligned} {\tilde{w}} \mapsto {\tilde{w}}'\mu - \frac{1}{2}\,{\tilde{w}}'\Sigma \,{\tilde{w}} \end{aligned}$$

is strictly concave and the given set of constraints on the portfolio weights \(w_1,w_2,\ldots ,w_N\), i.e., \({\tilde{w}}\ge {\varvec{0}}\) and \({\varvec{1}}'{\tilde{w}}\le 1\), is compact and convex. Hence, the “risky part” of \(w^\star \), i.e., \({\tilde{w}}^\star \), exists and is unique, which means that \(w^\star \) exists and is unique, too. Since all constraints are linear, the KKT conditions are necessary and sufficient for optimality. Thus, we must have that

$$\begin{aligned} \begin{bmatrix} 0 \\ \mu - \Sigma \,{\tilde{w}}^\star \\ \end{bmatrix} = \lambda {\varvec{1}} - \kappa \end{aligned}$$

with \(w^\star \!\in {\mathcal {S}}\), \(\lambda \in \mathbb {R}\), \(\kappa =(\kappa _0,\kappa _1,\ldots ,\kappa _N)\ge {\varvec{0}}\), and \(w^\star _i\kappa _i=0\) for \(i=0,1,\ldots ,N\). It follows that \(\lambda =\kappa _0\ge 0\). \(\square \)

The next corollary shows how to identify the components of \(w^\star \) that are bounded by \({\mathcal {S}}\). This will be helpful later on.

Corollary 2

The number \(\lambda \) in Theorem 8 is uniquely determined by \(\lambda ={\tilde{w}}^{\star \prime }\big (\mu -\Sigma \,{\tilde{w}}^\star \big )\). Moreover, the portfolio weight

  • \(w^\star _0\) is bounded by \({\mathcal {S}}\) if and only if \(\lambda >0\), whereas

  • \(w^\star _i\) is bounded by \({\mathcal {S}}\) if and only if the ith component of \(\mu -\Sigma \,{\tilde{w}}^\star \) is less than \(\lambda \).

Proof

The proof of Theorem 8 reveals that \({\tilde{w}}^{\star \prime }\big (\mu -\Sigma \,{\tilde{w}}^\star \big )=\lambda =\kappa _0\). Since \(w^\star \) is unique, the same holds true for \(\lambda \). Moreover, \(w^\star _0\) is bounded by \({\mathcal {S}}\) if and only if \(\kappa _0>0\), i.e., \(\lambda >0\), whereas \(w^\star _i\) is bounded by \({\mathcal {S}}\) if and only if \(\kappa _i>0\), i.e., the ith component of \(\mu -\Sigma \,{\tilde{w}}^\star \) is below \(\lambda \). \(\square \)

In the following, let

$$\begin{aligned} \mu _n := \frac{1}{n}\sum _{t=1}^n R_t\qquad \text {and}\qquad \Sigma _n := \frac{1}{n}\sum _{t=1}^n R_tR'_t \end{aligned}$$

be the moment estimators for \(\mu \) and \(\Sigma \). Now, we are ready to define the MVE for \(w^\star \), which also serves as an estimator for the LOP \(w^*\).

Definition 4

A mean-variance estimator for \(w^\star \) is a portfolio \(w^\star _n\in {\mathcal {S}}\) that maximizes the in-sample mean-variance objective function, i.e.,

$$\begin{aligned} w^\star _n \in {{\,\mathrm{arg\,max}\,}}_{w\in {\mathcal {S}}}~{\tilde{w}}'\mu _n - \frac{1}{2}\,{\tilde{w}}'\Sigma _n{\tilde{w}}\,. \end{aligned}$$

5.1 Small-sample properties

5.1.1 Existence and uniqueness

Let \({\mathbf {R}}=\big [R_1~R_2~\ldots ~R_n\big ]\) be the sample of risky asset returns. A4 implies that we cannot find any \({\tilde{w}}\in \mathbb {R}^N\) with \({\tilde{w}}\ne {\varvec{0}}\) such that \({\mathbf {R}}'{\tilde{w}}={\varvec{0}}\). Hence, we have that

$$\begin{aligned} {\tilde{w}}'\Sigma _n{\tilde{w}} = {\tilde{w}}'\left( \frac{1}{n}\sum _{t=1}^n R_tR'_t\right) {\tilde{w}} = \frac{{\tilde{w}}'{\mathbf {R}}{\mathbf {R}}'{\tilde{w}}}{n} > 0 \end{aligned}$$

for all \({\tilde{w}}\in \mathbb {R}^N\) with \({\tilde{w}}\ne {\varvec{0}}\), which means that \(\Sigma _n\) is positive definite.

The following corollary is a straightforward consequence of Theorem 8 and thus its proof can be skipped.

Corollary 3

The MVE exists and is unique. It is characterized by \(w^\star _n\in {\mathcal {S}}\) such that the ith component of \(\mu _n-\Sigma _n{\tilde{w}}^\star _n\) is

$$\begin{aligned} \left\{ \begin{array}{ll} = \lambda _n, &{} w^\star _{in}>0 \\ \le \lambda _n, &{} w^\star _{in}=0 \\ \end{array},\qquad \lambda _n\ge 0\,. \right. \end{aligned}$$
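Corollary 3 can be checked numerically. The sketch below computes \(w^\star _n\) by projected gradient ascent on the simplex, a generic method chosen here for transparency rather than speed, using the sort-based Euclidean projection onto the simplex. It then evaluates \(\lambda _n={\tilde{w}}^{\star \prime }_n(\mu _n-\Sigma _n{\tilde{w}}^\star _n)\), the sample analogue of Corollary 2, and reports the largest violation of the conditions in Corollary 3 (including \(\lambda _n=0\) whenever the riskless weight is positive, as in the proof of Theorem 8). The simulated returns and all function names are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, 1'w = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)


def mean_variance_estimator(R, iters=5_000):
    """MVE on the simplex of N+1 assets (index 0 = riskless) by projected gradient ascent."""
    n, N = R.shape
    mu_n = R.mean(axis=0)                             # moment estimator of mu
    Sigma_n = R.T @ R / n                             # noncentral second moments
    step = 1.0 / np.linalg.eigvalsh(Sigma_n).max()    # 1/L, L = Lipschitz constant of the gradient
    w = np.full(N + 1, 1.0 / (N + 1))
    for _ in range(iters):
        grad = np.concatenate(([0.0], mu_n - Sigma_n @ w[1:]))  # riskless weight: zero gradient
        w = project_to_simplex(w + step * grad)
    return w, mu_n, Sigma_n


def kkt_violation(w, mu_n, Sigma_n):
    """Largest violation of the conditions in Corollary 3 (0 means they hold exactly)."""
    g = mu_n - Sigma_n @ w[1:]
    lam = w[1:] @ g                                   # lambda_n as in Corollary 2
    active = w[1:] > 0
    return max(
        np.abs(g[active] - lam).max(initial=0.0),             # "= lambda_n" where w_i > 0
        np.maximum(g[~active] - lam, 0.0).max(initial=0.0),   # "<= lambda_n" where w_i = 0
        max(-lam, 0.0),                                        # lambda_n >= 0
        abs(lam) if w[0] > 0 else 0.0,                         # lambda_n = 0 if w_0 > 0
    )


rng = np.random.default_rng(42)
# Hypothetical daily returns on N = 3 risky assets (independent normals).
R = rng.normal([0.0004, 0.0002, 0.0001], [0.010, 0.012, 0.008], size=(1000, 3))
w_mve, mu_n, Sigma_n = mean_variance_estimator(R)
print(w_mve.round(4), kkt_violation(w_mve, mu_n, Sigma_n))
```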

Numerical procedures for solving quadratic optimization problems exist in abundance and so it is easy to compute \(w^\star _n\) even if the number of dimensions is high. Two points, which are discussed in more detail in Sect. 6, are worth emphasizing:

  (i) The estimates \(w^*_{in}\) and \(w^\star _{in}\) are indistinguishable in most real-life situations.Footnote 13 Put another way, the MVE leads to a very good approximation of the BCRP.

  (ii) Cover’s (1984) algorithm for \(w^*_n\) is slow compared to quadratic optimization algorithms for \(w^\star _n\), particularly in the high-dimensional case.
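Both points can be illustrated with a small simulation, sketched below under hypothetical assumptions: a riskless asset with zero return and two risky assets with normally distributed returns (volatility \(5\%\) per period), recentered so that their sample means are exactly \(0.06\%\) and \(0.09\%\). With these values the MVE is interior, so by the sample analogues of Corollaries 2 and 3 it reduces to \({\tilde{w}}^\star _n=\Sigma _n^{-1}\mu _n\) with \(\lambda _n=0\); the code raises an error if the candidate is not interior. The BCRP is obtained by Cover’s multiplicative update. The last line shows how far the update still is from its limit after 1,000 iterations, which gives an impression of its slow convergence in this example.

```python
import numpy as np


def bcrp_cover(X, iters):
    """Cover's multiplicative update for the BCRP; rows of X are price relatives."""
    w = np.full(X.shape[1], 1.0 / X.shape[1])
    for _ in range(iters):
        w = w * (X / (X @ w)[:, None]).mean(axis=0)
    return w


def mve_interior(R):
    """MVE if it is interior (lambda_n = 0): w~ = Sigma_n^{-1} mu_n, w_0 = 1 - 1'w~."""
    mu_n = R.mean(axis=0)
    Sigma_n = R.T @ R / R.shape[0]            # noncentral second moments
    w_tilde = np.linalg.solve(Sigma_n, mu_n)
    if not (np.all(w_tilde > 0) and w_tilde.sum() < 1):
        raise ValueError("MVE is not interior; use a quadratic programming solver instead")
    return np.concatenate(([1.0 - w_tilde.sum()], w_tilde))


rng = np.random.default_rng(7)
n = 1000
Z = rng.normal(0.0, 0.05, size=(n, 2))
R = Z - Z.mean(axis=0) + np.array([0.0006, 0.0009])  # hypothetical returns, sample means fixed
X = np.column_stack([np.ones(n), 1.0 + R])          # price relatives, riskless asset first

w_mve = mve_interior(R)
w_bcrp = bcrp_cover(X, iters=30_000)
print("MVE :", w_mve.round(4))
print("BCRP:", w_bcrp.round(4))
print("gap after 1,000 Cover iterations:", np.abs(bcrp_cover(X, 1_000) - w_bcrp).max())
```

The closed form is used only because the MVE happens to be interior here; in general a quadratic programming solver is needed.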

5.1.2 Finite-sample bias

Let \(w_n\) be any portfolio that is constructed on the basis of the asset returns \(R_1,R_2,\ldots ,R_n\). We know that the quantity \({\tilde{w}}'_nR_{n+1}-\frac{1}{2}\big ({\tilde{w}}'_nR_{n+1}\big )^2\) approximates the out-of-sample log-return on \(w_n\) and thus we call

$$\begin{aligned} {\mathbf {E}}\left( {\tilde{w}}'_nR_{n+1} - \frac{1}{2}\,\big ({\tilde{w}}'_nR_{n+1}\big )^2\right) \end{aligned}$$

the expected out-of-sample performance of \(w_n\). As mentioned before, it is reasonable to presume that \(w_n\) is stochastically independent of \(R_{n+1}\). Otherwise, the investment decision at time n would depend on asset returns that occur one day later, which is usually considered implausible in finance theory. Thus, we obtain the conditional expectation

$$\begin{aligned} {\mathbf {E}}\left( {\tilde{w}}'_nR_{n+1}-\frac{1}{2}\big ({\tilde{w}}'_nR_{n+1}\big )^2\,|\,w_n\right)= & {} {\tilde{w}}'_n\underset{=\,\mu }{\underbrace{{\mathbf {E}}\big (R_{n+1}\,|\,w_n\big )}} - \frac{1}{2}\,{\tilde{w}}'_n\underset{=\,\Sigma }{\underbrace{{\mathbf {E}}\big (R_{n+1}R'_{n+1}\,|\,w_n\big )}}{\tilde{w}}_n \\= & {} {\tilde{w}}'_n\mu -\frac{1}{2}\,{\tilde{w}}'_n\Sigma \,{\tilde{w}}_n, \end{aligned}$$

which can be viewed as the out-of-sample performance of \(w_n\). Correspondingly, due to the Law of Total Expectation, its expected out-of-sample performance is

$$\begin{aligned} \phi _{n+1}(w_n) := {\mathbf {E}}\left( {\tilde{w}}'_nR_{n+1}-\frac{1}{2}\big ({\tilde{w}}'_nR_{n+1}\big )^2\right) = {\mathbf {E}}\left( {\tilde{w}}'_n\mu -\frac{1}{2}\,{\tilde{w}}'_n\Sigma \,{\tilde{w}}_n\right) . \end{aligned}$$

The latter expectation is a basic performance measure in portfolio optimization (see, e.g., Frahm 2015; Kan and Zhou 2007; Markowitz and Usmen 2003).Footnote 14 Hence, as already mentioned before, it is an implicit assumption of portfolio theory that \(w_n\) is stochastically independent of \(R_{n+1}\).

If \(w_n\equiv w\) is a fixed portfolio, we have that \(\phi _{n+1}(w)={\tilde{w}}'\mu -\frac{1}{2}\,{\tilde{w}}'\Sigma \,{\tilde{w}}\). In this case we may drop the prefix “expected out-of-sample” and just say that \(\phi _{n+1}(w)\) is the performance of w. Further, we can then simply write \(\phi (w)\) instead of \(\phi _{n+1}(w)\). In particular,

$$\begin{aligned} \phi (w^\star ) = {\tilde{w}}^{\star \prime }\mu -\frac{1}{2}\,{\tilde{w}}^{\star \prime }\Sigma \,{\tilde{w}}^\star \end{aligned}$$

represents the performance of the MVOP.

Hence, the following assumptions, which are analogous to A5 and A6, are made:

B2.:

The MVE \(w^\star _n\) is stochastically independent of \(R_{n+1}\).

B3.:

The MVE does not coincide with the MVOP, i.e., \({\mathbb {P}}(w^\star _n=w^\star )\ne 1\).

Due to B2 the expected out-of-sample performance of the MVE amounts to

$$\begin{aligned} \phi _{n+1}(w^\star _n) = {\mathbf {E}}\left( {\tilde{w}}^{\star \prime }_n\mu -\frac{1}{2}\,{\tilde{w}}^{\star \prime }_n\Sigma \,{\tilde{w}}^\star _n\right) . \end{aligned}$$

Finally, \({\tilde{w}}'_n\mu _n-\frac{1}{2}{\tilde{w}}'_n\Sigma _n{\tilde{w}}_n\) represents the in-sample performance of the portfolio \(w_n\) and thus

$$\begin{aligned} \phi _n(w^\star _n) := {\mathbf {E}}\left( {\tilde{w}}^{\star \prime }_n\mu _n - \frac{1}{2}\,{\tilde{w}}^{\star \prime }_n\Sigma _n{\tilde{w}}^\star _n\right) \end{aligned}$$

is the expected in-sample performance of the MVE.

The following theorem is similar to Theorem 3.

Theorem 9

\(\phi _{n+1}(w^\star _n)<\phi (w^\star )<\phi _n(w^\star _n)\)

Proof

By definition, \(w^\star \) is the portfolio that maximizes the performance. Due to B2 and B3, we conclude that

$$\begin{aligned} \phi _{n+1}(w^\star _n) = {\mathbf {E}}\left( {\tilde{w}}^{\star \prime }_n\mu -\frac{1}{2}\,{\tilde{w}}^{\star \prime }_n\Sigma \,{\tilde{w}}^\star _n\right) < {\tilde{w}}^{\star \prime }\mu - \frac{1}{2}\,{\tilde{w}}^{\star \prime }\Sigma \,{\tilde{w}}^\star = \phi (w^\star )\,. \end{aligned}$$

Moreover, since \(w^\star _n\) is unique and, due to B3, does not coincide with \(w^\star \) with positive probability, we have that

$$\begin{aligned} {\mathbb {P}}\left( {\tilde{w}}^{\star \prime }_n\mu _n - \frac{1}{2}\,{\tilde{w}}^{\star \prime }_n\Sigma _n{\tilde{w}}^\star _n\ge {\tilde{w}}^{\star \prime }\mu _n - \frac{1}{2}\,{\tilde{w}}^{\star \prime }\Sigma _n{\tilde{w}}^\star \right) = 1 \end{aligned}$$

and

$$\begin{aligned} {\mathbb {P}}\left( {\tilde{w}}^{\star \prime }_n\mu _n - \frac{1}{2}\,{\tilde{w}}^{\star \prime }_n\Sigma _n{\tilde{w}}^\star _n>{\tilde{w}}^{\star \prime }\mu _n - \frac{1}{2}\,{\tilde{w}}^{\star \prime }\Sigma _n{\tilde{w}}^\star \right) > 0\,, \end{aligned}$$

which means that

$$\begin{aligned} \phi _n(w^\star _n)= & {} {\mathbf {E}}\left( {\tilde{w}}^{\star \prime }_n\mu _n - \frac{1}{2}\,{\tilde{w}}^{\star \prime }_n\Sigma _n{\tilde{w}}^\star _n\right) > {\mathbf {E}}\left( {\tilde{w}}^{\star \prime }\mu _n - \frac{1}{2}\,{\tilde{w}}^{\star \prime }\Sigma _n{\tilde{w}}^\star \right) \\= & {} {\tilde{w}}^{\star \prime }\mu - \frac{1}{2}\,{\tilde{w}}^{\star \prime }\Sigma \,{\tilde{w}}^\star = \phi (w^\star )\,. \end{aligned}$$

\(\square \)

Theorem 9 shows that the MVE suffers from the same problems that we have already found for the BCRP. This means that, on average, the in-sample performance of the MVE overestimates its expected out-of-sample performance and even the performance of the MVOP.
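Theorem 9 can be illustrated by simulation. The sketch below draws many samples of \(n=60\) hypothetical daily returns on two risky assets from a normal distribution, computes the MVE in each sample, and records its in-sample performance, its out-of-sample performance \({\tilde{w}}^{\star \prime }_n\mu -\frac{1}{2}{\tilde{w}}^{\star \prime }_n\Sigma \,{\tilde{w}}^\star _n\) given \(w^\star _n\) (using B2), and the in-sample performance of the MVOP. Because N is tiny, the quadratic program is solved exactly by enumerating the faces of the simplex, an approach chosen here for exactness that is only practical for a handful of assets. The parameter values and function names are arbitrary.

```python
from itertools import combinations

import numpy as np


def perf(w, mu, Sigma):
    """Mean-variance objective w~'mu - 0.5 w~'Sigma w~ (w[0] is the riskless weight)."""
    wt = w[1:]
    return wt @ mu - 0.5 * wt @ Sigma @ wt


def mve_exact(mu, Sigma):
    """Exact maximizer over the simplex by enumerating its faces (small N only)."""
    N = mu.size
    best_w, best_val = None, -np.inf
    for size in range(1, N + 2):
        for face in combinations(range(N + 1), size):
            risky = [i - 1 for i in face if i > 0]      # positions in mu and Sigma
            w = np.zeros(N + 1)
            if risky:
                S, m = Sigma[np.ix_(risky, risky)], mu[risky]
                x = np.linalg.solve(S, m)                # unconstrained on the face
                if 0 not in face:                        # no riskless asset: impose 1'x = 1
                    s1 = np.linalg.solve(S, np.ones(len(risky)))
                    x = x - (x.sum() - 1.0) / s1.sum() * s1
                w[[i + 1 for i in risky]] = x
            w[0] = 1.0 - w[1:].sum()
            if np.all(w >= -1e-12):                      # keep feasible candidates only
                val = perf(w, mu, Sigma)
                if val > best_val:
                    best_w, best_val = w, val
    return best_w


mu = np.array([0.0002, 0.00015])                 # hypothetical daily mean returns
Gamma = np.array([[4.0e-4, 1.0e-4],
                  [1.0e-4, 3.0e-4]])             # hypothetical covariance matrix
Sigma = Gamma + np.outer(mu, mu)                 # noncentral second moments E(RR')
w_opt = mve_exact(mu, Sigma)                     # MVOP
phi_opt = perf(w_opt, mu, Sigma)                 # phi(w*)

rng = np.random.default_rng(2024)
n, reps = 60, 2000
in_s, out_s, in_opt = np.empty(reps), np.empty(reps), np.empty(reps)
for k in range(reps):
    R = rng.multivariate_normal(mu, Gamma, size=n)
    mu_n, Sigma_n = R.mean(axis=0), R.T @ R / n
    w_n = mve_exact(mu_n, Sigma_n)
    in_s[k] = perf(w_n, mu_n, Sigma_n)           # in-sample performance of the MVE
    out_s[k] = perf(w_n, mu, Sigma)              # out-of-sample performance given w_n
    in_opt[k] = perf(w_opt, mu_n, Sigma_n)       # in-sample performance of the MVOP

# Theorem 9 predicts left < middle < right (averages are subject to Monte Carlo error).
print(out_s.mean(), phi_opt, in_s.mean())
```

In every replication, the out-of-sample performance cannot exceed \(\phi (w^\star )\) and the in-sample performance of the MVE cannot fall below that of the MVOP; the strict inequalities of Theorem 9 appear in the averages.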

5.2 Large-sample properties

5.2.1 Consistency

The next assumption requires that \(\big \{R_t\big \}\) and \(\big \{R_tR'_t\big \}\) obey the Strong Law of Large Numbers. This holds true under very mild regularity conditions. If \(R_1,R_2,\ldots \) are serially independent, B1 is already sufficient. However, there exist much weaker mixing conditions, which guarantee that the Strong Law of Large Numbers is satisfied both for \(\big \{R_t\big \}\) and for \(\big \{R_tR'_t\big \}\). These mixing conditions are typically discussed in ergodic theory (see, e.g., Davidson 1994).

B4.:

The estimators \(\mu _n\) and \(\Sigma _n\) are strongly consistent for \(\mu \) and \(\Sigma \), i.e., \(\mu _n\rightarrow \mu \) and \(\Sigma _n\rightarrow \Sigma \).

Theorem 10

\(w^\star _n\rightarrow w^\star \)

Proof

Note that

$$\begin{aligned} w^\star _n = {{\,\mathrm{arg\,max}\,}}_{w\in {\mathcal {S}}}~{\tilde{w}}'\mu _n - \frac{1}{2}\,{\tilde{w}}'\Sigma _n{\tilde{w}} \end{aligned}$$

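represents a function of \(\mu _n\) and \(\Sigma _n\). Since \({\mathcal {S}}\) is compact and convex and the objective function is strictly concave whenever \(\Sigma _n\) is positive definite, Berge’s Maximum Theorem implies that this function is continuous at \((\mu ,\Sigma )\). From B4 and the Continuous Mapping Theorem it follows that \(w^\star _n\rightarrow w^\star \). \(\square \)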

The next theorem is analogous to Theorem 5.

Theorem 11

\(\phi _{n+1}(w^\star _n)\rightarrow \phi (w^\star )\)

Proof

The objective function \(w\mapsto {\tilde{w}}'\mu -\frac{1}{2}\,{\tilde{w}}'\Sigma \,{\tilde{w}}\) is continuous in \(w\in {\mathcal {S}}\) and the set \({\mathcal {S}}\) is compact. From the Extreme Value Theorem we conclude that it has a minimum a and a maximum b. Hence, the constant \(\max \big \{|a|,|b|\big \}\) is a dominating function and it is clearly integrable. We already know that \(w^\star _n\rightarrow w^\star \) and from the Dominated Convergence Theorem it follows that

$$\begin{aligned} \phi _{n+1}(w^\star _n) = {\mathbf {E}}\!\left( {\tilde{w}}^{\star \prime }_n\mu -\frac{1}{2}\,{\tilde{w}}^{\star \prime }_n\Sigma \,{\tilde{w}}^\star _n\right) \rightarrow {\tilde{w}}^{\star \prime }\mu -\frac{1}{2}\,{\tilde{w}}^{\star \prime }\Sigma \,{\tilde{w}}^\star = \phi (w^\star )\,. \end{aligned}$$

\(\square \)

Moreover, analogous to Theorem 6, the Continuous Mapping Theorem immediately implies that

$$\begin{aligned} {\tilde{w}}^{\star \prime }_n\mu _n-\frac{1}{2}\,{\tilde{w}}^{\star \prime }_n\Sigma _n{\tilde{w}}^\star _n \rightarrow {\tilde{w}}^{\star \prime }\mu - \frac{1}{2}\,{\tilde{w}}^{\star \prime }\Sigma \,{\tilde{w}}^\star = \phi (w^\star ), \end{aligned}$$

i.e., the in-sample performance of the MVE converges to the performance of the MVOP.

5.2.2 Asymptotic distribution

Now, the asymptotic distribution of \(\sqrt{n}\,(w^\star _n-w^\star )\) is derived. If some portfolio weight \(w^\star _i\) is bounded by \({\mathcal {S}}\), it must be zero and the associated MVE is superconsistent, i.e., \(\sqrt{n}\,w^\star _{in}\overset{{\tiny {\text{ p }}}}{\rightarrow }0\). Hence, in order to derive the asymptotic distribution of \(\sqrt{n}\,(w^\star _n-w^\star )\), we must guarantee that no component of the MVOP \(w^\star \) is bounded by \({\mathcal {S}}\). According to Corollary 2, this holds true if and only if \(\mu -\Sigma \,{\tilde{w}}^\star ={\varvec{0}}\), i.e., \({\tilde{w}}^\star =\Sigma ^{-1}\mu \). However, in practical situations it often happens that the weight of the riskless asset, \(w^\star _0\), is bounded by \({\mathcal {S}}\), which means that the Lagrange multiplier \(\lambda \) in Theorem 8 is positive. In this case, we must remove the riskless asset from our asset universe and focus on the risky assets. Then the MVOP is simply characterized by \({\tilde{w}}^\star \!\in {\mathcal {S}}\) such that the ith component of \(\mu -\Sigma \,{\tilde{w}}^\star \) is

$$\begin{aligned} \left\{ \begin{array}{ll} = \lambda , &{}\quad w^\star _i>0 \\ \le \lambda , &{}\quad w^\star _i=0 \\ \end{array} \right. \end{aligned}$$

with \(\lambda >0\). Thus, in the case in which the riskless asset has been removed, we assume that the remaining asset universe is such that \(\mu -\Sigma \,{\tilde{w}}^\star =\lambda {\varvec{1}}\) for some \(\lambda >0\).

Consider the family \({\mathcal {F}}=\big \{f_w\big \}_{w\in {\mathcal {S}}}\) with

$$\begin{aligned} r \mapsto f_w(r) = {\tilde{w}}'r - \frac{1}{2}\,({\tilde{w}}'r)^2 \end{aligned}$$

for all \(w\in {\mathcal {S}}\) and \(r\in \mathbb {R}^N\). Further, define the functions

$$\begin{aligned} w \mapsto F(w) := {\mathbf {E}}\big (f_w(R)\big ) = {\tilde{w}}'\mu - \frac{1}{2}\,{\tilde{w}}'\Sigma \,{\tilde{w}} \end{aligned}$$

and

$$\begin{aligned} w \mapsto F_n(w) := \frac{1}{n}\sum _{t=1}^n f_w(R_t) = {\tilde{w}}'\mu _n - \frac{1}{2}\,{\tilde{w}}'\Sigma _n{\tilde{w}}\,. \end{aligned}$$

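Since F is quadratic and \(({\tilde{w}}-{\tilde{w}}^\star )'(\mu -\Sigma \,{\tilde{w}}^\star )=0\) for all feasible w (either because \(\mu -\Sigma \,{\tilde{w}}^\star ={\varvec{0}}\) or because \(\mu -\Sigma \,{\tilde{w}}^\star =\lambda {\varvec{1}}\) and \({\varvec{1}}'({\tilde{w}}-{\tilde{w}}^\star )=0\)), the function F can be written at \(w^\star \) as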

$$\begin{aligned} F(w) = F(w^\star ) - \frac{1}{2}\,({\tilde{w}}-{\tilde{w}}^\star )'\Sigma \,({\tilde{w}}-{\tilde{w}}^\star ), \end{aligned}$$

where \(\Sigma \) is positive definite.

The next regularity conditions are analogous to A9 and A10:

B5.:

The function \(f_w\) can be locally approximated at \(w^\star \) by

$$\begin{aligned} f_w(R_t) = f_{w^\star }(R_t) + ({\tilde{w}}-{\tilde{w}}^\star )'\big (R_t-R_tR'_t{\tilde{w}}^\star \big ) + \Vert {\tilde{w}}-{\tilde{w}}^\star \Vert \,r(R_t;{\tilde{w}}), \end{aligned}$$

where the process \(\big \{r(R_t;{\tilde{w}})\big \}\) is stochastically equicontinuous.

B6.:

We have that

$$\begin{aligned} \sqrt{n}\,\big (\mu _n-\mu \big ) - \sqrt{n}\,\big (\Sigma _n-\Sigma \big ){\tilde{w}}^\star \rightsquigarrow {\mathcal {N}}\big ({\varvec{0}},B\big ). \end{aligned}$$

Once again, B5 guarantees that the remainder \(r(R_t;{\tilde{w}})\) of the linear approximation becomes negligible as \(n\rightarrow \infty \). Further, B6 requires the joint asymptotic normality of the given estimators for \(\mu \) and \(\Sigma \) after the usual standardization. Since \(\mu _n\) and \(\Sigma _n\) are the moment estimators of \(\mu \) and \(\Sigma \), it essentially states that \(\big \{R_t-R_tR'_t{\tilde{w}}^\star \big \}\) should satisfy the Central Limit Theorem.Footnote 15

The latter assumption indicates that we can decompose the estimation risk into two parts:

  (i) \(\sqrt{n}\,\big (\mu _n-\mu \big )\) represents the estimation risk that can be attributed to \(\mu \), whereas

  (ii) \(\sqrt{n}\,\big (\Sigma _n-\Sigma \big ){\tilde{w}}^\star \) stands for the estimation risk that is related to \(\Sigma \).

Note that such a risk decomposition cannot be accomplished for the BCRP.

In some cases it is possible to calculate the asymptotic covariance matrix B in B6. For example, if \(R_1,R_2,\ldots \) are serially independent and normally distributed, we have that

$$\begin{aligned} B= & {} (1-{\tilde{w}}^{\star \prime }\mu )^2\,\Gamma - (1-{\tilde{w}}^{\star \prime }\mu )(\Gamma \,{\tilde{w}}^\star )\mu ' - (1-{\tilde{w}}^{\star \prime }\mu )\mu (\Gamma \,{\tilde{w}}^\star )' \\&+ \big ({\tilde{w}}^{\star \prime }\Gamma \,{\tilde{w}}^\star \big )\,\big (\Gamma +\mu \mu '\big ) + \big (\Gamma \,{\tilde{w}}^\star \big )\big (\Gamma \,{\tilde{w}}^\star \big )', \end{aligned}$$

where \(\Gamma =\Sigma -\mu \mu '\) denotes the covariance matrix of R.Footnote 16 More precisely, we can apply the decomposition \(B=B_\mu +B_\Sigma \), where

$$\begin{aligned} B_\mu = \Gamma - \Big [\big (\Gamma \,{\tilde{w}}^\star \big )\mu '+2({\tilde{w}}^{\star \prime }\mu )\Gamma +\mu \big (\Gamma \,{\tilde{w}}^\star \big )'\Big ] \end{aligned}$$

quantifies the estimation risk that is associated with \(\mu \) and

$$\begin{aligned} B_\Sigma= & {} ({\tilde{w}}^{\star \prime }\mu )^2\,\Gamma + ({\tilde{w}}^{\star \prime }\mu )(\Gamma \,{\tilde{w}}^\star )\mu ' + ({\tilde{w}}^{\star \prime }\mu )\mu (\Gamma \,{\tilde{w}}^\star )' + \big ({\tilde{w}}^{\star \prime }\Gamma \,{\tilde{w}}^\star \big )\big (\Gamma +\mu \mu '\big )\\&+ \big (\Gamma \,{\tilde{w}}^\star \big )\big (\Gamma \,{\tilde{w}}^\star \big )' \end{aligned}$$

measures the estimation risk related to \(\Sigma \). Similar results can be obtained if we assume that R has an elliptical distribution possessing heavy tails and tail dependence. Alternatively, we could apply a (block) bootstrap (see, e.g., Politis 2003) in order to approximate B, or even \(B_\mu \) and \(B_\Sigma \), without making any parametric assumption.
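The closed-form expressions can be checked numerically. The following NumPy sketch (the function names are hypothetical) evaluates \(B\), \(B_\mu \) and \(B_\Sigma \) for serially independent normal returns and compares \(B\) with the Monte Carlo covariance of the summands \(R_t-R_tR'_t{\tilde{w}}^\star \) that appear in B6. The inputs (three assets, \(\mu \) and \(\Gamma \) as in Eq. 3, and \({\tilde{w}}^\star ={\varvec{1}}/N\)) are illustrative assumptions, and the plain i.i.d. simulation is not the (block) bootstrap mentioned above.

```python
import numpy as np

def closed_form_B(mu, Gamma, w):
    """Return (B, B_mu, B_Sigma) from the closed-form expressions (i.i.d. normal returns)."""
    a = w @ mu              # w~*' mu
    g = Gamma @ w           # Gamma w~*
    s = w @ Gamma @ w       # w~*' Gamma w~*
    gm, mg, mm = np.outer(g, mu), np.outer(mu, g), np.outer(mu, mu)
    B = (1 - a)**2 * Gamma - (1 - a) * (gm + mg) + s * (Gamma + mm) + np.outer(g, g)
    B_mu = Gamma - (gm + 2 * a * Gamma + mg)
    B_Sigma = a**2 * Gamma + a * (gm + mg) + s * (Gamma + mm) + np.outer(g, g)
    return B, B_mu, B_Sigma

def monte_carlo_B(mu, Gamma, w, size=200_000, seed=0):
    """Sample covariance of the summands R_t - R_t R_t' w~* under R_t ~ N(mu, Gamma)."""
    rng = np.random.default_rng(seed)
    R = rng.multivariate_normal(mu, Gamma, size=size)
    summands = R - R * (R @ w)[:, None]   # row t equals R_t (1 - R_t' w~*)
    return np.cov(summands, rowvar=False)

# Illustrative (assumed) inputs: N = 3, parameters as in Eq. 3, w~* = 1/N.
N = 3
mu = np.full(N, 0.1 / 250)
Gamma = 0.2**2 / 250 * (0.3 * np.ones((N, N)) + 0.7 * np.eye(N))
w_star = np.full(N, 1.0 / N)
B, B_mu, B_Sigma = closed_form_B(mu, Gamma, w_star)
B_mc = monte_carlo_B(mu, Gamma, w_star)
```

Replacing the simulated draws by resampled blocks of observed returns would turn the second function into a nonparametric (block) bootstrap approximation of \(B\).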

Consider any random vector \(Z\sim {\mathcal {N}}\big ({\varvec{0}},B\big )\). Now, we may define

$$\begin{aligned} \varsigma \mapsto \Phi _Z(\varsigma ) := {\tilde{\varsigma }}'Z - \frac{1}{2}\,{\tilde{\varsigma }}'\Sigma \,{\tilde{\varsigma }},\qquad \varsigma \in \mathbb {R}^{N+1}, \end{aligned}$$

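with \(\varsigma =(\varsigma _0,\varsigma _1,\ldots ,\varsigma _N)\) and \({\tilde{\varsigma }}=(\varsigma _1,\varsigma _2,\ldots ,\varsigma _N)\). The (unique) maximizer of \(\Phi _Z\) over the tangent cone \({\mathcal {T}}_{{\mathcal {S}}}(w^\star )\) of \({\mathcal {S}}\) at \(w^\star \) is given by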

$$\begin{aligned} \varsigma ^\star = {{\,\mathrm{arg\,max}\,}}_{\varsigma \in {\mathcal {T}}_{{\mathcal {S}}}(w^\star )}~\Phi _Z(\varsigma )\,. \end{aligned}$$

The following theorem clarifies the asymptotic behavior of the MVE. It follows from the same arguments that were used for Theorem 7, so the proof is omitted.

Theorem 12

We have that

$$\begin{aligned} \sqrt{n}\,\big (w^\star _n-w^\star \big )\rightsquigarrow {{\,\mathrm{arg\,max}\,}}_{\varsigma \in {\mathcal {T}}_{{\mathcal {S}}}(w^\star )}~{\tilde{\varsigma }}'{\mathcal {N}}\big ({\varvec{0}},B\big )-\frac{1}{2}\,{\tilde{\varsigma }}'\Sigma \,{\tilde{\varsigma }}\,. \end{aligned}$$

In the case in which the riskless asset has been removed from the asset universe, we may consider the (unique) maximizer

$$\begin{aligned} {\tilde{\varsigma }}^\star = {{\,\mathrm{arg\,max}\,}}_{{\tilde{\varsigma }}\in {\mathcal {T}}_{{\mathcal {S}}}({\tilde{w}}^\star )}~{\tilde{\varsigma }}'Z - \frac{1}{2}\,{\tilde{\varsigma }}'\Sigma \,{\tilde{\varsigma }} \end{aligned}$$

and then Theorem 12 reads

$$\begin{aligned} \sqrt{n}\,\big ({\tilde{w}}^\star _n-{\tilde{w}}^\star \big ) \rightsquigarrow {{\,\mathrm{arg\,max}\,}}_{{\tilde{\varsigma }}\in {\mathcal {T}}_{{\mathcal {S}}}({\tilde{w}}^\star )}~{\tilde{\varsigma }}'{\mathcal {N}}\big ({\varvec{0}},B\big )-\frac{1}{2}\,{\tilde{\varsigma }}'\Sigma \,{\tilde{\varsigma }}\,. \end{aligned}$$
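When \({\tilde{w}}^\star \) lies in the relative interior of the simplex (all weights strictly positive, as for the equally weighted LOP in Sect. 6), the tangent cone \({\mathcal {T}}_{{\mathcal {S}}}({\tilde{w}}^\star )\) is the hyperplane \(\{{\tilde{\varsigma }}:{\varvec{1}}'{\tilde{\varsigma }}=0\}\). The Lagrange condition \(Z-\Sigma {\tilde{\varsigma }}=\lambda {\varvec{1}}\) then gives \({\tilde{\varsigma }}^\star =\Sigma ^{-1}(Z-\lambda {\varvec{1}})\) with \(\lambda ={\varvec{1}}'\Sigma ^{-1}Z/{\varvec{1}}'\Sigma ^{-1}{\varvec{1}}\). A minimal NumPy sketch under this interior-point assumption (the function name is hypothetical; a boundary point of \({\mathcal {S}}\) would require an inequality-constrained solver instead):

```python
import numpy as np

def tangent_cone_maximizer_interior(Z, Sigma):
    """Maximize s'Z - 0.5 s'Sigma s subject to 1's = 0.

    Assumes w~* is in the relative interior of the simplex, so that the
    tangent cone is the hyperplane {s : 1's = 0}.
    """
    ones = np.ones_like(Z)
    Si_Z = np.linalg.solve(Sigma, Z)        # Sigma^{-1} Z
    Si_1 = np.linalg.solve(Sigma, ones)     # Sigma^{-1} 1
    lam = (ones @ Si_Z) / (ones @ Si_1)     # multiplier enforcing 1's = 0
    return Si_Z - lam * Si_1
```

Drawing \(Z\sim {\mathcal {N}}({\varvec{0}},B)\) repeatedly and returning \({\tilde{w}}^\star +{\tilde{\varsigma }}^\star /\sqrt{n}\) yields realizations of the synthetic estimator \({\tilde{w}}^\infty _n\) discussed in Sect. 6.2.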

6 Some practical remarks

6.1 Computational issues

Cover’s (1984) algorithm for the BCRP is simple and works as follows:

  (i) Choose any initial portfolio \(w^{(0)}\in {\mathcal {S}}\) with strictly positive weights and set \(k\leftarrow 0\). (A weight that is zero at the start remains zero under the update below.)

  (ii) Update the portfolio according to

    $$\begin{aligned} w^{(k+1)} = w^{(k)}\!\odot \frac{1}{n}\sum _{t=1}^n\frac{X_t}{w^{(k)\prime }X_t} \end{aligned}$$

    and set \(k\leftarrow k+1\).Footnote 17

  (iii) Repeat the second step until the largest component of the vector

    $$\begin{aligned} \frac{1}{n}\sum _{t=1}^n\frac{X_t}{w^{(k)\prime }X_t} \end{aligned}$$

    falls below a critical threshold just above 1.
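The three steps translate directly into code. The following NumPy sketch assumes that the rows of `X` are the vectors \(X_t\) of gross returns (price relatives, all positive), including a column for the riskless asset if it belongs to the universe. The function name, the starting point \({\varvec{1}}/N\), the iteration cap and the simulated data are illustrative choices; the default threshold \(\exp (10^{-6})\) is the one used for the computations in this work.

```python
import numpy as np

def cover_bcrp(X, threshold=np.exp(1e-6), max_iter=10_000):
    """Cover's (1984) multiplicative algorithm for the BCRP.

    X: (n, N) array whose rows are the gross returns X_t (all entries positive).
    """
    N = X.shape[1]
    w = np.full(N, 1.0 / N)                       # strictly positive start in the simplex
    for _ in range(max_iter):
        v = (X / (X @ w)[:, None]).mean(axis=0)   # (1/n) sum_t X_t / (w' X_t)
        if v.max() < threshold:                   # stopping rule of step (iii)
            break
        w = w * v                                 # step (ii); the sum stays 1 because w'v = 1
    return w

# Illustrative use with simulated (assumed) data: N = 3, parameters as in Eq. 3.
rng = np.random.default_rng(1)
N = 3
Gamma = 0.2**2 / 250 * (0.3 * np.ones((N, N)) + 0.7 * np.eye(N))
X = 1.0 + rng.multivariate_normal(np.full(N, 0.1 / 250), Gamma, size=250)
w_bcrp = cover_bcrp(X)
```

The largest component of the averaged vector is never below 1, because its \(w^{(k)}\)-weighted average equals 1; at the BCRP it equals exactly 1, which is why the threshold is chosen just above 1.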

The computations in this work are carried out in MATLAB. The critical threshold for the BCRP is \(\exp (10^{-6})\). Further, the MOSEK optimization toolbox for MATLAB, which proves to be very fast and reliable, is used to compute the MVE. It turns out that the BCRP, \(w^*_n\), and the MVE, \(w^\star _n\), are almost identical. However, computing \(w^\star _n\) by quadratic optimization is much faster. To demonstrate these statements, we simulate n independent and identically distributed vectors of daily asset returns \(R_1,R_2,\ldots ,R_n\sim {\mathcal {N}}\big (\mu ,\Gamma \big )\) with

$$\begin{aligned} \mu = \frac{0.1}{250}\,{\varvec{1}}\qquad \text {and}\qquad \Gamma = \frac{0.2^2}{250}\,\big (0.3\,{\varvec{1}}{\varvec{1}}'+0.7\,{\mathbf {I}}_N\big ). \end{aligned}\tag{3}$$

Let us assume that the number of risky assets is \(N=100\) and the number of daily observations is \(n=250\). In this case, the constraints imposed by \({\mathcal {S}}\) are binding for both \(w^*_0\) and \(w^\star _0\), i.e., \(w^*_0=w^\star _0=0\). Thus, we remove the riskless asset from the asset universe.
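Read with 250 trading days per year, Eq. 3 is an equicorrelated (one-factor) model with annual drift 0.1, annual volatility 0.2 and pairwise correlation 0.3; this reading is an interpretation of the parameters. A minimal NumPy sketch of the simulation (the function name and seed handling are hypothetical) builds each \(R_t\) from one common and \(N\) idiosyncratic standard normal shocks, which gives covariance matrix exactly \(\Gamma \) without a Cholesky factor:

```python
import numpy as np

def simulate_eq3(N=100, n=250, seed=0):
    """Draw R_1, ..., R_n i.i.d. N(mu, Gamma) with mu and Gamma as in Eq. 3."""
    mu = np.full(N, 0.1 / 250)                                          # daily drift
    Gamma = 0.2**2 / 250 * (0.3 * np.ones((N, N)) + 0.7 * np.eye(N))    # equicorrelated covariance
    rng = np.random.default_rng(seed)
    common = rng.standard_normal((n, 1))    # market-wide shock, shared by all assets
    idio = rng.standard_normal((n, N))      # asset-specific shocks
    R = mu + 0.2 / np.sqrt(250) * (np.sqrt(0.3) * common + np.sqrt(0.7) * idio)
    return R, mu, Gamma

R, mu, Gamma = simulate_eq3()   # N = 100 risky assets, n = 250 daily observations
```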

The numerical simulations are repeated 100 times. Each time, Cover’s algorithm for \({\tilde{w}}^*_n\) and the quadratic optimizer for \({\tilde{w}}^\star _n\) are applied. On average, Cover’s algorithm needs 5.5914 s, whereas MOSEK takes only 0.0103 s.Footnote 18 The supremum norm of \({\tilde{w}}^*_n-{\tilde{w}}^\star _n\) is 0.0173. Although Cover’s algorithm takes much longer, the quadratic optimizer even yields a slightly better outcome: it leads to an annualized average log-return of 0.4892, whereas Cover’s algorithm yields only 0.4890 per year. That is, the quadratic optimizer comes even closer to the (true) BCRP than Cover’s algorithm. In fact, the average log-returns produced by the quadratic optimizer are always higher than those of Cover’s algorithm. Hence, \({\tilde{w}}^\star _n\) dominates \({\tilde{w}}^*_n\) in a numerical sense. Moreover, Cover’s algorithm is very slow in high dimensions, whereas the quadratic optimizer works well even for \(N=1000\) and \(n=2500\), in which case the computational time for \({\tilde{w}}^\star _n\) is still below 1 s.
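MOSEK is a commercial solver and is not reproduced here. As a stand-in, the following sketch computes the MVE with an accelerated projected-gradient method (FISTA) and a Euclidean projection onto the simplex; this substitution, the function names, the iteration count and the simulated data are assumptions. The objective \({\tilde{w}}'\mu _n-\frac{1}{2}\,{\tilde{w}}'\Sigma _n{\tilde{w}}\) with \(\Sigma _n=\frac{1}{n}\sum _{t=1}^nR_tR'_t\) is taken to be the quadratic approximation \(\log (1+x)\approx x-x^2/2\) of the average log-return, consistent with \(\Phi _Z\) in Sect. 5.2.2.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def mve(R, iters=5000):
    """Maximize w'mu_n - 0.5 w'Sigma_n w over the simplex by accelerated projected gradient."""
    n, N = R.shape
    mu_n = R.mean(axis=0)
    Sigma_n = R.T @ R / n                   # second-moment matrix of the returns
    L = np.linalg.eigvalsh(Sigma_n)[-1]     # Lipschitz constant of the gradient
    w = y = np.full(N, 1.0 / N)
    t = 1.0
    for _ in range(iters):
        w_next = project_to_simplex(y + (mu_n - Sigma_n @ y) / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = w_next + ((t - 1.0) / t_next) * (w_next - w)   # momentum step
        w, t = w_next, t_next
    return w

# Illustrative use (assumed data): N = 5 assets, n = 250 days, parameters as in Eq. 3.
rng = np.random.default_rng(0)
N, n = 5, 250
Gamma = 0.2**2 / 250 * (0.3 * np.ones((N, N)) + 0.7 * np.eye(N))
R = rng.multivariate_normal(np.full(N, 0.1 / 250), Gamma, size=n)
w_mve = mve(R)
annualized_log_return = 250 * np.mean(np.log1p(R @ w_mve))   # in-sample
```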

There is another computational issue. To apply the asymptotic results derived in Sect. 4.2.2, we have to simulate the random vector \(Y\sim {\mathcal {N}}({\varvec{0}},A)\), where the covariance matrix A appears in A10. The problem is that A is singular. More precisely, we have that

$$\begin{aligned} {\tilde{w}}^{*\prime }A\,{\tilde{w}}^* = {\tilde{w}}^{*\prime }{\mathbf {E}}\!\left( \frac{XX'}{({\tilde{w}}^{*\prime }X)^2}\right) {\tilde{w}}^* - {\tilde{w}}^{*\prime }{\varvec{1}}{\varvec{1}}'{\tilde{w}}^* = {\mathbf {E}}\!\left( \frac{({\tilde{w}}^{*\prime }X)^2}{({\tilde{w}}^{*\prime }X)^2}\right) - 1^2 = 0, \end{aligned}$$

which means that A is not positive definite. Thus, the standard Cholesky decomposition cannot be used, and we have to apply a matrix decomposition that is suitable for positive semidefinite matrices, e.g., the spectral decomposition, in order to simulate Y. This issue does not arise when applying the asymptotic results derived in Sect. 5.2.2, in which case we must simulate the random vector \(Z\sim {\mathcal {N}}({\varvec{0}},B)\). As already mentioned in Sect. 5.2.2, we can even provide a closed-form expression for B in many standard situations. The general approach is demonstrated in the “Appendix”.
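For a symmetric positive semidefinite but singular covariance matrix, one standard choice is the spectral decomposition \(A=V\Lambda V'\), which gives \(Y=V\Lambda ^{1/2}E\) with \(E\sim {\mathcal {N}}({\varvec{0}},{\mathbf {I}})\). The following NumPy sketch is one possible implementation (the function name is hypothetical). The matrix in the example is a hypothetical stand-in for A with the same kind of singularity, namely the sample covariance matrix of \(X_t/({\tilde{w}}'X_t)\), whose quadratic form in \({\tilde{w}}\) vanishes because \({\tilde{w}}'X_t/({\tilde{w}}'X_t)=1\).

```python
import numpy as np

def sample_singular_gaussian(A, size, seed=0):
    """Draw `size` vectors from N(0, A) for a symmetric PSD, possibly singular, A."""
    lam, V = np.linalg.eigh(A)                     # A = V diag(lam) V'
    root = V * np.sqrt(np.clip(lam, 0.0, None))    # V diag(sqrt(lam)); rounding negatives set to 0
    E = np.random.default_rng(seed).standard_normal((size, A.shape[0]))
    return E @ root.T                              # each row is root @ e, so Cov = root root' = A

# Hypothetical singular test matrix: covariance of X_t / (w'X_t), so that w'A w = 0.
rng = np.random.default_rng(3)
N = 4
Gamma = 0.2**2 / 250 * (0.3 * np.ones((N, N)) + 0.7 * np.eye(N))
X = 1.0 + rng.multivariate_normal(np.full(N, 0.1 / 250), Gamma, size=500)
w = np.full(N, 1.0 / N)
A = np.cov(X / (X @ w)[:, None], rowvar=False)
Y = sample_singular_gaussian(A, 200_000)
```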

To sum up, the quadratic approximation proposed at the beginning of Sect. 5 works very well and, in contrast to the BCRP, the MVE does not suffer from computational issues. For this reason, we focus on \({\tilde{w}}^\star _n\) in the following discussion.

Fig. 1

100 realizations of \({\tilde{w}}^\star _n\) on the basis of 250, 2500, and 1 million daily observations (from left to right) if \(\mu \) is unknown (upper part) and if it is known (lower part)

6.2 Statistical inference

Let us assume that the elements of \(\big \{R_t\big \}\) are serially independent and normally distributed. To keep things as simple as possible, we use the parameterization in Eq. 3. Further, let the number of risky assets be \(N=2\) and the number of observations be \(n=250\).Footnote 19 Once again, we generate 100 samples and compute a realization of \({\tilde{w}}^\star _n\) from each one. The upper left of Fig. 1 shows that most of the estimates are far away from \({\tilde{w}}^\star =(0.5,0.5)\). The vast majority of the estimates are boundary solutions: 50 estimates equal (0, 1) and 41 equal (1, 0). The result does not improve essentially if we increase the number of observations to \(n=2500\), and it is still sobering even for 1 million observations. By contrast, if we assume that \(\mu \) is known, the estimates turn out to be much better (see the lower part of Fig. 1). In particular, no estimate lies on the boundary of the simplex anymore, and in the case of \(n=10^6\) observations the estimates are almost identical to \({\tilde{w}}^\star \).

Are we able to replicate the finite-sample results by a large-sample approximation? For this purpose, we can use Theorem 12 together with the expressions for \(B_\mu \) and \(B_\Sigma \) presented in Sect. 5.2.2, i.e., \(B=B_\mu +B_\Sigma \) if \(\mu \) is unknown and \(B=B_\Sigma \) if \(\mu \) is known (in the latter case, the term \(\sqrt{n}\,(\mu _n-\mu )\) in B6 vanishes). The corresponding realizations of the synthetic estimator \({\tilde{w}}^\infty _n:={\tilde{w}}^\star +{\tilde{\varsigma }}^\star /\sqrt{n}\,\) are depicted in Fig. 2. The upper left of this figure shows 90 realizations outside the simplex. This is because the large-sample approximation is based on the maximizer \({\tilde{\varsigma }}^\star \), which belongs to the tangent cone of \({\mathcal {S}}\) at \({\tilde{w}}^\star \), an unbounded set. Hence, the support of \({\tilde{w}}^\infty _n\) is not confined to \({\mathcal {S}}\). Similarly, 82 realizations of \({\tilde{w}}^\infty _n\) lie outside the simplex in the upper center. By contrast, the simplex on the upper right contains all 100 realizations of \({\tilde{w}}^\infty _n\). The picture changes substantially in the lower part of Fig. 2, where \(\mu \) is assumed to be known. In this case, no realization of \({\tilde{w}}^\infty _n\) lies outside \({\mathcal {S}}\). Moreover, the large-sample approximation satisfactorily reproduces the finite-sample results depicted in the lower part of Fig. 1.

Fig. 2

100 realizations of \({\tilde{w}}^\infty _n\) on the basis of 250, 2500, and 1 million daily observations (from left to right) if \(\mu \) is unknown (upper part) and if it is known (lower part)

The problem is that the expected asset returns are unknown in real life. However, we can substantially improve the large-sample approximation by applying a finite-sample correction that guarantees that the realizations always belong to the simplex. We know from Theorem 12 that, if the sample size is large, \(\sqrt{n}\,\big ({\tilde{w}}^\star _n-{\tilde{w}}^\star \big )\) behaves essentially like the maximizer, \({\tilde{\varsigma }}^\star \), of \({\tilde{\varsigma }}\mapsto {\tilde{\varsigma }}'Z-\frac{1}{2}\,{\tilde{\varsigma }}'\Sigma \,{\tilde{\varsigma }}\) over the tangent cone of \({\mathcal {S}}\) at \({\tilde{w}}^\star \). Since the sample size is not large enough in practice, we may replace \({\tilde{\varsigma }}^\star \) with

$$\begin{aligned} {\tilde{\varsigma }}^\star _n := {{\,\mathrm{arg\,max}\,}}_{{\tilde{\varsigma }}\in \sqrt{n}\,({\mathcal {S}}-{\tilde{w}}^\star )} {\tilde{\varsigma }}'Z - \frac{1}{2}\,{\tilde{\varsigma }}'\Sigma \,{\tilde{\varsigma }}\,. \end{aligned}$$

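Here, \(\sqrt{n}\,({\mathcal {S}}-{\tilde{w}}^\star )\) contains exactly those \({\tilde{\varsigma }}\) for which \({\tilde{w}}^\star +{\tilde{\varsigma }}/\sqrt{n}\in {\mathcal {S}}\), and it expands to the tangent cone \({\mathcal {T}}_{{\mathcal {S}}}({\tilde{w}}^\star )\) as \(n\rightarrow \infty \). Hence, the corrected version of \({\tilde{w}}^\infty _n\), i.e., \({\tilde{w}}^\diamond _n:={\tilde{w}}^\star +{\tilde{\varsigma }}^\star _n/\sqrt{n}\,\), always belongs to the simplex.Footnote 20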
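For \(N=2\) and \({\tilde{w}}^\star =(0.5,0.5)\), the tangent cone consists of all multiples \(t\,(1,-1)\), and \(\sqrt{n}\,({\mathcal {S}}-{\tilde{w}}^\star )\) of those with \(|t|\le \sqrt{n}/2\). Both maximizers therefore follow from a one-dimensional concave quadratic: \(t^\infty =(1,-1)Z/\big ((1,-1)\Sigma (1,-1)'\big )\), and \(t^\diamond \) is \(t^\infty \) clipped to \([-\sqrt{n}/2,\sqrt{n}/2]\). The NumPy sketch below (hypothetical function names; \(\mu \) unknown, so \(Z\sim {\mathcal {N}}({\varvec{0}},B)\) with the closed-form B for i.i.d. normal returns and the parameters of Eq. 3) produces realizations of \({\tilde{w}}^\infty _n\) and \({\tilde{w}}^\diamond _n\). It illustrates the construction and is not the code behind Figs. 2 and 3.

```python
import numpy as np

def closed_form_B(mu, Gamma, w):
    """Asymptotic covariance matrix B of B6 for i.i.d. normal returns."""
    a, g, s = w @ mu, Gamma @ w, w @ Gamma @ w
    return ((1 - a)**2 * Gamma - (1 - a) * (np.outer(g, mu) + np.outer(mu, g))
            + s * (Gamma + np.outer(mu, mu)) + np.outer(g, g))

def synthetic_estimators(n, reps=100, seed=0):
    """Realizations of w~inf_n (uncorrected) and w~diamond_n (corrected) for N = 2, mu unknown."""
    mu = np.full(2, 0.1 / 250)
    Gamma = 0.2**2 / 250 * (0.3 * np.ones((2, 2)) + 0.7 * np.eye(2))
    Sigma = Gamma + np.outer(mu, mu)                 # second-moment matrix
    w_star = np.array([0.5, 0.5])                    # LOP under Eq. 3
    v = np.array([1.0, -1.0])                        # tangent cone = {t v : t real}
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(np.zeros(2), closed_form_B(mu, Gamma, w_star), size=reps)
    t_inf = (Z @ v) / (v @ Sigma @ v)                # maximizer over the whole tangent cone
    bound = np.sqrt(n) / 2                           # sqrt(n)(S - w*) = {t v : |t| <= sqrt(n)/2}
    t_dia = np.clip(t_inf, -bound, bound)            # maximizer over sqrt(n)(S - w*)
    w_inf = w_star + np.outer(t_inf, v) / np.sqrt(n)
    w_dia = w_star + np.outer(t_dia, v) / np.sqrt(n)
    return w_inf, w_dia
```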

Fig. 3

Empirical distribution functions of \(w^\star _{1n}\) (black line) versus \(w^\diamond _{1n}\) (green line) for 250, 2500, and 1 million observations (from left to right).

To verify that the finite-sample correction works well, we compare the empirical distribution functions of 10,000 realizations of \(w^\star _{1n}\) and \(w^\diamond _{1n}\), where \(\mu \) is assumed to be unknown. We still have only \(N=2\) risky assets, and the parameterization is the same as before (see Eq. 3). The results are given in Fig. 3. The finite-sample correction clearly serves its purpose: the corrected large-sample approximation is very accurate for all sample sizes.

Figure 3 reveals that most realizations of \({\tilde{w}}^\diamond _n\) are either (0, 1) or (1, 0) unless the sample size equals \(n=10^6\). The LOP corresponds to \({\tilde{w}}^\star =(0.5,0.5)\) and thus lies exactly halfway between (0, 1) and (1, 0). It seems that estimating the LOP is a mission impossible in real-life situations, at least without any prior information about \(\mu \). Table 1 contains the probability that the realization of the MVE is a single-asset portfolio for different numbers of assets (\(N=5,50,100,500,1000\)) and observations (\(n=250,2500,5000,10{,}000,10^6\)). The results are based on 1000 realizations of \({\tilde{w}}^\diamond _n\) for each combination of N and n. Note that, under the parameterization in Eq. 3, the LOP always corresponds to the equally weighted portfolio, i.e., \({\tilde{w}}^\star ={\varvec{1}}/N\). The table shows that, for the sample sizes typically available in practice, the MVE proposes a single-asset portfolio with high probability although the LOP is well-diversified. It is worth emphasizing that the results would not change essentially if we replaced the MVE with the BCRP, since these two estimators for the LOP are almost identical.
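The mechanism behind Table 1 can be sketched for general N. Writing \(w={\tilde{w}}^\star +{\tilde{\varsigma }}/\sqrt{n}\), the objective in \({\tilde{\varsigma }}\) equals n times \((w-{\tilde{w}}^\star )'Z/\sqrt{n}-\frac{1}{2}(w-{\tilde{w}}^\star )'\Sigma (w-{\tilde{w}}^\star )\), so \({\tilde{w}}^\diamond _n\) maximizes this expression over \(w\in {\mathcal {S}}\). The NumPy sketch below solves that problem by plain projected gradient ascent (a swapped-in solver, not the author's) with \({\tilde{w}}^\star ={\varvec{1}}/N\), \(\mu \) unknown and the parameters of Eq. 3; function names, replication counts and the iteration count are assumptions. The default of 300 iterations suffices for small N, but the required number grows with the condition number of \(\Sigma \), so it must be increased considerably for N in the hundreds.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def closed_form_B(mu, Gamma, w):
    """Asymptotic covariance matrix B of B6 for i.i.d. normal returns."""
    a, g, s = w @ mu, Gamma @ w, w @ Gamma @ w
    return ((1 - a)**2 * Gamma - (1 - a) * (np.outer(g, mu) + np.outer(mu, g))
            + s * (Gamma + np.outer(mu, mu)) + np.outer(g, g))

def corrected_draw(Z, Sigma, w_star, n, iters=300):
    """w~diamond_n: maximize (w - w*)'Z/sqrt(n) - 0.5 (w - w*)'Sigma (w - w*) over the simplex."""
    L = np.linalg.eigvalsh(Sigma)[-1]       # step size 1/L (Lipschitz constant of the gradient)
    w = w_star.copy()
    for _ in range(iters):
        grad = Z / np.sqrt(n) - Sigma @ (w - w_star)
        w = project_to_simplex(w + grad / L)
    return w

def single_asset_probability(N, n, reps=200, seed=0):
    """Fraction of draws of w~diamond_n with exactly one positive weight (Eq. 3, mu unknown)."""
    mu = np.full(N, 0.1 / 250)
    Gamma = 0.2**2 / 250 * (0.3 * np.ones((N, N)) + 0.7 * np.eye(N))
    Sigma = Gamma + np.outer(mu, mu)        # second-moment matrix
    w_star = np.full(N, 1.0 / N)            # equally weighted LOP under Eq. 3
    B = closed_form_B(mu, Gamma, w_star)
    Z = np.random.default_rng(seed).multivariate_normal(np.zeros(N), B, size=reps)
    hits = sum(int((corrected_draw(z, Sigma, w_star, n) > 1e-9).sum() == 1) for z in Z)
    return hits / reps
```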

Now, in principle, we are able to construct hypothesis tests and compute confidence regions. For example, we could try to apply a hypothesis test of the form \(H_0\!:{\tilde{w}}^\star ={\tilde{w}}^\star _0\) vs. \(H_1\!:{\tilde{w}}^\star \ne {\tilde{w}}^\star _0\) for any \({\tilde{w}}^\star _0\in {\mathcal {S}}\) even in the case of \(N>2\).Footnote 21 However, in the light of the previous results, we may doubt that any hypothesis test will ever lead to a rejection or that a confidence region will ever be sufficiently small in real-life situations. This conclusion might appear negative to the reader, but the author fears that this is the price we sometimes have to pay in science.

Table 1 Probability that \({\tilde{w}}^\star _n\) is a single-asset portfolio

7 Conclusion

A quadratic approximation of log-returns works very well on a daily basis. Thus, in order to find the BCRP, we may focus on the MVE, which can easily be computed. The corresponding algorithm is very fast even in high dimensions, and its results are even slightly better than those of Cover’s algorithm for the BCRP. However, in most practical applications, we overestimate the expected out-of-sample performance of the MVE and even the performance of the MVOP. The same holds true for the expected out-of-sample log-return on the BCRP and the expected log-return on the LOP.

Both the BCRP and the MVE exist and are unique under mild regularity conditions. Moreover, they are strongly consistent. Analogously, both their out-of-sample performance measures and their in-sample performances converge to the performance of the LOP or the MVOP, respectively, as the number of observations grows to infinity. The given estimators for the LOP are even \(\sqrt{n}\,\)-consistent. In principle, the asymptotic results derived in this work can be used for constructing hypothesis tests and for computing confidence regions, but for this purpose one should apply a finite-sample correction, which substantially improves the large-sample approximation.

However, it turns out that the impact of estimation risk concerning \(\mu \) is tremendous in most real-life situations. Estimating the LOP without having any prediction power seems to be a futile undertaking. The estimators often lead to a single-asset portfolio even if the LOP corresponds to the equally weighted portfolio and thus is well-diversified. The given results confirm a general rule, which has become folklore during the last decades, namely that portfolio optimization typically founders on the estimation of expected asset returns.