Article

Minimax Estimation in Regression under Sample Conformity Constraints

by Andrey Borisov 1,2,3,4
1
Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44/2 Vavilova Str., 119333 Moscow, Russia
2
Moscow Aviation Institute, 4, Volokolamskoe Shosse, 125993 Moscow, Russia
3
Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, GSP-1, 1-52 Leninskiye Gory, 119991 Moscow, Russia
4
Moscow Center for Fundamental and Applied Mathematics, Lomonosov Moscow State University, GSP-1, Leninskie Gory, 119991 Moscow, Russia
Mathematics 2021, 9(10), 1080; https://doi.org/10.3390/math9101080
Submission received: 6 April 2021 / Revised: 30 April 2021 / Accepted: 6 May 2021 / Published: 11 May 2021
(This article belongs to the Special Issue Control, Optimization, and Mathematical Modeling of Complex Systems)

Abstract:
The paper is devoted to the guaranteeing estimation of parameters in uncertain stochastic nonlinear regression. The loss function is the conditional mean square of the estimation error given the available observations. The distribution of the regression parameters is partially unknown, and the uncertainty is described by a subset of probability distributions with a known compact domain. The essential feature is the usage of additional constraints describing the conformity of the uncertain distribution to the realized observation sample. The paper contains various examples of such conformity indices. The estimation task is formulated as a minimax optimization problem, which, in turn, is solved in terms of saddle points. The paper presents a characterization of both the optimal estimator and the set of least favorable distributions. The saddle points are found via the solution to a dual finite-dimensional optimization problem, which is simpler than the initial minimax problem. The paper proposes a numerical mesh procedure for solving the dual optimization problem. The interconnection between the least favorable distributions under the conformity constraint and their Pareto efficiency in the sense of a vector criterion is also indicated. The influence of various conformity constraints on the estimation performance is demonstrated by illustrative numerical examples.

1. Introduction

Problems of heterogeneous parameter estimation in regression under model uncertainty have been studied intensively from various points of view. The guaranteeing (or minimax) approach provides one of the most promising tools for solving these problems. A proper formulation of an estimation problem in minimax terms usually requires:
  • A description of the uncertainty set in the observation model;
  • A class of the admissible estimators;
  • An optimality criterion (a loss function) as a function of the argument pair “estimator–uncertain parameter value”.
The problem is to find the estimator that minimizes the maximal losses over the whole uncertainty set.
In the related literature, the parametric uncertainty set is specified either by geometric [1,2,3,4,5,6,7], or by statistical [8,9,10,11,12,13,14,15] constraints. In the former case, the uncertain parameters are treated as nonrandom but unknown ones lying within the fixed uncertainty set. In the latter case, the parameters are supposed to be random with unknown distribution, and the uncertainty set is formed by all the admissible distributions. In both cases, the guaranteeing estimation presumes a solution to a two-person game problem: the first player is “a statistician”, and the performer of the second, “external” player role is dictated by the problem statement—it might be nature, another human or device. Nevertheless, the guaranteeing approach suggests the unified prescription: finding the best estimator under the worst behavior of the uncertainty. In practice, such a universality leads to a loss of some prior information.
Let us explain this point by an example: the statistician knows that the source of the uncertainty is nature. This means he/she “should bear in mind that nature, as a player, is not aiming for a maximal win (that is, does not want us to suffer a maximal loss), and in this sense, it is ‘impartial’ in the choice of strategies” [12]. Hence, in this case, the minimax approach is too pessimistic and leads to cautious and coarse estimates. Even if we know the second player is a human, this does not imply his/her “bad will” towards the statistician. More likely, the second player has a goal other than maximizing the loss of the statistician. If the goal of the second player is known, one can change the estimation criterion and transform the initial problem into a non-antagonistic game [16]. Otherwise, the statistician can identify the goal indirectly, relying on the available observations. Hence, in the latter case, it seems natural to introduce additional constraints to the uncertainty set, depending on the realized observations.
The paper aims to present a solution to the minimax estimation problem under additional constraints, which are determined by a conformity index of the uncertain parameters to the available observations.
The paper is organized as follows. Section 2 contains the formal problem statement with the conformity index based on the likelihood function. The section presents the assumptions concerning the observation model, which guarantee the correctness of the stated estimation problem and the existence of its solution. It also compares the problem with recent investigations.
Section 3 provides the main result: the initial estimation problem is reformulated as a game problem, which has a saddle point defining the minimax estimator completely. Moreover, this point is a solution to a dual finite-dimensional constrained optimization problem, which is simpler than the initial minimax problem. The form of the minimax estimator and the properties of the least favorable distributions (LFD) are also given in the section.
Section 4 is devoted to the analysis of the obtained results. First, a numerical algorithm for the dual optimization problem solution is presented along with its accuracy characterization. Second, some other conformity indices based on the empirical distribution function (EDF) and sample mean are also introduced. Third, a new concept of the uncertain distribution choice under a vector criterion is considered. The first criterion component, being the loss function introduced in Section 2, describes the influence of the uncertainty on the estimation quality. The second component is the conformity index, which characterizes the accordance of the unknown distribution of γ and the realized observations Y = y . We present an assertion that the LFD in the minimax estimation problem is Pareto-efficient in the sense of the introduced vector criterion.
Section 5 presents the numerical examples, which illustrate the influence of various conformity constraints on the estimation performance. Section 6 contains concluding remarks.
The following notations are used in this manuscript:
  • B(S) is the Borel σ-algebra of the topological space S (if S is the whole space) or its restriction to the set S (if S is a subset of the topological space);
  • col(A_1, …, A_n) is a column vector formed by the ordinary or block components A_1, …, A_n;
  • row(A_1, …, A_n) is a row vector formed by the ordinary or block components A_1, …, A_n;
  • ⟨a, b⟩ is the scalar product of two finite-dimensional vectors;
  • C(X) is the set of all continuous real-valued functions with the domain X;
  • ‖x‖ is the Euclidean norm of the vector x;
  • P_F{A} is the probability of the event A corresponding to the distribution F;
  • E_F[X] is the mathematical expectation of the random vector X with the distribution F;
  • conv(S) is the convex hull of the set S.

2. Statement of Problem

2.1. Formulation

Let us consider the following observation model:
Y = A(X, γ) + B(X, γ)V.   (1)
Here:
  • γ ∈ C ∈ B(R^m) is an unobservable random vector with an unknown cumulative distribution function (cdf) F;
  • X ∈ R^n is a random unobservable vector with a known cdf Ψ(dx|γ) dependent on the value of γ;
  • Y ∈ R^k is a vector of observations;
  • V ∈ R^k is a random vector of observation errors with the known probability density function (pdf) ϕ_V(v);
  • A(·, ·): C × R^n → R^k is a nonrandom function characterizing the observation plant;
  • B(·, ·): C × R^n → R^{k×k} is a nonrandom function characterizing the observation error intensity.
The observation model is defined on the family of probability triplets {(Ω, F, P_F)}_{F∈F}, where:
  • The outcome space Ω ≜ C × R^n × R^k contains all admissible values of the compound vector col(γ, X, V);
  • The σ-algebra is determined as F ≜ B(C × R^n × R^k);
  • The probability measures P_F are determined as:
    P_F{γ ∈ dq, X ∈ dx, V ∈ dv} ≜ Ψ(dx|q) F(dq) ϕ_V(v) dv.   (2)
Using the generalized Bayes rule [17], it is easy to verify that the function:
L(y|q) ≜ ∫_{R^n} |det B(q, x)|^{−1} ϕ_V(B^{−1}(q, x)(y − A(q, x))) Ψ(dx|q)   (3)
is the conditional pdf of the observation Y given γ: P_F{Y ∈ dy | γ = q} = L(y|q) dy. Furthermore, the function:
L(y, F) ≜ ∫_C L(y|q) F(dq)   (4)
defines the pdf of the observation Y under the assumption that the distribution law of γ equals F:
L(y, F) = P_F{Y ∈ dy}/dy = ∫_C L(y|q) F(dq).   (5)
Below in the paper we refer to the function L(y, F) as the sample conformity index based on the likelihood function.
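For intuition, the conformity index (4) can be computed directly when F is discrete. The sketch below uses a hypothetical scalar model Y = γ + V with V ~ N(0, 1) and no auxiliary vector X, so L(y|q) is just the standard normal density at y − q; the model and all numerical values are illustrative assumptions, not taken from the paper.

```python
import math

def norm_pdf(v):
    # standard normal density of the observation error V
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def likelihood(y, q):
    # L(y|q) for the toy model Y = q + V, V ~ N(0, 1)
    return norm_pdf(y - q)

def conformity_index(y, points, weights):
    # L(y, F) = sum_j p_j L(y|q_j) for a discrete distribution F on {q_j}
    return sum(p * likelihood(y, q) for q, p in zip(points, weights))

y = 0.3
points = [-1.0, 0.0, 1.0]
uniform = [1/3, 1/3, 1/3]
peaked = [0.0, 1.0, 0.0]   # all mass at q = 0, the point closest to y

print(conformity_index(y, points, uniform))
print(conformity_index(y, points, peaked))
```

As one would expect, the distribution concentrated near the realized observation has the larger index, so it satisfies a likelihood constraint L(y, F) ≥ L for a wider range of levels L.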
Our aim is to estimate the function h(γ, X), h(·, ·): C × R^n → R^l, of the pair (γ, X), and the admissible estimators are functions h̄(Y), h̄(·): R^k → R^l, of the available observations.
The loss function is the conditional mean square of the estimation error given the available observations:
J(h̄, F | y) ≜ E_F[‖h(γ, X) − h̄(Y)‖² | Y = y],   (6)
and the corresponding estimation criterion:
J*(h̄ | y) ≜ sup_{F∈F_L} J(h̄, F | y)   (7)
characterizes the maximal loss for a fixed estimator h̄ within the class F_L of the uncertain distributions of γ, for which L(y, F) ≥ L.
The minimax estimation problem for the vector h is to find an estimator ĥ(·), such that:
ĥ(y) ∈ Argmin_{h̄∈H} J*(h̄ | y),   (8)
where H is a class of admissible estimators.

2.2. Necessary Assumptions Concerning Observation Model

To state the minimax estimation problem (8) properly and guarantee the existence of its solution we have to make additional assumptions concerning the uncertainty of γ , the observation model (1) and the estimated vector h:
(i)
The set C is compact.
(ii)
Let F be the family of all probability distributions with support lying within the set C. The set F_L is itself a convex, *-weakly compact [18] subset of F.
(iii)
The constraint
L(y, F) ≥ L   (9)
holds for all F ∈ F_L. The inequality (9) is called the conformity constraint of the level L based on the likelihood function (or, shortly, the likelihood constraint).
(iv)
The set F L is nonempty.
(v)
A ( · , · ) , B ( · , · ) , h ( · , · ) C ( C × R n ) .
(vi)
The pdf ϕ_V(v) > 0 for all v ∈ R^k; ϕ_V(v) ∈ C(R^k); the function Ψ(dx|q) is a regular version of the conditional distribution for all q ∈ C.
(vii)
The observation noise is uniformly non-degenerate, i.e.,
min_{(q,x)∈C×R^n} B(q, x)B^T(q, x) ≥ λ_0 I > 0.
(viii)
The inequalities
∫_{R^k} ‖v‖² ϕ_V(v) dv < ∞,
sup_{q∈C} ∫_{R^n} ‖A(q, x)‖² Ψ(dx|q) ≤ K_A < ∞,
sup_{q∈C} ∫_{R^n} ‖h(q, x)‖² Ψ(dx|q) ≤ K_h < ∞
are true.
(ix)
The set of admissible estimators H contains only the functions h̄(·): R^k → R^l, for which:
sup_{q∈C} ∫_{R^k} ‖h̄(y)‖² L(y|q) dy < ∞.

2.3. Argumentation

First, we discuss the sense of the assumptions in the subsection above.
Conditions (i)–(iv), describing the set F L , have the following interpretation.
The requirement for C to be compact (i.e., fulfillment of condition (i)) is standard for minimax estimation problems (see, e.g., [2,3]). In the case where the prior information about the vector γ is limited to the knowledge of its domain C only, it is rather natural to treat γ as a random vector with an unknown distribution F ∈ F. In practice we often have some additional prior information concerning the moment characteristics of γ, hence the entire uncertainty set F can be significantly reduced. If, for example, μ(q) = col(μ_1(q), …, μ_N(q)): C → R^N is a vector of convex moment functions, and we know the vector μ̄ ≜ col(μ̄_1, …, μ̄_N) ∈ R^N of their upper bounds, then the set of admissible distributions takes the form {F ∈ F : ∫_C μ_j(q) F(dq) ≤ μ̄_j, j = 1, …, N}. The *-weak compactness and convexity can be easily verified for this subset. Further in the presentation, we do not stress the explicit form of the “total” constraints other than (9) forming the subset F_L: they should just guarantee the closedness and convexity of F_L. That is the sense of condition (ii).
The conditional pdf L ( y | q ) (3) can also be treated as the likelihood function of the parameter γ , calculated at the point q given the observed sample Y = y . This likelihood value reflects the relevance of the parameter value q to the realized observation y. By analogy, the function L ( y , F ) can be considered as some generalization of the likelihood function that evaluates the correspondence between the uncertain distribution F and the realized observation y. The following lower and upper bounds for this value are obvious:
0 < L̲(y) ≜ min_{q∈C} L(y|q) ≤ L(y, F) ≤ max_{q∈C} L(y|q) ≜ L̄(y).
Below in the paper we suppose that the likelihood level L lies in [L̲(y), L̄(y)]. The subset formed by the constraint {F ∈ F : ∫_C L(y|q) F(dq) ≥ L} is called the distribution subset satisfying the likelihood conformity constraint of the level L. It is nonempty because it contains at least all distributions with the support lying within the set {q ∈ C : L(y|q) ≥ L}.
Adjusting the level L, we can vary the uncertainty set F_L, choosing the distributions F which are more or less relevant to the realized observations Y = y. That is the essence of condition (iii). Condition (iv) is obvious: all the constraints defining the set F_L should be feasible.
Condition (v) is technical: it provides the correctness of a subsequent change of measure. The condition is not restrictive, because a broad class of functions A, B and h can be approximated by continuous functions. Conditions (vi) and (vii) guarantee the correct utilization of the Fubini theorem and an abstract variant of the Bayes formula [19]. In practice these conditions are usually valid. Condition (viii) guarantees finite variance for both the observations and the estimated vector independently of the distribution F.
Condition (ix) guarantees a finite variance of the estimate h ¯ ( Y ) independently of F F L .
The solution to (8) is obvious in the case of the one-point set F_L = {F}. This means the distribution F of the parameter γ is known, and the initial problem is reduced to the traditional optimal in the mean square sense (MS-optimal) estimation problem. The case of the one-point set C = {q} is quite similar. In both cases the optimal estimator is completely defined by the conditional expectation (CE): ĥ(y) = E_F[h(γ, X) | Y = y] in the case of a known distribution F, and ĥ(y) = E_{q}[h(q, X) | Y = y] in the “one-point” case.
In the general case of F_L this result is inapplicable, because the CE E_F[h(γ, X) | Y = y] is a functional of the unknown distribution F.
The stated estimation problem has a transparent interpretation. First, under prior uncertainty of the distribution F, the replacement of the loss function (6) by its guaranteeing analog looks natural. Second, utilization of the CE in the criterion means that the desired estimate should be calculated optimally for each observed sample. Criteria in the form of the CE appear often in estimation and control problems [11,17,20,21,22]. Mostly, the estimation is the preliminary stage in the solution to an optimization and/or control problem under incomplete information. The random disturbances/noises in such observation systems represent:
  • A result of natural (non-human) impacts;
  • A randomized (or generalized) control [23,24], used in the dynamic system;
  • A result of some uncontrollable (parasitic) input signals of “the external players”.
The impact of the two latter types is not necessarily the nonrandom functions of available observations, but some “extra generated” random processes with distributions dependent on the observations. This type of control is used in the areas of telecommunications [25,26], cellular networks [27], technical systems [28], etc. The proposed minimax criterion allows inhibiting the negative effect of the “additional randomness” in the external signals (the third type of disturbances mentioned above) to the estimation quality.
A further comprehension of the natural gaps, which are inherent to the minimax estimation paradigm, and of the ways of their partial coverage can be revealed by the following interpretation. It is well known that in the case where a minimax estimation problem can be reduced to a two-person game with a saddle point, the minimax estimator is the best one calculated for the LFD. The form of the LFD can be very strange and artificial. Moreover, the conformity degree of the LFD to the realized observations can be too low. Thus, the utilization of various sample conformity indices (particularly the ones based on the likelihood function) makes it possible to describe this degree, restrict it from below, implicitly reduce the distribution uncertainty set and exclude “exotic” variants of the LFDs.
Minimax estimation of regression parameters has been investigated in various settings. Mostly, the observation model is a linear function of the estimated parameters corrupted by additive Gaussian noise. The optimality criterion is a mathematical expectation of some loss function. In [29], the problem is solved within the framework of fuzzy sets. The authors of [30,31] used criteria other than the traditional mean square one, and the estimated vector was random with an uncertain discrete distribution. In [32], the Gaussian noises have an uncertain but bounded covariance matrix. The papers [33,34,35] are also devoted to minimax Bayesian estimation in regression under various geometric and moment constraints on the estimated parameters. The criterion functions are ℓ_p norms of the estimation errors.
The optimality criterion in the form of the CE and the admissibility of nonlinear estimates distinguish the proposed estimation problem from the recently considered ones [2,3,5,6,7,9]. A closely related problem considered in [11] has an essential difference. The uncertain parameter in [11] was treated as unknown and nonrandom, and hence the initial minimax problem could not be solved in terms of saddle points. Moreover, the statistical uncertainty in [11] gave no possibility to take into account any additional prior and posterior information about the moment characteristics, conformity indices, etc. The paper [14] was devoted to the particular case of the likelihood constraints only. An idea to use confidence sets, calculated by the available statistical data, as the uncertainty sets of the distribution moments was used in [36] for the conditionally-minimax prediction.

3. The Main Result

As is known, the CE is determined in a non-unique way, hence we should specify a version of the CE so as to use it in further inferences. If the distribution F of the vector γ is known, then the CE of an integrable random value h(γ, X): C × R^n → R can be calculated by the abstract variant of the Bayes formula:
ĥ_F(y) = [∫_{C×R^n} h(q, x) |det B(q, x)|^{−1} ϕ_V(B^{−1}(q, x)(y − A(q, x))) Ψ(dx|q) F(dq)] / [∫_{C×R^n} |det B(q, x)|^{−1} ϕ_V(B^{−1}(q, x)(y − A(q, x))) Ψ(dx|q) F(dq)],   (11)
i.e., E_F[h(γ, X) | Y = y] = ĥ_F(y) P_F-a.s. Below in the presentation we use the CE version defined by (11). It should also be noted that if ĥ(·) is the desired minimax estimator, then the inclusion (8) must be satisfied point-wise for any sample y ∈ R^k.
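For a fixed F, the ratio (11) is amenable to plain Monte Carlo: sample (q, x) from F(dq)Ψ(dx|q), weight each draw by the likelihood factor, and form the self-normalized average. The sketch below does this for a hypothetical scalar model A(q, x) = q + x, B ≡ 1, X ~ N(0, 0.5²), h(q, x) = q; the model and values are illustrative assumptions only, not taken from the paper.

```python
import math
import random

random.seed(1)

def phi(v):
    # standard normal pdf of the observation error V
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def bayes_estimate(y, F_points, F_weights, n_draws=20000):
    # Self-normalized Monte Carlo version of the abstract Bayes formula:
    #   hhat_F(y) ~= sum_i h(q_i, x_i) * nu_i / sum_i nu_i,
    # with nu_i = phi_V(y - A(q_i, x_i)) for the toy model A(q, x) = q + x, B = 1.
    num = den = 0.0
    for _ in range(n_draws):
        q = random.choices(F_points, F_weights)[0]   # q ~ F (discrete)
        x = random.gauss(0.0, 0.5)                   # x ~ Psi(dx|q)
        nu = phi(y - (q + x))                        # likelihood weight nu(q, x | y)
        num += q * nu                                # h(q, x) = q
        den += nu
    return num / den

est = bayes_estimate(1.0, [-1.0, 0.0, 1.0], [1/3, 1/3, 1/3])
```

For a one-point F the ratio collapses to that point exactly, which gives a quick sanity check of the weighting.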
Further in the paper the function:
J_*(F | y) ≜ min_{h̄∈H} J(h̄, F | y) = J(ĥ_F, F | y) = E_F[‖h(γ, X)‖² | Y = y] − ‖ĥ_F(y)‖²   (12)
is called the dual criterion for J* (7). All CEs in (12) are calculated by (11).
Using (3) for the calculation of L, the notation:
ν(q, x | y) ≜ |det B(q, x)|^{−1} ϕ_V(B^{−1}(q, x)(y − A(q, x))),   (13)
and the CE version (11), the loss function (6) can be rewritten in the form:
J(h̄, F | y) = [∫_{C×R^n} ‖h(q, x) − h̄(y)‖² ν(q, x | y) Ψ(dx|q) F(dq)] / [∫_C L(y|q) F(dq)].   (14)
As can be seen from (14), the function J ( h ¯ , F | y ) is neither convex nor concave in F, which complicates the solution to the estimation problem (8). Moreover, the argument F lies in the abstract infinite-dimensional space of the probability measures. Nevertheless, the problem can be reduced to a standard finite-dimensional minimax problem with a convex–concave criterion.
First, we introduce a new reference measure F̃ and verify that the loss function (14) represents a functional, which is linear in F̃.
Let:
F̃(F, dq | y) ≜ L(y|q) F(dq) / ∫_C L(y|q) F(dq).   (15)
Lemma 1.
If conditions (i)–(ix) are satisfied, then the following assertions are true.
  • F̃(F, dq | y) is a probability measure for any y ∈ R^k, and F̃(F, · | y) ∈ F. The transformation (15) is a bijection of F into itself, and its inversion has the form:
    F(F̃, dq | y) ≜ L^{−1}(y|q) F̃(dq) / ∫_C L^{−1}(y|q) F̃(dq).   (16)
  • The set F̃_L of all distributions obtained from F_L by the transformation (15):
    F̃_L ≜ {F̃(·) : F ∈ F_L, F̃(·) = F̃(F, · | y)}   (17)
    is convex and *-weakly closed.
The proof of Lemma 1 is given in Appendix A.
Applying the Fubini theorem and keeping in mind (11) and (15), we can rewrite the loss function (14) in the form:
J(h̄, F | y) = ∫_C [∫_{R^n} ‖h(q, x) − h̄(y)‖² ν(q, x | y) Ψ(dx|q)] L^{−1}(y|q) · L(y|q) F(dq) / ∫_C L(y|q) F(dq) = ∫_C [∫_{R^n} ‖h(q, x) − h̄(y)‖² ν(q, x | y) Ψ(dx|q)] L^{−1}(y|q) F̃(F, dq | y) ≜ J̃(h̄, F̃ | y).   (18)
To reduce the initial problem to some finite-dimensional equivalent, we also introduce the vectors:
w(y|q) ≜ col(w_1(y|q), w_2(y|q)) ∈ R^{l+1}:
w_1(y|q) ≜ E_F[‖h(γ, X)‖² | Y = y, γ = q] = ∫_{R^n} ‖h(q, x)‖² ν(q, x | y) Ψ(dx|q) L^{−1}(y|q),
w_2(y|q) ≜ E_F[h(γ, X) | Y = y, γ = q] = ∫_{R^n} h(q, x) ν(q, x | y) Ψ(dx|q) L^{−1}(y|q);   (19)
w(F̃|y) ≜ col(w_1(F̃|y), w_2(F̃|y)) ∈ R^{l+1}:
w_1(F̃|y) ≜ E_F[‖h(γ, X)‖² | Y = y] = ∫_C w_1(y|q) F̃(F, dq | y),
w_2(F̃|y) ≜ E_F[h(γ, X) | Y = y] = ∫_C w_2(y|q) F̃(F, dq | y),   (20)
and their collections generated by the subsets C and F̃_L:
W(C|y) ≜ {w(y|q) : q ∈ C},   W(F̃_L|y) ≜ {w(F̃|y) : F̃ ∈ F̃_L}.   (21)
Here and below the notation H(y) also stands for the whole set of the estimate values h̄(y), h̄ ∈ H, calculated for the fixed argument y.
The set W(F̃_L|y) ∈ B(R^{l+1}) is compact; moreover (see [37]), the inclusion W(F̃_L|y) ⊆ conv(W(C|y)) holds.
On the set R^l × R^{l+1} we prepare the new loss function:
J̃(η, w) ≜ w_1 − 2⟨η, w_2⟩ + ‖η‖² = (w_1 − ‖w_2‖²) + ‖η − w_2‖².   (22)
It is easy to verify that the loss function (18) can be expressed via (22):
J̃(h̄, F̃ | y) = ∫_C J̃(h̄(y), w(y|q)) F̃(F, dq | y) = J̃(h̄(y), w(F̃|y)).   (23)
The corresponding guaranteeing criterion takes the form:
J̃*(η | y) ≜ sup_{w∈W(F̃_L|y)} J̃(η, w),
and its dual can be determined as:
J̃_*(w) ≜ min_{η∈H(y)} J̃(η, w) = J̃(w_2, w) = w_1 − ‖w_2‖².
The finite-dimensional minimax problem is to find:
ĥ(y) ∈ Argmin_{η∈H(y)} J̃*(η | y).   (25)
From the definitions of W(F̃_L|y), H(y) and criterion (23) it follows that the problem (25) is equivalent to the initial minimax estimation problem (8):
min_{h̄∈H} J*(h̄ | y) = min_{η∈H(y)} J̃*(η | y) ≜ J(y),   (26)
Argmin_{h̄∈H} J*(h̄ | y) ≜ {ĥ(y) : J*(ĥ | y) = J(y)} = Argmin_{η∈H(y)} J̃*(η | y)   (27)
for all y ∈ R^k.
The following theorem characterizes the solution to the finite-dimensional minimax problem in terms of a saddle point of the loss function J̃.
Theorem 1.
For any y ∈ R^k, the loss function J̃(η, w) (22) has the unique saddle point (ĥ(y), ŵ(y)) on the set H(y) × W(F̃_L|y). The second block subvector ŵ(y) = col(ŵ_1(y), ŵ_2(y)) ∈ W(F̃_L|y) of the saddle point is the unique solution to the finite-dimensional dual problem:
{ŵ(y)} = Argmax_{w∈W(F̃_L|y)} J̃_*(w),   (28)
and ĥ(y) = ŵ_2(y) is the second sub-vector of this optimum ŵ(y).
The proof of Theorem 1 is given in Appendix B.
By the definition of W(F̃_L|y), for any vector ŵ(y) there exists at least one distribution F̂ such that:
ŵ_1(y) = E_{F̂}[‖h(γ, X)‖² | Y = y],   ŵ_2(y) = E_{F̂}[h(γ, X) | Y = y].   (29)
F̂ is an LFD, and the whole set of the distributions satisfying (29) is denoted by F̂_L.
Theorem 1 allows us to obtain a solution to the initial minimax estimation problem. The result is formulated as:
Corollary 1.
The estimator ŵ_2(y) introduced in Theorem 1 is a solution to the minimax estimation problem (8), i.e., ĥ(y) = ŵ_2(y) point-wise. The set {(ĥ, F̂)}_{F̂∈F̂_L} presents the saddle points of the loss function J (6) on the set H × F_L. The estimator ĥ(y) is invariant to the LFD choice: if F̂′ and F̂″ are different LFDs, then E_{F̂′}[h(γ, X) | Y = y] = E_{F̂″}[h(γ, X) | Y = y] = ŵ_2(y).
The following assertion characterizes the key property of the LFD set F ^ L .
Corollary 2.
There exists a variant of the LFD F̂ ∈ F̂_L concentrated at most at dim W(F̃_L|y) + 1 points of the set C.
The proof of Corollary 2 is given in Appendix C.
The presence of a discrete version of the LFD is a remarkable fact. Let us remind the reader that initially we supposed that the uncertain vector γ lies in the set C. The deterministic hypothesis concerning γ hopelessly obstructed the solution to the minimax estimation problem. To overcome this obstacle we assumed the randomness of γ: the vector keeps constant during an individual observation experiment, and the stochastic nature of γ appears from experiment to experiment only. The existence of a discrete LFD partially returns us to the primordial situation. The point is that there exists a set of γ values that are the most difficult for estimation. Tuning to these parameters, we can obtain estimates of γ with guaranteed quality.
Theorem 1 and Corollary 1 simplify the solution to the initial problem (8), reducing it to the maximization of the finite-dimensional quadratic function (28) over the convex compact set.

4. Analysis and Extensions

4.1. Dual Problem: A Numerical Solution

To simplify the presentation of the numerical algorithm for the solution to problem (28), we suppose that the uncertainty set F_L takes the form F_L = {F ∈ F : L(y, F) ≥ L}, i.e., it is restricted by the conformity constraint only.
Let us consider the case C ≜ {q_j}_{j=1,…,M} ⊂ R^m, which corresponds to the practical problem of Bayesian classification [10,38]. Here the dual problem (28) has the form ŵ(y) = Argmax_{w∈conv(W(C|y))} J̃_*(w). Its solution can be represented as ŵ(y) = Σ_{j=1}^M P̂_j(y) w(y|q_j), where P̂(y) ≜ row(P̂_1(y), …, P̂_M(y)) is a solution to the standard quadratic programming problem (QP problem):
P̂(y) ∈ Argmax_{p_1,…,p_M ≥ 0: Σ_{j=1}^M p_j = 1} [ Σ_{j=1}^M p_j w_1(y|q_j) − Σ_{j,j′=1}^M p_j p_{j′} ⟨w_2(y|q_j), w_2(y|q_{j′})⟩ ].
Consequently, in the case of a finite C the minimax estimation problem can be reduced to a standard QP problem with well-investigated properties and advanced numerical procedures.
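The QP above has a probability simplex as its feasible set, so even a simple conditional-gradient (Frank–Wolfe) iteration solves it without an external QP library. The sketch below is an illustrative stdlib-only substitute for a proper QP solver, applied to hypothetical moment data w_1(y|q_j), w_2(y|q_j); it maximizes Σ_j p_j w_1j − ‖Σ_j p_j w_2j‖² over the simplex.

```python
def solve_dual_qp(w1, w2, iters=2000):
    # Frank-Wolfe on the simplex for the dual problem:
    #   maximize  sum_j p_j*w1[j] - || sum_j p_j*w2[j] ||^2
    #   subject to p_j >= 0, sum_j p_j = 1  (w2[j] are vectors).
    M, dim = len(w1), len(w2[0])
    p = [1.0 / M] * M
    for t in range(iters):
        # current mixture of the second components: w2_bar = sum_j p_j w2[j]
        w2_bar = [sum(p[j] * w2[j][d] for j in range(M)) for d in range(dim)]
        # gradient of the concave objective with respect to p_j
        grad = [w1[j] - 2.0 * sum(w2_bar[d] * w2[j][d] for d in range(dim))
                for j in range(M)]
        j_star = max(range(M), key=lambda j: grad[j])   # best simplex vertex
        step = 2.0 / (t + 2.0)
        p = [(1.0 - step) * pj for pj in p]
        p[j_star] += step
    return p

# hypothetical moments w(y|q_j) for three support points q_j
w1 = [1.0, 1.2, 0.9]
w2 = [[-1.0], [0.0], [1.0]]
p_hat = solve_dual_qp(w1, w2)
# the minimax estimate is then the mixture sum_j p_hat[j] * w2[j]
h_hat = [sum(p_hat[j] * w2[j][d] for j in range(len(w1))) for d in range(1)]
```

The returned weights P̂_j(y) also identify a discrete LFD concentrated on the support points carrying positive mass, consistent with Corollary 2.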
Utilization of finite subsets C(·) instead of the original domain C allows us to calculate “mesh” approximations of the solution to (8).
Let:
  • ϵ_n ↓ 0 be a decreasing nonrandom sequence characterizing the approximation accuracy;
  • {C(ϵ_n)}_{n∈N}: C(ϵ_1) ⊆ C(ϵ_2) ⊆ C(ϵ_3) ⊆ … ⊆ C be a sequence of embedded subdivisions;
  • ω_1^ϵ(y) ≜ max_{q_1,q_2∈C: ‖q_1−q_2‖<ϵ} |w_1(y|q_1) − w_1(y|q_2)|,   ω_2^ϵ(y) ≜ max_{q_1,q_2∈C: ‖q_1−q_2‖<ϵ} ‖w_2(y|q_1) − w_2(y|q_2)‖
    be the moduli of continuity of w_1(y|·) and w_2(y|·).
The assertion below characterizes the convergence rate of the approximating solutions to the initial minimax estimate.
Lemma 2.
If {ŵ_n(y)}_{n∈N} are the corresponding solutions to the problems:
ŵ_n(y) = Argmax_{w∈conv(W(C(ϵ_n)|y))} J̃_*(w),
then the following convergences hold as ϵ_n → 0:
ŵ_n(y) → ŵ(y),   0 ≤ J(y) − max_{w∈conv(W(C(ϵ_n)|y))} J̃_*(w) ≤ ω_1^{ϵ_n}(y) + K[ω_2^{ϵ_n}(y)]² → 0
with some constant 0 < K < ∞.
The proof of Lemma 2 is given in Appendix D.

4.2. The Least Favorable Distribution in the Light of the Pareto Efficiency

The minimax estimation problem under the conformity constraints is tightly interconnected with the choice of the distribution F̂ that is optimal in the sense of a vector-valued criterion. On the one hand, the solution to the considered estimation problem is grounded on the evaluation of the distribution F̂ maximizing the dual criterion (12): I_1(F | y) ≜ J_*(F | y) → max_F. On the other hand, the distribution F should conform to the realized sample Y = y, and the maximization of the conformity index leads to the following optimization problem: I_2(F | y) ≜ L(y, F) → max_F.
Obviously, the criteria I 1 and I 2 are conflicting; hence the proper choice of F requires the application of the vector optimization techniques.
Let:
  • F̂^0 be the set of the LFDs in the estimation problem (8) without conformity constraints (i.e., with L = 0);
  • L̃(y) ≜ max_{F∈F̂^0} L(y, F);
  • M ∈ [L̃(y), L̄(y)] be an arbitrary fixed conformity level;
  • ŵ(y) = Argmax_{w∈W(F̃_M|y)} J̃_*(w) be a solution to the finite-dimensional dual problem;
  • F̂_M be the set of the corresponding LFDs.
Lemma 3.
Any least favorable distribution F̂ ∈ F̂_M is Pareto-efficient with respect to the vector-valued criterion (I_1, I_2).
The proof of Lemma 3 follows directly from the Germeyer theorem [16].
Consideration of the constrained minimax estimation problem in the light of optimization by a vector criterion is somewhat close to the one investigated in [31], where the estimation quality is characterized by the ℓ_2 norm of the error, and the Shannon entropy is used as a measure of the statistical uncertainty of the estimated vector.

4.3. Other Conformity Indices

First, we consider the conformity constraint (9) thoroughly. It admits the following treatment. Let F ˜ F be some reference distribution. The constraint L ( y , F ) L ( y , F ˜ ) is a specific case of (9); the feasible distributions F should be relevant to the available observations Y = y no less than the reference distribution F ˜ is. One more treatment is also acceptable. Let q ˜ C be some “guess” value of the uncertain parameter γ , and α > 0 be a fixed value. The constraint:
L ( y , F ) L ( y | q ˜ ) α
is a specific case of (9): it means that the likelihood ratio of any feasible distribution F to the one-point distribution at q ˜ should be no less that the level α . Obviously, the guess value q ˜ could be chosen from the maxima of the function L , i.e., q ˜ Argmax q C L ( y | q ) , but calculation of these maxima is itself a nontrivial problem of likelihood function maximization. In Section 5 we use some modification of (33):
$$\frac{L(y, F) - \min_{q \in C_n} L(y | q)}{\max_{q \in C_n} L(y | q) - \min_{q \in C_n} L(y | q)} \ge r, \qquad (34)$$
where $C_n \subseteq C$ is a known subset, and $r \in (0, 1)$ is a fixed parameter. This form is important because, in the case $C = C_n$, it guarantees that the constraint (34) is active in the considered minimax optimization problem for each $r \in (0, 1)$.
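As a minimal numerical sketch, the normalized likelihood constraint (34) can be checked on a finite mesh. The likelihood L(y|q) below is a hypothetical stand-in (a scalar Gaussian model $Y \sim \mathcal{N}(q, 1)$ with a single observation), not the paper's regression likelihood, and the mesh and distributions are illustrative:

```python
import numpy as np

def satisfies_conformity(y, weights, mesh, r):
    """True if (L(y,F) - min L) / (max L - min L) >= r over the mesh C_n."""
    # Stand-in likelihood L(y|q) for the model Y ~ N(q, 1).
    L = np.exp(-0.5 * (y - mesh) ** 2) / np.sqrt(2.0 * np.pi)
    L_F = float(np.dot(weights, L))          # L(y,F) = sum_q L(y|q) F({q})
    lo, hi = float(L.min()), float(L.max())
    return (L_F - lo) / (hi - lo) >= r

mesh = np.linspace(-1.0, 1.0, 21)            # mesh approximation of C_n
uniform = np.full(mesh.size, 1.0 / mesh.size)
point_mass = np.zeros(mesh.size)
point_mass[10] = 1.0                          # one-point distribution at q = 0
```

A point mass at the likelihood maximizer satisfies the constraint for any r, while a diffuse distribution fails it for r close to 1.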
Furthermore, the proposed conformity index $L(y, F)$ (9) is not the only numerical characteristic describing the interconnection between F and Y. For example, an alternative conformity index can be defined as $\int_C f(L(y | q))\, F(dq)$, where $f(\cdot): \mathbb{R} \to \mathbb{R}$ is some continuous nondecreasing function. Another way to introduce this index is to set it as $\int_{S(y)} L(y, F)\, dy = \mathsf{P}_F\{Y \in S(y)\}$, i.e., as the probability that the observation Y lies in the confidence set $S(y) \in \mathcal{B}(\mathbb{R}^k)$.
For a particular case of the observation model (1) we can propose one more conformity index that is based on the EDF. Let us consider the observation model with the “pure uncertain” estimated parameter γ :
$$Y_t = A(\gamma) + B(\gamma) V_t, \quad t = \overline{1,T}. \qquad (35)$$
Here:
  • $Y^T \triangleq \mathrm{col}(Y_1, \dots, Y_T)$ are the available observations;
  • $\gamma \in C \subset \mathbb{R}^m$ is a random vector with unknown distribution F;
  • $V^T \triangleq \mathrm{col}(V_1, \dots, V_T)$ are the observation errors, which are i.i.d. centered normalized random values with the pdf $\varphi_V(v)$.
If the value γ is known, the observations { Y t } t = 1 , T ¯ can be considered as i.i.d. random values, whose pdf is equal to ϕ V ( v ) after some shifting and scaling. The EDF of the sample { Y t } t = 1 , T ¯ has the form:
$$F_T^*(y) \triangleq \frac{1}{T} \sum_{t=1}^{T} \mathbb{I}(Y_t \le y). \qquad (36)$$
On the other hand, the cdf F Y ( y ) of any observation Y t for a fixed distribution F can be calculated as:
$$F_Y(y) \triangleq \int_{-\infty}^{y} \int_C \varphi_V\!\left(\frac{u - A(q)}{B(q)}\right) F(dq)\, du. \qquad (37)$$
The sample conformity index based on the EDF is the following value:
$$M(Y^T, F) \triangleq \|F_T^* - F_Y\|_{\infty} = \sup_{y \in \mathbb{R}} |F_T^*(y) - F_Y(y)|. \qquad (38)$$
The new uncertainty set $\mathcal{F}_M$ describing all admissible distributions F satisfies conditions (i), (ii) and (iv) above, but condition (iii) is replaced by the following one:
(x)
the constraint
$$M(Y^T, F) \le M \qquad (39)$$
holds for all $F \in \mathcal{F}_M$ and some fixed level $M > 0$; it is called the constraint based on the EDF.
The proposed conformity index represents the well-known Kolmogorov distance used in the goodness-of-fit test. The asymptotic behavior of $M(Y^T, F)$ is also known:
$$\lim_{T \to \infty} \mathsf{P}\left\{ M(Y^T, F) < \frac{x}{\sqrt{T}} \right\} = \sum_{j=-\infty}^{+\infty} (-1)^j e^{-2 j^2 x^2}.$$
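The limit series above can be evaluated numerically by truncation; the sketch below (with an illustrative truncation depth) recovers the classical critical values of the Kolmogorov goodness-of-fit test:

```python
import math

def kolmogorov_cdf(x, terms=100):
    """Truncated Kolmogorov limit series K(x) = sum_j (-1)^j exp(-2 j^2 x^2)."""
    return sum((-1) ** j * math.exp(-2.0 * j * j * x * x)
               for j in range(-terms, terms + 1))

# K(1.36) is close to 0.95, i.e., x = 1.36 is roughly the classical
# 5% critical value of the Kolmogorov test.
```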
Furthermore, the value $M(Y^T, F)$ can be easily calculated, because the function $F_T^*$ is piecewise constant while $F_Y$ is continuous:
$$M(Y^T, F) = \max_{1 \le t \le T} \max\left( |F_T^*(Y_t) - F_Y(Y_t)|,\ |F_T^*(Y_t - 0) - F_Y(Y_t)| \right),$$
and the cdf $F_Y$ is calculated by (37).
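A minimal sketch of this computation: only the sample points (and the left limits of the EDF there) need to be checked. The cdf below is a standard normal, a hypothetical stand-in for $F_Y$:

```python
import numpy as np
from math import erf, sqrt

def edf_distance(sample, cdf):
    """Kolmogorov distance between the EDF of `sample` and a continuous cdf."""
    y = np.sort(np.asarray(sample, dtype=float))
    T = y.size
    Fy = np.array([cdf(v) for v in y])
    over = np.arange(1, T + 1) / T - Fy     # F_T*(Y_(t)) - F_Y(Y_(t))
    under = Fy - np.arange(0, T) / T        # F_Y(Y_(t)) - F_T*(Y_(t) - 0)
    return float(max(over.max(), under.max()))

std_normal_cdf = lambda v: 0.5 * (1.0 + erf(v / sqrt(2.0)))
```

For a single observation at zero the distance equals 0.5, since the EDF jumps from 0 to 1 at a point where the standard normal cdf equals 0.5.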
The distribution set determined by (39) takes the form:
$$\left\{ F \in \mathcal{F}: \; -M + F_T^*(Y_t) \le \int_{-\infty}^{Y_t} \int_C \varphi_V\!\left(\frac{u - A(q)}{B(q)}\right) F(dq)\, du \le M + F_T^*(Y_t - 0), \; t = \overline{1,T} \right\}. \qquad (40)$$
Using the variational series $Y_{(T)} \triangleq \mathrm{col}(Y_{(1)}, \dots, Y_{(T)})$ of the sample $Y^T$, and recalling that $F_T^*(Y_{(t)}) = \frac{t}{T}$ and $F_T^*(Y_{(t)} - 0) = \frac{t-1}{T}$, (40) can be rewritten in the form:
$$\left\{ F \in \mathcal{F}: \; -M + \frac{t}{T} \le \int_{-\infty}^{Y_{(t)}} \int_C \varphi_V\!\left(\frac{u - A(q)}{B(q)}\right) F(dq)\, du \le M + \frac{t-1}{T}, \; t = \overline{1,T} \right\}. \qquad (41)$$
It can be seen that this set is a convex closed polyhedron, lying in F , with at most 2 T facets. All assertions formulated in Section 3 are valid after replacing the uncertainty set F L , generated by the likelihood function, by the set F M , generated by the EDF. Moreover, the mesh algorithm for the dual optimization problem solution, presented above in Section 4.1, can also be applied to this case.
Let us consider the observation model (35) again. We can use the sample mean $\bar Y \triangleq \frac{1}{T} \sum_{t=1}^{T} Y_t$ as one more conformity index. Let us remind the reader that, due to the model property, the random parameter $\gamma(\omega)$ is constant within each sample $Y^T$. For rather large T, the central limit theorem allows us to treat the normalized value $\sqrt{T}(\bar Y - A(\gamma)) / |B(\gamma)|$ as a standard Gaussian random variable. We then fix a standard Gaussian quantile $c_\alpha$ of the confidence level α and cut out the subset:
$$C_\alpha \triangleq \left\{ q \in C: \; \bar Y - \frac{c_\alpha |B(q)|}{\sqrt{T}} \le A(q) \le \bar Y + \frac{c_\alpha |B(q)|}{\sqrt{T}} \right\} \subseteq C.$$
If $C_\alpha$ is compact, then the set $\mathcal{F}_\alpha$ of all probability distributions with domain lying in $C_\alpha$ is called the set of admissible distributions satisfying the sample-mean conformity constraint of the level α.
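A mesh sketch of the construction of $C_\alpha$, under the illustrative choices $A(q) = q$ and $B(q) \equiv 1$ (these functions, the mesh, and the simulated data are assumptions, not from the paper):

```python
import numpy as np

def mean_conformity_set(sample, mesh, c_alpha,
                        A=lambda q: q, B=lambda q: 1.0):
    """Mesh points q with A(q) within c_alpha*|B(q)|/sqrt(T) of the sample mean."""
    T = len(sample)
    y_bar = float(np.mean(sample))
    Aq = np.array([A(q) for q in mesh])
    half = c_alpha * np.abs(np.array([B(q) for q in mesh])) / np.sqrt(T)
    return mesh[(Aq >= y_bar - half) & (Aq <= y_bar + half)]

mesh = np.linspace(0.0, 4.0, 401)            # mesh approximation of C
rng = np.random.default_rng(1)
sample = 2.0 + rng.standard_normal(100)      # true value A(q) = 2
C_alpha = mean_conformity_set(sample, mesh, c_alpha=1.96)
```

With T = 100 observations and the 95% quantile, the retained mesh points lie within roughly 0.2 of the sample mean.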
The comparison of the minimax estimates, calculated under various types of the conformity constraints, is presented in the next section.

5. Numerical Examples

5.1. Parameter Estimation in the Kalman Observation System

Let us consider the linear Gaussian discrete-time (Kalman) observation system:
$$X_t = a X_{t-1} + b V_t, \quad t = \overline{1,T}, \quad X_0 \sim \mathcal{N}(0, P_0), \qquad Y_t = c X_t + f W_t, \quad t = \overline{0,T},$$
where:
  • $X^T \triangleq \mathrm{col}(X_0, \dots, X_T)$ is an unobservable state trajectory (the autoregression $X_t$ is supposed to be stable);
  • $Y^T \triangleq \mathrm{col}(Y_0, \dots, Y_T)$ are the available observations;
  • $V^T \triangleq \mathrm{col}(V_1, \dots, V_T)$ and $W^T \triangleq \mathrm{col}(W_0, \dots, W_T)$ are vectorizations of independent standard Gaussian discrete-time white noises;
  • $P_0$, $c$ and $f$ are known parameters;
  • $\gamma \triangleq \mathrm{col}(a, b)$ is an uncertain vector lying in the fixed rectangle $C \triangleq [\underline{a}, \overline{a}] \times [\underline{b}, \overline{b}]$.
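A simulation sketch of this observation system for a given uncertain pair (a, b); the function name, seed, and default parameters (taken from the experiment below) are illustrative:

```python
import numpy as np

def simulate_kalman(a, b, c=1.0, f=0.5, P0=0.5, T=1000, seed=0):
    """Simulate X_t = a X_{t-1} + b V_t, Y_t = c X_t + f W_t."""
    rng = np.random.default_rng(seed)
    X = np.empty(T + 1)
    X[0] = rng.normal(0.0, np.sqrt(P0))            # X_0 ~ N(0, P_0)
    for t in range(1, T + 1):
        X[t] = a * X[t - 1] + b * rng.standard_normal()
    Y = c * X + f * rng.standard_normal(T + 1)     # Y_t = c X_t + f W_t
    return X, Y

X, Y = simulate_kalman(a=0.1, b=0.1)
```

With |a| < 1 the autoregression is stable, so the simulated trajectory stays bounded in probability.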
Our goal is to calculate the proposed minimax estimates of the uncertain vector γ and analyze their performance depending on the specific form of the loss function (6). To vary the loss function, we can either specify the estimated test signal $h(\cdot)$ or determine different weighted Euclidean norms. We choose the second approach and define the following norm $\|\cdot\|_{\xi_X, \xi_\gamma}$ for the compound vector $Z \triangleq \mathrm{col}(X^T, \gamma)$:
$$\|Z\|^2_{\xi_X, \xi_\gamma} \triangleq \xi_X^2 \sum_{t=1}^{T} X_t^2 + \xi_\gamma^2 (a^2 + b^2),$$
and the corresponding loss function takes the form:
$$J_{\xi_X, \xi_\gamma}(\bar Z, F | Y^T) \triangleq \mathsf{E}_F\left[ \|Z - \bar Z(Y^T)\|^2_{\xi_X, \xi_\gamma} \,\middle|\, Y^T \right].$$
In the case $\xi_\gamma = 1$ and $\xi_X = 0$ we obtain “the traditional” conditional mean-square loss function $J_{0,1}(\bar Z, F | Y^T) = \mathsf{E}_F\left[ \|\gamma - \bar\gamma(Y^T)\|^2 \mid Y^T \right]$, and the estimation quality of $\bar\gamma(\cdot)$ is determined directly through the loss function. Using $\xi_\gamma = 0$ and $\xi_X = 1$ we transform the loss function into $J_{1,0}(\bar Z, F | Y^T) = \mathsf{E}_F\left[ \|X - \bar X(Y^T)\|^2 \mid Y^T \right]$, and the estimation of γ appears indirectly via the estimation of the state trajectory $X^T$.
The minimax estimation is calculated by the numerical procedure introduced in Section 4.1 with the uniform mesh C h a , h b of the uncertainty set C ; h a and h b are corresponding mesh steps along each coordinate.
We calculate the minimax estimate with the likelihood conformity constraint of the form:
$$\frac{L(Y^T, F) - \min_{(a,b) \in C_{h_a,h_b}} L(Y^T | (a,b))}{\max_{(a,b) \in C_{h_a,h_b}} L(Y^T | (a,b)) - \min_{(a,b) \in C_{h_a,h_b}} L(Y^T | (a,b))} \ge r,$$
where r ( 0 , 1 ) is a confidence ratio.
We compare the proposed minimax estimate with some known alternatives.
The calculations have been executed with the following parameter values: $C = [-0.1; 0.1] \times [0.1; 1]$, $a = 0.1$, $b = 0.1$, $P_0 = 0.5$, $c = 1$, $f = 0.5$, $T = 1000$, $h_a = 0.01$, $h_b = 0.045$. The choice of the parameters can be explained by the following facts. First, the point $(0.1; 0.1)$ of actual parameter values belongs to the domain of the LFD for both loss functions $J_{0,1}$ and $J_{1,0}$; this means that precisely the LFD is realized in both cases. Second, in spite of the sufficient observation length, the signal-to-noise ratio is rather small, which prevents high performance of the asymptotic estimation methods.
Figure 1 presents the evolution of the minimax estimates $\hat a_{0,1}(r)$ and $\hat a_{1,0}(r)$ of the drift coefficient depending on the confidence ratio $r \in (0,1)$. The minimax estimates are compared with:
  • The estimates $\bar a_{MS}(Y^T)$ and $\bar b_{MS}(Y^T)$ calculated by the moment/substitution method [12]:
$$\bar a_{MS} = \frac{\sum_{t=1}^{T} y_{t-1} y_t}{\sum_{t=1}^{T} y_t^2 - T f^2}, \qquad \bar b_{MS} = \sqrt{\frac{1}{c^2}\left(1 - (\bar a_{MS})^2\right) \frac{1}{T}\left(\sum_{t=1}^{T} y_t^2 - T f^2\right)};$$
  • The Bayesian estimate a ^ F 1 ( Y T ) (11) calculated under the assumption that prior distribution F 1 of γ is uniform over the whole uncertainty set C ;
  • The Bayesian estimate a ^ F 2 ( Y T ) (11) calculated under the assumption that the prior distribution F 2 of γ is uniform over the vertices of C ;
  • The estimate a ¯ E K F ( Y T ) calculated by the extended Kalman filter (EKF) algorithm [39] and subsequent residual processing;
  • The maximum likelihood estimate (MLE) a ¯ M L E ( Y T ) calculated by the expectation/maximization algorithm (EM algorithm) [17].
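As a rough check of the candidates above, the moment/substitution estimate of the drift coefficient can be applied to a simulated trajectory. The formula used here, $\bar a_{MS} = \sum_t y_{t-1} y_t / (\sum_t y_t^2 - T f^2)$, follows the bullet list; the simulation setup (seed, a longer horizon than in the experiment so the estimate stabilizes) is an assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, c, f, T = 0.1, 0.1, 1.0, 0.5, 200000   # long horizon for illustration
X = np.zeros(T + 1)
for t in range(1, T + 1):
    X[t] = a * X[t - 1] + b * rng.standard_normal()
y = c * X + f * rng.standard_normal(T + 1)

# Moment/substitution estimate of the drift coefficient a.
a_ms = np.sum(y[:-1] * y[1:]) / (np.sum(y[1:] ** 2) - T * f ** 2)
```

Even with such a long sample the estimate is noisy, which illustrates the small signal-to-noise ratio mentioned above.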
Figure 2 contains a similar comparison of the diffusion coefficient estimates b ^ 0 , 1 ( r ) and b ^ 1 , 0 ( r ) .
The results of this experiment allow us to make the following conclusions.
  • Both minimax estimates $(\hat a_{0,1}(r), \hat b_{0,1}(r))$ and $(\hat a_{1,0}(r), \hat b_{1,0}(r))$ converge to the MLE $(\bar a_{MLE}, \bar b_{MLE})$ as $r \to 1$. Nevertheless, the rate of convergence depends on the specific choice of the loss function ($J_{0,1}$ or $J_{1,0}$ in the considered case).
  • Both minimax estimates are more conservative than the MLE, because they take into account a chance for other points of the LFD domain to be realized.
  • Under an appropriate choice of the confidence ratio r, both minimax estimates become more accurate than other candidates, except for the MLE.

5.2. Parameter Estimation under Additive and Multiplicative Observation Noises

We consider the observations $Y^T \triangleq \mathrm{col}(Y_1, \dots, Y_T)$:
$$Y_t = a X_t + V_t, \quad t = \overline{1,T}.$$
Here:
  • a is an estimated value;
  • $X^T \triangleq \mathrm{col}(X_1, \dots, X_T)$ is a vector of i.i.d. unobservable random values (multiplicative noise): $X_1 \sim \mathcal{R}[0,1]$;
  • $V^T \triangleq \mathrm{col}(V_1, \dots, V_T)$ is a vector of i.i.d. unobservable random values (additive noise): $V_1 \sim \mathcal{N}(0, \sigma)$.
We assume that the parameter a is random with unknown distribution, whose support lies within the known set $C \triangleq [c_1, c_2]$. The loss function has the form:
$$J(\bar a, F | Y^T) = \mathsf{E}_F\left[ (a - \bar a(Y^T))^2 \,\middle|\, Y^T \right].$$
In this example our goal is to compare the minimax estimates of the parameter a under conformity constraint based either on the likelihood function or on the EDF.
The minimax estimates are calculated for the following parameter values: $a = 2$, $T = 20$, $C = [2, 3]$, $\sigma = 0.1$. We use the proposed numerical procedure with a uniform mesh $C_h$ of the set C with the step $h = 0.005$. The example has some specific features. First, the observation model contains both additive ($V^T$) and multiplicative ($X^T$) heterogeneous noises. Second, the available observation sample is not long enough to provide high quality of the consistent estimates. Third, the exact value of a equals 2; meanwhile, in the absence of the constraint there exists a discrete variant of the LFD with the finite support set {2, 3}. This means that precisely the LFD is realized in the considered observation model.
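A simulation sketch of this model together with the moment/substitution estimate $\bar a_{sub} = \frac{2}{T}\sum_t Y_t$ used below; the underlying identity $\mathsf{E}[Y_1] = a/2$ (since $\mathsf{E}[X_1] = 1/2$ and $\mathsf{E}[V_1] = 0$) is an elementary check, and the seed is illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
a, sigma, T = 2.0, 0.1, 20
X = rng.uniform(0.0, 1.0, T)        # multiplicative noise, X_1 ~ R[0,1]
V = rng.normal(0.0, sigma, T)       # additive noise, V_1 ~ N(0, sigma)
Y = a * X + V

# Moment/substitution estimate: E[Y_1] = a/2, so 2 * mean(Y) estimates a.
a_sub = 2.0 / T * Y.sum()
```

With only T = 20 observations the estimate scatters noticeably around the true value a = 2, which is exactly the short-sample regime discussed above.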
The likelihood conformity constraint looks similar to the one from the previous subsection:
$$\frac{L(Y^T, F) - \min_{q \in C_h} L(Y^T | q)}{\max_{q \in C_h} L(Y^T | q) - \min_{q \in C_h} L(Y^T | q)} \ge r,$$
where r ( 0 , 1 ) is a confidence ratio.
Figure 3 contains a comparison of the minimax estimate $\hat a(r)$ with its actual value a, the (consistent, asymptotically Gaussian) M-estimate $\bar a_{sub} \triangleq \frac{2}{T} \sum_{t=1}^{T} Y_t$ obtained by the moment/substitution method [12], and the MLE $\bar a_{MLE}$.
Next, we investigate minimax posterior estimates under the conformity constraint based on the EDF. The constraint is of the form:
$$\frac{\max_{F \in \mathcal{F}_{C_h}} M(Y^T, F) - M(Y^T, F)}{\max_{F \in \mathcal{F}_{C_h}} M(Y^T, F) - \min_{F \in \mathcal{F}_{C_h}} M(Y^T, F)} \ge r, \qquad (46)$$
where $r \in (0, 1)$ is some fixed confidence ratio, and $\mathcal{F}_{C_h}$ is a “mesh” approximation of the set $\mathcal{F}_C$ corresponding to the uniform mesh $C_h$. The form (46) of the conformity constraint guarantees that it is active in the minimax optimization problem for any $r \in (0, 1)$.
Figure 4 contains:
  • The EDF $F_Y^*(y)$ calculated by the sample $Y^T$;
  • The cdfs $F_Y^q(y) = \int_{-\infty}^{y} \varphi_V\!\left(\frac{u - A(q)}{B(q)}\right) du$ of Y, corresponding to the one-point distribution concentrated at the point q ($q = 2, 3$);
  • The cdf $\bar F_Y(y)$, $\bar F_Y(y) \in \mathrm{Argmin}_{F \in \mathcal{F}_{C_h}} M(Y^T, F)$, closest to the EDF $F_Y^*(y)$ within the set $\mathcal{F}_{C_h}$.
Note that F Y 2 ( y ) is a cdf of Y corresponding to the actual value of a.
Figure 5 contains a comparison of the minimax estimate a ^ ( r ) under the conformity constraint, based on the EDF, with its actual value a, the moment/substitution estimate a ¯ s u b and the MLE a ¯ M L E .
The results of this experiment allow us to make the following conclusions.
  • The minimax estimate $\hat a(r)$ under the conformity constraint based on the EDF does not converge to the MLE $\bar a_{MLE}$ as $r \to 1$.
  • Under an appropriate choice of the confidence ratio r, the minimax estimate under the EDF constraint becomes more accurate than other candidates, including the MLE.

6. Conclusions

The paper contains the statement and solution to a new minimax estimation problem for uncertain stochastic regression parameters. The optimality criterion is the conditional mean square of the estimation error given the realized observations. The class of admissible estimators contains all (linear and nonlinear) statistics with finite variance. The a priori information concerning the estimated vector is incomplete: the vector is random, and a part of its components lies in a known compact set. The key feature of the considered problem is the presence of additional constraints on the statistical uncertainty, which restrict from below the degree of correspondence between the uncertainty and the realized observations. The paper presents various indices characterizing this conformity via the likelihood function, the EDF and the sample mean.
We propose a reduction of the initial optimization problem in the abstract infinite-dimensional spaces to the standard finite-dimensional QP problem with convex constraints along with an algorithm of its numerical realization and precision analysis.
The minimax estimation problem is solved in terms of saddle points, i.e., besides the estimators with guaranteed quality, we obtain a description of the LFDs. First, the investigation of the LFDs’ domains allowed us to detect the uncertain parameter values that are the worst for the estimation. Second, the consideration of the performance index pair “conformity index–guaranteed estimation quality” uncovered a rather new conception of parameter estimation under a vector optimality criterion. The paper contains an assertion stating that the LFDs are Pareto-optimal for the vector-valued criterion above.
The paper focuses mostly on the conformity indices related to the likelihood function; thus, it is natural that the performance of the minimax estimate is compared with that of the MLE. In general, the MLE has several remarkable properties, in particular the asymptotic minimaxity under some additional restrictions [12]. However, this estimate is not robust to prior statistical uncertainty. The proposed minimax estimate can be considered as a robustified version of the MLE, ready for application in the case of short non-asymptotic samples or when the conditions for the MLE asymptotic minimaxity are violated.
The conformity constraints are not exhausted by the likelihood function. In the paper, we present other conformity indices based on the EDF and the sample mean. We demonstrate that the minimax estimates with the EDF conformity constraint can be better than the MLE. One of the points of the paper is that the flexible choice of the conformity indices and the design of additional conformity constraints for each individual applied estimation problem allow one to obtain a tradeoff between the prior uncertainty and the available observations.
The reason to choose one or another conformity index depends not only on the conditions of the specific practical estimation problem solved under the minimax settings. One of the essential conditions is the possibility of its quick computation for the subsequent verification of the conformity constraint. For example, the calculation of the likelihood conformity constraint (33) with the guess value $\tilde q \in \mathrm{Argmax}_{q} L(y | q)$ necessarily requires solving the auxiliary likelihood maximization problem, which is nontrivial in itself. Thus, the conformity indices based on the EDF or sample moments look more promising from the computational point of view.
The applicability of the proposed minimax estimate also depends on the presence of the analytical formula of the estimates w ( y | q ) , or the fast numerical algorithms of its calculation. In turn, this possibility is a base for the subsequent effective solution to the QP problem and specification of the LFD.
Finally, the key indicator affecting the estimate calculation process and its precision is the number of mesh nodes in the approximation $C(\epsilon_n)$ of the uncertainty set C. It is a function of the ratio “size of C / mesh step $\epsilon_n$” and the dimensionality m of C.
All of the factors above characterize the limits of possible applicability of the proposed minimax estimation method for the solution to one or another practical problem.

Funding

The research was supported by the Ministry of Science and Higher Education of the Russian Federation, project No. 075-15-2020-799.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable to this article.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
cdf: cumulative distribution function
CE: conditional expectation
EDF: empirical distribution function
EKF: extended Kalman filter
EM algorithm: expectation/maximization algorithm
LFD: least favorable distribution
MLE: maximum likelihood estimate
MS-optimal: optimal in the mean square sense
pdf: probability density function
QP problem: quadratic programming problem

Appendix A

Proof of Lemma 1.
Conditions (v)—(viii) imply fulfillment of the inequalities:
$$L(y | q) \le \sup_{x \in \mathbb{R}^n} \nu(q, x | y) \le \frac{1}{\lambda_0^{k/2}} \max_{x \in \mathbb{R}^n} \varphi_V(x) \triangleq M < \infty.$$
Furthermore, for any ϵ ($0 < \epsilon < 1$) there exists a compact set $S(\epsilon) \in \mathcal{B}(\mathbb{R}^n)$ such that $\int_{S(\epsilon)} \Psi(dx | q) \ge 1 - \epsilon$, and by the Weierstrass theorem $m(y) \triangleq \min_{(q,x) \in C \times S(\epsilon)} \nu(q, x | y) > 0$. Each measure $F \in \mathcal{F}$ can be associated with the measure $\mu_F(dq | y) \triangleq L(y | q) F(dq)$. Obviously, $\mu_F$ is absolutely continuous with respect to F and finite, i.e., $0 < m(y) \le \int_C \mu_F(dq | y) \le M < \infty$ for all $y \in \mathbb{R}^k$ and $F \in \mathcal{F}$. The measure $F'(F, dq | y)$ (15) is probabilistic; moreover, $F' \in \mathcal{F}$. The measure $F(F', dq | y)$ (16) is also a probabilistic one defined on $(C, \mathcal{B}(C))$ for any $F' \in \mathcal{F}$, and the denominator in (16) has the following lower and upper bounds:
$$0 < \frac{1}{M} \le \int_C L^{-1}(y | q)\, F'(dq) \le \frac{1}{m(y)} < \infty.$$
From (15) and (16) it follows that the corresponding measure transformations are mutually inverse, i.e., for any $F \in \mathcal{F}$ the identity $F(F'(F)) \equiv F$ holds and, moreover, $\{F'(F) : F \in \mathcal{F}\} = \{F(F') : F' \in \mathcal{F}\} = \mathcal{F}$. Assertion (1) of Lemma 1 is proven.
The set $\mathcal{F}'_L$ is *-weakly closed, because the set $\mathcal{F}_L$ is, and the function $L(y | q)$ is nonnegative, continuous and bounded in $q \in C$.
Let $F'_1, F'_2 \in \mathcal{F}'_L$ be two arbitrary distributions from $\mathcal{F}'_L$, and let $F'_\alpha \triangleq \alpha F'_1 + (1-\alpha) F'_2$ be their convex linear combination with a fixed parameter $\alpha \in [0,1]$. We should prove that $F'_\alpha \in \mathcal{F}'_L$. By the definition of $\mathcal{F}'_L$ there exist distributions $F_1, F_2 \in \mathcal{F}_L$ such that $F'_1 = F'(F_1)$ and $F'_2 = F'(F_2)$. Furthermore, for the convex combination $F_\beta = \beta F_1 + (1-\beta) F_2$ with
$$\beta \triangleq \frac{\alpha L(F_2 | y)}{\alpha L(F_2 | y) + (1-\alpha) L(F_1 | y)} \in [0,1],$$
we can easily verify that $F'_\alpha = F'(F_\beta)$, i.e., $F'_\alpha \in \mathcal{F}'_L$. Assertion (2) of Lemma 1 is proven. □

Appendix B

Proof of Theorem 1.
The set H ( y ) = R by condition (ix); thus it is convex and closed. The set F L is convex and *-weakly closed due to Lemma 1. From this fact and (20) it follows that W ( F L | y ) is also a convex closed set. Moreover, it is bounded due to condition (viii). The function J (22) is strictly convex in η and concave (affine) in w. These conditions are sufficient for the existence of a saddle point [40]. It should be noted that both the set H ( y ) × W ( F L | y ) and the saddle point ( h ^ ( y ) , w ^ ( y ) ) depend on the observed sample y. For the saddle point the following equalities are true:
$$J(\hat h(y), \hat w(y)) = \min_{\eta \in H(y)} \max_{w \in W(\mathcal{F}_L | y)} J(\eta, w) = \max_{w \in W(\mathcal{F}_L | y)} \min_{\eta \in H(y)} J(\eta, w) = \max_{w \in W(\mathcal{F}_L | y)} J^*(w),$$
i.e., $\hat w(y) \in \mathrm{Argmax}_{w \in W(\mathcal{F}_L | y)} J^*(w)$.
Now we prove the uniqueness of the saddle point $\hat w(y)$. Let $w'(y) = \mathrm{col}(w'_1(y), w'_2(y))$ and $w''(y) = \mathrm{col}(w''_1(y), w''_2(y))$ be two different saddle points, $J(y) \triangleq J^*(w'(y)) = J^*(w''(y))$, and let $w'''(y) \triangleq \alpha w'(y) + (1-\alpha) w''(y)$ be an arbitrary convex combination of the chosen points ($0 < \alpha < 1$). After elementary algebraic transformations we have:
$$J^*(w'''(y)) = J(y) + \alpha(1-\alpha) \|w'_2(y) - w''_2(y)\|^2 > J(y),$$
which contradicts the assumption that $w'(y)$ and $w''(y)$ are two different solutions to the finite-dimensional dual problem. Theorem 1 is proven. □

Appendix C

Proof of Corollary 2.
The set $W(\mathcal{F}_L | y)$ is a compact subset of a finite-dimensional Euclidean space, and $W(\mathcal{F}_L | y) \subseteq \mathrm{conv}\, W(C | y)$. By the Krein–Milman theorem [37], each point of the set $W(\mathcal{F}_L | y)$ can be represented as a convex combination of at most $\dim W(\mathcal{F}_L | y) + 1$ extreme points of the set $W(\mathcal{F}_L | y)$.
Obviously, all extreme points of $W(\mathcal{F}_L | y)$ belong to the set $W(C | y)$. Hence, for the point $\hat w(y)$, which is a solution to the finite-dimensional dual problem (28), there exist a finite set $\{q_s(y)\}_{s=\overline{1,S}} \subset C$, $1 \le S \le \dim W(\mathcal{F}_L | y) + 1$, of parameters and weights $\{P_s(y)\}_{s=\overline{1,S}}$ ($P_s(y) \ge 0$, $\sum_{s=1}^{S} P_s(y) = 1$) such that:
$$\hat w(y) = \sum_{s=1}^{S} P_s(y)\, w(q_s(y) | y). \qquad (A1)$$
The parameters and weights define the reference measure (15) on the space $(C, \mathcal{B}(C))$:
$$\hat F'(dq | y) \triangleq \sum_{s=1}^{S} P_s(y)\, \delta_{q_s(y)}(dq).$$
We can establish the initial measure by (16):
$$\hat F(dq | y) = \frac{\sum_{s=1}^{S} L^{-1}(y | q_s(y))\, P_s(y)\, \delta_{q_s(y)}(dq)}{\sum_{s=1}^{S} L^{-1}(y | q_s(y))\, P_s(y)}.$$
It is easy to verify that $\mathsf{E}_{\hat F}\left[ \|h(\gamma, X)\|^2 \mid Y = y \right] = \hat w_1(y)$ and $\mathsf{E}_{\hat F}\left[ h(\gamma, X) \mid Y = y \right] = \hat w_2(y)$, i.e., $\hat F$ is the required LFD. Corollary 2 is proven. □

Appendix D

Proof of Lemma 2.
Without loss of generality we suppose that each $\epsilon_n$-mesh contains at least $\dim W(\mathcal{F} | y) + 2$ points. By Corollary 2, the solution to problem (28) can be represented in form (A1). By the condition of Lemma 2 there exists a set $\{q_s(\epsilon_n | y)\}_{s=\overline{1,S}} \subset C(\epsilon_n)$ such that $\max_{1 \le s \le S} \|\hat q_s(y) - q_s(\epsilon_n | y)\| \le \epsilon_n$. For the vector $w(\epsilon_n | y) \triangleq \sum_{s=1}^{S} \hat P_s(y)\, w(q_s(\epsilon_n | y) | y)$ the inequalities
$$|\hat w_1(y) - w_1(\epsilon_n | y)| \le \sum_{s=1}^{S} \hat P_s(y)\, |w_1(\hat q_s(y) | y) - w_1(q_s(\epsilon_n | y) | y)| \le \omega_1(\epsilon_n | y),$$
$$\|\hat w_2(y) - w_2(\epsilon_n | y)\| \le \sum_{s=1}^{S} \hat P_s(y)\, \|w_2(\hat q_s(y) | y) - w_2(q_s(\epsilon_n | y) | y)\| \le \omega_2(\epsilon_n | y)$$
hold. Furthermore, the sequence of inequalities
$$\max_{w \in \mathrm{conv}(W(C(\epsilon_n) | y))} J^*(w) = J(y) - \min_{w \in \mathrm{conv}(W(C(\epsilon_n) | y))} \left( \hat w_1(y) - w_1 + \|w_2\|^2 - \|\hat w_2(y)\|^2 \right)$$
$$\ge J(y) - \left( |\hat w_1(y) - w_1(\epsilon_n | y)| + \|\hat w_2(y) - w_2(\epsilon_n | y)\|^2 + 2\left| \langle \hat w_2(y),\, \hat w_2(y) - w_2(\epsilon_n | y) \rangle \right| \right)$$
$$\ge J(y) - \left( \omega_1(\epsilon_n | y) + \omega_2^2(\epsilon_n | y) + \frac{2M}{m(y)} K_h\, \omega_2(\epsilon_n | y) \right)$$
proves the convergence max w conv ( W ( C ( ϵ n ) | y ) ) J * ( w ) J ( y ) as ϵ n 0 .
Assume, on the contrary, that $\hat w(n | y) \not\to \hat w(y)$ as $\epsilon_n \to 0$. Then there exists a subsequence $\{\epsilon_{n_k}\}_{n_k \in \mathbb{N}}$ such that $\hat w(n_k | y) \to \bar w(y) \ne \hat w(y)$. This means that $J^*(\hat w(y)) = J^*(\bar w(y))$, which contradicts the uniqueness of the solution to the finite-dimensional dual problem (28). Lemma 2 is proven. □

References

  1. Calafiore, G.; El Ghaoui, L. Robust maximum likelihood estimation in the linear model. Automatica 2001, 37, 573–580.
  2. Kurzhanski, A.B.; Varaiya, P. Dynamics and Control of Trajectory Tubes; Birkhäuser: Basel, Switzerland, 2014.
  3. Matasov, A. Estimators for Uncertain Dynamic Systems; Kluwer: Dordrecht, The Netherlands, 1998.
  4. Borisov, A.V.; Pankov, A.R. Optimal filtering in stochastic discrete-time systems with unknown inputs. IEEE Trans. Autom. Control 1994, 39, 2461–2464.
  5. Pankov, A.R.; Semenikhin, K.V. Minimax identification of a generalized uncertain stochastic linear model. Autom. Remote Control 1998, 59, 1632–1643.
  6. Poor, V.; Looze, D. Minimax state estimation for linear stochastic systems with noise uncertainty. IEEE Trans. Autom. Control 1981, 26, 902–906.
  7. Soloviev, V. Towards the Theory of Minimax-Bayesian Estimation. Theory Probab. Its Appl. 2000, 44, 739–754.
  8. Blackwell, D.; Girshick, M. Theory of Games and Statistical Decisions; Wiley: New York, NY, USA, 1954.
  9. Martin, C.; Mintz, M. Robust filtering and prediction for linear systems with uncertain dynamics: A game-theoretic approach. IEEE Trans. Autom. Control 1983, 28, 888–896.
  10. Berger, J.O. Statistical Decision Theory and Bayesian Analysis; Springer: Berlin/Heidelberg, Germany, 1985.
  11. Anan’ev, B. Minimax Estimation of Statistically Uncertain Systems Under the Choice of a Feedback Parameter. J. Math. Syst. Estim. Control 1995, 5, 1–17.
  12. Borovkov, A. Mathematical Statistics; Gordon & Breach: Blackburn, Australia, 1998.
  13. Epstein, L.; Ji, S. Ambiguous volatility, possibility and utility in continuous time. J. Math. Econ. 2014, 50, 269–282.
  14. Borisov, A.V. A posteriori minimax estimation with likelihood constraints. Autom. Remote Control 2012, 73, 1481–1497.
  15. Arkhipov, A.; Semenikhin, K. Minimax Linear Estimation with the Probability Criterion under Unimodal Noise and Bounded Parameters. Autom. Remote Control 2020, 81, 1176–1191.
  16. Germeier, Y. Non-Antagonistic Games; Springer: New York, NY, USA, 1986.
  17. Elliott, R.J.; Moore, J.B.; Aggoun, L. Hidden Markov Models: Estimation and Control; Springer: New York, NY, USA, 1995.
  18. Yosida, K. Functional Analysis; Grundlehren der Mathematischen Wissenschaften; Springer: Berlin/Heidelberg, Germany, 2013.
  19. Liptser, R.; Shiryaev, A. Statistics of Random Processes: I. General Theory; Springer: Berlin/Heidelberg, Germany, 2001.
  20. Kats, I.; Kurzhanskii, A. Estimation in Multistep Systems. Proc. USSR Acad. Sci. 1975, 221, 535–538.
  21. Petersen, I.R.; James, M.R.; Dupuis, P. Minimax optimal control of stochastic uncertain systems with relative entropy constraints. IEEE Trans. Autom. Control 2000, 45, 398–412.
  22. Xie, L.; Ugrinovskii, V.A.; Petersen, I.R. Finite horizon robust state estimation for uncertain finite-alphabet hidden Markov models with conditional relative entropy constraints. In Proceedings of the 2004 43rd IEEE Conference on Decision and Control (CDC), Nassau, Bahamas, 14–17 December 2004; Volume 4, pp. 4497–4502.
  23. El Karoui, N.; Jeanblanc Picque, M. Contrôle de processus de Markov. Séminaire Probab. Strasbg. 1988, 22, 508–541.
  24. Lee, E.; Markus, L. Foundations of Optimal Control Theory; SIAM Series in Applied Mathematics; Wiley: Hoboken, NJ, USA, 1967.
  25. Floyd, S.; Jacobson, V. Random early detection gateways for congestion avoidance. IEEE/ACM Trans. Netw. 1993, 1, 397–413.
  26. Low, S.H.; Paganini, F.; Doyle, J.C. Internet congestion control. IEEE Control Syst. Mag. 2002, 22, 28–43.
  27. Altman, E.; Avrachenkov, K.; Menache, I.; Miller, G.; Prabhu, B.J.; Shwartz, A. Dynamic Discrete Power Control in Cellular Networks. IEEE Trans. Autom. Control 2009, 54, 2328–2340.
  28. Perruquetti, W.; Barbot, J.P. Sliding Mode Control in Engineering; Marcel Dekker, Inc.: New York, NY, USA, 2002.
  29. Arnold, B.F.; Stahlecker, P. Fuzzy prior information and minimax estimation in the linear regression model. Stat. Pap. 1997, 38, 377–391.
  30. Donoho, D.; Johnstone, I.; Stern, A.; Hoch, J. Does the maximum entropy method improve sensitivity? Proc. Natl. Acad. Sci. USA 1990, 87, 5066–5068.
  31. Donoho, D.L.; Johnstone, I.M.; Hoch, J.C.; Stern, A.S. Maximum Entropy and the Nearly Black Object. J. R. Stat. Soc. Ser. B 1992, 54, 41–81.
  32. Pham, D.S.; Bui, H.H.; Venkatesh, S. Bayesian Minimax Estimation of the Normal Model with Incomplete Prior Covariance Matrix Specification. IEEE Trans. Inf. Theory 2010, 56, 6433–6449.
  33. Donoho, D.L.; Johnstone, I.M. Minimax risk over lp-balls for lq-error. Probab. Theory Relat. Fields 1994, 99, 277–303.
  34. Donoho, D.L.; Johnstone, I.M. Minimax estimation via wavelet shrinkage. Ann. Stat. 1998, 26, 879–921.
  35. Donoho, D.L.; Johnstone, I.; Montanari, A. Accurate Prediction of Phase Transitions in Compressed Sensing via a Connection to Minimax Denoising. IEEE Trans. Inf. Theory 2013, 59, 3396–3433.
  36. Bosov, A.; Borisov, A.; Semenikhin, K. Conditionally Minimax Prediction in Nonlinear Stochastic Systems. IFAC-PapersOnLine 2015, 48, 802–807.
  37. Kadets, V. A Course in Functional Analysis and Measure Theory; Springer: Berlin/Heidelberg, Germany, 2018.
  38. Gelman, A.; Carlin, J.B.; Stern, H.S.; Rubin, D.B. Bayesian Data Analysis, 2nd ed.; Chapman and Hall/CRC: London, UK, 2004.
  39. Anderson, B.; Moore, J. Optimal Filtering; Prentice-Hall: Upper Saddle River, NJ, USA, 1979.
  40. Grabiner, J.; Balakrishnan, A. Applications of Mathematics: Applied Functional Analysis; Springer: New York, NY, USA, 1981.
Figure 1. Estimation of the drift coefficient a.
Figure 2. Estimation of the diffusion coefficient b.
Figure 3. Estimation of the coefficient a under conformity constraint based on the likelihood function.
Figure 4. The EDF of Y and different cdf’s of Y under various choices of a.
Figure 5. Estimation of the coefficient a under conformity constraint based on the EDF.