Discussion about inaccuracy measure in information theory using co-copula and copula dual functions

https://doi.org/10.1016/j.jmva.2021.104725

Abstract

Inaccuracy is an important measure in information theory that has recently been considered by many researchers, and various generalizations of it have been introduced. In this paper, two new inaccuracy measures based on the co-copula and the dual of a copula are introduced and their properties under specific conditions are investigated. In particular, under the proportional reversed hazard rate model and the proportional hazard rate model, we obtain bounds and inequalities for these two inaccuracy measures, and we show that the triangle inequality holds for both. Under the assumption of radial symmetry, we prove that the two new inaccuracy measures are equal, and from this equality we obtain a characterization property for radially symmetric distributions. We provide examples to evaluate the results. Finally, in the supplementary material, we introduce estimators for the proposed inaccuracy measures, examine some of the results by simulation, and provide an example with real data.

Introduction

In information theory, there are various measures that have recently attracted the attention of many researchers because of their usefulness in fields such as copula theory, reliability theory, and survival analysis. Systems dealing with the transmission, storage, and processing of information have become commonplace at all levels of society. We live in what is commonly called the information society, in which information plays a key role. It is therefore not surprising that many different sectors want to understand what information actually is and to exploit it as far as possible. Information theory is the science that deals with quantifying and applying information. Entropy, introduced by Shannon [33], is one of the important measures of this theory; it relates information to uncertainty and is therefore an uncertainty measure; see Jan [13] for more details. The Shannon entropy of an absolutely continuous random variable $X_1$ with support $S_{X_1}$ is defined by
$$H(X_1) = -\int_{S_{X_1}} f_1(x_1)\,\log(f_1(x_1))\,\mathrm{d}x_1,$$
where $f_1(\cdot)$ is the probability density function of $X_1$. This measure indicates the magnitude of the error made in estimating the probability of an event due to inaccurate information in the observations; it can also be used to select a model and to quantify model dispersion. Using the concept of entropy, Popel [28] showed that the transition from a weakly turbulent plasma state to a strongly turbulent one can be treated as a nonequilibrium phase transition. More about the concepts of entropy and its applications can be found in Cover and Thomas [6].
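As a quick numerical check (not from the paper; the Exp(1) density and the scipy-based quadrature are our own illustrative choices), the entropy integral can be evaluated directly:

    # Minimal sketch: Shannon (differential) entropy H(X1) = -int f1 log f1 dx1,
    # evaluated by numerical quadrature for an Exp(1) density, whose
    # closed-form entropy equals 1.
    import numpy as np
    from scipy.integrate import quad

    f1 = lambda x: np.exp(-x)  # Exp(1) density on (0, inf)
    H, _ = quad(lambda x: -f1(x) * np.log(f1(x)), 0, np.inf)
    print(H)  # ~1.0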

Inaccuracy is another measure in information theory. Kerridge [15] defined an inaccuracy measure for two discrete random variables $X_1$ and $Y_1$ with probability mass functions $p_1(\cdot)$ and $p_2(\cdot)$ and countable supports $S_{X_1}$ and $S_{Y_1}$, respectively. This measure (Kerridge’s inaccuracy) for determining $X_1$ from $Y_1$ reads
$$I(X_1, Y_1) = -\sum_{x_1 \in S_{X_1} \cap S_{Y_1}} p_1(x_1)\,\log(p_2(x_1)).$$
Subsequently, Nath [24] defined this measure, an important generalization of the Shannon entropy, for two absolutely continuous random variables $X_1$ and $Y_1$ by
$$I(X_1, Y_1) = -\int_{S_{X_1} \cap S_{Y_1}} f_1(x_1)\,\log(g_1(x_1))\,\mathrm{d}x_1,$$
where $f_1(\cdot)$ and $g_1(\cdot)$ are the probability density functions of $X_1$ and $Y_1$, and $S_{X_1}$ and $S_{Y_1}$ are their supports. On one of the main reasons for paying attention to the inaccuracy measure, Kerridge [15] said: “Suppose that an experimenter states the probabilities of the various possible outcomes of an experiment. His statement can lack precision in two ways: he may not have enough information, so that his statement is vague, or some of the information he has may be incorrect. All statistical estimation and inference problems are concerned with making statements which may be inaccurate in either or both of these ways. The communication theory of Shannon and Weaver [34] provides a general theory of uncertainty which enables us to deal with the vagueness aspect of inaccuracy: the usefulness of this in statistical problems has been shown by several authors, including Lindley [19] and Kullback and Leibler [16]. However, the theory so far has not been able to deal with inaccuracy in its wider sense, and so its use has been limited. This limitation is now removed by introducing the inaccuracy measure.” He also said: “In the ordinary development of communication theory there is an interesting duality between information and entropy, or uncertainty. This is because one reasonable measure of the uncertainty of a situation is the amount of knowledge which would have to be obtained before certainty could be achieved.” He then showed that inaccuracy can be related to an amount of missing information.
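As a hedged illustration (the exponential densities and the rate 0.5 are our own choices, not the paper’s), Kerridge’s inaccuracy for continuous variables can be computed numerically and checked against its closed form:

    # Minimal sketch: I(X1, Y1) = -int f1 log g1 dx1 with f1 ~ Exp(1) (true)
    # and g1 ~ Exp(lam) (stated); the closed form is -log(lam) + lam.
    import numpy as np
    from scipy.integrate import quad

    lam = 0.5
    f1 = lambda x: np.exp(-x)
    g1 = lambda x: lam * np.exp(-lam * x)
    I, _ = quad(lambda x: -f1(x) * np.log(g1(x)), 0, np.inf)
    print(I, -np.log(lam) + lam)  # both ~1.1931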

The inaccuracy measure indicates the error that the experimenter commits in estimating the probability of events when the distribution $G_1$ is used incorrectly instead of the correct distribution $F_1$, and also when the information in the observations is incorrect. Clearly, if $X_1$ and $Y_1$ are identically distributed, then $I(X_1, Y_1) = H(X_1)$. On the other hand, since the inaccuracy measure satisfies $I(X_1, X_1) = H(X_1)$ and
$$I(X_1, Y_1) = H(X_1) + \int_{S_{X_1} \cap S_{Y_1}} f_1(x_1)\,\log\frac{f_1(x_1)}{g_1(x_1)}\,\mathrm{d}x_1,$$
and the last integral is nonnegative, this measure and its generalizations can be used as statistics for goodness-of-fit tests; see Park et al. [27] for more details. Kumar et al. [17] introduced a dynamic measure of inaccuracy between two past lifetime distributions over the interval $(0, t)$ and studied a characterization problem for this dynamic inaccuracy measure based on the proportional reversed hazard rate model. Kundu et al. [18] studied the cumulative residual and past inaccuracy measures for truncated random variables, which extend the corresponding cumulative entropies, and obtained several of their properties. Psarrakos and Di Crescenzo [30] introduced and studied an inaccuracy measure concerning the relevation transform of two nonnegative continuous random variables. The inaccuracy measure has also been extended from the univariate to the multivariate case; in this paper, the bivariate inaccuracy is considered. Suppose that $X = (X_1, X_2)$ and $Y = (Y_1, Y_2)$ are two continuous bivariate random vectors with joint probability density functions $f$ and $g$ and joint supports
$$S_X = (l_{X_1}, r_{X_1}) \times (l_{X_2}, r_{X_2}), \qquad S_Y = (l_{Y_1}, r_{Y_1}) \times (l_{Y_2}, r_{Y_2}),$$
respectively, such that $S_X = S_Y$ and $-\infty \le l_{X_i} < r_{X_i} \le +\infty$ for $i \in \{1, 2\}$. In this case, the inaccuracy measure in the bivariate situation is
$$BI(X, Y) = -\int_{l_{X_1}}^{r_{X_1}} \int_{l_{X_2}}^{r_{X_2}} f(x_1, x_2)\,\log(g(x_1, x_2))\,\mathrm{d}x_2\,\mathrm{d}x_1. \tag{1}$$
Ghosh and Kundu [8] defined the bivariate cumulative past inaccuracy in terms of the bivariate distribution function as
$$BCPI(X, Y) = -\int_{l_{X_1}}^{r_{X_1}} \int_{l_{X_2}}^{r_{X_2}} F(x_1, x_2)\,\log(G(x_1, x_2))\,\mathrm{d}x_2\,\mathrm{d}x_1, \tag{2}$$
where $F(\cdot,\cdot)$ and $G(\cdot,\cdot)$ are the joint distribution functions of $X$ and $Y$, respectively. If, in (2), the joint distribution functions are replaced by the joint survival functions, then the bivariate cumulative residual inaccuracy is obtained:
$$BCRI(X, Y) = -\int_{l_{X_1}}^{r_{X_1}} \int_{l_{X_2}}^{r_{X_2}} \bar{F}(x_1, x_2)\,\log(\bar{G}(x_1, x_2))\,\mathrm{d}x_2\,\mathrm{d}x_1, \tag{3}$$
where $\bar{F}(\cdot,\cdot)$ and $\bar{G}(\cdot,\cdot)$ are the joint survival functions of $X$ and $Y$, respectively. If $X$ and $Y$ are identically distributed, then Eqs. (1), (2), and (3) reduce to the bivariate entropy, the bivariate cumulative past entropy, and the bivariate cumulative residual entropy, respectively. In recent years, the use of copula functions to obtain new results in information theory has attracted the attention of researchers; indeed, to the authors’ knowledge, little research has been done on inaccuracy measures based on copula theory. Ahmadi et al. [1] developed a copula-based approach to express the dynamic mutual information of past and residual bivariate lifetimes in an alternative way. Ma and Sun [20] provided a new way of understanding and estimating mutual information using the copula function. Hao and Singh [11] proposed a maximum entropy copula method for multisite monthly streamflow simulation, in which the temporal and spatial dependence structure is imposed as constraints to derive the maximum entropy copula. Gronneberg and Hjort [9] adapted the arguments leading to the original Akaike information criterion (AIC) formula, related to empirical estimation of a certain Kullback–Leibler distance.
This resulted in a formula significantly different from the AIC, which is named the copula information criterion. Pougaza and Djafari [29] were interested in finding a bivariate distribution when only its marginals are known; they determined a multivariate distribution that maximizes an entropy with given marginals. Mohtashami and Amini [22] obtained various measures for bivariate distributions in terms of copulas and investigated the properties of these information measures and their links with copulas. Hosseini and Ahmadi [12] discussed inaccuracy in terms of the copula density function as well as the copula function and introduced two new inaccuracy measures, namely the copula density inaccuracy and the copula cumulative past inaccuracy, which are, respectively, of the form
$$CI(X, Y) = -\int_0^1 \int_0^1 c_X(u, v)\,\log\big(c_Y(G_1(F_1^{-1}(u)), G_2(F_2^{-1}(v)))\big)\,\mathrm{d}v\,\mathrm{d}u$$
and
$$CCPI(X, Y) = -\int_0^1 \int_0^1 C_X(u, v)\,\log\big(C_Y(G_1(F_1^{-1}(u)), G_2(F_2^{-1}(v)))\big)\,\mathrm{d}v\,\mathrm{d}u, \tag{4}$$
where $c(\cdot,\cdot)$ and $C(\cdot,\cdot)$ are two important functions in copula theory, known as the copula density function and the copula function, respectively.
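To make the copula cumulative past inaccuracy concrete, the following sketch (our own illustration, not from the paper; it assumes an FGM copula with $\theta = 0.5$ for $X$, the independence copula for $Y$, and identical marginals so the inner compositions reduce to the identity) evaluates CCPI numerically:

    # Minimal sketch: CCPI(X, Y) = -int int C_X(u,v) log C_Y(u,v) dv du in the
    # special case of identical marginals, with C_X an FGM copula and C_Y the
    # independence copula uv. The closed form here is 1/4 + 5*theta/108.
    import numpy as np
    from scipy.integrate import dblquad

    theta = 0.5
    C_X = lambda u, v: u * v * (1 + theta * (1 - u) * (1 - v))
    C_Y = lambda u, v: u * v

    # dblquad integrates func(v, u) over v in (0,1) for each u in (0,1)
    ccpi, _ = dblquad(lambda v, u: -C_X(u, v) * np.log(C_Y(u, v)),
                      0, 1, lambda u: 0, lambda u: 1)
    print(ccpi)  # ~0.273 for theta = 0.5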

Accordingly, in this paper we extend the use of copula-theoretic concepts to the bivariate inaccuracy measures (1), (2), and (3). To this end, in Section 2 we first recall the basic concepts of the copula function. In Section 3, we introduce a new inaccuracy measure, the co-copula inaccuracy, and obtain upper and lower bounds for it under the proportional hazard rate model. We also obtain results and inequalities, such as the triangle inequality, for the co-copula inaccuracy under a proportional hazard rate model or a lower orthant order between random vectors. In Section 4, we introduce another inaccuracy measure based on the copula dual function, which we call the copula dual inaccuracy measure. Under the assumption of radial symmetry, we show that these two new inaccuracies are equal, and we obtain a characterization property from this equality for radially symmetric distributions. In addition, for these two new inaccuracy measures we derive results and inequalities, including the triangle inequality, under a proportional (reversed) hazard rate model or upper and lower orthant orders between random vectors. In Section 5, we give some examples to further examine the results. In the supplementary material, we introduce estimators for the inaccuracy measures, examine some of the results by simulation, and provide an example with real data. Further results and examples are provided in Appendix A.


Copula function and preliminary results

Copulas are functions that link multivariate distribution functions to their marginal distribution functions. In other words, copulas are multivariate distribution functions whose one-dimensional marginals are uniformly distributed on the interval (0,1). The copula concept was introduced by Sklar [36]. Fisher [7] outlined the reasons for the attention paid to copulas in statistics and probability, noting that copulas are the starting point for constructing families of
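A tiny sketch of Sklar’s theorem (our own illustration; the FGM copula and Exp(1) marginals are arbitrary choices) shows how a copula couples marginals into a joint distribution function:

    # Minimal sketch of Sklar's theorem: H(x1, x2) = C(F1(x1), F2(x2)).
    # C is an FGM copula; both marginals are Exp(1).
    import numpy as np

    theta = 0.5
    C = lambda u, v: u * v * (1 + theta * (1 - u) * (1 - v))
    F1 = lambda x: 1 - np.exp(-x)
    F2 = lambda x: 1 - np.exp(-x)

    H = lambda x1, x2: C(F1(x1), F2(x2))  # joint CDF built from the copula
    print(H(1.0, 2.0))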

Co-copula inaccuracy measure

If, in relation (4), the copula $C(\cdot,\cdot)$ is replaced by the co-copula $C^{*}(\cdot,\cdot)$, then the following new inaccuracy measure, CCI, is obtained:
$$CCI(X, Y) = -\int_0^1 \int_0^1 C^{*}_X(u, v)\,\log\big(C^{*}_Y(\bar{G}_1(\bar{F}_1^{-1}(u)), \bar{G}_2(\bar{F}_2^{-1}(v)))\big)\,\mathrm{d}v\,\mathrm{d}u. \tag{14}$$
By (10), it follows that $CCI(X, Y) \ge 0$. In the special case $Y_i \stackrel{\mathrm{st}}{=} X_i$, relation (14) reduces to
$$CCI(X, Y) = -\int_0^1 \int_0^1 C^{*}_X(u, v)\,\log(C^{*}_Y(u, v))\,\mathrm{d}v\,\mathrm{d}u.$$
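As a hedged numerical illustration of the reduced form above (our own choices: FGM copula for $X$, independence copula for $Y$, matching marginals; the co-copula is $C^{*}(u,v) = 1 - C(1-u, 1-v)$ in Nelsen’s notation):

    # Minimal sketch: CCI(X, Y) = -int int C*_X(u,v) log C*_Y(u,v) dv du in the
    # special case Y_i =st X_i, where C*(u,v) = 1 - C(1-u, 1-v) is the
    # co-copula. C_X is FGM (theta = 0.5); C_Y is the independence copula.
    import numpy as np
    from scipy.integrate import dblquad

    theta = 0.5
    C_fgm = lambda u, v: u * v * (1 + theta * (1 - u) * (1 - v))
    co_X = lambda u, v: 1 - C_fgm(1 - u, 1 - v)   # co-copula of FGM
    co_Y = lambda u, v: 1 - (1 - u) * (1 - v)     # co-copula of independence

    cci, _ = dblquad(lambda v, u: -co_X(u, v) * np.log(co_Y(u, v)),
                     0, 1, lambda u: 0, lambda u: 1)
    print(cci)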

Copula dual inaccuracy measure

In relation (4), suppose instead that the copula $C(\cdot,\cdot)$ is replaced by the dual of the copula, $\tilde{C}(\cdot,\cdot)$. Then the following new inaccuracy measure, CDI, is obtained:
$$CDI(X, Y) = -\int_0^1 \int_0^1 \tilde{C}_X(u, v)\,\log\big(\tilde{C}_Y(G_1(F_1^{-1}(u)), G_2(F_2^{-1}(v)))\big)\,\mathrm{d}v\,\mathrm{d}u. \tag{23}$$
By relation (11), it follows that $CDI(X, Y) \ge 0$. In the special case $Y_i \stackrel{\mathrm{st}}{=} X_i$, relation (23) reduces to
$$CDI(X, Y) = -\int_0^1 \int_0^1 \tilde{C}_X(u, v)\,\log(\tilde{C}_Y(u, v))\,\mathrm{d}v\,\mathrm{d}u.$$
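Analogously, a hedged sketch for the reduced form of CDI (same illustrative choices as before), using the dual $\tilde{C}(u,v) = u + v - C(u,v)$:

    # Minimal sketch: CDI(X, Y) = -int int C~_X(u,v) log C~_Y(u,v) dv du in the
    # special case Y_i =st X_i, with the dual of a copula
    # C~(u,v) = u + v - C(u,v). C_X is FGM (theta = 0.5); C_Y is independence.
    import numpy as np
    from scipy.integrate import dblquad

    theta = 0.5
    C_fgm = lambda u, v: u * v * (1 + theta * (1 - u) * (1 - v))
    dual_X = lambda u, v: u + v - C_fgm(u, v)
    dual_Y = lambda u, v: u + v - u * v           # dual of independence copula

    cdi, _ = dblquad(lambda v, u: -dual_X(u, v) * np.log(dual_Y(u, v)),
                     0, 1, lambda u: 0, lambda u: 1)
    print(cdi)

Since both the FGM and independence copulas are radially symmetric, their co-copulas and duals coincide, so this value should match the co-copula sketch above; this is consistent with the paper’s equality result under radial symmetry.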

Examples

In this section, four examples are presented to examine some of the results.

Example 1

Suppose that $X$ and $Y$ have the FGM and AMH copulas, respectively:
$$C_X(u, v) = uv\,(1 + \theta(1 - u)(1 - v))$$
and
$$C_Y(u, v) = \frac{uv}{1 - \lambda(1 - u)(1 - v)},$$
where $-1 \le \theta \le 1$ and $-1 \le \lambda \le 1$ are the dependence parameters (see Nelsen [25] and Nelsen et al. [26] for further reading). Also, $X_1$ and $X_2$ have the standard exponential distribution, and $Y_1$ and $Y_2$ have exponential distributions with parameters $\alpha$ and $\beta$, respectively. By using relationships (10),
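Since the snippet above is truncated, we can only sketch how such a computation might proceed numerically (the parameter values $\theta = \lambda = 0.5$, $\alpha = 2$, $\beta = 3$ are our own; the compositions $\bar{G}_1(\bar{F}_1^{-1}(u)) = u^{\alpha}$ and $\bar{G}_2(\bar{F}_2^{-1}(v)) = v^{\beta}$ follow from the stated exponential marginals):

    # Minimal sketch of Example 1's setup: CCI per relation (14) for
    # X with FGM copula and Exp(1) marginals, Y with AMH copula and
    # Exp(alpha), Exp(beta) marginals, so Gbar_1(Fbar_1^{-1}(u)) = u**alpha
    # and Gbar_2(Fbar_2^{-1}(v)) = v**beta. Parameter values are illustrative.
    import numpy as np
    from scipy.integrate import dblquad

    theta, lam, alpha, beta = 0.5, 0.5, 2.0, 3.0
    C_fgm = lambda u, v: u * v * (1 + theta * (1 - u) * (1 - v))
    C_amh = lambda u, v: u * v / (1 - lam * (1 - u) * (1 - v))
    co = lambda C, u, v: 1 - C(1 - u, 1 - v)      # co-copula C*(u, v)

    integrand = lambda v, u: -co(C_fgm, u, v) * np.log(co(C_amh, u**alpha, v**beta))
    cci, _ = dblquad(integrand, 0, 1, lambda u: 0, lambda u: 1)
    print(cci)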

Conclusion

Inaccuracy is one of the measures of information theory. It quantifies, in estimating the probabilities of events, the error caused by using the distribution claimed by the experimenter instead of the true distribution, as well as the effect of incorrect information in the observations. It is also used as a tool in goodness-of-fit tests. The inaccuracy measure can be applied in various fields, such as reliability theory, copula theory, and survival analysis, and can lead to new results. In

CRediT authorship contribution statement

Toktam Hosseini: Data curation, Writing - original draft. Mehdi Jabbari Nooghabi: Conceptualization, Methodology, Management, Editing.

Acknowledgments

We thank the editor, associate editor and reviewers for their careful reviews and insightful comments, which have led to a significant improvement of this article. This research was supported by a grant from Ferdowsi University of Mashhad, Iran; No. 2/53086.

References (40)

  • Chen, L., et al., Copulas and Its Application in Hydrology and Water Resources (2019).
  • Cover, T.M., et al., Elements of Information Theory (2006).
  • Fisher, N.I., Copulas.
  • Ghosh, A., et al., Bivariate extension of (dynamic) cumulative residual and past inaccuracy measures, Statist. Papers (2019).
  • Gronneberg, S., et al., The copula information criteria, Scand. J. Stat. (2014).
  • Hao, Z., et al., Modeling multisite streamflow dependence with maximum entropy copula, Water Resour. Res. (2013).
  • Hosseini, T., Ahmadi, J., Results on inaccuracy measure in information theory based on copula function, in: Proceedings...
  • Jan, C.A., et al., Information Theory (1997).
  • Kerridge, D.F., Inaccuracy and inference, J. R. Stat. Soc. Ser. B Stat. Methodol. (1961).
  • Kullback, S., et al., On information and sufficiency, Ann. Math. Stat. (1951).