Discussion about inaccuracy measure in information theory using co-copula and copula dual functions
Introduction
In information theory, various measures have recently attracted the attention of many researchers because of their effectiveness in scientific fields such as copula theory, reliability theory, and survival analysis. Systems dealing with the transmission, storage, and processing of information have become commonplace at all levels of society. We live in what is commonly referred to as the information society, in which information plays a key role. It is therefore not surprising that many different sectors want to know what information actually is, in order to acquire more knowledge and exploit it as far as possible. Information theory is the science that deals with the quantification and application of information. Entropy, introduced by Shannon [33], is one of the important measures in this theory; it relates information to uncertainty and is therefore an uncertainty measure; see Jan [13] for more details. The Shannon entropy of an absolutely continuous random variable X with support S_X is defined by H(X) = -∫_{S_X} f(x) log f(x) dx, where f is the probability density function of X. This measure can indicate the magnitude of the error in estimating the probability of an event occurring due to inaccurate information in the observations; it can also be used to select a model and to determine a model's dispersion. Using the concept of entropy, Popel [28] showed that the transition from a weakly turbulent plasma state to a strongly turbulent one can be treated as a nonequilibrium phase transition. More about the concepts of entropy and its applications can be found in Cover and Thomas [6].
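As a quick numerical illustration (ours, not part of the paper), the Shannon entropy of the standard exponential density can be approximated by a midpoint rule; its exact value is 1 nat. The helper name `shannon_entropy` is an assumption for this sketch:

```python
import math

def shannon_entropy(pdf, a, b, n=200_000):
    """Approximate H(X) = -∫ f(x) log f(x) dx on [a, b] by the midpoint rule."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx  # midpoint of the i-th subinterval
        f = pdf(x)
        if f > 0:
            total -= f * math.log(f) * dx
    return total

# Standard exponential: f(x) = e^{-x}; exact differential entropy is 1 nat.
h = shannon_entropy(lambda x: math.exp(-x), 0.0, 40.0)
print(round(h, 4))  # ≈ 1.0
```

Truncating the integration at 40 is safe here because the exponential tail beyond that point is negligible.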
Inaccuracy is another measure in information theory. Kerridge [15] defined an inaccuracy measure for two discrete random variables X and Y with probability mass functions p = (p_1, p_2, …) and q = (q_1, q_2, …) on countable supports. This measure (Kerridge's inaccuracy) of q for determining p reads as follows: H(p, q) = -∑_i p_i log q_i. Subsequently, Nath [24] defined this measure, an important generalization of the Shannon entropy, for two absolutely continuous random variables X and Y by H(X, Y) = -∫_{S_X} f(x) log g(x) dx, where f and g are the probability density functions of X and Y, and S_X and S_Y are the supports of X and Y, respectively. About one of the main reasons to pay attention to the inaccuracy measure, Kerridge [15] said: "Suppose that an experimenter states the probabilities of the various possible outcomes of an experiment. His statement can lack precision in two ways: he may not have enough information, so that his statement is vague, or some of the information he has may be incorrect. All statistical estimation and inference problems are concerned with making statements which may be inaccurate in either or both of these ways. The communication theory of Shannon and Weaver [34] provides a general theory of uncertainty which enables us to deal with the vagueness aspect of inaccuracy: the usefulness of this in statistical problems has been shown by several authors, including Lindley [19] and Kullback and Leibler [16]. However, the theory so far has not been able to deal with inaccuracy in its wider sense, and so its use has been limited. This limitation is now removed by introducing the inaccuracy measure." He also said: "In the ordinary development of communication theory there is an interesting duality between information and entropy, or uncertainty. This is because one reasonable measure of the uncertainty of a situation is the amount of knowledge which would have to be obtained before certainty could be achieved." He then showed that the inaccuracy can be related to an amount of missing information.
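Kerridge's discrete inaccuracy H(p, q) = -∑ p_i log q_i can be computed directly; this small sketch (the function name is ours) also checks the basic fact that the inaccuracy is minimized, and equals the Shannon entropy, when the stated distribution q matches the true distribution p:

```python
import math

def kerridge_inaccuracy(p, q):
    """H(p, q) = -sum_i p_i * log(q_i), natural logarithm."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]   # true distribution
q = [0.4, 0.4, 0.2]   # distribution claimed by the experimenter
print(kerridge_inaccuracy(p, p) < kerridge_inaccuracy(p, q))  # True
```

The gap H(p, q) - H(p, p) is exactly the Kullback-Leibler divergence of q from p, which is nonnegative.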
The inaccuracy measure indicates the error that the experimenter commits in estimating the probability of events occurring due to the incorrect use of the distribution g rather than the correct distribution f, and also due to incorrect information in the observations. Clearly, if X and Y are identically distributed, then H(X, Y) = H(X). On the other hand, the inaccuracy measure satisfies the decomposition H(X, Y) = H(X) + ∫_{S_X} f(x) log(f(x)/g(x)) dx, and, due to the nonnegativity of the last integral (the Kullback–Leibler divergence), this measure and even its generalizations can be used as a statistic of a goodness-of-fit test; see Park et al. [27] for more details. Also, Kumar et al. [17] introduced a dynamic measure of inaccuracy between two past lifetime distributions, and they studied a characterization problem for this dynamic inaccuracy measure based on the proportional reversed hazard rate model. Kundu et al. [18] studied the cumulative residual and past inaccuracy measures for truncated random variables, which are extensions of the corresponding cumulative entropies, and obtained several properties for them. Psarrakos and Di Crescenzo [30] introduced and studied an inaccuracy measure concerning the relevation transform of two nonnegative continuous random variables. The inaccuracy measure has also been extended from the univariate to the multivariate setting; in this paper, the bivariate inaccuracy is considered. Suppose that (X_1, X_2) and (Y_1, Y_2) are two continuous bivariate random vectors with joint probability density functions f and g and joint supports S_X and S_Y, respectively, such that the integrals below are finite. In this case, the inaccuracy measure in the bivariate situation is H((X_1, X_2), (Y_1, Y_2)) = -∫∫_{S_X} f(x_1, x_2) log g(x_1, x_2) dx_1 dx_2. (1) Ghosh and Kundu [8] defined the bivariate cumulative past inaccuracy in terms of the bivariate distribution function as -∫∫ F(x_1, x_2) log G(x_1, x_2) dx_1 dx_2, (2) where F and G are the joint distribution functions of the random vectors (X_1, X_2) and (Y_1, Y_2), respectively.
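The bivariate inaccuracy (1) can be estimated by Monte Carlo as -E_f[log g(X_1, X_2)]. This sketch (ours, not the paper's) uses independent exponential components, for which the measure has the closed form λ_1 + λ_2 - log(λ_1 λ_2):

```python
import math
import random

random.seed(42)

# f(x, y) = e^{-x-y} (standard exponential margins, independent);
# g(x, y) = l1 * l2 * exp(-l1*x - l2*y).
# Then -E_f[log g(X, Y)] = l1 + l2 - log(l1 * l2).
l1, l2 = 2.0, 3.0
n = 500_000
acc = 0.0
for _ in range(n):
    x = random.expovariate(1.0)  # X_1 ~ Exp(1)
    y = random.expovariate(1.0)  # X_2 ~ Exp(1)
    acc -= math.log(l1 * l2) - l1 * x - l2 * y
est = acc / n
exact = l1 + l2 - math.log(l1 * l2)
print(round(exact, 4))  # 3.2082
```

With 500,000 samples the Monte Carlo standard error is roughly 0.005, so the estimate agrees with the closed form to about two decimal places.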
In (2), if the joint distribution functions are replaced by the joint survival functions, then a bivariate cumulative residual inaccuracy is obtained as follows: -∫∫ F̄(x_1, x_2) log Ḡ(x_1, x_2) dx_1 dx_2, (3) where F̄ and Ḡ are the joint survival functions of the random vectors (X_1, X_2) and (Y_1, Y_2), respectively. If (X_1, X_2) and (Y_1, Y_2) are identically distributed, then Eqs. (1), (2), and (3) reduce to the bivariate entropy, the bivariate cumulative past entropy, and the bivariate cumulative residual entropy, respectively. In recent years, the use of copula functions in information theory for obtaining new results has attracted the attention of researchers. Indeed, to the best of the authors' knowledge, little research has been done on the inaccuracy measure based on copula theory. Ahmadi et al. [1] developed a copula-based approach aiming to express the dynamic mutual information for past and residual bivariate lifetimes in an alternative way. Ma and Sun [20] provided a new way of understanding and estimating the mutual information using the copula function. Hao and Singh [11] proposed a maximum entropy copula method for multisite monthly streamflow simulation, in which the temporal and spatial dependence structure is imposed as constraints to derive the maximum entropy copula. Gronneberg and Hjort [9] adapted the arguments leading to the original Akaike information criterion (AIC) formula, related to empirical estimation of a certain Kullback–Leibler distance. This resulted in a formula significantly different from the AIC, named the copula information criterion. Pougaza and Djafari [29] were interested in finding a bivariate distribution when only its marginals are known; they determined a multivariate distribution that maximizes an entropy with given marginals. Mohtashami and Amini [22] obtained various measures in view of copulas for bivariate distributions and investigated the properties of these information measures and their links with copulas.
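The cumulative residual inaccuracy (3) also has a simple closed form in the independent exponential case: with F̄(x, y) = e^{-x-y} and Ḡ(x, y) = e^{-λ_1 x - λ_2 y}, the integral equals λ_1 + λ_2 exactly. The following numerical check is our sketch, not the paper's:

```python
import math

# Midpoint-rule check: -∫∫ Fbar(x,y) * log(Gbar(x,y)) dx dy
# with Fbar(x,y) = exp(-x-y) and Gbar(x,y) = exp(-l1*x - l2*y)
# equals l1 + l2, since ∫ x e^{-x} dx = 1 and ∫ e^{-y} dy = 1 on [0, ∞).
l1, l2 = 0.5, 1.5
a, n = 25.0, 400          # truncation point and grid size per axis
h = a / n
cri = 0.0
for i in range(n):
    x = (i + 0.5) * h
    for j in range(n):
        y = (j + 0.5) * h
        fbar = math.exp(-x - y)
        log_gbar = -l1 * x - l2 * y
        cri -= fbar * log_gbar * h * h
print(round(cri, 3))  # close to l1 + l2 = 2.0
```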
Hosseini and Ahmadi [12] discussed the inaccuracy in terms of the copula density function as well as the copula function and introduced two new inaccuracy measures, namely, the copula density inaccuracy and the copula cumulative past inaccuracy, which are, respectively, of the form -∫∫_{[0,1]^2} c_1(u, v) log c_2(u, v) du dv and -∫∫_{[0,1]^2} C_1(u, v) log C_2(u, v) du dv, where c and C are two important functions in copula theory, known as the copula density function and the copula function, respectively.
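Assuming the Kerridge-type form -∫∫ C_1 log C_2 over the unit square for the copula cumulative past inaccuracy (a reconstruction; the symbols C_1, C_2 and the routine below are ours, not the paper's), it can be evaluated numerically for simple copulas. When both copulas are the independence copula Π(u, v) = uv, the integral has the closed value -2 ∫ u log u du · ∫ v dv = 1/4:

```python
import math

def ccpi(C1, C2, n=400):
    """Midpoint-rule approximation of -∫∫ C1(u,v) * log C2(u,v) du dv on (0,1)^2."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            total -= C1(u, v) * math.log(C2(u, v)) * h * h
    return total

indep = lambda u, v: u * v                                        # independence copula
fgm = lambda u, v: u * v * (1.0 + 0.5 * (1.0 - u) * (1.0 - v))    # FGM, theta = 0.5

val = ccpi(indep, fgm)
print(val > 0.0)  # True: -C1*log(C2) >= 0 since any copula satisfies C2 <= 1
```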
Accordingly, in this paper, we intend to extend the use of copula-theoretic concepts to the bivariate inaccuracy measures (1), (2), and (3) in information theory. To this end, in the second section, we first recall the basic concepts of the copula function. In Section 3, we introduce a new inaccuracy measure, known as the co-copula inaccuracy, and obtain upper and lower bounds for it under the proportional hazard rate model. We also derive results and inequalities, such as the triangle inequality, for the co-copula inaccuracy by establishing a proportional hazard rate model or a lower orthant random order between random vectors. In the fourth section, we introduce another measure of inaccuracy based on the copula dual function, which we call the copula dual inaccuracy measure. Under the assumption of radial symmetry, we show that these two new inaccuracies are equal, and we obtain a characterization property from the equality of these two inaccuracies for radially symmetric distributions. In addition, for these two new inaccuracy measures, we derive results and inequalities, including the triangle inequality, by establishing a proportional (reversed) hazard rate model or an upper or lower orthant random order between random vectors. In Section 5, we give some examples to further examine some of the results. In the supplementary material section, by introducing estimators for the inaccuracy measures, we examine some of the obtained results using simulation methods, and we also provide an example with real data. Further results and examples are provided in Appendix A.
Copula function and preliminary results
Copulas are functions that link multivariate distribution functions with their marginal distribution functions. In other words, copulas are multivariate distribution functions whose one-dimensional marginals are uniformly distributed over the interval [0, 1]. The copula concept was introduced by Sklar [36]. Fisher [7] outlined the reasons for the attention paid to copulas in statistics and probability, mentioning that copulas are a starting point for constructing families of bivariate distributions.
Co-copula inaccuracy measure
If, in relation (4), the copula function is replaced by the co-copula function, then a new inaccuracy measure, the co-copula inaccuracy (CCI), is obtained. By considering (10), an explicit expression for the CCI can be concluded; in the special case when the two random vectors are identically distributed, relation (14) leads to a corresponding equality.
Copula dual inaccuracy measure
In relation (4), suppose that the copula function is replaced by the copula dual function. Then a new inaccuracy measure, the copula dual inaccuracy (CDI), is obtained. By considering relation (11), an explicit expression for the CDI can be concluded; in the special case when the two random vectors are identically distributed, relation (23) reduces to a corresponding entropy-type measure.
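The mechanism behind the CCI = CDI result under radial symmetry can be illustrated directly. Using the standard definitions (Nelsen) of the co-copula, C*(u, v) = 1 - C(1-u, 1-v), and the dual copula, C~(u, v) = u + v - C(u, v), radial symmetry of C forces C* = C~. The FGM copula is radially symmetric, so the two functions coincide on the whole unit square; the sketch below (function names ours) checks this numerically:

```python
# For a radially symmetric copula such as FGM, the co-copula and the dual
# copula are equal, which underlies the equality of the CCI and CDI measures.
def fgm(u, v, t=0.7):
    """FGM copula with dependence parameter t."""
    return u * v * (1.0 + t * (1.0 - u) * (1.0 - v))

def co_copula(C, u, v):
    return 1.0 - C(1.0 - u, 1.0 - v)   # P(X > x or Y > y) in copula scale

def dual_copula(C, u, v):
    return u + v - C(u, v)             # P(X <= x or Y <= y) in copula scale

grid = [i / 20.0 for i in range(1, 20)]
max_diff = max(abs(co_copula(fgm, u, v) - dual_copula(fgm, u, v))
               for u in grid for v in grid)
print(max_diff < 1e-12)  # True: FGM is radially symmetric
```

For a copula that is not radially symmetric, the two functions generally differ, which is what makes the equality a characterization property.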
Examples
In this section, four examples are presented to examine some of the results.
Example 1. Suppose that (X_1, X_2) and (Y_1, Y_2) have the FGM and AMH copulas, respectively, given by C_1(u, v) = uv[1 + θ_1(1 - u)(1 - v)] and C_2(u, v) = uv / [1 - θ_2(1 - u)(1 - v)], where θ_1 and θ_2 are the dependency coefficients (see Nelsen [25] and Nelsen et al. [26] for further reading). Also, X_1 and X_2 have the standard exponential distribution, and Y_1 and Y_2 have the exponential distribution with parameters λ_1 and λ_2, respectively. By using relationships (10),
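The ingredients of Example 1 can be coded directly; by Sklar's theorem, the joint distribution function is H(x, y) = C(F(x), G(y)). The function names and the parameter values below are ours, chosen only for illustration:

```python
import math

def fgm(u, v, theta):
    """FGM copula: C(u, v) = uv * (1 + theta*(1-u)*(1-v))."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def amh(u, v, theta):
    """AMH copula: C(u, v) = uv / (1 - theta*(1-u)*(1-v))."""
    return u * v / (1.0 - theta * (1.0 - u) * (1.0 - v))

def exp_cdf(x, lam=1.0):
    return 1.0 - math.exp(-lam * x)

# Joint CDF of (Y1, Y2): AMH copula with Exp(l1) and Exp(l2) margins.
def joint_cdf(x, y, theta=0.3, l1=2.0, l2=3.0):
    return amh(exp_cdf(x, l1), exp_cdf(y, l2), theta)

# Sanity check: with theta = 0 both copulas reduce to independence, C(u,v) = uv.
u, v = 0.4, 0.7
print(abs(fgm(u, v, 0.0) - u * v) < 1e-15 and abs(amh(u, v, 0.0) - u * v) < 1e-15)
```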
Conclusion
Inaccuracy is one of the measures of information theory. In estimating the probability of events occurring, it quantifies the error caused by the use of the distribution claimed by the experimenter instead of the original distribution, as well as the effect of incorrect information in the observations. It is also used as a tool in goodness-of-fit tests. The inaccuracy measure can be applied in various sciences, such as reliability theory, copula theory, and survival analysis, and can lead to new results.
CRediT authorship contribution statement
Toktam Hosseini: Data curation, Writing - original draft. Mehdi Jabbari Nooghabi: Conceptualization, Methodology, Management, Editing.
Acknowledgments
We thank the editor, associate editor, and reviewers for their careful reviews and insightful comments, which have led to a significant improvement of this article. This research was supported by a grant from Ferdowsi University of Mashhad, Iran; No. 2/53086.
References (40)
- Simulating a multivariate sea storm using archimedean copulas. Coast. Eng. (2013)
- Proportional reversed hazard rate model and its applications. J. Statist. Plann. Inference (2007)
- Modelling dependence. Insurance Math. Econom. (2008)
- Mutual information is copula entropy. Tsinghua Sci. Technol. (2011)
- On cumulative residual Kullback–Leibler information. Statist. Probab. Lett. (2012)
- A revisit to the dependence structure between the stock and foreign exchange markets: A dependence-switching copula approach. J. Bank. Financ. (2013)
- Nonparametric estimation of the dependence function for a multivariate extreme value distribution. J. Multivariate Anal. (2008)
- On dynamic mutual information for bivariate lifetimes. Adv. Appl. Probab. (2015)
- Symmetry and dependence properties within a semiparametric family of bivariate copulas. J. Nonparametr. Stat. (2002)
- Characterizations of proportional hazard and reversed hazard rate models based on symmetric and asymmetric Kullback–Leibler divergences. Sankhya B (2019)
- Copulas and Its Application in Hydrology and Water Resources
- Elements of Information Theory
- Copulas
- Bivariate extension of (dynamic) cumulative residual and past inaccuracy measures. Statist. Papers
- The copula information criteria. Scand. J. Stat.
- Modeling multisite streamflow dependence with maximum entropy copula. Water Resour. Res.
- Information Theory
- Inaccuracy and inference. J. R. Stat. Soc. Ser. B Stat. Methodol.
- On information and sufficiency. Ann. Math. Stat.