Article

Storage Space Allocation Strategy for Digital Data with Message Importance

1 Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
2 Beijing National Research Center for Information Science and Technology (BNRist), Beijing 100084, China
* Author to whom correspondence should be addressed.
Entropy 2020, 22(5), 591; https://doi.org/10.3390/e22050591
Submission received: 29 April 2020 / Accepted: 19 May 2020 / Published: 25 May 2020

Abstract:
This paper focuses on the problem of lossy compression storage based on the data value, which represents the subjective assessment of users, when the storage size is still not enough after conventional lossless data compression. To this end, we transform this problem into an optimization that pursues the least importance-weighted reconstruction error within a limited total storage size, where the importance characterizes the data value from the viewpoint of users. Based on this formulation, the paper puts forward an optimal allocation strategy for the storage of digital data under an exponential distortion measure, which makes rational use of all the storage space. The theoretical results show that the strategy is a kind of restrictive water-filling, and they characterize the trade-off between the relative weighted reconstruction error and the available storage size. Consequently, if a relatively small part of the total data value is allowed to be lost, this strategy improves the performance of data compression. Furthermore, the paper also shows that both the users' preferences and the special characteristics of the data distribution can give rise to small-probability event scenarios, where only a fraction of the data covers the vast majority of users' interests. In either case, data with highly clustered message importance is beneficial to compression storage. In contrast, from the perspective of optimal storage space allocation based on data value, data with a uniform information distribution is incompressible, which is consistent with the corresponding result in information theory.

1. Introduction

As large numbers of mobile devices such as Internet of Things (IoT) devices and smartphones come into use, the contradiction between limited storage space and the sharply increasing data deluge becomes increasingly serious in the era of big data [1,2]. Such massive data makes conventional data storage mechanisms inadequate within a tolerable time, and data storage is therefore one of the major challenges in big data [3]. Note that storing all the data is becoming more and more dispensable nowadays, and it is also not conducive to reducing data transmission costs [4,5]. In fact, data compression storage is widely adopted in many applications, such as IoT [2], industrial data platforms [6], bioinformatics [7], and wireless networking [8]. Thus, research on data compression storage is becoming increasingly important and compelling.
In conventional source coding, data compression is carried out by removing data redundancy, where short descriptions are assigned to the most frequent classes [9]. On this basis, tight bounds for lossless data compression are given. In order to further increase the compression rate, one needs to use more information. A quintessential example is the use of side information [10]. Another possible solution is to first compress the data with a certain amount of loss and then reconstruct it with acceptable distortion, which is referred to as lossy compression [11,12,13]. Adaptive compression schemes are also adopted extensively. For example, Reference [14] proposed an adaptive compression scheme in IoT systems, and Reference [15] investigated a backlog-adaptive source coding system in terms of age of information. In fact, most previous compression methods carry out compression by means of contextual data or by leveraging data transformation techniques [4].
Although these previous data compression methods perform satisfactorily in their respective application scenarios, there is still much room for improvement when facing rapidly growing large-scale data. Moreover, they do not take the data value into account. This paper focuses on how to further compress data with acceptable distortion to meet specified storage requirements when, even after conventional lossless data compression, the storage size is still not enough to guarantee lossless storage. We achieve this goal by reallocating storage space based on the data value, which represents the subjective assessment of users. Here, we adopt importance-aware weighting in the weighted reconstruction error to measure the total cost of data storage with unequal costs.
Generally, users care more about the crucial part of the data that attracts their attention than about the whole data itself. In many real-world applications, such as cost-sensitive learning [16,17,18] and unequal error protection [19,20], different errors bring different costs. To be specific, distortion in the data that users care about may be catastrophic, even if the loss of data that is insignificant to users is acceptable. Similar to coresets [21], the data that needs to be processed is reduced to the part that users mainly focus on rather than the whole data set. Unlike coresets, the data processed in this paper is no longer required to approximately represent the raw data; instead, the goal is to minimize the storage cost with respect to the importance weighting. In fact, although the data deluge increases sharply, the significant data that users care about is still rare in many big data scenarios. In this sense, it can be regarded as a sparse representation from the perspective of data value, and we can exploit it to compress data.
Alternatively, it is attractive to achieve data compression by storing only a fraction of the data while preserving as much information as possible about the data that users care about [22,23]. This paper also employs this strategy. However, there are subtle but critical differences between the compression storage strategy proposed in this paper and those in References [22,23]. Reference [22] focused on Pareto-optimal data compression, which presents the trade-off between retained entropy and class information, whereas this paper puts forward an optimal compression storage strategy for digital data from the viewpoint of message importance and gives the trade-off between the relative weighted reconstruction error (RWRE) and the available storage size. Furthermore, a compression method based on message importance was preliminarily discussed in Reference [23] to solve the big data storage problem in wireless communications, while this paper discusses the optimal storage space allocation strategy with limited storage space in general, based on message importance. Moreover, the constraints are different: the available storage size is limited in this paper, while the total code length of all the events is given in Reference [23].
From the viewpoint of users' attention, the data value can be considered as the subjective assessment of users regarding the importance of data. Much research in the last decade suggests that studying problems from the perspective of message importance is rewarding and leads to new findings [20,24,25]. Thus, taking message importance into account may effectively improve the performance of storage systems. For example, Reference [26] discussed a lossy image compression method aided by a content-weighted importance map. Since any quantity can be regarded as an importance measure if it agrees with the intuitive characterization of a user's subjective degree of concern about the data, the cost of data reconstruction for specific user preferences is regarded as the importance in this paper, and it is used as the weight in the weighted reconstruction error.
Since we desire to achieve data compression by keeping only a small portion of important data and abandoning less important data, this paper mainly focuses on the case where only a fraction of the data takes up the vast majority of users' interests. Such scenarios are not rare in big data. A quintessential example is minority subset detection, which is overwhelmingly important in intrusion detection [27,28]. This phenomenon is also exceedingly typical in financial crime detection systems, where only a few illicit identities attract attention in order to prevent financial fraud [29]. When a certain degree of information loss is acceptable, people prefer to take high-probability events for granted and abandon them to maximize compressibility. These cases are referred to as small-probability event scenarios in this paper. In order to depict the message importance in small-probability event scenarios, the message importance measure (MIM) was proposed in Reference [30]. MIM is fairly effective in many big data applications, such as IoT [31] and mobile edge computing [32]. In addition, Reference [33] expanded MIM to the general case and showed that MIM can be adopted as a special weight in designing recommendation systems. Since there is no universal data value model, we take the case where the MIM describes the cost of the error as a quintessential example to analyze the properties of the optimal storage space allocation strategy.
In this paper, we first propose a storage space allocation strategy for digital data that makes a best effort to minimize the importance-weighted reconstruction error when the total available storage size is given. For digital data, we formulate this problem as an optimization problem and present the optimal storage strategy, which takes the form of a kind of restrictive water-filling. For a given available storage size, the storage size of each class is mainly determined by the message importance values and the probability distribution of event classes in a data sequence. In fact, this optimal allocation strategy adaptively provides more storage space for crucial data classes in order to make rational use of resources, which accords with the cognitive mechanism of human beings.
Afterward, we focus on the properties of this optimal storage space allocation strategy when the importance weights are characterized by MIM. It is noted that there is a trade-off between the RWRE and the available storage size. Bounds on the performance of this storage system are derived, and they depend on the importance coefficient and the probability distribution of event classes. On the one hand, the RWRE decreases as the absolute value of the importance coefficient increases, because the overwhelming majority of important information gathers in a fraction of the data as the importance coefficient approaches negative/positive infinity, which reflects the influence of users' preferences. On the other hand, the compression performance is also affected by the probability distribution of event classes. In fact, the more closely the probability distribution matches the requirements of the small-probability event scenarios, the more effective this compression strategy becomes. Furthermore, we also obtain that the RWRE of a uniform distribution is larger than that of any other distribution for the same available storage size. In this regard, the uniform distribution is incompressible from the perspective of optimal storage space allocation based on data value, which is consistent with the conclusion in information theory [34].
The main contributions of this paper can be summarized as follows. (1) We propose a new digital data compression strategy that takes message importance into account, which can help improve the design of big data storage systems. (2) We illuminate the properties of this new method, which characterize the trade-off between the RWRE and the available storage size. (3) We show that data with highly clustered message importance is beneficial to compression storage, and we also find that data with a uniform information distribution is incompressible from the perspective of optimal storage space allocation based on data value, which is consistent with the corresponding result in information theory.
The rest of this paper is organized as follows. The system model is introduced in Section 2, including the definition of the weighted reconstruction error, the distortion measure, and the problem formulation. In Section 3, we solve the problem of optimal storage space allocation in three kinds of system models and give the solutions. The properties of this optimal storage space allocation strategy based on MIM are discussed in Section 4, where the effects of the importance coefficient and the probability distribution of event classes on the RWRE are investigated in detail. Section 5 illuminates the properties of this optimal storage strategy when the importance weight is characterized by the non-parametric MIM. The numerical results are presented and discussed in Section 6, which verify the validity of the theoretical results developed in this paper. Finally, we conclude in Section 7.

2. System Model

This section introduces the system model, including the definition of the weighted reconstruction error and the modeling of the distortion measure, in order to illustrate how we formulate the lossy compression problem as an optimization problem for digital data based on message importance. To make the formulation and discussion clearer, the main notations in this paper are listed in Table 1.

2.1. Modeling Weighted Reconstruction Error Based on Message Importance

The data storage system may frequently lack storage space when facing a super-large scale of data to store. When the storage size is still not enough after conventional lossless data compression, the optimal allocation of storage space based on data value may be imperative. For this purpose, we consider the following storage system, which stores $K$ pieces of data. Let $\mathbf{x} = (x_1, x_2, \ldots, x_k, \ldots, x_K)$ be the sequence of raw data. Assume that all data redundancy has been removed by the lossless conventional data compression, and that each data item $x_k$ needs storage space of size $S_{x_k}$ if it is to be recovered without any distortion. However, in many big data scenarios the storage size is still not enough; that is, the actually required storage space $\sum_{k=1}^{K} S_{x_k}$ is larger than the maximum available storage space $TK$, where $T$ is the maximum available average storage size.
In fact, users care more about the paramount part of the data that attracts their attention than about the whole data itself. From this perspective, storing all data without distortion may be unnecessary. Considering that the natural distribution of storage space is not invariably reasonable and that high-value data in big data is usually sparse, rational storage space allocation that minimizes the loss of data value may solve the above problem of insufficient storage space, provided that a certain amount of data value is allowed to be lost. After data compression by means of rational storage space allocation, we use $(\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_k, \ldots, \hat{x}_K)$ to denote the compressed data sequence, and assume that the compressed data $\hat{x}_k$ occupies storage space of size $S_{\hat{x}_k}$ in practice for $1 \le k \le K$.
Lossy data compression usually pursues the least storage cost while retaining as much of the information required by users as possible [22]. In conventional lossless data compression, the costs of different data are assumed to be the same. However, different kinds of errors may result in unequal costs in many real-world applications [16,17,18,19]. In this model, we use the notation $W_k$ to denote the error cost for the reconstructed data. Namely, $W_k$ relates to the data value of $x_k$, and it is regarded as the message importance in this paper. Here, we define the weighted reconstruction error to describe the total cost of data storage with unequal costs, which is given by
$$D(\mathbf{x}, W) = \frac{1}{K} \sum_{k=1}^{K} W_k D_f\big(S_{x_k}, S_{\hat{x}_k}\big),$$
where $D_f(S_{x_k}, S_{\hat{x}_k})$ characterizes the distortion between the raw data and the compressed data in data reconstruction, i.e., the loss degree of data value under the allocated storage size.
Consider the situation where the data is stored according to its category for easier retrieval, which can also make a recommendation system based on it more effective [33]. Since data classification is becoming increasingly convenient and accurate due to the rapid development of machine learning [35,36], this paper assumes that the event class can be easily detected and is known to the storage system. Moreover, assume that data belonging to the same class has the same importance weight and occupies the same storage size. Hence, $\mathbf{x}$ can be seen as a sequence of $K$ symbols from an alphabet $\{a_1, a_2, \ldots, a_n\}$, where $a_i$ represents event class $i$. This storage model is summarized in Figure 1. In this case, the weighted reconstruction error based on importance is formulated as
$$D(\mathbf{x}, W) = \sum_{i=1}^{n} \frac{N(a_i \mid \mathbf{x})}{K} W_i D_f\big(S_{a_i}, S_{\hat{a}_i}\big) = \sum_{i=1}^{n} p_i W_i D_f\big(S_{a_i}, S_{\hat{a}_i}\big),$$
where $N(a_i \mid \mathbf{x})$ is the number of times class $i$ occurs in the sequence $\mathbf{x}$, and $p_i = N(a_i \mid \mathbf{x})/K$ denotes the probability of event class $i$ in the data sequence $\mathbf{x}$.

2.2. Modeling Distortion between the Raw Data and the Compressed Data

We focus on the formula of D f in this part, which characterizes the distortion between the raw data and the compressed data with specified storage size. Usually, there is no universal characterization of distortion measure, especially in speech coding and image coding [34]. In fact, D f should characterize the loss degree of data value with allocated storage size. In this respect, the conventional distortion measures are not appropriate since they do not take unequal costs into account. In order to facilitate the analysis and design, this paper proposes an exponential distortion measure to discuss the following special case.
We assume that the data is digital, and we ignore the storage formats and standards of concrete application environments. Regarding its application fields, it may be useful in scenarios built on counting systems, such as finance, medicine, or general merchandise. Let the description of the raw data $a_i$ be $L_i$ bits, with $a_i = \sum_{j=0}^{L_i - 1} b_j \times r^j$, where $r$ is the radix ($r > 1$). The radix represents the base of the number system in the practical application, e.g., $r = 2$ in a binary system. In particular, $L_i$ approaches infinity if $a_i$ is an arbitrary real number. When the storage size is still not enough after the lossless conventional data compression, only $l_i$ bits are assigned to $a_i$ in order to compress the data further based on message importance. For convenience, the $(L_i - l_i)$ lower-order digits are discarded in this process. When restoring the compressed data, the discarded digits are set to the same pre-specified number or to random numbers in the actual system. Let $b_j^*$ be the $(j+1)$-th discarded digit for $j = 0, 1, \ldots, L_i - l_i - 1$, and assume that $b_j^*$ is a random number in $\{0, \ldots, r-1\}$. In this case, the compressed data is $\hat{a}_i = \sum_{j=L_i - l_i}^{L_i - 1} b_j \times r^j + \sum_{j=0}^{L_i - l_i - 1} b_j^* \times r^j$. As a result, the absolute error $|a_i - \hat{a}_i|$ satisfies
$$|a_i - \hat{a}_i| = \Big| \sum_{j=0}^{L_i - l_i - 1} (b_j - b_j^*) \times r^j \Big| \le r^{L_i - l_i} - 1.$$
When $l_i = 0$, which means that no information is stored, the supremum of the absolute error reaches its maximum, $|a_i - \hat{a}_i| \le r^{L_i} - 1$. In order to better weigh the different costs, we define the relative error by normalizing the absolute error to the interval $[0, 1]$ by this maximum absolute error $r^{L_i} - 1$. Moreover, we adopt the supremum of this relative error as the distortion measure $D_f$, which is given by
$$D_f(S_{a_i}, S_{\hat{a}_i}) = D_f(L_i, l_i) = \frac{r^{L_i - l_i} - 1}{r^{L_i} - 1}.$$
In particular, $D_f(L_i, L_i) = 0$ and $D_f(L_i, 0) = 1$. Moreover, it is easy to check that $0 \le D_f(L_i, l_i) \le 1$ and that $D_f(L_i, l_i)$ decreases as $l_i$ increases. In fact, $D_f$ can be regarded as the percentage of data value lost in this case. Thus, the weighted reconstruction error in Equation (1) represents the total cost of data storage based on this loss degree.
In this storage procedure, the compression rate is $\big(\sum_{i=1}^{n} p_i l_i\big) / \big(\sum_{i=1}^{n} p_i L_i\big)$, and the total saved storage size is $\sum_{i=1}^{n} p_i (L_i - l_i) K$. Here, $K$ denotes the number of data items, and it is extremely large due to the sharply increasing data deluge in the era of big data. Therefore, although $(L_i - l_i)$ is not always large, the saved storage size is still substantial since $K$ is exceedingly large.
Furthermore, to simplify the comparisons under different conditions, the weighted reconstruction error is also normalized to the relative weighted reconstruction error (RWRE). In fact, the RWRE characterizes the relative total cost in the data compression, and it is given by
$$D_r(\mathbf{x}, W) = D_r(W, \mathbf{L}, \mathbf{l}) = \frac{D(\mathbf{x}, W)}{\max_{l_i} D(\mathbf{x}, W)} = \frac{\sum_{i=1}^{n} p_i W_i D_f(L_i, l_i)}{\sum_{i=1}^{n} p_i W_i} = \frac{\sum_{i=1}^{n} p_i W_i \frac{r^{L_i - l_i} - 1}{r^{L_i} - 1}}{\sum_{i=1}^{n} p_i W_i},$$
where $\mathbf{L} = \{L_1, \ldots, L_n\}$ and $\mathbf{l} = \{l_1, \ldots, l_n\}$.
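To make these definitions concrete, the following Python sketch evaluates the exponential distortion measure and the RWRE for a toy configuration; the array names (p, W, L, l) and the numerical values are illustrative assumptions rather than quantities taken from this paper.

import numpy as np

def exp_distortion(L, l, r=2):
    # Exponential distortion D_f(L_i, l_i) = (r^(L_i - l_i) - 1) / (r^L_i - 1).
    L, l = np.asarray(L, float), np.asarray(l, float)
    return (r ** (L - l) - 1.0) / (r ** L - 1.0)

def rwre(p, W, L, l, r=2):
    # Relative weighted reconstruction error D_r(W, L, l).
    p, W = np.asarray(p, float), np.asarray(W, float)
    return np.sum(p * W * exp_distortion(L, l, r)) / np.sum(p * W)

# Toy example with n = 3 classes (illustrative values).
p = [0.2, 0.3, 0.5]      # class probabilities
W = [5.0, 2.0, 1.0]      # importance weights
L = [10, 10, 10]         # original storage sizes (bits)
l = [8, 5, 3]            # allocated storage sizes (bits)
print(rwre(p, W, L, l))  # lies in [0, 1]; 0 if l equals L and 1 if l is all zeros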

2.3. Problem Formulation

2.3.1. General Storage System

In fact, the average storage size of each data item after compression can be expressed as $\sum_{i=1}^{n} p_i l_i$. For a given maximum available storage constraint $\sum_{i=1}^{n} p_i l_i \le T$, where $T$ denotes the maximum available average storage size, we optimize the storage resource allocation of this system by minimizing the RWRE, which can be expressed as
$$\mathbf{P1}: \quad \min_{l_i} \; D_r(\mathbf{x}, W)$$
$$\text{s.t.} \quad \sum_{i=1}^{n} p_i l_i \le T,$$
$$\qquad\;\; 0 \le l_i \le L_i \quad \text{for } i = 1, 2, \ldots, n.$$
The storage systems, which can be characterized by Problem P 1 , are referred to as the general storage system.
Remark 1.
In fact, this paper focuses on allocating resources by category while taking message importance into account, whereas conventional source coding searches for the shortest average description length of a random variable.

2.3.2. Ideal Storage System

In practice, the storage sizes of the raw data are usually assigned to be the same for ease of use. Thus, we mainly consider the case where the original storage size of each data item is the same and denote it by $L$ (i.e., $L_i = L$ for $i = 1, 2, \ldots, n$). As a result, we have
$$\min_{l_i} D_r(\mathbf{x}, W) = \frac{r^L \min_{l_i} \sum_{i=1}^{n} p_i W_i r^{-l_i}}{(r^L - 1) \sum_{i=1}^{n} p_i W_i} - \frac{1}{r^L - 1}.$$
Thus, the problem P 1 can be rewritten as
$$\mathbf{P2}: \quad \min_{l_i} \; \sum_{i=1}^{n} p_i W_i r^{-l_i}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} p_i l_i \le T,$$
$$\qquad\;\; 0 \le l_i \le L \quad \text{for } i = 1, 2, \ldots, n.$$
For convenience, we use the ideal storage system to represent the storage systems, which can be described by Problem P 2 . Moreover, we will mainly focus on the characteristics of the solutions in Problem P 2 in this paper.

2.3.3. Quantification Storage System

A quantification storage system quantizes and stores real-valued data acquired from sensors in the real world. Such data is usually a real number, which requires an infinite number of bits to describe accurately. That is, the original storage size of each class approaches infinity (i.e., $L_i = L \to +\infty$ for $i = 1, 2, \ldots, n$) in this case. As a result, the RWRE can be rewritten as
$$D_r(\mathbf{x}, W) = \lim_{L \to \infty} \left( \frac{\sum_{i=1}^{n} p_i W_i r^{-l_i}}{(1 - r^{-L}) \sum_{i=1}^{n} p_i W_i} - \frac{1}{r^L - 1} \right) = \frac{\sum_{i=1}^{n} p_i W_i r^{-l_i}}{\sum_{i=1}^{n} p_i W_i}.$$
Therefore, the problem P 1 in this case is reduced to
$$\mathbf{P3}: \quad \min_{l_i} \; \sum_{i=1}^{n} p_i W_i r^{-l_i}$$
$$\text{s.t.} \quad \sum_{i=1}^{n} p_i l_i \le T,$$
$$\qquad\;\; l_i \ge 0 \quad \text{for } i = 1, 2, \ldots, n.$$

3. Optimal Allocation Strategy with Limited Storage Space

In this section, we first solve Problem P1 and give its solutions, which provide the optimal storage space allocation strategy for digital data that minimizes the relative weighted reconstruction error (RWRE) when the total available storage size is limited. Then, Problem P2 is solved, and its solutions characterize the optimal storage space allocation strategy when all classes have the same original storage size. Moreover, we also discuss the solutions in the case where the original storage size of each class approaches infinity by studying Problem P3.

3.1. Optimal Allocation Strategy in General Storage System

Theorem 1.
For a storage system with probability distribution $(p_1, p_2, \ldots, p_n)$, let $L_i$ be the storage size of the raw data of class $i$ for $i = 1, 2, \ldots, n$. For a given maximum available average storage size $T$ ($0 \le T \le \sum_{i=1}^{n} p_i L_i$), when the radix is $r$ ($r > 1$), the solution of Problem P1 is given by
$$l_i = \begin{cases} 0 & \text{if } l_i < 0, \\[2pt] \dfrac{\ln(\ln r) + \ln W_i - \ln(1 - r^{-L_i}) - \ln \lambda^*}{\ln r} & \text{if } 0 \le l_i \le L_i, \\[2pt] L_i & \text{if } l_i > L_i, \end{cases}$$
where $\lambda^*$ is chosen so that $\sum_{i=1}^{n} p_i l_i = T$.
Proof. 
By means of Lagrange multipliers and the Karush–Kuhn–Tucker conditions, ignoring the constant $\sum_{i=1}^{n} p_i W_i$, we set up the functional
$$J = \sum_{i=1}^{n} p_i W_i \frac{r^{L_i - l_i} - 1}{r^{L_i} - 1} + \lambda^* \Big( \sum_{i=1}^{n} p_i l_i - T \Big) + \mu_1 (l_1 - L_1) + \cdots + \mu_n (l_n - L_n).$$
Differentiating with respect to l i and setting the derivative to zero, we have
$$\frac{\partial J}{\partial l_i} = -\frac{p_i W_i \ln r \, r^{-l_i}}{1 - r^{-L_i}} + \lambda^* p_i + \mu_i = 0 \quad \text{for } i = 1, 2, \ldots, n,$$
$$\sum_{i=1}^{n} p_i l_i - T = 0,$$
$$\mu_i (l_i - L_i) = 0 \quad \text{for } i = 1, 2, \ldots, n,$$
$$l_i - L_i \le 0 \quad \text{for } i = 1, 2, \ldots, n,$$
$$\mu_i \ge 0 \quad \text{for } i = 1, 2, \ldots, n,$$
$$l_i \ge 0 \quad \text{for } i = 1, 2, \ldots, n.$$
Hence, we obtain
$$l_i = \frac{\ln p_i + \ln(\ln r) + \ln W_i - \ln(1 - r^{-L_i}) - \ln(\lambda^* p_i + \mu_i)}{\ln r}.$$
First, it is easy to check that Equations (13b)–(13d) hold when $\mu_i = 0$ and $l_i \le L_i$. Hence, we have
$$l_i = \frac{\ln(\ln r) + \ln W_i - \ln(1 - r^{-L_i}) - \ln \lambda^*}{\ln r}.$$
Second, if $l_i$ in Equation (14) is larger than $L_i$, then $\mu_i > 0$ and $l_i = L_i$ due to Equations (13b)–(13d).
Third, if $l_i < 0$, we let $l_i = 0$ according to Equation (13e).
Moreover, $\lambda^*$ is chosen so that $\sum_{i=1}^{n} p_i l_i = T$ due to Equation (13a).
Therefore, based on the discussion above, we obtain Equation (11), which ensures $0 \le l_i \le L_i$. □
Remark 2.
Let $\tilde{N}$ be the number of $l_i$ that satisfy $0 \le l_i \le L_i$, and let $\{I_j, j = 1, 2, \ldots, \tilde{N}\}$ be the part of the sequence $\{1, 2, \ldots, n\}$ that satisfies $0 \le \ln(\ln r) + \ln W_{I_j} - \ln(1 - r^{-L_{I_j}}) - \ln \lambda^* \le L_{I_j} \ln r$. Furthermore, $\{T_j, j = 1, 2, \ldots, \tilde{N}_L\}$ denotes the part of the sequence $\{1, 2, \ldots, n\}$ that satisfies $\ln(\ln r) + \ln W_{T_j} - \ln(1 - r^{-L_{T_j}}) - \ln \lambda^* > L_{T_j} \ln r$.
Substituting Equation (11) into the constraint $\sum_{i=1}^{n} p_i l_i = T$, we have
$$\ln \lambda^* = \ln \ln r + \frac{\sum_{j=1}^{\tilde{N}} p_{I_j} \ln W_{I_j} - \sum_{j=1}^{\tilde{N}} p_{I_j} \ln(1 - r^{-L_{I_j}}) - \ln r \big( T - \sum_{j=1}^{\tilde{N}_L} p_{T_j} L_{T_j} \big)}{\sum_{j=1}^{\tilde{N}} p_{I_j}}.$$
Hence, for $0 \le l_i \le L_i$, we obtain
$$l_i = \frac{T - \sum_{j=1}^{\tilde{N}_L} p_{T_j} L_{T_j}}{\sum_{j=1}^{\tilde{N}} p_{I_j}} + \frac{\ln W_i}{\ln r} - \frac{\sum_{j=1}^{\tilde{N}} p_{I_j} \ln W_{I_j}}{\ln r \sum_{j=1}^{\tilde{N}} p_{I_j}} + \frac{\sum_{j=1}^{\tilde{N}} p_{I_j} \ln(1 - r^{-L_{I_j}})}{\ln r \sum_{j=1}^{\tilde{N}} p_{I_j}} - \frac{\ln(1 - r^{-L_i})}{\ln r}.$$
In fact, $T$, $p_i$, $r$, and $L_i$ are given for a particular storage system, and therefore $l_i$ is determined by the second and third terms on the right side of Equation (17); that is, the storage size depends on the message importance and the probability distribution of the classes for the given available storage size.
Remark 3.
Since the actual compressed storage size $l_i^*$ must be an integer, the actual storage size allocation strategy is
$$l_i^* = \min\left\{ \left\lfloor \left( \frac{T - \sum_{j=1}^{\tilde{N}_L} p_{T_j} L_{T_j}}{\sum_{j=1}^{\tilde{N}} p_{I_j}} + \frac{\ln W_i}{\ln r} - \frac{\sum_{j=1}^{\tilde{N}} p_{I_j} \ln W_{I_j}}{\ln r \sum_{j=1}^{\tilde{N}} p_{I_j}} + \frac{\sum_{j=1}^{\tilde{N}} p_{I_j} \ln(1 - r^{-L_{I_j}})}{\ln r \sum_{j=1}^{\tilde{N}} p_{I_j}} - \frac{\ln(1 - r^{-L_i})}{\ln r} \right)^{\!+} \right\rfloor, \; L_i \right\},$$
where $(x)^+$ equals $x$ when $x \ge 0$ and is zero when $x < 0$, and $\lfloor x \rfloor$ is the largest integer smaller than or equal to $x$.

3.2. Optimal Allocation Strategy in Ideal Storage System

Then, we pay attention to the case where the original storage size of each data is the same. Based on Theorem 1, we get the following corollary in the ideal storage system.
Corollary 1.
For a storage system with probability distribution $(p_1, p_2, \ldots, p_n)$, let the original storage size of each class be the same, i.e., $L_i = L$ for $i = 1, 2, \ldots, n$. For a given maximum available average storage size $T$ ($0 \le T \le L$), when the radix is $r$ ($r > 1$), the solution of Problem P2 is given by
$$l_i = \begin{cases} 0 & \text{if } l_i < 0, \\[2pt] \dfrac{\ln(\ln r) + \ln W_i - \ln \lambda}{\ln r} & \text{if } 0 \le l_i \le L, \\[2pt] L & \text{if } l_i > L, \end{cases}$$
where $\lambda$ is chosen so that $\sum_{i=1}^{n} p_i l_i = T$.
Proof. 
Let $\lambda = \lambda^* (1 - r^{-L})$ and $L_i = L$ for $i = 1, 2, \ldots, n$. Substituting these into Equation (11), we find that $l_i$ in this case can be rewritten as Equation (19). □
Substituting Equation (19) into the constraint $\sum_{i=1}^{n} p_i l_i = T$, we obtain
$$\ln \lambda = \ln \ln r + \frac{\sum_{j=1}^{\tilde{N}} p_{I_j} \ln W_{I_j} - \ln r \,(T - T_{NL})}{\sum_{j=1}^{\tilde{N}} p_{I_j}},$$
where $\tilde{N}$, $\tilde{N}_L$, $I_j$, and $T_j$ are given by Remark 2 with $\lambda = \lambda^* (1 - r^{-L})$, and $T_{NL} = \sum_{j=1}^{\tilde{N}_L} p_{T_j} L$. Hence, for $0 \le l_i \le L$, we obtain
$$l_i = \frac{T - T_{NL}}{\sum_{j=1}^{\tilde{N}} p_{I_j}} + \frac{\ln W_i}{\ln r} - \frac{\sum_{j=1}^{\tilde{N}} p_{I_j} \ln W_{I_j}}{\ln r \sum_{j=1}^{\tilde{N}} p_{I_j}}.$$
Remark 4.
Since the actual compressed storage size $l_i^*$ must be an integer, the actual storage size allocation strategy is
$$l_i^* = \min\left\{ \left\lfloor \left( \frac{T - T_{NL}}{\sum_{j=1}^{\tilde{N}} p_{I_j}} + \frac{\ln W_i}{\ln r} - \frac{\sum_{j=1}^{\tilde{N}} p_{I_j} \ln W_{I_j}}{\ln r \sum_{j=1}^{\tilde{N}} p_{I_j}} \right)^{\!+} \right\rfloor, \; L \right\}.$$
Remark 5.
When $\tilde{N} = n$, $0 \le l_i \le L$ always holds for $1 \le i \le n$, and the actual storage size is given by
$$l_i^* = \left\lfloor T + \frac{\ln W_i - \sum_{i=1}^{n} p_i \ln W_i}{\ln r} \right\rfloor.$$
In order to illustrate the geometric interpretation of this algorithm, let
$$\beta = \frac{\ln \ln r - \ln \lambda}{\ln r}.$$
Hence, the optimal storage size can be simplified to
$$l_i = \begin{cases} 0 & \text{if } \beta - \frac{\ln(1/W_i)}{\ln r} < 0, \\[2pt] \beta - \frac{\ln(1/W_i)}{\ln r} & \text{if } 0 \le \beta - \frac{\ln(1/W_i)}{\ln r} \le L, \\[2pt] L & \text{if } \beta - \frac{\ln(1/W_i)}{\ln r} > L. \end{cases}$$
The monotonicity of optimal storage size with respect to importance weight is discussed in the following theorem.
Theorem 2.
Let $(p_1, p_2, \ldots, p_n)$ be a probability distribution and $W = (W_1, \ldots, W_n)$ be the importance weights. $L$ and $r$ are fixed positive integers ($r > 1$). The solution of Problem P2 satisfies $l_i \ge l_j$ if $W_i > W_j$ for $i, j \in \{1, 2, \ldots, n\}$.
Proof. 
Refer to the Appendix A. □
This gives rise to a kind of restrictive water-filling, illustrated in Figure 2. Choose a constant $\beta$ so that $\sum_{i=1}^{n} p_i l_i = T$. The storage size depends on the difference between $\beta$ and $\frac{\ln(1/W_i)}{\ln r}$. In Figure 2, $\beta$ characterizes the height of the water surface, and $\frac{\ln(1/W_i)}{\ln r}$ determines the bottom of the pool. No storage space is assigned to the data when this difference is less than zero. When the difference is in the interval $[0, L]$, the difference is exactly the storage size. Furthermore, the storage size is truncated to $L$ bits if the difference is larger than $L$. Compared with conventional water-filling, the lowest height of the bottom of the pool is constrained in this restrictive water-filling.
Remark 6.
The restrictive water-filling in Figure 2 is summarized as follows.
  • For the data with extremely small message importance, $\frac{\ln(1/W_i)}{\ln r}$ is so large that the bottom of the pool is above the water surface. Thus, the storage size of this kind of data is zero.
  • For the data with small message importance, $\frac{\ln(1/W_i)}{\ln r}$ is large, and therefore the bottom of the pool is high. Thus, the storage size of this kind of data is small.
  • For the data with large message importance, $\frac{\ln(1/W_i)}{\ln r}$ is small, and therefore the bottom of the pool is low. Thus, the storage size of this kind of data is large.
  • For the data with extremely large message importance, $\frac{\ln(1/W_i)}{\ln r}$ is so small that the bottom of the pool is constrained in order to truncate the storage size to $L$.
Thus, this optimal storage space allocation strategy is a highly efficient adaptive storage allocation algorithm, since it makes rational use of all the storage space according to the message importance so as to minimize the RWRE.
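The following Python sketch illustrates this restrictive water-filling for the ideal storage system: it searches for the water level $\beta$ by bisection so that the clipped allocations meet the average-storage constraint. It is a minimal sketch under assumed toy values for p, W, L, and T, and it ignores the integer rounding of Remark 4.

import numpy as np

def restrictive_water_filling(p, W, L, T, r=2, iters=100):
    # Solve P2 approximately: l_i = clip(beta - ln(1/W_i)/ln r, 0, L),
    # with the water level beta chosen so that sum_i p_i * l_i = T.
    p, W = np.asarray(p, float), np.asarray(W, float)
    bottom = np.log(1.0 / W) / np.log(r)      # pool bottom for each class
    lo, hi = bottom.min(), bottom.max() + L   # the water level lies in this range
    for _ in range(iters):                    # bisection on the water level
        beta = 0.5 * (lo + hi)
        if np.dot(p, np.clip(beta - bottom, 0.0, L)) > T:
            hi = beta
        else:
            lo = beta
    return np.clip(0.5 * (lo + hi) - bottom, 0.0, L)

# Toy example (illustrative values).
p = [0.03, 0.07, 0.14, 0.22, 0.25, 0.29]
W = [0.40, 0.25, 0.15, 0.10, 0.06, 0.04]
l = restrictive_water_filling(p, W, L=10, T=4)
print(np.round(l, 3), np.dot(p, l))   # allocations and their average (close to T)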
This solution can be obtained in practice by the recursive procedure shown in Algorithm 1, where we define an auxiliary function as
$$f(i, W, P, L, T, r, K_{\min}, K_{\max}) = \begin{cases} L & \text{if } 1 \le i < K_{\min}, \\[4pt] \dfrac{T - \sum_{j=1}^{K_{\min}-1} p_j L}{\sum_{j=K_{\min}}^{K_{\max}} p_j} + \dfrac{\ln W_i}{\ln r} - \dfrac{\sum_{j=K_{\min}}^{K_{\max}} p_j \ln W_j}{\ln r \sum_{j=K_{\min}}^{K_{\max}} p_j} & \text{if } K_{\min} \le i \le K_{\max}, \\[4pt] 0 & \text{if } K_{\max} < i \le n. \end{cases}$$

3.3. Optimal Allocation Strategy in Quantification Storage System

Corollary 2.
For a given maximum available average storage size $T$ ($T \ge 0$), when the probability distribution is $(p_1, p_2, \ldots, p_n)$ and the radix is $r$ ($r > 1$), the solution of Problem P3 is given by
$$l_i = \left( \frac{\ln(\ln r) + \ln W_i - \ln \lambda}{\ln r} \right)^{\!+},$$
where $\lambda$ is chosen so that $\sum_{i=1}^{n} p_i l_i = T$.
Proof. 
Letting $L \to \infty$ in Corollary 1, the solution in Equation (19) simplifies to Equation (27). □
In fact, the optimal storage space allocation strategy in this case can be seen as a kind of water-filling, which gets rid of the constraint on the lowest height of the bottom of the pool.
Algorithm 1 Storage Space Allocation Algorithm
Require:
  The message importance, $W = \{W_i, i = 1, 2, \ldots, n\}$ (sorted so that $W_1 \ge W_2 \ge \cdots \ge W_n$)
  The probability distribution of the source, $P = \{p_i, i = 1, 2, \ldots, n\}$
  The original storage size, $L$, and $\mathbf{L} = \{L_i = L, i = 1, 2, \ldots, n\} = \{L, \ldots, L\}$
  The maximum available average storage size, $T$
  The radix, $r$
  The auxiliary variables, $K_{\min}$, $K_{\max}$ (let $K_{\min} = 1$, $K_{\max} = n$ as the original values)
Ensure:
  The compressed storage size, $\mathbf{l} = \{l_i, i = 1, \ldots, n\}$
  Denote this recursive algorithm as $\phi(W, P, L, T, r, K_{\min}, K_{\max})$
1: $\tilde{l}_i \leftarrow f(i, W, P, L, T, r, K_{\min}, K_{\max})$ for $i = 1, \ldots, n$    ⊳ See Equation (26)
2: if $0 \le \tilde{l}_t \le L$ for all $t \in \{1, \ldots, n\}$ and $\sum_{i=1}^{n} p_i \tilde{l}_i = T$ then
3:   $l_i \leftarrow \tilde{l}_i$ for $i = 1, \ldots, n$
4: else if $K_{\max} > K_{\min}$ then
5:   $\mathbf{l}^{(1)} \leftarrow \phi(W, P, L, T, r, K_{\min}, K_{\max} - 1)$    (make a recursive call with $K_{\max} \leftarrow K_{\max} - 1$)
6:   $\epsilon^{(1)} = D_r(W, \mathbf{L}, \mathbf{l}^{(1)})$    (calculate the RWRE with $\mathbf{l}^{(1)}$)    ⊳ See Equation (5)
7:   $\mathbf{l}^{(2)} \leftarrow \phi(W, P, L, T, r, K_{\min} + 1, K_{\max})$    (make a recursive call with $K_{\min} \leftarrow K_{\min} + 1$)
8:   $\epsilon^{(2)} = D_r(W, \mathbf{L}, \mathbf{l}^{(2)})$    (calculate the RWRE with $\mathbf{l}^{(2)}$)    ⊳ See Equation (5)
9:   if $\epsilon^{(1)} \le \epsilon^{(2)}$ then
10:     $\mathbf{l} \leftarrow \mathbf{l}^{(1)}$
11:   else
12:     $\mathbf{l} \leftarrow \mathbf{l}^{(2)}$
13:   end if
14: else
15:   $l_{K_{\min}} \leftarrow \big(T - \sum_{i=1}^{K_{\min}-1} p_i L\big)/p_{K_{\min}}$, $l_i \leftarrow L$ when $i < K_{\min}$, $l_i \leftarrow 0$ when $i > K_{\min}$
16: end if
17: end if
18: return $\mathbf{l}$

4. Property of Optimal Storage Strategy Based on Message Importance Measure

Considering that the ideal storage system can capture most of the characteristics of the lossy compression storage model in this paper, we focus on the properties of optimal storage strategy in it in this section for ease of analysis. Specifically, we ignore rounding and adopt l i in Equation (19) as the optimal storage size of the i-th class in this section. Moreover, we focus on a special kind of the importance weight. Namely, the message importance measure (MIM) is adopted as the importance weight in this part, for the fact that it can effectively measure the cost of the error in data reconstruction in the small-probability event scenarios [23,31].

4.1. Normalized Message Importance Measure

In order to facilitate comparison under different parameters, the normalized MIM is used and we can write
$$W_i = \frac{e^{\varpi(1 - p_i)}}{\sum_{j=1}^{n} e^{\varpi(1 - p_j)}},$$
where $\varpi$ is the importance coefficient, whose selection is discussed in Reference [37]. In fact, the MIM characterizes the user's subjective degree of concern about the data, and $\varpi$ is an indicator that reflects the user's preferences. In practice, the value of $\varpi$ depends on these preferences. For instance, when $\varpi$ is positive, the user focuses only on the small-probability events, while the large-probability events are the focus when $\varpi$ is negative [33].
It is easy to check that $0 \le W_i \le 1$ for $i = 1, 2, \ldots, n$, and that the weights of all event classes sum to one.
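As a quick illustration of this weighting, the snippet below (a sketch with assumed example values, not data from the paper) computes the normalized MIM weights for a toy distribution and shows how the weights concentrate on the smallest-probability class as the importance coefficient grows.

import numpy as np

def normalized_mim_weights(p, varpi):
    # W_i = exp(varpi * (1 - p_i)) / sum_j exp(varpi * (1 - p_j)).
    p = np.asarray(p, float)
    w = np.exp(varpi * (1.0 - p))
    return w / w.sum()

p = [0.03, 0.07, 0.14, 0.22, 0.25, 0.29]   # illustrative class probabilities
for varpi in (-35, -10, 0, 10, 35):
    print(varpi, np.round(normalized_mim_weights(p, varpi), 4))
# As varpi -> +inf the weight of the smallest-probability class tends to 1;
# as varpi -> -inf the weight of the largest-probability class tends to 1.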

4.1.1. Positive Importance Coefficient

For a positive importance coefficient (i.e., $\varpi > 0$), let $\alpha_1 = \arg\min_i p_i$ and assume $p_{\alpha_1} < p_i$ for $i \ne \alpha_1$. The derivative of $W_{\alpha_1}$ with respect to the importance coefficient is
$$\frac{\partial W_{\alpha_1}}{\partial \varpi} = \frac{\sum_{j=1}^{n} (p_j - p_{\alpha_1})\, e^{\varpi(2 - p_{\alpha_1} - p_j)}}{\Big( \sum_{j=1}^{n} e^{\varpi(1 - p_j)} \Big)^{2}} \ge 0.$$
Therefore, $W_{\alpha_1}$ increases as $\varpi$ increases. In particular, as $\varpi$ approaches positive infinity, we have
$$\lim_{\varpi \to +\infty} W_{\alpha_1} = \lim_{\varpi \to +\infty} \frac{e^{\varpi(1 - p_{\alpha_1})}}{\sum_{j=1}^{n} e^{\varpi(1 - p_j)}} = \lim_{\varpi \to +\infty} \frac{e^{\varpi(1 - p_{\alpha_1})}}{e^{\varpi(1 - p_{\alpha_1})} + \sum_{j \ne \alpha_1} e^{\varpi(1 - p_j)}} = \lim_{\varpi \to +\infty} \frac{1}{1 + \sum_{j \ne \alpha_1} e^{\varpi(p_{\alpha_1} - p_j)}} = 1.$$
Obviously, $\lim_{\varpi \to +\infty} W_i = 0$ for $i \ne \alpha_1$.
Remark 7.
As $\varpi$ approaches positive infinity, the importance weight of the class with the smallest probability is one and all the others are zero, which means that, from the viewpoint of this message importance, only a fraction of the data contains almost all of the critical information that users care about.

4.1.2. Negative Importance Coefficient

When the importance coefficient is negative (i.e., $\varpi < 0$), let $\alpha_2 = \arg\max_i p_i$ and assume $p_{\alpha_2} > p_i$ for $i \ne \alpha_2$. Its derivative with respect to the importance coefficient is
$$\frac{\partial W_{\alpha_2}}{\partial \varpi} = \frac{\sum_{j=1}^{n} (p_j - p_{\alpha_2})\, e^{\varpi(2 - p_{\alpha_2} - p_j)}}{\Big( \sum_{j=1}^{n} e^{\varpi(1 - p_j)} \Big)^{2}} \le 0.$$
Therefore, $W_{\alpha_2}$ decreases as $\varpi$ increases. In particular, as $\varpi$ approaches negative infinity, we have
$$\lim_{\varpi \to -\infty} W_{\alpha_2} = \lim_{\varpi \to -\infty} \frac{e^{\varpi(1 - p_{\alpha_2})}}{\sum_{j=1}^{n} e^{\varpi(1 - p_j)}} = \lim_{\varpi \to -\infty} \frac{e^{\varpi(1 - p_{\alpha_2})}}{e^{\varpi(1 - p_{\alpha_2})} + \sum_{j \ne \alpha_2} e^{\varpi(1 - p_j)}} = \lim_{\varpi \to -\infty} \frac{1}{1 + \sum_{j \ne \alpha_2} e^{\varpi(p_{\alpha_2} - p_j)}} = 1.$$
Obviously, $\lim_{\varpi \to -\infty} W_i = 0$ for $i \ne \alpha_2$.
Remark 8.
As $\varpi$ approaches negative infinity, the importance weight of the class with the largest probability is one and all the others are zero. If this largest probability is far from 1, the majority of the message importance is still contained in the data with the highest probability, and the corresponding part of the data is not too large.

4.2. Optimal Storage Size for Each Class

Assume $\tilde{N} = n$ and ignore rounding; then, due to Equation (23), we obtain
$$l_i = T + \frac{\ln \dfrac{e^{\varpi(1 - p_i)}}{\sum_{j=1}^{n} e^{\varpi(1 - p_j)}} - \sum_{i=1}^{n} p_i \ln \dfrac{e^{\varpi(1 - p_i)}}{\sum_{j=1}^{n} e^{\varpi(1 - p_j)}}}{\ln r} = T + \frac{\varpi}{\ln r} (\gamma_p - p_i),$$
where $\gamma_p$ is an auxiliary variable given by
$$\gamma_p = \sum_{i=1}^{n} p_i^2.$$
In fact, it is a functional of the Rényi entropy of order two, i.e., $\gamma_p = e^{-H_2(P)}$, where $H_2(P)$ is the Rényi entropy $H_\alpha(\cdot)$ with $\alpha = 2$ [38]. Furthermore, we have the following lemma on $\gamma_p$.
Lemma 1.
Let ( p 1 , p 2 , , p n ) be a probability distribution, then we have
$$\frac{1}{n} \le \gamma_p \le 1,$$
$$-\frac{1}{4} \le \gamma_p - p_i \le 1.$$
Proof. 
Refer to Appendix B. □
Thus, we find $l_i > T$ if $(1/n - p_i)\,\varpi > 0$. Furthermore, we obtain $l_i = T$ when $p_i = \gamma_p$.
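Under the assumption $\tilde{N} = n$ (all allocations interior), the closed form above is easy to evaluate directly. The short sketch below, with illustrative values for $p$, $T$, and $\varpi$, computes $\gamma_p$ and the allocations $l_i = T + \varpi(\gamma_p - p_i)/\ln r$, and checks that they average to $T$.

import numpy as np

def mim_allocation(p, T, varpi, r=2):
    # Interior solution l_i = T + (varpi / ln r) * (gamma_p - p_i),
    # valid when every l_i falls inside [0, L].
    p = np.asarray(p, float)
    gamma_p = np.sum(p ** 2)               # gamma_p = exp(-H_2(P))
    return T + (varpi / np.log(r)) * (gamma_p - p)

p = [0.03, 0.07, 0.1395, 0.2205, 0.25, 0.29]   # distribution used for illustration
l = mim_allocation(p, T=4, varpi=10)
print(np.round(l, 3))   # classes with p_i < gamma_p receive more than T bits
print(np.dot(p, l))     # equals T up to floating-point error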
Theorem 3.
Let $(p_1, p_2, \ldots, p_n)$ be a probability distribution and $W_i = e^{\varpi(1 - p_i)} / \sum_{j=1}^{n} e^{\varpi(1 - p_j)}$ be the importance weight. The optimal storage sizes in the ideal storage system have the following properties:
(1) $l_i \ge l_j$ if $p_i < p_j$ for $i, j \in \{1, 2, \ldots, n\}$ when $\varpi > 0$;
(2) $l_i \le l_j$ if $p_i < p_j$ for $i, j \in \{1, 2, \ldots, n\}$ when $\varpi < 0$.
Proof. 
Refer to Appendix C. □
Remark 9.
As noted in [31], the data with smaller probability usually possesses larger importance when ϖ > 0 , while the data with larger probability usually possesses larger importance when ϖ < 0 . Therefore, this optimal allocation strategy makes rational use of all the storage space by providing more storage size for the paramount data and less storage size for the insignificance data. It agrees with the intuitive idea, which is that users generally are more concerned about the data that they need rather than the whole data itself.
Lemma 2.
Let $(p_1, p_2, \ldots, p_n)$ be a probability distribution and $r$ be the radix. $L$ and $T$ are positive integers with $T < L$. If $\varpi$ satisfies $0 \le T + \varpi (\gamma_p - p_i)/\ln r \le L$ for $i = 1, 2, \ldots, n$, then we have $\tilde{N} = n$.
Proof. 
According to Equation (33a) and the constraint $0 \le T + \varpi(\gamma_p - p_i)/\ln r \le L$, we obtain $0 \le l_i \le L$ for $i \in \{1, 2, \ldots, n\}$. In this case, $\tilde{N} = n$. □
In fact, when $\varpi \ge 0$, due to Equation (33a) and Lemma 1, we obtain
$$0 \le T - \frac{\varpi}{4 \ln r} \le T + \frac{\varpi (\gamma_p - p_i)}{\ln r} \le T + \frac{\varpi}{\ln r} \le L.$$
Similarly, when ϖ < 0 , we have
$$0 \le T + \frac{\varpi}{\ln r} \le T + \frac{\varpi(\gamma_p - p_i)}{\ln r} \le T - \frac{\varpi}{4 \ln r} \le L.$$
According to Equations (36) and (37), we find that $\tilde{N} = n$ always holds if $\max\{4 \ln r\,(T - L),\; -T \ln r\} \le \varpi \le \min\{4 T \ln r,\; \ln r\,(L - T)\}$.

4.3. Relative Weighted Reconstruction Error

For convenience, we also use $D_r(\mathbf{x}, \varpi)$ to denote the relative weighted reconstruction error (RWRE) $D_r(\mathbf{x}, W)$. Due to Equation (7), we have
$$D_r(\mathbf{x}, \varpi) = \frac{1}{r^L - 1} \left( \frac{\sum_{i=1}^{n} p_i e^{\varpi(1 - p_i)}\, r^{L - l_i}}{\sum_{i=1}^{n} p_i e^{\varpi(1 - p_i)}} - 1 \right).$$
If the maximum available average storage size $T$ is zero, then $l_i = 0$ for $i = 1, 2, \ldots, n$, and in this case $D_r(\mathbf{x}, \varpi) = 1$. On the contrary, $D_r(\mathbf{x}, \varpi) = 0$ when $l_i = L$ for $i = 1, 2, \ldots, n$.
Theorem 4.
$D_r(\mathbf{x}, \varpi)$ has the following properties:
(1) $D_r(\mathbf{x}, \varpi)$ is monotonically decreasing in $\varpi$ on $(0, +\infty)$;
(2) $D_r(\mathbf{x}, \varpi)$ is monotonically increasing in $\varpi$ on $(-\infty, 0)$;
(3) $D_r(\mathbf{x}, \varpi) \le D_r(\mathbf{x}, 0) = (r^{L-T} - 1)/(r^L - 1)$.
Proof. 
Refer to Appendix D. □
Remark 10.
As shown in Remarks 7 and 8, the overwhelming majority of important information gathers in a fraction of the data as the importance coefficient approaches negative/positive infinity. Therefore, we can heavily reduce the storage space with an extremely small RWRE as the absolute value of the importance coefficient increases. In fact, this special characteristic of the weights reflects the effect of users' preferences: it is beneficial for data compression that the data users care about is highly clustered. Moreover, when $\varpi = 0$, all the importance weights are the same, which leads to incompressibility in a sense, since there is no special characteristic of the weights that allows the storage space to be used more rationally.
In the following part of this section, we discuss the cases where $0 \le T + \varpi(\gamma_p - p_i)/\ln r \le L$ for $i = 1, \ldots, n$, which means that all $l_i$ are given by Equation (33a) and $n = \tilde{N}$ due to Lemma 2. In this case, substituting Equation (33a) into Equation (7), the RWRE is
$$D_r(\mathbf{x}, \varpi) = \frac{e^{\varpi(1 - \gamma_p)}\, r^{\Delta}}{(r^L - 1) \sum_{i=1}^{n} p_i e^{\varpi(1 - p_i)}} - \frac{1}{r^L - 1},$$
where $\Delta = L - T$ characterizes the average compressed storage space of each data item.
Since $0 \le T + \varpi(\gamma_p - p_i)/\ln r \le L$, we have
$$\begin{cases} \dfrac{\varpi(\gamma_p - p_{\alpha_1})}{\ln r} \le L - T \le L + \dfrac{\varpi(\gamma_p - p_{\alpha_2})}{\ln r} & \text{if } \varpi \ge 0, \\[6pt] \dfrac{\varpi(\gamma_p - p_{\alpha_2})}{\ln r} \le L - T \le L + \dfrac{\varpi(\gamma_p - p_{\alpha_1})}{\ln r} & \text{if } \varpi < 0. \end{cases}$$
Hence,
$$\delta_1 \le D_r(\mathbf{x}, \varpi) \le \delta_2,$$
where
$$\delta_1 = \begin{cases} \dfrac{e^{\varpi(1 - p_{\alpha_1})}}{(r^L - 1) \sum_{i=1}^{n} p_i e^{\varpi(1 - p_i)}} - \dfrac{1}{r^L - 1} & \text{if } \varpi \ge 0, \\[6pt] \dfrac{e^{\varpi(1 - p_{\alpha_2})}}{(r^L - 1) \sum_{i=1}^{n} p_i e^{\varpi(1 - p_i)}} - \dfrac{1}{r^L - 1} & \text{if } \varpi < 0, \end{cases}$$
and
$$\delta_2 = \begin{cases} \dfrac{e^{\varpi(1 - p_{\alpha_2})}\, r^L}{(r^L - 1) \sum_{i=1}^{n} p_i e^{\varpi(1 - p_i)}} - \dfrac{1}{r^L - 1} & \text{if } \varpi \ge 0, \\[6pt] \dfrac{e^{\varpi(1 - p_{\alpha_1})}\, r^L}{(r^L - 1) \sum_{i=1}^{n} p_i e^{\varpi(1 - p_i)}} - \dfrac{1}{r^L - 1} & \text{if } \varpi < 0. \end{cases}$$
Theorem 5.
For a given storage system with the probability distribution of the data sequence $P = (p_1, p_2, \ldots, p_n)$, let $L$ and $r$ be fixed positive integers ($r > 1$), and let $\varpi$ satisfy $0 \le T + \varpi(\gamma_p - p_i)/\ln r \le L$ for $i = 1, 2, \ldots, n$. For a given least upper bound $\delta$ of the RWRE ($\delta_1 \le \delta \le \delta_2$, where $\delta_1$ and $\delta_2$ are defined above), the maximum average compressed storage size of each data item $\Delta^*(\delta)$ is given by
$$\Delta^*(\delta) = \frac{\ln\!\big(1 + \delta(r^L - 1)\big) + L(\varpi, P) - \varpi + \varpi \gamma_p}{\ln r}$$
$$\ge \frac{\ln\!\big(1 + \delta (r^L - 1)\big)}{\ln r},$$
where $L(\varpi, P) = \ln \sum_{i=1}^{n} p_i e^{\varpi(1 - p_i)}$, and the above inequality holds with equality if the probability distribution of the data sequence is a uniform distribution or the importance coefficient is zero.
Proof. 
It is easy to check that $\tilde{N} = n$ according to Lemma 2, since $0 \le T + \varpi(\gamma_p - p_i)/\ln r \le L$. Let $D_r(\mathbf{x}, \varpi) \le \delta$. By means of Equation (39), we solve this inequality and obtain
$$\Delta \le \frac{\ln\!\big(1 + \delta(r^L - 1)\big) + L(\varpi, P) - \varpi + \varpi \gamma_p}{\ln r} = \Delta^*(\delta),$$
where $L(\varpi, P) = \ln \sum_{i=1}^{n} p_i e^{\varpi(1 - p_i)}$. Then we have the following inequality:
$$\Delta^*(\delta) \overset{(a)}{\ge} \frac{\ln\!\big(1 + \delta(r^L - 1)\big) + \ln e^{\sum_{i=1}^{n} p_i \varpi (1 - p_i)} - \varpi + \varpi \gamma_p}{\ln r} = \frac{\ln\!\big(1 + \delta(r^L - 1)\big)}{\ln r},$$
where $(a)$ follows from Jensen's inequality. Since the exponential function is strictly convex, equality holds only if $\varpi(1 - p_i)$ is constant for all $i$, which means that $(p_1, p_2, \ldots, p_n)$ is a uniform distribution or the importance coefficient $\varpi$ is zero. □
Remark 11.
In conventional source coding, the encoding length depends on the entropy of sequence, and a sequence is incompressible if its probability distribution is a uniform distribution [34]. In Theorem 5, the uniform distribution is also the worst case, since the system achieves the minimum compressed storage size. Although the focus is different, they both show that the uniform distribution is detrimental for compression.
Furthermore, taking $\varpi > 0$ as an example, it is also noted that
$$\Delta^*(\delta) \le \Delta^*(\delta_2) = L + \frac{\varpi(\gamma_p - p_{\alpha_2})}{\ln r} \le L,$$
since $\gamma_p \le p_{\alpha_2}$. In order to make $\Delta^*(\delta_2)$ approach $L$, $\gamma_p - p_{\alpha_2}$ should be as close to zero as possible within the range where $0 \le T + \varpi(\gamma_p - p_i)/\ln r \le L$ holds for $i = 1, 2, \ldots, n$.
When the importance coefficient is fixed, for two probability distributions $P$ and $Q$, if $L(\varpi, P) + \varpi \gamma_p > L(\varpi, Q) + \varpi \gamma_q$, then $\Delta^*$ under $P$ is larger than that under $Q$. In fact, $L(\varpi, P)$ is defined as the MIM in [30], and $\gamma_p = e^{-H_2(P)}$ [38]. Thus, the maximum average compressed storage size of each data item is governed by the MIM and the Rényi entropy of order two. For typical small-probability event scenarios, where there is an exceedingly small probability, the MIM is usually large, and $\gamma_p$ is simultaneously not small because of the accompanying large probability. Therefore, $\Delta^*(\delta)$ is usually large in this case. As a result, much more storage space can be saved in typical small-probability event scenarios compared with the uniform probability distribution. Namely, the data can be compressed by exploiting the characteristics of typical small-probability events, which may help improve the design of practical storage systems in big data.
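The bound in Theorem 5 is straightforward to evaluate numerically. The sketch below, with assumed values for $\delta$, $\varpi$, $L$, and two illustrative distributions, computes $\Delta^*(\delta)$ from $L(\varpi, P)$ and $\gamma_p$, and shows that a skewed distribution admits a larger maximum compressed storage size than the uniform one.

import numpy as np

def max_compressed_size(p, delta, varpi, L, r=2):
    # Delta*(delta) = [ln(1 + delta*(r^L - 1)) + L(varpi, P) - varpi + varpi*gamma_p] / ln r.
    p = np.asarray(p, float)
    mim = np.log(np.sum(p * np.exp(varpi * (1.0 - p))))   # L(varpi, P)
    gamma_p = np.sum(p ** 2)
    return (np.log(1.0 + delta * (r ** L - 1.0)) + mim - varpi + varpi * gamma_p) / np.log(r)

delta, varpi, L = 0.01, 5, 16                  # illustrative requirement and parameters
p_skewed = [0.01, 0.04, 0.15, 0.30, 0.50]       # contains a small-probability class
p_uniform = [0.2] * 5
print(max_compressed_size(p_skewed, delta, varpi, L))    # larger Delta*
print(max_compressed_size(p_uniform, delta, varpi, L))   # smaller Delta* (worst case)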

5. Property of Optimal Storage Strategy Based on Non-Parametric Message Importance Measure

In this section, we define the importance weight based on the form of the non-parametric message importance measure (NMIM) to characterize the relative weighted reconstruction error (RWRE) [23]. The importance weight of the $i$-th class in this section is given by
$$W_i = \frac{e^{(1 - p_i)/p_i}}{\sum_{j=1}^{n} e^{(1 - p_j)/p_j}}.$$
Due to Equation (22), the optimal storage size in the ideal storage system under this importance weight is given by
$$l_i^* = \min\left\{ \left\lfloor \left( \frac{T - T_{NL}}{\sum_{j=1}^{\tilde{N}} p_{I_j}} + \frac{1}{p_i \ln r} - \frac{1}{\ln r} - \frac{\ln \sum_{j=1}^{n} e^{(1 - p_j)/p_j}}{\ln r} - \frac{\sum_{j=1}^{\tilde{N}} \big( 1 - p_{I_j} - p_{I_j} \ln \sum_{k=1}^{n} e^{(1 - p_k)/p_k} \big)}{\ln r \sum_{j=1}^{\tilde{N}} p_{I_j}} \right)^{\!+} \right\rfloor, \; L \right\} = \min\left\{ \left\lfloor \left( \frac{T - T_{NL}}{\sum_{j=1}^{\tilde{N}} p_{I_j}} + \frac{1}{p_i \ln r} - \frac{\tilde{N}}{\ln r \sum_{j=1}^{\tilde{N}} p_{I_j}} \right)^{\!+} \right\rfloor, \; L \right\}.$$
For two probabilities $p_i$ and $p_j$, if $p_i < p_j$, then $W_i > W_j$. In this case, we obtain $l_i^* \ge l_j^*$ according to Theorem 2.
Assume $\tilde{N} = n$ and ignore rounding; then, due to Equation (23), we obtain
$$l_i = T + \frac{1}{p_i \ln r} - \frac{n}{\ln r}.$$
Requiring $0 \le l_i \le L$ in this case, we find
$$\frac{1}{n + (L - T)\ln r} \le p_i \le \begin{cases} \dfrac{1}{n - T \ln r} & \text{if } n > T \ln r, \\[4pt] 1 & \text{if } n \le T \ln r. \end{cases}$$
Generally, this constraint does not invariably hold, and therefore we usually do not have $\tilde{N} = n$.
For the quantification storage system described by Problem P3, if the maximum available average storage size satisfies $n \le T \ln r$, an arbitrary probability distribution makes Equation (50) hold, which means $\tilde{N} = n$. In this case, substituting Equation (47) into Equation (9), the RWRE can be expressed as
$$D_r(\mathbf{x}, W) = \frac{\sum_{i=1}^{n} p_i e^{(1 - p_i)/p_i}\, r^{-l_i}}{\sum_{i=1}^{n} p_i e^{(1 - p_i)/p_i}} = e^{\,n - 1 - L(P)}\, r^{-T},$$
where $L(P) = \ln \sum_{i=1}^{n} p_i e^{(1 - p_i)/p_i}$, which is defined as the NMIM [23].
Note that $D_r(\mathbf{x}, W) \to 0$ as $T$ approaches positive infinity. Since $n \le T \ln r$, we find $D_r(\mathbf{x}, W) \le e^{-1 - L(P)}$. Furthermore, since $L(P) \ge n - 1$ according to Reference [23], we obtain $D_r(\mathbf{x}, W) \le e^{-n}$. Letting $D_r(\mathbf{x}, W) \le \delta$, we have
$$T \ge \frac{n - 1 - L(P) - \ln \delta}{\ln r}.$$
Obviously, for a given RWRE, the minimum average required storage size for the quantification storage system decreases with increasing of L ( P ) . That is to say, the data with large NMIM will get a large compression ratio. In fact, the NMIM in the typical small-probability event scenarios is generally large according to Reference [23]. Thus, this compression strategy is effective in the typical small-probability event scenarios.
Furthermore, due to Reference [23], $L(P) \approx \ln\!\big( p_{\alpha_1} e^{(1 - p_{\alpha_1})/p_{\alpha_1}} \big)$ when $p_{\alpha_1}$ is small. Hence, for small $p_{\alpha_1}$, the RWRE in this case reduces to
$$D_r(\mathbf{x}, W) \approx \frac{e^{\,n - 1/p_{\alpha_1}}}{p_{\alpha_1}}\, r^{-T}.$$
It is easy to check that D r ( x , W ) increases as p α 1 increases in this case.
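To illustrate the NMIM-based requirement, the following sketch (with an illustrative distribution and target RWRE; the helper names are assumptions, not from the paper) computes $L(P)$ and the minimum average storage size $T \ge (n - 1 - L(P) - \ln \delta)/\ln r$ needed by the quantification storage system.

import numpy as np

def nmim(p):
    # Non-parametric MIM: L(P) = ln( sum_i p_i * exp((1 - p_i) / p_i) ).
    p = np.asarray(p, float)
    return np.log(np.sum(p * np.exp((1.0 - p) / p)))

def min_avg_storage(p, delta, r=2):
    # Minimum T such that D_r <= delta in the quantification storage system.
    return (len(p) - 1 - nmim(p) - np.log(delta)) / np.log(r)

p = [0.05, 0.10, 0.15, 0.30, 0.40]       # illustrative distribution with a rarer class
print(nmim(p))                            # dominated by the smallest probability
print(min_avg_storage(p, delta=1e-9))     # larger L(P) implies a smaller required T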

6. Numerical Results

We now present numerical results to validate the developed theoretical results in this paper. In this section, we assume all the data is digital, and the exponential distortion measure D f in Equation (4) is adopted. Furthermore, the relative weighted reconstruction error (RWRE) in Equation (5) is used to characterize the change of total data value before and after the lossy compression based on data value, which represents the total cost of this compression.

6.1. Success Rate of Compressed Storage in General Storage System

This part presents the success rate of compressed storage in the general storage system to show the effectiveness of our method, and it considers the following scenario of data storage.
There are eight categories of data, and the probability distribution of the data categories is randomly generated in each storage run. Moreover, each category of data is assigned a randomly generated data value in the interval $(0, 100)$. After the lossless conventional data compression, in which the data value is assumed to be unchanged, the storage size of each data item is a randomly generated number between 10 and 30 bits. The maximum available average storage size $T$ also varies from 10 to 30 bits. Data compression is considered successful when the compressed storage size is not larger than the maximum available storage size. However, when the amount of data to be stored is extremely large, the compressed storage size may still not be sufficient after lossless conventional data compression. In this case, the optimal storage space allocation strategy in this paper can be used if a certain amount of data value is allowed to be lost. As a baseline, we also divide the maximum available storage space equally among all categories of data on the basis of the lossless conventional data compression, which is presented as the equal allocation strategy in Figure 3. We also regard a compression as successful if the RWRE in this process is less than or equal to the specified amount that users can accept. For each value of $T$, this numerical simulation is repeated 10,000 times. The success rate of compressed storage is given by $N_s/10{,}000$, where $N_s$ is the number of successful data compressions among all the experiments.
Figure 3 shows the relationship between the success rate of compressed storage and the maximum available average storage size T. It is observed that the success rate of conventional data compression is almost one when the available storage size is large ( T > 26 bits). However, when the available storage size is not big ( T < 26 bits), the success rate of conventional data compression decreases with decreasing of the maximum available average storage size until it is zero. Furthermore, when a certain amount of data value is allowed to be lost, the success rate can be improved on the basis of the lossless conventional data compression for the same T. More important, the success rate of the optimal allocation strategy is the largest among these three considered compression strategies. For the same maximum available average storage size, the success rates of the optimal allocation strategy and the equal allocation strategy increase as the maximum acceptable RWRE increases. In fact, the success rate of equal allocation strategy is exceedingly close to that of conventional data compression when the maximum acceptable RWRE is small (e.g., 10 7 ). In general, if a small quantity of total data value is allowed to be lost, our optimal allocation strategy will further improve the performance of data compression on the basis of the lossless conventional data compression.

6.2. Optimal Storage Size Based on Message Importance Measure in Ideal Storage System

We illustrate the characteristics of optimal storage size based on message importance measure (MIM) in an ideal storage system in this part by means of a broken line graph, which demonstrates the theoretical analyses in Section 4.2. For ease of illustrating, we ignore rounding and the optimal storage size of the i-th class is given by l i in Equation (19).
The broken line graph of the optimal storage size is shown in Figure 4 for the probability distribution $P = (0.03, 0.07, 0.1395, 0.2205, 0.25, 0.29)$. In fact, $\gamma_P \approx 0.2205$ and $1/n \approx 0.167$. The maximum available average storage size $T$ is 4 bits, and the original storage size of each data item is 10 bits. The importance coefficients are $\varpi_1 = -35$, $\varpi_2 = -10$, $\varpi_3 = 0$, $\varpi_4 = 10$, and $\varpi_5 = 35$, respectively. Several observations can be made. When $\varpi > 0$, the optimal storage size of the $i$-th class decreases as its probability increases. On the contrary, the optimal storage size of the $i$-th class increases as its probability increases when $\varpi < 0$. In addition, the optimal storage size is invariably equal to $T$ ($T = 4$) when $\varpi = 0$. Furthermore, $l_i$ increases as $\varpi$ increases for $i = 1, 2, 3$, and it decreases with $\varpi$ for $i = 5, 6$. For importance coefficients with small absolute values ($\varpi_2$, $\varpi_3$, $\varpi_4$), $0 < l_i < L$ holds for $i = 1, 2, \ldots, 6$, and $l_4$ is extremely close to $T$ ($T = 4$).

6.3. The Property of the RWRE Based on MIM in Ideal Storage System

Then we focus on the properties of the RWRE. In this part, we give several numerical results as quintessential examples to validate our theoretical findings in Section 4.3. Without loss of generality, let the original storage size of each data item be 16 bits, and let the maximum available average storage size $T$ vary from 0 to 8 bits. Although any range of $T$ could be used, we choose this range to make the results clearer. Moreover, the normalized MIM is adopted to describe the data value that represents the subjective assessment of users.
Figure 5 and Figure 6 both present the relationship between the RWRE and the maximum available average storage size with the probability distribution ( 0.031 , 0.052 , 0.127 , 0.208 , 0.582 ) . In fact, the compression ratio is given by T / L in this case, and the RWRE represents the total cost, which measures the compression distortion from the viewpoint of data value. Therefore, these two figures essentially show the trade-off between the compression ratio and the total compressed storage cost.
Figure 5 focuses on the error in the RWRE caused by rounding, with different values of the importance coefficient $\varpi$ ($\varpi = -20, 0, 12, 20$). In Figure 5, the RWRE $D_r$ is obtained by substituting Equation (19) into Equation (38), while the RWRE $D_r^*$ is obtained by substituting Equation (22) into Equation (38). In this figure, $D_r^*$ exhibits a tiered descent as the available average storage size increases, while $D_r$ decreases monotonically with increasing available average storage size. Figure 5 also shows that $D_r$ is always less than or equal to $D_r^*$, and the two are very close to each other for the same importance coefficient, which means that $D_r$ can be used as a lower bound of $D_r^*$ to reflect its characteristics.
Furthermore, some other observations can be obtained in Figure 6. For the same T, the RWRE increases as ϖ increases when ϖ < 0 , while the RWRE decreases with increasing of ϖ when ϖ > 0 . In addition, the RWRE is the largest when ϖ = 0 . These results prove the validity of Theorem 4. It is also observed that the RWRE always decreases with increasing of T for given ϖ . Furthermore, for any importance coefficient, the RWRE will be 1 if available average storage size is zero. Generally, there is a trade-off between the RWRE and the available storage size, and the results in this paper propose an alternative lossy compression strategy based on message importance.
Next, let the importance coefficient $\varpi$ be five, and let the maximum available average storage size $T$ vary from two to eight bits. Although any range of $T$ could be used, we choose this range to make the results clearer. In addition, the original storage size is still 16 bits, and the average compressed storage space of each data item is $\Delta = L - T$. In this case, Figure 7 shows the relationship between the RWRE and the average compressed storage space $\Delta$ for different probability distributions. In fact, it can also be seen as reflecting the relationship between the total compressed storage cost and the average saved storage size. The probability distributions and some auxiliary variables are listed in Table 2. We take these five probability distributions as examples, for which $L(\varpi, P) + \varpi e^{-H_2(P)}$ decreases monotonically, and all of them satisfy $0 \le T + \varpi(\gamma_p - p_i)/\ln r \le L$. It is observed that the RWRE always increases with increasing $\Delta$ for a given probability distribution. Some other observations can also be made. For the same $\Delta$, the RWRE of the uniform distribution is always the largest. Furthermore, if the RWRE is required to be less than a specified value, which is exceedingly common in actual systems in order to keep the difference between the raw data and the stored data acceptable, the maximum average compressed storage size of each data item increases with increasing $L(\varpi, P) + \varpi e^{-H_2(P)}$. As an example, when the RWRE is required to be smaller than 0.01, the maximum average compressed storage sizes of each data item for $P_1$, $P_2$, $P_3$, $P_4$, and $P_5$ are 11.85, 10.97, 9.99, 9.73, and 9.36, respectively. In particular, the maximum average compressed storage size of each data item for the uniform distribution is the smallest, which suggests that data with a uniform distribution is incompressible from the perspective of optimal storage space allocation based on data value.

6.4. The Property of the RWRE Based on Non-Parametric MIM in a Quantification Storage System

Figure 8 presents the relationship between the RWRE and the maximum available average storage size T for different probability distributions in a quantification storage system, which verifies the theoretical results in Section 5. In this part, we use the normalized non-parametric message importance measure (NMIM) to characterize the data value that represents the subjective assessment of users. The probability distributions and some auxiliary variables are listed in Table 3.
Some observations can be made. First, the RWRE always decreases as the maximum available average storage size increases for a given probability distribution, so there is a trade-off between the RWRE and the maximum available average storage size. When the maximum available average storage size is small (T < n/ln r), the RWRE decreases much more sharply than when T is large. In addition, when the maximum available average storage size is large (T > n/ln r), the gaps between these RWRE curves remain constant on the logarithmic Y-axis. In fact, according to Equation (51), the gap between two distributions in this figure equals the difference of their NMIMs divided by ln 10. As an example, the gap between P1 and P4 in this figure is about 30, which agrees with the fact that (L(P1) − L(P4))/ln 10 ≈ 30. Moreover, the RWRE of P1 is very close to that of P2, and the minimum probabilities in these two probability distributions are the same, i.e., p_{α_1} = 0.007. This suggests that data with the same minimum probability will have nearly the same compression performance no matter how the rest of the distribution changes, provided that the minimum probability is very small. In addition, it is also observed that the RWRE decreases as the NMIM L(P) increases for the same T, which means this compression strategy is more effective in the large-NMIM cases.
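As a quick numerical check of the gap quoted above, the following short sketch (illustrative only) evaluates the non-parametric MIM, L(P) = ln Σ_i p_i e^{(1−p_i)/p_i} as defined in Table 1, for the distributions P1 and P4 of Table 3, together with the resulting vertical gap on a log10 axis.

import math

def nmim(p):
    # Non-parametric message importance measure L(P) = ln sum_i p_i * exp((1 - p_i) / p_i)
    return math.log(sum(pi * math.exp((1.0 - pi) / pi) for pi in p))

P1 = [0.007, 0.24, 0.24, 0.24, 0.273]
P4 = [0.014, 0.086, 0.113, 0.375, 0.412]
print(round(nmim(P1), 4), round(nmim(P4), 4))          # about 136.8953 and 66.1599, as in Table 3
print(round((nmim(P1) - nmim(P4)) / math.log(10), 2))  # about 30.7, i.e., a gap of roughly 30

The two printed NMIM values match Table 3, and the last line gives the constant vertical offset between the corresponding RWRE curves on the logarithmic Y-axis of Figure 8.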

7. Conclusions

In this paper, we focused on the problem of lossy compression storage when the storage size is still insufficient after conventional lossless data compression. Using the message importance to characterize the data value, we defined the weighted reconstruction error to describe the total cost in data storage. Based on it, we presented an optimal storage space allocation strategy for digital data from the perspective of data value under the exponential distortion measure, which pursues the least error with respect to the data value for a restricted storage size. The solution takes the form of a kind of restrictive water-filling, which provides an alternative way to design an effective storage space allocation strategy. In fact, this optimal allocation strategy prefers to provide more storage size to crucial event classes in order to make rational use of resources, which agrees with individuals' cognitive mechanisms.
Then, we presented the properties of this strategy based on the message importance measure (MIM) in detail. There is a trade-off between the relative weighted reconstruction error (RWRE) and the available storage size. In fact, if users accept a small loss of the total data value, this strategy can compress the data further beyond conventional data compression methods. Moreover, the compression performance of this storage system improves as the absolute value of the importance coefficient increases. This is because, as the importance coefficient approaches negative or positive infinity, a small fraction of the data contains the overwhelming majority of the information that users care about, which means the users' interest is highly concentrated. On the other hand, the probability distribution of the event classes also affects the compression results. When the useful information is naturally concentrated in a small portion of the raw data from the viewpoint of users, as in small-probability event scenarios, the data can obviously be compressed greatly by exploiting this characteristic of the distribution. Furthermore, the properties of the storage size and the RWRE based on the non-parametric MIM were also discussed. In fact, the RWRE of data with a uniform distribution was invariably the largest in every case. Therefore, this paper holds that data with a uniform information distribution is incompressible from the perspective of optimal storage size allocation based on data value, which is consistent, in a sense, with the well-known conclusion in information theory.
In future work, we intend to propose a more general distortion measure between the raw data and the compressed data, one that no longer applies only to digital data, and to use it to obtain high-efficiency lossy data compression systems from the perspective of message importance. In addition, we are also interested in applying this optimal storage space allocation strategy to real applications with real data streams.

Author Contributions

Conceptualization, S.L., R.S. and P.F.; Formal analysis, S.L., R.S., Z.Z. and P.F.; methodology, S.L., R.S. and P.F.; writing—original draft, S.L.; writing—review and editing, S.L., R.S., Z.Z. and P.F. All authors have read and agreed to the published version of the manuscript.

Funding

The authors are very thankful for the support of the National Natural Science Foundation of China (NSFC) No. 61771283 and Beijing Natural Science Foundation (4202030).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
IoT: Internet of things
MIM: message importance measure
RWRE: relative weighted reconstruction error
NMIM: non-parametric message importance measure

Appendix A. Proof of Theorem 2

In fact, Equation (25) can be rewritten as
\[
l_i = \begin{cases}
0 & \text{if } W_i < e^{-\beta \ln r}, \\
\beta + \dfrac{\ln W_i}{\ln r} & \text{if } e^{-\beta \ln r} \le W_i \le e^{(L-\beta)\ln r}, \\
L & \text{if } W_i > e^{(L-\beta)\ln r}.
\end{cases}
\]
When W i > W j , we have
\[
l_i - l_j = \begin{cases}
0 & \text{if } W_i > e^{(L-\beta)\ln r},\; W_j > e^{(L-\beta)\ln r}, \\
L - \beta - \dfrac{\ln W_j}{\ln r} & \text{if } W_i > e^{(L-\beta)\ln r},\; e^{-\beta\ln r} \le W_j \le e^{(L-\beta)\ln r}, \\
L & \text{if } W_i > e^{(L-\beta)\ln r},\; W_j < e^{-\beta\ln r}, \\
\dfrac{\ln W_i - \ln W_j}{\ln r} & \text{if } e^{-\beta\ln r} \le W_i \le e^{(L-\beta)\ln r},\; e^{-\beta\ln r} \le W_j \le e^{(L-\beta)\ln r}, \\
\beta + \dfrac{\ln W_i}{\ln r} & \text{if } e^{-\beta\ln r} \le W_i \le e^{(L-\beta)\ln r},\; W_j < e^{-\beta\ln r}, \\
0 & \text{if } W_i < e^{-\beta\ln r},\; W_j < e^{-\beta\ln r}.
\end{cases}
\]
Due to Equation (8b), we obtain 0 ≤ β + ln W_i/ln r ≤ L in the middle case, and therefore L − β − ln W_j/ln r ≥ 0. Furthermore, it is easy to check that (ln W_i − ln W_j)/ln r ≥ 0 since W_i > W_j. Thus, l_i − l_j ≥ 0 whenever W_i > W_j, for all i, j ∈ {1, 2, …, n}. The proof is completed.
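As a purely illustrative example of the three cases above (the numbers are chosen for this illustration only and are not taken from the paper), let r = 2, L = 10, and suppose the water level is β = 3. Then
\[
W_i = 2^{-2}:\; l_i = 3 + \tfrac{\ln 2^{-2}}{\ln 2} = 1; \qquad
W_i = 2^{-5}:\; 3 - 5 < 0 \;\Rightarrow\; l_i = 0; \qquad
W_i = 2^{9}:\; 3 + 9 > L \;\Rightarrow\; l_i = L = 10,
\]
so the allocated storage size is indeed nondecreasing in W_i, as Theorem 2 states.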

Appendix B. Proof of Lemma 1

(1) For γ p , it is noted that
\[
\sum_{i=1}^{n} p_i^2 = \frac{1}{n}\sum_{i=1}^{n} p_i^2 \sum_{i=1}^{n} 1^2 \ge \frac{1}{n}\Big(\sum_{i=1}^{n} p_i\Big)^2 = \frac{1}{n},
\]
where the equality holds only if ( p 1 , p 2 , , p n ) is a uniform distribution. Moreover,
\[
\sum_{i=1}^{n} p_i^2 \le \sum_{i=1}^{n} p_i = 1,
\]
where the equality holds only if there is exactly one p_t = 1 (t ∈ {1, 2, …, n}) and p_k = 0 for k ≠ t.
(2) For γ_p − p_i, we have \(\sum_{j=1}^{n} p_j^2 - p_i \le \sum_{j=1}^{n} p_j^2 \le 1\), with equality if and only if p_t = 1 for some t ≠ i and p_k = 0 for k ≠ t. Therefore, we only need to check that \(\sum_{j=1}^{n} p_j^2 - p_i \ge -1/4\).
First, if n = 1, we obtain \(\sum_{j=1}^{1} p_j^2 - p_1 = 1 - 1 = 0\).
Second, if n = 2, take i = 1 without loss of generality; then \(\sum_{j=1}^{2} p_j^2 - p_1 = 2(p_1 - 3/4)^2 - 1/8\). It is easy to check that \(\sum_{j=1}^{2} p_j^2 - p_1 \ge -1/8 > -1/4\).
Third, if n ≥ 3, we use the method of Lagrange multipliers. Let
\[
J(\mathbf{p}) = \sum_{j=1}^{n} p_j^2 - p_i - \lambda\Big(\sum_{j=1}^{n} p_j - 1\Big).
\]
Setting the derivatives to zero, we obtain
\[
2 p_j^* - \lambda = 0 \quad \text{for } j \ne i,
\]
\[
2 p_j^* - 1 - \lambda = 0 \quad \text{for } j = i.
\]
Substituting p_j^* into the constraint \(\sum_{j=1}^{n} p_j^* = 1\), we have
\[
\frac{\lambda (n-1)}{2} + \frac{\lambda + 1}{2} = 1 .
\]
Hence, we find λ = 1/n and
\[
p_j^* = \begin{cases}
\dfrac{n+1}{2n} & \text{if } j = i, \\[1mm]
\dfrac{1}{2n} & \text{if } j \ne i.
\end{cases}
\]
In this case, we get
\[
\sum_{j=1}^{n} (p_j^*)^2 - p_i^* = \frac{n-1}{4n^2} + \frac{(n+1)^2}{4n^2} - \frac{n+1}{2n} = -\frac{n^2 - n}{4n^2} \ge -\frac{1}{4} .
\]
Thus, Lemma 1 is proved.
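As a numerical check (not part of the original proof), the minimum value obtained above equals −(n² − n)/(4n²) = −(n − 1)/(4n), so that
\[
n = 2:\; -\tfrac{1}{8}, \qquad n = 5:\; -\tfrac{1}{5}, \qquad \lim_{n \to \infty}\Big(-\frac{n-1}{4n}\Big) = -\frac{1}{4},
\]
which agrees with the direct computation in the case n = 2 and shows that the bound −1/4 is approached only as n grows large.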

Appendix C. Proof of Theorem 3

(1) First, let p_i < p_j when ϖ > 0. It is noted that
\[
W_i = \frac{e^{\varpi(1-p_i)}}{\sum_{k=1}^{n} e^{\varpi(1-p_k)}} > \frac{e^{\varpi(1-p_j)}}{\sum_{k=1}^{n} e^{\varpi(1-p_k)}} = W_j .
\]
Therefore, l_i ≥ l_j since W_i > W_j, due to Theorem 2.
(2) Second, let p_i < p_j when ϖ < 0. It is noted that
\[
W_i = \frac{e^{\varpi(1-p_i)}}{\sum_{k=1}^{n} e^{\varpi(1-p_k)}} < \frac{e^{\varpi(1-p_j)}}{\sum_{k=1}^{n} e^{\varpi(1-p_k)}} = W_j .
\]
Therefore, l_i ≤ l_j since W_i < W_j, due to Theorem 2. The proof is completed.
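For a concrete illustration (the values are chosen only for this example), let ϖ = 5 and P = (0.2, 0.8). Then
\[
W_1 \propto e^{5(1-0.2)} = e^{4} > e^{1} = e^{5(1-0.8)} \propto W_2,
\]
so the rarer class obtains at least as much storage, l_1 ≥ l_2; for ϖ = −5 the ordering of the weights, and hence of the storage sizes, is reversed.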

Appendix D. Proof of Theorem 4

We define an auxiliary function as
\[
f(\varpi) = \frac{\sum_{i=1}^{n} p_i e^{\varpi(1-p_i)} r^{-l_i}}{\sum_{j=1}^{n} p_j e^{\varpi(1-p_j)}} .
\]
According to Equation (38), the monotonicity of D_r(x, ϖ) with respect to ϖ is the same as that of f(ϖ).
Without loss of generality, let the storage size l_i corresponding to p_i be
\[
l_i = \begin{cases}
L & \text{if } i = 1, 2, \ldots, t_1, \\
\dfrac{\ln(\ln r) + \ln W_i - \ln \lambda}{\ln r} & \text{if } i = t_1+1, \ldots, t_2, \\
0 & \text{if } i = t_2+1, t_2+2, \ldots, n,
\end{cases}
\]
where λ is given by Equation (20) with {T_j, j = 1, …, Ñ_L} = {1, 2, …, t_1} and {I_j, j = 1, …, Ñ} = {t_1+1, …, t_2}.
The derivative of l_i with respect to ϖ is given by
\[
l_i' = \begin{cases}
\dfrac{\sum_{k=t_1+1}^{t_2} p_k (p_k - p_i)}{\ln r \, \sum_{k=t_1+1}^{t_2} p_k} & \text{if } i = t_1+1, \ldots, t_2, \\[2mm]
0 & \text{otherwise}.
\end{cases}
\]
Hence,
\[
f'(\varpi) = \frac{\sum_{i}\sum_{j} p_i p_j e^{\varpi(2-p_i-p_j)} r^{-l_i}\,(p_j - p_i - l_i' \ln r)}{\Big(\sum_{j} p_j e^{\varpi(1-p_j)}\Big)^2}
= \frac{F_1 + F_2}{\Big(\sum_{j} p_j e^{\varpi(1-p_j)}\Big)^2},
\]
where \(F_1 = \sum_{i}\sum_{j} p_i p_j e^{\varpi(2-p_i-p_j)} r^{-l_i}(p_j - p_i)\) and \(F_2 = -\sum_{i}\sum_{j} p_i p_j e^{\varpi(2-p_i-p_j)} r^{-l_i}\, l_i' \ln r\).
(1) When ϖ > 0 , we have
\[
\begin{aligned}
F_1 &= \sum_{p_j < p_i} p_i p_j e^{\varpi(2-p_i-p_j)} r^{-l_i}(p_j - p_i) + \sum_{p_j > p_i} p_i p_j e^{\varpi(2-p_i-p_j)} r^{-l_i}(p_j - p_i) \\
&\le \sum_{p_j < p_i} p_i p_j e^{\varpi(2-p_i-p_j)} r^{-l_j}(p_j - p_i) + \sum_{p_j > p_i} p_i p_j e^{\varpi(2-p_i-p_j)} r^{-l_i}(p_j - p_i) && \text{(A16a)} \\
&= \sum_{p_j < p_i} p_i p_j e^{\varpi(2-p_i-p_j)} r^{-l_j}(p_j - p_i) + \sum_{p_i > p_j} p_i p_j e^{\varpi(2-p_i-p_j)} r^{-l_j}(p_i - p_j) && \text{(A16b)} \\
&= \sum_{p_j < p_i} p_i p_j e^{\varpi(2-p_i-p_j)} r^{-l_j}(p_j - p_i + p_i - p_j) \\
&= 0 .
\end{aligned}
\]
In fact, if p_i > p_j, then l_i ≤ l_j due to Theorem 3, and thus r^{−l_i}(p_j − p_i) ≤ r^{−l_j}(p_j − p_i) in this case. Taking p_i p_j e^{ϖ(2−p_i−p_j)} ≥ 0 into account, we obtain Equation (A16a). Furthermore, Equation (A16b) is obtained by exchanging the subscripts i and j in the second sum.
Since l_i' = 0 unless t_1 < i ≤ t_2, summing over 1 ≤ j ≤ n we have
\[
\begin{aligned}
F_2 &= -\sum_{i=t_1+1}^{t_2} \sum_{j=1}^{n} p_i p_j e^{\varpi(2-p_i-p_j)} r^{-l_i}\, l_i' \ln r \\
&= -\sum_{i=t_1+1}^{t_2} \sum_{j=1}^{n} p_i p_j e^{\varpi(1-p_j) - \ln(\ln r) + \ln\lambda}\, l_i' \ln r \\
&= -\Big(\sum_{j=1}^{n} p_j B_j\Big)\Big(\sum_{i=t_1+1}^{t_2} p_i\, l_i' \ln r\Big) \\
&= \Big(\sum_{j=1}^{n} p_j B_j\Big)\,
\frac{\sum_{i=t_1+1}^{t_2} p_i^2 \sum_{k=t_1+1}^{t_2} p_k - \sum_{k=t_1+1}^{t_2} p_k^2 \sum_{i=t_1+1}^{t_2} p_i}{\sum_{k=t_1+1}^{t_2} p_k} \\
&= 0, && \text{(A17)}
\end{aligned}
\]
where \(B_j = \exp\{\varpi(1-p_j) - \ln(\ln r) + \ln\lambda\}\).
Based on the discussions above, we have
\[
f'(\varpi) = \frac{F_1 + F_2}{\Big(\sum_{i=1}^{n} p_i e^{\varpi(1-p_i)}\Big)^2} \le 0 .
\]
Since f'(ϖ) ≤ 0 when ϖ > 0, D_r(x, ϖ) is monotonically decreasing with respect to ϖ on (0, +∞).
(2) Similarly, when ϖ < 0, if 0 < p_j < p_i, then l_i ≥ l_j due to Theorem 3, and thus r^{−l_i}(p_j − p_i) ≥ r^{−l_j}(p_j − p_i) in this case. Taking p_i p_j e^{ϖ(2−p_i−p_j)} ≥ 0 into account, we have
\[
\begin{aligned}
F_1 &\ge \sum_{p_j < p_i} p_i p_j e^{\varpi(2-p_i-p_j)} r^{-l_j}(p_j - p_i) + \sum_{p_j > p_i} p_i p_j e^{\varpi(2-p_i-p_j)} r^{-l_i}(p_j - p_i) \\
&= \sum_{p_j < p_i} p_i p_j e^{\varpi(2-p_i-p_j)} r^{-l_j}(p_j - p_i) + \sum_{p_i > p_j} p_i p_j e^{\varpi(2-p_i-p_j)} r^{-l_j}(p_i - p_j) \\
&= \sum_{p_j < p_i} p_i p_j e^{\varpi(2-p_i-p_j)} r^{-l_j}(p_j - p_i + p_i - p_j) \\
&= 0 ,
\end{aligned}
\]
where the first equality is obtained by exchanging the subscripts i and j in the second sum.
In addition, F_2 is still given by Equation (A17), and F_2 = 0. As a result, f'(ϖ) ≥ 0 when ϖ < 0. Therefore, D_r(x, ϖ) is monotonically increasing with respect to ϖ on (−∞, 0).
(3) When ϖ = 0, the storage sizes l_i for i = 1, 2, …, n are all equal to T, and therefore D_r(x, 0) = (r^{L−T} − 1)/(r^L − 1). Based on the discussions in (1) and (2), we obtain D_r(x, ϖ) ≤ D_r(x, 0). The proof is completed.
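As a worked instance of case (3), take the simulation values of Section 6, namely an original storage size of L = 16 bits and T = 8 bits, and assume binary storage, r = 2. Then
\[
D_r(x, 0) = \frac{2^{16-8} - 1}{2^{16} - 1} = \frac{255}{65535} \approx 3.9 \times 10^{-3},
\]
and by parts (1) and (2) any ϖ ≠ 0 yields an RWRE no larger than this value for the same T.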

Figure 1. Pictorial representation of the system model.
Figure 2. Restrictive water-filling for optimal storage sizes.
Figure 3. The success rate of compressed storage versus the maximum available average storage size.
Figure 4. Broken line graph of optimal storage size with the probability distribution (0.03, 0.07, 0.1395, 0.2205, 0.25, 0.29), for a given maximum available average storage size T = 4 and original storage size L = 10.
Figure 5. Relative weighted reconstruction error (RWRE) D_r(x, ϖ) versus maximum available average storage size T with the probability distribution (0.031, 0.052, 0.127, 0.208, 0.582) for importance coefficients ϖ = −20, 0, 12, 20. D_r is acquired by substituting Equation (19) into Equation (38), while D_r^* is obtained by substituting Equation (22) into Equation (38).
Figure 6. RWRE D_r(x, ϖ) versus maximum available average storage size T with the probability distribution (0.031, 0.052, 0.127, 0.208, 0.582) for importance coefficients ϖ = −30, −20, −10, 0, 10, 20, 30.
Figure 7. RWRE D_r(x, ϖ) vs. average compressed storage size of each data Δ with importance coefficient ϖ = 5.
Figure 8. RWRE versus maximum available average storage size T.
Table 1. Notations.
Notation: Description
x = x_1, x_2, …, x_k, …, x_K: The sequence of raw data
x̂ = x̂_1, x̂_2, …, x̂_k, …, x̂_K: The sequence of compressed data
S_x: The storage size of x
D_f(S_{x_1}, S_{x_2}): The distortion measure function between S_{x_1} and S_{x_2} in data reconstruction
n: The number of event classes
{a_1, a_2, …, a_n}: The alphabet of raw data
{â_1, â_2, …, â_n}: The alphabet of compressed data
W = {W_1, W_2, …, W_n}: The error cost for the reconstructed data
P = {p_1, p_2, …, p_n}: The probability distribution of the data classes
D(x, W): The weighted reconstruction error
D_r(x, W), D_r(W, L, l): The relative weighted reconstruction error
L = L_1, L_2, …, L_n: The storage sizes of the raw data
l = l_1, l_2, …, l_n: The storage sizes of the compressed data
l_i^*: The rounded optimal storage size of the data belonging to the i-th class
T: The maximum available average storage size
ϖ: The importance coefficient
γ_p: γ_p = Σ_{i=1}^{n} p_i^2
α_1, α_2: α_1 = arg min_i p_i and α_2 = arg max_i p_i
L(ϖ, p): The message importance measure, given by L(ϖ, p) = ln Σ_{i=1}^{n} p_i e^{ϖ(1−p_i)}
Δ: The average compressed storage size of each data, given by Δ = L − T
Δ^*(δ): The maximum available Δ for a given supremum δ of the RWRE
L(P): The non-parametric message importance measure, given by L(P) = ln Σ_{i=1}^{n} p_i e^{(1−p_i)/p_i}
Table 2. The auxiliary variables in the ideal storage system.
Variable | Probability Distribution | ϖ(γ_p − p_{α_1})/ln r | ϖ(γ_p − p_{α_2})/ln r | L(ϖ, P) + ϖ e^{-H_2(P)}
P1 | (0.01, 0.02, 0.03, 0.04, 0.9) | 5.7924 | −0.6276 | 6.7234
P2 | (0.003, 0.007, 0.108, 0.132, 0.752) | 4.2679 | −1.1350 | 6.1305
P3 | (0.001, 0.001, 0.001, 0.001, 0.996) | 7.1487 | −0.0287 | 5.4344
P4 | (0.021, 0.086, 0.103, 0.378, 0.412) | 2.2367 | −0.5838 | 5.2530
P5 | (0.2, 0.2, 0.2, 0.2, 0.2) | 0 | 0 | 5
Table 3. The auxiliary variables in the quantification storage system.
Variable | Probability Distribution | p_{α_1} | L(P)
P1 | (0.007, 0.24, 0.24, 0.24, 0.273) | 0.007 | 136.8953
P2 | (0.007, 0.009, 0.106, 0.129, 0.749) | 0.007 | 136.8953
P3 | (0.01, 0.02, 0.03, 0.04, 0.9) | 0.01 | 94.3948
P4 | (0.014, 0.086, 0.113, 0.375, 0.412) | 0.014 | 66.1599
P5 | (0.2, 0.2, 0.2, 0.2, 0.2) | 0.2 | 4.0000
