1 Introduction

Multimedia video services in fifth-generation (5G) cellular networks play an indispensable role in daily life, leading to a rapid increase in the video traffic generated by users [1]. In this data-intensive setting, supporting ultra-reliable and low-latency communications (URLLC) has become one of the major aims of the wireless network that delivers the requested video data from the core network to the requesting users. The limited backhaul capacity has become the bottleneck of wireless networks. To alleviate the traffic burden and reduce the transmission delay, wireless caching has been proposed, i.e., popular video data is prefetched at the network edge during off-peak times, which makes it attractive for future communication networks [2,3,4,5,6,7,8]. By combining distributed networks [9,10,11] with video coding technology, edge caching networks can provide users with different video viewing experiences [12,13,14,15].

In a user-centric femto-cellular network where multiple caching helpers served a user through joint transmission and each caching helper stored only one of the most popular files, the optimal caching strategy was obtained by optimizing the successful transmission probability [4]. The work in [5], where the contents were cached in small-cell base stations (BSs), designed a distributed caching optimization algorithm to minimize the delay. In [6], cooperation between device-to-device (D2D) transmitters was introduced, and two novel hybrid caching strategies, i.e., single-point caching and two-point cooperative caching, were proposed. Afshang et al. [7] developed a spatial model for D2D networks where the locations of mobile users were modeled as a Poisson cluster process, and derived the distributions of distances from a device to both intra-/inter-cluster devices. Li et al. [8] presented a software-defined network (SDN)-based cooperative caching system, which considered the heterogeneous content attributes and a three-tier heterogeneous network.

To account for users’ personalized video viewing experiences, caching videos at the network edge on the basis of video coding technology has been proposed. In this regard, scalable video coding (SVC) can flexibly adapt to diverse personalized user demands and dynamic network environments. Recently, several studies have concentrated on combining SVC with wireless caching. The work in [12] designed an SVC-based layer placement scheme that was able to largely reduce the content download time. In [13], two caching schemes, i.e., SVC-based fractional caching and SVC-based random caching, were proposed, and the expressions for the successful transmission probabilities and ergodic service rates were derived. However, the above works assumed that the location of the BS was fixed, so the BS could not provide dynamic and flexible data services for users.

To address diversified video services and applications, the International Telecommunication Union (ITU) has proposed a category of 5G service, i.e., URLLC [16,17,18,19]. With the introduction of mobile edge computing (MEC) [9,10,11], URLLC has been mainly used in virtual reality (VR)/augmented reality (AR), the Internet of Things (IoT) and other fields. In this regard, various performance requirements such as lower latency, higher reliability, and better energy efficiency have been newly introduced. Combined with wireless caching, which brings popular video data close to users, URLLC video services can be well implemented. Meanwhile, unmanned aerial vehicles (UAVs) acting as airborne BSs, which can boost the capacity and coverage of existing wireless networks, have recently attracted increasing research interest from both academia and industry [20]. The UAV-enabled radio access network (UAV-RAN) has been considered a key component of future wireless URLLC networks for constructing a resilient network structure. Compared with traditional wireless networks where the BS’s location is fixed, a mobile UAV acting as an aerial BS offers highly flexible and efficient connectivity for temporary events as well as for disaster-struck areas where traditional terrestrial BSs are damaged. UAVs can be used as promising supplements to the traditional BS, providing improved radio access coverage and wireless transmission rate. A framework for enabling URLLC in the control and non-payload communications links of UAV communication systems was established and analyzed [39]. To reap the benefits of UAV-RAN, many technical challenges must be solved, including channel modeling, optimal UAV deployment and energy efficiency [21,22,23,24,25,26].

The key feature of the UAV-user link is the line-of-sight (LoS) connection, which can enhance both the coverage and the transmission rate. In [21, 22], the probability of an LoS connection for the UAV-user link was derived as a function of the elevation angle and the buildings’ average height. The UAV-user pathloss model was further studied in [23]. Due to pathloss and shadowing, the characteristics of the UAV-user channel depended on the height of the UAV’s aerial. In order to optimize the UAV deployment, the authors in [24] derived the optimal altitude to achieve a maximum coverage radius. In [25], the authors studied how to optimize the UAV deployment to improve the connectivity of wireless networks. The deployment of a UAV providing wireless communications from the air to a given geographical area was analyzed, and the coexistence between the UAV and an underlaid D2D communication network was also considered [26].

Incorporating wireless caching into UAV-RANs could provide more innovative URLLC services with a small service delay and a high-quality data experience. The authors of [27] studied wireless edge caching for multiple UAV-RANs and investigated how the overall spectral efficiency could be improved by efficient edge caching. Ji et al. [28] studied the security issue of a cache-enabled UAV-relaying network with D2D communications in the presence of an eavesdropper. To achieve secure and fair transmission, an optimization problem was formulated to maximize the secrecy rate among receivers by jointly optimizing the cache placement and UAV flight trajectory. The focus of [29] was the performance analysis and trajectory optimization of cache-enabled UAV-assisted networks with underlaid D2D communications, and both static and dynamic UAV deployments were considered. The main aim of [30] was to maximize the throughput of the cache-enabled UAV through the caching placement and the UAV location. The work of [31] proposed a novel scheme for UAV-enabled communications by utilizing wireless caching at users, where a UAV was dispatched to serve a group of users with random and asynchronous requests for files. However, few existing works have considered the personalized video viewing experiences of users.

In this paper, a collaborative caching strategy in UAV-RANs is proposed. To provide users with videos of different viewing qualities [including the standard definition video (SDV) and the high definition video (HDV)], SVC is applied to the video library, where every video is encoded into two layer files, namely a base layer (BL) and an enhancement layer (EL). Whichever video quality is requested, SDV or HDV, the BL file of the requested video must always be transmitted to the requesting user. Therefore, the BS, which has a relatively fixed position and a wide serving range, is selected to cache and transmit the BL files of all videos. Considering the random requests for HDV, the UAVs, which are maneuverable and have a relatively small serving range, only store the EL files. Our contributions consist of four components:

  • In order to provide users with videos of different perceptual qualities, i.e., SDV and HDV, every video in the library is encoded into one BL file and one EL file through SVC. The popularity of videos of different definitions is investigated and derived.

  • A heterogeneous network with two tiers, i.e., the BS-tier and the UAV-tier, is presented. In the BS-tier, the BS can offer data service to all users in the cell and provide UAVs with error-free data when the UAVs are within an appropriate distance from the BS. UAVs in the UAV-tier move across the whole cell and only provide data service at several preordained stop points.

  • An SVC-based two-tier cooperative caching model is proposed, where the BS stores the BL files of all videos and the UAVs cache the EL files in the collaborative manner of 0–1 caching. In the delivery phase, the BS transmits the BL file of the requested video to the corresponding users (regardless of the requested quality version), and the UAVs transmit the corresponding EL file to the users who request the HDV version.

  • Considering the limited-capacity battery of the UAV, the UAV’s energy consumption is analyzed and serves as a constraint of the formulated optimization problem. We derive the caching hit probability (CHPro) and use it as the objective of the optimization problem to obtain the optimal collaborative caching strategy. The optimization problem is NP-hard, and we propose a two-step solution based on the simulated annealing algorithm. On this basis, the average service delay is also analyzed, and the proposed strategy is shown to be efficient in reducing delay.

The remainder of this paper is organized as follows. We present the system model in Sect. 2, including the network architecture, the SVC-based two-tier cooperative caching model and the hierarchical delivery model. In Sect. 3, we derive the energy consumption of the UAV and the caching hit probability, and the main system problem is formulated. Simulation results are presented in Sect. 4, and Sect. 5 concludes the paper.

2 System model

In this section, we will introduce the network architecture, the SVC-based two-tier cooperative caching model and the corresponding hierarchical delivery model.

2.1 Network architecture

Considering both the video popularity and users’ personalized viewing preferences, SVC is applied to the video library, yielding an SVC-based video library from which both the SDV and the HDV of every video can be provided to users.

As shown in Fig. 1, a video library containing \(N\) videos is placed in the cloud server (CS). The video popularity represents the probability that a video is randomly requested across the whole network and is assumed to follow Zipf's law [27, 29,30,31]. All videos \({V}=\left\{{{v}}_{1},{{v}}_{2},...,{{v}}_{{N}}\right\}\) are arranged in descending order of popularity, so that more popular videos have smaller indices. The popularity of the \(n\)th video \(v_{n}\) can be written as

$$p_{{v_{n} }} = \frac{{n^{ - \gamma } }}{{\sum\nolimits_{m = 1}^{N} {m^{ - \gamma } } }},\;\;n = 1,2,...,N$$
(1)

where \(\gamma\) is the skewness parameter, characterizing the concentration of video requests. Employing SVC, each video can be encoded into \(L\) layer files. Considering the viewing quality preference for SDV and HDV, two-layer (\(L = 2\)) video coding is applied, so the encoded layer files consist of one BL and one EL. The BL file provides the fundamental video quality, and users with both the BL and EL files of the same video obtain a superior quality of the video. According to [32], the SDV perceptual preference of \(v_{n}\) can be modeled as

$${\mathbb{P}}_{{{\text{SDV}}}} (v_{n} ) = \frac{n - 1}{{N - 1}},\;\;n = 1,2,...,N$$
(2)

and the HDV preference of \(v_{n}\) is

$${\mathbb{P}}_{{{\text{HDV}}}} (v_{n} ) = 1 - {\mathbb{P}}_{{{\text{SDV}}}} (v_{n} ) = \frac{N - n}{{N - 1}},\;\;n = 1,2,...,N$$
(3)

Fig. 1 The proposed network architecture

Therefore, the probability that a user randomly requests the \({\text{Ind}} \in \left\{ {0,1} \right\}\) version of \(v_{n}\) can be written as

$$P_{{v_{n} }}^{{{\text{Ind}}}} = \left\{ \begin{aligned} & \frac{{(n - 1)n^{ - \gamma } }}{{(N - 1)\sum\nolimits_{m = 1}^{N} {m^{ - \gamma } } }},\;\;{\text{Ind}} = 0 \\ & \frac{{(N - n)n^{ - \gamma } }}{{(N - 1)\sum\nolimits_{m = 1}^{N} {m^{ - \gamma } } }},\;\;{\text{Ind}} = 1 \\ \end{aligned} \right.\;\;n = 1,2,...,N$$

When \({\text{Ind}} = 0\), the SDV of the corresponding video is requested; when \({\text{Ind}} = 1\), the HDV is requested. The SVC-encoded BL files of all videos are assumed to have the same size of \(L_{{\text{B}}}\) bits, and the EL files the same size of \(L_{{\text{E}}}\) bits, with \(L_{{\text{B}}} < L_{{\text{E}}}\).
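As a quick numerical check of the request model, the following Python sketch evaluates the Zipf popularity in (1), the preferences in (2) and (3), and the resulting request probabilities \(P_{{v_{n} }}^{{{\text{Ind}}}}\). The function names and the values \(N = 10\) and \(\gamma = 0.8\) are illustrative assumptions and are not taken from the simulation setup.

```python
def zipf_popularity(num_videos, gamma):
    """Eq. (1): popularity of each video; index 0 is the most popular one."""
    weights = [n ** (-gamma) for n in range(1, num_videos + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def request_probabilities(num_videos, gamma):
    """Per-video request probabilities for Ind = 0 (SDV) and Ind = 1 (HDV).

    Requires num_videos >= 2 because Eqs. (2) and (3) divide by N - 1.
    """
    popularity = zipf_popularity(num_videos, gamma)
    p_sdv, p_hdv = [], []
    for n, p in enumerate(popularity, start=1):
        sdv_pref = (n - 1) / (num_videos - 1)  # Eq. (2)
        p_sdv.append(p * sdv_pref)
        p_hdv.append(p * (1.0 - sdv_pref))     # Eq. (3)
    return p_sdv, p_hdv


# Illustrative values (assumptions, not the paper's settings).
p_sdv, p_hdv = request_probabilities(num_videos=10, gamma=0.8)
total = sum(p_sdv) + sum(p_hdv)
print(f"total request probability: {total:.6f}")
print(f"share of HDV requests:     {sum(p_hdv):.6f}")
```

Because the SDV and HDV preferences of each video sum to one, the request probabilities over all videos and both versions sum to one. Under this model the most popular video is requested only in HDV and the least popular one only in SDV.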

The heterogeneous network with two tiers, i.e., the BS-tier and the UAV-tier, is considered in the paper. In the observed cell of Fig. 1, a single-antenna BS located at the center of the cell can obtain all encoded video files from the CS directly through the wired backhaul link. The radius of the cell is \(R_{{\rm B}}\).

In the UAV-tier, there are \(M_{{\text{U}}}\) single-antenna UAVs with an average flying speed \(V_{{{\text{ave}}}}\) and a fixed flying height \(H_{{\text{U}}}\) across the cell (\(M_{{\text{U}}} = 1\) in Fig. 1), which can be provided with error-free data by the BS owing to the wired backhaul connection between the CS and the BS. For simplicity, we assume that a UAV can obtain error-free data when it hovers within a distance of \(R_{{0}}\) from the BS, and the circular area centered at the BS with radius \(R_{{0}}\) is denoted as the free area [33]. The specific design of the wireless error-free data service in the free area is beyond the scope of the paper. In addition, UAVs move over the cell and only transmit at several specific high-altitude geographical locations, hereinafter referred to as stop points [26]. Based on the disk covering problem [34], the stop points are chosen to cover the whole cell and meet the coverage requirements of all users with the minimum UAV transmitting power and the minimum number of stop points. We denote the minimum number of stop points as \(M_{{{\text{SP}}}}\) and the corresponding required serving range as \(R_{{{\text{SP}}}}\). The relationship between \(M_{{{\text{SP}}}}\) and \(R_{{{\text{SP}}}}\) is shown in Table 1. Therefore, the transmission radius of the UAV is \(R_{{\text{U}}} = \sqrt {R_{{{\text{SP}}}}^{2} + H_{{\text{U}}}^{2} }\). The area centered at a high-altitude stop point and lying within a distance \(R_{{\text{U}}}\) from the stop point to the ground is referred to as the UAV serving area.

Table 1 The number of stop points \(M_{{\rm SP}}\) and the serving range \(R_{{\rm SP}}\)

There are many users distributed across the cell, whose locations follow a homogeneous 2-D Poisson Point Process (PPP) \(\Phi_{{\text{U}}}\) with density \(\lambda_{{\text{U}}}\). According to the definition of the PPP, the probability that there are \(J\) users in a circular area of radius \(R\) can be written as

$${\mathbb{P}}_{{{\text{PPP}}}} (J,R,\lambda_{{\text{U}}} ) = \frac{{(\pi R^{2} \lambda_{{\text{U}}} )^{J} }}{J!}{\text{e}}^{{ - \pi R^{2} \lambda_{{\text{U}}} }}$$
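The PPP count probability above can be evaluated directly. The sketch below is a minimal Python illustration; the user density and the radius are assumed values chosen only so that the mean number of users in the disk is small.

```python
import math


def ppp_count_probability(j, radius, density):
    """Probability that exactly j users fall in a disk of the given radius."""
    mean = math.pi * radius ** 2 * density  # expected number of users in the disk
    return mean ** j / math.factorial(j) * math.exp(-mean)


# Assumed illustrative values: 1e-4 users per square metre, 100 m radius.
DENSITY = 1e-4
RADIUS = 100.0
probs = [ppp_count_probability(j, RADIUS, DENSITY) for j in range(40)]
print(f"P(no user in the disk) = {probs[0]:.4f}")
print(f"sum over J = 0..39     = {sum(probs):.6f}")
```

With these values the mean number of users in the disk equals \(\pi\), so the probabilities for \(J = 0,...,39\) already sum to one up to numerical precision.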

There are wireless transmission links from BS or UAV to users, namely, the BS-user link and UAV-user link, respectively.

2.2 SVC-based two-tier cooperative caching model

According to the characteristics of video requests and the network architecture, an SVC-based two-tier cooperative caching model is proposed, where the BS, whose serving range is the whole cell, is chosen to cache and transmit BL files, while the mobile and flexible UAVs store and transmit EL files to satisfy the users’ different visual experiences. The caching capacities of the BS and each UAV are denoted as \(C_{{\text{B}}}\) and \(C_{{\text{U}}}\) bits, respectively. Therefore, the BS and a UAV can store at most \(C_{{2{\text{B}}}} = \left\lfloor {\frac{{C_{{\text{B}}} }}{{L_{{\text{B}}} }}} \right\rfloor\) BL files and \(C_{{{\text{UE}}}} = \left\lfloor {\frac{{C_{{\text{U}}} }}{{L_{{\text{E}}} }}} \right\rfloor\) EL files in their local storages (\(C_{{2{\text{B}}}} ,C_{{{\text{UE}}}} \le N\)), where \(\left\lfloor {*} \right\rfloor\) is the floor operation. The SVC encoded files of all videos are stored in the local storages of the BS and UAVs in the collaborative manner of 0–1 caching, and a binary layer file caching indicator \(x_{m,n}\) (\(m = 0,1,2,...,M_{{\text{U}}}\), \(n = 1,2,...,N\)) is defined to represent the caching state of each encoded file.

  • When \(m = 0\), \(x_{0,n}\) (\(1 \le n \le N\)) indicates whether the BL file of \(v_{n}\) is cached in the local storage of BS or not. If \(x_{0,n} = 1\), the requesting user can directly obtain the corresponding BL file in BS locally. Otherwise, BS first obtains the BL file from CS and then transmits the file to the user. Due to the limited caching capacity of BS, we have \(\sum\nolimits_{{n = {1}}}^{N} {x_{0,n} } \le C_{{2{\text{B}}}}\).

  • When \(m = 1,2,...,M_{{\rm U}}\), \(x_{m,n}\) (\(1 \le n \le N\)) represents whether the EL file of \(v_{n}\) is cached in the local storage of the \(m\)th UAV or not. If \(x_{m,n} = 1\), the \(m\)th UAV moves to the targeted stop point, where it can directly serve the requesting user, and transmits the corresponding EL file to the user. Otherwise, the \(m\)th UAV first moves into the free area, and the BS obtains the corresponding EL file from the CS and transmits the file to the UAV, which then moves to the targeted stop point and transmits the file to the requesting user. We also have \(\forall 1 \le m \le M_{{\text{U}}} ,\sum\nolimits_{{n = {1}}}^{N} {x_{m,n} } \le C_{{{\text{UE}}}}\).

All the binary layer file caching indicators \(x_{m,n}\) (\(m = 0,1,2,...,M_{{\text{U}}}\), \(n = 1,2,...,N\)) are merged and denoted as the caching matrix \(X \in {\mathbb{R}}^{{(M_{{\rm U}} + 1) \times N}}\).
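To make the caching matrix concrete, the following Python sketch stores \(X\) as a list of \(M_{{\text{U}}} + 1\) rows (row 0 for the BL files at the BS, the remaining rows for the EL files at the UAVs) and checks the binary and cache-size constraints stated above. The function names and all numerical values are illustrative assumptions, and the UAV energy constraint is deliberately not covered here.

```python
def cache_capacities(c_b_bits, c_u_bits, l_b_bits, l_e_bits):
    """Maximum number of BL files at the BS and EL files at each UAV (floor division)."""
    return c_b_bits // l_b_bits, c_u_bits // l_e_bits


def satisfies_cache_constraints(x, c_2b, c_ue):
    """Check binary entries and the per-node cache-size limits of caching matrix x."""
    binary = all(v in (0, 1) for row in x for v in row)
    bs_ok = sum(x[0]) <= c_2b                          # BL files at the BS
    uavs_ok = all(sum(row) <= c_ue for row in x[1:])   # EL files at every UAV
    return binary and bs_ok and uavs_ok


# Assumed toy sizes in arbitrary bit units, with L_B < L_E as in the text.
c_2b, c_ue = cache_capacities(c_b_bits=350, c_u_bits=450, l_b_bits=100, l_e_bits=200)

x_ok = [[1, 1, 1, 0, 0],   # BS caches the BL files of v_1, v_2, v_3
        [1, 1, 0, 0, 0]]   # the single UAV caches the EL files of v_1, v_2
x_bad = [[1, 1, 1, 1, 0],  # four BL files exceed the BS capacity
         [1, 1, 0, 0, 0]]

print(c_2b, c_ue)
print(satisfies_cache_constraints(x_ok, c_2b, c_ue))
print(satisfies_cache_constraints(x_bad, c_2b, c_ue))
```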

2.3 Hierarchical delivery model

In the cell, BS serves the requesting users with BL files of the requested videos, and UAVs transmit the corresponding EL files to the users who request the HDVs. When the corresponding encoded file can be found in the local storage of BS or UAVs, the file is directly transmitted to the requesting user. Otherwise, BS first obtains the file from CS through the wired backhaul link, and UAV obtains the file from CS through the wired backhaul and BS-UAV links. Without loss of generality, it is assumed that the wired backhaul link from CS to BS and the wireless BS-UAV link in free area are error-free, and the transmitting spectra of BS and UAV are orthogonal.

As discussed in [20, 21], users can receive three kinds of signals, including LoS signal, non-line-of-sight (NLoS) signal, and multipath fading component. These signals can be considered separately with different occurrence probabilities that are functions of environment, density and height of buildings, and elevation angle [21, 23]. Since the occurrence probability of multipath fading signal is significantly lower than the LoS and NLoS signals, the impact of multipath fading can be neglected [21, 24]. According to the LoS or NLoS connection between UAV and user, the signal power received at user can be written as

$${\text{PL}} = \left\{ \begin{aligned} & P_{{\text{U}}} r_{m}^{{ - \beta_{{\text{U}}} }} ,\;\;{\text{LOS}} \\ & \eta P_{{\text{U}}} r_{m}^{{ - \beta_{{\text{U}}} }} ,\;\;{\text{NLOS}} \\ \end{aligned} \right.$$
(6)

where \(P_{{\text{U}}}\) is the UAV’s transmitting power, which depends on the transmission radius \(R_{{\text{U}}}\), i.e., the value of \(P_{{\text{U}}}\) should be set based on \(R_{{\text{U}}}\). Generally speaking, a larger \(R_{{\text{U}}}\) requires a larger \(P_{{\text{U}}}\). \(r_{m}\) is the distance between the \(m\)th UAV and the requesting user, and the probability density function (PDF) of \(r_{m}\) is given in the Appendix. \(\beta_{{\rm U}}\) is the pathloss coefficient over the UAV-user link. Due to the shadowing effect and the reflection of signals from obstacles, the pathloss of the NLoS connection is higher than that of the LoS connection. Therefore, \(\eta\) is introduced as an additional attenuation factor for the NLoS connection.

The occurrence probability of LoS can be expressed as [24],

$${\mathbb{P}}_{{{\text{LOS}}}} = \frac{1}{{1 + A{\text{e}}^{ - B(\theta - A)} }}$$
(7)

where \(A\) and \(B\) are constants which depend on the environment, and \(\theta\) is the elevation angle of the \(m\)th UAV in degrees, given by \(\theta = \frac{180}{\pi }\arcsin \left( {\frac{{H_{{\text{U}}} }}{{r_{m} }}} \right)\). The occurrence probability of NLoS is \({\mathbb{P}}_{{{\text{NLOS}}}} = 1 - {\mathbb{P}}_{{{\text{LOS}}}}\). It can be seen from formula (7) that the LoS occurrence probability increases with the elevation angle between the UAV and the user.
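The dependence of formula (7) on the geometry can be illustrated with a short Python sketch. The constants \(A = 9.61\) and \(B = 0.16\) below are assumed illustrative values of the kind often quoted for urban environments, and the height and distances are likewise assumptions; none of them are taken from this paper.

```python
import math


def elevation_angle_deg(h_u, r_m):
    """Elevation angle in degrees for UAV height h_u and UAV-user distance r_m >= h_u."""
    return math.degrees(math.asin(h_u / r_m))


def los_probability(h_u, r_m, a, b):
    """Formula (7): LoS occurrence probability for environment constants a and b."""
    theta = elevation_angle_deg(h_u, r_m)
    return 1.0 / (1.0 + a * math.exp(-b * (theta - a)))


# Assumed illustrative values (not the paper's parameters).
A_ENV, B_ENV = 9.61, 0.16
H_U = 200.0
distances = [200.0, 400.0, 800.0]  # user directly below, then farther away
p_los = [los_probability(H_U, r, A_ENV, B_ENV) for r in distances]
p_nlos = [1.0 - p for p in p_los]

for r, p in zip(distances, p_los):
    print(f"r_m = {r:6.1f} m, theta = {elevation_angle_deg(H_U, r):5.1f} deg, P_LOS = {p:.3f}")
```

As the user moves away from the point below the UAV, the elevation angle shrinks and the LoS probability falls, which matches the monotonic behaviour noted above.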

When the \(m\)th UAV is serving the requesting user, there may exist the same frequency signals from the other UAVs as interferences. \({\text{SINR}}_{m,n}\) represents the signal to interference plus noise ratio (SINR) of the \(m\)th UAV transmitting the EL file of \(v_{n}\). Due to the two cases of PL, we have

  (1) The LOS connection between the \(m\)th UAV and the requesting user

    $${\text{SINR}}_{m,n}^{{{\text{LOS}}}} = \frac{{P_{{\text{U}}} r_{m}^{{ - \beta_{{\text{U}}} }} }}{{\sum\nolimits_{i = 1,i \ne m}^{{M_{{\text{U}}} }} {P_{{\text{U}}} r_{i}^{{ - \beta_{{\text{U}}} }} } + n_{0} }}$$
    (8)
  (2) The NLOS connection between the \(m\)th UAV and the requesting user

    $${\text{SINR}}_{m,n}^{{{\text{NLOS}}}} = \frac{{P_{{\text{U}}} \eta r_{m}^{{ - \beta_{{\text{U}}} }} }}{{\sum\nolimits_{i = 1,i \ne m}^{{M_{{\text{U}}} }} {P_{{\text{U}}} \eta r_{i}^{{ - \beta_{{\text{U}}} }} } + n_{0} }}$$
    (9)

In formulas (8) and (9), \(n_{0}\) is the power of the complex additive white Gaussian noise, and \(r_{i}\) is the distance between the requesting user and the \(i\)th UAV, which transmits an interference signal. Since UAVs only transmit signals at preordained and fixed stop points, the PDF of \(r_{i}\) is the same as that of \(r_{m}\).
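Formulas (8) and (9) differ only in the attenuation factor \(\eta\) applied to the received powers, so both can be covered by one function. The Python sketch below is a minimal illustration; the pathloss exponent, noise power, attenuation factor and distances are assumed values, and only the transmitting power of 5 W appears among the settings reported in Sect. 4.1.

```python
def received_power(p_u, distance, beta_u, eta=1.0):
    """Received power as in formula (6); eta = 1 gives LOS, eta < 1 gives NLOS."""
    return eta * p_u * distance ** (-beta_u)


def sinr(p_u, r_m, interferer_distances, beta_u, noise_power, eta=1.0):
    """Formula (8) for eta = 1 and formula (9) for eta < 1."""
    signal = received_power(p_u, r_m, beta_u, eta)
    interference = sum(received_power(p_u, r_i, beta_u, eta)
                       for r_i in interferer_distances)
    return signal / (interference + noise_power)


# Assumed illustrative values.
P_U = 5.0             # W
BETA_U = 3.0          # pathloss coefficient (assumption)
N0 = 1e-9             # noise power in W (assumption)
ETA = 0.2             # NLOS attenuation factor (assumption)
R_M = 250.0           # serving UAV distance in m
R_I = [600.0, 900.0]  # interfering UAV distances in m

sinr_los = sinr(P_U, R_M, R_I, BETA_U, N0)
sinr_nlos = sinr(P_U, R_M, R_I, BETA_U, N0, eta=ETA)
print(f"SINR (LOS)  = {sinr_los:.2f}")
print(f"SINR (NLOS) = {sinr_nlos:.2f}")
```

Since \(\eta\) scales the signal and the interference but not the noise, the NLOS SINR is lower than the LOS SINR whenever \(\eta < 1\), and with no interfering UAV it is exactly \(\eta\) times the LOS value.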

3 Performance metrics and problem formulation

In this section, the performance metrics for the heterogeneous network are first derived. Second, a system problem is formulated based on the metrics, and a quasi-optimal solution is proposed.

3.1 The energy consumption of UAV

Considering the limited-capacity battery of UAV, the energy consumption of UAV during flying and data-transmitting should be analyzed. The energy restriction and actual average energy consumption of UAV are denoted as \(E_{\max }\) and \(E_{{\text{U}}}\), respectively. Therefore, we have \(E_{{\text{U}}} \le E_{\max }\). The average energy consumption \(E_{{\text{U}}}\) contains the flying mobile energy consumption \(E_{{\text{M}}}\), the flying hold energy consumption \(E_{{\text{H}}}\), the caching energy consumption \(E_{{\text{C}}}\) and the transmission energy consumption \(E_{{\text{T}}}\). We have \(E_{{\text{U}}} = E_{{\text{M}}} + E_{{\text{H}}} + E_{{\text{C}}} + E_{{\text{T}}}\).

  (1) The flying mobile energy consumption

The flying mobile energy consumption \(E_{{\text{M}}}\) is positively correlated with the average moving time \(T_{{\text{M}}}\) that the \(m\)th UAV spends flying to the stop point where it can stay and serve the requesting user, or going back to the free area where it can obtain error-free data directly. Therefore, we have

$$E_{{\text{M}}} = E_{{{\text{Fix}},{\text{M}}}} T_{{\text{M}}}$$

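where \(E_{{{\text{Fix}},{\text{M}}}}\) is the fixed power consumption constant for a UAV flying across the cell [35]. The moving times between two stop points and from a stop point to the free area are denoted as \(T_{{{\text{SP}}}}\) and \(T_{{{\text{BS}}}}\), respectively. Writing \(p_{{v_{n} }}^{1}\) for the probability that the HDV of \(v_{n}\) is requested, i.e., \(P_{{v_{n} }}^{{{\text{Ind}}}}\) with \({\text{Ind}} = 1\), \(T_{{\text{M}}}\) of the \(m\)th UAV can be calculated as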

$$T_{{\text{M}}} = T_{{{\text{SP}}}} + \sum\limits_{n = 1}^{N} {\left[ {p_{{v_{n} }}^{1} (1 - x_{m,n} )T_{{{\text{BS}}}} } \right]}$$

  (2) The flying hold energy consumption

The flying hold energy consumption \(E_{{\text{H}}}\) can be calculated as

$$E_{{\text{H}}} = E_{{{\text{Fix}},{\text{H}}}} T_{{\text{H}}}$$

where \(E_{{{\text{Fix}},{\text{H}}}}\) is the fixed power consumption constant for a UAV holding at the targeted stop point [35] and \(T_{{\text{H}}}\) is the average time during which the \(m\)th UAV transmits the corresponding EL file to the HDV-user. \(T_{{\text{H}}}\) is related to the transmitting rate of the UAV. According to formulas (8) and (9), the rates at which the \(m\)th UAV transmits the EL file of \(v_{n}\) under the LOS and NLOS connections are given by

$$R_{m,n}^{{{\text{LOS}}}} = W_{{\text{U}}} \log ({\text{SINR}}_{m,n}^{{{\text{LOS}}}} + 1)$$
$$R_{m,n}^{{{\text{NLOS}}}} = W_{{\text{U}}} \log ({\text{SINR}}_{m,n}^{{{\text{NLOS}}}} + 1)$$

where \(W_{{\text{U}}}\) is the transmission bandwidth of UAV. Therefore, \(T_{{\text{H}}}\) is

$$T_{{\text{H}}} = {\mathbb{P}}_{{{\text{LOS}}}} \sum\limits_{n = 1}^{N} {\frac{{p_{{v_{n} }}^{1} L_{{\text{E}}} }}{{R_{m,n}^{{{\text{LOS}}}} }}} + {\mathbb{P}}_{{{\text{NLOS}}}} \sum\limits_{n = 1}^{N} {\frac{{p_{{v_{n} }}^{1} L_{{\text{E}}} }}{{R_{m,n}^{{{\text{NLOS}}}} }}}$$

  (3) The caching energy consumption

Storing some encoded layer files in UAV is an energy-consuming process, and the caching power consumption is proportional to the number of data bits stored in UAV [36, 37]. Therefore, the caching energy consumption \(E_{{\text{C}}}\) is calculated as

$$E_{{\text{C}}} = \mu_{{\text{C}}} S_{{\text{C}}} (T_{{\text{M}}} + T_{{\text{H}}} )$$

where \(\mu_{{\text{C}}}\) is the caching coefficient of UAV in W/bit, \(S_{{\text{C}}}\) is the total number of data bits cached in the local storage of UAV and \(S_{{\text{C}}} = C_{{{\text{UE}}}} L_{{\text{E}}}\).

  (4) The transmission energy consumption

When the HDV is requested, the corresponding EL file must always be transmitted to the requesting user by the UAV, whether or not the file is cached in the UAV. Therefore, we have

$$E_{{\text{T}}} = (T_{{\text{M}}} + T_{{\text{H}}} )\sum\limits_{n = 1}^{N} {(p_{{v_{n} }}^{1} \mu_{{\text{T}}} P_{{\text{U}}} )}$$

where \(\mu_{{\text{T}}}\) is the power efficiency coefficient of the UAV power amplifier.
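The four energy terms can be combined numerically as in the following Python sketch. It is a simplified illustration under stated assumptions: the rates are taken to be the same for all videos, the logarithm in the rate expressions is taken to base 2 so that rates are in bit/s, and the request probabilities, file size, bandwidth, SINR values, \(\mu_{{\text{C}}}\), \(\mu_{{\text{T}}}\) and \(C_{{{\text{UE}}}}\) are invented for illustration. The values of \(T_{{{\text{SP}}}}\), \(T_{{{\text{BS}}}}\), \({\mathbb{P}}_{{{\text{LOS}}}}\), \(E_{{{\text{Fix}},{\text{M}}}}\), \(E_{{{\text{Fix}},{\text{H}}}}\) and \(P_{{\text{U}}}\) follow Sect. 4.1.

```python
import math


def shannon_rate(bandwidth_hz, sinr):
    """Rate in bit/s; the base-2 logarithm is an assumption (the text writes log)."""
    return bandwidth_hz * math.log2(1.0 + sinr)


def moving_time(p_hdv, x_row, t_sp, t_bs):
    """T_M: one hop between stop points plus expected detours to the free area."""
    return t_sp + sum(p * (1 - x) * t_bs for p, x in zip(p_hdv, x_row))


def hold_time(p_hdv, l_e_bits, rate_los, rate_nlos, p_los):
    """T_H: expected EL transmission time averaged over LOS and NLOS links."""
    demand_bits = sum(p_hdv) * l_e_bits
    return p_los * demand_bits / rate_los + (1.0 - p_los) * demand_bits / rate_nlos


def uav_energy(p_hdv, x_row, prm):
    """E_U = E_M + E_H + E_C + E_T for one UAV with cache row x_row."""
    t_m = moving_time(p_hdv, x_row, prm["t_sp"], prm["t_bs"])
    t_h = hold_time(p_hdv, prm["l_e_bits"], prm["rate_los"],
                    prm["rate_nlos"], prm["p_los"])
    e_m = prm["e_fix_m"] * t_m
    e_h = prm["e_fix_h"] * t_h
    e_c = prm["mu_c"] * prm["c_ue"] * prm["l_e_bits"] * (t_m + t_h)
    e_t = (t_m + t_h) * sum(p * prm["mu_t"] * prm["p_u"] for p in p_hdv)
    return e_m + e_h + e_c + e_t


params = {
    "t_sp": 30.0, "t_bs": 60.0, "p_los": 0.6,         # Sect. 4.1
    "e_fix_m": 130.0, "e_fix_h": 160.0, "p_u": 7.35,  # Sect. 4.1
    "l_e_bits": 8e6, "c_ue": 2,                       # assumptions
    "rate_los": shannon_rate(1e6, 10.0),              # assumed bandwidth and SINR
    "rate_nlos": shannon_rate(1e6, 2.0),              # assumed bandwidth and SINR
    "mu_c": 1e-9, "mu_t": 1.0,                        # assumptions
}
p_hdv = [0.3, 0.2, 0.1]  # assumed HDV request probabilities of three videos

t_m_all = moving_time(p_hdv, [1, 1, 1], params["t_sp"], params["t_bs"])
t_m_none = moving_time(p_hdv, [0, 0, 0], params["t_sp"], params["t_bs"])
e_all = uav_energy(p_hdv, [1, 1, 1], params)
e_none = uav_energy(p_hdv, [0, 0, 0], params)
print(f"T_M: {t_m_all:.1f} s (all cached) vs {t_m_none:.1f} s (none cached)")
print(f"E_U: {e_all:.1f} J (all cached) vs {e_none:.1f} J (none cached)")
```

In this toy setting, caching more EL files removes the detours to the free area, which shortens \(T_{{\text{M}}}\) and therefore lowers every term that scales with \(T_{{\text{M}}}\).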

3.2 The caching hit probability

We define the probability that the requested videos can be obtained from the local storage of the BS or UAVs as the caching hit probability (CHPro). When the SDV of \(v_{n}\) is requested, CHPro is the probability that the BL file of \(v_{n}\) is stored in the BS, denoted as \(P_{n,0}^{{\text{B}}}\), and we have

$$P_{n,0}^{{\text{B}}} = \frac{{x_{0,n} L_{{\text{B}}} }}{{L_{{\text{B}}} }} = x_{0,n}$$

When the HDV of \(v_{n}\) is requested, CHPro is the joint probability that the corresponding BL file is stored in BS and the EL file is cached in at least one UAV, denoted as \(P_{n,1}^{{{\text{B}},{\text{U}}}}\). We have

$$P_{n,1}^{{{\text{B}},{\text{U}}}} = \frac{{x_{0,n} L_{{\text{B}}} + {\mathbb{I}}\left( {\sum\nolimits_{m = 1}^{{M_{{\text{U}}} }} {x_{m,n} } \ge 1} \right)L_{{\text{E}}} }}{{L_{{\text{B}}} + L_{{\text{E}}} }}$$

where \({\mathbb{I}}\left( {\sum\nolimits_{m = 1}^{{M_{{\text{U}}} }} {x_{m,n} } \ge 1} \right)\) is an indicator function that equals 1 if \(\sum\nolimits_{m = 1}^{{M_{{\text{U}}} }} {x_{m,n} } \ge 1\) holds, i.e., if the EL file of \(v_{n}\) is cached in at least one UAV, and equals 0 otherwise.

Therefore, CHPro can be calculated as

$$P_{{{\text{hit}}}} = \sum\limits_{n = 1}^{N} {\sum\limits_{{{\text{Ind}} = 0}}^{1} {P_{{v_{n} }}^{{{\text{Ind}}}} \left( {(1 - {\text{Ind}}) \cdot P_{n,0}^{{\text{B}}} + {\text{Ind}} \cdot P_{n,1}^{{{\text{B}},{\text{U}}}} } \right)} }$$
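The CHPro expression can be evaluated for a given caching matrix as in the Python sketch below, which follows the three formulas of this subsection term by term. The three-video request probabilities, the file sizes and the caching matrices are assumed toy values.

```python
def hit_probability(x, p_sdv, p_hdv, l_b, l_e):
    """CHPro for caching matrix x (row 0: BS BL files, rows 1..: UAV EL files)."""
    bs_row, uav_rows = x[0], x[1:]
    p_hit = 0.0
    for n in range(len(bs_row)):
        # Indicator: the EL file of video n is cached in at least one UAV.
        el_cached = 1 if any(row[n] == 1 for row in uav_rows) else 0
        hit_sdv = bs_row[n]
        hit_hdv = (bs_row[n] * l_b + el_cached * l_e) / (l_b + l_e)
        p_hit += p_sdv[n] * hit_sdv + p_hdv[n] * hit_hdv
    return p_hit


# Assumed toy values: three videos, one UAV, L_B < L_E.
p_sdv = [0.0, 0.15, 0.2]
p_hdv = [0.4, 0.15, 0.1]
L_B, L_E = 1.0, 2.0

x_partial = [[1, 1, 0],   # BS caches the BL files of v_1 and v_2
             [1, 0, 0]]   # the UAV caches the EL file of v_1
x_full = [[1, 1, 1], [1, 1, 1]]
x_empty = [[0, 0, 0], [0, 0, 0]]

print(f"partial cache: {hit_probability(x_partial, p_sdv, p_hdv, L_B, L_E):.3f}")
print(f"full cache:    {hit_probability(x_full, p_sdv, p_hdv, L_B, L_E):.3f}")
print(f"empty cache:   {hit_probability(x_empty, p_sdv, p_hdv, L_B, L_E):.3f}")
```

For the partial cache, \(v_{1}\) contributes 0.4, \(v_{2}\) contributes \(0.15 + 0.15/3 = 0.2\) and \(v_{3}\) contributes nothing, giving a CHPro of 0.6.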

3.3 Analysis of the average service delay

Considering the requirement of URLLC, the average service delay in the SVC-based two-tier cooperative caching model is defined as the average time that a user randomly requesting the SDV or HDV of a video takes to obtain the corresponding files, denoted as \(T_{{\text{D}}}\). In the benchmark scheme where only the BS has the caching capability, the average service delay is the average time that the BS takes to transmit the corresponding encoded files to the requesting user, denoted as \(T_{{\text{B}}}\).

When the SDV of a video is randomly requested, the BS needs to transmit the corresponding BL file. Compared with the benchmark scheme, more BL files are stored locally in the SVC-based two-tier cooperative caching model. Therefore, more SDV requests can be served immediately from local storage in the proposed caching strategy, and we have \(T_{{\text{D}}} < T_{{\text{B}}}\). When the HDV of a video is randomly requested, the BS and the UAV need to transmit the corresponding BL and EL files in the proposed strategy. On the one hand, the two transmissions of the BS and the UAV occur simultaneously. Even if the transmission time of the UAV is longer than that of the BS, the requesting user can watch the SDV first, and the SDV is automatically converted to HDV once the user obtains the corresponding EL file. On the other hand, in the benchmark scheme the BS has to transmit both the BL and EL files, so the waiting time for a video in the benchmark is longer than that in the proposed strategy. Therefore, \(T_{{\text{D}}} < T_{{\text{B}}}\).

Hence, the proposed SVC-based two-tier cooperative caching model is shown to be efficient in reducing delay and more suitable for the URLLC service.

3.4 Problem formulation and solution

Combining (20) with the relevant constraints, we formulate the CHPro maximization as the following optimization problem.

$$\begin{aligned} & \mathop {\max }\limits_{X} \, P_{{{\text{hit}}}} \\ & {\text{s}}.{\text{t}}.\quad \, \forall {0} \le m \le M_{{\text{U}}} ,{1} \le n \le N, \, x_{m,n} \in \{ 0,1\} \\ & \quad \sum\limits_{{n = {1}}}^{N} {x_{0,n} } \le C_{{2{\text{B}}}} \\ & \quad \forall {1} \le m \le M_{{\text{U}}} , \, \sum\limits_{{n = {1}}}^{N} {x_{m,n} } \le C_{{{\text{UE}}}} \, \\ & \quad E_{{\rm U}} \le E_{\max } \\ \end{aligned}$$

We can see that the above optimization problem is a 2-choice 2-dimensional knapsack problem. In particular, for each video \(v_{n} \in V\), the two choices correspond to the caching state, and the two dimensions correspond to the cache size and average UAV power constraints, respectively. This multiple-dimensional multiple-choice knapsack problem (MMKP) is NP-hard in the strong sense.

Since the storages of BS and UAV are isolated, the above problem can be divided into two subproblems and a two-step solution is proposed. Firstly, the values of \(x_{m,n} ({1} \le m \le M_{{\text{U}}} ,{1} \le n \le N) \,\) are all assumed to be 1 and the objective is to obtain the optimal \(x_{0,n} ({1} \le n \le N) \,\), denoted as \(x_{0,n}^{*} ({1} \le n \le N) \,\). The first subproblem is shown as

$$\begin{aligned} & \mathop {\max }\limits_{{x_{0,n} }} \, \sum\limits_{n = 1}^{N} {\sum\limits_{{{\text{Ind}} = 0}}^{1} {P_{{v_{n} }}^{{{\text{Ind}}}} \left( {x_{0,n} + (1 - x_{0,n} )\frac{{{\text{Ind}} \cdot L_{{\text{E}}} }}{{L_{{\text{B}}} + L_{{\text{E}}} }}} \right)} } \\ & {\text{s}}.{\text{t}}.\quad \, \forall {1} \le n \le N, \, x_{0,n} \in \{ 0,1\} \\ & \quad \sum\limits_{{n = {1}}}^{N} {x_{0,n} } \le C_{{2{\text{B}}}} \\ \end{aligned}$$

The subproblem can be solved by the simulated annealing (SA) algorithm, and the optimal \(x_{0,n}^{*}\) is obtained. Then, due to \(M_{{\text{U}}} { = 1}\), CHPro based on \(x_{0,n}^{*}\) can be written as the second subproblem, which can also be solved by the SA algorithm.

$$\begin{aligned} & \mathop {\max }\limits_{{x_{m,n} (1 \le m \le M_{{\text{U}}} )}} \, \sum\limits_{n = 1}^{N} {\sum\limits_{{{\text{Ind}} = 0}}^{1} {P_{{v_{n} }}^{{{\text{Ind}}}} \left( {x_{0,n}^{*} \left( {1 - \frac{{{\text{Ind}} \cdot L_{{\text{E}}} }}{{L_{{\text{B}}} + L_{{\text{E}}} }}} \right){ + }x_{{{1},n}} \frac{{{\text{Ind}} \cdot L_{{\text{E}}} }}{{L_{{\text{B}}} + L_{{\text{E}}} }}} \right)} } \\ & {\text{s}}.{\text{t}}.\quad \, \forall {1} \le m \le M_{{\text{U}}} ,{1} \le n \le N, \, x_{m,n} \in \{ 0,1\} \\ & \quad \forall {1} \le m \le M_{{\text{U}}} , \, \sum\limits_{{n = {1}}}^{N} {x_{m,n} } \le C_{{{\text{UE}}}} \, \\ & \quad E_{{\text{U}}} \le E_{\max } \\ \end{aligned}$$

The SA algorithm is a probabilistic algorithm inspired by the annealing of solids. The temperature of the solid is first raised to a sufficiently high value and then lowered slowly. During the heating, the particles inside the solid become disordered and the internal energy increases. During the cooling, the particles gradually become ordered and reach an equilibrium state at each temperature value. In the two-step solution, each subproblem can be solved by the SA algorithm as shown in Fig. 2. In the SA algorithm based subproblem solution, the value of each particle is set as the CHPro gain of the encoded BL (EL) files shown in formulas (22) and (23), and the random disturbance term \(\Delta X\) should satisfy the cache size and average UAV power constraints.

Fig. 2 The SA algorithm based subproblem solution
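To make the procedure in Fig. 2 more concrete, the following Python sketch applies SA to the first subproblem only. It is an illustration under stated assumptions, not the authors' MATLAB implementation: the function names `objective` and `anneal`, the geometric cooling schedule, the swap-based disturbance and all default parameter values are hypothetical choices. The swap move keeps the number of cached videos fixed, so the cache size constraint is never violated. The energy constraint is omitted because it appears only in the second subproblem.

```python
import math
import random


def objective(x, p_sd, p_hd, rho):
    """First-subproblem objective.

    x[n] is the 0-1 caching decision x_{0,n}; p_sd[n] and p_hd[n] stand for
    P^0_{v_n} and P^1_{v_n}; rho stands for L_E / (L_B + L_E).
    """
    return sum(p0 * xn + p1 * (xn + (1 - xn) * rho)
               for xn, p0, p1 in zip(x, p_sd, p_hd))


def anneal(p_sd, p_hd, rho, capacity, t0=1.0, t_min=1e-4, alpha=0.95,
           moves_per_temp=50, seed=0):
    """Simulated annealing over 0-1 vectors with exactly `capacity` ones.

    All schedule parameters are assumed values chosen for illustration.
    """
    rng = random.Random(seed)
    n = len(p_sd)
    # Feasible starting point: cache the first `capacity` videos.
    x = [1 if i < capacity else 0 for i in range(n)]
    cur = objective(x, p_sd, p_hd, rho)
    best, best_val = x[:], cur
    t = t0
    while t > t_min:
        for _ in range(moves_per_temp):
            ones = [i for i, v in enumerate(x) if v]
            zeros = [i for i, v in enumerate(x) if not v]
            if not ones or not zeros:
                break  # nothing to swap
            i, j = rng.choice(ones), rng.choice(zeros)
            x[i], x[j] = 0, 1  # disturbance: swap keeps sum(x) unchanged
            new = objective(x, p_sd, p_hd, rho)
            delta = new - cur
            # Maximization: always accept improvements, accept worse
            # moves with probability exp(delta / t).
            if delta >= 0 or rng.random() < math.exp(delta / t):
                cur = new
                if cur > best_val:
                    best, best_val = x[:], cur
            else:
                x[i], x[j] = 1, 0  # reject: undo the swap
        t *= alpha  # geometric cooling
    return best, best_val
```

Because this objective is linear in \(x_{0,n}\), the result of the sketch can be checked against a simple ranking of the per-video gains \(P_{{v_{n} }}^{0} + P_{{v_{n} }}^{1} (1 - L_{{\text{E}}} /(L_{{\text{B}}} + L_{{\text{E}}} ))\), which gives an upper bound on the value SA can reach.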

4 Performance evaluation

In this section, we evaluate the performance of the proposed strategy using MATLAB simulations. The simulation setup and performance analysis are presented below.

4.1 Simulation setup

Following references [13, 26], the parameter settings for the collaborative caching strategy in UAV-RANs are listed in Table 2. To analyze the network performance, the skewness parameter of the video popularity \(\gamma\) is varied from 0 to 1.2 in steps of 0.1. Since \(R_{{\text{U}}} = \sqrt {R_{{{\text{SP}}}}^{2} + H_{{\text{U}}}^{2} }\), the value of \(R_{{\text{U}}}\) increases with \(H_{{\text{U}}}\) and with \(R_{{{\text{SP}}}}\), which depends on \(M_{{{\text{SP}}}}\). Since the value of \(P_{{\text{U}}}\) depends on \(R_{{\text{U}}}\), we set \(P_{{\text{U}}}\) between 5 and 10 W according to the values of \(H_{{\text{U}}}\) and \(M_{{\rm SP}}\). For simplicity, when \(M_{{{\text{SP}}}} = 3,5,8,12\), the values of \(P_{{\text{U}}}\) are set to 10 W, 7.35 W, 5.86 W and 5.37 W, respectively. When \(H_{{\text{U}}} = 150\,{\text{m}},180\,{\text{m}},200\,{\text{m}},220\,{\text{m}},250\,{\text{m}}\), \(P_{{\text{U}}}\) is equal to 5 W, 6.36 W, 7.35 W, 8.33 W and 9.88 W, respectively. The occurrence probability of LOS is assumed to be 0.6 and the average flying speed of the UAV is 10 m/s. According to [35], when \(V_{{{\text{ave}}}}\) is equal to 0 and 10 m/s, the energy consumption is 160 W and 130 W, respectively, so \(E_{{{\text{Fix}},{\text{M}}}} = 130\,{\text{W}}\) and \(E_{{{\text{Fix}},{\text{H}}}} = 160\,{\text{W}}\). According to the disk covering problem, the distance a UAV travels between two stop points and the distance it flies from a stop point to the free area are constants. Since UAVs fly at an average speed \(V_{{{\text{ave}}}}\), the moving time from a stop point to the free area is assumed to be twice the moving time between two stop points (\(T_{{{\text{BS}}}} = 2T_{{{\text{SP}}}}\)), and the value of \(T_{{{\text{SP}}}}\) is assumed to be 30 s.
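The settings above can be gathered into a small configuration sketch, shown here in Python for illustration. All identifiers (`slant_range`, `P_U_BY_MSP`, `P_U_BY_HU` and so on) are hypothetical names. The numerical values are those stated in the text, except for the \(R_{{{\text{SP}}}}\) value in the usage line, which is a placeholder because the text does not give \(R_{{{\text{SP}}}}\) explicitly.

```python
import math


def slant_range(r_sp, h_u):
    """R_U = sqrt(R_SP^2 + H_U^2), in the same unit as the inputs."""
    return math.hypot(r_sp, h_u)


# UAV transmit power P_U (W) for each number of stop points M_SP.
P_U_BY_MSP = {3: 10.0, 5: 7.35, 8: 5.86, 12: 5.37}

# UAV transmit power P_U (W) for each flying height H_U (m).
P_U_BY_HU = {150: 5.0, 180: 6.36, 200: 7.35, 220: 8.33, 250: 9.88}

P_LOS = 0.6        # occurrence probability of LOS
V_AVE = 10.0       # average flying speed (m/s)
E_FIX_M = 130.0    # consumption while moving at V_AVE (W), from [35]
E_FIX_H = 160.0    # consumption while holding position (W), from [35]
T_SP = 30.0        # moving time between two stop points (s)
T_BS = 2 * T_SP    # moving time from a stop point to the free area (s)

# Skewness values gamma = 0, 0.1, ..., 1.2 used in the sweeps.
GAMMAS = [round(0.1 * i, 1) for i in range(13)]

# Usage with a placeholder R_SP of 300 m and H_U of 400 m.
example_r_u = slant_range(300.0, 400.0)  # 500.0
```

The two lookup tables reflect the trends described in the text: \(P_{{\text{U}}}\) falls as \(M_{{{\text{SP}}}}\) grows and rises as \(H_{{\text{U}}}\) grows.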

Table 2 The parameter setting of the network

Two schemes are compared with the proposed collaborative caching strategy. One is the “most popular content” (MPC) scheme [38], in which the BS and UAVs store encoded video files based on the video popularity distribution. The other is the uniform caching scheme (UCS), in which the BS and UAVs store video files uniformly and there is no cooperation between the BS and UAVs.

4.2 Simulation result and analysis

The effects of \(\gamma\) on \(E_{{\text{U}}}\) under different values of \(M_{{{\text{SP}}}}\) and \(H_{{\text{U}}}\) are shown in Figs. 3 and 4, respectively. Because CHPro of the proposed caching strategy increases with \(\gamma\), as can be verified by Figs. 6 and 7, the flying mobile energy consumption decreases, and so does \(E_{{\text{U}}}\). From the perspective of UAV energy consumption, energy-limited UAV-RANs are better suited to video libraries whose popularity is concentrated on a few popular videos. Since a larger value of \(M_{{{\text{SP}}}}\) results in a smaller \(P_{{\text{U}}}\) and hence a smaller transmission energy consumption, \(E_{{\text{U}}}\) decreases with \(M_{{{\text{SP}}}}\). Conversely, a larger value of \(H_{{\text{U}}}\) leads to a larger \(P_{{\text{U}}}\) and hence a larger transmission energy consumption, so \(E_{{\text{U}}}\) increases with \(H_{{\text{U}}}\). Therefore, the number of UAV stop points should be increased and the flying height of the UAV reduced, within reasonable limits.

Fig. 3 The effect of \(\gamma\) on \(E_{{\rm U}}\) under different \(M_{{\rm SP}}\)

Fig. 4 The effect of \(\gamma\) on \(E_{{\rm U}}\) under different \(H_{{\rm U}}\)

Figure 5 shows the effect of \(\gamma\) on \(E_{{\text{U}}}\) under different \(\lambda_{{\text{U}}}\). A larger value of \(\lambda_{{\text{U}}}\) means that there are more users in a fixed area. The probability that the requesting user is located in the serving area of a UAV therefore increases with \(\lambda_{{\text{U}}}\), resulting in a relatively smaller flying hold energy consumption and hence a smaller \(E_{{\text{U}}}\). Therefore, the collaborative caching strategy in UAV-RANs is suitable for ultra-dense networks.

Fig. 5 The effect of \(\gamma\) on \(E_{{\rm U}}\) under different \(\lambda_{{\rm U}}\)

Figures 6 and 7 show the effect of \(\gamma\) on \(P_{{{\text{hit}}}}\) under different values of \(C_{{\text{B}}}\) and \(C_{{\text{U}}}\), respectively. The aim of the proposed collaborative caching strategy is to maximize CHPro, and \(P_{{{\text{hit}}}}\) in formula (20) depends on the video-version popularity \(P_{{v_{n} }}^{{{\text{Ind}}}}\), which combines the video popularity and the SDV/HDV requesting probability. For a video with a small index, the requesting probability of the HDV is high, so both the BL and EL files need to be cached. For a video with a large index, the requesting probability of the SDV is high, so only the BL file needs to be cached. When the value of \(\gamma\) is small, the popularities of all videos are similar, so \(P_{{{\text{hit}}}}\) varies little regardless of which encoded video files are stored. As \(\gamma\) increases, the popularity concentrates on a few popular videos. When \(\gamma\) increases slightly, the caching gain from the encoded files of the few popular videos cannot compensate for the decrease of \(P_{{{\text{hit}}}}\) caused by the absence of many less popular files. When \(\gamma\) increases further, \(P_{{{\text{hit}}}}\) becomes large because the most popular video files are stored in the local storage of the BS or UAVs. This is why the value of \(P_{{{\text{hit}}}}\) first decreases and then increases as \(\gamma\) grows. Therefore, the collaborative caching strategy is better suited to UAV-RANs with a skewed video popularity distribution. Whichever caching capacity grows, that of the BS or that of the UAV, \(P_{{{\text{hit}}}}\) increases noticeably. However, the caching capacity of the UAV is limited, which is also why CHPro of the strategy is not high, so cooperation between UAVs could be introduced.
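The concentration of popularity with growing \(\gamma\) can be illustrated numerically. The Python sketch below assumes that the video popularity follows a Zipf law with skewness \(\gamma\), which is a common modeling choice but is an assumption here. The function names, the library size of 100 videos and the top-10 cutoff are all hypothetical. The sketch covers only the popularity distribution and does not reproduce \(P_{{{\text{hit}}}}\) in formula (20).

```python
def zipf_popularity(n_videos, gamma):
    """Assumed Zipf popularity: p_k proportional to k^(-gamma), k = 1..N."""
    weights = [k ** (-gamma) for k in range(1, n_videos + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_k_mass(n_videos, gamma, k):
    """Total request probability of the k most popular videos."""
    return sum(zipf_popularity(n_videos, gamma)[:k])


# Sweep gamma from 0 to 1.2 in steps of 0.1, as in the simulation setup.
gammas = [round(0.1 * i, 1) for i in range(13)]
for g in gammas:
    # Hypothetical library of 100 videos, top 10 videos.
    print(g, round(top_k_mass(100, g, 10), 3))
```

For \(\gamma = 0\) all videos are equally popular, so the top 10 of 100 videos carry exactly one tenth of the requests. The share grows steadily as \(\gamma\) increases, which is the concentration effect described above.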

Fig. 6 The effect of \(\gamma\) on \(P_{{\rm hit}}\) under different \(C_{{\rm B}}\)

Fig. 7 The effect of \(\gamma\) on \(P_{{\rm hit}}\) under different \(C_{{\rm U}}\)

Figure 8 indicates that \(E_{{\text{U}}}\) decreases as \(C_{{\text{U}}}\) increases. This is because CHPro increases with \(C_{{\text{U}}}\), which reduces the flying mobile energy consumption and hence \(E_{{\text{U}}}\). Combining Fig. 8 with Fig. 7, the caching capacity of the UAV should be increased, or cooperation between UAVs introduced, to improve CHPro and reduce the energy consumption.

Fig. 8 The effect of \(\gamma\) on \(E_{{\rm U}}\) under different \(C_{{\rm U}}\)

Figures 9 and 10 show the effects of \(\gamma\) on \(E_{{\text{U}}}\) and \(P_{{{\text{hit}}}}\) for the proposed collaborative caching strategy, MPC and UCS. As shown in Fig. 9, when the video popularity is relatively even, i.e., \(\gamma\) is small, the \(E_{{\text{U}}}\) of the proposed strategy lies between those of MPC and UCS. When \(\gamma\) becomes large, \(E_{{\text{U}}}\) of the proposed strategy drops rapidly and becomes the smallest of the three. MPC favors the most popular video files, so its \(P_{{{\text{hit}}}}\) increases as the popularities of the few popular videos grow with \(\gamma\). In UCS, every video file has the same caching probability, ignoring the caching priority of popular videos, so its \(P_{{{\text{hit}}}}\) decreases with \(\gamma\). As Fig. 10 shows, the \(P_{{{\text{hit}}}}\) of the proposed strategy is always larger than those of the other two schemes, so the proposed strategy performs best in terms of \(P_{{{\text{hit}}}}\).

Fig. 9 The comparison of \(E_{{\rm U}}\) between PBCS, UCS and the proposed scheme

Fig. 10 The comparison of \(P_{{\rm hit}}\) between PBCS, UCS and the proposed scheme

5 Conclusion

In this paper, considering the requirements of URLLC, a collaborative caching strategy was proposed for UAV-RANs. First, an SVC-based video library, in which every video was encoded into one BL file and one EL file through SVC, was presented to provide users with videos of different perceptual qualities, including SDV and HDV. The popularity of the different video definitions was also derived. Then, a heterogeneous network was proposed in which UAVs within an appropriate distance from the BS could be provided with error-free data, UAVs provided data service to users only at several stop points, and the BS could offer data service to all users in the cell. Next, we proposed an SVC-based two-tier cooperative caching model, where the BS and UAVs stored different encoded video files in the collaborative manner of 0–1 caching. Finally, the energy consumption of the UAV and CHPro were derived, and the optimal collaborative caching strategy was obtained via the maximization of CHPro. The maximization problem was NP-hard, and a two-step solution based on the SA algorithm was proposed. It was proved that the proposed SVC-based two-tier cooperative caching model was efficient in reducing delay. Simulation results demonstrated that the collaborative caching strategy was better suited to UAV-RANs with a skewed video popularity distribution and an ultra-dense user distribution. The effectiveness of the proposed strategy was verified by numerical examples in which it outperformed existing benchmark schemes.