1 Introduction

Vehicular communications, which form a network to support vehicle-to-vehicle (VTV) and vehicle-to-infrastructure (VTI) communications, are essential techniques of the intelligent transportation system (ITS). In recent years, much attention has been paid to developing applications for vehicular communications, such as automatic selection of routing protocols [1]. To realize such high-speed mobile communications, the IEEE 802.11p standard [2], which defines the physical (PHY) layer and the medium access control (MAC) layer, was officially adopted in 2010. IEEE 802.11p is a modified version of 802.11a [3]. The main difference between them is that 802.11p uses half the frequency bandwidth of 802.11a, which makes signals more robust against fading and multipath propagation effects in vehicular environments [4]. Moreover, it supports lower latency, achieves a higher data rate, and enhances security compared to other standards [5].

Channel estimation (CE) schemes play a crucial role in the performance of vehicular communication using 802.11p. The estimated channel response (CR) significantly affects the subsequent equalization, demodulation, and decoding. In general, the accuracy of CE determines the performance of the whole system. However, in the PHY layer, the 802.11p protocol uses only four pilot subcarriers per OFDM symbol. The pilots are too sparse to adequately track variations of the channel frequency response (CFR). In addition, because the CR varies greatly in vehicular communications and 802.11p places no restriction on the packet length, the channel estimate easily becomes outdated over the duration of a packet.

Much work has been proposed to track channel variations over the frame duration for vehicular communication under the IEEE 802.11p standard. Current methods focus on data pilot-aided successive (DPAS) estimation, whose key idea is to treat the demapped data signals as additional pilots [6–9]. The performance gain, however, is limited, especially in the high signal-to-noise ratio (SNR) region, because of the error propagation caused by noise accumulated in the iterative process. Recently, deep learning (DL) has shown promising prospects. DL can extract inherent characteristics of signals and has been applied to channel estimation [10–14]. However, due to deep fading caused by the high Doppler shift in vehicular environments, the CE accuracy of these DL-based approaches degrades. The main objective of this paper is to estimate the CFR precisely by integrating DPAS with DL.

In this paper, a deep learning-aided temporal spectral channel network (TS-ChannelNet) for 802.11p-based channel estimation under high-speed vehicular scenarios is proposed to track variations of the CFR. Specifically, TS-ChannelNet treats the pilots as a low-resolution (LR) version of the CR and uses them to recover a high-resolution (HR) version of the CR. TS-ChannelNet consists of two phases. Initially, a coarse CR is restored from the pilots by averaging decision-directed estimation with time truncation (ADD-TT). By averaging in both the time and frequency domains, ADD-TT mitigates some of the error propagation that remains in time truncation based on decision feedback. Afterwards, a neural network (NN) architecture named super resolution convolutional long short-term memory (SR-ConvLSTM) is introduced to make the estimated CR more accurate. SR-ConvLSTM uses convolutional long short-term memory (ConvLSTM), which exploits temporal spectral correlation, to combat deep fast fading in extremely time-varying environments. The obtained CR is tailored for vehicular communications. Simulation results demonstrate that our method outperforms previous methods under representative vehicular scenarios. The contributions of this paper are summarized as follows.

  • The CFR is modeled as an image. The pilots are considered an LR version of the image, and the estimated CFR is viewed as an HR version of the image. TS-ChannelNet, which includes pilot-based interpolation and DL-based restoration, is then presented to obtain the HR version of the CFR.

  • An improved interpolation based on DD-TT, called ADD-TT, is used to extend the pilots into a reasonable initial coarse CFR. ADD-TT mitigates some of the impact of error propagation through time truncation based on decision feedback and further improves the performance of the subsequent SR-ConvLSTM.

  • A new super resolution-based architecture named SR-ConvLSTM is designed. It restores the HR version of the CFR by capturing the high variations of the channel.

  • An extensive ablation experiment is conducted to verify that SR-ConvLSTM effectively extracts the temporal spectral correlation of the signal to track the variations of the channel.

The rest of this paper is organized as follows. Section 2 reviews related work in detail. Section 3 introduces the system model, channel model, and benchmark algorithm. Section 4 presents our temporal spectral deep learning-based channel estimation scheme. Section 5 verifies the advantages of TS-ChannelNet through simulation results. Section 6 concludes the paper.

2 Related work

In this section, the existing work on CE for vehicular communications using the 802.11p standard is first reviewed. The drawbacks of the present work are then discussed. Furthermore, DL applied in the communication field is investigated as a promising direction.

In recent years, mobile ad hoc networks (MANETs) have been successfully applied in many fields, such as health care [15, 16], broadcast encryption [17], vehicular streaming services [18], and urban management [19]. CE has been actively investigated because it determines the performance of the system in the PHY layer [7]. Current CE work focuses on DPAS methods, such as STA [6] and CDP [7]. The key idea of these algorithms is to treat the demapped data signals as additional pilots. The estimated CR is then used iteratively to construct data pilots in the following orthogonal frequency division multiplexing (OFDM) symbol. Mehrabi [9] introduced decoded data bits into DPAS to suppress the noise caused by demodulation, but the performance gain is still marginal in the high SNR region. To further improve the accuracy of the CFR, Awad [8] transformed the CFR into the time domain and performed a truncation operation, thus removing demodulation errors. However, because the noise accumulated over iterations is not eliminated completely, these schemes still suffer from error propagation, especially in rapidly time-varying vehicular channels.

Compared to conventional schemes, DL has been shown to extract the inherent characteristics of signals effectively [20] and has thus been applied to multiple problems in wireless communications [21–25]. A fully connected neural network (FCNN) was applied to channel estimation and pilot design [11, 12]. This work first demonstrated the ability of DL to improve the accuracy of CE. However, the scheme is not suitable for vehicular communications using 802.11p, because the unlimited packet length in 802.11p causes the number of neurons to grow rapidly, so the FCNN tends to overfit. Neumann [13] modeled the channel as conditionally Gaussian given a set of random hyperparameters, which are learned via a convolutional neural network (CNN). Soltani [14] viewed channel estimation as an image super resolution problem in which the pilots form a low-resolution sampled version of the channel and the time-frequency CR is the image to be recovered. However, the performance of this method still degrades in fast time-varying environments. The goal of this paper is to integrate DPAS with DL to track channel variations and thus estimate the CFR precisely.

3 System model

In this section, the structure of IEEE 802.11p for vehicular communications is first presented. Then, the channel model for the vehicular wireless environment employed in this paper is briefly introduced. Subsequently, ChannelNet, which is used as the benchmark algorithm, is described.

3.1 Structure of IEEE 802.11p

IEEE 802.11p physical layer is based on OFDM which boosts spectrum utilization by turning serial large data streams into parallel data streams on orthogonal subcarriers.

In 802.11p, the received signal is converted into parallel data as the input of the fast Fourier transform (FFT), and the output in the frequency domain is given by

$$\begin{array}{@{}rcl@{}} Y(t,k) = H(t,k)X(t,k) + Z(t,k). \end{array} $$
(1)

where Y(t,k) and X(t,k) represent the received and transmitted OFDM data symbols in the frequency domain, respectively, H(t,k) represents the CFR of the wireless channel, and Z(t,k) is additive white Gaussian noise (AWGN). t is the OFDM symbol index within a frame, with 1 ≤ t ≤ T, where T is the number of OFDM symbols per frame. k is the subcarrier index, with 1 ≤ k ≤ K, where K is the number of subcarriers. The goal of this paper is to estimate H more accurately.
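As a minimal illustration of the signal model in (1), the following NumPy sketch generates one frame of received symbols. The values K = 52 and the 20 dB SNR are assumptions chosen for illustration, and the CFR here is an i.i.d. complex Gaussian placeholder rather than one of the vehicular channel models used in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

T, K = 60, 52      # OFDM symbols per frame, subcarriers (K = 52 is an assumption)
snr_db = 20.0      # assumed SNR for this sketch

# Unit-power 4QAM symbols X(t, k)
bits = rng.integers(0, 2, size=(T, K, 2))
X = ((2 * bits[..., 0] - 1) + 1j * (2 * bits[..., 1] - 1)) / np.sqrt(2)

# Placeholder CFR H(t, k): i.i.d. complex Gaussian, NOT a vehicular channel model
H = (rng.standard_normal((T, K)) + 1j * rng.standard_normal((T, K))) / np.sqrt(2)

# Complex AWGN Z(t, k) whose variance follows from the assumed SNR
noise_var = 10 ** (-snr_db / 10)
Z = np.sqrt(noise_var / 2) * (
    rng.standard_normal((T, K)) + 1j * rng.standard_normal((T, K))
)

# Eq. (1), applied element-wise for every symbol t and subcarrier k
Y = H * X + Z
```

Because (1) acts independently on each pair (t, k), the whole frame can be handled as a T × K array, which is also what allows the CR to be treated as an image later on.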

IEEE 802.11p defines a 75 MHz band at 5.9 GHz. The 75 MHz bandwidth is divided into seven channels, including one control channel (CCH) and six service channels. Safety messages are transmitted through the CCH when emergency events happen [26]. The IEEE 802.11p standard specifies a comb-type structure for the pilot tones used in channel estimation. The pilots are located on subcarriers -21, -7, 7, and 21, as shown in Fig. 1. The initial channel estimate is obtained from the known training symbols transmitted in the preamble. Because the channel is highly time-varying in vehicular environments and the frame length is unlimited in the IEEE 802.11p standard, the channel estimate for each packet easily becomes outdated over the packet duration. Therefore, designing a channel estimation scheme that tracks the variations of the vehicular channel is a challenging problem.
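To give a sense of how sparse the comb pilots are, the short sketch below maps the pilot subcarrier indices to FFT bins and computes the frequency gap between neighbouring pilots. The 64-point FFT and the 10 MHz channel bandwidth are assumptions of this sketch; they are not stated in the text above.

```python
N_FFT = 64                             # assumed FFT size
BANDWIDTH_HZ = 10e6                    # assumed channel bandwidth
PILOT_SUBCARRIERS = [-21, -7, 7, 21]   # pilot positions given in the text

def fft_bin(k, n_fft=N_FFT):
    """Map a signed subcarrier index to its FFT bin (negative indices wrap around)."""
    return k % n_fft

subcarrier_spacing_hz = BANDWIDTH_HZ / N_FFT
pilot_bins = [fft_bin(k) for k in PILOT_SUBCARRIERS]

# Gap between two neighbouring pilots, in subcarriers and in Hz
pilot_gap_subcarriers = PILOT_SUBCARRIERS[1] - PILOT_SUBCARRIERS[0]
pilot_gap_hz = pilot_gap_subcarriers * subcarrier_spacing_hz
```

Under these assumptions, neighbouring pilots are 14 subcarriers apart, so any channel variation across frequency that is finer than this gap cannot be observed from the pilots alone.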

Fig. 1 The structure of TS-ChannelNet

3.2 Channel model for vehicular communications

Due to the relative motion of the transmitter and receiver, Doppler spectral spreading (broadening) appears in vehicular communication. The relatively high velocity causes a fast time-varying CR. To capture the joint Doppler-delay characteristics of vehicular communications, the tapped-delay line (TDL) model is adopted with the parameters of [27]. In [27], each tap is characterized by a Doppler power spectral density due to Rayleigh fading. The channel impulse response is calculated as

$$\begin{array}{@{}rcl@{}} h(t,\tau) = \sum\limits_{l = 1}^{L} {{\phi_{l}}(t)\delta (\tau - {\tau_{l}}(t))}. \end{array} $$
(2)

where \(\phi_{l}(t)\) represents the fading coefficient of the lth path, L denotes the number of resolvable multipath components, δ is the impulse function, and \(\tau_{l}(t)\) denotes the time delay of the lth path.
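The link between the impulse response in (2) and the CFR can be sketched as follows: for one time instant, the Fourier transform of (2) evaluated at the subcarrier frequencies gives H(k) as a sum of one complex exponential per tap. The tap gains and delays below are hypothetical values for illustration, not the parameters of [27], and the 64-point FFT with a 10 MHz bandwidth is an assumption.

```python
import numpy as np

def cfr_from_taps(phi, tau, n_fft=64, bandwidth=10e6):
    """CFR of one channel snapshot: H[k] = sum_l phi[l] * exp(-j*2*pi*f_k*tau[l])."""
    phi = np.asarray(phi, dtype=complex)
    tau = np.asarray(tau, dtype=float)
    # Baseband subcarrier frequencies in Hz, in FFT bin order
    f_k = np.fft.fftfreq(n_fft, d=1.0 / bandwidth)
    return np.exp(-2j * np.pi * np.outer(f_k, tau)) @ phi

# Hypothetical three-tap snapshot (gains and delays are made up for illustration)
phi = [1.0, 0.5j, -0.25]
tau = [0.0, 100e-9, 300e-9]
H = cfr_from_taps(phi, tau)
```

A single tap at zero delay yields a flat response, while additional delayed taps make the response frequency selective. In a time-varying channel, the fading coefficients change from one OFDM symbol to the next, so this computation would be repeated per symbol.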

In this paper, three representative models from [27] are used, i.e., VTV Expressway Oncoming (VTVEO), VTV Urban Canyon Oncoming (VTVUC), and RTV Expressway (RTVE). In the VTV Expressway Oncoming scenario, the speed of the receiver and the transmitter is the highest among the scenarios: the speed is 100 km/h and the Doppler shift is about 1200 Hz. The VTV Urban Canyon Oncoming scenario is a moderately challenging environment for channel estimation, with a Doppler shift of 400–500 Hz at a velocity of about 32 km/h. In summary, the models presented are typical standard vehicular environments that cover different velocities (low and high) and Doppler shifts ranging from 400 to 1200 Hz.

3.3 Benchmark algorithm: ChannelNet

In [14], a deep learning-based channel estimation scheme named ChannelNet was proposed for short frames in slowly time-varying environments. By viewing the CR as an image, the pilot values were used with an image super resolution technique to restore (estimate) the CR.

ChannelNet consists of two phases. First, the isolated pilot values are extended to an initial CR via Gaussian interpolation. Second, the initial CR values are fed into a super resolution convolutional neural network (SRCNN) [28] followed by a denoising convolutional neural network (DnCNN) [29], which generates the estimated CR. The authors investigated the performance of ChannelNet in relatively slowly time-varying environments. In our experiments, the performance of ChannelNet degrades further for high-velocity mobile communications. This is due to the unreliability of the initial interpolation method, coupled with the fact that a CNN does not have enough capacity to uncover the temporal spectral correlation of the CR, so the estimated CR becomes outdated over the frame duration.

4 Proposed method

In this section, we first describe the pre-processing stage of TS-ChannelNet, which uses an interpolation scheme based on ADD-TT applied to the pilot values. Then, the NN architecture named SR-ConvLSTM is presented to track the variations of the vehicular channel. Afterwards, the training process of TS-ChannelNet, which is made up of ADD-TT followed by SR-ConvLSTM, is described.

4.1 Interpolation based on ADD-TT

In this subsection, pilot-based interpolation via ADD-TT is implemented to obtain the coarse CR. It extends the few pilot values to initial CR values that are taken as LR images.

Usually, least squares (LS) estimation uses the two identical long training symbols sent in the preamble at the beginning of each IEEE 802.11p packet to obtain a tentative CR. Y(1,k) and Y(2,k) are the two received long training symbols, and X(1,k) and X(2,k) are the two identical, predefined long training symbols transmitted in the frequency domain. To obtain the CFR for all subcarriers, the average of the received Y(1,k) and Y(2,k) is divided by X(1,k) as

$$\begin{array}{@{}rcl@{}} {\hat{H}_{LS}}(1,k) = \frac{{Y(1,k) + Y(2,k)}}{{2X(1,k)}}, \end{array} $$
(3)

where \(\hat {H}_{LS}(1,k)\) represents the LS channel estimate at the first time slot on the kth subcarrier. LS estimation assumes that the channel is stationary. However, the vehicular channel varies fast, so the performance of LS estimation degrades significantly.
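The LS estimate in (3) can be sketched in a few lines of NumPy. The training sequence below is a hypothetical random ±1 sequence rather than the long training sequence defined by the standard, and the channel and noise level are toy values chosen for illustration.

```python
import numpy as np

def ls_estimate(Y1, Y2, X1):
    """Eq. (3): average the two received long training symbols, divide by the known symbol."""
    return (Y1 + Y2) / (2.0 * X1)

rng = np.random.default_rng(0)
K = 52                                          # assumed number of used subcarriers
X1 = rng.choice([-1.0, 1.0], size=K)            # hypothetical training sequence
H = rng.standard_normal(K) + 1j * rng.standard_normal(K)   # toy channel
noise = 0.05 * (rng.standard_normal((2, K)) + 1j * rng.standard_normal((2, K)))

# The same channel is assumed for both training symbols (stationarity assumption)
Y1 = H * X1 + noise[0]
Y2 = H * X1 + noise[1]
H_ls = ls_estimate(Y1, Y2, X1)
```

Averaging the two received symbols halves the noise variance of the estimate, but only as long as the channel does not change between them, which is the stationarity assumption noted above.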

Then, decision-directed channel estimation is presented. It is based on the correlation of adjacent symbols. The symbols are equalized by the previous channel estimate as follows

$$\begin{array}{@{}rcl@{}} \hat{S}(t,k) = \frac{{Y(t,k)}}{{\hat{H}(t - 1,k)}}, \end{array} $$
(4)

where \(\hat {S}(t,k)\) denotes the equalized symbol at the tth time slot on the kth subcarrier and \(\hat {H}(t-1,k)\) is the previous channel estimate. Based on the high correlation between adjacent data symbols, the current tth CFR is assumed to be unchanged with respect to the previous one. The errors caused by this assumption are alleviated by the subsequent demodulation. Hence, the previous \(\hat {H}(t-1,k)\) is used to equalize the current symbol. Note that the first estimated CR is the LS channel estimate from (3). Then, the decision feedback is used to update the channel estimate according to (5)

$$\begin{array}{@{}rcl@{}} \hat{H}(t,k) = \frac{{Y(t,k)}}{{\hat{X}(t,k)}}, \end{array} $$
(5)

where \(\hat {X}(t,k)\) represents the demodulated OFDM data symbol that stems from \(\hat {S}(t,k)\). The errors of estimated CFR are alleviated by demapping \(\hat {S}(t,k)\) to the corresponding constellation point \(\hat {X}(t,k)\). Thus, data symbols can provide useful channel information to construct data pilot.
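One decision-directed step, i.e., (4) followed by demapping and (5), can be sketched as below. The hard-decision demapper assumes unit-power 4QAM, matching the modulation used in the simulations, and the toy channel with a small phase drift between symbols is an assumption made only to exercise the code.

```python
import numpy as np

def demap_4qam(s):
    """Hard decision: map each sample to the nearest unit-power 4QAM point."""
    re = np.where(s.real >= 0, 1.0, -1.0)
    im = np.where(s.imag >= 0, 1.0, -1.0)
    return (re + 1j * im) / np.sqrt(2)

def dd_update(Y_t, H_prev):
    """One decision-directed step for OFDM symbol t."""
    S_hat = Y_t / H_prev        # Eq. (4): equalize with the previous estimate
    X_hat = demap_4qam(S_hat)   # demap to the nearest constellation point
    return Y_t / X_hat          # Eq. (5): decision-feedback channel update

# Noise-free toy check: the channel rotates slightly between two symbols
rng = np.random.default_rng(1)
K = 52
X_t = demap_4qam(rng.standard_normal(K) + 1j * rng.standard_normal(K))
H_prev = rng.standard_normal(K) + 1j * rng.standard_normal(K)
H_true = H_prev * np.exp(1j * 0.1)   # small drift, below the pi/4 decision margin
H_new = dd_update(H_true * X_t, H_prev)
```

In this toy case the drift is small enough that every symbol is demapped correctly, so the update recovers the new channel exactly. With noise or a larger drift, some decisions are wrong and the resulting errors feed into the next symbol.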

However, \(\hat {H}(t,k)\) still cannot completely eliminate noise, and errors accumulate in the iterative process because of error propagation, especially in the low SNR region. Error propagation happens because the data symbols may be incorrectly demapped, so the error gradually accumulates over the iterations. To reduce this negative impact on decision-directed channel estimation, an averaging method based on a time-domain truncation loop is applied. The scaled version of the FFT matrix, V, is first calculated as follows

$$\begin{array}{@{}rcl@{}} V = \sqrt M {F_{M}}(:,1:L + 1), \end{array} $$
(6)

where M represents the modulation order of the signal, \(F_{M}\) is the FFT matrix, and L is the number of retained time-domain taps. \(\hat {H}(t,k)\) is converted to \(\hat {h}(t,\cdot)\) in the time domain using the inverse fast Fourier transform (IFFT). To curb noise, the time-domain response is truncated, which removes the taps that contain mostly noise. The scaled matrix V then converts the retained taps back to the frequency domain as follows

$$\begin{array}{@{}rcl@{}} \hat{H}_{f}(t,k) = V\hat{h}(t,1:L), \end{array} $$
(7)

where \(\hat {h}(t,1:L)\) represents the retained CR in the time domain at the tth time slot and \(\hat {H}_{f}(t,k)\) denotes the scaled version of the CR in the frequency domain. Demodulation errors are equivalent to adding noise to \(\hat {h}\). After conversion from the frequency domain to the time domain, this noise is uniformly distributed across the taps, so \(\hat {h}\) is truncated in the time domain to alleviate the effect of noise caused by demodulation errors. Even with time truncation, some cumulative errors caused by residual noise remain in \(\hat {H}_{f}(t,k)\). Thus, \(\hat {H}_{f}(t,k)\) is averaged in the frequency and time domains to smooth the CR according to (8) and (9)

$$\begin{array}{@{}rcl@{}} \hat{H}_{s}(t,k) = \sum\limits_{\lambda = - \beta }^{\lambda = \beta} {\frac{1}{{2\beta + 1}}} {\hat{H}_{f}}{(t,k + \lambda)}, \end{array} $$
(8)

where 2β+1 represents the number of averaged subcarriers. The high correlation between adjacent subcarriers \(\hat {H}_{f}(t, k + \lambda)\) is exploited to further improve the accuracy of the estimates. Then, averaging in the time domain is calculated as follows,

$$\begin{array}{@{}rcl@{}} \hat{H}_{f}(t,k) = (1 - \alpha)\hat{H}_{f}(t - 1,k) + \alpha \hat{H}_{s}(t,k). \end{array} $$
(9)

where \(\hat {H}_{f}(t,k)\) denotes the output of the ADD-TT scheme at the tth time slot on the kth subcarrier, and α is the coefficient used to update the CR. Based on the high correlation across successive OFDM symbols, the weighted sum of the previous and current estimated CFR can improve the performance. The parameters α and β depend on knowledge of the vehicular environment, which is impossible to obtain in practice. It is observed in [6] that averaging in the time and frequency domains performs best with α = 0.5 and β = 2. Thus, α is fixed to 0.5 and β is set to 2 in this paper.
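The truncation and averaging steps of (6) to (9) can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact implementation: truncation is done with a plain IFFT, zeroing of the taps beyond L, and an FFT instead of the scaled matrix V; the estimate is assumed to cover all 64 FFT bins; and the frequency-averaging window is shrunk at the band edges, which (8) does not specify.

```python
import numpy as np

def time_truncate(H_hat, L):
    """Keep only the first L time-domain taps of a frequency-domain estimate."""
    h = np.fft.ifft(H_hat)
    h[L:] = 0.0                 # discard taps assumed to contain mostly noise
    return np.fft.fft(h)

def freq_average(H_f, beta=2):
    """Eq. (8): moving average over up to 2*beta + 1 neighbouring subcarriers."""
    K = H_f.shape[0]
    out = np.empty_like(H_f)
    for k in range(K):
        lo, hi = max(0, k - beta), min(K, k + beta + 1)   # shrink window at the edges
        out[k] = H_f[lo:hi].mean()
    return out

def time_average(H_prev, H_s, alpha=0.5):
    """Eq. (9): weighted sum of the previous output and the current smoothed estimate."""
    return (1.0 - alpha) * H_prev + alpha * H_s

# Toy usage: a 4-tap channel observed with additive noise
rng = np.random.default_rng(2)
h = np.zeros(64, dtype=complex)
h[:4] = rng.standard_normal(4) + 1j * rng.standard_normal(4)
H_clean = np.fft.fft(h)
H_noisy = H_clean + 0.1 * (rng.standard_normal(64) + 1j * rng.standard_normal(64))

H_f = time_truncate(H_noisy, L=4)
H_s = freq_average(H_f, beta=2)
H_out = time_average(H_clean, H_s, alpha=0.5)   # H_clean stands in for the previous output
```

Truncation acts as a projection onto the first L taps, so it leaves a channel with at most L taps untouched while removing the part of the noise that falls on the discarded taps.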

4.2 The architecture of SR-ConvLSTM

ChannelNet, which is based on a CNN, is poorly suited to uncovering the inherent temporal spectral correlation. Thus, an NN architecture named SR-ConvLSTM, based on ConvLSTM, is proposed. It models the temporal spectral correlation of adjacent symbols to estimate the CR and is suitable for non-stationary scenarios.

Channel estimation for vehicular communications using IEEE 802.11p is viewed as a super resolution problem. Considering the time-variant channel, long short-term memory (LSTM), which can extract the time correlation of a series, is introduced to tackle the super resolution problem. In [30], the authors show that LSTM successfully handles channel state information (CSI) feedback for time-varying communications. ConvLSTM [31] originates from LSTM and is formed by adding a convolution operation to it. With the convolution operation, ConvLSTM not only captures the temporal relationship but also extracts features in the way convolutional layers do, which makes it more effective for feature extraction when the time series data are images. In this way, we obtain the temporal spectral characteristics via SR-ConvLSTM based on ConvLSTM.

The details of the proposed SR-ConvLSTM are as follows. SR-ConvLSTM is composed of five layers, including ConvLSTM and batch normalization (BN) layers. Since this paper views channel estimation as an image super resolution problem, the structure of ConvLSTM followed by BN, inspired by the architecture of [28], is chosen, and this structure is repeated to track the high variations of the CFR. ConvLSTM captures the temporal spectral correlation between adjacent data symbols, and BN enables SR-ConvLSTM to converge. The specific structure is given in Table 1. The first layer applies a ConvLSTM with 64 filters of size 9 × 9, followed by the rectified linear unit (ReLU) activation defined as

$$\begin{array}{@{}rcl@{}} R(x) = \max (0,x), \end{array} $$
(10)
Table 1 Architecture of the SR-ConvLSTM

where x is the input to the activation function. When the input to a neuron falls in the negative half region, the output and the gradient are 0, which keeps the activations sparse. The second layer is BN. BN addresses slow convergence and exploding gradients during training. In fact, we find that if BN is removed from SR-ConvLSTM, the network does not converge. The reason may lie in the complex distribution of the channel, which requires the BN operation. In addition, BN speeds up training and improves the accuracy of the model. The third layer is a ConvLSTM with 32 filters of size 1 × 1, followed by ReLU activation. The fourth layer is BN. The last layer uses 1 filter of size 5 × 5 × 5 to reconstruct the output. Notably, to strike a balance between performance and complexity, TS-ChannelNet removes the DnCNN used in ChannelNet.

The relationship between input and output of proposed SR-ConvLSTM is represented as

$$\begin{array}{@{}rcl@{}} \hat{H} = f\left(\theta ;{\hat{H}_{\text{seq}}}\right). \end{array} $$
(11)

where θ denotes the parameters of SR-ConvLSTM, \(\hat {H}_{\text{seq}}\) is the input sequence of coarse CR blocks, \(\hat {H}\) is the final estimated CR, and f is the nonlinear function determined by θ.

The architecture of ChannelNet must be revised if the frame length changes. Worse, the whole ChannelNet must then be trained from scratch, which is non-trivial in practice. In SR-ConvLSTM, the CR is divided into blocks that contain n data symbols. Hence, SR-ConvLSTM handles arbitrary frame lengths without changing the input shape of the NN. In conclusion, SR-ConvLSTM is more robust than SRCNN, which is the building block of ChannelNet.
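The block division can be illustrated with a short NumPy sketch. The function names, the block length n = 10, and the layout with real and imaginary parts as two trailing channels are assumptions of this sketch; the sketch also assumes that the frame length is a multiple of n.

```python
import numpy as np

def to_blocks(H_coarse, n):
    """Split a (T, K) complex CR into blocks of n symbols with real/imag channels.

    Returns an array of shape (T // n, n, K, 2).
    """
    T, K = H_coarse.shape
    if T % n != 0:
        raise ValueError("this sketch assumes T is a multiple of n")
    stacked = np.stack([H_coarse.real, H_coarse.imag], axis=-1)   # (T, K, 2)
    return stacked.reshape(T // n, n, K, 2)

def from_blocks(blocks):
    """Inverse of to_blocks: concatenate the blocks and rebuild the complex CR."""
    b, n, K, _ = blocks.shape
    flat = blocks.reshape(b * n, K, 2)
    return flat[..., 0] + 1j * flat[..., 1]

rng = np.random.default_rng(3)
H_coarse = rng.standard_normal((60, 52)) + 1j * rng.standard_normal((60, 52))
blocks = to_blocks(H_coarse, n=10)      # 6 blocks of 10 symbols each
H_back = from_blocks(blocks)
```

A frame that is twice as long simply produces twice as many blocks of the same shape, which is why the network input shape does not depend on the frame length.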

4.3 Training of TS-ChannelNet

In this paper, estimating the CFR at the receiver is viewed as a super resolution problem that includes pilot-based interpolation and DL-based restoration [14]. Thus, the proposed TS-ChannelNet is composed of ADD-TT and SR-ConvLSTM. In the first phase, the pilot values \(h_{p}\) are extended into the coarse CFR, whose dimension is identical to that of the estimated \(\hat {H}\). In the second phase, SR-ConvLSTM parameterized by θ turns the coarse CFR into its HR version via DL. The relationship between the input and output of TS-ChannelNet can be represented as

$$\begin{array}{@{}rcl@{}} \hat{H} = f(\theta ;{h_{p}}) = {f_{\theta} }({f_{ADD - TT}}({h_{p}});\theta) \end{array} $$
(12)

where \(f_{\theta}\) and \(f_{ADD-TT}\) are the network and interpolation functions, respectively.

ADD-TT in the first phase comprises decision direction, time truncation, and weighted averaging. First, decision direction assumes that the tth CFR is highly correlated with the previous one, and thus \(\hat {H}(t-1, k)\) is used in place of \(\hat {H}(t,k)\) to calculate the data pilots. The errors caused by this iterative operation are alleviated by demapping the data pilots to constellation points. Second, the errors accumulated through wrong demodulation are equivalent to added noise. This noise is uniformly distributed across the taps after conversion from the frequency domain to the time domain [8], so truncation is applied to curb it. Third, to make use of the pilots, \(\hat {H}(t,k)\) is averaged in the frequency and time domains. Overall, ADD-TT uses averaged decision-directed estimation with time truncation to turn the pilots into the coarse \(\hat {H}\).

SR-ConvLSTM in the second phase is introduced to restore the HR version of \(\hat {H}\). To train SR-ConvLSTM, the real and imaginary parts of the coarse \(\hat {H}\) are first extracted and stacked. Then, the stacked \(\hat {H}\) is divided into several blocks so that SR-ConvLSTM can reveal the temporal spectral correlation. SR-ConvLSTM has impressive power to capture the intrinsic correlation of the signal in an end-to-end manner. Finally, the outputs of SR-ConvLSTM are concatenated to obtain the final estimated CFR. In addition, the Adam optimization algorithm [11] is chosen to make SR-ConvLSTM converge. To measure the accuracy of the estimates, the normalized mean square error (NMSE) between \(\hat {H}\) and H is used.

$$\begin{array}{@{}rcl@{}} \text{NMSE} = \frac{1}{N}\sum\limits_{i = 1}^{N} {\frac{{E\left[{{\left| {H - \hat{H}} \right|}^{2}}\right]}}{{E\left[{{\left| H \right|}^{2}}\right]}}} \end{array} $$
(13)

where N is the frame length. In addition, the bit error rate (BER) is used to demonstrate the performance of TS-ChannelNet. The algorithm of TS-ChannelNet is summarized in Algorithm 1.
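A possible way to evaluate (13) numerically is sketched below. Here the expectations are approximated by averages over the subcarriers of each OFDM symbol, and the outer sum runs over the N symbols of a frame; this reading of the expectation and the helper names are assumptions of the sketch.

```python
import numpy as np

def nmse(H, H_hat):
    """Eq. (13) for (N, K) arrays: per-symbol normalized error, averaged over N symbols."""
    err = np.mean(np.abs(H - H_hat) ** 2, axis=1)   # E[|H - H_hat|^2] per symbol
    pwr = np.mean(np.abs(H) ** 2, axis=1)           # E[|H|^2] per symbol
    return float(np.mean(err / pwr))

def nmse_db(H, H_hat):
    """NMSE on a decibel scale, as commonly used on the vertical axis of NMSE plots."""
    return 10.0 * np.log10(nmse(H, H_hat))

rng = np.random.default_rng(4)
H = rng.standard_normal((60, 52)) + 1j * rng.standard_normal((60, 52))
H_hat = 0.9 * H                  # toy estimate with a 10 % amplitude error
value = nmse(H, H_hat)
```

For the toy estimate above, every entry is off by 10 % in amplitude, so the NMSE is 0.01, i.e., -20 dB.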

5 Simulation and results

In this section, we first introduce the settings of the simulation, which includes the parameters of IEEE 802.11p and DL-based model. Then, the simulation results demonstrate the strength of our proposed TS-ChannelNet.

5.1 Simulation setup and parameters

The IEEE 802.11p end-to-end PHY is implemented for simulation. NMSE and BER are used as the performance metrics. The SNR ranges from 0 to 30 dB with 4-ary quadrature amplitude modulation (4QAM). The velocities range from 32 to 104 km/h. A frame length of 60 blocks is chosen.

TensorFlow with a graphics processing unit (GPU) is used for our approach. The learning rate is 0.001 and the dropout rate is 0.2. The batch size is 128 and the number of epochs is 60. The training, validation, and test set sizes are 32000, 8000, and 4000, respectively. The two models are trained at an SNR of 22 dB with the above hyperparameters for each of the three environments. The specific simulation parameters are listed in Table 2.

Table 2 The parameters of simulation

5.2 Results and discussion

Figures 2, 3, and 4 compare the performance of TS-ChannelNet and other schemes with maximum Doppler shifts ranging from 300 to 1200 Hz. DD-TT outperforms ChannelNet in the high SNR region. Our scheme consistently performs better than the other approaches. This is because it estimates the CR by integrating pilot knowledge, data knowledge, and the correlation of adjacent symbols. TS-ChannelNet remains effective under high-velocity communication, which is challenging for real vehicular communication.

Fig. 2 NMSE with 4QAM modulation, maximum Doppler = 1200 Hz, and 60 OFDM symbols (VTV Expressway Oncoming)

Fig. 3 NMSE with 4QAM modulation, maximum Doppler = 300 Hz, and 60 OFDM symbols (RTV Expressway)

Fig. 4 NMSE with 4QAM modulation, maximum Doppler = 500 Hz, and 60 OFDM symbols (VTV Urban Canyon)

In Fig. 5, the ideal BER is also shown. The ideal BER is obtained with a known, noise-free CR. The performance of our method approaches the ideal case, which means that TS-ChannelNet recovers the CR almost accurately. TS-ChannelNet has a larger performance advantage as deep fading in vehicular communications becomes more severe. Through the performance under representative vehicular models, we demonstrate that TS-ChannelNet is robust and performs well in terms of both BER and NMSE.

Fig. 5 BER with 4QAM modulation, maximum Doppler = 500 Hz, and 60 OFDM symbols (VTV Urban Canyon)

To further investigate the proposed method, an ablation analysis for a fast time-varying environment is conducted. Because Gaussian interpolation (GI) is used in ChannelNet, we take GI, DD-TT, and ADD-TT in turn as the interpolation method in the first phase of TS-ChannelNet while keeping SR-ConvLSTM unchanged. We refer to these approaches as GI-(SR-ConvLSTM), DD-TT-(SR-ConvLSTM), and ADD-TT-(SR-ConvLSTM). In addition, ChannelNet is taken as the benchmark algorithm.

Figure 6 plots the NMSE of TS-ChannelNet with different interpolation methods under a high-mobility scenario, with ChannelNet as a reference. TS-ChannelNet with GI clearly outperforms ChannelNet with GI. This suggests that the proposed SR-ConvLSTM has a better capacity to extract the temporal spectral correlation of data symbols than the NN structure of ChannelNet. It is also observed that the choice of interpolation method affects the performance of the subsequent SR-ConvLSTM. The results show that the proposed ADD-TT outperforms DD-TT, especially at high SNR values.

Fig. 6 NMSE for ablation analysis with 4QAM modulation, maximum Doppler = 1200 Hz and 60 OFDM symbols (VTV Expressway Oncoming)

The improvement of our method over the compared methods, in percentage, is presented in Table 3. The percentages are obtained in terms of NMSE at SNR = 30 dB under the three representative channel models, namely RTV Expressway (RTVE), VTV Expressway Oncoming (VTVEO), and VTV Urban Canyon (VTVUC), as mentioned before. The proposed method delivers a notable performance gain, and the gain increases as the maximum Doppler shift grows. This demonstrates that the proposed method tracks the variations of the CFR more adequately than the compared methods.

Table 3 The improved percentage of TS-ChannelNet with respect to the compared methods

6 Conclusions

Because the CFR in vehicular communications varies rapidly, it is difficult to track the variations of the channel. The current DPAS methods suffer from error propagation caused by accumulated noise. In this paper, a TS-ChannelNet-based channel estimation method for fast time-varying scenarios using IEEE 802.11p is proposed. In this scheme, the CR is treated as an image, and TS-ChannelNet is applied to estimate the CR from the pilots. TS-ChannelNet is made up of two phases. Pilot values are first extended to a coarse tentative CR via interpolation based on ADD-TT. The estimated CR is then divided into sequences that contain n adjacent symbols. Afterwards, SR-ConvLSTM takes the divided CR as input and generates the recovered CR. Simulation results demonstrate that the proposed method achieves prominent performance gains over previous schemes under high-speed scenarios. Further experiments verify that both building blocks of TS-ChannelNet contribute clearly to the channel estimation accuracy. The proposed TS-ChannelNet sheds light on how DL can be successfully applied to CE in high-velocity environments.

In this paper, the NN is trained separately for each of the corresponding representative environments. Hence, the generalization ability of the network needs to be further improved. How to use transfer learning to overcome this problem will be our future work.