1 Introduction

With the growing interest in oceanic exploration, underwater robots, i.e., autonomous underwater vehicles (AUVs), play a vital role in marine research activities, such as ocean pollutant monitoring [1], marine biology exploration [2], and pipeline inspection [3, 4]. To maximize the AUV’s efficiency, reliable navigation information is essential and precise positioning is mandatory for effective navigation and control [5]. A long-baseline system (LBL) is an acoustic positioning technology for AUVs, affording high accuracy and a broad operational spectrum [6, 7]. However, an LBL positioning system is typically affected by biological noise, multipath fading channels, and other environmental noise, reducing the AUV’s distance measurement accuracy in a low signal-to-noise ratio (SNR) and reverberation environment, ultimately affecting the AUV’s positioning accuracy.

The time difference of arrival (TDOA) method is commonly employed to calculate the AUV—hydrophone distance to accomplish AUV localization. The AUV’s position is determined based on the time differences of the sound source signal arriving at different hydrophone receivers, which lowers the time synchronization requirements and avoids errors induced by the time synchronization between the sound source and the hydrophones [8]. However, an AUV often operates in a low SNR and multipath propagation environment, requiring TDOA to increase precision and accuracy. Thus, researchers have proposed several methods to solve these problems. For example, general cross-correlation (GCC) [9, 10] is a generic TDOA estimation method. GCC can be divided into two steps. The first step contains the calculation of two-time series signals. In the second step, the pre-filter is estimated. The pre-filter selection depends on the environment and the signal strength, with the pre-filter’s cross-correlation being sensitive to noise and reverberation [11]. Although GCC can utilize several pre-filter types, e.g., phase transform (PHAT) [12, 13] and maximum likelihood (ML) [14], these pre-filters are estimated by calculating the impulse response function (IRF) of the underwater acoustic system. However, IRF is not constant in an underwater environment, and thus, it is still challenging to estimate the impulse response function in various underwater environments [15]. Furthermore, multipath signals and spurious noise peaks lead to anomalous time delay estimates in the underwater environment [16], affecting GCC’s performance. The difference between the main peak and the fake peak of the calculated values close to the main peak is not apparent when an AUV operates in a low SNR and reverberant environment, further enlarging the error. With the development of multi-information fusion technology, the work of [17,18,19] developed a system architecture including Doppler Velocity Log (DVL), Strapdown Inertial Navigation System (SINS), Global Positioning System (GPS), and LBL. Although these systems achieve accurate positioning, they do not solve the positioning difficulty brought by the multipath transmission of the underwater acoustic signal. Furthermore, such a system is highly complex and costly.

Spurred by the deficiencies of current methods, this paper proposes a new AUV position estimation architecture. The suggested technique involves a hybrid localization algorithm that combines the Chan [20] and Taylor series expansion algorithms [21], where the fast positioning speed of the Chan algorithm generates the initialization values for the Taylor algorithm. Furthermore, regarding the crucial time delay parameter estimation in the hybrid algorithm, we suggest an improved weighing function that combines the ML and empirical mode decomposition (EMD) methods. Specifically, the EMD method estimates the AUV’s signal and noise, which is then input to ML to obtain the pre-filter function. Extensive simulations in MATLAB/Simulink verify the effectiveness of the proposed method.

This paper is organized as follows. Section 2 describes the long-baseline system architecture and explains the hybrid localization algorithm combining the Chan algorithm and the Taylor series expansion algorithm. Section 3 introduces the empirical mode decomposition based on the maximum likelihood method, while Sect. 4 presents the simulation process and discusses the corresponding results. Finally, Sect. 5 concludes this based on theoretical verification and simulation results.

2 Methods

2.1 LBL architecture

For AUV localization, an LBL system includes, for example, a GPS, a base station, and a ground working system (Fig. 1). Such a setup involves at least four floating base stations, where the least-squares solution can be applied to provide the required acoustic information for the location, with the distance to each baseline being 800 m. Each station is equipped with a wireless signal transmitter, hydrophone, and satellite for self-geolocation, whereas the AUV is equipped with an inertial navigation system, depth transducer, and sonar.

Fig. 1
figure 1

Architecture of LBL

In an LBL setup, the AUV initially acquires its position through GPS when floating on the water surface. When underwater, the AUV continuously sends an acoustic signal to the hydrophones. Given that the floating base stations may move due to waves, a GPS geolocates the base stations in real time. Hence, each base station sends the received acoustic signals and positioning information to the ground working station. A positioning algorithm solves the AUV’s location at the ground working system, which is then sent to the primary base station H1 that relays it to the AUV.

2.2 LBL positioning algorithm

An LBL system provides the distance between each base stations and the AUV, as measured by the TDOA positioning system will be described in the next section. The latter comprises one main and N − 1 auxiliary base stations, as illustrated in Fig. 2. Let the location of each base station be \((x_{i } ,y _{i } ,z_{i } )^{{\text{T}}}\), \(i = 1,2, \ldots ,N\), where \(i = 1\) represents the main base station and \(i = 2,3, \ldots ,N\) the auxiliary ones. The target’s spatial position is \((x ,y ,z)^{{\text{T}}}\), \(R_{i }\) represents the distance between the target and the i-th station, \(t_{i}\) refers to the time when the signal reaches the i-th base station from the target, and \(R_{i,1}\) denotes the distance of the difference between the target to the i-th base station and the target to the main station, expressed as:

$$\left\{ {\begin{array}{*{20}l} {R_{i} = \sqrt {(x - x_{i} )^{2} + (y - y_{i } )^{2} + (z - z_{i} )^{2} } } \hfill \\ {R_{i,1} = R_{i} - R_{1} = c(t_{i } - t_{1} )} \hfill \\ \end{array} } \right.$$
(1)

where c denotes the sound velocity.

Fig. 2
figure 2

TDOA positioning model

The TDOA measurement value obtained by solving the hyperbolic equation is used to solve the subsequent algorithm involving the Chan and Taylor algorithms. Chan’s algorithm is a non-recursive hyperbolic equation system with an analytical expression solution. When the TDOA measurement error is too large, the localization accuracy drops sharply [22]. The Taylor series expansion algorithm is a recursive algorithm that achieves high positioning accuracy if some initial estimated information is provided. Otherwise, accurate positioning cannot be achieved [23]. Hence, we propose the Chan–Taylor hybrid algorithm illustrated in Fig. 3, where the Chan algorithm provides the initialization values for the Taylor algorithm.

Fig. 3
figure 3

Flowchart of the proposed Chan–Taylor hybrid algorithm

Once TDOA’s time delay is obtained, we utilize the weighted least squares (WLS) of Chan's algorithm to obtain the initial solution. Then, the first estimated coordinates and TDOA measurement value are re-employed in the WLS for the second time to obtain the improved estimated coordinates as follows:

$$R_{i,1}^{2} + 2R_{i,1} R_{1} = K_{i} - K_{1} - 2x_{i,1} x - 2y_{i,1} y - 2z_{i,1} z$$
(2)

where \(K_{i } = x_{i }^{2} + y_{i }^{2} + z_{i }^{2 } , \;x_{i,1 } = x_{i} - x_{1} ,\;y_{i,1} = y_{i } - y_{1} ,\;{\text{and}}\;z_{i,1} = z_{i} - z_{1}\). Let \({\mathbf{z}}_{{\mathbf{a}}} = \left( {x,y,z,R_{1} } \right)^{{\text{T}}}\) and \(x,y,z,R_{0}\) are linearly independent. Considering the existence of TDOA observation noise, the solution of the least weighted bivariate estimation is given by:

$${\hat{\mathbf{z}}}_{{\varvec{a}}} = ({\varvec{G}}_{{\varvec{a }}}^{{\varvec{T}}} \varvec{\varphi }^{{\varvec{ - 1}}} {\varvec{G}}_{{\varvec{a }}} )^{{\varvec{ - 1}}} {\varvec{G}}_{{\varvec{a }}}^{{\varvec{T}}} \varvec{\varphi }^{{\varvec{ - 1}}} {\varvec{h}}$$
(3)

where \({\varvec{h}} = \frac{1}{2}\left[ {\begin{array}{*{20}c} {K_{2} - K_{1} - R_{2,1}^{2} } \\ {K_{3} - K_{1} - R_{3,1}^{2} } \\ {\begin{array}{*{20}c} \vdots \\ {K_{N} - K_{1} - R_{N,1}^{2} } \\ \end{array} } \\ \end{array} } \right]\), \({\varvec{G}}_{{\varvec{a}}} = \left[ {\begin{array}{*{20}c} {x_{2,1} } & {y_{2,1} } & {\begin{array}{*{20}c} {z_{2,1} } & {R_{2,1} } \\ \end{array} } \\ {x_{3,1} } & {y_{3,1} } & {\begin{array}{*{20}c} {z_{3,1} } & {R_{3,1} } \\ \end{array} } \\ {\begin{array}{*{20}c} \vdots \\ {x_{N,1} } \\ \end{array} } & {\begin{array}{*{20}c} \vdots \\ {y_{N,1} } \\ \end{array} } & {\begin{array}{*{20}c} {\begin{array}{*{20}c} \vdots & \vdots \\ \end{array} } \\ {\begin{array}{*{20}c} {z_{N,1} } & {R_{N,1} } \\ \end{array} } \\ \end{array} } \\ \end{array} } \right]^{ - 1}\), \(\varvec{\varphi } = c^{2} {\varvec{BQB}}\), Q is the covariance matrix of the TDOA measurement value, and \({\varvec{B}} = {\text{diag}}(R_{2}^{0} ,R_{3}^{0} , \ldots ,R_{N}^{0} )\) containing the unknown distance from the radiation source to each base station. When the distance is far, \(R_{i}^{0}\) is close to a constant, and thus, we employ the TDOA covariance matrix Q instead of \(\varvec{\varphi }\) and obtain:

$${\hat{\mathbf{z}}}_{{\varvec{a}}} = ({\varvec{G}}_{{\varvec{a}}}^{{\varvec{T}}} {\varvec{Q}}^{{{\mathbf{ - }}{\varvec{1}}}} {\varvec{G}}_{{\varvec{a}}} ){\varvec{G}}_{{\varvec{a }}}^{{\varvec{T}}} {\varvec{Q}}^{{{\mathbf{ - }}{\varvec{1}}}} {\varvec{h}}$$
(4)

The parameter \({\hat{\mathbf{z}}}_{{\varvec{a}}}\) is obtained from Eq. (4), which is then substituted to Eq. (3) to obtain the solution of \({\varvec{B}}\). The value of \({\hat{\mathbf{z}}}_{{\varvec{a}}}\) in Eq. (4) is obtained when each element of \({\mathbf{z}}_{{\varvec{a}}}\) is independent; in fact, \({\varvec{R}}_{{\mathbf{1}}}\) in \({\mathbf{z}}_{{\varvec{a}}}\) is related to \(x,y,z\). Therefore, to obtain a higher accuracy position estimation, we employ the initial WLS estimation and retain the linear perturbation term in Eq. (3) and then by ignoring the quadratic term error we obtain from the WLS the second estimation results:

$${\mathbf{z^{\prime}}}_{{\varvec{a }}} = \left( {{\varvec{G}}_{{\varvec{a}}}^{{\prime {\varvec{T}}}} {\varvec{B}}^{{\prime {\mathbf{ - }}{\varvec{1}}}} {\varvec{G}}_{{\varvec{a}}} {\varvec{Q}}^{{{\mathbf{ - }}{\varvec{1}}}} {\varvec{G}}_{{\varvec{a}}} {\varvec{B}}^{{\prime {\mathbf{ - }}{\varvec{1}}}} \varvec{G^{\prime}}_{{\varvec{a}}} } \right)^{{ - {\mathbf{1}}}} \left( {{\varvec{G}}_{{\varvec{a}}}^{{\prime {\varvec{T}}}} {\varvec{B}}^{{\prime {\mathbf{ - }}{\varvec{1}}}} {\varvec{G}}_{{\varvec{a}}} {\varvec{Q}}^{{{\mathbf{ - }}{\varvec{1}}}} {\varvec{G}}_{{\varvec{a}}} {\varvec{B}}^{{\prime {\mathbf{ - }}{\varvec{1}}}} } \right)\varvec{h^{\prime}}$$
(5)

where \({\varvec{h}}^{^{\prime}} = \left[ {\begin{array}{*{20}c} {(x^{0} + e_{1} - x_{1} )^{2} } \\ {(y^{0} + e_{2} - y_{1} )^{2} } \\ {\begin{array}{*{20}c} {(z^{0} + e_{3} - z_{1} )^{2} } \\ {(R_{1}^{0} + e_{4} )^{2} } \\ \end{array} } \\ \end{array} } \right]\), \({\varvec{G}}_{{\varvec{a}}}^{^{\prime}} = \left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} 1 & 0 & 0 \\ \end{array} } \\ {\begin{array}{*{20}c} 0 & 1 & 0 \\ \end{array} } \\ {\begin{array}{*{20}c} {\begin{array}{*{20}c} 0 & 0 & 1 \\ \end{array} } \\ {\begin{array}{*{20}c} 1 & 1 & 1 \\ \end{array} } \\ \end{array} } \\ \end{array} } \right]\),\({\mathbf{z}}_{{\varvec{a}}}^{^{\prime}} = \left[ {\begin{array}{*{20}c} {(x - x_{1} )^{2} } \\ {(y - y_{1} )^{2} } \\ {(z - z_{1} )^{2} } \\ \end{array} } \right]\), and \(e_{1} ,e_{2} ,e_{3} ,e_{4}\) is the error estimate of \({\mathbf{z}}_{{\varvec{a }}}\). Finally, the estimated results of the two WLS are be expressed as:

$${\mathbf{z}}_{{\varvec{p}}} = \pm \sqrt {{\mathbf{z}}_{{\varvec{a }}}^{^{\prime}} } + \left( {\begin{array}{*{20}c} {x_{1} } \\ {y_{1} } \\ {z_{1} } \\ \end{array} } \right)$$
(6)

Then, the initial position \(\left( {x ^{(0)} ,y^{(0)} ,z ^{(0)} } \right)\) is determined based on the target’s prior information. Considering the positioning results of the Chan algorithm as the iterative initial value of the Taylor algorithm, fi function is defined according to Eq. (1):

$$\begin{aligned} f_{i } \left( {x,y,z,x_{i} ,y_{i} ,z_{i} } \right) & = \sqrt {\left( {x - x_{i} } \right)^{2} + \left( {y - y_{i} } \right)^{2} + \left( {z - z_{i} } \right)^{2} } \\ & \quad - \sqrt {(x - x_{1} ) ^{2} + (y - y_{1} )^{2} + (z - z_{1} )^{2} } \\ \end{aligned}$$
(7)

where \(i = 1,2, \ldots ,N\). We expand the objective of Eq. (7) for the \(\left( {x^{(0)} ,y^{(0)} ,z^{(0)} } \right)\) case and according to the Taylor series. Furthermore, we ignore the high-order terms above the quadratic term in the expansion section and define \(\hat{f}_{i} = f_{i} \left( {x^{\left( 0 \right)} ,y^{\left( 0 \right)} ,z^{\left( 0 \right)} ,x_{i} ,y_{i} ,z_{i} } \right)\) as:

$$f_{i} \left( {x^{\left( 0 \right)} ,y^{\left( 0 \right)} ,z^{\left( 0 \right)} ,x_{i} ,y_{i} ,z_{i} } \right) \approx \hat{R}_{i} - \hat{R}_{1} + a_{i,1} \Delta x + a_{i,2} \Delta y + a_{i,3} \Delta z$$
(8)

where\(\hat{R}_{i } = \sqrt {\left( {x^{(0)} - x_{i} } \right)^{2} + \left( {y^{(0)} - y_{i} } \right)^{2} + \left( {z^{(0)} - z_{i} } \right)^{2} }\), \(a_{i1} = \frac{{x^{(0)} - x_{i} }}{{\hat{R}_{i} }} - \frac{{x^{(0)} - x_{1} }}{{\hat{R}_{1} }}\), \(a_{i2} = \frac{{y^{(0)} - y_{i} }}{{\hat{R}_{i} }} - \frac{{y^{(0)} - y_{1} }}{{\hat{R}_{1} }}\), \(a_{i2} = \frac{{z^{(0)} - z_{i} }}{{\hat{R}_{i} }} - \frac{{z^{(0)} - z_{1} }}{{\hat{R}_{1} }}\). Then, Eq. (8) can be rewritten as:

$$\varvec{A\delta } = {\varvec{D}} + {\varvec{\varepsilon}}$$
(9)

where \({{\varvec{\upvarepsilon}}} = \left[ {\begin{array}{*{20}c} {e_{2} } \\ {\begin{array}{*{20}c} {e_{3} } \\ {\begin{array}{*{20}c} \vdots \\ {e_{n} } \\ \end{array} } \\ \end{array} } \\ \end{array} } \right]\), \({\varvec{D}} = \left[ {\begin{array}{*{20}c} {m_{2} - \hat{f}_{2} } \\ {\begin{array}{*{20}c} {m_{3} - \hat{f}_{3} } \\ {\begin{array}{*{20}c} \vdots \\ {m_{n} - \hat{f}_{n} } \\ \end{array} } \\ \end{array} } \\ \end{array} } \right]\), \({\varvec{A}} = \left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} {a_{21} } \\ {\begin{array}{*{20}c} {a_{31} } \\ {\begin{array}{*{20}c} \vdots \\ {a_{n1} } \\ \end{array} } \\ \end{array} } \\ \end{array} } & {\begin{array}{*{20}c} {a_{22} } \\ {\begin{array}{*{20}c} {a_{32} } \\ {\begin{array}{*{20}c} \vdots \\ {a_{n2} } \\ \end{array} } \\ \end{array} } \\ \end{array} } & {\begin{array}{*{20}c} {a_{23} } \\ {\begin{array}{*{20}c} {a_{33} } \\ {\begin{array}{*{20}c} \vdots \\ {a_{n3} } \\ \end{array} } \\ \end{array} } \\ \end{array} } \\ \end{array} } \right]\), \({{\varvec{\updelta}}} = \left[ {\begin{array}{*{20}c} {\Delta x} \\ {\Delta y} \\ {\Delta z} \\ \end{array} } \right]\), \(m_{i} = c(t_{i } - t_{1} )\), and \(e_{i}\) is the measurement error. Considering \({\varvec{\delta}}\) as an unknown variable and supposing that Q is the covariance matrix of \({\varvec{\varepsilon}}\), we adopt the WLS estimation method, and Eq. (9) is rewritten as:

$${{\varvec{\updelta}}} = \left[ {{\varvec{A}}^{{\varvec{T}}} {\varvec{Q}}^{{{\mathbf{ - }}{\varvec{1}}}} {\varvec{A}}} \right]^{{ - {\mathbf{1}}}} {\varvec{A}}^{{\varvec{T}}} {\varvec{Q}}^{{{\mathbf{ - }}{\varvec{1}}}} {\varvec{D}}$$
(10)

The calculated \({\varvec{\delta}}\) value is then compared against the estimated error value \(\eta\). If the condition is not met regarding the estimated error value η, the initial value will be updated, and iteration continues until the requirement is met.

3 EMD–ML method

This method’s objective is first to obtain the initial nonlinear TDOA value provided by the linear equation system of Eq. (1). GCC is one of the most popular methods for estimating time delay and assumes that two identical base stations H1 and H2 are within a reverberant environment. Among these base stations, H1 acquires signal \(x_{1} (t)\) and H2 the signal \(x_{2} (t)\) from a source \(s(t)\), expressed as:

$$\left\{ {\begin{array}{*{20}l} {x_{1} \left( t \right) = s\left( t \right) + w_{1} (t)} \hfill \\ {x_{2} \left( t \right) = \alpha s\left( {t + \tau } \right) + w_{2} (t)} \hfill \\ \end{array} } \right.$$
(11)

where \(w_{1} (t)\) and \(w_{2} (t)\) are Gaussian noise uncorrelated with \(s\left( t \right)\), \(\tau\) represents the delay, and \(\alpha\) is an attenuation coefficient.

The cross-correlation function \(R_{{x_{1} x_{2} }}\) generated between \(x_{1} \left( t \right)\) and \(x_{2} \left( t \right)\) is:

$$R_{{x_{1} x_{2} }} \left( \tau \right) = \mathop \smallint \limits_{ - \infty }^{\infty } \Psi \left( f \right)G_{{x_{1} x_{2} }} (f)e^{j2\pi f\tau } {\text{d}}f$$
(12)

The generalized cross-correlation function \(R_{{x_{1} x_{2} }}\) is calculated by applying in Eq. (12) the cross-correlation spectra and suitable pre-filter values. The maximum value of \(R_{{x_{1} x_{2} }}\) provides information about the time delay \(\tau\):

$$D = \arg \max \left[ {R_{{x_{1} x_{2} }} \left( \tau \right)} \right]$$
(13)

where \(\Psi \left( f \right)\) is a pre-filter, and \(G_{{x_{1} x_{2} }} (f)\) represents the cross-spectral density. The frequency domain of the cross-correlation represents the cross-spectral density. Selecting pre-filter \(\Psi \left( f \right)\) depends on the signal’s propagation environment and SNR, with the literature proposing several pre-filters, i.e., general cross-correlation (GCC), phase transform (PHAT), and maximum likelihood (ML) method. This section proposes calculating the delay time \(\tau\) by combining empirical mode decomposition (EMD) and maximum likelihood (ML).

3.1 Maximum likelihood method

Hannan and Thomson [24] developed the coherence-based maximum likelihood pre-filter defined as:

$$\Psi \left( f \right) = \frac{{\left| {\gamma_{12} (f)} \right|^{2} }}{{\left| {G_{{x_{1} x_{2} }} (f)} \right|\left[ {1 - \left| {\gamma_{12} (f)} \right|^{2} } \right]}}$$
(14)

where

$$\left| {\gamma_{12} (f)} \right|^{2} = \frac{{\left| {G_{{x_{1} x_{2} }} (f)} \right|^{2} }}{{G_{{x_{1} x_{1} }} (f) \cdot G_{{x_{2} x_{2} }} (f)}}$$
(15)

Furthermore, Stephane and Benoit [25] proposed the power spectral-based maximum likelihood pre-filter appropriate for the reverberation environment.

$$\Psi \left( f \right) = \frac{S(f)}{{W_{1} (f)W_{2} (f)}} \cdot \frac{1}{{\left[ {1 + \frac{S(f)}{{W_{1} (f)}} + \frac{S(f)}{{W_{2} (f)}}} \right]}}$$
(16)

where \(S(f)\), \(W_{1} (f)\) and \(W_{2} (f)\) represent the power spectral densities of \(s(t)\), \(w_{1} \left( t \right)\) and \(w_{2} \left( t \right)\), respectively. The signal \(s(t)\) can be estimated from \(x_{1} (t)\) and \(x_{2} (t)\) as follows:

$$\left\{ {\begin{array}{*{20}c} {x_{1} \left( t \right) = \left[ {I_{1} *s} \right]\left( t \right) + w_{1} \left( t \right)} \\ {x_{2} \left( t \right) = \left[ {I_{2} *s} \right]\left( t \right) + w_{2} \left( t \right)} \\ \end{array} } \right.$$
(17)

where \(*\) represents the convolution operation, and \(I_{1}\) and \(I_{2}\) are the impulse responses of base stations H1 and H2, respectively. Once the impulse response for each hydrophone has been calculated, the required signal generated by sensor \(s(t)\), and noise \(w_{1} \left( t \right)\) and \(w_{2} \left( t \right)\) is estimated utilizing a deconvolution operation.

3.2 Empirical mode decomposition method

Conventional signal processing techniques include time-domain and frequency-domain analysis. Most of them depend on the Fourier transform because they assume that the process of generating a signal is stationary and linear. Hence, Huang et al. [26] propose a time–frequency analysis method, named empirical mode decomposition (EMD), appropriate for nonlinear and non-stationary signal generation scenarios.

EMD is employed to extract the required information from noisy signals, with the corresponding EMD signal processing flow presented in Fig. 4, where \(x(t)\) represents a low SNR signal. The intrinsic mode function (IMF) conditions include the two conditions: the number of crossing zeros and the number of extreme points equal or at most one. The difference throughout the time course and the mean value of the upper envelope defined by the local maxima and the lower envelope defined by the local minima is zero at any point on the signal, i.e., the signal is symmetric about the time axis. During this process, all the extreme points on \(x(t)\) are determined and connected with a cubic spline curve to form the upper and the lower envelopes. The difference between \(x(t)\) and the mean value \(m_{1}\) of the lower and upper envelope curves provides \(H_{1}\):

$$H_{1} = x\left( t \right) - m_{1}$$
(18)
Fig. 4
figure 4

Flowchart of EMD

\(H_{1}\) is regarded as a new \(x\left( t \right)\) and this process repeats until \(H_{i}\) meets IMF conditions and then becomes the original signal (\(C_{1}\)) by filtering out IMF’s first order. Usually, the first stage IMF component \(C_{1}\) contains the highest frequency component of the signal. Then, we separate \(C_{1}\) from \(x\left( t \right)\) to obtain a different signal \(r_{1}\) with the high-frequency components removed:

$$r_{1} = x\left( t \right) - C_{1}$$
(19)

Considering \(r_{1}\) as a new signal, Eq. (18) is reapplied until the residual signal becomes a monotonic function and the IMF component is not screened out.

$$r_{n} = r_{n - 1} - C_{n}$$
(20)

Mathematically, \(x\left( t \right)\) is represented as the sum of the IMF components and a residual term:

$$X\left( t \right) = \mathop \sum \limits_{j = 1}^{n} C_{j} \left( t \right) + r_{n} (t)$$
(21)

where \(r_{n} (t)\) is the residual quantity representing the average signal trend and each IMF component \(C_{j} \left( t \right)\) represents the signal’s components in different frequencies, i.e., the frequency components contained in each frequency band are different. In the same IMF component, the instantaneous frequency at different times also varies and the local time distribution of this different frequency component changes with the signal changes.

3.3 EMD–ML time delay estimation

In the ML method of Stephen and Benoit [25], signal and noise are estimated using deconvolution. By calculating the impulse response function of the acoustic medium, the signal and noise are estimated from the received noisy signal. Moreover, the estimated signal and noise are input to Eq. (16) to obtain the GCC function. However, this method is estimated by calculating the IRF of the underwater acoustic system, and IRF is not constant in an underwater environment. In response to these problems, the EMD–ML method proposed in this study.

The signal (\(s_{1} \left( t \right),s_{2} \left( t \right)\)) and noise (\(w_{1} \left( t \right),w_{2} \left( t \right)\)) are estimated from the noisy signals \(x_{1} (t)\) and \(x_{2} (t)\), as explained in the previous section. The EMD–ML pre-filter is expressed in Eq. (16).

By applying the power spectral densities of the EMD estimated signal \(s_{1} (t)\) and noise \(w_{1} \left( t \right)\) and \(w_{2} \left( t \right)\) in Eq. (16), the EMD–ML pre-filter \(\Psi_{ml} \left( f \right)\) is estimated in Eq. (22).

$$\Psi_{ml} \left( f \right) = \frac{{S_{1} (f)}}{{W_{1} (f)W_{2} (f)}} \cdot \frac{1}{{\left[ {1 + \frac{{S_{1} (f)}}{{W_{1} (f)}} + \frac{{S_{1} (f)}}{{W_{2} (f)}}} \right]}}$$
(22)

where \(S_{1} (f)\), \(W_{1} (f)\), and \(W_{2} (f)\) represent the power spectral densities of \(s_{1} (t)\), \(w_{1} \left( t \right)\) and \(w_{2} \left( t \right),\) respectively. By substituting the pre-filter value in Eq. (22), the generalized cross-correlation function \(R_{{x_{1} x_{2} }} \left( \tau \right)\) is calculated, with its maximum value providing the information about the time delay \(\tau\).

$$R_{{x_{1} x_{2} }} \left( \tau \right) = \mathop \smallint \limits_{ - \infty }^{\infty } \Psi_{ml} \left( f \right)G_{{x_{1} x_{2} }} (f)e^{j2\pi f\tau } {\text{d}}f$$
(23)

The EMD–ML algorithm integrates two processes, it shows that the proposed method does not increase the computational complexity, the pre-filtering effect can be optimized, and the accuracy of delay time estimation can be improved.

4 Results and discussion

4.1 Simulation of AUV working environment

The underwater environment in which the AUV operates is a low SNR environment, where multiple sound wave reflections from the water surface and seabed create several delayed and attenuated versions of the source signal. Each hydrophone receives these in addition to the direct path signal. This perceivable phenomenon, known as reverberation, is illustrated in Fig. 5. Let a hydrophone receive n acoustic signal paths. The unit impulse signal received by each hydrophone is expressed as follows:

$$I_{i} = \mathop \sum \limits_{n = 1}^{N} \alpha_{n} \delta (t - \tau_{n} )$$
(24)

where \(\alpha_{n}\) is the attenuation coefficient of the nth propagation path, \(\tau_{n}\) is the relative time delay of the transmission along the nth propagation path, and \(\alpha_{n}\) is related to the underwater environment. Several empirical equations exist to measure the attenuation coefficient [27]. Since the simulation signal frequency used in this paper is less than 5 kHz, we consider the viscous absorption of pure water in low frequency utilizing the Thorpe attenuation model:

$$\alpha (f) = \frac{{0.109f^{2} }}{{1 + f^{2} }} + \frac{{40.7f^{2} }}{{4100 + f^{2} }} + 3.01 \times 10^{ - 4} f^{2}$$
(25)

where the attenuation \(\alpha (f)\) is measured in dB/km and frequency \(f\) in kHz.

Fig. 5
figure 5

Multipath propagation

Then, the base stations receive a signal \(x_{i} \left( t \right)\) generated from Eq. (17). The simulated sound source signal \(s(t)\) adopts the amplitude-modulated signal:

$$s\left( t \right) = 1000*\sin \left( {2*\pi *4000*t} \right) + 5000*\sin (2*\pi *2000*t)$$
(26)

We suppose that the AUV moves slowly, and therefore, the acoustic channel is a linear time-invariant (LTI) system. According to the sound velocity distribution shown in Fig. 6, we establish the AUV motion model to simulate sound propagation and reception.

Fig. 6
figure 6

Sound velocity distribution curve

4.2 Time delay estimated simulation of motion AUV

For the scenario examined, the three-dimensional trajectory of the AUV is illustrated in Fig. 7, involving four base stations at H1(0,0,0), H2(800,0,0), H3(0,800,0), H4(0,800,800).

Fig. 7
figure 7

The trajectory of the AUV

In this system, noise is simulated through a Gaussian distribution with zero mean and a variance of one. Specifically, we consider white Gaussian noise of 5–30 dB SNR with a 5 dB step size, and EMD extracts the required information from the noise signal.

Since the AUV trajectory is spiral, the signals received from the different base station are similar. Therefore, we compare the time estimation accuracy from H1 and H2 under different SNR levels and evaluate the effectiveness of different methods on the time delay estimation. The corresponding results are presented in Table 1.

Table 1 Time delay estimation for GCC, PHAT, ML, and EMD–ML

Figure 8 highlights that when the SNR is lower than 15 dB, GCC attains a poor average estimated time delay value because GCC does not have any clear peak at the maximum value influenced by noise. Therefore, this method is unsuitable for reverberant scenarios and SNR below 20 dB environment (Figs. 9, 10). The generalized cross-correlation function presents a lower average time delay estimation error than the traditional generalized cross-correlation scheme using the PHAT pre-filter. According to Fig. 11, for SNR = 15 dB, PHAT has a clear peak at the maximum value, i.e., the pre-filter suppresses noise, and therefore, this pre-filter is suitable for reverberant and high noise environments. Figure 12 shows that the generalized cross-correlation function using the Hannan and Thomson ML pre-filter attains a maximum value when SNR is 15 dB. Nevertheless, in Fig. 8, the average estimated time delay error is higher than the GCC pre-filter method. Therefore, this method is more suitable for environments where the SNR is higher than 20 dB.

Fig. 8
figure 8

Average time delay error of different methods

Fig. 9
figure 9

Generalized cross-correlation method (SNR = 15 dB)

Fig. 10
figure 10

RMSE distribution of different methods

Fig. 11
figure 11

PHAT pre-filter method (SNR = 15 dB)

Fig. 12
figure 12

ML pre-filter method (SNR = 15 dB)

Considering the EMD–ML method, the received signal is decomposed into IMFs utilizing an empirical shifting process. The simulated hydrophone’s H1 IMF signals are illustrated in Fig. 13. In the time domain, there is some difficulty in discerning the noise signal present in Fig. 13 IMFs. To separate the noise signal, a Fourier transform processes the IMFs with the corresponding result depicted in Fig. 14. The first two IMFs have a wider frequency distribution and a relatively smaller amplitude characteristic than the other IMFs, revealing the noise signal characteristic that determines the noise signal. By adding the remaining IMFs and the residue, the estimated signal is illustrated in Fig. 15. Finally, the estimated signals and noise are input to the ML pre-filter.

Fig. 13
figure 13

Received IMF signal (SNR = 15 dB)

Fig. 14
figure 14

Spectral of received signal IMFs (SNR = 15 dB)

Fig. 15
figure 15

EMD processing result. A Received signal, B estimated signal by EMD, and C estimated noise by EMD

Figure 16 shows that EMD–ML provides a sharp generalized correlation function peak, which means that this method suppresses noise and the reflected sound sources generated in the reverberant environment.

Fig. 16
figure 16

EMD–ML pre-filter method (SNR = 15 dB)

The results of the proposed EMD–ML method are presented in Fig. 8. When the SNR is lower than 15 dB, it has a lower average time delay estimate error than GCC, ML, and PHAT. Also from Table 1, it is observed that the average time delay estimate error of EMD–ML is reduced by 0.189 s and 0.125 s compared with GCC and ML, respectively, when the SNR is 5 dB. At same condition, the accuracy of the average delay estimate error is reduced by 0.043 s. And when the SNR is lower than 15 dB, EMD has better performance than other methods in the accuracy of time estimation. Furthermore, the RMSE value is more accurate than the other filter affording greater stability, according to Fig. 10. However, according to Fig. 8, when SNR exceeds 15 dB, the average time delay estimation error of the EMD–ML method increases due to the improvement of SNR, the frequency of the combined components in the mixed signal is close, and the information between the decomposed IMF is coupled with each other. IMF1 and IMF2 not only include noise information but also signal information. Modal aliasing makes EMD decomposition results unable to represent the real physical process, resulting in signal and noise not being well separated.

Through the computational time of different filtering algorithms at 100 points, the computational time between them is compared. From Table 2, the computational time of PHAT is the least, and the calculation time of GCC and ML filtering algorithm is similar. The computational time of EMD–ML is 0.3 s more than that of ML filtering algorithm, which is because the EMD–ML filtering algorithm increases the signal and noise extraction steps and has an impact on the computational time. However, compared with the improvement of performance such as accuracy and stability when SNR is lower than 15 dB, the increase in computational time is still within an acceptable range.

Table 2 Computational efficiency of GCC, PHAT, ML, and EMD–ML

The time delay is estimated from the maximum value of the generalized cross-correlation function \(R_{{x_{1} x_{2} }}\), with the EMD–ML pre-filter method achieving a lower estimated time delay than the competitor methods due to modal aliasing in the EMD process. The experimental results verify the effectiveness of the EMD–ML method and demonstrate that it is more suitable for low SNR and reverberant environments.

4.3 Chan–Taylor hybrid positioning algorithm

This section employs the EMD–ML method to estimate the TDOA value of various base stations. Then, the Chan and Chan–Taylor hybrid algorithms are utilized for AUV positioning.

The trajectories acquired by various positioning methods are illustrated in Fig. 17, highlighting that some positioning estimations deviate from the true trajectory when the Chan method is employed. We also apply the Chan–Taylor hybrid positioning algorithm for the same trajectory and find minor deviations, indicating that this positioning algorithm effectively limits Chan’s error. The Chan–Taylor hybrid positioning algorithm affords a trajectory closer to the real one for the entire AUV, showing that the Chan–Taylor hybrid algorithm is more accurate than simply using the Chan method. The corresponding positioning error results are illustrated in Fig. 18.

Fig. 17
figure 17

Trajectory positioning results of the Chan and Chan–Taylor hybrid algorithms

Fig. 18
figure 18

Positioning error of the Chan and Chan–Taylor hybrid algorithm

Due to the singular solution of the Chan algorithm, some localization errors exceed 50 m while the maximum error exceeds 250 m at the beginning of the localization process. In any case, these errors are limited by the Taylor positioning algorithm. In the case where the CHAN algorithm has a stable solution value, the positioning error of the Chan algorithm exceeds that of Chan–Taylor. For the simulation examined here, we used the result of the CHAN algorithm as the initial value of Taylor's algorithm, and the positioning accuracy of the Chan algorithm increases up to 4.34 m, demonstrating that combining the Chan and the Taylor positioning algorithm improves the overall positioning accuracy.

5 Conclusions

The positioning accuracy of traditional LBL is reduced due to the time delay estimation under low SNR and reverberation environment. This study proposes a new LBL positioning method based on EMD–ML. Specifically, we design a fusion scheme combining the empirical mode decomposition (EMD) and the coherence-based maximum likelihood (ML) pre-filter to estimate time delay correctly. Furthermore, we develop the Chan–Taylor hybrid positioning algorithm to improve positioning accuracy. Simulations verify that the proposed EMD–ML algorithm does not require calculating the impulse response function, has a higher time delay estimation accuracy and is more appealing than the traditional methods GCC, ML, and PAHT. Overall, the Chan–Taylor hybrid positioning algorithm affords a higher positioning accuracy in low SNR and reverberation environment.