1 Introduction

Shear-driven mixing is ubiquitous in many environmental flows and it plays a major role in the transport of mass, momentum, heat and scalar in environmental systems. In the ocean, turbulent mixing by shear is responsible for the transport of heat from the surface to the ocean interior. Therefore, it is important to understand the mixing processes in order to quantify the ocean heat budget. Ocean observations often show mixing events which involve complex shear and density stratification profiles [7, 26, 27]; however, due to the need for applicability over a wide range of flow conditions, the parameterizations of mixing are often deduced from simplified flow models. It is often assumed that the mixing parameterizations are generic and deviations from the simplified flow conditions do not significantly influence the results of mixing models. In the present study, we explore the sensitivity of mixing parameterizations to variabilities in the flow conditions.

In the past few decades, the direction numerical simulation (DNS) method has become an important tool to explore the parameterization of shear mixing. Nearly all DNS studies make use of one of the following three canonical models for shear-driven turbulence: (1) homogenous shear with uniform stratification [5, 8, 17, 24] ; (2) hyperbolic tangent velocity and density profiles [2, 13, 25]; and (3) hyperbolic-tangent velocity profile with uniform stratification [4, 28]. For example, Shih et al. [24] use DNS of homogenous shear with uniform stratification to propose a mixing model that depends on buoyancy Reynolds number. Mashayek et al. [13, 15] use DNS of Kelvin–Helmholtz (K–H) shear instabilities with the hyperbolic tangent velocity and density profiles to propose a different mixing model which also depends on the buoyancy Reynolds number. Recently, VanDine et al. [28] perform DNS of K–H instabilities with hyperbolic tangent velocity profile and uniform stratification to show that the evolution of the shear layer and the amount of turbulent mixing can be significantly different from the results of Mashayek et al. [15]. It is noted that all of these studies use one specific type of velocity and density profile to parameterize the mixing in the complex shear flows of the natural environment. Furthermore, since the evolution of KH shear instabilities at high Reynolds number is sensitive to the numerical setup [9], it is necessary to perform a comparative study to explore whether differences in the background flow conditions, specifically the stratification profiles, can affect the results of mixing parameterizations. In the present study, we aim to explore how the stratification distribution inside shear layers affect the evolution of turbulent mixing. The simulations in the present study include a uniform stratification, a hyperbolic tangent density profile and a composite case of these two profiles.

Previous studies have pointed out the dynamical differences in the evolution of shear layers between the hyperbolic tangent density profile and the contiuous stratification. Pham et al. [20] and Pham and Sarkar [19] compare the evolution of the KH instability between these two cases and show that internal waves can be excited when the ambient has sufficiently strong stratification. However, these simulations are performed at low Reynolds numbers so the evolution of the shear layer does not exhibit the growth of secondary instabilities as pointed out by Mashayek and Peltier [11, 12]. Another notable difference between the two density profiles is the development of a transition layer as the shear layers become turbulent. VanDine et al. [28] denote the transition layers with elevated of shear and stratification at the edges of the shear layer in their simulations of shear layer with uniform stratification. Significant turbulent dissipation and mixing are found in the transition layers. It is unclear whether transition layers can develop in the hyperbolic tangent density profile. Nearly all studies using the hyperbolic tangent density profile take the momentum thickness to delineate the shear layer boundaries [14, 25]. In this study, we investigate the development of the transition layer in turbulent shear layers with various density gradient distributions.

A popular mixing model for shear-driven mixing is from Osborn [18]. In the model, the turbulent thermal diffusivity (\(K_\rho\)) is parameterized as \(\Gamma \varepsilon /N^2\) where \(\Gamma\) is the flux coefficient, \(\varepsilon\) is the dissipation rate and \(N^2\) is the background stratification. Osborn [18] suggests \(\Gamma\) = 0.2 to be the upper-bound value to be used in the parameterization. The model is widely used to estimate thermal diffusivity from ocean measurements. However, DNS of stratified shear layers with various density and velocity profiles all point to considerable variability in the mixing efficiency (E) which is related to the flux coefficient \(\Gamma\) = E/(1-E) [13, 21, 24, 28]. Furthermore, \(\Gamma\) is suggested to vary as a function of buoyancy Reynolds number, Prandtl number, Richardson number and Froude number. Notably, the mixing parameterizations suggested by these previous studies show significant differences from one another. For example, Shih et al. [24] suggest three regimes of mixing while Mashayek et al. [15] suggest only two regimes of mixing although both parameterizations are based on buoyancy Reynolds number. VanDine et al. [28] demonstrate that \(\Gamma\) can have a significantly larger value than what is suggested in Mashayek et al. [15] at the same buoyancy Reynolds number. The difference among these studies can be due to the different stratification profiles being used in the simulations. Here, we aim to further explore the effect of stratification profiles on the turbulent mixing and its parameterization.

In the present study, we perform DNS of stratified shear layers at a high Reynolds number with three density gradient distributions: a uniform stratification; a two-layer hyperbolic tangent profile and finally a composite profile with both the hyperbolic tangent density profile at the center of the shear layer and uniform stratification in the  ambient. The cases have the same initial non-dimensional parameters like the Reynolds number, Prandtl number and bulk Richardson number. Comparative analysis of the simulations will shed light on how the variation in density profiles can influence the physics of turbulent mixing despite having the same initial bulk control parameters. This work is organized as follows. In Sect. 2, we describe the model setup and discuss the relevant non-dimensional parameters with particular attention on the differences in the gradient Richardson profile. Results from linear stability analysis and evolution of the shear layers in the DNS are compared in Sect. 3. Section 4 focus on the quantification of turbulent mixing and its parameterization. We provide discussions and conclusions in Sect. 5 to address the questions raised in this section.

2 Model formulation

In this section, we introduce the model problem of a shear layer with different types of stratification and the numerical implementation of the DNS method employed. The initial profiles of velocity, density and its gradient are presented in detail with particular attention to the important non-dimensional parameters such as the bulk Richardson number and the profiles of gradient Richardson number. It is our objective to explore how turbulent mixing differs with different types of density stratification.

2.1 Distribution of stratification in a shear layer

We consider a shear layer between two streams flowing in opposite directions:

$$\begin{aligned} \langle u^* \rangle (z^*,t=0) = -\frac{\Delta U^*}{2} \mathrm {tanh}\left( \frac{z^*}{0.5\delta ^*}\right) \, , \end{aligned}$$
(1)

where \(\Delta U^*\) denotes the velocity difference across the shear layer, \(\delta ^*\) denotes the half-thickness and the superscript \(*\) denotes a dimensional quantity. When the shear layer is stratified with the two streams having constant density values which differ by \(\Delta \rho ^*\), the density has a hyperbolic tangent profile similar to the velocity profile:

$$\begin{aligned} \langle \rho ^* \rangle (z^*,t=0) = \rho ^*_0 - \frac{\Delta \rho ^*}{2} \mathrm {tanh}\left( \frac{z^*}{0.5\delta ^*}\right) \, , \end{aligned}$$
(2)

where \(\rho ^*_0\) is a reference density value and \(\Delta \rho ^*\) is the density difference across the shear layer (\(-1 \le z/\delta ^* \le 1\)). In contrast, when the shear layer has uniform stratification, the density profile takes the following form:

$$\begin{aligned} \langle \rho ^* \rangle (z^*,t=0) = \rho ^*_0-\frac{\Delta \rho ^*z^*}{2\delta ^*}, \end{aligned}$$
(3)

Here, the density gradient is chosen such that density difference across the shear layer is the same between the two-layer density profile and uniform stratification. In order to contrast the effects of the two stratification profiles above, we construct a third profile by combining the two profiles with the following expression:

$$\begin{aligned} \langle \rho ^* \rangle (z^*,t=0) = \rho ^*_0 - \frac{\Delta \rho ^*}{2} \left[ \frac{Rz^*}{\delta ^*} +(1-R) \mathrm {tanh}\left( \frac{z^*}{0.5\delta ^*}\right) \right] . \, \end{aligned}$$
(4)

Here, the density profile consists of two terms: a linear and a hyperbolic tangent term that are bridged together through the parameter R. The parameter R with values ranging between 0 and 1 is used to vary the density gradient distribution across the shear layers. The density has a two-layer hyperbolic tangent profile when R = 0, and it has uniform gradient when R = 1. For values of R between the two extremes, the density profile is a composite of both profiles.

Using \(\delta ^*\), \(\Delta U^*\), and \(\Delta \rho ^*\) as the characteristic length, velocity and density scales, respectively, we use a direct numerical simulation (DNS) method to solve the three-dimensional Navier-Stokes equations with the Boussinesq approximation for the evolution of the non-dimensional velocity components (uvw), density deviation (\(\tilde{\rho }\)) from the initial background profile (\(\rho _{bg} = \rho (z,t = 0) - \rho _0\)) and dynamic pressure (p) in Cartesian coordinates (x, y, z) as follows:

$$\begin{aligned}&\frac{\partial u_j}{\partial x_j} = 0 , \end{aligned}$$
(5a)
$$\begin{aligned}&\frac{\partial u_i}{\partial t} + \frac{\partial (u_ju_i)}{\partial x_j} = -\frac{\partial p}{\partial x_i} +\frac{1}{Re}\frac{\partial ^2 u_i}{\partial x_j \partial x_j} - Ri_{b,0}\tilde{\rho } g_i , \end{aligned}$$
(5b)
$$\begin{aligned}&\frac{\partial \tilde{\rho }}{\partial t} + \frac{\partial (u_j \tilde{\rho })}{\partial x_j} = \frac{1}{Re\, Pr} \left( \frac{\partial ^2 \tilde{\rho }}{\partial x_j \partial x_j} + \frac{d^2 \rho _{bg}}{dz^2}\right) - w\frac{d \rho _{bg}}{dz} . \end{aligned}$$
(5c)

The nondimensional parameters are initial Reynolds number (Re), initial bulk Richardson number (\(Ri_{b,0}\)), and Prandtl number (Pr) which are defined as follows:

$$\begin{aligned} Re = \frac{\varDelta U^* \delta ^*}{\nu ^*} \, , \quad Ri_{b,0} = \frac{\varDelta \rho ^* g^* \delta ^*}{\rho ^*_0 \varDelta U^{*2}} \, , \quad Pr = \frac{\nu ^*}{\kappa ^*}\, . \end{aligned}$$
(6)

Here, \(\nu ^*\) and \(\kappa ^*\) are the kinematic viscosity and thermal diffusivity, respectively. We set \(Re = 24{,}000\), \(Ri_{b,0} = 0.16\) and \(Pr = 1\) for all simulations in the present study. The Reynolds number is sufficiently large for the secondary instabilities to develop in both the linear and hyperbolic tangent density profiles as shown in Mashayek et al. [13] and VanDine et al. [28], respectively. The initial bulk Richardson number of \(Ri_{b,0} = 0.16\) allows us to compare our results with the two previous studies.

Fig. 1
figure 1

Initial profiles of stratified shear layers with different density gradient distribution parameter (R): a mean squared shear rate (\(S^2\)), b background density (\(\rho _{bg}\)), c stratification (\(N^2\)), and d gradient Richardson number (\(Ri_g\)). The density difference across twice of the shear layer thickness measured by \(\pm I_u\) is similar among the cases yielding the same initial bulk Richardson number \(Ri_{b,0}\). Here, R = 0 corresponds to the two-layer density profile and R = 1 to uniform stratification

Three simulations are performed with the density gradient distribution parameter R equal to 0, 0.5 and 1. The two-layer case with R = 0 has been considered by Mashayek et al. [13], and we perform the simulation again in order to contrast with other cases. The uniform stratification case R = 1 is taken from VanDine et al. [28]. Figure 1 shows the initial profiles of squared shear rate (\(S^2\) where \(S = \partial \langle u\rangle / \partial z\)), background density (\(\rho _{bg}\)), stratification (\(N^2 = -g/\rho _0 \partial \langle \rho \rangle / \partial z\)) and gradient Richardson number (\(Ri_g = N^2/S^2\)) where the angle brackets denote a horizontal average over x-y planes. While the initial bulk Richardson number is the same in the three cases, the minimum gradient Richardson number value (\(Ri_{min}\)) and the shear layer averaged Richardson number (\(\overline{Ri_0} = \overline{N^2}/\overline{S^2}\) where overbar denotes averaging over \(-1 \le z \le 1\)) are different at t = 0 as listed in Table 1. Increasing R values results in the decrease of both \(Ri_{min}\) and \(\overline{Ri}\) at initial time noting the following relationship: \(Ri_{min} = Ri_{b,0}\left( 1-0.5R\right)\). It is the objective of the present study to explore which of the three Richardson numbers is best correlated with the evolution of turbulence in the stratified shear layers.

Table 1 Parameters used in the simulations: density gradient distribution parameter (R), initial bulk Richardson number (\(Ri_{b,0}\)), initial minimum gradient Richardson number \(Ri_{min}\), initial shear layer averaged Richardson number (\(\overline{Ri_0}\)), growth rate (\(\sigma\)) and wavenumber (\(k_0\)) of the fastest growing modes (FGM), number of grid points (\(N_x\), \(N_y\), \(N_z\)), grid spacing in the shear layer (\(\varDelta\)), bulk mixing efficiency (representative of the entire evolution) computed over the momentum thickness (\(E_{Iu}\)) and that computed over the shear layer thickness (\(E_{sl}\))

2.2 Numerical methods

The simulations are performed using the same DNS solver as previously used in Pham et al. [20] and Brucker and Sarkar [1]. Second-order central differences are used for spatial derivatives and a third-order low-storage Runge–Kutta method is used for time advancement in Eq. (5). The dynamic pressure is solved using a multi-grid Poisson solver. In order to set up the computational domain, linear stability analysis (LSA) is performed to identify the fastest growing modes (FGM) in each case. Results from the LSA are discussed in detail in the next section. In all simulations, the domain consists of two wavelengths of the FGM in the streamwise (x) direction and one wavelength in the spanwise (y) direction. The vertical (z) extent spans the region \(-16 \le z \le 16\). The grid size and grid spacing are listed in Table 1. The grid spacing (\(\varDelta x = \varDelta y = \varDelta z = \varDelta\)) is uniform in the middle region of the domain (\(-2.2 \le z \le 2.2\) in the R = 0 and 0.5 cases and \(-2 \le z \le 2\) in the R = 1 case). The vertical grid spacing is mildly stretched outside this region at a maximum rate of 3%. The fine grid resolution in the middle region is sufficient to resolve at least 2.2 Kolmogorov length scale associated with the turbulence developed inside the shear layer. Periodic boundary conditions in the horizontal directions are used for all independent variables. Free slip conditions are used for the velocity components while the pressure and the density deviation satisfy the homogeneous Neumann condition at the top and bottom boundaries. Sponge layers with a thickness of 5 are implemented at the top and bottom boundaries to prevent the reflection of internal waves.

In order to promote the growth of K–H instabilities, velocity perturbations are added to the three velocity components. The perturbations have a broadband energy spectrum given by \(E(k) \propto k^4 \mathrm {exp}\left[ -2 \left( k/k_0\right) ^2\right]\) where the spectrum is set to peak at the wavenumber \(k_0\) of the FGM as listed in Table 1. The fluctuations have the peak r.ms. value of 0.1% \(\Delta U\) at the center of the shear layer and decay with distance away from the center as \(\exp {\left[ -\left( z/\delta \right) ^2\right] }\).

2.3 Statistical analysis

Mean quantities denoted by angle brackets are obtained as x-y horizontal averages and the fluctuations denoted by primes are deviation from the mean. We diagnose the evolution of the turbulent kinetic energy (TKE) to compare the results from different simulations. The TKE budgets are computed using the following evolution equation:

$$\begin{aligned} \frac{\mathrm {D} K}{\mathrm {D} t} = \mathrm {P} - \varepsilon + \mathrm {B} - \frac{\mathrm {\partial } \mathrm {T}_3}{\mathrm {\partial } z} \, , \end{aligned}$$
(7)

with turbulent kinetic energy (\(\mathrm {K}\)), production (\(\mathrm {P}\)), dissipation (\(\varepsilon\)), buoyancy flux (\(\mathrm {B}\)), and transport term (\(\mathrm {T}_3\)) specified as

$$\begin{aligned}&\mathrm {K} = \frac{1}{2} \left( \langle u'\rangle ^2 + \langle v'\rangle ^2 + \langle w'\rangle ^2\right) , \nonumber \\&\mathrm {P} = -\langle u'w' \rangle \frac{\mathrm {d} \langle u \rangle }{ \mathrm {d} z}, \quad \varepsilon = \frac{2}{Re} \langle s_{ij}' s_{ij}' \rangle , \quad s_{ij}' = \frac{1}{2} \left( \frac{\partial u_i'}{\partial x_j} + \frac{\partial u_j'}{\partial x_i} \right) , \nonumber \\&\mathrm {B} = -Ri_{b,0} \langle \rho ' w' \rangle , \quad \mathrm {T}_3 = \frac{1}{2} \langle w' u_i' u_i' \rangle + \frac{1}{\rho _0} \langle w'p' \rangle - \frac{2}{Re} \langle u_i' s_{3i}' \rangle . \end{aligned}$$
(8)

We also compute the scalar dissipation rate \(\chi\) defined as follows:

$$\begin{aligned} \chi = \frac{1}{RePr} \left\langle \frac{\partial \rho '}{\partial x_i} \frac{\partial \rho '}{\partial x_i}\right\rangle \end{aligned}$$
(9)

to quantify the amount of turbulent mixing.

3 Evolution of the stratified shear layer

Previous studies indicate that the stratified shear layer evolves through three stages: (1) the growth of primary K–H instabilities; (2) development of the secondary instabilities that trigger the transition to turbulence; and (3) fully-developed turbulence and its decay until relaminarization of the shear layer. In this section, we explore the role of the density gradient distribution parameter R during each stage of the evolution.

Fig. 2
figure 2

Variability of the growth rate (\(\sigma\)) of the FGM as a function of: a initial bulk Richardson number (\(Ri_{b,0}\)) and density gradient distribution parameter (R); and b wavenumber (k) and density gradient distribution parameter (R) at \(Ri_{b,0}\) = 0.16. The blue markers denote the parameters used in the present simulations while the black and red markers denote the simulations with R = 0 in Mashayek et al. [13] and with R = 1 in VanDine et al. [28], respectively. The magenta line in a marks \(Ri_{min} = 0.25\) while the dashed lines denote curves of constant growth rate

3.1 The development of Kelvin–Helmholtz instability

We perform linear stability analysis to compare the growth rate (\(\sigma\)) of the fastest growing modes (FGM) of K–H instabilities among the cases. The LSA uses a finite-difference method with a grid spacing of \(\Delta z\) = 0.0025 to solve for the FGM as illustrated in Smyth et al. [27]. The growth rate in the R-\(Ri_{b,0}\) parameter space shown in Fig. 2a indicates that the \(Ri_{min} < 0.25\) criterion is required, independent of the R value [3, 6]. The growth rate diminishes when \(Ri_{b,0}\) and R approach the curve of \(Ri_{min} = 0.25\) (shown in magenta). It is important to point out that, for R = 1, the shear layer is still unstable to K–H instabilities as long as \(Ri_{b,0} < 0.5\). The growth rate varies little with R when \(Ri_{b,0}\) is fixed at a small value, i.e. for weakly stratified shear layers. As \(Ri_{b,0}\) increases, the growth rate increases with increasing R value. At a fixed value of \(Ri_{b,0} = 0.16\), the growth rate in the linear case (R = 1) is 50% larger than the value in the hyperbolic tangent case (R = 0) as listed in Table 1. The growth rate in the composite case (R = 0.5) is closer to the value in the R = 1 case indicating the growth rate does not vary linearly with R at a fixed value of \(Ri_{b,0}\). Noting that \(Ri_{min}\) decreases linearly with increasing R when \(Ri_{b,0}\) is fixed, it follows that the growth rate increases as R increases in the present simulations.

Figure 2b reveals that the wavenumber of the FGM increases with increasing R when \(Ri_{b,0} = 0.16\), and thus, the wavelength becomes shorter. The wavelength is approximately 17% longer in the R = 0 case when compared to the R = 1 case. Previous studies have shown that the wavelength becomes shorter when the \(Ri_{b,0}\) (and \(Ri_{min}\)) increases for either the R = 0 or R = 1 density profiles [10, 28]. In the present study, we keep \(Ri_{b,0}\) to be the same among the cases with different R values and the wavelength becomes shorter with increasing R. Thus, the wavelength of the FGM is correlated better with \(Ri_{min}\) rather than \(Ri_{b,0}\). Increasing \(Ri_{min}\) causes the growth rate to become smaller and the wavelength to become shorter for the K–H modes at different R values. We note that the evolution of K–H instabilities at high Reynolds number is sensitive to the streamwise length (\(L_x\)) of the computational domain of the DNS especially when the domain is limited to one or two wavelengths of the most unstable modes. When \(L_x\) is not set up in multiples of the most unstable modes, the mode that actually develops in the simulation has a smaller growth rate and thus it can influence the subsequent nonlinear evolution of the shear layer.

Fig. 3
figure 3

Evolution of the turbulent shear layer is shown using the density (left column), spanwise vorticity (middle column) and dissipation rate (right column) in the vertical plane at \(y = L_y/2\) in the \(R = 0.5\) case during a the development of primary K–H and secondary instabilities; b the transition to 3D turbulence and c the decay of turbulence and the relaminarization of the shear layer

3.2 Nonlinear evolution of Kelvin–Helmholtz instabilities and secondary instabilities

In general, the development of K–H instabilities and subsequent secondary instabilities are similar among the simulated cases. The evolution and type of the secondary instabilities (e.g, secondary shear instability, secondary convective instability, secondary core deformation instability, localized core vortex instability, etc.) have been discussed in detail for the R = 0 (e.g., Mashayek et al. [13]; Salehipour et al. [21]) and R = 1 (e.g., VanDine et al. [28]) cases in many previous studies. For brevity, we refrain from repeating the discussion on the type of secondary instabilities and instead provide Fig. 3 to show the evolution of the shear layer in the R = 0.5 case through snapshots of density, spanwise vorticity and turbulent dissipation on the \(y = L_y/2\) plane. Secondary instabilities realized by density overturns with enhanced localized vorticity and viscous dissipation develop on the upper eyelid of the left billow in Fig. 3a. The secondary instabilities also develop at the right end of the two braids connecting the two K–H billows. The upper and lower eyelids of the K–H billows are subsequently broken down by the secondary instabilities and the resulting turbulent eddies spread into the cores of the billows as well as in the streamwise direction from the billows into the braid regions. The dissipation is most intense during this period as shown in Fig. 3b and the turbulence field becomes more homogenous throughout the shear layer. After the transition to turbulence, Fig. 3c shows that the shear layer continues to grow vertically by entraining fluid from the ambient despite having weaker vorticity and dissipation relative to just after the transition. The dissipation induced by the eddies at the edges of the shear layer in Fig. 3c takes values comparable to and larger than those at the center of the shear layer.

Fig. 4
figure 4

Evolution of a the momentum thickness (\(I_u\)) and b the shear layer thickness (\(L_{sl}\)). The late-time values of \(I_u\) are similar among the three cases; however, the \(L_{sl}\) is slightly smaller in the Ri = 1 case. In all cases, the values of \(L_{sl}\) at late time is at least 40% larger than \(I_u\)

The shear layer grows differently in the three cases as shown in Fig. 4. Here, we compare the thickness of the shear layer using two measures: the momentum thickness (\(I_u=\int _{L_z} (1-4\left<u\right>^2) \mathrm {d}z\,\)) and the shear layer thickness (\(L_{sl}\)) defined as the region between the edges of the shear layer where the stratification \(N^2\) deviates from its initial background value by 5% of \(Ri_{b,0}\). The two measures are shown in Fig. 4a, b, respectively. In all cases, both \(I_u\) and \(L_{sl}\) exhibit two distinct periods of growth: the first due to the enlargement of K–H billows and the second due to turbulent entrainment. Since the growth rate of the FGM is largest in the R = 1 case, the K–H billows grow first as indicated by the growth of \(I_u\) during \(40< t < 65\) in Fig. 4a. Furthermore, the billows grow fastest in this case. The R = 0 case with the smallest growth rate has the K–H billows form at a later time and the billows have the smallest size as indicated by the values of \(I_u\) at t = 90 before the turbulent entrainment period begins. The evolution of \(L_{sl}\) in Fig. 4b also shows a similar trend. It is noted that, in the present study where \(Ri_{b,0}\) is fixed at 0.16 in all cases, the maximum size of the K–H billows is correlated with the growth rate of the FGM and thus the values of \(Ri_{min}\), i.e., the smaller \(Ri_{min}\) produces the larger K–H billows in this study. However, the trend of growth rate reverses during the turbulent entrainment period. The \(I_u\) thickness grows fastest in the R = 0 case because the turbulence entrains fluid from an unstratified ambient. In contrast, since the ambient stratification is strongest in the \(R = 1\) case, the turbulent entrainment rate is smallest. It is interesting that, despite the different growth rates during the two growth periods, the late time values of \(I_u\) are not very different among these cases which have the same \(Ri_{b,0}\). The evolution of \(L_{sl}\) also shows the fastest growth rate in the R = 0 case during the turbulent entrainment period. At late time, \(L_{sl}\) is approximately equal in the R = 0 and 0.5 cases and it is slightly smaller in the R = 1 case. Thus, \(Ri_{b,0}\) is a more reliable indicator for the final values of both measures of the shear layer thickness when compared to \(Ri_{min}\) as the R values vary.

Fig. 5
figure 5

Comparison of the normalized stratification (\(N^2/Ri_{b,0}\)) among the simulated cases: a R = 0; b R = 0.5; and c R = 1. The presence of stratification in the ambient induces an overshoot in the stratification in the transition layer in the two latter cases but not in the R = 0 case. After the transition to turbulence, a layer of enhanced stratification reappears in the center of the shear layer in the R = 0 and 0.5 cases but not in the R = 1 case. The right panels show \(N^2\) profiles during the turbulence decay period at t = 200. In this figure and hereafter, black and magenta dashed lines mark the shear layer boundaries as indicated by the momentum thickness (\(\pm I_u/2\)) and the shear layer thickness (\(L_{sl}\)), respectively

3.3 Fully-developed stratified turbulence and the relaminarization of the shear layer

We observe significant differences among the cases as the shear layer becomes turbulent specifically in terms of the vertical distribution of shear, stratification and gradient Richardson number. Figure 5 illustrates how the evolution of \(N^2\) profiles differs in cases with different R values. The most significant difference is the absence of transition layer in the R = 0 case. For R = 1 density profiles, VanDine et al. [28] note the development of a transition layer (TL), which is defined as a thin layer in which the local \(N^2\) is significantly enhanced when compared to the values inside the shear layer. There is a TL at each edge of the shear layer where \(N^2\) is as large as 2.2 times the background \(N^2\) value in the ambient at \(Ri_{b,0} = 0.08\) in VanDine et al. [28]. In the present study, the TL only develop in the cases with ambient stratification (R = 0.5 and 1) as depicted in Fig. 5b, c. The stratification in the \(R = 0\) case is enhanced at the edges of the shear layer when compared to the unstratified ambient; however, unlike Lewin and Caulfield [10], the local \(N^2\) is not larger than the values inside the shear layer. Thus, a distinctive transition layer does not form in the two-layer stratification. We note that Watanabe and Nagata [29] also observe the peak \(N^2\) to occur near the center of the shear layer at late time rather than in the TL in their DNS of a shear layer with R = 0 as in the present study. They suggest that the absence of the TL in the R = 0 case can be caused by many factors such as the amplitude and spectral content of the broadband initial velocity perturbations as well as the domain size. Regardless to whether the TL form, it is important to delineate the boundaries of the shear layer. The upper and lower black dashed lines in Fig. 5 mark the momentum thickness of the shear shear layer (\(\pm I_u/2\)) while the upper and lower magenta dashed lines mark the edge of the shear layer (\(L_{sl}\)). It is evident that \(\pm I_u/2\) does not capture the TL in the R = 0.5 and 1 cases. There is also significant enhancement of stratification outside \(\pm I_u/2\) in the R = 0 case. Therefore, unlike \(I_u\), \(L_{sl}\) is able to capture reliably the edges of the shear layer outside which the stratification is indeed equal to the ambient stratification.

Fig. 6
figure 6

Comparison of the normalized squared shear (\(S^2/S_0^2\)) among the simulated cases: a R = 0; b R = 0.5; and c R = 1. A layer of enhanced shear develops at the center of the shear layer in the R = 0 and 0.5 cases after the transition to turbulence but it does not appear in the R = 1 case. The shear enhancement near the transition layer becomes more prominent as the external stratification increases (i.e., increasing R value). The right panels show \(S^2\) profile during the turbulence decay period at t = 200

The evolution of the squared shear profiles (\(S^2\)) also shows different layering structures among the cases as shown in Fig. 6. The enhanced shear is well correlated with the enhanced \(N^2\) (Fig. 5) indicating that the transport of momentum and density by turbulent eddies occur together. The enhanced shear is most notable in the TL of the R = 1 case in which the \(S^2\) profile shows a double peak structure at t = 200 after the transition to turbulence.

Fig. 7
figure 7

Comparison of the gradient Richardson number (\(Ri_g\)) among the simulated cases: a R = 0; b R = 0.5; and c R = 1. After the transition to turbulence, the three cases show multiple layers with alternating small and large values of \(Ri_g\). The variability across the shear layer is most profound in the Ri = 0.5 case while the \(Ri_g\) in the R = 1 case is more homogenous

Unlike the profiles of \(N^2\) and \(S^2\) in which the layered structures are most profound in the R = 0 and 1 cases, the evolution of the gradient Richard number (\(Ri_g\)) in Fig. 7 shows the strongest layering in the R = 0.5 case. During the time period between t = 140 and 200, the values of \(Ri_g\) span a wide range from 0.25 to values larger than 1.25 across the shear layer. The central region of the shear layer has low values between 0.25 and 0.5 due the compensation of enhanced stratification by enhanced shear. Immediately above and below this low-\(Ri_g\) region, \(Ri_g\) increases rapidly up to values larger than 1.25. The large values are due to the notably small values of \(S^2\) in these regions as shown in Fig. 6b. Moving outward toward the edges of the shear layer marked by \(\pm I_u/2\), \(Ri_g\) decreases due to the enhanced shear in the transition layer. Among the three cases, the R = 1 case shows the least vertical variability of \(Ri_g\) after the transition to turbulence. At late time (t = 200), the \(Ri_g\) profile in the Ri = 1 case is more homogenous across the shear layer when compared to the other cases. Furthermore, the R = 0 case shows smaller values of \(Ri_g\) at late time on average across the shear layer.

4 Turbulence and the parameterization of mixing

In the previous section, we have illustrated similarities as well as structural differences among the cases with different R values but same \(Ri_{b,0}\). In particular, we find the late-time values of the momentum thickness and shear layer thickness to be similar while the profiles of the \(N^2\), \(S^2\) and \(Ri_g\) show considerable differences after the transition-to-turbulence period. In this section we diagnose the TKE budgets and the rate of mixing and also explore whether there is any difference in regard to mixing parameterization.

Fig. 8
figure 8

Evolution of the integrated TKE budget: a TKE production (P); b buoyancy flux (B); and c TKE dissipation (\(\varepsilon\)). The scalar dissipation (\(\chi\)) is shown in d. The integration is taken in the region \(-5 \le z \le 5\)

4.1 TKE budgets

Figure 8 compares the terms in the integrated TKE budgets. The integration is carried over the region \(-5 \le z \le 5\). The integrated production shows double peaks in all three cases. The first peak corresponds to the development of the K–H billows while the second peak corresponds to turbulent entrainment after the shear layer transitions from K–H billows to turbulence. The first production peak (largest in the R = 1 case) is correlated with the growth rate of the FGM of the K–H instabilities and the growth rate of \(I_u\) during this period. In contrast, the second peak shows the highest value in the R = 0 case and the peak values decrease with increasing R. The smaller peak value during the turbulent entrainment period is due to the presence of the external stratification as R increases. Between the two peaks, the production in the R = 1 case shows a short period with negative values. The negative production is the result of counter-gradient momentum flux during the contraction period as has been discussed in detail by VanDine et al. [28]. The integrated buoyancy flux shown in Fig. 8b also indicates double negative peaks with a positive peak in-between for all three cases similar to the integrated production but with opposite sign. The negative values of the first peak increase in magnitude as the growth rate of the FGM and the value of R increase. For R = 1, the transfer from TKE to turbulent potential energy (TPE), i.e. \(B < 0\), is strong in the pre-turbulent period and weaker in the latter turbulent period. The opposite trend is noted for R = 0.

The integrated TKE dissipation (\(\varepsilon\)) and scalar dissipation (\(\chi\)) in Fig. 8c, d reach their peak values when the shear layer exhibits fully-developed turbulence. The peak values of \(\varepsilon\) and \(\chi\) are largest in the R = 0.5 case while the R = 1 case shows the smallest values unlike the integrated production or buoyancy flux. Thus, while the peak value of the production is correlated with the R value, the peak TKE and scalar dissipation are not correlated with R.

Fig. 9
figure 9

Comparison of the integrated bulk values of a TKE production; b buoyancy flux; c TKE dissipation and d scalar dissipation among the simulated cases. The integration is carried out in space and in time. Results from different domains of spatial integration are shown: in the region \(-5 \le z \le 5\), across the momentum thickness (\(\pm I_u/2\)), and across the shear layer thickness (\(L_{sl}\))

We quantify the total amount of turbulent mixing by computing the bulk evolution of the TKE budget terms over the entire simulation. We integrate the TKE budget terms shown in Fig. 8 in time and compare the results of all cases in Fig. 9. In the figure, we also include the results from the vertical integration over the momentum thickness (\(\pm I_u/2\)), shear layer thickness (\(L_{sl}\)), and integration over \(-5 \le z \le 5\) region. Overall, all terms show a peak value in the R = 0.5 case. The differences when R varies from 0 to 1 are approximately 10-15%. The bulk energetic terms are considerably different when comparing results from the integration over \(\pm I_u/2\) with the integration over \(L_{sl}\) due to a significant amount of turbulence generated outside \(\pm I_u/2\). When comparing the results from the integration over \(L_{sl}\) to that from the integration over \(-5 \le z \le 5\), all of the terms are well captured except for the buoyancy flux in the R = 0.5 and 1 cases. The difference in the buoyancy flux is due to the presence of turbulence-generated internal waves in the regions outside \(L_{sl}\).

Fig. 10
figure 10

Accumulation of TKE and scalar budget terms inside the transition layers: a production; b buoyancy flux; c TKE dissipation; and d scalar dissipation. The terms are plotted in percentage of the amount accumulated inside the region bounded by ± I\(_u\)/2

It is important to highlight the significant amount of TKE present inside the TL. Figure 10 shows the accumulation of TKE and scalar budget terms inside the TL normalized by the corresponding amount inside the regions bounded by \(\pm I_u/2\). In all cases, the dissipation inside the TL is between 20 to 30% of the amount inside the \(\pm I_u/2\) region. Lewin and Caulfield [10] also observed a large amount of dissipation inside the TL for both the R = 0 and 1 flow configurations. According to Fig. 10, the accumulation inside the TL varies with R which suggests the turbulence inside the TL is influenced by the local shear and stratification. We recall that the turbulent entrainment also varies with R as indicated by the the growth of \(L_{sl}\) in Fig. 4b. Therefore, including dynamics at the edges of the shear layer is important to the quantification of the turbulent budgets independent of the type of density profile (i.e, independent of the precise value of R).

Fig. 11
figure 11

Evolution of depth-averaged a stratification (\(\overline{N^2}\)); b squared shear (\(\overline{S^2}\)); c Richardson number (\(\overline{Ri}\)) and d mixing efficiency (\(\overline{E}\)). The quantities are averaged across the shear layer thickness (\(L_{sl}\))

4.2 Mixing efficiency and its parameterization

While the turbulence evolution as indicated by the time evolution of terms in the integrated TKE budgets as well as the profiles of \(N^2(z)\) and \(S^2(z)\) are different in the simulated cases, it remains in question whether the different initial density profiles can affect the mixing efficiency and its parameterization. The mixing efficiency is a measure of the rate of mixing wherein the change in background potential energy is compared to the total change in both the background potential energy and kinetic energy. The processes of computing the available and background potential energy require sorting of the 3D density fields. Instead, we follow the method of Scotti and White [23] to quantify the mixing efficiency \(\overline{E} = \overline{\varepsilon }_\rho / (\overline{\varepsilon } + \overline{\varepsilon }_\rho )\) where the shear-layer-averaged values of the TKE dissipation and TPE dissipation are defined as follows:

$$\begin{aligned} \overline{\varepsilon } = \frac{1}{L_{sl}} \int _{L_{sl}} \varepsilon \, dz \quad \quad \text {and} \quad \quad \overline{\varepsilon }_\rho = \frac{g^2}{\rho _0^2 \overline{N^2}}\frac{1}{L_{sl}} \int _{L_{sl}} \chi \, dz. \end{aligned}$$
(10)

Here, the overbar indicates vertical averaging over the region marked by \(L_{sl}\) and \(\overline{N^2}\) is the stratification averaged over the shear layer. It is noted that Vandine et al. (2021) use the constant initial stratification across the shear layer to compute \(\overline{\varepsilon }_\rho\) instead of the time-varying \(\overline{N^2}\) for their R = 1 simulations. Such an approximation cannot be used in the R = 0 and 0.5 cases in the present study since the density gradient is non-uniform and its value changes considerably in time as shown in Fig. 11a. The value of \(\overline{N^2}\) decreases approximately by a factor of 3 in the R = 0 case while the value varies little in the R = 1 case. The reason for the little variation of \(\overline{N^2}\) despite having significant turbulence and reduction of \(N^2\) from its initial value in the core of the shear layer in the R = 1 case is due to the large values of stratification in the TL. Averaging the stratification over \(\pm I_u/2\), which excludes the TL, shows significant reduction in \(\overline{N^2}\) (not shown). In the R = 0 and 0.5 cases, the stratification at the edges of the shear layer is relatively smaller than at the central region of the shear layer so that the contribution from the transition layer is less significant. Due to the decrease of \(\overline{N^2}\) over time in the R = 0 and 0.5 cases, we will use the time-varying \(\overline{N^2}\) to compute \(\overline{\varepsilon }_\rho\) instead of the constant value at the initial time \(\overline{N_0^2}\). Lewin and Caulfield [10] show that the use of Eq. (10) with \(\overline{N^2}\) in the denominator leads to a satisfactory agreement with mixing rate using density sorting.

The shear-layer average of the squared shear rate (\(\overline{S^2}\)) shown in Fig. 11b exhibits less difference among the cases than \(\overline{N^2}\). The three cases show nearly equal values of \(\overline{S^2}\) at late time. The different values of \(\overline{N^2}\) among the cases result in a wide difference in the late-time value of the shear layer averaged Richardson number (\(\overline{Ri} = \overline{N^2}/\overline{S^2}\)) as shown in Fig. 11c. The late-time value is approximately equal to 0.6 in the R = 0 case while it is nearly equal to 1 in the R = 0.5 and 1 cases. The late-time values of \(\overline{Ri}\) in the R = 0 case agree with the result in Mashayek et al. [13] while the larger values in the R = 0.5 and 1 cases fall in line with the results of VanDine et al. [28]. The addition of uniform stratification into the background causes the late-time values of \(\overline{Ri}\) to increase. When we average the shear and stratification only over \(\pm I_u/2\), the late-time values of \(\overline{Ri}\) are approximately equal to 0.85 and 0.8 in the R = 0.5 and 1 cases, respectively (not shown).

The mixing efficiency \(\overline{E}(t)\) shown in Fig. 11d exhibits some differences among the cases. During the development of the K–H billows, \(\overline{E}\) increases similarly to reach a comparable peak value of approximately 0.7 in all cases. After that, \(\overline{E}\) decreases rapidly during the transition-to-turbulence period. During the turbulence decay period, \(\overline{E}\) gradually decreases toward the value of 0.2. We note that the difference in \(\overline{E}\) among the cases is smaller when we perform the integration over \(\pm I_u/2\) (not shown).

The values of bulk mixing efficiency (\(E_{sl}\) and \(E_{Iu}\)) are listed in Table 1. Here, \(E_{sl}\) is obtained as follows:

$$\begin{aligned} E_{sl} = \frac{\int \overline{\varepsilon }_\rho \, dt }{\int \overline{\varepsilon } + \overline{\varepsilon }_\rho \, dt}, \end{aligned}$$
(11)

where \(\overline{\varepsilon }\) and \(\overline{\varepsilon }_\rho\) are the shear-layer-averaged values as defined in Eq. (10) and the time integration is over the entire evolution of the shear layer. For comparison, we also compute the bulk mixing efficiency (\(E_{Iu}\)) using the depth averaging over \(\pm I_u/2\), i.e. replace \(L_{sl}\) with \(\pm I_u/2\) and take the overbars in Eqs. (10) and (11) to be the average over \(\pm I_u/2\) instead of \(L_{sl}\). The values of \(E_{Iu}\) are approximately equal to 0.36 in all the cases. However, the values of \(E_{sl}\) decrease with increasing R. The R = 0 case has the highest value of 0.42 and the R = 1 case has the lowest value of 0.31. Increasing R from 0 to 1 reduces the bulk mixing efficiency by nearly 25%.

Fig. 12
figure 12

ac Dependence of flux coefficient (\(\Gamma\)) on ac depth-averaged buoyancy Reynolds number (\(\overline{Re}_b\)) and df depth-averaged Richardson number (\(\overline{Ri}\)) during the turbulence decay period in the three simulated cases. Quantities that are averaged across the momentum thickness (\(\pm I_u/2\)) are shown by blue markers while those averaged across the shear layer thickness (\(L_{sl}\)) are shown by red markers. Dashed black lines in panels (a-c) denotes the \(\Gamma = 0.2\) parameterization in Osborn [18]. Solid black lines denotes the upper bound of the parameterization suggested by Mashayek et al. [15]

An important application of mixing efficiency is its wide use to estimate turbulent thermal diffusivity (\(K_\rho\)) in oceanic flows. From the mixing efficiency, one can use Osborn model to estimate \(K_\rho\) from the relationship: \(K_\rho\) = \(\Gamma \varepsilon / N^2\) where \(\Gamma\) = E/(1-E) is the flux coefficient. The typically used value for \(\Gamma\) is 0.2 which is the upper bound value suggested by Osborn [18]. Results from numerous DNS studies indicate that \(\Gamma\) should not be taken as a constant. Shih et al. [24] use DNS of homogenous stratified shear turbulence to suggest that \(\Gamma\) should vary with buoyancy Reynolds number (\(\overline{Re}_b = \overline{\varepsilon} /\nu \overline{N^2}\)) in three different regimes: \(\Gamma\) = \(2 \overline{Re}_b^{-1/2}\) in an energetic regime (\(\overline{Re}_b > 100\)), \(\Gamma\) = 0.2 in an intermediate regime (\(7 \le \overline{Re}_b \le 100\)) and a diffusive regime (\(\overline{Re}_b < 7\)) in which \(K_\rho\) is equal to the molecular value. More recently, Mashayek et al. [15] use DNS of stratified shear layers with R = 0 to suggest two regimes: a buoyancy-dominated regime in which \(\Gamma\) \(\propto \overline{Re}_b^{1/2}\) and a shear-dominated regime where \(\Gamma\) \(\propto \overline{Re}_b^{-1/2}\) and the two regimes are separated by a transitional value of \(\overline{Re}_b\) ranging between 100 and 300. However, VanDine et al. [28] simulate stratified shear layers with R = 1 and show that the \(\overline{Re}_b\)-dependence is different from what has been suggested by Mashayek et al. [15]. They conclude that \(\overline{Re}_b\)-dependence of \(\Gamma\) is more aligned with what has been suggested by Shih et al. [24].

Figure 12a–c show the relationship between \(\Gamma\) and \(\overline{Re}_b\) in the three cases with different R values in the present study. We note that the figure only shows the time period after the turbulence in the shear layer has fully developed (i.e, after the integrated dissipation in Fig. 8c has peaked). Results from averaging \(\Gamma\) and \(\overline{Re}_b\) over \(\pm I_u/2\) and \(L_{sl}\) are both included in the figure for comparison. The \(\Gamma\)-\(\overline{Re}_b\) relationship using averaging over \(L_{sl}\) in the R = 1 case exhibits similar trends as in Shih et al. [24] particularly in the following aspects. First, \(\Gamma\) decreases with increasing \(\overline{Re}_b\) when \(\overline{Re}_b\) exceeds values of approximately 100. Second, \(\Gamma\) remains relatively constant for values of \(\overline{Re}_b\) between 10 and 100. Finally, \(\Gamma\) gradually decreases when \(\overline{Re}_b\) decreases below 10. In the R = 0 and 0.5 cases, the \(\Gamma\)-\(\overline{Re}_b\) relationship indicates good agreement with Mashayek et al. [15] for the range of \(\overline{Re}_b\) between 100 and 300. However, \(\Gamma\) decreases at a considerably slower rate than what has been suggested by Mashayek et al. [15] when \(\overline{Re}_b\) decreases to less than 100. The difference between our result in the R = 0 case and what has been suggested by Mashayek et al. [15] is possibly due to the different methods employed to compute \(\Gamma\). Mashayek et al. [15] perform 3D sorting of the density field to compute the change in background potential energy while we use the approximation of potential energy mixing by \(\overline{\varepsilon }_\rho\) [23]. Changing the averaging method from \(L_{sl}\) to \(\pm I_u/2\) does not affect \(\Gamma\)-\(\overline{Re}_b\) relationship in the \(\overline{Re}_b < 100\) range as shown in Fig. 12a–c. Figure 12d, e show the \(\Gamma\)-\(\overline{Ri}\) relationship during the fully-developed turbulence period. In the Ri = 0 case, \(\Gamma\) decreases monotonically at a constant value of \(\overline{Ri} \approx 0.6\), slightly larger than the critical value of 0.5 that Salehipour et al. [22] suggest to shut off turbulent mixing. In the R = 0.5 and 1 cases, \(\Gamma\) remains significantly large even when \(\overline{Ri}\) approaches near unity. The simulations in VanDine et al. [28] show the late-time values of \(\overline{Ri}\) to be much larger than unity.

Fig. 13
figure 13

ac Evolution of Ozmidov and Ellison length scales and df the parameterization of flux coefficient (\(\Gamma\)) using the \(R_{OE}\) ratio between the two length scales. Each panel includes results for both methods of averaging, i.e., across \(L_{sl}\) and \(\pm I_u/2\). Arrows in panel d indicate time progression through three regimes in \(\Gamma\)-\(R_{OE}\) relationship. The dashed lines in df denote a power law fit using the values of \(\Gamma\) and \(R_{OE}\) that are averaged over \(L_{sl}\). The vertical dashed lines in panels ac mark the second regime during which \(R_{OE}\) increases as \(\Gamma\) decreases

Beside the \(\overline{Re}_b\) dependence, Smyth et al. (2001) suggest that \(\Gamma\) can be used to parameterize the thermal diffusivity using the ratio (\(R_{OT} = L_O/L_T\)) between the Ozmidov scale (\(L_O = (\overline{\varepsilon }/\overline{N^2}^{3/2})^{1/2}\)) and the Thorpe scale (\(L_T\)). The motivation behind the parameterization is to relate \(\Gamma\) to the relative evolution of two length scales: one representative of the energy-containing scales like that of the K–H billows and one in the inertial subrange of the TKE spectrum like the turbulent eddies during the transition-to-turbulence period. In ocean applications, the Thorpe scale is taken to be the length scale of large-eddy overturns while the Ozmidov scale provides a good estimate of the inertial subrange. From the scaling argument, if we take \(K_\rho \propto \varDelta U L_T\), then the Osborn model yields:

$$\begin{aligned} \Gamma = \frac{K_\rho \overline{N^2}}{\overline{\varepsilon }} \propto \frac{\varDelta U}{\overline{N^2}^{1/2}}\frac{L_T}{L^2_O} \, , \end{aligned}$$
(12)

in which \(\Gamma\) varies as a function of the mixing length scale \(L_O\) and two energy-containing scales \(L_T\) and \(\varDelta U/\overline{N^2}^{1/2}\). Now, if we further assume \(\overline{\varepsilon } \propto \varDelta U^3/L_O\) which implies \(L_O \propto \varDelta U/\overline{N^2}^{1/2}\), then Eq. (12) suggests:

$$\begin{aligned} \Gamma&\propto (L_O/L_T)^{-1} \propto R_{OT}^{-1}. \end{aligned}$$
(13)

In contrast, if we assume \(\overline{\varepsilon } \propto \varDelta U^3/L_T\) which leads to \(L_O \propto (\varDelta U/\overline{N^2}^{1/2})^{3/2} L_T^{-1/2}\), then Eq. (12) indicates:

$$\begin{aligned} \Gamma&\propto (L_O/L_T)^{-4/3} \propto R_{OT}^{-4/3}. \end{aligned}$$
(14)

Recent analysis using observational data by Ijichi and Hibiya [7] indicates \(\Gamma\) \(\propto R_{OT}^{-4/3}\) while Mashayek et al. [16] suggest a parameterization scheme that involves both of the \(R_{OT}^{-1}\) and \(R_{OT}^{-4/3}\) scalings. Using DNS of stratified shear layers at low Reynolds numbers, Smyth et al. [26], however, suggest \(\Gamma\) \(\propto R_{OT}^{-0.63}\). It is of interest to examine which scaling of \(\Gamma\) (equivalently, \(\overline{\varepsilon }\)) prevails in the present study.

We use the Ellison scale \((L_E = \overline{\left<\rho '^2\right>^{1/2}}/ \overline{\partial \left<\rho \right>/\partial z})\) in place of the Thorpe scale to test the scaling arguments. Figure 13a–c illustrate the evolution of \(L_E\) and \(L_O\) in the simulated cases. It is noted that the length scales are averaged over the shear layer and we include both the averaging over \(\pm I_u/2\) and \(L_{sl}\) in the figure. As the K–H billows develop, \(L_E\) increases drastically and then it decreases when the shear layer transitions into turbulence. While the K–H billows grow to different sizes in the cases with different R as indicated by \(I_u\) and \(L_{sl}\), the peak value of \(L_E\) is similar among the cases for the both methods of averaging. It is interesting that despite the differences in stratification profile, in billow size and in integrated buoyancy flux (e.g., Fig. 8b) among the cases with R = 0, 0.5 and 1 during this period, the billows generate a similar value of maximum Ellison scale in these cases. The peak values of \(L_E\) are significantly different between the two averaging methods. When the averaging is done over \(L_{sl}\), the peak value of \(L_E\) is smaller due to contribution of the enhanced stratification in the TL. The Ozmidov scale begins to grow at a later time when the Ellison scale has decreased. The peak value of \(L_O\) is comparable among the cases when the averaging is over \(\pm I_u/2\). When comparing \(L_O\) results between the two methods of averaging, the R = 0 case shows larger values when averaging over \(L_{sl}\) while the other two cases show smaller values. The reason is that there exists significant dissipation but the stratification is relatively weak in the TL in the R = 0 case. The enhanced stratification in the transition layer in R = 0.5 and 1 cases causes \(L_O\) to decrease.

The relationship between \(\Gamma\) and the ratio between the Ozmidov and Ellison length scale (\(R_{OE}\)) is shown in Fig. 13d, e. Unlike Fig. 12 in which the \(\Gamma\)-\(Re_b\) relationship is only shown during the period of the fully-developed turbulence, the entire evolution of the shear layer is included in Fig. 13. The \(\Gamma\)-\(R_{OE}\) relationship in all simulated cases exhibits three evolutionary regimes. During the first regime when the K–H billows develop, \(R_{OE}\) decreases to its minimum value as \(\Gamma\) increases drastically. As \(R_{OE}\) increases during the second regime (marked by the dashed lines in Fig. 13a–c), \(\Gamma\) decreases monotonically. Finally, \(R_{OE}\) and \(\Gamma\) decreases in the third regime at late time. Figure 13d–f show the correlation between \(\Gamma\) and \(R_{OE}\). A power law fit using the values of \(\Gamma\) and \(R_{OE}\) averaged across the \(L_{sl}\) during the second regime results in \(\Gamma\) = 0.46\(R_{OE}^{-0.66}\) in the R = 0 case, \(\Gamma\) = 0.37\(R_{OE}^{-0.68}\) in the R = 0.5 case, and \(\Gamma\) = 0.3\(R_{OE}^{-0.63}\) in the R = 1 case. When the average is across \(\pm I_u/2\), the results are: \(\Gamma\) = 0.32\(R_{OE}^{-0.85}\), \(\Gamma\) = 0.36\(R_{OE}^{-0.79}\), \(\Gamma\) = 0.42\(R_{OE}^{-0.66}\) for R = 0, 0.5 and 1, respectively. Results in Lewin and Caulfield [10] show the scaling of \(\Gamma\) \(\propto R_{OE}^{-1}\) during the early period of the second regime and the scaling of \(\Gamma\) \(\propto R_{OE}^{-4/3}\) during the later period of the second regime. The different scaling observed in the present study perhaps is due to how the K–H billows in Lewin and Caulfield [10] are allowed to develop first in 2D simulations and then extended into 3D simulations when the billows reach its maximum sizes. In the present study, the K–H billows are allowed to develop in 3D simulations since the beginning and the secondary instabilities are observed to develop before the K–H billows grow to its full size. A different transition to turbulence can result in different \(\Gamma\)-\(R_{OE}\) relationship.

Furthermore, if we assume the behavior of \(R_{OE}\) and \(R_{OT}\) is similar during the second regime as suggested by Lewin and Caulfield [10], then the \(\Gamma\)-\(R_{OE}\) relationship in the present study agrees best with the scaling of \(\Gamma\) = 0.33\(R_{OT}^{-0.63}\) found in Smyth et al. [26]. We do not observe the scaling of either \(\Gamma\) \(\propto R_{OT}^{-1}\) or \(R_{OT}^{-4/3}\) as indicated in Ijichi and Hibiya [7], Mashayek et al. [16], and Lewin and Caulfield [10]. It has been hypothesized that the scaling \(\Gamma\) \(\propto R_{OT}^{-0.63}\) in Smyth et al. [26] is possibly due to the effect of a low Reynolds number of 5000 [7]. Here, we observe the same scaling as in Smyth et al. [26] at Re = 24,000.

5 Discussion and conclusions

In the present study, we have used DNS to investigate the evolution of stratified shear layers with different profiles of density gradient at a high Reynolds number. The shear layer has constant initial bulk Richardson number (\(Ri_{b,0}\)); however, the density gradient distribution is varied among the three cases by varying the density gradient distribution parameter R. One case has the two-layer density profile (R = 0) while another case has uniform density gradient throughout the fluid (R = 1). The third case has a composite of the two-layer profile and the linear density gradient profile (R = 0.5). The three cases have the same \(Ri_{b,0}\) but the minimum gradient Richardson number (\(Ri_{min}\)) and the shear-layer-averaged value (\(\overline{Ri_0}\)) vary among the cases. The value of \(Ri_{min} = Ri_{b,0}(1-0.5R)\) decreases with increasing R. We are interested in how the evolution of the shear layer correlates with the different Richardson number parameters.

The evolution of the shear layer in all cases shows the development of K–H shear instabilities and billows, the transition to turbulence by secondary instabilities and subsequent decay of turbulence. Results from linear stability analysis indicate the K–H instabilities grow the fastest in the R = 1 case and the growth rate decreases with decreasing R value (equivalently increasing \(Ri_{min}\)). The evolution of the momentum thickness (\(I_u\)) and shear layer thickness (\(L_{sl}\)) indicate the shear layer evolves through two distinct periods of significant growth. The first period corresponds to the enlargement of the KH billows while the second period corresponds to the turbulent entrainment of fluid from the external ambient. During the first period, the growth rate of \(I_u\) and \(L_{sl}\) increases with increasing R values. The maximum size of the K–H billows is observed in the R = 1 case. After the first period of growth, secondary instabilities occur on the eyelids of the K–H billows and they trigger the shear layer to transition to turbulence. Unlike the first period, the growth rate during the turbulent entrainment period is slower when R has larger values. The reason for the slower growth rate is due to the external stratification in the ambient when R increases. Overall, the momentum thickness and the shear layer thickness are comparable among the cases at late time suggesting \(Ri_{b,0}\) exerts control on these late-time thickness metrics. We note that the shear layer in the present study has a fixed value of \(Ri_{b,0} = 0.16\) in all cases. Whether the insensitvity of the momentum thickness and shear layer thickness at late time to R persists at different values of \(Ri_{b,0}\) requires future study.

After the K–H billows have developed, transition layers with peak values of shear and stratification form at the edges of the shear layer in the R = 0.5 and 1 cases, namely the cases where stratification exists in the ambient. The R = 0 case does not exhibit peak value of shear and stratification at the edges where there exists enhanced turbulence. Significant turbulent mixing is found in the TL defined as the region outside \(\pm I_u/2\) and bounded by \(L_{sl}\) in all cases. The TKE dissipation rate integrated over the TL can be as large as 28% of the amount found inside \(\pm I_u/2\). Therefore, it is important to use \(L_{sl}\) as physical boundaries of the shear layer instead of \(\pm I_u/2\) as was typically used in previous studies.

The bulk mixing efficiency (integrated over space and time) is found to be approximately equal to 0.36 in all cases when only the mixing over \(\pm I_u/2\) is taken into consideration. When the mixing in the TL is taken into account, the bulk mixing efficiency has a peak value of 0.42 in the R = 0 case which agrees with the result of Mashayek et al. [13]. The mixing efficiency decreases to 0.31 in the R = 1 case. While the flux coefficient (\(\Gamma\)) during the period of fully-developed turbulence in the R = 1 case shows three regimes of mixing efficiency based on the \(\overline{Re}_b\) parameterization similar to the study of Shih et al. (2002), \(\Gamma\) in the R = 0 and R = 0.5 cases decreases monotonically with decreasing \(\overline{Re}_b\) when \(\overline{Re}_b < 100\) as suggested by Mashayek et al. [15]. However, we observe a slower decrease with \(\overline{Re}_b\) and thus \(\Gamma\) remains at larger values relative to Mashayek et al. [15]. When we examine the relationship between \(\Gamma\) and the ratio between the Ozmidov and Ellison length scales, we find that \(\Gamma\) \(\propto R_{OE}^{-0.66}\) similar to the scaling of \(\Gamma\) \(\propto R_{OT}^{-0.63}\) found in Smyth et al. [26].

Overall, the density gradient distribution parameter R influences the layering structures in \(N^2\), \(S^2\) and \(Ri_g\) inside the shear layer after the transition to turbulence. The TL (a layer with enhanced stratification and shear) develops at the edges of the shear layer when R \(\ne\) 0. As the shear layer evolves, the shear-layer-averaged stratification (\(\overline{N^2}\)) exhibits a significant reduction when R is small. The change in \(\overline{N^2}\) becomes smaller when R increases. The shear-layer-averaged Richardson number \(\overline{Ri}\) at late time is larger when R \(\ne\) 0. Nonetheless, the shear layer thickness (\(I_u\) and \(L_{sl}\)) at late time is not influenced by the value of R although the details of the shear and stratification profiles are influenced. VanDine et al. [28] investigate the uniform stratification case (R = 1) with \(Ri_{min}\) varying between 0.04 and 0.2, equivalently \(Ri_{b,0}\) between 0.08 and 0.4, and they find the TL develop in all cases. As \(Ri_{b,0}\) increases in that study, both \(I_u\) and \(L_{sl}\) have a smaller value at late time and the net turbulent dissipation is also smaller. All cases in VanDine et al. [28] exhibit TL. The \(\Gamma\)-\(\overline{Re}_b\) relationship discussed here for \(Ri_{b,0}\) = 0.16 and R = 1 are also found for all the different values of \(Ri_{b,0}\) in that study.