Introduction

Genetic maps are essential tools with many applications in quantitative and population genetics in humans, animals, and plants. They are for instance essential to study complex traits through QTL mapping (Mackay et al. 2009), to model transmission of identity-by-descent (IBD) segments or to perform linkage-based imputation in genotyped pedigrees (Lander and Green 1987; Abecasis et al. 2002; Druet and Farnir 2011). Genetic distances are also used in models to detect signatures of selection (Grossman et al. 2010) or to characterize inbreeding with hidden Markov models (Leutenegger et al. 2003; Druet and Gautier 2017). They are particularly useful to interpret the length of homozygous-by-descent or IBD segments or to estimate the age of alleles (Albers and McVean 2020). Linkage maps can additionally be used to study the recombination process and the genetic architecture of individual recombination rate (RR). As an example, global recombination rate (GRR) on autosomes has been investigated in several species including cattle (Kong et al. 2008; Chowdhury et al. 2009; Sandor et al. 2012; Ma et al. 2015; Johnston et al. 2016; Kadri et al. 2016; Petit et al. 2017). Several studies showed that individual GRR was heritable in humans, cattle, and sheep (Fledel-Alon et al. 2011; Sandor et al. 2012; Johnston et al. 2016) and controlled by variants explaining a large fraction of the genetic variance (Kong et al. 2014; Johnston et al. 2016; Kadri et al. 2016; Petit et al. 2017). For instance, variants in RNF212 have been found to affect GRR in human (Kong et al. 2008), cattle (Sandor et al. 2012), and sheep (Johnston et al. 2016). Johnston et al. (2016) provided evidence that these effects were trans effects likely to affect RR globally. GRR is thus rather an oligogenic trait in these species (e.g., Stapley et al. 2017). Variants with sex-specific effects on GRR are common in humans (Kong et al. 2014) and have also been reported in sheep (Johnston et al. 2016), whereas variants affecting both male and female GRR were most common in cattle (Kadri et al. 2016). Male and female GRR were found to be highly correlated in cattle (Kadri et al. 2016) and sheep (Johnston et al. 2016), contrary to findings in humans (Fledel-Alon et al. 2011). Hotspot usage has also been found to be heritable and controlled by variants with large effects mainly associated with PRDM9 (e.g., Fledel-Alon et al. 2011; Sandor et al. 2012).

The X chromosome has most often been ignored in these studies due to specific challenges (Khramtsova et al. 2019) including distinct inheritance, lower genotyping accuracy (e.g., lower signal intensity in males), or paucity of SNPs on genotyping arrays (Wise et al. 2013). Yet, recombination of the sex chromosomes may be particularly interesting as it may correspond to a biological process distinct from that on autosomes. For one, gonosomal recombination drastically differs between sexes, being restricted to the small pseudo-autosomal region (PAR) of homology between the X and Y chromosomes in males, whereas females recombine along the entire length of their X chromosomes (Hinch et al. 2014). Accordingly, RR in the PAR was found to be ten times greater in males than in females in humans (Rouyer et al. 1986), mouse (Soriano et al. 1987), sheep (Johnston et al. 2016), and red deer (Johnston et al. 2017), consistent with one obligatory gonosomal crossover (CO) per male meiosis. Moreover, studies in the mouse have concluded that PRDM9 plays a role in CO positioning on autosomes but not in the PAR (Brick et al. 2012). Gonosomes were also shown to pair significantly later during prophase I than the autosomes, which may be related to replication timing (Kauppi et al. 2011).

A new assembly of the bovine genome has been released recently (Rosen et al. 2020), providing a substantial improvement compared with the previous build, particularly for the X chromosome. This represents an opportunity to improve the genetic map of the X chromosome by using updated information on marker ordering. The genetic distances between markers can be estimated with an EM algorithm using coordinates of CO identified in a genealogy (Kong et al. 2010; Druet and Georges 2015; Ma et al. 2015). This approach assumes that all CO have been detected although this might not be true when informativeness is low (e.g., low marker density, parents harboring long homozygous tracks, unphased genotypes). Hidden Markov models (HMM) are also frequently used to estimate genetic distances, notably in the framework of the Lander–Green algorithm (Lander and Green 1987). The likelihood of a map (order and distances) obtained with these models takes into account information of neighboring markers and the variable marker informativeness across families. In addition, genotyping errors or uncertainty in genotype calling (e.g., with low-coverage sequence data) can also be accounted for. For instance, several methods rely on HMM to estimate genetic maps in full-sib families (Rastas et al. 2013; Bilton et al. 2018) or in multi-parental populations (Zheng et al. 2019). We herein implemented a similar approach in LINKPHASE3 (Druet and Georges 2015) and took advantage of the new bovine build to construct sex-specific genetic maps of the X chromosome using more than 100,000 genotyped animals. These new genetic maps were then used to estimate individual variation in recombination rate on the X chromosome (XRR). We finally studied the genetic architecture of this new trait in each sex and determined its correlation with GRR measured on all the autosomes.

Materials and methods

Data

We used genotypes from three dairy cattle populations from France, New Zealand, and the Netherlands that were previously used by Kadri et al. (2016) to study GRR on the autosomes. Animals from France (45,348) and the Netherlands (11,831) were Holstein-Friesian whereas those from New Zealand (58,474) were Holstein-Friesian (24%), Jersey (19%), or crossbred (57%) individuals. For autosomes, 30,127 filtered SNPs were already available whereas for the present study, we selected 853 SNPs common to the BovineSNP50 and the BovineHD genotyping arrays (Illumina) that mapped on the X chromosome in the newly released ARS-UCD1.2 assembly (Rosen et al. 2020) (based on the same Hereford cow as the previous assembly). Following the findings of Van Laere et al. (2008) and Johnson et al. (2019), we placed the pseudo-autosomal boundary (PAB) at position X:133,300,518. We filtered out monomorphic markers, those with call rate <0.90 or with homozygosity <0.98 in males (for X-specific, non-PAR markers only), leaving 744 SNPs mapping to the specific part of the X chromosome and 73 to the PAR. We subsequently erased residual Mendelian inconsistencies (i.e., parent–offspring pairs with opposite homozygote genotypes and heterozygous offspring with both parents homozygous for the same allele). Among the selected genotyped animals, 90,481 sire-offspring and 26,107 dam-offspring pairs were available for CO identification in gametes transmitted by 2958 sires and 11,228 dams.

Estimation of the genetic map and CO identification

Since we observed map errors in the sequence-based physical map for the X chromosome in the former genome assembly, we started by validating the new physical map order as described in Supplementary File 1. We then used LINKPHASE3 (Druet and Georges 2015) to phase individuals and identify CO. We start by describing this phasing process on the autosomes. Genotypes from offspring are first phased using Mendelian segregation rules and linkage observed in half-sibs is subsequently used to reconstruct the phases from the common genotyped parent. These are subsequently improved and corrected with an HMM specific to each sub-pedigree in which the haplotypes inherited by the offspring (one haplotype per offspring) are modeled as a mosaic of the two haplotypes of the parent (representing the two hidden states). The initial state probabilities represent the probability to inherit the paternal or the maternal haplotype at the first position and are equal to 0.5. Transition probabilities \(\tau _{k - 1,k}\left( {i,j} \right)\) represent the probability to inherit paternal or maternal haplotype at marker k conditional on the haplotype inherited at marker k − 1 and depend on the RR between markers k − 1 and k (see below). Finally, the emission probabilities εk(i) associated with hidden state i at marker k are functions of the alleles observed on the parental haplotypes, on the inherited haplotype and of the inheritance vector. CO is then identified where the inheritance patterns of an offspring change from paternal to maternal or vice versa.

We used an EM algorithm to determine the genetic distances between markers that maximizes the likelihood of data under our model. The estimation of the expected number of CO between two markers requires the definition of the forward variables αk(i) that are equal to the probability to inherit parental haplotype i at marker position k conditional on observations at markers 1 to k, as well as the backward variables βk(j) that are equal to the probability to inherit parental haplotype j at marker k’ conditional on observations at markers k’ + 1 to M (where M is the number of markers).

The four possible pairs of states at marker pair (k, k + 1) are {sk = 1, sk + 1 = 1}, {sk = 1, sk + 1 = 2}, {sk = 2, sk + 1 = 1}, and {sk = 2, sk + 1 = 2}, where sk is the hidden state (inherited parental haplotype) at marker k (using 1 and 2 for the paternal and maternal parental haploptypes, respectively). The probability of each pair {i, j} is equal to:

$$\frac{{\alpha _k\left( i \right)\tau _{k,k + 1}\left( {i,j} \right)\varepsilon _{k + 1}\left( j \right)\beta _{k + 1}\left( j \right)}}{{P\left( {O|\lambda } \right)}},$$

where \(\tau _{k,k + 1}\left( {i,j} \right)\) is the transition probability from state i to j between markers k and k + 1 and is equal to (1 − rk,k + 1) and rk,k + 1 when i and j are respectively equal or different, rk,k + 1 being the RR between the two markers. P(O|λ) is the probability of the observations conditional on the parameters of the model (equal to the sum of probabilities for the four possible configurations). The expected number of CO between markers k and k + 1 is equal to the probability of states combination {sk = 1, sk + 1 = 2} and {sk = 2, sk + 1 = 1} in the modeled offspring haplotype. Finally, the RR rk,k + 1 is updated in the maximization step as the sum of expected number of CO across all offspring within a sub-pedigree and across all sub-pedigrees divided by the total number of modeled meioses.

The rules described above need to be modified when modeling the X chromosome. LINKPHASE3 has therefore been previously modified to account for the segregation of the X chromosomes (Murgiano et al. 2016). On the X-specific part, males transmit their maternal chromosome without recombination to their daughters whereas a null chromosome is transmitted to their sons (the paternal chromosome of males is modeled as null). Using this model adapted for the X chromosome, we estimated the sex-specific genetic maps by performing 100 iterations of the EM algorithm described above.

In summary, the genetic map is estimated by finding the RRs for each marker interval that maximizes the likelihood of the segregation patterns modeled in half-sib families.

Estimation of genetic parameters for RR on the X chromosome

We ran LINKPHASE3 with the newly estimated sex-specific map of the X chromosome to identify CO in 90,481 genotyped sire-offspring and 26,107 genotyped dam-offspring pairs. For each pair, the CO counts were associated to gametes transmitted by the parent to its offspring (each offspring allowing to count CO in one gamete of its parent). We conserved for further analysis only parent–offspring pairs previously selected in the study from Kadri et al. (2016). These 114,254 individual CO counts were considered as repeated phenotypes from the 13,576 parents (2839 sires and 10,737 dams), and X-chromosome recombination rates (XRR) in males and females were modeled as distinct traits. We first estimated heritability of XRR in each sex separately with the following univariate model (Kadri et al. 2016):

$${\boldsymbol{y}} = {{\boldsymbol{1}}\mu} + {{{\mathbf{Pc}}}} + {{{\mathbf{Z}}}_{\mathrm{u}}{{{\mathbf{u}}}}} + {{{{\mathbf{Z}}}}_{\mathrm{p}}{{{\mathbf{p}}}}} + {{{\mathbf{e}}}},$$

where y is the vector of CO counts related to XRR (one record per offspring), µ is the mean effect (1 is a vector with all elements equal to 1), P, Zu, and Zp are incidence matrices relating respectively principal components, random polygenic effects and random permanent environment effects to the phenotypes, c is the vector of effects of the first four principal components of genetic variation (using 30,127 autosomal SNPs, see below) fitted to account for breed structure, u is the vector of normally distributed random individual polygenic effects ~N(0,A\(\sigma _g^2\)), where \(\sigma _g^2\) is the additive genetic variance and A is the additive relationship matrix estimated from the pedigree, p is the vector of normally distributed random permanent environment effects ~N(0,I\(\sigma _p^2\)), where \(\sigma _p^2\) is the permanent environment variance, and e is the vector of normally distributed error terms ~N(0,I\(\sigma _e^2\)), where \(\sigma _e^2\) is the residual variance. We applied the same model to individual RR on each of the 29 bovine autosomes (BTA) for comparisons. These GRR phenotypes were available from a previous study and computed using 30,127 SNPs (Kadri et al. 2016). We estimated the genetic correlations of sex-specific XRR with sex-specific GRR estimated on autosomes (also available from Kadri et al. (2016)) with a multivariate linear mixed model (LMM) including the same effects for the four traits. The random polygenic effects had the following covariance structure:

$${\mathrm{var}}\left[ {\begin{array}{*{20}{c}} {{\boldsymbol{u}}_{{\mathrm{XM}}}} \\ {\begin{array}{*{20}{c}} {{\boldsymbol{u}}_{XF}} \\ {{\boldsymbol{u}}_{{\mathrm{GM}}}} \\ {{\boldsymbol{u}}_{{\mathrm{GF}}}} \end{array}} \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {\begin{array}{*{20}{c}} {{\boldsymbol{A}}\sigma _{g_{{\mathrm{XM}}}}^2} & {{\boldsymbol{A}}\sigma _{g_{{\mathrm{XM}}},g_{{\mathrm{XF}}}}} & {\begin{array}{*{20}{c}} {{\boldsymbol{A}}\sigma _{g_{{\mathrm{XM}}},g_{{\mathrm{GM}}}}} & {{\boldsymbol{A}}\sigma _{g_{{\mathrm{XM}}},g_{{\mathrm{GF}}}}} \end{array}} \end{array}} \\ {\begin{array}{*{20}{c}} {\begin{array}{*{20}{c}} {{\boldsymbol{A}}\sigma _{g_{{\mathrm{XM}}},g_{{\mathrm{XF}}}}} & {{\boldsymbol{A}}\sigma _{g_{{\mathrm{XF}}}}^2} & {\begin{array}{*{20}{c}} {{\boldsymbol{A}}\sigma _{g_{{\mathrm{XF}}},g_{{\mathrm{GM}}}}} & {{\boldsymbol{A}}\sigma _{g_{{\mathrm{XF}}},g_{{\mathrm{GF}}}}} \end{array}} \end{array}} \\ {\begin{array}{*{20}{c}} {{\boldsymbol{A}}\sigma _{g_{{\mathrm{XM}}},g_{{\mathrm{GM}}}}} & {{\boldsymbol{A}}\sigma _{g_{{\mathrm{XF}}},g_{{\mathrm{GM}}}}} & {\begin{array}{*{20}{c}} {{\boldsymbol{A}}\sigma _{g_{{\mathrm{GM}}}}^2} & {{\boldsymbol{A}}\sigma _{g_{{\mathrm{GM}}},g_{{\mathrm{GF}}}}} \end{array}} \end{array}} \\ {\begin{array}{*{20}{c}} {{\boldsymbol{A}}\sigma _{g_{{\mathrm{XM}}},g_{{\mathrm{GF}}}}} & {{\boldsymbol{A}}\sigma _{g_{{\mathrm{XF}}},g_{{\mathrm{GF}}}}} & {\begin{array}{*{20}{c}} {{\boldsymbol{A}}\sigma _{g_{{\mathrm{GM}}},g_{{\mathrm{GF}}}}} & {{\boldsymbol{A}}\sigma _{g_{{\mathrm{GF}}}}^2} \end{array}} \end{array}} \end{array}} \end{array}} \right],$$

where σgi,gj is the genetic covariance between trait i and j and where subscripts XM, XF, GM, and GF refer to male XRR, female XRR, male GRR and female GRR, respectively. Since male and female records are measured on different individuals, the covariance structure for the random permanent environment effect was

$${\mathrm{var}}\left[ {\begin{array}{*{20}{c}} {{\boldsymbol{p}}_{{\mathrm{XM}}}} \\ {\begin{array}{*{20}{c}} {{\boldsymbol{p}}_{{\mathrm{XF}}}} \\ {{\boldsymbol{p}}_{{\mathrm{GM}}}} \\ {{\boldsymbol{p}}_{{\mathrm{GF}}}} \end{array}} \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {\begin{array}{*{20}{c}} {{\boldsymbol{I}}\sigma _{p_{{\mathrm{XM}}}}^2} & {{\boldsymbol{I}}\sigma _{p_{{\mathrm{XM}}},p_{{\mathrm{XF}}}}} & {\begin{array}{*{20}{c}} 0 & 0 \end{array}} \end{array}} \\ {\begin{array}{*{20}{c}} {\begin{array}{*{20}{c}} {{\boldsymbol{I}}\sigma _{p_{{\mathrm{XM}}},p_{{\mathrm{XF}}}}} & {{\boldsymbol{I}}\sigma _{p_{{\mathrm{XF}}}}^2} & {\begin{array}{*{20}{c}} 0 & 0 \end{array}} \end{array}} \\ {\begin{array}{*{20}{c}} 0 & 0 & {\begin{array}{*{20}{c}} {{\boldsymbol{I}}\sigma _{p_{{\mathrm{GM}}}}^2} & {{\boldsymbol{I}}\sigma _{p_{{\mathrm{GM}}},p_{{\mathrm{GF}}}}} \end{array}} \end{array}} \\ {\begin{array}{*{20}{c}} 0 & 0 & {\begin{array}{*{20}{c}} {{\boldsymbol{I}}\sigma _{p_{{\mathrm{GM}}},p_{{\mathrm{GF}}}}} & {{\boldsymbol{I}}\sigma _{p_{{\mathrm{GF}}}}^2} \end{array}} \end{array}} \end{array}} \end{array}} \right],$$

where σpi,pj is the covariance between permanent environment effects for traits i and j. Similarly, the covariance structure for the residual error terms was

$${\mathrm{var}}\left[ {\begin{array}{*{20}{c}} {{\boldsymbol{e}}_{{\mathrm{XM}}}} \\ {\begin{array}{*{20}{c}} {{\boldsymbol{e}}_{{\mathrm{XF}}}} \\ {{\boldsymbol{e}}_{{\mathrm{GM}}}} \\ {{\boldsymbol{e}}_{{\mathrm{GF}}}} \end{array}} \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {\begin{array}{*{20}{c}} {{\boldsymbol{I}}\sigma _{e_{{\mathrm{XM}}}}^2} & {{\boldsymbol{I}}\sigma _{e_{{\mathrm{XM}}},e_{{\mathrm{XF}}}}} & {\begin{array}{*{20}{c}} 0 & 0 \end{array}} \end{array}} \\ {\begin{array}{*{20}{c}} {\begin{array}{*{20}{c}} {{\boldsymbol{I}}\sigma _{e_{{\mathrm{XM}}},r_{{\mathrm{XF}}}}} & {{\boldsymbol{I}}\sigma _{e_{{\mathrm{XF}}}}^2} & {\begin{array}{*{20}{c}} 0 & 0 \end{array}} \end{array}} \\ {\begin{array}{*{20}{c}} 0 & 0 & {\begin{array}{*{20}{c}} {{\boldsymbol{I}}\sigma _{e_{{\mathrm{GM}}}}^2} & {{\boldsymbol{I}}\sigma _{e_{{\mathrm{GM}}},e_{{\mathrm{GF}}}}} \end{array}} \end{array}} \\ {\begin{array}{*{20}{c}} 0 & 0 & {\begin{array}{*{20}{c}} {{\boldsymbol{I}}\sigma _{e_{{\mathrm{GM}}},e_{{\mathrm{GF}}}}} & {{\boldsymbol{I}}\sigma _{e_{{\mathrm{GF}}}}^2} \end{array}} \end{array}} \end{array}} \end{array}} \right],$$

where σei,ej is the covariance between residual error terms for trait i and j.

Genetic parameters of univariate and multivariate mixed models were estimated using average information restricted maximum likelihood analysis (AI-REML) implemented in AIREMLF90 (Misztal et al. 2002). Standard deviations of heritabilities and correlations were estimated with AIREMLF90 by repeated sampling of parameter estimates from their asymptotic multivariate normal distribution (Meyer and Houle 2013).

Haplotype-based association study and single variant associations

We performed a haplotype-based GWAS to find variants associated with male or female XRR. Zhang et al. (2012) demonstrated the benefits of such approach in cattle when using 50K SNP arrays. For the autosomes, we used the ancestral haplotype (AHAP) clusters from Kadri et al. (2016) available for 30,127 marker positions (with the number of clusters per position, K, set to 60). In that study, haplotypes were clustered at each marker position based on their similarity following the method of Scheet and Stephens (2006) as described in Druet and Georges (2010). For the X chromosome, the phases estimated by LINKPHASE3 were completed using Beagle 4.1 (Browning and Browning 2007), which uses linkage disequilibrium information while conserving the pre-phasing information obtained from LINKPHASE3. Then, we clustered these haplotypes at each marker position in 60 AHAP using HiddenPHASE (Druet and Georges 2010).

The association between AHAP and XRR was evaluated by the same procedure as in Kadri et al. (2016). At each marker position, two mixed models were compared, one without AHAP effect (H0, no QTL) and one with AHAP fitted as random effects (H1). Both models included a mean, four principal component effects, and a random polygenic effect to account for structure. A likelihood ratio test (LRT) was used to test whether H1 was significant. The genome-wide significance threshold was set at 1.67 × 10−6 after Bonferroni correction for 30,000 independent tests (as an approximation for a total of 30,944 tested SNPs).

In addition to the haplotype-based GWAS, we also tested whether 11 variants associated with GRR and previously identified in Kadri et al. (2016) were significantly affecting XRR. To that end, we used the same LMM and simultaneously fitted the variants as fixed effects. Each variant was fitted by a regression on SNP allelic dosage and the association was tested using a Z-test. For comparisons purposes, we also tested these variants for association with individual RR estimated for each of the 29 BTA separately.

To gain further insights into the correlation between global and X-specific individual RR levels, we estimated the correlations between haplotype effects estimated simultaneously at all marker positions (60 haplotypes per position). We fitted a mixed model with random haplotype effects (h) assumed to be normally distributed with a common variance \(\sigma _h^2\). To that end we took advantage that models in which all haplotypes are fitted as random effects can be transformed into models in which genomic breeding values are estimated as the sum of haplotype effects, g = ZHh (VanRaden 2008; Stranden and Garrick 2009), with ZH being the incidence matrix relating random haplotype effects to the random polygenic effects. In this equivalent model, the haplotypes are used to estimate the genomic relationship matrix (GRM) between the random individual polygenic effects (Stranden and Garrick 2009). We thus first estimated an haplotype-based GRM GH between 15,107 genotyped individuals (including the parents) by using all the AHAPs estimated in Kadri et al. (2016) and the method implemented in GLASCOW described in Zhang et al. (2012). Seventy-eight pairs of individuals with a relationship >0.95, corresponding to monozygotic twins, were merged. We then estimated the genetic correlation between XRR and GRR within each sex with a bivariate model where the additive relationship matrix A was replaced by GH. As a result of the equivalence between the two models, the estimated genetic correlations corresponded also to the genetic correlations between random haplotype effects estimated for XRR and GRR (Karoui et al. 2012; Maier et al. 2015).

Finally, to obtain the estimators from the haplotype effects, we applied the following back-transformation (Stranden and Garrick 2009; Wang et al. 2012): \({\hat{\mathbf h}} = \sigma _g^{ - 2}\sigma _h^2{\mathbf{Z}}_{\mathrm{H}}^\prime {\mathbf{G}}_{\mathrm{H}}^{ - 1}{\hat{\mathbf g}}\), where \(\sigma _g^2\) is the variance of the random individual polygenic effects g.

Results

Estimation of sex-specific genetic distances and identification of COs

With the exception of a 2.55–3.72 Mb long inversion between positions chrX:51911569 and chrX:54460491, the marker order from the ARS-UCD1.2 bovine assembly appeared reliable (see Supplementary File 1) and was used in further analyses. We ran LINKPHASE3 on the entire dataset to estimate genetic distances with the EM procedure described in “Materials and methods”. We tested two marker orders: (1) the unchanged marker order of the ARS-UCD1.2 bovine assembly and (2) the same map order but with the order of the markers located between positions chrX:51911569 to chrX:54460491 inverted (see Supplementary File 1). The likelihood of the genetic map estimated in females with the second order (logL = −415350.5) was much higher than the value obtained with the original order (logL = −417498.3). This corresponds to a highly significant LRT of 2 × 2147.8 achieved thanks to the size of our dataset. The estimated map lengths in females were 122.85 and 121.17 cM for the first and second map, respectively. The modified map was hence 1.68 cM shorter, providing additional support for the new order as suggested by Groenen et al. (2012) for a similar case. Finally, the RRs in the marker intervals flanking the putative order inversion dropped from 2.04 and 1.82 cM to 0.57 and 0.36 cM with the corrected map.

Figure 1a represents the genetic maps in both sexes and the corresponding average RR along the X chromosome (the genetic map is also provided as Supplementary Data). In females, the estimated average RR on the X chromosome was 0.873 cM/Mb. The lengths of the map for the X-specific part and for the PAR were 113.68 and 7.30 cM, respectively, corresponding to 0.86 and 1.29 cM/Mb. In females, RR along the X chromosome was higher at both ends and reduced in the center (Fig. 1c). In particular, we observed two large regions (~7 Mb) and a third smaller one (3 Mb) with almost no recombination. The first of these regions, extending from 38 to 45 Mb and with a RR below 0.01 cM/Mb for most of the 1 Mb windows, corresponds to the centromere. The entire region contained only five SNPs, all of them with a MAF < 0.005. Similarly, the RR remained below 0.05 cM/Mb from 66 to 74 Mb and from 92 to 93 Mb and few informative SNPs were available in these regions (respectively 13 and 6 SNPs, all with MAF < 0.01 except one SNP in the first window with MAF = 0.013). The female RR at the PAB located at 133.300 Mb (Johnson et al. 2019) followed the trend of the surrounding marker intervals (e.g., no marked variation). In males, the estimated genetic length of the map was shorter and equal to 46.44 cM but recombination occurred only in the PAR. This corresponds to 8.17 cM/Mb and represents a sixfold increase (6.3) in RR compared with females. The RR remained high for the entire PAR, yet due to its short size, it was difficult to observe a trend in variation of RR along the PAR (Fig. 1b).

Fig. 1: Sex-specific genetic maps and associated recombination rates along the X chromosome.
figure 1

a Sex-specific genetic maps of the X chromosome; b, c recombination rate per Mb along the X chromosome estimated per sex. Recombination rate was first estimated in 1-Mb successive nonoverlapping bins. The blue and red loess smoothed lines were fit for males and females with a span parameter of 0.5 and 2.5, respectively, using the ggplot2 (v.3.2.1) package in R (v.3.4.4). In males, we represented the rates only for the PAR. The black circle represents the position of the centromere (according to flanking microsatellites and BAC libraries reported in Amaral et al. (2002) and Frohlich et al. (2017)), whereas the vertical line indicates the limit of the PAR.

We estimated genetic maps for all autosomes with the same methodology. As reported by Housworth and Stahl (2003), or by Johnston et al. (2016, 2017), there was a clear linear relationship (adjusted-r² = 0.91 and 0.97 in males and females, respectively) between length of linkage map and physical size of the chromosomes (Fig. 2 and Supplementary Fig. 1 for autosomes only). Chromosome-wide RR was fitted as a multiplicative inverse function (Johnston et al. 2017) consistent with one obligate CO per meiosis (adjusted-r² = 0.78 and 0.82 in males and females, respectively). In females, the RR on the X chromosome and its genetic length were both in range with the relationships estimated for the autosomes. For instance, the X chromosome had similar values to those observed on BTA2 that has a similar physical size. In males, the genetic length of the PAR was higher than expected according to the linear relationship estimated on autosomes. This was however not unique to the PAR since other autosomes presented larger deviations from the fit (e.g., BTA19). The RR estimated for the PAR in males lied also above the trend estimated on autosomes with a multiplicative inverse function (note that the size of the PAR lied outside the range of values for which the trend was fitted).

Fig. 2: Relationship between recombination rate, genetic length, and physical chromosome length on autosomes.
figure 2

For both panels, the curves were fitted per sex and using autosomes only (males in blue, females in red). the X chromosome in females and the PAR in males were subsequently added. a The relationship between recombination rate per Mb and physical length was fitted with a multiplicative inverse function. b The relationship between genetic and physical lengths was fitted with a linear model. The curves were fitted with the stat_smooth function of ggplot2 and shaded areas indicate 95% confidence interval.

The distribution of numbers of CO per gamete on the X chromosome in males and females is represented in Supplementary Fig. 2 (each parent–offspring pair representing one gamete). In females, the distribution ranged from 0 to 7 (after exclusion of one outlier with 19 CO) but with 99.7% having less than four CO (26.4%, 43.1%, 25.5% and 4.6% with 0, 1, 2, and 3 CO, respectively). These values are close to those observed on chromosome 2 (having the same length; see Supplementary Fig. 2). In males, no CO was identified for most individuals (67.6%) and one single CO was identified for 32.2% of the sires. These values are below expectations and suggest that some CO remained undetected in the PAR (see “Discussion”). We used the XOI R package (Broman et al. 2002) to measure the interference levels (ν) associated with this distribution of CO and with the distances between identified CO. The values of the gamma function are compared with those observed on the autosomes in Supplementary Fig. 3. In females, higher interference levels were generally observed for shorter chromosomes in agreement with Broman et al. (2002). Indeed, the mean interference parameters on smaller (BTA16-29) and larger chromosomes (BTA1-15) were respectively equal to 5.5 and 5.1, although the lowest level was observed for BTA-22 (in the first group). The X chromosome presented the second lowest observed value (ν = 3.8). In males, interference levels were rather stable across autosomes (4.9 vs 4.9 for short and large chromosomes) whereas the PAR had the highest interference levels (ν = 7.6), with less double CO observed than expected. Finally, average estimated interference levels were lower in males (4.9) than in females (5.3).

Estimation of genetic parameters

The genetic parameters estimated with the multivariate model are reported in Table 1. The heritability of female XRR was equal to 0.024 (±0.006) and small compared with the value estimated for GRR (0.079). The contrast was even higher in males with heritabilities equal to 0.011 (±0.003) and 0.137 (±0.012) for XRR and GRR, respectively. Larger heritabilities for GRR were expected since this phenotype was obtained from observations on multiple chromosomes (similar to an average of multiple records). Interestingly, heritability for female XRR was higher than the heritability for female RR measured for the other chromosomes individually (Fig. 3). The heritability for male XRR was in the range of values observed for other chromosomes despite the smaller map length. The repeatabilities were equal to 0.178 (±0.006), 0.087 (±0.007), 0.041 (±0.002), and 0.050 (±0.006) for male GRR, female GRR, male XRR and female XRR, respectively.

Table 1 Genetic parameters of male and female X-chromosome recombination rate (XRR) and male and female global recombination rate (GRR) estimated with a multitrait linear mixed model.
Fig. 3: Comparison of heritabilities estimated with univariate models for XRR and RR measured on each autosome in males and females.
figure 3

Error bars represent standard errors (SE). SE were not reported for heritabilities below 0.001.

The estimated genetic correlations between female traits (XRR and GRR) were particularly high and equal to 0.759 (±0.094) suggesting that these traits might be under the control of the same variants. Conversely, the genetic correlations between male traits were much lower (0.168 ± 0.124) with less evidence that these traits might be regulated by the same variants. The relatively high genetic correlation between male and female GRR (0.511 ± 0.078), previously reported by Kadri et al. (2016), suggests that these two traits might be affected by shared variants. The genetic correlation between male and female XRR was moderate with a large standard deviation (0.399 ± 0.192) and indicates that these two traits would share less variants. Importantly, all parameters associated with male XRR presented higher standard errors.

Genome-wide association study and correlation between SNP or haplotype effects across traits

None of the tested positions in the haplotype-based GWAS for male or female XRR reached the significance threshold (Supplementary Fig. 4). One hypothesis to explain this lack of significant association is the lower power of detection when working with RR on a single chromosome compared with GRR. In addition to these GWAS approaches, we also tested whether eleven variants previously found to be significantly associated with male GRR, female GRR, or both (Kadri et al. 2016) had also significant effects on male or female XRR (Supplementary File 2). The significance threshold was set at 4.55 × 103 after Bonferroni correction for 11 independent tests. With the exception of the rs110203897 marker in the proximity of PRDM9, these variants were absent from the initial dataset and imputed (Kadri et al. 2016). Most of these variants were significant for male or female GRR (9 and 7, respectively). In contrast, none of the eleven candidate variants was associated with male XRR after correction for eleven independent tests (Supplementary File 2). As a matter of comparison, 5.5 significant associations were found on average for these eleven variants and male RR measured for single autosomes (Supplementary File 2), suggesting that the lack of significant association with male XRR was not due to a low number of records. Only one variant was significantly associated with female XRR (and three additional had uncorrected p values < 0.05). Association with female RR on other single autosomes resulted in 0–4 significant tests per trait (2.1 on average). We also compared the effects of these eleven variants on XRR and GRR estimated in the same sex with a univariate model (e.g., independent analyses) and found a high correlation (0.86) in females whereas a moderate value (0.30) was observed in males (Fig. 4). The high correlations between the effects on female XRR and GRR suggests that all variants might be shared although some were not significantly associated with female XRR.

Fig. 4: Comparison of estimated effects on global and X-chromosome recombination rates for a set of eleven previously identified variants (Kadri et al. 2016).
figure 4

Comparisons were realized for male (a) and female (b) recombination. The error bars represent the standard errors from the estimates.

We finally estimated correlations between 1,807,620 random haplotype effects fitted simultaneously at all marker positions (60 AHAPs fitted at each marker position) as described in methods. When estimated with a REML procedure, the correlations between random haplotype effects on XRR and GRR were equal to 0.22 and 0.66 in males and females, respectively, consistent with the pedigree-based genetic correlations. To further illustrate this correlation, haplotype-effects estimates were obtained by transformation of the breeding values (see “Materials and methods”). The correlation between these estimated haplotypes effects on female XRR and GRR was equal to 0.90 (Supplementary Fig. 5).

Discussion

Thanks to a large-genotyped pedigree containing more than 100,000 meioses, we were able to validate the map order and to estimate the genetic map on the X chromosome in cattle. For the map order, we took advantage of highly informative sub-pedigrees in which dams were phased and the haplotypes they transmitted to their sons were known without errors (see Supplementary File 1 for more details). In contrast with the previous build, there was no evidence for errors in the new bovine assembly on the X chromosome with the exception of an inversion in the middle of the chromosome. The identified order inversion might result from a true inversion segregating in the population but we did not find clear evidence based on local principal components analysis (Ma and Amos 2012) or on inspection of whole-genome sequence data. The corrected map was more likely in both pure Holstein and Jersey populations, suggesting that the inversion is not a segregating variant. The Hereford cow used to realize the bovine assembly might also carry an inversion when compared with the breeds used in the present study or the order inversion might result from an assembly error. Note also that we searched only for errors associated with markers mapped at relative large distances from their correct position (more than 1 cM). It is indeed more difficult to validate errors at shorter distances because there are no or very few CO between such markers. In addition, the informative meioses are distinct for each pair of markers making comparisons more challenging. Noise associated with genotyping errors is also more problematic at short distances. Consequently, small map errors might still be present.

We relied on a HMM to estimate the genetic distances between markers. Such an approach presents the advantage to account for informativeness and for potential genotyping errors. For instance, relying as previously only on the identified CO (Ma et al. 2015; Kadri et al. 2016) resulted in an 8.5% shorter map in females. Similarly, the map was inflated by 1.6% when genotyping errors were ignored. LINKPHASE3 extracts the information from different family structures. In a first step, Mendelian segregation rules are used to phase parent–offspring pairs whereas linkage information is used in a second step, within half-sibs families. Single parent–offspring pairs are used if genotypes from the parents were phased in the first step (i.e., if one of the grandparents was also genotyped). As a result, reliable genetic maps were estimated by modeling 90,481 male and 26,107 female meioses. Although the physical map remains a reasonable approximation in females (since the RR is 0.873 cM/Mb), the male RR is much higher in the PAR. It is therefore recommended to use sex-specific genetic maps rather than relying on physical maps. Overall, our study indicated that the new bovine assembly is reliable and provides genetic maps that are important to perform QTL mapping, association studies, or imputation on the X chromosome. It is important to include the X chromosome in such applications since it is a gene-rich chromosome and contains important genes related to fertility, reproduction, and potentially recombination (Demars et al. 2013; Fernandez et al. 2014; Arishima et al. 2017; Pacheco et al. 2020).

At the chromosome level, RRs from autosomes had a multiplicative inverse relationship with their physical length (r² = 0.82 and 0.78 in males and females). These high adjusted-r² values represented a good fit, but some chromosomes deviated from the estimated relationship. Similar or even larger deviations were observed in previous studies (Housworth and Stahl 2003; Johnston et al. 2016; Johnston et al. 2017), indicating that additional factors influence RR at the chromosome level. These might for instance include chromosome and chromatin structure (e.g., proportion of heterochromatic regions), gene density, GC content, density of repetitive elements, presence of structural variants, or hotspot motifs density (Coop and Przeworski 2007; Stapley et al. 2017). The RRs we herein estimated for the X chromosome indicate that, at the chromosome level, it had similar properties to autosomes in females. For instance, RR was close to the RR estimate on BTA2, a chromosome with almost the same physical size. The distribution of number of CO was close too. Note that more markers were genotyped on BTA2 (1589), providing higher informativeness although the number of markers on the X chromosome was sufficient to estimate the genetic length of the chromosome. Conversely, the PAR had a much higher RR in males, as previously observed in other species (Rouyer et al. 1986; Soriano et al. 1987; Johnston et al. 2016, 2017) and in agreement with the obligate CO per meiosis hypothesis. Higher RR had consistently been reported for other short chromosomes such as microchromosome in birds (Rodionov 1996; Megens et al. 2009). Very few double CO were detected as in human where they were found to be rare (Schmitt et al. 1994). As a result, the PAR presented higher interference levels in males than on other chromosomes consistent with previous reports of higher interference on shorter chromosomes (Broman et al. 2002; Wang et al. 2016). For autosomes, this trend of higher interference levels in shorter chromosomes was present only in females. Campbell et al. (2015) or Wang et al. (2016) observed this relationship in both sexes, but more pronounced in females as in our study. As pointed out by Wang et al. (2016), cows thus presented lower RRs and higher interference levels than bulls whereas opposite patterns were observed in human (e.g., Campbell et al. 2015). Finally, chromosomes with lower interference levels in our study, such as BTA-4 or BTA-22, were previously found to have higher levels in comparable cattle populations (Wang et al. 2016). This suggests that some technical aspects as data and CO cleaning might cause these differences.

Importantly, a certain number of CO remained undetected in males on the PAR (more CO occurred but we did not identify them). Indeed, one obligate CO per meiosis would result in a length larger than 50 cM whereas the length estimated with our data remained below 47 cM. This suggests that even when using an HMM approach, there is still a lack of informativeness on such a small chromosome (for instance at the borders of the chromosome that have higher influence in small chromosomes). Furthermore, the number of identified CO per gamete (0.36) indicates that only a fraction of CO is unambiguously identified (around 20% of expected CO missing). The lower informativeness on the PAR is also due to a lower marker density per cM compared with other chromosomes. Although 73 SNPs were available for a 5 Mb region, this corresponded only to 1.6 SNPs per cM. A more detailed comparison of different informativeness measures is available in Supplementary File 3.

At more local levels, the RR in females on the X chromosome presented differences with autosomes. In bovine, the X chromosome is metacentric whereas the autosomes are acrocentric. We previously observed on autosomes that recombination was higher in telomeres and lower in the center of the chromosome (Kadri et al. 2016) in agreement with findings in other species (Broman et al. 1998; Liu et al. 2014; Venn et al. 2014). The X chromosome presented a somehow distinct pattern in females, with higher RRs at both extremities, more so at the proximal part (opposite to the PAR). In the middle of the X chromosome, three large regions from 3 to 8 Mb long presented very low RRs (below 0.05 cM/Mb) and were characterized by the absence of informative SNPs (few SNPs and low MAFs). The first of these regions corresponds to the centromere and few similar regions of low RR were observed on autosomes. In males, although recombination was not constant in the PAR, the trend was difficult to characterize given the small physical size of the region.

Genetic parameters estimation and GWAS were realized using a LMM, although XRR was almost a binary trait in males. Generalized linear mixed models (GLMM) with a Poisson distribution might be more appropriate to model XRR. However, differences might be limited for individuals with multiple records, and when count frequencies are not too extreme as in the present study (e.g., not too close from 0 or 1 for a binary trait). For comparison purposes, we applied such models, and genetic parameters were in line with those obtained with the LMM. The individual polygenic effects estimated with LMM and GLMM were highly correlated (0.9883 and 0.9996 in males and females, respectively), and GWAS did not reveal significant associations with a GLMM neither (a correlation of 0.90 between LRT values). Results from the LMM were thus presented as they are on the same scale as GRR, making comparisons more interpretable (for instance, a Poisson GLMM would assume multiplicative effects whereas effects are additive in the LMM framework).

Heritabilities for XRR were as expected lower than for GRR obtained with multiple chromosomes. More interestingly, female XRR was more heritable (0.024) than RR on any other single chromosome, whereas male XRR (0.011) had a value similar to other chromosomes. We are not aware of similar estimates in cattle or other species. In absolute terms, these heritabilities were low, and similar to those observed for fertility traits such as non-return rate (Jansen and Lagerweij 1987). The higher female heritability might result from a more accurate phenotype since haplotypes from males are known on the X chromosome and can help to infer the haplotypes from their daughters (see Supplementary File 1 for more details). Conversely, male XRR might be more difficult to study than RR on other chromosomes. As we discussed above, more CO might have remained undetected in comparison with larger chromosomes and the estimated number of CO per individual was below expectations. More information (higher marker density per cM) is needed to identify male CO in the PAR with high power. Given the presence of interference and the size of the region, there is less room for variation between individuals. The genetic correlations between female GRR and female XRR were high, indicating that these might be controlled by the same genetic variants. Despite a low power for testing associations with RR on a single chromosome in females, a few variants previously identified for their effect on GRR were associated with female XRR and the effects of these variants on XRR were correlated to the effects on GRR. We also estimated that the correlation between the effects of haplotypes on XRR and GRR was high. Overall, recombination patterns at the chromosome levels and genetic correlations with female GRR indicate that female XRR might follow the same process as on the other chromosomes. Correlation between male GRR and male XRR was lower suggesting more different traits as might be expected from the very different recombination patterns. The variants affecting male GRR were not significant for male XRR although many significant associations were found for individual autosomes, suggesting that the number of records were not an issue. However, the male XRR phenotype might be less informative due to the elements discussed earlier. More data or more accurate phenotypes (with a higher heritability), including CO obtained with higher density maps or from molecular techniques such as MLH1 foci counts (e.g., Peterson et al. 2019; Wang et al. 2019), are required to clarify how strong are these correlations and to identify eventually shared variants.

As illustrated in recent studies, recombination in the PAR has specific mechanisms (Acquaviva et al. 2020). We herein showed that genetic correlation between male XRR and GRR was limited. We estimated however RRs in broad intervals (larger than 50 kb). Complementary analyses with whole-genome sequence data, and with LD-based approaches in particular, could estimate RR at finer scale. Such high-density recombination maps would help to determine whether genomic features associated with recombination are the same on autosomes, the X chromosome, and the PAR. For instance, CO occur in hotspots associated with PRDM9-binding motifs in mammals such as human or mice (e.g., Baudat et al. 2010). However, Brick et al. (2012) observed that PRDM9 was not associated with CO positions on the PAR in mouse. Therefore, it would be interesting to determine whether such hotspots are present on the X chromosome (representing female-specific recombination) and the PAR in cattle, and whether they are associated with PRDM9-binding motifs. Depending on the species, CO frequency might also be higher close to the transcription start site or at CpG island. Recombination has additionally been associated with different families of repetitive elements. Fine-scale recombination maps would thus allow to determine whether this landscape is the same on autosomes, the X-specific part, and the PAR (undergoing different constraints during meiosis).