1 Introduction

In Switzerland, beekeepers from the associations mellifera.ch (MEL), breeding Apis mellifera mellifera, and Société Romande d’Apiculture (SAR), rearing Apis mellifera carnica, maintain two breeding programmes operating independently, but they share a common interest in improving the production, behaviour, and health traits of honey bees. Their aim is to provide beekeepers with genetic material corresponding to their respective population standards and with good capabilities for beekeeping in local environmental conditions. The selection is subsidised by government funding to support local breeding programmes.

Both breeding programmes maintain mating stations in distinct Alpine valleys, which enables controlled mating of the queens with selected drones of the respective honey bee population (Plate et al. 2019). Since 2010, selection in each population occurs after evaluation of about 100 to 180 queens per year by qualified beekeepers in networks of test apiaries throughout Switzerland following standardised testing procedures, corresponding or being highly similar to other protocols presented in literature (Büchler et al. 2013; Ehrhardt et al. 2010). At the test apiaries, the following traits are routinely recorded: honey yield, defensive behaviour, calmness during inspection, swarming drive, hygienic behaviour towards pin-killed brood, and infestation by the parasitic mite Varroa destructor, while MEL beekeepers additionally assess the size of each honey bee colony. After an initial treatment to equalize infestation between colonies at a very low level, no treatments against V. destructor are performed when testing the colonies the following season.

To ascertain the quality of both breeding programmes, genetic parameters for the aforementioned traits were estimated using a best linear unbiased prediction (BLUP) approach. Recently, a similar analysis was performed including phenotypic records from ~ 15,000 Austrian A. m. carnica colonies (Brascamp et al. 2016). The estimation of genetic parameters is strongly dependent on data size and structure; therefore, it was uncertain if the same approach (Brascamp and Bijma 2014) could be applied on our smaller datasets (~ 1000 colonies per population). In the study of Brascamp et al. (2016), it was demonstrated that genetic effects of queen and worker, both contributing to colony performance, can be jointly estimated. Under this condition, it became feasible to sum up the estimated breeding values (EBVs) for queen and worker effect and use this sum as selection criterion. In smaller datasets, it is more likely that the maximum likelihood algorithm may not converge when a joint estimation is performed, thus requiring estimation of either worker or queen effects. Such a situation was described for a recent honey bee selection programme including 151 colonies (Facchini et al. 2019).

The aim of this study was to calculate EBVs, heritability estimates, and phenotypic correlations for the different traits recorded by MEL and SAR beekeepers. Furthermore, we also validated the results of the applied statistical models. The results presented in this study will allow Swiss beekeepers to optimise their breeding programme and selection strategies.

2 Material and methods

2.1 Structure of the applied breeding programmes

In the MEL breeding programme, groups of 12 sister queens are mated on mating stations during summer with drones from drone-producing colonies headed by sisters. All queens are blindly evaluated the following year in a network of testing apiaries. Based on this evaluation, queens are selected in their third year for grafting to produce daughter queens (female side). The selection of queens for production of queens heading drone-producing colonies (male side) additionally occurs across the programme according to test results.

In the SAR breeding programme, experienced beekeepers are responsible for the maintenance of their own maternal lines. For this purpose, on the female side, colonies are empirically selected each year for the production of the next generation by grafting. Following the same strategy as in MEL, groups of 12 full-sister queens are produced and distributed to different test apiaries, where they are blindly tested to select queens for the male side. Contrary to the MEL programme, only some of the best queens may occasionally be used for the female side for queen rearing; however, this is uncommon and therefore phenotypic information is used intensively on the male and hardly on the female side.

2.2 Datasets

In 2019, the MEL and SAR honey bee breeding associations provided the recorded phenotypes (2009–2018) and ancestry information to estimate heritabilities and EBVs for the different traits (honey yield, defensive behaviour, calmness during inspection, swarming drive, hygienic behaviour towards pin-killed brood, and infestation of V. destructor). The database was complemented each year, corresponding to the colonies evaluated during the preceding beekeeping season (the number of queens tested by beekeepers determines the additional amount of information available each year).

In each dataset, the unique identification number (ID) of the queen was used to identify the respective colonies. Here, we refer to a colony as a group of sister workers originating from the same queen. The ID of the queen heading the colony (dam of the workers), of her mother, and of the mother of the drone-producing colonies (sire) that were used to mate the queen of the colony generally was known. Based upon this ancestry information, two pedigree files (MEL and SAR) were generated. For the majority of the colonies included in the pedigree files, information for most of the evaluated traits and the identification of the testing apiary were available. At each testing apiary, beekeepers report the date and the person who evaluated the colonies. These putative effects (year, tester, and location) on the evaluation of honey bees were confounded, as the quality of all colonies is simultaneously assessed during the season. The phenotypic records and information of the apiary (year, tester, and location) were included in the respective performance files.

2.3 Data preparation

In the pedigree files (Table I), IDs of the dams and sires were entered for each colony. The number of drone-producing colonies was also indicated. A base queen or sire was added to the pedigree file in case one of the parents was unknown. If queens were mated in the same place with an unknown sire (often a group of unrelated drone-producing colonies), the latter was encoded as a common sire without known parents. This situation was repeatedly observed in the MEL population and resulted in the addition of many virtual base animals. In the pedigree file, the number of entries corresponds to the total number of colonies, dams, and sires. The rows in these files contain all colonies, dams, and sires, along with the identities of their own dams and sires.

Table I Structure and content of pedigree and performance files obtained for mellifera.ch (MEL, Apis mellifera mellifera) and Société Romande d’Apiculture (SAR, Apis mellifera carnica)

Before the beginning of the records analysed here, queens were already mated at mating stations in a way similar to the years included in the dataset. It was therefore considered that the additive genetic relationship between drone-producing colonies had reached an equilibrium. For each mating, the number of drones involved was assumed to follow a Poisson distribution and was set to 12 (Brascamp et al. 2016). The inverse of the pedigree relationship matrix between all entries in the pedigree was calculated following Brascamp and Bijma (2014).

In the performance files (Table I), to facilitate interpretation of the results, phenotypic values for each trait were entered without transformation, even for not normally distributed traits. Honey yield was analysed as raw data, but also excluding colonies that did not produce any honey, in order to know if absence of production is mainly due to detrimental environmental conditions, or to poor genetic value of the colony. Two ratios were calculated from phenotypic data: the growth rate of V. destructor infestation between spring and summer (Ehrhardt and Bienefeld 2007), and the colony size growth rate, expressed as the ratio of colony size in summer to colony size in spring. Identification of the testing apiary was also added to the file by combining the geographic location and the testing year.

2.4 Models

Estimated breeding values for all traits were calculated with the ASReml software version 4.1.2132 (www.vsni.co.uk), using the aforementioned performance files and the inverses of the pedigree relationship matrices.

In a first trial, a model was used to jointly estimate both worker and queen effects, along with the fixed apiary-year effect and an overall mean. In both datasets, due to the data size or structure, the restricted maximum likelihood algorithm did not converge. Thus, worker and queen effects were evaluated separately. Therefore, the models were defined to include either the colony or the queen for the purpose of estimating the worker and queen effects separately. In the model including the colony, we accounted for the fact that the workers are a group of individuals, rather than a single individual, by calculating the relationship matrix following the approach of Brascamp and Bijma (2014).

For each dataset, two linear models on single traits were finally used, the first on worker effects (WM) and the second on queen effects (QM) as random effects. The complete models are presented below:

  • Pij = μ + Apiaryi + colonyj + eij (WM)

  • Pij = μ + Apiaryi + queenj + eij  (QM)

where Pij is the phenotype associated with the worker or queen of apiary i and colony j; μ is the general mean of the population for this phenotype; Apiaryi is the fixed effect of the testing environment (date, location, and evaluator were confounded as all colonies per test apiary were evaluated each time); colonyj is the random effect associated with one worker of colony j; queenj is the random effect associated with the queen heading colony j; and eij is the residual associated with the measurement. We refer to the models as WM (worker model) and QM (queen model), as they are used to estimate variance components for worker effect and queen effect, respectively.

2.5 Heritability estimates

With the two models (WM and QM), it was possible to derive three heritability estimates. Because the colony consists of a group of workers, the WM yields two heritabilities (Brascamp and Bijma 2019). First, a heritability relating to the worker effect of a single individual, \( {h}_W^2=\operatorname{var}\left(\mathrm{colony}\right)/\operatorname{var}(P) \), which is a measure for the scope of selection. In this expression, var(colony) is the estimate of the colony variance as produced by ASReml when using the relationship matrix according to Brascamp and Bijma (2014). Second, a heritability relating to a group of workers, \( {h}_{\overline{W}}^2=0.4 \operatorname {var}\left(\mathrm{colony}\right)/\operatorname{var}(P) \), which reflects the part of the phenotypic variance due to the colony effect. The 0.4 is the additive genetic relationship between drone-producing queens in a sire in the base generation, assuming an equilibrium (Brascamp and Bijma 2019). The QM provided an estimate of the heritability for the queen effect \( {h}_Q^2 \). Genetic correlations between worker and queen effects could not be calculated, as the estimates for worker and queen effects were estimated in two separate models.

2.6 Validation of the model

For the models WM and QM, the quality of the EBVs was evaluated by comparing the predicted phenotypes for workers or queens (using WM and QM, respectively) with their realised phenotypes. This approach is known as cross-validation, and is a common strategy for validation of EBVs in livestock (e.g. Luan et al. 2009). Prediction involves the estimation of the EBVs for workers or queens by excluding their own phenotypes and is therefore based solely on the information of relatives. In practice, we randomly divided the performance file into 10 equally sized subsets. Then, we did 10 analysis to estimated EBVs, in each analysis masking records of one of the subsets. Individuals in the masked subset, however, did receive EBVs because of pedigree relationships with the individuals in the remaining 90% of the data. These EBVs served as predicted phenotypes. Realised phenotypes for the masked individuals equalled the observation as a deviation from the corresponding fixed-effect estimate. For both WM and QM, we compared the predicted phenotypes and their realisation through the regression of the latter on the former. Theoretically, this value equals unity, and the estimated regression coefficient provides insight into whether the models produce unbiased EBVs. We also compared the accuracy of WM and QM to produce unbiased EBVs by considering the standard errors of the regression coefficients in order to distinguish which model should potentially be favoured, if some differences were noted.

2.7 Phenotypic correlations

Due to the small datasets, it was not possible to compute genotypic correlations with sufficiently small standard errors. In such a situation, phenotypic correlations were preferred for evaluating the relationships among traits in the respective populations. The measured values were corrected for the test apiary effect obtained from WM, and all pairwise correlations for each population were calculated with Pearson’s product-moment method using the cor.test function in R (R-Core-Team 2018). Standard errors (SEr) associated to the correlation estimates (r) were obtained as follows: \( SEr=\sqrt{\frac{1-{r}^2}{n-2}} \), n − 2 being the associated degrees of freedom also provided by cor.test.

3 Results

In Table II, the estimated variance components, heritabilities, and their respective standard errors for both populations are summarised. For the MEL population, the highest heritabilities were obtained for defensive behaviour, calmness during inspection, and hygienic behaviour (Table II). For these three traits, heritabilities estimated for colony and queen effects were in the same range (0.34 and 0.32, 0.16 and 0.12, and 0.19 and 0.18, respectively). Low heritabilities were obtained for honey yield, swarming drive, and colony size growth rate (0.02 and 0.10, 0.06 and 0.07, and 0.02 and 0.08, respectively). No colony or queen effects were detected on the V. destructor infestation growth rate, and almost no effects were found on the infestations themselves (except low effects for infestation in spring with WM). For SAR population, heritabilities were generally very low, with the only exception being for honey yield and hygienic behaviour (0.11 and 0.11, 0.06 and 0.09, respectively, for colony and queen effects) (Table II).

Table II Estimated variance components (Var), heritabilities (h2) for worker (\( {h}_W^2 \)), colony (\( {h}_{\overline{W}}^2 \)), or queen (\( {h}_Q^2 \)) effects and associated standard errors (between brackets) for measured phenotypes and calculated ratios from MEL and SAR datasets. n.d. indicates that no variance attributed to either worker or queen effects was detected by the models

In both populations, standard errors for the heritability estimates were high (with magnitude often similar to the values of the estimates), and only a limited number of traits: defensive behaviour, calmness during inspection, and hygienic behaviour in MEL, and honey yield and hygienic behaviour in SAR datasets, had heritabilities above 0.1. For the honey yield trait, in the MEL population, heritabilities were estimated to be slightly higher when colonies with zero yield were included in the data.

Results from the validation of the models are presented in Figure 1. Estimated linear regression coefficients for the “realised” and “predicted” phenotypes are represented in relation to the heritability estimated for the traits. Most of the regression coefficients did not significantly differ from 1, indicating that the models provided unbiased estimates. One noticeable exception was the QM for the SAR population, in which all coefficients significantly differed from 1. In the other models, the most precise predictions (regression coefficients close to 1 and low standard errors) were observed for traits with heritabilities estimated over 0.1. In data from MEL, hygienic behaviour estimated by WM gave better predictions than QM estimates. In SAR data, one trait (V. destructor infestation in spring) had a strongly negative regression coefficient and a very low heritability estimate; convergence of the maximum likelihood algorithm for the calculation of the latter may have been possible only due to particularities of the available data.

Figure 1.
figure 1

Tests of the models (models on queen or worker effects for MEL or SAR populations). Regression coefficient of linear relation between phenotypes corrected for apiary effects and breeding values calculated only according to pedigree is plotted in relationship to the estimated heritabilities. Bars indicate standard errors. A: honey yield, B: honey yield (colonies having produced only), C: defensive behaviour, D: calmness during inspection, E: swarming drive, F: V. destructor infestation in spring, G: V. destructor infestation in summer, H: V. destructor infestation growth rate, I: hygienic behaviour, J: colony size (spring), K: colony size (last harvest), L: colony size growth rate. In the model on queen effects from the SAR population, one trait (V. destructor infestation in spring) was highly negative and is therefore represented in the top right corner diagram.

Pairwise correlations between all traits are presented in Table III. In the MEL population, the highest correlation (0.71) was found between defensive behaviour and calmness during inspection. In addition, honey yield was positively correlated with colony size in spring and summer (0.40 and 0.52, respectively) and with calmness during inspection (0.21). Colony size in spring was the only trait that moderately (0.09 to 0.11) correlated with V. destructor infestation levels. Hygienic behaviour and V. destructor infestation levels were found to be uncorrelated. Hygienic behaviour had a moderate positive correlation with honey yield (0.13).

Table III Pairwise correlations between phenotypes corrected for apiary effects obtained by the worker model and associated standard errors (between brackets)

In the SAR dataset (Table III), a high correlation (0.65) was observed between defensive behaviour and calmness during inspection. A moderate correlation (0.11) was identified between honey yield and hygienic behaviour. A low correlation (− 0.09) was observed between V. destructor infestation in spring and the hygienic behaviour of the colony, but this result was not observed in summer conditions.

4 Discussion

The genetic analysis of two independent honey bee datasets, each having about 1000 colonies with observations, indicated that it was possible to calculate genetic parameters even in small populations. However, in our case, it was not possible to estimate queen and worker effects jointly, like in previous studies (Bienefeld and Pirchner 1990, 1991; Brascamp et al. 2016; Ehrhardt et al. 2010). As two linear models (WM and QM) had to be used, part of the variation linked to the queen effect may have been included in the worker effect in WM, and vice versa, nor was it possible to estimate the genetic correlation between worker and queen effects for the same trait.

As MEL and SAR populations were not genetically connected, were managed differently, and had distinct evaluation protocols for some traits, comparisons between estimates should only be done with caution. This is also the case for comparisons to some previously published studies, in which heritabilities may have been estimated with other methods.

In both datasets, heritabilities were low to moderate for honey yield, below previously described values in other countries (Andonov et al. 2019; Bienefeld and Pirchner 1990; Brascamp et al. 2016; Najafgholian et al. 2011; Tahmasbi et al. 2015; Zakour et al. 2012). This may be explained by the specificities of honey production in Switzerland, which mainly relies on rapeseed nectar and silver fir honeydew (Persano Oddo et al. 2004), both strongly influenced by environmental conditions with high environmental variability: production may, for instance, be highly influenced by genotype-environment interactions. In addition, the colony size recorded by MEL beekeepers was almost not heritable but positively correlated to honey yield; the latter may be influenced by colony management (for instance, space availability for the queen for laying eggs or feeding, if necessary). In the MEL data, the heritability estimates for honey yield were slightly higher when non-producing colonies were included in the dataset. This result suggests a putative genetic effect on non-producing colonies; therefore, we suggest including non-producing colonies in the data analysis.

Heritability estimates for defensive behaviour and calmness during inspection differed between the two populations, with high values in MEL, corresponding to previously published values for other populations (Andonov et al. 2019; Bienefeld and Pirchner 1990; Brascamp et al. 2016; Tahmasbi et al. 2015) or even being higher than others (Zakour et al. 2012). Lower estimates were obtained for the SAR population; this may be related to evaluation protocols. As the quality level of the colonies was recorded, and as many expressed apparently satisfying behaviour levels for these traits, half of the colonies were evaluated between 3.5 and 4 (maximum grade). We therefore observed a lower variation in the recordings of these traits compared with the MEL dataset, where the worst colony per apiary was graded 1, the best colony was graded 4, and the others were distributed in between (Table I). Thus, the low heritabilities might be the result of the low variability reported and not the absence of a genetic effect. In order to improve the assessment of calmness during inspection and defensive behaviour in SAR, we suggest evaluating the colonies by using the full scale from 1 to 4 for relative ranking to better discriminate the best colonies. This had already been suggested to more efficiently select for low defensive behaviour in an extremely aggressive A. m. syriaca population (Zakour and Bienefeld 2013). If almost no variability can be detected in the field for some traits, for instance when all colonies are very close to the optimum, other approaches could be preferred, such as removing the few colonies with low performance from the programme. High correlations between defensive behaviour and calmness during inspection were obtained for both populations (0.71 and 0.65 for MEL and SAR, respectively). High genetic correlations have been reported previously for these two traits in an Austrian A. m. carnica population (Brascamp et al. 2016). This may indicate either that beekeepers are not able to distinguish the two traits, or that they are in general closely linked. Breeding programmes could consider including only one trait or to define better tools to assess the two traits more distinctly.

In agreement with a previous study (Brascamp et al. 2016), heritability estimates for swarming drive were low (< 0.1) for queen and colony effects, indicating either strong genotype-environment interactions (involving weather or honey flow conditions), non-genetic quality factors of the queen, or a lack of exactitude in the assessment of this trait by the beekeepers. Higher values have been obtained in two other populations (Andonov et al. 2019; Tahmasbi et al. 2015), indicating that in the latter, selection for this trait could be possible.

Surprisingly, heritability estimations for V. destructor infestations only led to null values. This result is not in line with previous findings obtained by others (Büchler et al. 2008; Ehrhardt et al. 2010) but corresponds to some observations (Harbo and Harris 1999; Maucourt 2019). Low non-significant heritabilities were found in spring in WM. However, spring values are mainly recorded in order to check the efficiency of pre-testing treatment (infestation should be close to zero for all colonies); it is therefore likely that these values are due to specificities of the dataset rather than genetic differences among colonies. In contrast, we would have expected higher estimated heritabilities for infestation rates in summer and infestation growth rates between spring and summer. Several reasons may explain this result: either infestation may not be influenced by genetic background of the host, or these influences are masked early by far more important horizontal transmissions between colonies and/or apiaries. These transmissions, for instance linked to robbing, are likely to happen starting at the end of spring, as relatively long gaps between spring and summer honey flows are frequent in Switzerland. Many apiaries in Switzerland still use traditional hives grouped in small pavilions with the entrances of the different colonies being side by side. Apiaries with hives kept in groups and entries facing in the same direction are known to increase mite transfers between colonies (Dynes et al. 2019; Seeley and Smith 2015). In addition, many regions in Switzerland have a high colony and apiary density per square kilometre (Fluri et al. 2004; von Büren et al. 2019), and colony density has been linked with mite re-invasion flows in neighbouring Germany (Frey and Rosenkranz 2014). These mite flows between colonies likely mask colony effects. Finally, as V. destructor infestation measurements require precise protocols, it should be verified if, despite training provided by the associations, all beekeepers performed the measurements with the necessary exactitude. However, no heritability for the infestation level was observed either in a research population of similar size (Maucourt 2019). This may indicate that under some conditions, even if evaluators are trained, heritabilities for this trait are extremely low, perhaps due to environmental effects. Even if heritabilities for V. destructor so far have been low, beekeepers should continue to measure infestation levels to support the monitoring of the testing network and to guarantee good health conditions at the testing apiaries (efficient treatments between testing periods). Mortality due to poor infestation management by the beekeeper can be avoided, enabling complete testing of a maximum number of queens, which are highly valuable as potential drivers of genetic progress.

In the case of MEL, estimated heritabilities for hygienic behaviour were comparable with values found in the literature (Büchler et al. 2008; Ehrhardt et al. 2010; Facchini et al. 2019; Maucourt 2019). Lower values in the case of SAR may be due to the evaluation protocol of this trait. Perhaps, more precise data could be obtained if values were expressed as the number of cells cleaned per hour (test duration is not known so far), following the approach of MEL. We found no association between hygienic behaviour and V. destructor infestation level in summer. Hygienic behaviour is employed as a criterion to improve brood health, but many beekeepers may associate this trait with the aim of selecting colonies that show resistance to V. destructor, for instance by means of Varroa sensitive hygiene. The link between hygienic behaviour and resistance to V. destructor is controversially discussed (Leclercq et al. 2018). In the present case, it is unlikely that selection for hygienic behaviour may lead to better survival of colonies in the context of V. destructor infestations. This could be because other resistance traits (for instance grooming, Varroa sensitive hygiene, suppressed mite reproduction, swarming) or more likely, a combination of such traits, may be more efficient defence mechanisms under Swiss conditions, as this is the case in certain naturally resistant populations (Locke 2016). It is unclear how hygienic behaviour could help to decrease the number of cases of the widespread European foulbrood, as the prevalence of this endemic disease is also highly influenced by colony density (von Büren et al. 2019). For these reasons, the ability of hygienic behaviour to limit chalkbrood or European or American foulbrood prevalence should be assessed in the Swiss context. In the meantime, selecting for hygienic behaviour should not have detrimental effects on honey yield, as both traits show a low positive correlation (0.13 and 0.11 for MEL and SAR, respectively).

Tests from the models showed that EBVs were unbiased for almost all traits, especially for traits with heritabilities above 0.1. One noticeable exception was QM for the SAR dataset, where all estimates were biased. This might be related to the lack of performance data on the female side, as most of the colonies used for queen breeding are empirically selected by the beekeepers in charge of the lines. We suggest, in the future, adding performance information on the female side, for instance by using tested queens as dams to produce the next generation. In the MEL dataset, EBVs for hygienic behaviour obtained by WM were estimated more accurately than those obtained by QM. This may be explained by the fact that hygienic behaviour depends on the ability of workers to detect dead brood, a task that does not involve the queen. In this study, we could not estimate jointly the worker and queen effects. By adding data to the performance files in the next years, an aim for the associations could be to jointly assess the two effects in the future, as soon as convergence of the maximum likelihood algorithm can be obtained. This is expected to lead to a more accurate estimate of the breeding objective, when defined as the sum of worker and queen effects in a joint analysis.

Based on our results, beekeepers could select for traits with the highest heritabilities, and could periodically calculate genetic progress to verify whether selection leads to the desired results. Other parameters, such as generation interval, queen mortality due to management issues, and selection differentials could also be considered to optimise genetic progress. Standardization and quality of data collection should be verified frequently, as it is crucial for the quality of datasets and for obtaining better genetic estimates. Moreover, standardization will help in comparing results with other studies in the future. This is also the case for breeding value and heritability estimation methods. Traits related to V. destructor infestation need to be re-examined locally, in order to explore the genetic background of honey bees for resistance selection under Swiss conditions.