Introduction

Sex assessment is crucial in avian biology, allowing the investigation of inter- and intrasexual differences in bird ecology and behaviour. Unless the studied species shows morphologically or behaviourally visible sexually dimorphic features, sexing birds in the field is not possible. In this case, sexing has traditionally been based on the inspection of primary sexual characters (laparoscopy: e.g. Richner 1989; cloacal examination: e.g. Samour et al. 1984), which may represent invasive, and thus stressful, procedures. Recently, less invasive molecular methods have been developed (Fridolfsson and Ellegren 1999; Dubiec and Zagalska-Neubauer 2006), although such methods may be neither quick nor cost-effective, and they still require the collection of biological samples such as feathers or blood. Alternatively, biometric data commonly recorded from birds (e.g. measurements of bill, wing, tarsus and body mass) involve faster and less invasive animal handling, and allow sex classification through discriminant analysis or logistic models (Dechaume-Moncharmont et al. 2011; Gorman et al. 2014). This morphometric sexing technique is relatively rapid and can be especially useful in remote and/or harsh field conditions such as those in Polar regions (e.g. Lorentsen and Røv 1994; Polito et al. 2012; Valenzuela-Guerra et al. 2013).

Amongst bird species, it has been shown that sex discriminability by morphometrics depends mainly on the level of sexual dimorphism, with increasing sexual dimorphism resulting in higher classification power (Dechaume-Moncharmont et al. 2011). Thus, establishing morphometric thresholds to sex weakly dimorphic, cryptically dimorphic or nearly monomorphic species, such as most birds inhabiting Polar regions, is particularly challenging. Additionally, discriminatory power may be affected by methodological issues that hamper comparison between studies, such as the use of alternative statistical analyses or different sample sizes, and within studies, such as measurements conducted by different researchers. Although morphometric sexing has been implemented routinely in ornithological research, it is still unclear how such caveats affect classification power and therefore sexing performance. In fact, different analytical methods of classification and validation may yield different results (Dechaume-Moncharmont et al. 2011), and the discriminant rate may be affected by the sample size used to estimate morphometric criteria (Shealer and Cleary 2007; Dechaume-Moncharmont et al. 2011). Also, individual variability in measurement error, the so-called researcher or ringer effect, may affect classification accuracy (Dechaume-Moncharmont et al. 2011; Henry et al. 2015). Studies addressing the above pitfalls could improve the applicability of morphometric sexing, helping field research activities in Polar environments. In this work, we present a case study on a weakly dimorphic Antarctic seabird to assess the feasibility of morphometric sexing achieved by different analytical techniques and with changing sample size, comparing the resulting classification powers.

Although to a lesser extent than in strictly polygamous bird species, where sexual dimorphism is generally evident, and to a degree that varies between species, some sexual dimorphism is present in Pygoscelis penguins (Warham 1972; Agnew and Kerry 1995), including the Adélie penguin Pygoscelis adeliae (e.g. Ainley and Emison 1972). In this Antarctic seabird, morphological features that develop until several years after hatching, i.e. bill, flippers and other skeletal characters, can indicate whether an individual is male or female (Scolaro et al. 1990; Kerry et al. 1992; Polito et al. 2012; Gorman et al. 2014), with males having larger traits (Ainley and Emison 1972; Kerry et al. 1992; Polito et al. 2012; Gorman et al. 2014). Apparently, male Adélie penguins are also heavier than females (Ainley and Emison 1972; Kerry et al. 1992; Fairbairn and Shine 1993; Gorman et al. 2014). Nevertheless, body mass fluctuates throughout the breeding season (e.g. Emmerson et al. 2019), as it may be closely linked to body condition, which in turn should depend on the interplay between several environmental parameters (e.g. diet quality/prey availability, energy expenditure and sea-ice conditions; e.g. Ballard et al. 2010; Massaro et al. 2020). Even at the onset of the breeding season, when Adélie penguins arrive at the colony for the first time (see Black 2016), body mass should reflect the joint effects of environmental parameters and food intake experienced in wintering areas (Gorman et al. 2014), as well as that of fat accumulated during the pre-migratory hyperphagia (Ainley 2002: 104), which might also vary according to the distance covered and the geographical location of the wintering range. Consequently, body mass should show high inter-individual variability depending on both sampling date and year (e.g. Emmerson et al. 2019), and its absolute value may therefore be a less reliable proxy for sex than traits that develop with skeletal growth. In spite of the above confounding factors, long-term data have shown that body mass of the Adélie penguin still seems to be greater in males than in females (e.g. Ainley 2002: 105; Beaulieu et al. 2010), suggesting that its discriminatory power is worth exploring.

Morphometric sexing has been applied successfully in most penguin species (Zavalaga and Paredes 1997; Arnould et al. 2004; Poisbleau et al. 2010; Vanstreels et al. 2011; Polito et al. 2012; Pichegru et al. 2013; Cappello and Boersma 2018). For the Adélie penguin, it has been shown that bill measurements are key indices of sex discrimination (Scolaro et al. 1990; Kerry et al. 1992; Polito et al. 2012). In contrast, the discriminatory role of body mass has received less support (but see Gorman et al. 2014). Furthermore, previous works conducted on this Antarctic seabird caution about inter-colony differences in measurements and call for population-specific studies on sex discrimination by morphometrics. The Adélie penguin is the most abundant penguin species breeding strictly in Antarctica (Lynch and LaRue 2014; Southwell et al. 2017), has long been particularly well studied, and has been acknowledged as a sentinel species of ecosystem changes (Ainley 2002; Wormworth and Sekercioglu 2011). Studies providing morphometric thresholds to sex Adélie penguins using less invasive procedures would thus be useful for both monitoring and research purposes, as invasive techniques could affect bird health even if conducted by skilled researchers (see Samour et al. 1983, for penguins).

Here, we investigated the sexing performance in Adélie penguins via discriminant analysis of morphometric traits, using a large dataset of bill size and body mass measurements collected in a population breeding in Central Victoria Land (Ross Sea, Antarctica), where no criteria for morphometric sexing were available. Along with discriminant function analysis, logistic models have also been used to sex birds by morphometrics (Gorman et al. 2014, for the Adélie penguin). Both techniques can be used for classification problems (Press and Wilson 1978), but studies comparing their discriminatory power in morphometric sexing are scarce. We thus repeated the analyses by developing an additional morphometric sexing approach based on logistic models, which may help validate the results achieved by discriminant analysis. According to previous research (Scolaro et al. 1990; Kerry et al. 1992; Polito et al. 2012; Gorman et al. 2014), we expected that male and female penguins would be discriminated by both bill size and body mass, with males having larger bills and being heavier than females.

Materials and methods

Data collection and sexual dimorphism

We conducted our study in the colony of Edmonson Point (74°20' S, 165°08' E; Ross Sea, Antarctica), inhabited by c. 2100 breeding pairs of P. adeliae throughout the study period (Pezzo et al. 2007). This colony has been the focus of a long-term research project in the framework of the Commission for the Conservation of Antarctic Marine Living Resources Ecosystem Monitoring Programme (CEMP). We analysed measurements of bill length, bill depth and body mass collected according to CEMP Standard methods (SC-CAMLR 2014) between 1995 and 2001, on 501 breeding (> 4 years; Ballerini et al. 2009) Adélie penguins of known sex (n = 237 males; n = 264 females). We took measurements once per individual, between November and January, selecting adult birds from monitored nests (Pezzo et al. 2007). Upon capture, we marked penguins with subcutaneously implanted passive transponder tags (Texas Instruments, TIRIS), allowing individual recognition throughout our study period. We sexed penguins by cloacal examination, a method validated in breeding Adélie penguins (Sladen 1978). We only considered those individuals whose sex was unequivocally assessed by cloacal examination, upon the observation of papillae and associated urogenital diagnostics that are especially conspicuous in the breeding season (cf. Sladen 1978). All individuals for which this method provided unsatisfactory results were discarded. Additionally, the sex of each marked bird was confirmed by extensive observations at the nest, relying on sex-specific behavioural patterns or sex-synchronised shifts in nest attendance, which have been used consistently in this species (e.g. Lescroël et al. 2019). Such precautions should have further minimised potential mistakes in sex assessment by cloacal examination. We followed Kerry et al. (1992) by defining (i) bill length, also known as culmen length, as the distal edge of the culmen to the bill tip; (ii) bill depth at nostrils, from the point on the lower mandible to just behind the mandibular symphysis (Scolaro et al. 1990; SC-CAMLR 2014). We took bill measurements with a Vernier calliper, to the nearest 0.1 mm. We assessed body mass using a Salter® scale, to the nearest 0.05 kg. Although measurements were conducted by seven researchers between 1995 and 2001, all of them were previously trained and followed the same procedure throughout the study period. Nevertheless, we accounted for researcher identity in the second step of our analyses (see below), in order to explore the researcher effect.

We compared morphological measurements between males and females using t tests, after checking the normality and homoscedasticity assumptions of residuals within each sex. For each trait, when difference between sexes was significant, we calculated various indices of sexual dimorphism (SDIS, Storer 1966; SDILG, Lovich and Gibbons 1992; SDIG, Greenwood 2003), allowing comparisons with previous studies on the Adélie penguin (cf. Ainley and Emison 1972; Polito et al. 2012; Gorman et al. 2014). These metrics use either the mean measurements of males (M) and females (F) or those of the largest (L) and smallest (S) sex, depending on the index:

$${\text{SDI}}_{{\text{S}}} = 100\frac{{{\text{F}} - {\text{M}}}}{{\frac{{{\text{F}} + {\text{M}}}}{2}}}$$
$${\text{SDI}}_{{{\text{LG}}}} = \frac{{\text{L}}}{{\text{S}}} - 1$$
$${\text{SDI}}_{{\text{G}}} = 100\frac{{{\text{M}} - {\text{F}}}}{{\text{F}}}$$
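As an illustration, the three indices can be computed directly from the sex-specific means. The following is a minimal sketch: the function name and the example means are hypothetical and are not taken from the study dataset.

```python
def sexual_dimorphism_indices(male_mean: float, female_mean: float) -> dict:
    """Return SDI_S, SDI_LG and SDI_G, as defined above, from the two sex means."""
    larger = max(male_mean, female_mean)
    smaller = min(male_mean, female_mean)
    return {
        # Storer's index: difference relative to the mean of the two sexes (%)
        "SDI_S": 100 * (female_mean - male_mean) / ((female_mean + male_mean) / 2),
        # Lovich and Gibbons' index: ratio of larger to smaller sex, minus one
        "SDI_LG": larger / smaller - 1,
        # Greenwood's index: difference relative to the female mean (%)
        "SDI_G": 100 * (male_mean - female_mean) / female_mean,
    }


# Hypothetical trait means (mm), chosen only for illustration.
indices = sexual_dimorphism_indices(male_mean=40.0, female_mean=38.0)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

With males as the larger sex, SDI_S is negative while SDI_LG and SDI_G are positive, which is the sign pattern to expect when reading the indices reported for a male-biased trait.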

Morphometric sexing

Initially, we applied to our study population those discriminant functions or equations of logistic models previously obtained to sex adult Adélie penguins breeding in other colonies. We applied those functions/equations which were based on any combination of our measured variables (cf. Kerry et al. 1992; Polito et al. 2012; Gorman et al. 2014). We calculated the proportion of correctly sexed individuals to check whether earlier morphometric criteria were suitable to sex Adélie penguins breeding at Edmonson Point.

Then, we built specific discriminant functions to sex penguins at Edmonson Point. As a first step, we used Fisher’s discriminant function analysis (hereafter DFA; Rencher 1995) to assess the separation between sexes based on a linear combination of morphological traits. Unlike alternative types of discriminant function analyses, DFA requires no assumptions of multivariate normality and equality of group covariance, only assuming no multicollinearity between explanatory variables. We ran two separate DFAs to provide discriminatory power for different combinations of morphological traits, by considering: (i) bill length and bill depth; (ii) bill length, bill depth and body mass. We found no multicollinearity between explanatory variables (Pearson correlation, n = 501; bill length vs bill depth: r = 0.43; bill length vs body mass: r = 0.13; bill depth vs body mass: r = 0.13). For a grouping factor with two levels such as sex, and two or three explanatory variables, DFA replaces the values of the original variables with scores (D) of a unique discriminant function, which is the linear combination of the variables providing the greatest separation between sexes and explaining 100% of variability. On DFA scores, we used the Mann–Whitney test (because scores were not normally distributed) to test formally the difference between sexes explained by the discriminant function. We checked sex assignment by re-classification of data to each sex, thus providing the percentage of correctly re-classified individuals, which was validated by a ‘leave-one-out’ (jackknifed) cross-validation. As recommended in DFA (Dechaume-Moncharmont et al. 2011), we also provided the 95% bootstrap confidence interval of the discriminant rate, based on 1000 bootstrap replicates, to quantify the uncertainty in discriminatory power. For the DFA based on bill depth and length, we also showed the bootstrap 95% confidence band around the discriminant line. DFA, bootstrap confidence interval and band were implemented through FORTRAN codes specifically constructed for our study and available from the authors.
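The workflow described above (two-group Fisher discriminant, leave-one-out validation, bootstrap interval for the discriminant rate) can be sketched in a few lines. This is not the authors' FORTRAN code: it is a minimal illustration under stated assumptions, namely two traits only, a cutoff placed midway between the projected sex means, a percentile bootstrap of the re-classification rate, and simulated measurements. All function names are hypothetical.

```python
import random


def mean_vec(rows):
    return [sum(r[j] for r in rows) / len(rows) for j in range(2)]


def scatter(rows, m):
    """Sums of squares and cross-products of one group around its mean."""
    sxx = sum((r[0] - m[0]) ** 2 for r in rows)
    sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
    syy = sum((r[1] - m[1]) ** 2 for r in rows)
    return sxx, sxy, syy


def fit_fisher(X, y):
    """Return (weights, cutoff); scores above the cutoff are assigned to males (1)."""
    males = [x for x, s in zip(X, y) if s == 1]
    females = [x for x, s in zip(X, y) if s == 0]
    mm, mf = mean_vec(males), mean_vec(females)
    a1, b1, c1 = scatter(males, mm)
    a0, b0, c0 = scatter(females, mf)
    a, b, c = a1 + a0, b1 + b0, c1 + c0  # pooled within-group scatter
    det = a * c - b * b
    d0, d1 = mm[0] - mf[0], mm[1] - mf[1]
    w = [(c * d0 - b * d1) / det, (a * d1 - b * d0) / det]  # Sw^-1 (mm - mf)
    # Assumption: cutoff halfway between the projected group means.
    cutoff = 0.5 * (w[0] * (mm[0] + mf[0]) + w[1] * (mm[1] + mf[1]))
    return w, cutoff


def predict(w, cutoff, x):
    return 1 if w[0] * x[0] + w[1] * x[1] > cutoff else 0


def accuracy(w, cutoff, X, y):
    return sum(predict(w, cutoff, x) == s for x, s in zip(X, y)) / len(y)


def loo_accuracy(X, y):
    """Leave-one-out: refit without individual i, then classify individual i."""
    hits = 0
    for i in range(len(y)):
        w, c = fit_fisher(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict(w, c, X[i]) == y[i]
    return hits / len(y)


def bootstrap_ci(X, y, n_boot=1000, seed=1):
    """Percentile 95% interval of the re-classification rate over resamples."""
    rng = random.Random(seed)
    n, rates = len(y), []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:  # skip resamples containing a single sex
            continue
        w, c = fit_fisher(Xb, yb)
        rates.append(accuracy(w, c, Xb, yb))
    rates.sort()
    return rates[int(0.025 * len(rates))], rates[int(0.975 * len(rates)) - 1]


# Simulated bill length and bill depth (mm); not the study data.
rng = random.Random(0)
X = [[rng.gauss(37.8, 2.1), rng.gauss(18.2, 1.1)] for _ in range(100)]
X += [[rng.gauss(35.8, 2.0), rng.gauss(16.9, 1.0)] for _ in range(100)]
y = [1] * 100 + [0] * 100

w, cutoff = fit_fisher(X, y)
resub = accuracy(w, cutoff, X, y)
loo = loo_accuracy(X, y)
ci_low, ci_high = bootstrap_ci(X, y)
print(f"re-classification {resub:.3f}, leave-one-out {loo:.3f}, "
      f"bootstrap 95% CI {ci_low:.3f} to {ci_high:.3f}")
```

The leave-one-out rate is usually the more conservative figure, because each individual is classified by a function fitted without it.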

As a second step, we re-ran the above DFAs by accounting for the field researcher who collected the measurements as a potentially confounding factor. In fact, different researchers might influence biometric measurements through some degree of subjectivity due to individual systematic error, potentially affecting the discriminant rate (Dechaume-Moncharmont et al. 2011). To this end, we used permuted DFA (pDFA; Mundry and Sommer 2007). pDFA controls for non-independence of measurements within researchers, by testing for the significance of the discriminability between sexes within researchers through a permutation approach. This technique randomises data repeatedly and compares the discriminability of the original data with that of the randomised (“permuted”) data, for which the null hypothesis (no discriminability between sexes) is, by definition, true. The non-independence within researchers is also maintained in the permuted data. The proportion of randomised datasets revealing a number of correct assignments at least as large as that of the original dataset equals the one-tailed p value (Mundry and Sommer 2007). We conducted pDFAs for our fully crossed design (i.e., each researcher measured both male and female birds) using 1000 random selections and 10,000 permutations. We performed these analyses using an unpublished function in R (R Core Team 2013) written by Roger Mundry, following Mundry and Sommer (2007). This R function is based on the function lda of the R package MASS (Venables and Ripley 2002).
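The permutation logic can be illustrated with a much-simplified stand-in, which is not the R function used in the study. The sketch below assumes a single trait, a midpoint cutoff between sex means in place of the DFA, the re-classification rate as test statistic, and a p value computed with the original data counted as one permutation. Sex labels are shuffled only within each researcher's block, so any researcher-specific bias is preserved in the permuted data. All names and the simulated data are hypothetical.

```python
import random


def midpoint_accuracy(values, sexes):
    """Hit rate of a cutoff halfway between the two sex means (stand-in for DFA)."""
    males = [v for v, s in zip(values, sexes) if s == "M"]
    females = [v for v, s in zip(values, sexes) if s == "F"]
    mean_m = sum(males) / len(males)
    mean_f = sum(females) / len(females)
    cutoff = (mean_m + mean_f) / 2
    high, low = ("M", "F") if mean_m >= mean_f else ("F", "M")
    hits = sum((high if v > cutoff else low) == s for v, s in zip(values, sexes))
    return hits / len(values)


def permute_within_blocks(sexes, blocks, rng):
    """Shuffle sex labels separately inside each researcher's block."""
    permuted = list(sexes)
    for block in sorted(set(blocks)):
        idx = [i for i, b in enumerate(blocks) if b == block]
        labels = [sexes[i] for i in idx]
        rng.shuffle(labels)
        for i, label in zip(idx, labels):
            permuted[i] = label
    return permuted


def permutation_p_value(values, sexes, blocks, n_perm=1000, seed=1):
    rng = random.Random(seed)
    observed = midpoint_accuracy(values, sexes)
    at_least_as_large = sum(
        midpoint_accuracy(values, permute_within_blocks(sexes, blocks, rng)) >= observed
        for _ in range(n_perm)
    )
    # Assumption: the original data count as one of the permutations.
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Simulated bill depths (mm) with a researcher-specific bias; not the study data.
rng = random.Random(0)
values, sexes, blocks = [], [], []
for researcher, bias in [("A", 0.0), ("B", 0.3), ("C", -0.3)]:
    for sex, mu in [("M", 18.2), ("F", 16.9)]:
        for _ in range(30):
            values.append(rng.gauss(mu + bias, 1.0))
            sexes.append(sex)
            blocks.append(researcher)

observed, p_value = permutation_p_value(values, sexes, blocks)
print(f"observed rate {observed:.3f}, one-tailed p = {p_value:.4f}")
```

Restricting the shuffle to blocks is the key design choice: a permutation across researchers would break the dependence structure that the test is meant to respect.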

We also performed morphometric sexing using logistic models and, to account for the researcher effect, logistic mixed models (Supplementary Material 1). Finally, we conducted sensitivity analyses for both the discriminant analysis and logistic model developed to sex penguins through bill measurements, in order to investigate variation in discriminant rate and its uncertainty according to a simulated decrease in sample size (Supplementary Material 2).

Results

Bill length and bill depth were significantly larger in males (mean ± SE; bill length, M: 37.79 ± 0.14 mm, n = 237, F: 35.79 ± 0.12 mm, n = 264, t = -10.85, p < 0.0001; bill depth, M: 18.19 ± 0.07 mm, n = 237, F: 16.94 ± 0.06 mm, n = 264, t = 13.48, p < 0.0001), whereas body mass did not differ between sexes (mean ± SE; M: 4205.6 ± 31.8 g, n = 237, F: 4238.7 ± 28.4 g; n = 264, t = 0.77, p = 0.4374). Sexual dimorphism was greater for bill depth (SDIS = -7.09%; SDILG = 0.073; SDIG = 7.35%) than for bill length (SDIS = -5.44%; SDILG = 0.059; SDIG = 5.59%). When we applied to our study population discriminant functions or logistic equations previously implemented for sexing Adélie penguins in other colonies, we found a generally low classification power (Kerry et al. 1992: 63.9% of correctly classified penguins; Polito et al. 2012: 61.2%; Gorman et al. 2014: 57.6%; n = 501 individuals).
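As a quick consistency check, the test statistics can be approximated from the reported means and standard errors alone. The sketch below uses the unpooled form of the t statistic, which is an assumption, since the exact variant of the t test is not restated here; the function name is hypothetical. Small departures from the reported values are expected because the inputs are rounded, and the sign simply depends on the order of subtraction.

```python
import math


def t_from_summary(mean_a, se_a, mean_b, se_b):
    """Unpooled t statistic for a difference in means from means and standard errors."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# (male mean, male SE, female mean, female SE), as reported above
summary = {
    "bill length (mm)": (37.79, 0.14, 35.79, 0.12),
    "bill depth (mm)": (18.19, 0.07, 16.94, 0.06),
    "body mass (g)": (4205.6, 31.8, 4238.7, 28.4),
}

t_values = {
    trait: t_from_summary(m, se_m, f, se_f)
    for trait, (m, se_m, f, se_f) in summary.items()
}
for trait, t in t_values.items():
    print(f"{trait}: t = {t:.2f}")
```

The magnitudes obtained this way fall close to those reported for all three traits.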

When we considered bill length (BL) and bill depth (BD) measured from birds in our study colony, the linear combination of these variables significantly distinguished the two sexes (Mann–Whitney test on DFA scores: U = 10,444, z = -12.88, p = 0.0001; Fig. 1). The discriminant function with unstandardized coefficients

$$D = 0.035023{\text{B}}_{{\text{D}}} + 0.007728{\text{B}}_{{\text{L}}}$$

correctly re-classified most individuals (77%; 76.2% after jackknifed cross-validation), with penguins having D > 0.8974 assigned as males. This result was further validated by bootstrap analysis, with a bootstrap 95% confidence interval for the discriminant rate ranging from 73 to 81% (bootstrap root mean square error of 2%).
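In practice, the function is applied by computing D from the two bill measurements (in mm) and comparing it with the cutoff. The following minimal sketch uses the coefficients and cutoff reported above; the function names are hypothetical, and the reported sex-specific means serve only as example inputs.

```python
BD_COEF = 0.035023  # coefficient for bill depth (mm)
BL_COEF = 0.007728  # coefficient for bill length (mm)
CUTOFF = 0.8974     # scores above this value are assigned to males


def discriminant_score(bill_depth_mm: float, bill_length_mm: float) -> float:
    return BD_COEF * bill_depth_mm + BL_COEF * bill_length_mm


def assign_sex(bill_depth_mm: float, bill_length_mm: float) -> str:
    return "male" if discriminant_score(bill_depth_mm, bill_length_mm) > CUTOFF else "female"


# Example inputs: the mean bill depth and bill length reported for each sex.
for label, depth, length in [("mean male", 18.19, 37.79), ("mean female", 16.94, 35.79)]:
    score = discriminant_score(depth, length)
    print(f"{label}: D = {score:.4f} -> {assign_sex(depth, length)}")
```

The two mean birds fall on opposite sides of the cutoff, as expected. Individuals with scores close to 0.8974 are those for which the assignment is least certain.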

Fig. 1

Bill length and bill depth in 501 adult Adélie penguins Pygoscelis adeliae measured at Edmonson Point, Ross Sea (blue dots: males; red dots: females). Solid line shows decision line of sex assignment based on discriminant function analysis (dashed line: bootstrapped 95% confidence intervals). Inset: Kernel density estimate (KDE) of discriminant scores (D) for male (blue area) and female (red area) Adélie penguins yielded by discriminant function analysis based on bill length and bill depth (dashed lines: median scores)

Similarly, when we considered bill length, bill depth and body mass (BM), the linear combination of these variables significantly distinguished male and female penguins (Mann–Whitney test on DFA scores: U = 9724, z = -13.33, p = 0.0001; Fig. 2). The discriminant function with unstandardized coefficients

$$D = 0.035656{\text{B}}_{{\text{D}}} + 0.007887{\text{B}}_{{\text{L}}} - 0.000024{\text{B}}_{{\text{M}}}$$

correctly re-classified most individuals (78.2%; 73.8% after jackknifed cross-validation), with penguins having D > 0.8144 assigned as males. This result was further validated by bootstrap analysis, with a bootstrap 95% confidence interval for the discriminant rate ranging from 74 to 82% (bootstrap root mean square error of 2%).
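To see how little body mass contributes to this function, each term of D can be evaluated at the reported sex-specific means. The sketch below assumes that body mass enters the function in grams, which is inferred from the units of the reported means and from the fact that the resulting scores straddle the cutoff; the variable and function names are hypothetical.

```python
COEFS = {"bill_depth_mm": 0.035656, "bill_length_mm": 0.007887, "body_mass_g": -0.000024}
CUTOFF = 0.8144

# Reported sex-specific means; body mass assumed to be in grams.
MEANS = {
    "male": {"bill_depth_mm": 18.19, "bill_length_mm": 37.79, "body_mass_g": 4205.6},
    "female": {"bill_depth_mm": 16.94, "bill_length_mm": 35.79, "body_mass_g": 4238.7},
}


def term_contributions(measurements: dict) -> dict:
    """Contribution of each trait to the discriminant score D."""
    return {trait: COEFS[trait] * measurements[trait] for trait in COEFS}


terms = {sex: term_contributions(values) for sex, values in MEANS.items()}
scores = {sex: sum(t.values()) for sex, t in terms.items()}

for trait in COEFS:
    gap = terms["male"][trait] - terms["female"][trait]
    print(f"{trait}: male minus female contribution = {gap:+.5f}")
for sex, score in scores.items():
    print(f"mean {sex}: D = {score:.4f} ({'male' if score > CUTOFF else 'female'})")
```

At the sex means, the body mass term differs between sexes by less than 0.001, far less than the gaps produced by bill depth and bill length, in line with the negligible change in discriminant rate when body mass is added.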

Fig. 2

Body mass, bill length and bill depth in 501 adult Adélie penguins Pygoscelis adeliae measured at Edmonson Point, Ross Sea (blue dots: males; red dots: females). Grey plane shows decision plane of sex assignment based on discriminant function analysis. Inset: Kernel density estimate (KDE) of discriminant scores (D) for male (blue area) and female (red area) Adélie penguins yielded by discriminant function analysis based on bill length, bill depth and body mass (dashed lines: median scores)

Even when controlling for the researcher effect, cross-validated classifications were significantly greater than expected by chance, showing a slight decrease in discriminant rate (pDFA; BL + BD: 72.7%, p = 0.0328; BL + BD + BM: 71.4%, p = 0.0344). The classification power obtained by DFA and pDFA was also confirmed when we sexed birds using logistic models and logistic mixed models, which provided a comparable discriminant rate for each type of analysis (Supplementary Material 1). Simulations performed on both the discriminant analysis and logistic model developed to sex penguins through bill measurements showed that reduction in sample size markedly increased the uncertainty in classification power (Fig. 3; Supplementary Material 2).

Fig. 3

Sensitivity analysis showing classification power (top: percentage of correctly classified individuals) and its uncertainty (middle: range of the bootstrap confidence interval; bottom: root mean square error of classification power) in relation to simulated reduction in sample size, for both (a) the discriminant function analysis and (b) the logistic model run to sex penguins using bill measurements. From the original dataset (n = 501 individuals), 500 subsamples were randomly selected for each of the ten levels of sample size. Then, morphometric sexing by DFA and logistic model was performed for each of these 5000 subsamples (grey dots). The black line connects the median value achieved by the 500 simulations performed for each sample size, whilst the yellow band includes 95% of simulations. Red dashed line shows the value estimated in the original analysis, using the complete dataset
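The subsampling scheme summarised in this caption can be sketched as follows. This is a simplified illustration, not the procedure of Supplementary Material 2: it assumes a single trait, a midpoint cutoff between sex means as a stand-in for the DFA or logistic model, five hypothetical sample-size levels, and simulated data. Only the classification rate is tracked, not the bootstrap interval or the root mean square error.

```python
import random
import statistics


def hit_rate(sample):
    """Re-classification rate of a midpoint cutoff, assuming males are the larger sex."""
    males = [v for v, s in sample if s == "M"]
    females = [v for v, s in sample if s == "F"]
    if not males or not females:  # subsample containing a single sex
        return None
    cutoff = (statistics.mean(males) + statistics.mean(females)) / 2
    return sum((v > cutoff) == (s == "M") for v, s in sample) / len(sample)


def sensitivity(data, sizes, n_sub=500, seed=1):
    """For each sample size: median rate and the band holding 95% of subsamples."""
    rng = random.Random(seed)
    summary = {}
    for n in sizes:
        rates = []
        for _ in range(n_sub):
            rate = hit_rate(rng.sample(data, n))  # subsample without replacement
            if rate is not None:
                rates.append(rate)
        rates.sort()
        low = rates[int(0.025 * len(rates))]
        high = rates[int(0.975 * len(rates)) - 1]
        summary[n] = (statistics.median(rates), low, high)
    return summary


# Simulated bill depths (mm) for 250 males and 250 females; not the study data.
rng = random.Random(0)
data = [(rng.gauss(18.2, 1.1), "M") for _ in range(250)]
data += [(rng.gauss(16.9, 1.0), "F") for _ in range(250)]

result = sensitivity(data, sizes=[20, 50, 100, 200, 400])
for n, (median, low, high) in result.items():
    print(f"n = {n:3d}: median {median:.3f}, 95% band {low:.3f} to {high:.3f}")
```

In such a simulation the median rate changes little across sample sizes, whereas the band widens markedly as the subsamples shrink, which is the pattern the figure is meant to convey.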

Discussion

Morphometric sexing can be invaluable to assist ornithological research as a quick tool when molecular sexing cannot be performed, particularly in remote field conditions like those occurring in Polar regions. This technique may be especially useful for cryptically dimorphic species, i.e. those exhibiting an apparently low degree of sexual dimorphism such as Spheniscidae (Agnew and Kerry 1995). Nevertheless, discriminatory power may depend on geographical, population-specific differences (e.g. Valenzuela-Guerra et al. 2013; Steinfurth et al. 2019, for penguins). Indeed, applying to our study population the morphometric functions previously implemented for the Adélie penguin in other colonies did not provide sufficient classification power to sex penguins at Edmonson Point. Our work evaluated the potential for morphometric sexing in an Adélie penguin population breeding in Central Victoria Land and considered a large dataset of measurements to test how different analytical techniques influenced discriminatory power.

In line with our expectation, males had larger bills than females, although indices of bill dimorphism were slightly lower than those reported in, or estimable from, previous studies (cf. Ainley and Emison 1972; Kerry et al. 1992; Polito et al. 2012; Gorman et al. 2014). Sex difference in bill size could be due to some degree of sexual selection exhibited by Adélie penguins for this trait (intrasexual selection: Ainley and Emison 1972, intersexual selection: Davis and Speirs 1990). Sexual difference in bill size originates in early development, with male chicks having larger and faster-growing bills than females (Jennings et al. 2016). Accordingly, throughout adulthood, a larger bill may increase male individual fitness by several mechanisms, supporting multiple adaptive meanings for its sexual selection. For example, the bill can represent a crucial ‘weapon’ and/or signal in male agonistic contests, territory defence and courtship (Ainley 1975; Spurr 1975); a larger bill would therefore provide advantages in male competition. Similarly, having a larger bill could also help in collecting bigger stones to build high-quality nests (Tenaza 1971) or in achieving extrapair copulations (Hunter and Davies 1998), ultimately increasing reproductive success (Tenaza 1971; Hunter and Davies 1998). Alternatively, Ainley and Emison (1972) suggested that sexual dimorphism in bill size may have originated due to sexual difference in foraging niche, e.g. because males may feed on larger prey. However, this hypothesis remains to be tested, as contrasting evidence on sex-specific adult diet partitioning has been provided amongst Adélie penguin populations (cf. Clarke et al. 1998; Gorman et al. 2014; Widmann et al. 2015). In any case, the greater dimorphism in bill size than in body mass (the latter was not detected in our study, but see Ainley and Emison 1972; Fairbairn and Shine 1993; Ainley 2002: 105; Gorman et al. 2014) supports the view that adult Adélie penguins are sexually dimorphic in bill size, whilst body mass dimorphism appears to be less marked or, most likely, condition-dependent (cf. Ainley and Emison 1972).

Our findings provide moderate support for sex discrimination through bill size in the Adélie penguin population breeding at Edmonson Point, although discriminatory power seems slightly lower than in previous studies conducted on this species (Table 1). However, earlier research has suggested caution when comparing studies conducted with different sexing methods and morphological characters, as well as studies not reporting the confidence interval of the discriminant rate (Dechaume-Moncharmont et al. 2011). Interestingly, considering body mass in addition to bill measurements does not improve our classification power significantly, possibly because body mass is not as sexually dimorphic as bill size and/or because it is condition-dependent (see above). In particular, the actual effect of body mass may have been masked by its temporal variability across dates and years of sampling, as well as by spatial/environmental parameters underlying date- and year-specific body condition (e.g. Ballard et al. 2010). In addition, our classification power decreases when controlling for the researcher effect, suggesting that proper analytical methods can handle the uncertainty arising from this confounding factor, which should therefore be accounted for to achieve more reliable estimates of the ‘fixed effect’ of morphological variables. Discriminant rates achieved through DFA and pDFA are similar to classification rates obtained by logistic models and logistic mixed models, meaning that both classification methods represent suitable alternatives for morphometric sexing. Although potential sources of error in assessing the actual sex of penguins by cloacal examination cannot be excluded, and may have affected the classification power achieved through morphometrics, we took robust precautions to avoid misclassifying the true sex of our marked individuals (see Methods). Hence, we suggest that two non-mutually exclusive reasons may ultimately explain the lower discriminatory power of our morphometric sexing with respect to that found by previous studies on the Adélie penguin.

Table 1 Current knowledge on the performance of morphometric sexing in adult Adélie penguins Pygoscelis adeliae

(I) Population-specific differences in sexual dimorphism. Dechaume-Moncharmont et al. (2011) highlight that the degree of sexual dimorphism has the largest effect on discriminatory rate between sexes. In turn, the relatively lower sexual dimorphism we found at Edmonson Point may have ultimately affected the sexing potential of bill size. Several specific features of our study colony may have reduced sexual dimorphism in bill size, depending on the origin of sexual dimorphism. Sexual selection theory predicts limited investment in exaggerated male traits when habitat quality is poor, because bearing such characters elicits costs (Warren et al. 2013, for a review). The potential for sexual selection can also be limited by predator-driven offspring mortality (Byers and Dunn 2012). If sexual dimorphism in bill size was due to sexual selection, one could speculate that weaker sexual selection on bill size may have occurred because of the relatively low habitat quality at Edmonson Point. In this colony, persistent fast-ice and high density of terrestrial predators (South Polar Skua Stercorarius maccormicki) seem to increase energetic constraints of penguins (see Clarke et al. 1998; Olmastroni et al. 2004, 2020; Pezzo et al. 2007; Mori et al. 2021, for environmental-related constraints at Edmonson Point), which might have reduced the potential investment in sexual selection. Nevertheless, this hypothesis remains to be tested across colonies differing in habitat quality/predator density. Alternatively, if bill size dimorphism originated because of a sex-specific foraging niche in the Adélie penguin (which should reduce intraspecific competition by increasing intersexual niche partitioning: Ainley and Emison 1972), a lower dimorphism may have been triggered by the relatively lower degree of competition for food in our study colony, as expected in small-sized colonies (Ainley et al. 2004). Although Clarke et al. (1998), in our study colony, found that meals of males were greater than those of females, sex-specific data on prey size would be necessary to test whether a lower dimorphism in bill size may have been linked to the lack of intersexual niche partitioning. Although future studies are required to corroborate either hypothesis, a lower degree of sexual dimorphism in morphological traits could have played a major role in the lower discriminant rate we found in our study population compared to that obtained in other colonies, where both the sexing potential of bill size and its sexual dimorphism are higher (Scolaro et al. 1990; Kerry et al. 1992; Gorman et al. 2014; Polito et al. 2012).

(II) Sample size. Our study involved a considerably larger dataset than previous works concerning morphometric sexing in the Adélie penguin (Table 1). Sample size influences discriminatory power, with simulations conducted on several bird species showing that the chance of misclassification increases especially with fewer than 100 individuals (Dechaume-Moncharmont et al. 2011; Boucheker et al. 2020). Whilst our simulations conducted on the Adélie penguin yield a relatively stable classification power even with small samples, they show that a reduction in sample size results in a much higher uncertainty in the discriminant rate (Fig. 3). Consequently, we cannot rule out that previous studies on morphometric sexing in the Adélie penguin may have had some bias in discriminatory power due to lower sample size, which calls their apparently higher performance into question. In line with Dechaume-Moncharmont et al. (2011), we thus suggest caution about the reliability of morphometric sexing in avian studies based on small sample sizes, and further recommend estimating the confidence interval of the discriminant rate as an index of precision. We should also note that the use of large sample sizes in morphometric sexing analyses has traditionally been limited by the fact that datasets are split to account for measurements taken by the different researchers involved in the field. Indeed, ornithologists have often run separate morphometric analyses on data obtained from long-term monitoring programmes according to the field season or year, because measurements were taken by different researchers in those periods or to account for improved researcher experience through time (e.g. Kerry et al. 1992). For future assessments of morphometric sexing criteria on individually recognizable birds, our work shows that increasing the sample size whilst achieving a more reliable estimate of sexing performance would be possible using statistical tools that are able to handle such a confounding factor. A simple solution would be to pool annual series of morphological measurements taken by different field researchers and to address this issue through permuted discriminant analysis (Mundry and Sommer 2007) or logistic mixed models. Likewise, these approaches would make it possible to account for other potentially confounding factors for which discriminant functions are often derived separately, such as different age classes/cohorts (Boucheker et al. 2020) and study areas (Valenzuela-Guerra et al. 2013).

Although currently abundant in its distribution range (Lynch and LaRue 2014; Southwell et al. 2017), the Adélie penguin is expected to face environmental changes (Ainley et al. 2010; Trathan et al. 2015) and increasing human impact in Antarctica over the next decades (Coetzee and Chown 2016; Pertierra et al. 2017; Southwell et al. 2017). In particular, contrasting trends in penguin populations related to different climate scenarios over the continent (Ainley et al. 2010; Cimino et al. 2016) will possibly increase research effort on this sentinel species, making morphometric sexing a helpful tool for long-term studies in the field. Kerry et al. (1992) reported that a discriminant power greater than 80% can be acceptable to sex penguins for practical purposes, while noting that unequivocal sex assessment would still require other methods. In our study population, though we would be able to sex most individuals through our discriminant functions, the misclassification rate would be too high to rely solely on this technique. In agreement with Kerry et al. (1992), we rather suggest that morphometric sexing via discriminant analysis should be used in combination with tools providing further sexual clues. For example, at the onset of the breeding season, noninvasive and cost-effective sexing methods based on behavioural observations, such as inspecting the position of the male during copulation or assessing the first incubation shift, are currently used (e.g. Lescroël et al. 2019) and may represent complementary alternatives. At the same time, as suggested by Polito et al. (2012), molecular sexing can then be targeted at individuals for which confidence in the discriminant classification is low (see Mori et al. 2020, for feather-based molecular sexing in the Adélie penguin).

Our results highlight that methodological and population-specific pitfalls may hamper the performance of morphometric sexing in weakly dimorphic birds. Although the population-specific degree of sexual dimorphism may still limit the broad applicability of this field method, i.e. throughout a species’ distribution range, we report several analytical precautions that may help address the methodological pitfalls. Whilst we confirm that sexing birds through morphometric measurements can assist alternative methods as a noninvasive, quick and cost-effective technique, our work emphasises that some requirements need to be fulfilled beforehand: (i) population-specific validation; (ii) large sample size; (iii) accounting for variability in systematic error when measurements are collected by different researchers.