Introduction

Ornaments or decorative traits such as colorful plumage, fur, and skin have evolved across the animal kingdom for use in, for example, mate choice and male-male competition (Berglund et al. 1996). Darwin (1871) considered the conspicuous colors exhibited by many animal species to have evolved under sexual selection, as males of many species tend to show more and brighter colors than females. Hence, colorful ornaments are often sexually dimorphic. Increased color expression is selected for when it attracts individuals of the opposite sex (inter-sexual selection) or when it aids in competition with same-sex individuals (intra-sexual selection) (Andersson 1994).

There is a large body of literature demonstrating that many colorful ornaments and signals are linked to sexual selection, with signal evolution driven by mate choice and/or male-male competition (e.g., reviewed in Caro 2005; Hill 2006; Senar 2006; Price et al. 2008). Considerable variation in ornamental coloration has evolved between species (Cuthill et al. 2017); however, there is also substantial variation in coloration within species (Caro 2005) even within the same sex (e.g., reptiles, Fitze et al. 2009; birds, Pryke et al. 2001). Studies investigating what predicts color variation within mammal species are less common (e.g., West and Packer 2002). Some colorful ornaments can be linked to physical conditions. For instance, skin color variation in reptiles is associated with body mass and corticosterone levels (Fitze et al. 2009). Bird plumage color variation is related to male body mass (Velando et al. 2006; Dobson et al. 2008; Vergara and Fargallo 2011), male immune competence (Velando et al. 2006), endoparasitic infection (McGraw and Hill 2000), and male age (Nicolaus et al. 2007). In mammals, the dark mane of male lions (Panthera leo) is associated with higher levels of testosterone compared to lighter-maned males (West and Packer 2002).

Colorful signals can also be associated with male status characteristics, e.g., competitive abilities. Communicating status to competitors through color signals may prevent conflicts from escalating to potentially lethal levels (Rohwer 1982). For example, in birds, redder plumage color has been associated with fewer aggressive encounters with intruders (Pryke et al. 2001). Colorful signals in mammals are less common and less studied but may also correlate with competitive abilities; for example, darker mane color in male lions is related to dominance (West and Packer 2002).

Primates are the most colorful mammalian taxon (Setchell 2015). Many primate species show a wide range of colors on the face, chest, and anogenital area, as well as a variety of pelage colors, which have been linked to sexual selection. Variation in male coloration is less often associated with mate choice in primates than in other taxa. Examples reported come from mandrills (Mandrillus sphinx) and rhesus macaques (Macaca mulatta) where males with redder face color or brighter coloration, respectively, receive more female proceptive behavior. However, it is not clear what drives color variation within these species (Setchell 2005; Dubuc et al. 2014a). Correlational studies on primates suggest that male coloration is more likely to represent social status (Setchell et al. 2008; Bergman et al. 2009; Marty et al. 2009; Grueter et al. 2015). In geladas (Theropithecus gelada), for instance, the skin color of leader males holding larger groups was redder than that of males holding smaller groups (Bergman et al. 2009). Similarly, a study of black-and-white snub-nosed monkeys (Rhinopithecus bieti) showed that males’ lip color reddened both with increasing age and when holding a group, compared to subadult and non-group-holding males (Grueter et al. 2015), indicating that color covaries with social status.

In primates, the most common skin colors associated with signaling are red and blue, with blue being rare compared to red. Red is usually produced by the amount of oxygenated blood flowing into the outer layers of a skin patch and can be regulated by testicular hormones (Vandenbergh 1965; Dixson 2012). Blue, in contrast, is a structural color, produced by coherent scattering of light by dermal collagen structures. The saturation of blue colors can be influenced by underlying melanin pigmentation (Prum and Torres 2004) and tissue fluids (Price et al. 1976), as well as by androgens, which show an effect on the amount of dermal collagen (Markiewicz et al. 2007).

Previous studies on signals of male social status in primates have addressed the covariates of single color patches (e.g., Setchell and Wickings 2005; Marty et al. 2009). However, multi-component color signals may be more accurate and informative for communication than single colors (Renoult et al. 2011; Grueter et al. 2015; reviewed in Setchell 2015). Additionally, multi-component color signals can be informative for different receivers, as shown in other taxa (Zambre and Thaker 2017). Higher color contrast within multi-component color signals may be more conspicuous (Endler 1990). Here, we refer to contrasts between colors both as measurements that calculate the difference between two colors and in reference to analyses that relate multiple colors to one another. Surprisingly, in primates only Renoult et al. (2011) and Grueter et al. (2015) have considered contrasts between colors, rather than the effects of a single color. Overall, multi-component color signals could help us to understand the evolution of signal complexity (Endler and Mappes 2017).

The genus Chlorocebus comprises several species in which males possess multi-colored genitalia. There is large variation in color between Chlorocebus species but also to some extent within species (all have blue scrota and red penises and some have a red perianal area, Hill 1966). Hence, members of this taxon are ideal for investigating how male characteristics covary with genital coloration and the presentation of genital colors (where the latter can provide insights into the communicative value of genital displays). For example, in green monkeys (Chlorocebus sabaeus) scrotal color varies from light brown to dark blue, while in vervet monkeys (Chlorocebus pygerythrus), scrotal color differences are less pronounced, ranging from light to dark blue (Hill 1966). Vervet monkeys, however, are of particular interest, as males have multi-colored genitals (Hill 1966), with single colors shown to vary across males (Cramer 2012) and males typically present their multi-colored genitals during several behavioral displays, including the Red-White-and-Blue (RWB) display (Struhsaker 1967; Henzi 1985). This display has been described as a male walking back and forth in front of or encircling another male while carrying its tail vertically erect and presenting to his opponent the red perianal area and the blue scrota with the white fur in-between (Struhsaker 1967; Henzi 1985; see Fig. 1a). Previous studies on male genital color in vervet monkeys have focused only on the blue coloration of the scrota and its association to male characteristics. Data presented by Cramer (2012) suggested that differences in blue scrotal color are unrelated to copulation frequencies, cortisol concentration, or parasite infection. Other colors and, in particular, the contrast of colors in the Chlorocebus multi-colored ornament have yet to be studied.

Fig. 1 a Original RWB display (standardized for lighting); b male vervet genitals with measured genital areas (perianal area, fur, and left and right scrotum) circled in white (standardized for lighting)

To gain a better understanding of the evolution of genital coloration, we first have to explore if variation in coloration encodes information that could be informative to conspecifics. Accordingly, the aims of this study were, first, to investigate patterns and covariates of genital color presentation by male vervet monkeys and, second, identify the potential covariates of genital color and its variation to assess whether single or multiple genital colors are related to male characteristics. To do so, we used an operational definition comprising the original RWB display, as described by Struhsaker (1967) and Henzi (1985), along with the display of genitals across a variety of contexts. Specifically, we recorded all occurrences of the physical action whereby a male lifted his tail and exposed the genitals, the red perianal area, and the blue scrotum completely, irrespective of the context or to which sex it was shown (see supplementary Fig. S1). We refer to this as the RWB genital display hereafter. We examined both the distribution of genital displays across contexts and whether RWB genital display frequency was related to male characteristics, including genital colors (single or multiple colors), dominance rank, and group tenure. We predicted that the frequency of RWB genital displays would be influenced by differences in color contrasts, as a signal of male quality (Bergman et al. 2009). We also predicted that the RWB genital display frequency would reflect male social status (Bergman et al. 2009), as the display was originally described as occurring during agonistic interactions between males (Henzi 1985). To assess whether male characteristics might explain inter-individual differences in genital color, we examined the potential influence of dominance rank, age, group tenure, injuries, and fecal glucocorticoid metabolite (fGCM) concentrations. 
Finally, we investigated the relationship between intra-individual differences in genital colors and male characteristics. We predicted that inter- and intra-individual color variation would reflect male social status, as shown in other primate males, but not male health status, given that a previous study found no link between the blue scrotum coloration and cortisol concentration or parasite infection (Cramer 2012).

Methods

Study population and study period

We collected data from three habituated groups (RST, RBM, PT) of vervet monkeys at the Samara Private Game Reserve, Eastern Cape, South Africa (32°22′S, 24°52′E). All groups were observed at close range and each individual was identifiable via visual cues (Pasternak et al. 2013). During our study period in 2016, the group sizes were (mean ± SD) 25 ± 4 (m, 11 ± 2; f, 14 ± 2) individuals in group RST, 14 ± 2 (m, 5 ± 1; f, 10 ± 1) in RBM, and 15 ± 3 (m, 6 ± 2; f, 9 ± 0) in PT. Census data have been recorded on a near-daily basis for RST and RBM since 2009 and for PT since 2012, including births and deaths, the sex of each subject, the ID of behavioral mother per subject, and dates of male migration. These data were recorded as close to the date of occurrence as possible, but usually within a 2-day window. As our data collection is based on focal animals, it was not possible to record data blind.

Longitudinal data collected include dominance interactions (i.e., submission, displace, supplant, facial threat, vocal threat, lunge, physical contact; Freeman 2012), mating interactions (i.e., genital sniff, grab, female refusal, mount, and ejaculation) as well as inter-troop encounters, all collected via ad libitum sampling (Altmann 1974). As part of the longitudinal data collection, 10-min group scans were conducted every 30 min, during which the general activity (i.e., locomotion, foraging, grooming) of as many group members as possible was recorded (for more details, see Minkner et al. 2018).

RWB genital display

We first assessed the behavioral context in which the RWB genital display was presented. To do so, we conducted a pilot study in 2015 and then continued RWB genital display data collection in the mating season (April to July: Freeman et al. 2012) in 2016. We recorded the RWB genital display ad libitum and noted the context (inter-troop encounter, mating, dominance, grooming, foraging, and locomotion) in which it was presented as well as to whom it was presented whenever the recipient was clearly observable. A recipient was an individual in the vicinity of the displaying male that had its face directed towards the visible genitals.

For the context analysis, we pooled the RWB genital display data for approximately 2 years (June 2015–May 2017). Differences in the number of RWB genital displays presented in different behavioral contexts were investigated by calculating the ratio between the frequency of RWB genital displays per context and the total frequency of the given context behavior. For the calculation of the total frequency per behavior, we used scan data (grooming, locomotion, foraging) and ad libitum data (mating interactions, agonistic interactions, inter-troop encounters).
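The ratio described above can be sketched in a few lines of Python. This is an illustration only: the function name and all counts are hypothetical, and the study's own calculations were not necessarily organized this way.

```python
def display_rate_per_context(display_counts, context_totals):
    """Displays per context divided by how often that context was recorded."""
    rates = {}
    for context, total in context_totals.items():
        if total <= 0:
            continue  # context never recorded, so the ratio is undefined
        rates[context] = display_counts.get(context, 0) / total
    return rates

# Hypothetical counts, not data from the study
displays = {"dominance": 12, "grooming": 3, "locomotion": 8}
totals = {"dominance": 240, "grooming": 600, "locomotion": 1600}
rates = display_rate_per_context(displays, totals)
```

Dividing by the total frequency of each context keeps frequently recorded contexts, such as locomotion in scan data, from dominating the comparison simply because they were sampled more often.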

Images

We aimed to collect multiple genital pictures from all males (N = 20) of our three study groups throughout one entire mating season. This longitudinal sampling approach enabled us to investigate within- and between-male color variation. Accordingly, genital pictures were taken by MY from 18th of April until 9th of September 2016. We were not able to collect a baseline before the mating season, so we continued data collection for about a month immediately following the mating season. We split efforts across groups depending upon the number of adult males available by following males opportunistically to take a picture series whenever males presented the full color display (N = 959 genital pictures; see Fig. S1). Images were collected during the genital presentation, when a male vervet displayed its red perianal area, white fur, and blue scrotum by lifting its tail up for a few seconds, or whenever the tail was up during climbing or walking within the group. Each display directly observed by MY was recorded with three to six sequential photos (hereafter called “event”). We aimed to record an event every 2 weeks for each male, but decreased the sampling interval when possible. Each vervet image event was followed immediately by pictures of an X-Rite ColorChecker color standard (24 color patches) in the same location, under the same light conditions, and with the same camera settings (sequential method, following Higham 2006; Bergman and Beehner 2008; Dubuc et al. 2014a) to correct for ambient light and camera setting differences between events. We used a Canon EOS 1000D with a Canon EF 75–300 mm zoom lens and photos were recorded in RAW format. The distance between observer and the study male ranged from 1 to 10 m (mean ± SD = 2.28 ± 1.17 m). Only pictures in focus and with the full perianal and scrotal areas visible were used for further analysis. 
Additionally, we excluded all photos from further analysis that were overexposed, clipped, or contained dappled light, as in overexposed and clipped photos color cannot be recovered and dappled light could influence color measurements. This resulted in a total of 122 photographed events (or 405 genital pictures) on 97 recording days from 20 males that were used in the analysis below, with 2 to 10 recording days per male (4.8 ± 2.0; mean ± SD), an average of 23.5 ± 19.7 days (mean ± SD) between recording days and 3 to 19 events per male (7.7 ± 3.9; mean ± SD).

Visual modeling

To model colors as seen by vervets, we used standard visual modeling methods (Stevens et al. 2007, 2009). We converted images from camera RGB color space to Chlorocebus LMS color space (i.e., characterized by quantal catches of vervet long, medium, and short wavelength photoreceptors). Our camera mapping model was a polynomial transformation generated using the Multispectral Image Calibration and Analysis Toolbox (Troscianko and Stevens 2015) based on the spectral sensitivities of the Canon EOS 1000D sensor (camera calibration by JT, following Troscianko and Stevens 2015), Chlorocebus spectral sensitivities computed using a rhodopsin template (Govardovskii et al. 2000) based on peak spectral sensitivities of grivets (Chlorocebus aethiops) (λmax = 566, 535, 434, Bowmaker et al. 1991), and Cercopithecine lens transmission data (R. Douglas and M. Powner, unpublished data). For the mapping we simulated the photoreceptor responses to several natural spectra (Arnold et al. 2010; Troscianko and Stevens 2015) with D65 illumination (Ohta and Robertson 2006) by using a first-degree polynomial transformation with three interaction levels (i.e., r + g + b + rg + rb + gb + rgb). The model R2 values were all > 0.997 for all color channels. The transformation of the image from camera RGB to Chlorocebus LMS was achieved via a MATLAB script written by SW.

Color analysis

For color measurements, photos were converted to linear TIFFs using DCRaw (Coffin 2012). Four display color measurements (the perianal area, fur, and left and right scrotum) as well as the white point of a standard were obtained with a MATLAB script written by W. Allen and SW (Dubuc et al. 2014b). The script extracts camera RGB pixel values from a selected area and maps them into Chlorocebus LMS quantal catches (longwave (L), mediumwave (M), and shortwave (S)) using the visual modeling approach described above. All four areas were measured separately for each photo. For the perianal area, the reddish-brown skin around the anus was measured. The white fur was measured in the shape of a triangle between the seat calluses and the scrota, with the tip of the triangle pointing at the scrota. For the scrota, we measured the circular regions of the blue scrotal skin. Shadows, dirt particles, overexposed pixels, and fur/hairs hanging into the perianal area and the scrota were excluded during marking of the measuring area (see Fig. 1b). To account for different lighting across pictures, we divided the vervet picture pixel values by the white point pixels of the respective standard picture for every color channel. Primate visual perception is based on opponency between receptors. We calculated the red-green opponency channel for redness with the equation (L − M)/(L + M) for the perianal area (RED) and the blue-yellow opponency channel for blueness with the equation (S − (L + M)/2)/(S + (L + M)/2) for the scrotum (BLUE). Additionally, we measured the genital luminance (achromatic color) with the equation (M + L)/2 to assess the lightness and darkness of male genitals (luminance of the red perianal area (LUM_R), the white fur (LUM_W), and the blue scrotum (LUM_B)). Measurements of the left and right scrotum were averaged per male. Furthermore, measurements per picture event per male were pooled and averaged for each area separately. 
For days where we had several picture events of a male, we averaged measurements of picture events per day (N = 97 recording days, 4.8 ± 2.0 (mean ± SD) recording days per male).
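The color variables defined above can be written out as a short Python sketch. The equations are those given in the text; the function names and the quantal catch values are hypothetical, and the original measurements were made with a MATLAB script rather than this code.

```python
def white_balance(catch, white_point):
    """Divide each channel (L, M, S) by the white point of the color standard."""
    return tuple(c / w for c, w in zip(catch, white_point))

def red_green(L, M):
    """Red-green opponency, used for the perianal area (RED)."""
    return (L - M) / (L + M)

def blue_yellow(L, M, S):
    """Blue-yellow opponency, used for the scrotum (BLUE)."""
    lm = (L + M) / 2
    return (S - lm) / (S + lm)

def luminance(L, M):
    """Achromatic luminance, (M + L)/2."""
    return (M + L) / 2

# Hypothetical quantal catches (L, M, S), not measurements from the study
perianal = white_balance((0.30, 0.20, 0.10), (0.60, 0.50, 0.40))
RED = red_green(perianal[0], perianal[1])
LUM_R = luminance(perianal[0], perianal[1])
BLUE = blue_yellow(0.2, 0.2, 0.6)
```

Both opponency measures are ratios bounded between −1 and 1, so they describe the balance between receptor classes rather than overall brightness, which is captured separately by luminance.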

To facilitate descriptive comparison of intra-individual color variation, we normalized individual color values with the formula (xi − min(x))/(max(x) − min(x)), where xi is a value of the color variable and min(x) and max(x) are the minimum and maximum of all values of the respective color variable.
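A minimal Python sketch of this min-max normalization follows. The function name and input values are hypothetical; the guard against a constant variable is an added assumption, since the formula is undefined when the maximum equals the minimum.

```python
def min_max_normalize(values):
    """Rescale values to the range 0 to 1 using the variable's min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("normalization is undefined for a constant variable")
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical color values for one variable across all pictures
normalized = min_max_normalize([2.0, 4.0, 6.0])
```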

Color contrast calculations

To consider multiple colors simultaneously, we calculated the contrast between colors of different areas. First, the quantal catches for the three receptors were transformed into trichromatic color space to account for the vervet vision. In this color space, colors are represented as points based on their relative stimulation of vervet photoreceptors (Kelber et al. 2003). This was done with the colspace function of the R package pavo version 1.3.0 (Maia et al. 2013). Using Euclidean distance, we calculated the contrast between the color of the perianal area and the fur (red and white, CONTRAST_RW), the perianal area and the scrotum (red and blue, CONTRAST_RB), and the fur and the scrotum (white and blue, CONTRAST_WB) in R version 3.2.3 (R Core Team 2015).
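The contrast calculation can be illustrated with a Python sketch. It uses one common parameterization of a trichromatic (Maxwell triangle) color space. Whether the pavo colspace function scales its coordinates identically is an assumption here, and the quantal catches are hypothetical.

```python
import math

def triangle_coords(L, M, S):
    """Map quantal catches to 2D chromaticity coordinates.

    Catches are first made relative (summing to 1), so the position
    depends on relative receptor stimulation, not on brightness.
    """
    total = L + M + S
    l, m, s = L / total, M / total, S / total
    x = (l - m) / math.sqrt(2)
    y = math.sqrt(2 / 3) * (s - (l + m) / 2)
    return x, y

def contrast(patch_a, patch_b):
    """Euclidean distance between two patches in the color space."""
    return math.dist(triangle_coords(*patch_a), triangle_coords(*patch_b))

# Hypothetical (L, M, S) catches for perianal area and scrotum
CONTRAST_RB = contrast((0.5, 0.3, 0.2), (0.2, 0.2, 0.6))
```

In this parameterization an achromatic stimulus with equal relative catches sits at the origin, and two patches differing only in overall intensity have a contrast of zero.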

Male characteristics

The following male characteristics were included as predictors in the analyses of RWB genital display frequency as well as in the analyses of inter- and intra-individual differences in genital coloration.

Male dominance hierarchy

We used the package EloRating version 0.43 (Neumann et al. 2011) in R version 3.2.3 (R Core Team 2015) to construct dominance hierarchies per group. We calculated standardized Elo ratings per picture day for the 2016 mating season for each male allowing for the comparison of ratings between groups of different sizes and at different times. We used dyadic agonistic interactions of all adult males (from 5 years of age, Henzi and Lucas 1980) recorded between 2013 and 2016 in all three groups to account for the “burn in” phase of Elo rating and to achieve more accurate ratings for the 2016 mating season (Neumann et al. 2011; Young et al. 2017b). At the start of the analysis, all males were assigned a predefined start value (k = 1000) and males migrating later into one of the groups received the same start value before an interaction occurred (Neumann et al. 2011).
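For readers unfamiliar with Elo rating, the following Python sketch shows the general updating logic after a single decided interaction. It uses the textbook logistic expectation and a hypothetical gain constant of 100. The EloRating package may compute winning probabilities differently, so this is not a reproduction of the ratings used here. Only the start value of 1000 is taken from the text.

```python
def expected_win(rating_a, rating_b):
    """Logistic expectation that A wins against B (textbook Elo form)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(winner, loser, gain=100):
    """Return updated (winner, loser) ratings after one interaction.

    The shift is larger when the outcome was less expected.
    """
    shift = gain * (1 - expected_win(winner, loser))
    return winner + shift, loser - shift

# Two males entering with the start value of 1000
new_winner, new_loser = update_elo(1000, 1000)
```

Because early ratings depend heavily on the start value, ratings stabilize only after a number of interactions, which is the reason for including a "burn in" period before the season of interest.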

Age categories

Male age categories were established using census data. For males with a known date of birth, the precise age was calculated (N = 3 out of 20). For males who immigrated into the study groups, we calculated age as the time (years) spent in the study population plus 5 years, which is the approximate age of first migration (cf. Henzi and Lucas 1980). We then categorized males into “young adult” (5 to 7 years) and “prime adult” (8+ years). We assigned six males to the category “young adult” and 14 to “prime adult”.
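The age assignment rule can be sketched in Python as follows. The function names are hypothetical, and treating every age below 8 years as "young adult" is an assumption about how non-integer ages were handled.

```python
def estimate_age(years_in_population, known_age=None, migration_age=5.0):
    """Use the known age if available, else time in population plus 5 years."""
    if known_age is not None:
        return known_age
    return years_in_population + migration_age

def age_category(age_years):
    """Classify an adult male as 'young adult' or 'prime adult'."""
    if age_years < 5:
        raise ValueError("males under 5 years are not counted as adults")
    return "young adult" if age_years < 8 else "prime adult"

# Hypothetical immigrant males with 1.5 and 3 years in the population
recent_immigrant = age_category(estimate_age(1.5))
long_term_resident = age_category(estimate_age(3.0))
```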

Tenure length

Male tenure length was calculated based on migration data. In the census, males were recorded as having migrated when they were seen consistently over 14 days in the new group, with the immigration date considered as the first day seen in the new group. Male tenure length was calculated as the number of days from the immigration date to the date of the relevant picture.

Injuries

We used injuries as a proxy for male health status. Injuries were recorded ad libitum. Because the classification of injury severity can be subjective, we included injuries only as a binary factor: whether or not a male had an injury in the 2 weeks prior to a recorded picture (yes/no).

Fecal glucocorticoid metabolite analysis

To investigate the influence of stress on genital color variation, we analyzed fecal samples for fGCM concentrations as a proxy for physiological stress. Fecal samples for hormone analysis were collected as part of a long-term study on social behavior, the environment, and stress (Young et al. 2017a). One fecal sample per male was collected every 2 weeks. Samples were collected within 15 min of defecation (to minimize deterioration of the metabolites) from positively identified individuals, thoroughly homogenized, and 2–5 g of feces collected and stored in a 50-ml tube. Subsequently, samples were transferred into a thermos filled with ice and at the end of the field day stored in a freezer at −20 °C. Hormone analyses were conducted at the Endocrine Research Laboratory, University of Pretoria. For steroid extraction, samples were lyophilized, and dried samples were finely ground and filtered through a mesh to remove any fibrous matter (Ganswindt et al. 2010). Samples were then extracted and analyzed for immunoreactive fGCM with a cortisol enzyme immunoassay as described in Young et al. (2017a). Concentrations of steroids were expressed in nanogram/gram dry weight (DW). Sensitivity of the cortisol enzyme immunoassay was 0.6 ng/g DW. Intra-assay coefficients of variation of high- and low-value quality controls were 4.8 and 5.8%, respectively, and inter-assay coefficients of variation of high- and low-value quality controls were 13.1 and 15.6%, respectively. We averaged fGCM concentrations over a 4-week window around the recorded picture (2 weeks before and after each recorded picture) to increase the number of pictures with an available fGCM concentration.
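The 4-week averaging window can be illustrated with a short Python sketch. The function name, dates, and concentrations are hypothetical, and including samples taken exactly 14 days from the picture is an assumption.

```python
from datetime import date, timedelta

def mean_fgcm_around(picture_day, samples, half_window_days=14):
    """Mean fGCM of samples within 2 weeks before or after a picture.

    samples is a list of (collection_date, concentration) pairs.
    Returns None when no sample falls inside the window.
    """
    window = timedelta(days=half_window_days)
    values = [conc for day, conc in samples
              if abs(day - picture_day) <= window]
    return sum(values) / len(values) if values else None

# Hypothetical samples (ng/g DW), not data from the study
samples = [(date(2016, 5, 1), 100.0),
           (date(2016, 5, 20), 200.0),
           (date(2016, 7, 1), 900.0)]
fgcm = mean_fgcm_around(date(2016, 5, 10), samples)
```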

Group mating activity

To account for increased mating competition over the mating season in our analyses, we controlled for mating activity per group as a proxy for male-male mating competition. Our choice was based on the evidence that the number of matings was positively correlated with the number of females involved in matings (Spearman rank correlation: PT: rho = 0.967; p < 0.001; N = 25; RBM: rho = 0.960; p < 0.001; N = 25; RST: rho = 0.947; p < 0.001; N = 25) and that the number of male-male agonistic interactions positively correlated with the number of matings (Spearman rank correlation: PT: rho = 0.647; p < 0.001; N = 25; RBM: rho = 0.719; p < 0.001; N = 25; RST: rho = 0.773; p < 0.001; N = 25). Group mating activity was calculated as the number of ejaculatory matings per group, divided by the number of females and males within the relevant group per calendar week in which the picture was taken.
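The group mating activity index can be sketched in Python as below. The sketch assumes that calendar weeks correspond to ISO weeks and that group composition is constant within a week. Both are assumptions, as are the function name and the example dates.

```python
from collections import Counter
from datetime import date

def weekly_mating_activity(mating_dates, n_females, n_males):
    """Ejaculatory matings per (year, week), divided by adult group size."""
    per_week = Counter(d.isocalendar()[:2] for d in mating_dates)
    group_size = n_females + n_males
    return {week: count / group_size for week, count in per_week.items()}

# Hypothetical dates of ejaculatory matings in one group
matings = [date(2016, 5, 2), date(2016, 5, 4), date(2016, 5, 9)]
activity = weekly_mating_activity(matings, n_females=9, n_males=6)
```

Dividing by the number of females and males makes the index comparable between groups of different sizes.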

Statistical analysis

RWB genital display context

We ran a non-parametric Friedman rank sum test to test for differences between RWB genital displays presented in different behavioral contexts.
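As an illustration of what the Friedman test computes, the following Python sketch derives the test statistic from ranks within blocks. It omits the correction for ties and the p-value, which statistical software normally provides. What constituted a block in this analysis is not stated here, so the blocks and values below are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the tied rank positions
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks

def friedman_statistic(blocks):
    """Friedman chi-square statistic (no tie correction).

    blocks holds one row per block and one column per context.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)

# Hypothetical display ratios: 3 blocks (rows) by 3 contexts (columns)
statistic = friedman_statistic([[0.1, 0.2, 0.3],
                                [0.2, 0.4, 0.9],
                                [0.0, 0.5, 0.6]])
```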

To assess color differences between RWB genital displays presented in different behavioral contexts, we visually inspected the color data in different contexts within scatterplots. We further ran an analysis of similarity (ANOSIM) using the R package vegan version 2.5.4 (Oksanen et al. 2019) including the eight color variables and six behavioral contexts (dominance, grooming, inter-troop encounter, locomotion, resting and foraging) to test if genital colors are more similar within than between contexts.

RWB genital display frequency

To test if a male’s genital color variables or other behavioral variables influenced how often males presented their genitals, we ran a Generalized Linear Mixed Model (GLMM, Baayen 2008) with a Poisson error structure and a log link function. The response variable was the number of RWB genital displays per male within a week of a picture recording day (i.e., the day a picture was taken and 3 days before and after the picture, resulting in a 7-day window).

As some of the described color variables (RED, BLUE, LUM_R, LUM_W, LUM_B, CONTRAST_RW, CONTRAST_RB, and CONTRAST_WB) were correlated and created collinearity issues, we determined independent color predictors via factor analysis with varimax rotation. The first run resulted in CONTRAST_RW as the only variable loading heavily on Factor 4 (see supplementary online material and supplementary Table S1 for details on factor loadings), which is why we excluded this variable in a second factor analysis. The reduced factor analysis revealed three factors with Eigenvalues > 1 (explaining overall 74.3% of the variance), with LUM_W and LUM_B loading on Factor 1 (hereafter FACTOR_LUM), BLUE, CONTRAST_RB, and CONTRAST_WB loading on Factor 2 (hereafter FACTOR_B), and RED and LUM_R loading on Factor 3 (hereafter FACTOR_R) (see Table 1 for final factor loadings). As a consequence, we included the factors FACTOR_LUM, FACTOR_B, and FACTOR_R and the color variable CONTRAST_RW as color predictors and further included male dominance and tenure as male characteristic predictors. Group mating activity was included as a control variable. Male and group ID were included as separate random effects. We initially ran a model without random slopes to avoid over-parameterization and subsequently included only random slopes for significant predictors within male ID and group ID (see supplementary Table S2 for an overview of included model terms). All response variables included in the factor analysis and the models were checked for normal distribution and transformed where necessary (see supplementary Table S2). Additionally, we transformed all heavily skewed continuous predictor variables to ensure no differences in leverage between large and small values and to meet model assumptions (see supplementary Table S2). Furthermore, all continuous predictors were set to a mean of zero and a standard deviation of one (z-transformed) to facilitate interpretation of model estimates (Schielzeth 2010). 
We controlled for the variability of observation effort per male by including the observation time per day and male within a picture week as an offset term in the model.
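Two elements of this model specification, the z-transformation of predictors and the role of the offset term, can be illustrated in Python. This sketch shows only the mean structure of a Poisson model with a log link. It does not fit a mixed model, and all names and coefficient values are hypothetical.

```python
import math
import statistics

def z_transform(values):
    """Center to a mean of zero and scale to a (sample) SD of one."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def expected_displays(observation_time, intercept, coefficients, predictors):
    """Poisson mean under a log link with log(observation time) as offset."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return math.exp(math.log(observation_time) + linear)

# Hypothetical values for illustration only
z_tenure = z_transform([1.0, 2.0, 3.0])
mu = expected_displays(10.0, math.log(0.2), [0.3], [0.0])
```

Because the offset enters on the log scale with a fixed coefficient of one, the expected number of displays scales in proportion to observation time, so the remaining terms describe a display rate rather than a raw count.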

Table 1 Results of the reduced factor analysis

As genital displays were observed in both communicative and non-communicative contexts (e.g., when a male was being groomed), we ran a reduced RWB genital display frequency model to control for genital displays shown in potentially non-communicative contexts. Here, we entered only those genital displays seen in male-male dominance contexts. This allows us to assess the extent to which the context of the display is likely to influence our results. The model formulation was identical to the RWB genital display frequency model, except for the color variables. Ideally, we would have liked to include color variables as predictors in this model but due to a small number of photos in the male-male dominance context we had to exclude all color predictors.

Inter-individual differences in color

To analyze inter-individual differences between male genital colors, there was no need to use the color factors described above as each individual color variable is used as the response variable in a series of models, hence issues of collinearity do not arise. Additionally, separate color variables are more intuitively interpretable, compared to factor axes. We tested which male characteristics predicted male color using the separate color measurements described above as responses, i.e., (i) RED, (ii) BLUE, (iii) LUM_R, (iv) LUM_B, (v) CONTRAST_RW, (vi) CONTRAST_RB, and (vii) CONTRAST_WB. We excluded LUM_W as a response because we do not expect the luminance of the white fur to be relevant for communication on its own, but only in combination with other colored areas (contrast). Additionally, we tested whether male characteristics such as dominance, age, tenure, injuries, and fGCM concentrations influenced male color by running two sets of models for each color variable.

In our first set of models, we ran Linear Mixed Models (LMMs, Baayen 2008) with five fixed effects test predictors: male dominance, age, tenure, injuries, and an interaction between male dominance × group mating activity (see supplementary online material for explanation of including interaction terms). Group mating activity was included as control predictor. Furthermore, we included male and group ID as random effects in the models (see supplementary Table S2 for an overview of included model terms and response and predictor transformations).

Due to an absence of fGCM data for three males, we ran the second set of models using only males for which fGCM concentrations were available (N = 17 males). These LMMs included as predictors the 3-way interaction fGCM concentration × dominance × group mating activity (see supplementary online material for explanation of including interaction terms) and the corresponding 2-way interactions and main terms. All terms comprising fGCM concentration were considered as test predictors and all others as control predictors. We included male and group ID as random effects in the models (see supplementary Table S2 for an overview of included model terms and response and predictor transformations).

In both sets of models we included all possible random slopes, but no correlations between random intercepts and random slope terms (see supplementary online material for details). We also ran both sets of models without any random slopes to test for over-parameterization of the models; however, our results remained qualitatively unchanged.

Intra-individual variation in color

To test for intra-individual variation within males, we ran two additional sets of models for which we subtracted a reference value from each color variable (except LUM_W for reasons addressed above) and used the remainder as the response variable. As reference values we used the mean of each color variable across all picture days per male. As above we ran the first set of models using all males and no fGCM data, and the second set only for males with available fGCM data. Model formulations of the respective sets were identical to the inter-individual model sets (see supplementary Table S2 for an overview of included model terms, response, and predictor transformations).
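The construction of the intra-individual response can be sketched in Python as follows. The function name, male IDs, and color values are hypothetical.

```python
from collections import defaultdict

def center_within_male(records):
    """Subtract each male's mean color value from his daily values.

    records is a list of (male_id, color_value) pairs, one per picture day.
    """
    by_male = defaultdict(list)
    for male, value in records:
        by_male[male].append(value)
    means = {male: sum(vals) / len(vals) for male, vals in by_male.items()}
    return [(male, value - means[male]) for male, value in records]

# Hypothetical BLUE values for two males
centered = center_within_male([("A", 1.0), ("A", 3.0), ("B", 5.0)])
```

Centering on each male's own mean removes stable between-male differences, so the remaining variation reflects how a male's color on a given day deviates from his own average.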

For the full data set and subset models of intra-individual variation, we used the same random slopes as in the respective inter-individual differences models described in detail in the supplementary online material.

General model procedures

To account for temporal autocorrelation through data points recorded close to each other (Kulik et al. 2015), we checked each model separately to determine whether an autocorrelation term was needed (see supplementary online material for more details) and included it where necessary.

We fitted the LMMs with the function lmer and the GLMMs with the function glmer of the R package lme4 version 1.1.11 (Bates et al. 2015) using Maximum Likelihood (ML, Bolker et al. 2009). We visually inspected a Q-Q plot of the residuals as well as the residuals plotted against fitted values and found no obvious deviation from the assumptions of normally distributed and homogeneous residuals in any model presented here. By comparing estimates derived from a model fitted to all data with estimates derived from models in which the levels of the random effects were dropped one at a time, we confirmed that the models were stable. Additionally, we checked for collinearity with the vif function of the R package car version 2.1.1 (Fox and Weisberg 2011) and found no indication of it (Variance Inflation Factors (VIFs) of all 30 models < 1.26). For the GLMMs, we tested for overdispersion and found no evidence of it (RWB genital display frequency model: dispersion parameter = 1.26; reduced RWB genital display frequency model: dispersion parameter = 1.05).
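The text does not state how the dispersion parameter was computed. One common estimator, assumed here purely for illustration, divides the sum of squared Pearson residuals of a Poisson model by the residual degrees of freedom; values close to 1 indicate no overdispersion. The function name, counts, fitted means, and number of parameters below are hypothetical.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom (Poisson model).

    observed: observed counts; fitted: fitted means (all > 0);
    n_params: number of estimated model parameters (assumed known).
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    resid_df = len(observed) - n_params
    if resid_df <= 0:
        raise ValueError("need more observations than estimated parameters")
    # For a Poisson response the variance equals the mean, so the squared
    # Pearson residual is (y - mu)^2 / mu.
    chi_sq = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_sq / resid_df


# Hypothetical display counts and fitted means from a model with 2 parameters.
observed = [2, 0, 3, 1, 4, 2]
fitted = [2.0, 1.0, 2.0, 1.0, 4.0, 2.0]
dispersion = pearson_dispersion(observed, fitted, n_params=2)
```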

By comparing the fit of the full model (all predictors included) with that of a null model (including only control predictors, the autocorrelation term when needed, and random intercepts and slopes), we tested for the significance of the main effects and their interactions using a likelihood ratio test (LRT, Dobson 2002). Only when the full-null model comparison was significant did we calculate the significance of individual fixed effects by running likelihood ratio tests using the drop1 function (Barr et al. 2013). Effects of control predictors are only touched upon briefly, as they are not the focus of the study. To generate confidence intervals for all model estimates, we used the function bootMer of the R package lme4 (Bates et al. 2015), running 1000 parametric bootstraps. To calculate R2-like effect sizes of the full models (R2mar, marginal R2 value for the fixed effects; R2con, conditional R2 value for the whole model), we used the function r.squaredGLMM of the R package MuMIn version 1.40.0 (Bartoń 2017).
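The logic of the full-null model comparison can be sketched in a few lines of Python. The test statistic is twice the difference in log-likelihood between the full and the null model, and it is compared against a chi-square distribution whose degrees of freedom equal the number of additional parameters in the full model. The function names and the log-likelihood values are hypothetical; the chi-square tail probability is computed with a standard series expansion rather than with the R functions used in the study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Power series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    return 1.0 - total * math.exp(log_prefactor)


def likelihood_ratio_test(loglik_full, loglik_null, df):
    """Return the LRT statistic and its p value for two nested models."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods of a full and a null model differing by 6 parameters.
stat, p_value = likelihood_ratio_test(-100.0, -105.678, df=6)

# A chi-square value of 11.356 with df = 6, as reported for the display
# frequency model in the Results, corresponds to p of approximately 0.078.
p_reported = chi2_sf(11.356, 6)
```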

All statistical analyses were conducted in R version 3.2.3 (R Core Team 2015).

Results

RWB genital display context

Out of 1658 recorded RWB genital displays (mean ± SD: 2.65 ± 2.25 RWB genital displays per day and group), 28.7% were shown during locomotion, 22.9% during agonistic interaction, 16.8% during grooming, 12.5% during inter-troop encounters, 4.8% during mating interactions, 3.0% during foraging, and the remaining 11.3% in an ambiguous context. Relative to how often each context occurred, RWB genital displays were most frequent in dominance and mating interactions (in 4.1 and 4.0% of all interactions, respectively; see supplementary online material and supplementary Fig. S2 for further details).

When looking at the similarities of color variables between contexts via visual inspections and ANOSIM, color variables did not appear to vary systematically between contexts (for visual inspection, see supplementary Fig. S3 and Fig. S4, ANOSIM: r = 0.004, p = 0.739, supplementary Fig. S5).
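For readers unfamiliar with ANOSIM, the statistic can be sketched as follows. All pairwise dissimilarities are ranked, and the statistic is the difference between the mean rank of between-group and within-group dissimilarities, scaled by half the number of pairs. A value near 0 indicates that groups (here, display contexts) do not differ, and a value of 1 indicates complete separation. The function names, the toy data, and the use of absolute differences as the dissimilarity are assumptions for illustration; the dissimilarity measure and software used in the study are described in the supplementary material, not here.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def anosim_r(dist, groups):
    """ANOSIM statistic for a symmetric distance matrix and one label per sample.

    Assumes at least one within-group and one between-group pair.
    """
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2.0)


def anosim_test(dist, groups, n_perm=999, seed=1):
    """Return the observed statistic and a permutation p value."""
    rng = random.Random(seed)
    observed = anosim_r(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # reassign context labels at random
        if anosim_r(dist, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: four samples of one color variable in two hypothetical contexts.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(a - b) for b in points] for a in points]
groups = ["locomotion", "locomotion", "grooming", "grooming"]
r_stat, p_value = anosim_test(dist, groups)
```

In this toy example the two contexts are completely separated, so the statistic equals 1, in contrast to the value of 0.004 reported above.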

Color variation

Genital coloration varied between and within individuals (see Table 2, supplementary Tables S6–S8, and supplementary Fig. S6–S8; for general color variation, see supplementary Fig. S9). For example, between males, color variation was largest in the absolute color values of LUM_B and lowest in the contrast between red and white (Table 2 and supplementary Fig. S6–S8). Within individuals, average color varied the most in RED and least in the contrast between red and blue (see supplementary Tables S6–S8 for details on intra-individual color variable range, average, and standard deviation).

Table 3 Results of the RWB genital display frequency model (GLMM with Poisson distribution) including all display contexts. Estimate, estimated coefficients for the predictors; SE, standard error; lCI, lower confidence interval; uCI, upper confidence interval; χ2, Chi-square test value. No autocorrelation term was needed

RWB genital display frequency

Our first model investigated whether dominance, tenure length, or genital coloration of the sender had an influence on the frequency of the RWB genital display. The full versus null model comparison suggested that this might be the case (LRT: χ2 = 11.356, df = 6, p = 0.078; effect size: R2mar = 0.260, R2con = 0.301). In particular, higher-ranking males showed RWB genital displays more frequently than lower-ranking ones (Fig. 2, Table 3, see supplementary Table S3 and Table S4 for details on random effects and confidence intervals). Our control variable, group mating activity, showed a marginal negative effect on the frequency of RWB genital displays (see Table 3). Tenure length and the genital coloration variables showed no pronounced association with the frequency of the RWB genital display (see Table 3).

Fig. 2 Effect of dominance rank (Elo rating) on the number of RWB genital displays per picture week (controlled for observation time). The dashed line indicates model fit to the data and dotted lines indicate bootstrapped 95% confidence intervals. Each dot indicates the number of RWB genital displays within 1 week around a recording day of one male

Table 2 Range of color variable variation between individuals. Min, lowest value; max, highest value; SD, standard deviation. RED and BLUE have a maximal range from −1 to 1, LUM_R and LUM_B from 0 to 1, and CONTRAST_RW, _RB, and _WB from 0 to infinity

The results of the reduced RWB genital display frequency model with dominance and tenure as individual predictors remained qualitatively the same (for more details, see supplementary online material on “Reduced RWB genital display frequency” and supplementary Table S5).

Inter-individual differences in color

None of the seven color variables were significantly associated with male dominance, age, tenure, injuries, or fGCM concentration (p ≥ 0.228 for all full-null model comparisons; for detailed full-null model comparisons and effect sizes, see supplementary Table S9; for predictor estimates and their confidence intervals, see supplementary Tables S11–S24). This was the case for models based on the full (supplementary Tables S11–S17) and the reduced (supplementary Tables S18–S24) data sets.

Intra-individual variation in color

In relation to an individual reference level, none of the color variables were significantly associated with male dominance, age, tenure, injuries, or fGCM concentration (p ≥ 0.223 for all full-null model comparisons; for detailed full-null model comparison and effect sizes, see supplementary Table S10; for predictor estimates and their confidence intervals, see supplementary Tables S11–S24). This was the case for models based on the full (supplementary Tables S11–S17) and the reduced (supplementary Tables S18–S24) data sets.

Discussion

Until now most studies have investigated different colors presented during the same behavioral display separately and have not considered the relationship between coloration and the number of displays shown (reviewed in Hutton et al. 2015). Using a visual modeling approach analyzing colors from a vervet visual perspective, we found that the number of displays, but not the color variables per se, was influenced by male dominance. Our analyses present the first comprehensive investigation of vervet monkey genital coloration and its potential signaling value based on a modern approach. By investigating the covariates of a multi-component color signal and the link between the behavioral display and signal coloration directly, we formed the basis for further investigations on the mechanisms and functions of conspicuous color signals in primates.

When analyzing the number of performed RWB genital displays in relation to genital coloration and male characteristics, the full-null model comparison revealed at most a trend. However, looking closer at the predictors we found a clear effect of dominance: the number of displays per data window was more than twice as high in high-ranking compared to low-ranking males. In contrast, genital color had no effect on display performance. This is partly in line with a study on house sparrows (Passer domesticus), where the number of wingbar displays was related to defense success, which indicates fighting abilities (Bókony et al. 2006). The authors did not investigate whether the number of displays was also influenced by the wingbar contrast. In vervets, one interpretation of these findings is that low-ranking males are more concerned about exposing their genitals and being vulnerable to attack compared to higher-ranking males: low-ranking males have been shown to be more likely to adduct their testes and retract their scrota in response to threatening situations than high-ranking males (Henzi 1981). Although the results of the present study should be interpreted cautiously, they suggest a similar functionality of primate color displays in vervets as in other taxa and provide valuable directions for future research. In contrast to our results, increasing wingbar contrast in house sparrows improved defense success during agonistic interactions (Bókony et al. 2006), which indicates a possible link between the behavioral display and the presented color in this species. In the present study, we lacked the necessary sample size to address display function in detail. Hence, future studies should investigate the relationship between color and display behavior in different contexts in more detail to unravel the functionality of the display behavior, as well as whether color signals influence the evolution of display behavior or vice versa.

Additionally, our analyses suggest no relationship between genital coloration and male attributes. A previous study by Cramer (2012) presented mixed results when analyzing these variables in vervet monkeys. Using a limited sample size of 10 males, Cramer (2012) found a relationship between scrotum color and mating frequency, but this effect was driven by a single individual. Similarly, a relationship between stress behavior and scrotum color was evident in one statistical test, but not in another (Cramer 2012). Gerald (2001) investigated the relationship between scrotal coloration and dominance in green monkeys in an experimental setup. By observing pairs of males of different scrotal color, the author found that males with darker scrota were able to win fights against males with pale scrota (Gerald 2001). However, it is important to keep in mind that vervet monkeys differ in their scrotal color compared with green monkeys (Hill 1966). Vervets show a smaller variation in luminance and a slightly larger variation in the blue color than green monkeys (Cramer et al. 2013), which could explain the differing results of our study. Additionally, the genital coloration, and its potential covariates, could differ between vervet and green monkeys due to evolutionary changes in genital color components. For example, using phylogenetic comparative analyses, Romero-Diaz et al. (2019) showed that in spiny lizards (Genus Sceloporus), the loss of color patches and the addition of new colors did not lead to a signal loss but rather shifts in signal attributes. Green monkeys first split off from the other Chlorocebus species ~3.5 Myr ago, and the South African vervets only ~1.8 Myr ago (Dolotovskaya et al. 2017). It is possible that signal traits changed from a pale anogenital region in green monkeys to bright red in vervets, and today the blue scrota in vervets could be relatively uninformative compared to the blue scrota of green monkeys (Gerald 2001). However, such a scenario would not explain why all other color variables did not relate to male attributes in our study as well. Results in mandrills were partly in line with our study: red face coloration was related to dominance (Setchell et al. 2008), but not to fGCM concentration in mandrills (Setchell et al. 2010). The hypothalamus-pituitary-adrenal axis, the system regulating homeostasis, responds to a variety of stressors which leads to several physiological changes (reviewed in Beehner and Bergman 2017). Due to the complexity of stress responses, however, it is challenging to relate fGCM alterations to a one-dimensional cause. As a result, further studies are needed to improve our understanding of the link between male attributes and especially physiological stress responses and signal coloration.

Even though color contrasts may provide more reliable information for conspecifics (Renoult et al. 2011), in our study none of the calculated color contrasts of different genital areas revealed a link to any male characteristics (dominance rank, tenure, age, injuries, and fGCM concentration). This deviates from two primate studies which demonstrated links between color contrast and male characteristics (Renoult et al. 2011; Grueter et al. 2015). In mandrills, high-ranking males showed a higher saturation in single facial colors, red and blue, and hence a stronger contrast between those colors, than subordinate males (Renoult et al. 2011). Contrasts between the red lip and white face color of black-and-white snub-nosed monkeys increased with age and for males holding a group during the mating season (Grueter et al. 2015). These differences between studies could be due to different social systems. Mandrills gather in hordes with up to 800 individuals (Abernethy et al. 2002) and similarly, black-and-white snub-nosed monkeys live in one-male-units and aggregate into large groups with up to 500 individuals (Grueter et al. 2015). In such large social groups, individual recognition might be more difficult, which could enhance the importance of color signals to convey individual characteristics for assessment (Bergman et al. 2009). In contrast, vervets live in relatively stable multi-male multi-female groups and interact daily and based on such interactions can assess their conspecifics. This could imply that color signals in vervets might be informative for other covariates than the ones investigated in this study and function differently than in larger primate social groups.

Several factors may have influenced our results, masking a potential impact of color. First, a drought in South Africa in 2016 led to dehydration and affected the survival of our population (Young et al. 2019). This could have influenced the blue color of the scrotum due to reduced tissue fluid in the dermis (Price et al. 1976; reviewed in Caro 2005) and may have added to the reduced variation in vervet blue scrotal color described previously. Furthermore, an increase in environmental stress (drought) may have increased testosterone (cf. Setchell et al. 2008), resulting in a darker red coloration in males in general and hence less variation in male red perianal colors. Second, the recording method for the genital photos could have influenced the coloration data: because genital presentations were rare, we also recorded male genitals during non-communicative contexts, such as climbing through a tree or locomotion, to increase the picture sample size. We assumed that coloration recorded in potentially non-communicative contexts would still be informative and hence did not expect pronounced differences in coloration between communicative and non-communicative contexts. This expectation was supported by the reduced RWB genital display frequency model: when we considered only male-male dominance contexts there was no qualitative difference in results compared to the original model including all display contexts. Additionally, a visual evaluation of the color variables and an analysis of similarity suggested no overall associations between coloration and genital display context. Hence, we consider it unlikely that display context affected our results. Due to the insufficient number of display photos in certain contexts (e.g., the male-male dominance context) and the absence of recipient data for the display photos, however, we could not test whether males with a certain coloration direct their displays to particular individuals in certain contexts. This needs to be the focus of future investigations. Lastly, a potential lack of statistical power may have influenced our results. To address this, we additionally checked for effects of over-parameterization by excluding all random slopes from the models (resulting in the following number of data points per model term: full data set = 11.9; data subset = 8.7). The original models including all random slopes (data points per model term: full data set = 5; data subset = 3.4), however, revealed qualitatively the same results as the models without random slopes.

To further unravel the influence of covariates and to understand the function of multi-component color signals, like the RWB genital display, future studies should consider recent work on rhesus macaques. Males with more similar facial color interacted more aggressively than males with differing colors, which indicates that color differences between specific dyads rather than an individual’s coloration in general could mediate agonistic interactions (Petersdorf et al. 2017). Our data were insufficient to generate stable models for a comparable dyadic analysis of agonistic interactions with regard to coloration. Such an analysis would follow the approach by Gerald (2001) on dyadic interactions and scrotal color in green monkeys, which is why future studies in vervets should examine the relationship between color and dominance additionally at the dyadic level. Moreover, investigating the extent to which multi-component color signals are condition-dependent would further shed light on covariates and function of these signals. Furthermore, studying the role of ornament colors in connection with mating and paternity success in vervets might be rewarding, as studies in birds found a link between ornament coloration and reproductive success (Doucet et al. 2005).