Introduction

The comparison of neutral and adaptive genomic variation has allowed valuable insights into the evolutionary processes driving genetic structure in natural populations (Orsini et al. 2013; Andrews et al. 2016). Species with naturally isolated populations represent particularly interesting systems to infer the relative roles of genetic drift and natural selection for population differentiation, an important process in biological diversification. Genetic studies on such non-continuously distributed systems have shown that restricted gene flow among populations often results in strong genetic drift (e.g. Ortego et al. 2010; Palma-Silva et al. 2011; Nistelberger et al. 2015), but the importance of selection for genetic differentiation and local adaptation of these species has only recently been tested using molecular markers (e.g., Funk et al. 2016; Perrier et al. 2017; Wang et al. 2017; Prentice et al. 2017). Although both evolutionary forces are predicted to be important and strongly affected by the effective population size and population persistence through time, selection differentially affects traits and their underlying genetic variation (Flood and Hancock 2017). The smaller the effective population size and migration rates among populations, the larger the effects of stochastic processes and the lower the effectiveness of selection (Lande and Barrowclough 1987; Charlesworth 2009). Furthermore, local colonization-extinction dynamics may enhance the effects of drift across isolated populations. Although we lack empirical evidence on genetic drift acting as a sole driver of speciation processes, e.g., through relaxing purifying selection thus allowing genetic incompatibilities to accumulate between isolated populations (Coyne and Orr 2004; Sobel et al. 2010), the role of selection is often hard to demonstrate when drift is strong. The identification of the mechanisms responsible for population differentiation across patchy habitats may, therefore, shed light on the potential but still controversial role of drift and on the relative importance of selection on speciation processes.

The ability of species to disperse and establish in habitats under distinct climate conditions is directly linked to the interplay between drift and selection. While many species show evidence of niche differentiation, others have strict ecological requirements and establish in regions with similar environmental conditions (i.e., niche conservatism) (McCormack et al., 2010; Bacon et al. 2017; Baranzelli et al. 2018). The ability of organisms to adapt to new environmental conditions is determined by de novo mutations or by standing genetic variation, but adaptation from standing variation is likely to be faster (Barrett and Schluter 2008). Depending on the magnitude of selection and the effective population size, drift may overwhelm divergent selection on new beneficial mutations underlying a particular trait in small populations colonizing new habitats (Wright 1951; Leimu and Fischer 2008). Although population size greatly influences persistence, small founder populations may still hold considerable standing genetic variation on which selection could act (Agashe et al. 2011). Understanding the evolutionary strategy that allows, or not, non-continuously distributed species potentially subject to strong founder events to increase their ecological range toward distinct ecological conditions requires coupling genetic and ecological information.

For the highly heterogeneous Neotropics, an increasing number of genetic studies in widespread species has improved our understanding of diversification across biomes (Antonelli et al. 2018), but whether differentiation within species is accompanied or not by local adaptation has been less explored (but see Bacon et al. 2017). The application of high throughput sequencing to Neotropical systems promises to allow novel insights into diversification events and the history of biomes (e.g., Nadeau et al. 2013; Harvey and Brumfield 2015; Ebel et al. 2015). Indeed, the increasing number of loci generated by these sequencing technologies and the constant development of computational methods have allowed for more powerful inferences regarding genetic structure in non-model species (Davey et al. 2011; Narum et al. 2013), and have also greatly improved our ability to distinguish neutral from adaptive processes (Andrews et al. 2016). Approaches for identifying locally adaptive candidate loci rely on the identification of markers with unusually high genetic differentiation among populations or searching for correlations between allele frequencies and environmental information (Hoban et al. 2016). Moreover, when phenotypic information is available at the population level, phenotypic divergence (Pst) can be compared to neutral genetic divergence (Fst) to give insights into the degree of differentiation caused by selective versus neutral processes (PstFst comparisons; Brommer 2011; Leinonen et al. 2013), particularly when controlling for environmental variation is feasible. Nevertheless, there are still challenges associated with identifying sufficient genetic variation to detect loci under selection and to perform genotype-environment or genotype-phenotype associations (Leinonen et al. 2013; Lowry et al. 2017), particularly in species with restricted gene flow (Perrier et al. 2017).

In this study, we quantify natural selection on selected ecophysiological traits and infer the relative roles of genetic drift and natural selection on population differentiation in Pitcairnia lanuginosa Ruiz and Pav. This species has a widespread distribution in tropical regions of South America, with small populations typically scattered across space and associated with riparian forests in the Brazilian Cerrado and the Central Andean Yungas (Fig. 1). A phylogeographic study employing microsatellite markers and Sanger sequencing has suggested that the divergence between lineages occupying the Cerrado and Yungas is likely caused by past dispersal rather than vicariance (Leal et al. 2019). The authors found low levels of genetic diversity (mean HE = 0.205) and strong population structure (global FST = 0.73), likely resulting from limited gene flow due to the lack of dispersal traits in seeds, small populations, and extremely high rates of spontaneous self-fertilization in this species. Despite the strong effect of drift suggested by these findings, adaptive processes may also be important to explain divergence patterns of P. lanuginosa considering that populations are potentially submitted to contrasting macroclimatic conditions across the large-scale latitudinal and longitudinal distribution range of the species. Indeed, P. lanuginosa (here considered synonymous of P. burchellii Mez; Smith and Downs 1974) exhibits physiological and anatomical characteristics to cope with severe drought stress (Vieira et al. 2017a, b), which may be important for local adaptation to the greater seasonality in the Cerrado, a savannah biome with wet summers and dry winters, when compared to the Central Andean Yungas, a mesic forest biome. Nevertheless, the extent to which this strong genetic structure could be interpreted in terms of adaptive processes can be strongly affected by the species mating system (Glémin 2007; Wright et al. 2013). The increased self-fertilization in P. lanuginosa, for instance, may reduce the effective population sizes of populations, thereby enhancing genetic drift. Under strong drift, adaptive alleles can also be lost, limiting the process of local adaptation (Charlesworth and Charlesworth 1987; Wright et al. 2013).

Fig. 1: Population sampling and genetic structure of Pitcairnia lanuginosa as inferred by INSTRUCT analysis.
figure 1

A Geographic distribution of sampled populations assigned to genetic clusters as detected by INSTRUCT. Black dots represent species-occurrences retrieved from Global Biodiversity Information Facility (GBIF) and SpeciesLink. B Bar plot depicting the INSTRUCT clustering for 19 populations of the species. Selfing rates per cluster are shown above the barplot. Each vertical bar corresponds to one individual and colored segments to the proportion of assignment to clusters, each color corresponding to a distinct cluster.

Here we take advantage of hundreds of SNP markers derived from double-digest restriction-site associated DNA sequencing (ddRAD-seq, Peterson et al. 2012), and information on ecophysiological traits associated with environmental variation, to investigate the relative roles of neutral and adaptive processes in causing the strong genetic differentiation in P. lanuginosa. Specifically, we aim to: (1) investigate whether the presumed low effective population sizes resulting from isolation and high levels of inbreeding could preclude local adaptation for ecophysiological traits linked to drought, heat, and light stress in this species; (2) assess the relative importance of Isolation by Distance (IBD) and Isolation by Environment (IBE) on the patterns of genetic structure in this species. The wide yet patchy distribution in the Neotropics makes P. lanuginosa a particularly interesting model to investigate how ecological processes affect local adaptation. The information on the genomic and trait variation in this species may thereby shed light on eco-evolutionary processes acting on non-continuously distributed species.

Material and methods

Model species and population sampling

P. lanuginosa is an iteroparous perennial forb self-compatible and capable of intrafloral autonomous self-pollination (Leal 2018). Populations of this species are generally small and spatially restricted, composed of one to a few dozen individuals that are mostly found near streams and waterfalls (personal observation). We sampled 19 populations of P. lanuginosa across most of its geographic range (Table 1, Fig. 1). The sampling covers the Brazilian Cerrado (savanna biome, including transition zones with the Atlantic Forest and Amazonia) and the Central Andean Yungas (Amazonia-Andes transition) (Fig. 1). Adult specimens from a subset of eight Cerrado populations were transported from the field to the experimental greenhouse of São Paulo State University (UNESP), Rio Claro, Brazil (see Table 1), and planted into pots to grow under controlled conditions. Herbarium specimens were deposited within the herbarium HBRC, UNESP, Rio Claro, Brazil.

Table 1 Population sampling of Picairnia lanuginosa, with geographical coordinates and sample sizes for genetic analyses (NG) and for physiological traits measurements (NP).

Ecophysiological traits

To detect differences in plant responses to light, heat and water stresses among populations, we conducted ecophysiological trait measurements in a total of 24 individuals (herein genets) from eight out of the 19 P. lanuginosa populations sampled (three genets/population); these eight populations represent the extent of the geographic distribution of the species in the Cerrado (Fig. 1; Table 1). Adult plants were sampled in the field and acclimated in the greenhouse for at least 1 year (from May 2014 to May 2015). We separated each genet into two or more ramets to grow in distinct pots with the same substrate composition for an additional acclimation period of 3 months prior to the experiments. We then selected 48 ramets of similar size lacking inflorescences and submitted them to the following treatments (24 ramets/treatment): (A) control, consisting in watering pots every 2 days (about 80 mL of water, which corresponded to field capacity); and (B) water shortage, consisting in no watering for 30 days. We measured ecophysiological traits based on photosynthetic responses of leaves in August 2016 (during the dry season).

We selected an area of ca. 2 cm2 on a fully developed leaf on each ramet from the control treatment to measure the photosynthetic response to light (i.e. rapid light curve) and to heat stress using a modulate fluorometer (MINI-PAM-II, Walz). We calculated the following parameters: upper critical thermal limit (CTmax), sensitivity to temperature change (z), maximum relative electron transport rate (ETRmax), initial slope of the curve of light-response (α), light saturation coefficient (Ik), and maximal photochemical efficiency (Fv/Fm). To calculate heat tolerance parameters CTmax and z, we followed the framework described by Rezende et al. (2014), which considers the intensity and duration of thermal stress to drive the individual ‘thermal tolerance landscape’, or individual capacity to withstand heat at any given temperature. We followed procedures described by Godoy et al. (2011), and modified by Chaves et al. (2018), to measure the critical time duration under a static stressful temperature that promotes a 50% decay of the initial Fv/Fm (D50) in each leaf sample. To draw the ‘thermal tolerance landscape’ of each ramet, we calculated D50 under four distinct temperatures (40, 45, 50, and 60 °C) and performed a linear regression to measure CTmax and z parameters according to Rezende et al. (2014). Measures of Fv/Fm at each temperature were simultaneously carried out using a thermal-gradient block designed by Labouriau (1977) and adapted by Cardoso (2010). Finally, we used the decay of each heat and light parameter (i.e., values for ramets under controlled conditions minus the values under water shortage, divided by values under controlled conditions) as proxies of drought adaptation. Values close to one indicate little difference between treatments, i.e., tolerance to drought stress.

DNA extraction and ddRAD sequencing

Total genomic DNA was extracted from leaf samples of 115 individuals from 19 populations (Table 1) using the Qiagen DNA Plant Mini kit (Qiagen, Finland). Sample quality and concentration were checked on 1% agarose gels and using a NanoDrop 2000 spectrophotometer (Thermo Scientific) and normalized to 60 ng/μl as quantified by the Qubit dsDNA BR assay (Invitrogen). The ddRAD libraries were prepared and sequenced on the Ion Torrent Proton platform using a modified version of Peterson et al. 2012 protocol at the Genome Transcriptome Facility of Bordeaux (INRA, Cestas, France). In brief, we digested 50 μl (300 ng total) of genomic DNA using rare- and frequent-cutter restriction enzymes (PstI and MspI, New England Biosciences) at 37 °C for 2 h and inactivated them at 80 °C for 20 min. After bead-purification, a P1 adaptor (the same for all samples) and a unique barcode adaptor were added to each sample, and ligation was performed at 22 °C for 2 h followed by inactivation at 65 °C for 20 min. We then used a qPCR to quantify each sample before normalizing and pooling samples in a 48-plex. We employed an automated size-selection technology (Pippin, Sage Science) to select fragments with the expected size (270–290 pb). Each library was then amplified by PCR, quantified at the Agilent 2100 Bioanalyzer, and sequenced on the Ion Torrent Proton P1v2 chip.

De novo assembly

We analyzed raw read data using PYRAD v3.0.5 software (Eaton 2014). This software assembles short read RAD-seq (or similar) data using a clustering algorithm that allows for indel variation within and between samples and incomplete overlap among reads (Eaton 2014). This particularity makes the method suitable for highly structured populations. Following preliminary testing, we settled the parameters as follows: a maximum number of sites with quality score <20 (NQual): 5, clustering threshold (Wclust): 0.85, minimum coverage for a cluster (Mindepth): 6, and maximum individuals with a shared heterozygous site (MaxSH): 3. All other parameters were kept at default values. PYRAD was run on the GenoToul bioinformatics facility (INRA, Toulouse, France). We then employed VCFTOOLS v0.1.13 (Danecek et al. 2011) to retain loci with a minimum of 30% of individuals sequenced and removed any individual with genotypes for fewer than 30% of loci. These parameters were chosen to recover a maximum number of loci based on the biology of P. lanuginosa, a frequently selfing species in highly structured populations (Leal et al. 2019). Although our missing data criteria fall below what is typically used for RAD-seq data, studies have shown that even lower missing data cutoffs lead to correct inferences (Chattopadhyay et al. 2014; Huang and Knowles 2016; Hodel et al. 2017). Because of the known long-term divergence between Cerrado and Central Yungas populations (ca. 3 Mya; Leal et al. 2019), another data matrix including only samples from the Cerrado was separately build using PYRAD and then filtered by VCFTOOLS according to the same criteria. VCF files were converted to various formats using PGDSPIDER v2.0.7.2 (Lischer and Excoffier 2012) to perform further analyses.

Genomic diversity and differentiation

We calculated diversity indices, such as the proportion of polymorphic sites (PL), private alleles (PA), observed (HO) and expected (HE) heterozygosities and nucleotide diversity for variant sites (π) using the ‘populations’ module of STACKS v1.35 (Catchen et al. 2013).

We estimated pairwise FST (Weir and Cockerham 1984) and G′ST (Hedrick 2007) among sampled populations using the R package ‘diveRsity’ (Keenan et al. 2013). A neighbor-net unrooted phylogenetic network was inferred with the software SPLITSTREE4 using uncorrected p-distances and a bootstrap procedure to assess confidence (Huson and Bryant 2006). We further assessed patterns of genomic structure employing the Bayesian approach implemented in INSTRUCT (Gao et al. 2007). This software clusters individuals into subpopulations and allows estimating the self-fertilization rates or inbreeding coefficients both at the population level and at the individual level (Gao et al. 2007), and is thus especially suited for species that reproduce through self-fertilization. INSTRUCT was run for K = 1 to K = 20, for five independent chains, each with 200,000 iterations and 50,000 burn-in steps, with a thinning interval of ten steps, using the joint inference of population selfing rate and population sub-structure mode. We chose the best K based on the deviance information criterion (DIC) and used OBSTRUCT (Gayevskiy et al. 2014) to plot averaged results among chains. We then used ARLEQUIN 3.5 (Excoffier and Lischer 2010) to test the partition of genetic diversity within and between congruent groups revealed by STRUCTURE and INSTRUCT using an analysis of molecular variance (AMOVA).

Detection of outlier loci

We employed three different approaches to identify candidate SNPs bearing a signature of selection. The aim was to identify such loci without considering environmental variables (see next section), i.e., based solely on their population genetic signature of selection. This analysis was performed across the entire distribution of P. lanuginosa and among populations sampled in the Cerrado, using the independently assembled data sets. First, we detected outliers based on locus-specific FST using the program BAYESCAN v2.1 (Foll and Gaggiotti 2008) with prior odds of 10 and a false discovery rate (FDR) of 0.05. After 20 pilot runs of 5000 iterations, we ran a total of 100,000 MCMC iterations with a burn-in of 50,000 iterations and a thinning interval of 10. BAYESCAN examines pairwise FST between populations for each locus and decomposes the coefficients into a population-specific component (β), and a locus-specific component (α) using logistic regression (Foll and Gaggiotti 2008). The detected outlier loci with positive α values are suggested to undergo diversifying selection, while those with a negative α are suggested to be under balancing or purifying selection. Diagnostics of the log likelihoods and FST values were checked using the ‘CODA’ R-package (Plummer et al. 2006). To complement this analysis, we used two other differentiation-based methods to identify outlier SNPs that account for population structure: a principal components‐based method implemented in the R-package ‘pcadapt’ (Luu et al. 2017) and a Bayesian method implemented in BAYPASS (Gautier 2015). For the PCADAPT analysis, we assessed the optimal number of principal components (K) from 1 to 20 and inspected the resulting scree plots to retain K = 5, K = 4, and K = 2 for the data sets consisting of all sampled populations and a subset of populations sampled either in the Cerrado or in the Andean Yungas, respectively. For each SNP, deviation from the mean population structure was estimated based on the Mahalanobis distance D that summarizes regression of the SNP by the K principal components. D values were converted to p values and we used an FDR of 0.05 to obtain a list of outlier SNPs. For BAYPASS analysis, we calculated the correlation matrix Ω and estimate XTX, an FST-like statistic, for each individual SNP under the core model using default settings. We simulated pseudo‐observed data sets (PODs) to calibrate the XTX values and used a 95% threshold, as determined from PODs, to discriminate between outlier candidates and neutral SNPs.

Environment–genotype associations

We first tested for the correlation among predictors, i.e. geographical and environmental distances based on climate, soil, and elevation, for populations within the Cerrado with Mantel tests implemented in the R package ‘adegenet’ (Jombart 2008) with 1000 permutations. The geographic matrix consisted of the natural logarithm of Euclidian distances among populations (hereafter, GEO matrix), while the elevation matrix consisted of pairwise differences between the registered elevation values for sampled populations (ELEV matrix). The environmental distances were derived from Euclidean distances among populations along the first and second axis of a principal component analysis (PCA) summarizing (1) 19 WorldClim variables (Hijmans et al. 2005) at a resolution of 30 arc seconds (CLIM matrix) and (2) 10 chemical and physical soil properties downloaded from SoilGrids (Hengl et al. 2017) at a resolution of 250 m and a depth of 0–5 cm (Hengl et al. 2017) (SOIL matrix). The first and second axes of these PCAs for climate and soil data accounted for 70.90% and 62.29% of the data variances, respectively. For the climate data, the PCA axes were mostly correlated with the temperature of the coldest quarter and with the precipitation of the wettest quarter (Table S1). For the soil data, axes 1 and 2 were mostly correlated with the bulk density and proportion of clay particles (Table S2).

We used a Bayesian generalized linear mixed modeling (GLMM) approach to test whether genetic differentiation among the Cerrado populations at the neutral genomic fraction and at each outlier SNP is driven by geographic distance or environmental differences, as determined by variation in temperature and precipitation, elevation or soil. This approach accounts for the non independence of values resulting from the use of pairwise matrices of population‐level metrics and can distinguish the effects of multiple variables and their interaction. As response variable, we used either a pairwise linear FST matrix [measured as FST/(1 − FST)] derived from 616 neutral SNPs or the allele frequency differential delta (AFD, Zhu et al. 2005) for each of the two outlier SNPs (here, we considered as candidate to divergent selection those SNPs detected as outliers by at least two distinct methods). As predictor variables, we used the GEO matrix to assess the effect of IBD, as well as CLIM, ELEV, and SOIL matrices to access the effect of IBE. To avoid multicollinearity, we calculated the variance inflation index and Pearson’s correlation coefficient among explanatory variables and considered VIF < 2 and correlation <0.7 as acceptable to include the variable in the model.

We confronted models including either GEO, CLIM, ELEV, or SOIL matrix as a single predictor and models combining two, three, or four predictors against a null model (see Table 4). The analysis was performed with the R package MCMCGLMM (Hadfield 2010), following scripts available in Lexer et al. (2014). MCMCGLMM was run with a burn-in of 500,000 followed by 2,000,000 MCMC iterations with a thinning interval of 750, under standard priors. We accounted for the lack of independence between pairs of populations by fitting a multiple membership model. Chain convergences were checked using the ‘CODA’ R package. We used the DIC to compare models and draw conclusions on the relative roles of IBD versus IBE in the divergence of P. lanuginosa within the Cerrado.

P STF ST comparisons

We also evaluated the relative importance of selection and neutral evolutionary processes in P. lanuginosa by comparing differentiation in quantitative ecophysiological traits (as measured by PST) and neutral markers derived from ddRAD sequencing (as measured by FST, Weir and Cockerham 1984) among a subset of eight P. lanuginosa populations from the Cerrado (Table 1). To represent the degree of phenotypic differentiation among these populations, we employed an approximation of QST called PST (Merilä and Crnokrak 2001). PST is directly calculated from the total phenotypic variance with no distinction between the relative contribution of genetic and environmental variation (Leinonen et al. 2006; Silva and Silva 2018) and has been applied to distinct biological systems as an alternative of QST (e.g., He et al. 2013; Shinn et al. 2015), including to systems exhibiting strong genetic structure (Miller et al. 2019; Pometti et al. 2019). We assessed trait values in a common garden; we used this approach because a known family structure would be required to estimate the additive genetic variance component of QST. The logic of PSTFST comparison relies on the assumption that FST values obtained from neutral molecular markers reflect the divergence induced by genetic drift without selection (Merilä and Crnokrak 2001). Thus, PST > FST or PST < FST mean that quantitative traits show a higher or lower level of differentiation than expected under the influence of genetic drift alone, as induced by divergent or balancing selection, respectively; while PST = FST means that no departure from neutral expectations can be detected (Merilä and Crnokrak 2001). This analysis assumes that mutation rates at coding genes and neutral regions of the genome are uniform or that time since divergence among populations is short enough or migration is high enough to disregard mutations (Leinonen et al. 2013).

Population pairwise PST was calculated from between-population (σ2B) and within population (σ2w) components of variance for each of the 11 ecophysiological traits following Brommer’s (2011) formula PST = cσ2B/(cσ2B + 2 h2σ2W) using R package ‘Pstat’ (Silva and Silva 2018). We assumed that the proportion of the total phenotypic variance which is due to additive genetic effects is equal to the narrow-sense heritability, as we have no previous data on contributions of additive genetic variance to among (c) and within (h2) population differences in ecophysiological traits for our sampled populations. Because of the known problem associated with the lack of accurate estimation of the parameters c and h2 for a set of traits (see Brommer 2011), we assessed the strength of PSTFST comparisons exploring the variation range of 0 < c/h2 ≤ 1 (sensitivity analysis) using ‘Pstat’. In its turn, pairwise FST values were calculated using the ‘populations’ module of STACKS for the subset of 616 putatively neutral SNPs, excluding the two outliers detected by at least two of the employed approaches for the Cerrado populations. Confidence intervals of PST and FST values were determined with the bootstrap method (1000 repetitions) under a confidence level of 0.95%. PST was considered to significantly differ from FST when their confidence intervals did not overlap. Finally, we used an analysis of variance (ANOVA), followed by a Tukey test, to compare means among multiple populations for each trait bearing a signature of selection using the software R.

Results

Genetic diversity and structure patterns

Three lanes of ddRAD-sequencing (40–48 samples per lane) using ION Torrent technology resulted in ~1,992,450 single end reads per P. lanuginosa sample, with an average read length of 178 bases and an average of 84% bases with Q score above 20 (Table S3). PYRAD assembled a total of 13,858 ddRAD-based SNP loci. After VCFTOOLS filtering following missing data criteria for loci and individuals, our final matrix included 115 individuals and 688 SNPs. Final matrices for the Cerrado and Yungas subsets in their turn had a total of 618 and 714 SNPs, and 95 and 21 individuals, respectively.

We found extremely low within-population genetic variation in P. lanuginosa (Table 2). The proportion of polymorphic sites per population ranged from 1.93 to 18.52, while nucleotide diversity calculated for the SNPs ranged from 0.0118 to 0.0885 (Table 2). Observed and expected heterozygosities per population varied from 0 to 0.0009 and 0.0092 to 0.0753, respectively (Table 2).

Table 2 Genetic diversity parameters for 19 sampled populations of Pitcairnia lanuginosa based on 688 SNPs.

Pairwise FST and GST are generally high, with exception of neighbor populations SJC-YUR (Table S4). Independent analyses consistently recovered the same genetic structure for P. lanuginosa. The neighbor-net analysis revealed two major splits corresponding to Cerrado and Yungas populations with bootstrap support of 100%, despite Eastern Cerrado populations being slightly differentiated from those belonging to the Central-Western Cerrado distribution (Fig. 2). INSTRUCT clustered populations into three groups corresponding to Eastern Cerrado, Central-Western Cerrado and Yungas (east-west gradient of genetic variance), with low levels of admixture between both Cerrado clusters, and high selfing rates within clusters (0.956, 0.961, and 0.961, respectively; Fig. 1A, B). AMOVA showed that the largest part of the genomic variance is partitioned among groups, either when contrasting Cerrado and Yungas or Eastern Cerrado, Central-western Cerrado and Yungas (FCT = 0.54 and 0.53, respectively; p < 0.001), suggesting historical genetic isolation among these groups (Table 3). About 24% and 19% of the variation accounted for differences among populations nested within each of these groups, respectively, while 22% and 27% accounted for within-population variance (Table 3).

Fig. 2: Neighbor-net analysis.
figure 2

Unrooted phylogenetic network for 115 individuals of Pitcairnia lanuginosa based on 688 SNPs as inferred by SplitsTree using the neighbor-net model and uncorrected p-distances.

Table 3 Analysis of molecular variance (AMOVA) for 19 populations of Pitcairnia lanuginosa based on 688 SNPs markers.

Drift and selection effects

BAYESCAN detected a total of 34 FST outliers (i.e., SNPs putatively under selection), while BAYPASS and PCADAPT identified 5 and 43 outlier SNPs, respectively, among all populations sampled (Fig. 3A). Nevertheless, only two SNPs were regarded as candidates for being under selection due to their detection by at least two distinct methods (Fig. 3A). Both SNPs are candidates for directional selection, as demonstrated by their higher differentiation in terms of FST in BAYESCAN analysis (Fig. S1A). When searching for FST outliers within the Cerrado distribution, BAYESCAN pointed to a total of 17 outliers, while BAYPASS and PCADAPT detected 4 and 59 outlier SNPs, respectively. For this data set, the same two outlier SNPs were recovered by at least two different methods (Fig. 3B), both putatively under directional selection (Fig. S1B, Table 4).

Fig. 3: Candidate SNPs bearing a signal of selection.
figure 3

Venn diagrams of shared outlier loci detected by BAYESCAN, PCADAPT and BAYPASS outlier analyses based on A 688 SNPs genotyped across 19 Pitcairnia lanuginosa populations and B 618 SNPs genotyped for a subset of 16 Cerrado populations.

According to Mantel analyses, geographical distances are strongly correlated with climatic distances (r = 0.76, p < 0.001), and moderately correlated with other environmental distances derived from soil and elevation data (r = 0.26, p < 0.01; r = 0.27, p < 0.01; respectively; Table S5) within the Cerrado. As expected, climatic and elevation distances are also correlated with each other (r = 0.51, p < 0.01; Table S5). When using an FST matrix obtained from 616 neutral SNPs (Cerrado assembly), GLMM identified the model using soil as the single predictor (SOIL) as the best to predict genomic variance, but not significantly better (i.e., ΔDIC < 2) than the one combining soil and geographic pairwise distances (GEO + SOIL) (Table 4). GLMMs suggested an effect of both climatic and geographical distances on the AFD of the two outlier SNPs (Table 4). When contrasted to other predictors, the model that included all environmental distances and geographical distances as explanatory variables achieved the greatest DIC support (Table 4), indicating that genetic variation at these outlier loci is better explained by a combined effect of IBE and IBD. However, the full model was not significantly better than the model that included only climatic and geographical distances (GEO + CLIM; ΔDIC < 2) (Table 4).

Table 4 Results of generalized linear mixed models (GLMM) testing the influence of geographic (GEO) distance and environmental distances, derived from climate (CLIM), soil (SOIL), and elevation (ELEV) data, on the genetic divergence among 16 populations of Pitcairnia lanuginosa from the Cerrado.

The confidence intervals for most PST estimates derived from 11 ecophysiological traits overlapped with the confidence interval of FST among eight Cerrado populations (FST = 0.503, CI = 0.456–0.550; Fig. 4A), thus fitting the neutral expectation. Nevertheless, PST values were significantly higher than FST for two measured traits, the decay of both maximal photochemical efficiency (Fv/Fm) and light saturation coefficient (Ik) under drought stress (PST = 0.943 and 0.847; CI = 0.878–0.987 and 0.696–0.931, respectively; Fig. 4A), which may be caused by directional selection. Because the confidence interval of PST for Ik (mean PST = 0.758, CI = 0.551–0.943) nearly overlapped with that of FST, we did not consider it as different (Fig. 4A). Sensitivity analyses indicated that these results are robust to c/h2 variation, except for extremely low values of c/h2 (Fig. S2). ANOVAs showed that populations NAT and TIR are significantly less tolerant to drought stress in terms of Fv/Fm (Tukey test, p < 0.05; Fig. 4B), whereas RON is less tolerant to drought stress in terms of Ik (Fig. 4C).

Fig. 4: Forest plot comparing means and confidence intervals of PST for 11 phenotypic traits and neutral FST among eight Cerrado populations of Pitcairnia lanuginosa based on 1000 bootstraps.
figure 4

Traits related to heat, light, and drought stresses are shown in red, green and blue, respectively (A). Boxplots comparing means among populations for each trait bearing a signature of divergent selection, i.e. decay of Fv/Fm (B) and Ik (C) under drought stress. CTmax = Upper critical thermal limit, z = sensitivity to temperature change, ETRmax = maximum relative electron transport rate, α = initial slope of the curve of light-response, Ik = light saturation coefficient, and Fv/Fm = maximal photochemical efficiency.

Discussion

The application of combined data analysis including high throughput sequencing and ecophysiological traits refined our understanding of the mechanisms underlying the population differentiation in the widespread tropical bromeliad P. lanuginosa. The analysis of hundreds of loci revealed low genetic variation within populations and pronounced structure among populations of this highly endogamous and patchily distributed species, a pattern that suggests a strong role of genetic drift. Our data also revealed few outlier loci and few phenotypic traits putatively under directional selection, which suggests an additional role of natural selection, despite strong drift, in populations subject to similar microhabitat conditions. Below we discuss these main findings based on information about the effects of drift and selection on population differentiation in this species and highlight which species’ ecological features may be particularly linked to the genomic patterns recovered here.

We found a remarkably low genetic diversity within P. lanuginosa populations using hundreds of SNPs, in agreement with our previous study based on eight microsatellites (Leal et al. 2019). In fact, such a low within-population genetic diversity is expected in a predominantly selfing species occurring in small and spatially isolated populations. Although both studies evidenced a remarkably low genetic variability, heterozygosities are even lower for the SNP data set than for microsatellites (Leal et al. 2019). Such differences are likely due to distinct mutation rates between makers, typically higher for microsatellites by several orders of magnitude (Chakraborty et al. 1997; Haasl and Payseur 2011). Furthermore, we cannot rule out the effect of lower population sample sizes used here. Despite these differences, our results indicate that the low levels of variability resulting from strong genetic drift may decrease the adaptive potential of P. lanuginosa to novel environmental conditions (Messer and Petrov 2013), but might not prevent populations to locally adapt. Accordingly, combined analyses detected a limited number of outlier SNPs (corresponding to 0.2–0.3% of the total) under putative directional selection in P. lanuginosa, highlighting the important role of genetic drift compared to selection in promoting population differentiation within non-continuously distributed organisms. Although we expect missing data to have some impact on the detection of outliers through inflating genetic differentiation parameters, missingness in the two candidate SNPs (ca. 30% of individuals) was unlikely to explain their higher FST values (FST = 0.902 and 0.903). Furthermore, these candidates SNPs were identified simultaneously by two methods that take into account different signatures left by natural selection at the molecular level (i.e., BAYESCAN and PCADAPT), suggesting thus a robust signature of putative directional selection.

Our study also showed that PST values among Cerrado populations for nine out 11 ecophysiological traits were within neutral expectations. Although we expect contrasting macroclimatic conditions across the species distribution, P. lanuginosa populations generally occur within riparian forests, and may, therefore, be exposed to similar selective pressures across this wide distribution. Indeed, local adaptation is expected to be less frequent between populations experiencing similar environmental conditions (reviewed by Hereford 2009). Therefore, the consistent association to a specialized microhabitat suggests that niche conservatism may be the major process leading to the divergence, even between intra-specific lineages occurring in the Cerrado and Yungas.

On the other hand, the results of GLMMs suggested the existence of IBE determined by climate variation or by the combined effect of climate, soil, and elevation, in addition to IBD within the Cerrado for two candidate SNPs which exhibited a signature of putative directional selection. Furthermore, two traits related to drought stress (i.e., the decay of Fv/Fm and Ik under drought stress) showed an increased phenotypic variation in our PSTFST comparisons among Cerrado populations that were robust to c/h2 variation. Indeed, populations from the Cerrado exhibit varying levels of drought tolerance as measured by such traits, suggesting local adaptation to differences in micro-habitats and/or microclimates. Although riparian forests are buffered from abrupt climatic changes, levels of disturbance due to flooding or intermittent drought likely vary among P. lanuginosa sites. Together, these pieces of evidence suggest that selection acting on a few traits could overcome the effects of drift associated with the low observed effective population sizes and high inbreeding in this species. Because populations inhabiting the Cerrado are submitted to seasonal variation of precipitation, such specific traits could be seen as preliminary candidates for local adaptation (see Pujol et al. 2008), but still demand further investigation to confirm their importance.

Caution must be taken on interpreting results on ecophysiological traits identified to be under divergent selection because low sample sizes and violated assumptions in estimating FST and PST might create bias for comparisons and lead to the detection of false positives (Miller et al. 2008; Edelaar et al. 2011; Leinonen et al. 2013). One assumption of such analysis is that mutation rates at coding genes underlying traits are comparable to those of neutral regions of the genome (Leinonen et al. 2013). This assumption might be supported when mutation rates are high relative to migration rates or much time has passed since population divergence (Edelaar et al. 2011; Leinonen et al. 2013). It has been argued, however, that the use of SNPs can provide a more reliable neutral baseline differentiation as they experience lower mutation rates, which are comparable to those of the loci underlying quantitative traits (Roach et al. 2010; Whitlock 2011). Indeed, SNP markers likely prevents upward bias (i.e., QST or PST > FST, Edelaar et al. 2011). Notwithstanding, interpretation of results from genome scans of SNPs requires caution, because finding candidates to divergent selection is particularly difficult when the average level of population differentiation is high (De Mita et al. 2013; Hoban et al. 2016; Lowry et al. 2017). Also, more candidates for selection could arise from outlier tests for data sets with a higher density of markers (e.g., derived from whole genome sequencing). For our model species, we conclude, thus, that both PSTFST comparisons and genome scans may have rendered conservative rather than liberal results regarding the existence of traits and loci under divergent selection. Further studies could provide more insights on the relative importance of local adaptation for the differentiation of P. lanuginosa populations.

Our data also revealed high levels of genetic differentiation among P. lanuginosa populations (FST > 0.70, p value = 0.0001). Our RAD-based estimates are in strong agreement with previous findings based on microsatellites markers (global FST = 0.71; Leal et al. 2019). However, studies have generally reported smaller FST for microsatellite-based estimates than SNPs-based (e.g., Elhaik 2012). Although RAD-seq can accurately infer genetic structure even with relaxed levels of missing data (see Hodel et al. 2017), we cannot rule out that the excess of missing data has led to biased FST (see Arnold et al. 2013; Gautier et al. 2013). When using GST, which is generally less affected by within-population variability than FST, we estimated lower levels of genetic differentiation with SNPs (global GST = 0.770) than with microsatellite markers (global GST = 0.861, as calculated from the dataset of Leal et al. 2019). Patterns of genetic structure recovered here are similar to what has been reported for organisms distributed across islands or island-like terrestrial habitats with RAD markers (e.g. Funk et al. 2016; Wang et al. 2017). P. lanuginosa occurs as small and isolated populations and presents a combined set of ecological characteristics that likely limit gene flow. The species shows limited seed dispersal, owing to the tiny seeds that lack any specific dispersal mechanism (Smith and Downs 1974), a reduced overlap of flowering phenology among populations, and high rates of self-fertilization, which decrease opportunities for both seed and pollen-mediated gene flow thereby promoting differentiation.

Although genetic differentiation among populations is generally high, analyses consistently assigned P. lanuginosa populations to three genetic clusters occurring in the Eastern Cerrado, Central-western Cerrado and Yungas. Such divergence within the species also agrees with that previously detected using Sanger sequencing and microsatellite markers (Leal et al., 2019) and has been commonly reported for other Cerrado plant species (Resende-Moreira et al. 2017; Buzzati et al. 2018; reviewed by Leal et al. 2016). Because there are no obvious historical or geographical barriers to gene flow explaining the concordance among ecologically distinct species, this Eastern-Western genetic structure pattern deserves special attention and could be better investigated under a comparative framework. For riparian plants, such as P. lanuginosa, the genetic structure could also be explained due to historical drainage re-organization events, as has been shown for species inhabiting the Asian mountains valleys (Yue et al. 2002; Zhang et al. 2011), a hypothesis to be further investigated.

Patterns of genomic diversity and structure recovered for P. lanuginosa are likely under a strong influence of the high rates of self-fertilization. Despite the predicted benefits of selfing for reproductive assurance when pollinators and/or mates are rare, particularly during colonization events, high rates of self-fertilization may limit the species’ evolutionary potential (Herlihy and Eckert 2002; Wright et al. 2013). Because the strength of selection scales to the effective population size (Kimura 1983), natural selection may be less efficient in selfing populations and, to an even greater extent, in recently founded populations experiencing selfing (Laenen et al. 2018). Indeed, studies have found that small populations are less likely to display local adaptation, but the influence of mating system on limiting adaptation is still controversial (see Leimu and Fischer 2008; Hereford 2010; Clo et al. 2019). Selfing could theoretically promote local adaptation if selfing populations experience less gene flow than outcrossing populations (Linhart and Grant 1996; Hereford 2010). In addition, theoretical studies have shown that selfing tends to increase the effect of selection by exposing alleles to homozygosity that can be more easily removed or selected (Charlesworth and Charlesworth 1987; De Mita et al. 2013). The influence of self-fertilization on the efficiency of selection seems therefore unpredictable, and further empirical studies on organisms performing predominant selfing may help to test some of the theoretical predictions (see Clo et al. 2019).

In summary, genomic analyses with a set of hundreds of loci generated by ddRAD sequencing, and additional data on ecophysiological variation at the population level, found support for genetic drift as the main evolutionary mechanism responsible for divergence across the wide yet patchy distribution of P. laniginosa, a highly self-fertilized species. Natural selection may have an important role, particularly at maintaining highly advantageous variants or eliminating deleterious mutations within populations and thus allowing the persistence in small patches of riparian forests. Indeed, our results suggested that selection might be the dominant force underlying the phenotypic divergence of traits related to drought stress, while neutral processes likely explain the divergence for other ecophysiological traits measured in this study. Determining to what degree differentiation among populations is caused by adaptive versus neutral processes provides important insights about population differentiation across wide distribution ranges. As demonstrated by an increasing number of studies employing genome-wide markers, patterns of genetic diversity and structure are rarely explained by a single evolutionary force (e.g, Orsini et al. 2013, Moura et al. 2014; Gaither et al. 2015). Rather, complex patterns commonly arise when employing a wide genomic coverage and natural selection is often assumed as a relevant force to shape particular traits, even for populations experiencing strong drift (e.g. Prentice et al. 2017; Wang et al. 2017). The balance between genetic drift and natural selection and the relative contribution of plasticity to environmental adaptation seem to be strictly dependent on the ecological context. For predominantly selfing species inhabiting isolated but relatively stable habitats, such as riparian forests, drift may be stronger than selection for most loci. But even these relatively stable habitats can be subject to flooding or intermittent drought disturbance, imposing conditions in which plants must adapt to the environment either through genetic adaptation or phenotypic plasticity.