Multiple ecological and evolutionary forces shape the genetic makeup of natural populations, one of which is population structure (Wright, 1951). Population structure refers to the division of a population’s gene pool between groups of individuals such that random mating is more likely between individuals within a group than between individuals from different groups. Owing to the ubiquity of population structure (and its associated varying levels of connectivity between subpopulations), it is common practice in molecular ecology to use statistical approaches to identify population structure in multi-individual datasets using molecular markers (for example, Corander and Marttinen, 2006; Alexander et al., 2009). In addition, developments in sequencing technologies over the last decade have enabled researchers to produce data covering most, if not all, of the genome of a species relatively cheaply (for example, Groenen et al., 2012; Zhan et al., 2013), and to estimate important evolutionary parameters, such as the trend in effective population size over time, from as few as a single genome (for example, Li and Durbin, 2011; Schiffels and Durbin, 2014). However, such methodologies assume that the data analysed come from an unstructured population and, as has been shown previously, deviations from that model can inflate estimates of the effective population size (Leblois et al., 2006; Heller et al., 2013; Bosse et al., 2014).

In this issue of Heredity, Mazet et al. develop a theoretical framework that illustrates the effect of otherwise ignored population structure on demographic inferences using genomic methods such as the Pairwise Sequentially Markovian Coalescent (PSMC; Li and Durbin, 2011). For their method, Mazet et al. derive the ‘Inverse Instantaneous Coalescent Rate (IICR)’: the inverse of the coalescent rate throughout time estimated from a sample consisting of two haplotypes as found in a diploid individual. The IICR in a genetic dataset from an unstructured population sample corresponds to the trajectory of the effective population size over time, analogous to the output of the PSMC method. However, if the data are from a structured population, the IICR function reflects both the trajectory of the effective population size and the migration pattern between subpopulations (Figure 1). Interestingly, Mazet et al. show that when sampling a structured population, if the two genomic haplotypes sampled derive from the same subpopulation, the inferred demographic trend describes a reduction in the effective population size akin to a bottleneck signal (Figure 1, top right). In contrast, if the two haplotypes sampled derive from different subpopulations (for example, when the sampled genome corresponds to an F1 individual descending from a migrant), the resulting inferred demographic trend is the opposite, namely a population expansion (Figure 1, bottom right). Notably, in either case (that is, the signal of a bottleneck or that of an expansion), the trend described is spurious and independent of whether a real change in effective population size occurred.
In other words, estimating the demographic history of a population using approaches that do not take into consideration the effect of population structure may show an inherent bias towards identifying changes in the demographic history of a population, even when the population has remained stable over time. In addition, changes in the migration rates between subpopulations (still without changes in the overall effective population size) result in wavy PSMC and IICR trends, like those typically interpreted as representing expansions and bottlenecks (Figure 5 of Mazet et al., this issue). Using their IICR approach, Mazet et al. investigated the demographic history of humans previously analysed using PSMC (Li and Durbin, 2011), but assuming a structured population with changes in migration rate and either a stable or changing demography. For both scenarios, Mazet et al. carried out neutral simulations of a structured population with the same parameters used by Li and Durbin (2011) for the human PSMC, but including up to three migration events between 2.52 and 0.24 million years ago. With their simulations, Mazet et al. showed that the previously described demographic history of our species (Li and Durbin, 2011) can also be obtained from a model of changes in connectivity between subpopulations since the beginning of the Pleistocene, irrespective of whether there was a change in population size (Figure 6 and Figure S4 of Mazet et al., this issue).
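The contrast between the two sampling schemes can be reproduced numerically. The sketch below is an illustration only and is not the code of Mazet et al.: it assumes a symmetric n-island model with demes of equal and constant size, time measured in units of 2N generations (N being the deme size) and a scaled migration rate M = 4Nm. The function name and parameters are hypothetical. Under these assumptions, two lineages are either in the same deme, in different demes or already coalesced, and the IICR equals the probability that they have not yet coalesced divided by the density of the coalescence time.

```python
import math


def iicr_n_island(t, n_demes, mig_rate, same_deme=True):
    """IICR at time t (backward, in units of 2N generations) for two
    lineages under an ASSUMED symmetric n-island model of constant size.
    The result is expressed in units of the deme size N.
    mig_rate is the scaled migration rate M = 4Nm (hypothetical input)."""
    if n_demes < 2 or mig_rate <= 0 or t < 0:
        raise ValueError("need n_demes >= 2, mig_rate > 0 and t >= 0")

    # Rates among the two non-coalesced states of the lineage pair:
    # state 0 = same deme, state 1 = different demes.
    a = -(mig_rate + 1.0)          # leave state 0: migration (M) or coalescence (1)
    b = mig_rate                   # same deme -> different demes
    c = mig_rate / (n_demes - 1)   # different demes -> same deme
    d = -c

    # Eigenvalues of the 2x2 matrix [[a, b], [c, d]]; both are real and
    # distinct because b * c > 0.
    trace = a + d
    det = a * d - b * c
    root = math.sqrt(trace * trace - 4.0 * det)
    lam_slow = (trace + root) / 2.0
    lam_fast = (trace - root) / 2.0

    # r = exp(lam_fast * t) / exp(lam_slow * t), which decays from 1 to 0.
    r = math.exp((lam_fast - lam_slow) * t)

    # Coalescence happens only in state 0, at rate 1, so
    # IICR(t) = (P_same + P_diff) / P_same = 1 + P_diff / P_same,
    # with both probabilities taken from the closed-form exponential
    # of the 2x2 matrix.
    if same_deme:
        return 1.0 + b * (1.0 - r) / ((a - lam_fast) + r * (lam_slow - a))
    if r == 1.0:
        # Lineages that start in different demes cannot coalesce at t = 0.
        return math.inf
    return 1.0 + ((d - lam_fast) + r * (lam_slow - d)) / (c * (1.0 - r))


if __name__ == "__main__":
    # Arbitrary illustrative values: 10 demes, M = 1.
    for t in (0.1, 0.5, 1.0, 5.0, 50.0):
        same = iicr_n_island(t, 10, 1.0, same_deme=True)
        diff = iicr_n_island(t, 10, 1.0, same_deme=False)
        print(f"t={t:5.1f}  same deme: {same:8.3f}  different demes: {diff:8.3f}")
```

Read from the past towards the present (large t to small t), the same-deme curve falls from a plateau to the size of a single deme, which resembles a bottleneck, whereas the different-deme curve rises without bound, which resembles an expansion, even though no deme changes size in the model.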

Figure 1

Schematic diagram of a structured population and the inferred demographic histories from two alternative sampling scenarios. Two subpopulations are shown (one grey and one blue), which exchange migrants at varying rates throughout time and which have a stable demographic history. In scenario 1, a diploid individual is sampled whose haplotypes derive from the same subpopulation, and a bottleneck-type demographic trend is inferred; in scenario 2, an individual is sampled whose haplotypes derive from different subpopulations (for example, the F1 from a migrant), and an expansion-type demographic trend is inferred.

The risk of such spuriously inferred population size changes has been reported previously (for example, Leblois et al., 2006; Stadler et al., 2009; Heller et al., 2013; Bosse et al., 2014), and the developers of the PSMC method also pointed out that if a population splits in half and later merges again, even in the absence of demographic changes, an increase in effective population size can be observed (Li and Durbin, 2011). Heller et al. (2013) showed how deviations from the random mating model affect demographic inferences using Bayesian skyline plots (BSP), a standard tool used for analysing mitochondrial DNA sequences. Using sampling schemes where all haploid samples derived from the same subpopulation or from more than one subpopulation, Heller et al. demonstrated that for simulations with a stable population size the various sampling regimes mostly resulted in inferred population bottlenecks (Figure 1 in Heller et al., 2013). In addition, if simulations included a population expansion or a bottleneck, the former could not be identified if the samples derived from the same subpopulation, and the latter could not be detected if an equal number of samples from each subpopulation was used for the analysis (Figure 2 in Heller et al., 2013).

The results described above are in line with those of Mazet et al. in this issue of Heredity and call for the development of approaches that allow the comparison of alternative models to the simple one based on effective population size changes over time (for example, population structure, varying levels of gene flow between subpopulations, or a combination of the two with simultaneously occurring demographic changes). In that context, a maximum likelihood approach has recently been developed to attempt to distinguish between a structured model and a population demographic model with one single change using a single diploid genome dataset (Mazet et al., 2015). Nevertheless, until better approaches are developed that allow researchers to disentangle the effect of structure from that of demographic change, the IICR results shown here call for a cautious interpretation of demographic trends like those inferred by PSMC, which have become a standard result in many publications using genomic data.