Introduction

Patterns of population genetic variation can be shaped by neutral and adaptive evolutionary processes and there is an increasing understanding of how demography, gene flow, and selection interact to drive extant genetic patterns across space (Bradburd & Ralph, 2019; Lowe et al., 2017; Semenov et al., 2019). The fruit fly, Drosophila melanogaster, has been recognized as a useful model organism in understanding such interactions following many decades of research. The species has a cosmopolitan distribution and shows an ability to rapidly adapt to varying environments (David & Capy, 1988; Markow & O’Grady, 2005). Past biogeographical and systematic studies have confirmed that D. melanogaster was native to sub-Saharan Africa (Lachaise et al., 1988), and expanded its range from Middle East into the Eurasian continent roughly at 1800 years ago (Sprengelmeyer et al., 2020) and to America and Australia only about 100–200 years ago (Arguello et al., 2019; Stephan & Li, 2007). Current studies have shown that both historical demographic events and local adaptation as well as their interplay play crucial roles in shaping the population genetic structure of D. melanogaster (Arguello et al., 2016; Flatt, 2016; Keller, 2007; Laurent et al., 2011).

One factor influencing the population genetic structure of D. melanogaster is historical dispersal (Keller, 2007; Sprengelmeyer et al., 2020). North American and African populations of D. melanogaster have been affected by African and European migrations associated with human cargo transportation (Bergland et al., 2016; David & Capy, 1988; Kao et al., 2015; Lachaise et al., 1988). The exchange of fruit around the world has provided an opportunity for ongoing gene flow among populations, contributing to genetic admixture worldwide in recent years as documented in African (Kauer et al., 2003; Pool et al., 2012), North American (Duchen et al., 2013), and Australian (Bergland et al., 2016) populations. Compared to rural populations, African urban populations exhibit substantial genetic admixture from European populations (Kauer et al., 2003; Pool et al., 2012); although the high levels of inferred non-African ancestry can be affected by adaptation, ongoing trade might also contribute (Kauer et al., 2003; Pool et al., 2012).

Another factor influencing the population genetic structure of D. melanogaster is natural selection (Flatt, 2016; Hoffmann et al., 2003; Sigh et al., 1982). Numerous studies have revealed clinal variation in phenotypes, chromosomal arrangements, and genotypes across environmental gradients for this species, implicating spatially varying selection (Flatt, 2016; Hoffmann & Weeks, 2007). DNA variation in genic targets of clines can be connected with adaption to ecological factors like temperature, precipitation, and oxygen concentration (Parkash & Munjal, 2000; Schmidt et al., 2005; Weeks et al., 2002). To understand such clinal variation in D. melanogaster, the effect of natural selection should be disentangled from demographic events (Miller et al., 2020; Sexton et al., 2014; Smith et al., 2020).

The Eurasian continent has been proposed as one of the earliest areas colonized by D. melanogaster out of its native range (David & Capy, 1988). The East Asian populations of D. melanogaster may have some unique genetic characteristics compared to other populations (Schlötterer et al., 2006). However, there are very few studies on the population genetic structure of D. melanogaster across East Asia, with most previous studies concentrating on African, European, Australian, and American populations (Duchen et al., 2013; Kapun et al., 2018; Kauer et al., 2003; Stephan & Li, 2007). East Asia has a diversified geographical environment and a long history of human colonization. This area is one of the most important regions in the spread and diffusion of many species (Leipe et al., 2019; Molina et al., 2011; Xiang et al., 2018; Yu et al., 2018), especially those associated with human trade-which are likely to include D. melanogaster (Arguello et al., 2019; Keller, 2007). However, limited knowledge on the population genetic structure of D. melanogaster in East Asia has hindered an understanding of expansion and adaptation processes in this region and connections to other areas.

Here, we examined the genetic diversity and population structure of D. melanogaster populations collected across China based on the mitochondrial cox1 gene and microsatellite loci. We hypothesized that (1) low population genetic differentiation exists among Chinese populations of D. melanogaster owing to its relatively short history of introduction into China; and (2) genetic distance across Chinese populations are correlated with both geographical distance and environmental factors, given the rapid adaptive evolution of D. melanogaster and varied ecological environment across China, as already seen in other non-migratory invertebrates in this region (Cao et al., 2016b, 2019; Wei et al., 2015).

Materials and methods

Sample collection and DNA extraction

Specimens of D. melanogaster were collected from 16 locations in China through trapping by using rotten fruits (e.g., grapes, watermelon, and banana) in 2019 (Table 1 and Fig. 1). The sampled flies were first checked morphologically using an anatomical lens, followed by molecular identification by BLAST searching of the mitochondrial cytochrome oxidase subunit I (cox1) gene (see below) against the nucleotide database (nt) and in the BOLD system (http://www.boldsystems.org) to ensure the reliability of morphological identification. All samples were stored in 98% alcohol, and frozen at −80 °C before DNA extraction. A DNeasy Blood and Tissue Kit (Qiagen, Germany) was used to extract total genomic DNA individually for 330 samples randomly selected from 16 populations.

Table 1 Collection information for Drosophila melanogaster specimens used in this study.
Fig. 1: Collection site and geographical distribution of the mitochondrial cox1 haplotypes of Drosophila melanogaster.
figure 1

The haplotype frequency for each population is represented by colors in the pie charts. See Table 1 for population codes. Four of the five population groups identified in population genetic structure analysis are marked by circles: NW northwest, SW southwest, SE southeast, NE northeast. The rest of the populations formed the CE (central) group.

Mitochondrial gene amplification and sequencing

To characterize the mitochondrial variation and validate correct identification of the specimens, a fragment of the cox1 gene involving the DNA barcoding region of insects was sequenced using the universal primer pairs AF (5’-GGTCAACAAATCATAAAGATATTGG-3’) and AR (5’-TAAACTTCAGGGTGACCAAAAAATCA-3’) (Folmer et al., 1994). Polymerase chain reaction (PCR) was conducted using the Mastercycler Pro system (Eppendorf, Germany) in a 10 μL volume consisting of 0.5 µL of template DNA (5–20 ng/µL), 0.5 µL of LA Taq (Takara Biomedical, Japan, 5 U/μL), 1 µL of LA PCR Buffer II (Mg2+), 0.16 µL of forward primer (10 mM), 0.16 µL of reverse primer (10 mM), and 7.68 µL of ddH2O. The thermal profiles for DNA amplification were as follows: 4 min at 94 °C; 35 cycles of 30 s at 94 °C, 30 s at 52 °C, and 60 s at 72 °C, followed by a final 10-min extension at 72 °C. Amplified products were purified and sequenced on an ABI 3730xl DNA Analyzer by TianYi HuiYuan Biotechnology Co. Ltd (Beijing, China).

Microsatellite marker development and genotyping

We developed genome-wide microsatellites from the D. melanogaster genome (accession PGRM00000000). The QDD3 program (Meglecz et al., 2010) was used to extract microsatellites along with their flanking sequences (300 bp each) from the scaffolds, and to design the primers. According to previous reports, dinucleotide microsatellites are prone to polymerase slippage during the amplification process, which may lead to mistyping, so multi- (including tri-, tetra-, penta-, and hexa-) nucleotide loci were preferentially selected in this study. The criteria and parameters of primer design for the isolated microsatellite markers followed Cao et al. (2016a). Next, the output primers were further filtered manually using more stringent criteria: (i) only perfect (without repeat region interruptions) loci were retained; (ii) the minimum distance between the 3′ end of two primer pairs and their target region had to be >10 bp; and (iii) primers using the “A” design strategy that do not have multiple microsatellites, nanosatellites, or homopolymers in the amplicon were retained. A total of 60 primer pairs were ultimately screened out, which were used for further isolation of polymorphic microsatellite loci.

The 60 primer pairs were initially chemically synthesized by Tsingke Biotechnology Co. Ltd (Beijing, China), with a universal primer (CAGGACCAGGCTACCGTG) added to the 5′ end of the forward primers based on the method of Blacket et al. (2012). One individual D. melanogaster from each of the 16 populations was randomly selected to test the polymorphism level of target microsatellite sequences and the amplification efficiency of synthesized primers. The PCR amplification reaction was carried out in a 10 μL volume consisting of 0.5 µL of template DNA (5–20 ng/µL), 5 µL of Master Mix (Promega, Madison, WI, USA), 0.08 µL of PC tail modified forward primer (10 mM), 0.16 µL of reverse primer (10 mM), 0.32 µL of fluorescence-labeled PC tail (10 mM), and 3.94 µL of ddH2O. The thermal profiles for DNA amplification were as follows: 4 min at 94 °C; 35 cycles of 30 s at 94 °C, 30 s at 56 °C, and 45 s at 72 °C, followed by a final 10-min extension at 72 °C. The amplified PCR fragments were analyzed on an ABI 3730xl DNA Analyzer (Applied Biosystems) using the GeneScan 500 LIZ size standard (Applied Biosystems). Genotyping was conducted using GENEMAPPER 4.0 (Applied Biosystems, USA). Primer pairs that failed in the PCR amplification for more than two individuals, that were monomorphic in tested individuals, or that produced more than two peaks were discarded (Table 2).

Table 2 Fifteen microsatellite markers developed to genotype Drosophila melanogaster in this study.

Identification of relatives within populations

We identified relatives within each population to reduce the influence of close relatives on genetic diversity and population structure analysis. The software of SPAGeDi v 1.5 (Hardy & Vekemans, 2002) was used to calculate Loiselle’s K based on microsatellite loci. Kinship among individuals was identified based on the K scores of pairwise individuals. The maximum K value among individual pairs with different mitochondrial cox1 haplotypes was used as a standard to discriminate the related pairs. Individual pairs with K values larger than the standard were considered as relatives.

Genetic diversity analysis

For the mitochondrial cox1 gene, sequencing results from both strands were assembled into a consensus sequence using the software SEQMAN in LASERGENE version 7.1.2 (DNASTAR, Inc., USA). Sequences from all individuals were aligned using CLUSTALW (Thompson et al., 1994) implemented in MEGA version 7 (Kumar et al., 2016). The number of polymorphic sites (S), haplotype diversity (h), and nucleotide diversity (π) were analyzed in DnaSP version 5.10 (Librado & Rozas, 2009).

For microsatellite loci, the genotyping results were corrected with the software MICRO-CHECKER (van Oosterhout et al., 2004). The number of alleles, observed heterozygosity (HO), and expected heterozygosity (HE) were calculated in the Microsoft Excel microsatellite toolkit version 3.1 (Park, 2001). Deviation from Hardy–Weinberg equilibrium (HWE) at each locus/population combination, linkage disequilibrium (LD) among loci within each population, and pairwise mean population differentiation (FST) were examined using GENEPOP 4.2.1 (Rousset, 2008). We compared the total number of alleles (AT), standardized number of alleles (AS), and standardized expected heterozygosity (HES) among populations with different sample sizes using a rarefaction method in GENCLONE (Arnaud‐Haond & Belkhir, 2007). Allelic richness (AR) and allelic richness of private alleles (PAR) were calculated with a rarefaction approach in HP-RARE version 1.1 (Kalinowski, 2005) on a minimum sample size of 12 diploid individuals. Finally, the inbreeding coefficient (Fis) within populations was estimated by FSTAT version 2.9.3 (Goudet, 2017).

Phylogenetic relationship and population genetic structure analysis

For mitochondrial DNA, a neighbor-joining phylogenetic tree was constructed by MEGA version 7 (Kumar et al., 2016) to examine phylogenetic relationships among mitochondrial cox1 haplotypes of D. melanogaster from different sampling locations worldwide. Apart from sequences obtained in this study, we searched orthologous cox1 gene sequences from NCBI nucleotide database and included those with sampling locations in phylogenetic analysis (Nunes et al., 2008a, b). A Kimura two-parameter distance model was used in the tree inference as suggested by analysis in PartitionFinder version 1.1.1 (Lanfear et al., 2012). We conducted a bootstrap test with 1000 replicates.

For microsatellite loci, a Bayesian cluster method implemented in STRUCTURE version 2.3.4 (Pritchard et al., 2000) and a discriminant analysis of principal components (DAPC) were used, respectively, to investigate genetic structure across the populations. In the STRUCTURE analysis, 30 replicates for each K (from 1 to 10) were run with 200,000 Markov Chain Monte Carlo iterations after a burn-in of 100,000 iterations. The optimal K value was obtained using the delta (K) method by submitting output results of STRUCTURE to the online software of STRUCTURE HARVESTER version 0.6.94 (Earl, 2012) (http://taylor0.biology.ucla.edu/structureHarvester/). CLUMPP version 1.1.2 (Jakobsson & Rosenberg, 2007) based on the greedy algorithm was used to combine the membership coefficient matrices (Q‐matrices) of replicated runs for each K from STRUCTURE HARVESTER, and DISTRUCT version 1.1 (Rosenberg, 2004) was applied to prepare the visual representation. We ran a DAPC analysis in the R package ADEGENET 1.4-2 (Jombart et al., 2008) to identify genetic clusters among sampling individuals. DAPC analyses do not rely on any biological assumptions, providing a supplement to model-based STRUCTURE analysis.

Demographic history analysis

We used the approximate Bayesian computation, a model-based method, implemented in DIYABC version 2.1.0 (Cornuet et al., 2014), to infer the demographic history of D. melanogaster in China using the microsatellite data. This method allows a comparison of competing scenarios and incorporates unsampled populations in the analysis (Bertorelle et al., 2010). It has proven to be a powerful approach to test complex population genetic models in many species (Fraimout et al., 2017; Lombaert et al., 2014; Stone et al., 2017; Wei et al., 2015). In the analysis, we considered the relationship between the NW (XJKL) group, SW (SCPZ and YNDH) group, and CE (QHXN, YNKM, SCCD, SXAK, NMHH, HBWH, JXSR, SDLY, FJFZ, NMHL, and ZJHZ) group by comparing 18 competing scenarios involving three types of dispersal models (scenarios 1–6: any two groups were sequentially from the other group; scenarios 7–12: one group was an admixture of the other two; scenarios 13–18: the three groups dispersed in a stepping stone way) (Supplementary Fig. S1). We tested these scenarios using 22 datasets generated from one population of each group. Comparison of competing scenarios, estimation of posterior distributions of demographic parameters, model checking and evaluation of confidence in scenario choice, and accuracy of parameter estimation were performed following methods of previous studies (Cao et al., 2016b; Wei et al., 2015). Priors of each parameter involving in each scenario are shown in Supplementary Table S1.

Partitioning geographic and climate effects on genetic variation

To evaluate the effect of geographical separation and environmental factors on genetic differentiation of D. melanogaster, isolation by distance (IBD) and isolation by environment (IBE) analyses were performed. For IBD analysis, the correlation coefficient between pairwise population differentiation (calculated by FST/(1 – FST)) and geographic distance was estimated using the R package ade4, with 10,000 permutations. For IBE analysis, 19 standard bioclimatic variables for each sampling location across China were downloaded from WorldClim database version 2 with a spatial resolution of 10 min, and other parameter settings were identical to those in the IBD analysis.

To examine the extent to which genetic distance based on microsatellite genotypes was explained by geographical and environmental factors and by their interaction, a multivariate redundancy analysis (RDA) was performed. As a constrained linear ordination method, RDA combines multiple linear regression and principal component analysis. The geographic information and climate data for 16 different sampling sites were used, as for the IBD and IBE analyses above. Genetic distance between populations (not individuals) was computed. Before RDA analysis, the matrices of pairwise geographic distances were transformed into principal components of neighborhood matrices (PCNM) with the pcnm function implemented in the R package vegan (https://github.com/vegandevs/vegan), and only the first half of positive eigenvectors was used as explanatory variables of genetic variance among populations. To avoid the high collinearity among explanatory variables, a standard value of variance inflation factor (VIF) was set in advance; any candidate variables with VIF value higher than 10 were excluded as described in Cao et al. (2019). This led to three geographic variables (PCNM1, PCNM2, and PCNM4) and five climate variables (bio2, mean diurnal air temperature range; bio3 isothermality, bio5, maximum temperature of warmest month; bio13, precipitation of the wettest month; and bio15, precipitation seasonality) being retained in the model. Both the full RDA (environmental variables and geographical components) and partial RDA (environmental variables or geographical components) analyses were performed using the vegan package in R. With a full RDA model, the degree of geography, environment, and their collinear proportion were analyzed, while the independent effect of the environment was estimated using the partial RDA model in which geographic effects were constrained.

Results

Mitochondrial haplotype diversity and phylogeny

After removing one of the individuals from pairs of relatives (in total, 58 removed), there were 272 individuals available for analysis. Fourteen mitochondrial haplotypes were identified in these individuals from the 16 populations collected across China (Fig. 1). Hap01 was dominant in all the populations. Hap02 was mainly found in populations from western China, excepting XJKL and YNDH. In addition, populations from the western and southwest areas, except for XJKL and YNDH, possessed a higher number of haplotypes than most populations from the central and eastern areas. Populations from SXAK adjacent to the southwestern regions of China showed the highest level of mitochondrial DNA diversity, with the highest values for both haplotype diversity (Hd) and nucleotide diversity (Pi) (Table 3). NMHH and GDGZ populations also exhibited relatively high diversity, as indicated by Hd and Pi.

Table 3 Population genetic diversity of Drosophila melanogaster collected from China.

We related Chinese haplotypes in this study to cosmopolitan haplotypes by including D. melanogaster cox1 haplotypes from previous reports into the analysis. The results showed that 14 of 23 currently identified mitochondrial haplotypes could be found in China, and 10 of them were unique to China (Fig. 2). The phylogenetic relationships among all haplotypes indicated that Hap12 (unique to YNKM) and Hap 2 (from NMHH, SDLY, SCCD, SXAK, YNKM, SCPZ, QHXN, and Hsinchu of Taiwan) in China were relatively ancestral, following the African haplotypes Hap19 and Hap 20.

Fig. 2: Neighbor-joining phylogenetic relationships among mitochondrial cox1 haplotypes of Drosophila melanogaster worldwide.
figure 2

Hap1 to Hap 14 were found in this study, while Hap15 to Hap23 were reported in previous studies. The haplotypes unique to China are in red. Drosophila simulans was used as the outgroup.

Microsatellite marker development and genetic diversity

The QDD3 program provided 63,888 primer pairs, which included 50,412 tri-, 10,070 tetra-, 1651 penta-, and 1755 hexa-nucleotide microsatellites. Following our stringent filter, a total of 60 primer pairs flanking perfect microsatellites were retained. Among them, the number of primer pairs flanking tri-, tetra-, and penta-nucleotide microsatellites was 55, 4, and 1, respectively. In our initial test of these 60 primer pairs, 15 pairs generated polymorphic genotypes, 15 pairs failed to generate amplification in any individuals, and 30 pairs failed in more than two individuals. Finally, 15 microsatellite markers were retained for population-level genotyping (Table 2).

The genetic diversity in 16 populations of D. melanogaster from China was evaluated using the 15 developed microsatellite markers. Significant departures from HWE (p < 0.05) were detected in 18 of the 240 population-locus combinations after sequential Bonferroni correction. However, multi-locus tests revealed that none of the 16 populations departed consistently from HWE. In addition, 65 of 1680 locus-locus pairs showed LD in at least one population (p < 0.05), whereas 2 (S04–S39 pair on the Chromosome 2L, and S10–S44 pair on the Chromosome 2R) of 105 locus pairs showed LD across all populations. Inversions could produce LD between distantly located loci on the same chromosome, but the two members had not been identified in the same inversion fragment for any LD pairs by referring to the website PopFly (http://popfly.uab.cat/) (Hervas et al., 2017). Perhaps new inversions exist in the chromosomes of D. melanogaster from China. Genetic diversity parameters (Table 3 and Supplementary Fig. S2) varied among populations, and most of the southwest populations except for YNDH showed relatively high values of diversity, including the average allelic richness (AR) and the standardized total number of alleles (AS), when compared to central and eastern populations.

There was no significant correlation in the diversity parameters between microsatellite loci and mitochondrial cox1. The association was significant for pairs of parameters estimated from the same type of genetic markers, such as Hd vs Pi (r = 0.78, p < 0.001), Pi vs S (r = 0.93, p < 0.001), AR vs AS (r = 0.84, p < 0.001), and AR vs HET (r = 0.74, p < 0.001).

Population genetic structure

For the microsatellite loci, pairwise FST values ranged from 0 to 0.11, with an average FST value of 0.03 (Table 4 and Supplementary Table S2). Among them, XJKL, GDGZ, NMHB, and LNHL showed the highest level of genetic differentiation from the other populations. Contingency tests (based on likelihood ratios) on genotype frequencies across populations revealed that most pairwise FST estimates were significantly different (p < 0.05, Table 4).

Table 4 Pairwise population differentiation (FST, lower triangle) and their statistical significance (p values from exact G test, upper triangle) among 16 populations of Drosophila melanogaster based on microsatellites.

STRUCTURE analysis indicated that almost all populations could not be well separated when K increased from 3 to 6, aside from XJKL when K = 5 and K = 6 (Fig. 3a). Some correlation was found between the proportion of some clusters and the longitudinal distribution of populations (the blue cluster when K = 5: R2 = 0.449, p < 0.001; the yellow cluster when K = 6: R2 = 0.386, p < 0.001). DAPC results showed a similar pattern of population differentiation as that revealed by the STRUCTURE analysis, with XJKL and LNHL populations forming outliers and the other populations clumping together (Fig. 3b).

Fig. 3: Population genetic structure of 16 Drosophila melanogaster populations from China based on 15 microsatellite loci.
figure 3

a Genetic structure inferred from STRUCTURE and b DAPC analysis. The optimal number of clusters for the STRUCTURE analysis was five; K = 3, 4, 5, and 6 are presented (a) with populations arranged from west to east. DAPC analysis shows LNHL, XJKL, GDGZ, and NMHB to be outlier populations (b).

Demographic history of D. melanogaster in China

In the DIYABC analysis, scenarios 8, 13, and 14 were supported with the highest posterior probabilities by 4, 5, and 13 out of 22 datasets, respectively (Supplementary Table S3). All of these three scenarios assumed that the NW group was ancestral of D. melanogaster in China. The posterior probability in datasets supporting scenarios 8, 13, and 14 ranged from 0.115 to 0.276, from 0.207 to 0.276, and from 0.170 to 0.285, respectively (Supplementary Table S3). The sum of posterior probabilities for all scenarios in each dataset that support the NW (scenarios 1, 2, 7, 8, 13, and 14), CE (scenarios 3, 4, 9, 10, 15, and 16), and SW (scenarios 5, 6, 11, 12, 17, and 18) origin ranged from 0.534 to 0.903, from 0.055 to 0.321, and from 0.032 to 0.182, respectively.

Geographic and climate effects on genetic variance

Mantel tests indicated an extremely significant association between genetic distance and geographical distance (r = 0.5253, p = 0.0014; Fig. 4a) and environmental distance (r = 0.344, p = 0.009; Fig. 4b) for all sampling individuals from China.

Fig. 4: Scatter plots of isolation by geograhical distance and environment for all populations of Drosophila melanogaster in China.
figure 4

a Correlations between population genetic differentiation ((FST/(1 – FST)), geographical distance, and b environmental distance analyzed by Mantel tests Genetic differentiation was calculated based on 15 microsatellite loci. Correlation coefficients (r) and p values are given.

A full RDA model showed that only 9.1% of the total genetic variance could be explained by the effect of geography, the environment, and their interaction. For the explained genetic variance, independent effects of environmental conditions and geography accounted for 62.10% and 31.58%, respectively, while their collinear component accounted for 6.32%. When environmental and geographic effects were considered simultaneously in the full RDA analysis, two climatic variables (bio3 (isothermality) and bio13 (precipitation of the wettest month)) and two geographic variables (PCNM3 and PCNM4) were correlated with genetic distance (Supplementary Fig. S3). When geographic variables were constrained in the RDA analysis, three environmental variables (bio2 (mean diurnal air temperature range), bio13 (precipitation of the wettest month), and bio15 (precipitation seasonality)) were correlated with genetic distance (Fig. 5).

Fig. 5: Partial RDA analysis on genetic variance explained environmental variables when geographical variables were constrained.
figure 5

Individuals from the same population are represented by dots with the same color. The blue vectors are the environmental predictors used in this study, and arrows indicate the strength of correlations between the environmental variable and genetic variance. Five climate variables were included in analysis: bio2 (mean diurnal air temperature range), bio3 (isothermality), bio5 (maximum temperature of warmest month), bio13 (precipitation of the wettest month), and bio15 (precipitation seasonality).

Discussion

Distribution of genetic diversity in D. melanogaster from China

The mitochondrial data in this study showed that Chinese populations had a large number of unique haplotypes, and some of them were previously undescribed (Fig. 2). The high differentiation between Asian D. melanogaster and populations from other parts of the world has been noted in previous studies (Hale & Singh, 1991; Nunes et al., 2008a; Schlötterer et al., 2006). A founder event (bottleneck) occurring in the early phase of colonization may be one reason for the high genetic divergence between Chinese populations and other populations. A “Far Eastern Race” has previously been theorized to explain differentiation in D. melanogaster between Asian and other populations, reflecting independent colonization events during the out-of-Africa habitat expansion of D. melanogaster into the Asian area (David & Capy, 1988). However, this theory has been questioned because of a lack of genomic evidence from Southeast Asian samples (Laurent et al., 2011). Further analyses that include multiple Asian and African populations from across these regions are needed to test these historical connections (Arguello et al., 2019).

In addition, both the mitochondrial variation and microsatellite data showed that, with a few exceptions, southwest China populations had relatively high levels of genetic polymorphism as measured by haplotype diversity (Hd), nucleotide diversity (Pi), average allelic richness (AR), and standardized total number of alleles (AS), when compared to most populations from eastern regions of China (Table 3). Further studies by including samples from and outside China may help to adequately examine this pattern of genetic diversity as well as its underlying evolutionary processes.

Population genetic structure of D. melanogaster in China

Pairwise FST results among the sampled populations revealed relatively low levels of population differentiation, with an average value of 0.031 (Table 4). The low-level divergence among D. melanogaster populations within China might be attributed to a lack of time for genetic drift to occur in the absence of strong bottlenecks. Similarly, low levels of differentiation have also been documented within the European population and between populations on different continents that may have recent European ancestry (e.g. Australia, North America) (Nunes et al., 2008a, b; Kapun et al., 2020). In contrast, higher levels of differentiation have usually been found between populations from southeast Asia and Oceania (Schlötterer et al., 2006). Nevertheless, we were able to find some differentiation among populations. In particular, XJKL, GDGZ, NMHB,1 and LNHL, which were from the westernmost, southernmost, northernmost, and easternmost collection points in the current study, showed the highest genetic differentiation from other populations (Supplementary Table S2).

Demographic history of D. melanogaster in China

Our DIYABC analysis showed that scenarios of the northwestern origin of the D. melanogaster in China have the highest posterior probabilities. Although DIYABC analysis has proven to be a powerful approach in exploring the genetic relationship among the sampled or unsampled populations by comparing potential competing scenarios (Bertorelle et al., 2010), it should be acknowledged that limited coverage of sampling and a lack of potential source populations may bias the results. We found that the posterior probability for the best-fit scenario in each dataset was very low in our study. This is our best estimate, but it provides weak support. Thus, our results cannot be used to reach a robust conclusion, especially in the absence of additional samples from the northwest and outside of China. Additional populations should be further sampled in an ensuing study to validate the dispersal history of D. melanogaster into and within China.

Ecological adaptation of D. melanogaster in China

Despite low FSTs (Table 4), we did find a potential longitudinal pattern accompanied by admixture (blue cluster when K = 5, yellow cluster when K = 6 in Fig. 3a; Fig. 4b), and RDA analysis suggests that precipitation may have had an impact on patterns of genetic variation in Chinese populations of D. melanogaster (Fig. 5). Consistent with the RDA results, the XJKL and LNHL populations, which came from the driest and wettest regions among our sampling locations, had the highest levels of differentiation from other populations, as revealed by the DAPC and FST analyses. Adult D. melanogaster is sensitive to ambient humidity, and precipitation is usually regarded as one of the most important ecological factors limiting the distribution ranges of this species in nature (Gibbs et al., 1997; Hoffmann & Parsons, 1993). Laboratory selection in D. melanogaster can lead to rapid adaptation to desiccation stress (Telonis-Scott et al., 2016). Previously published work has shown genetically determined clinal variation in D. melanogaster along precipitation gradients (Hoffmann et al., 2001; Karan & Parkash, 1998; Parkash & Munjal, 2000). Our results may, therefore, partly reflect strong selection and adaptation in natural populations of D. melanogaster assuming some linkage between the microsatellite markers and traits under selection, which would be enhanced for microsatellite markers located within D. melanogaster inversion polymorphisms that respond rapidly to clinal selection (Kennington et al., 2006). Furthermore, an association between environmental factors and genetic structure over and above geographical distance could reflect patterns of genetic exchange driven by environmental conditions, which in themselves might reflect trade, given that regions with similar climates should support similar types of fruit production. Given the overall rapid decay of LD in parts of the D. melanogaster genome, these patterns need to be further investigated with high-density markers such as SNPs. Sampling will need to be robust because FST values were low and because the factors we considered here only accounted for around 9% of the genetic variance among populations.

Conclusion

In this study, we considered the population genetic structure of D. melanogaster across China. We found low levels of genetic differentiation, high levels of admixture, and isolation by geographical and environmental factors in this species. China covers a range of climatic conditions within a complex topography where there has been a long history of human colonization. This context provides an opportunity for further research with D. melanogaster populations, such as in linking putative adaptive polymorphisms to climatic selection (Hoffmann et al., 2003) and in understanding the role of inversions in local adaptation by testing for parallel patterns across different continents (Kapun et al., 2018).