The Aldabra giant tortoise (Aldabrachelys gigantea) is one of only two extant giant tortoise species together with the Galapagos giant tortoise complex (Chelonoidis niger) (Turtle Taxonomy Working Group 2017). The natural distribution of A. gigantea is restricted to Aldabra Atoll, Seychelles, northeast of Madagascar. The species is the largest vertebrate in the ecosystem and fulfills prominent functional roles in shaping and sustaining large-scale landscape dynamics, such as herbivory, seed dispersal, and nutrient cycling (Hansen 2015; Hnatiuk et al. 1976; Merton et al. 1976). Accordingly, for the rewilding of several Western Indian Ocean Islands, captive-bred A. gigantea have been introduced to degraded ecosystems as non-native species to functionally replace extinct endemic species of giant tortoises (Griffiths et al. 2010, 2013; Hansen et al. 2010). Recent studies on feeding behavior, fecal analyses, and vegetation monitoring have shown that these replacement tortoises seem to be doing a good job in resurrecting herbivory and seed dispersal interactions (Griffiths et al. 2010, 2013; Hansen et al. 2010). Depending on the effectiveness of these rewilding projects, spatially larger projects in Madagascar are planned, with pilot releases underway (Pedrono et al. 2013, 2020). Large-scale translocations may also become necessary for the protection of the species, as rising sea levels threaten the large, wild population on the low-lying Aldabra Atoll (Falcón et al. 2018).

Despite the increasing trend of application, a proper evaluation of project outcomes is still lacking from the majority of the rewilding programs (Nogués-Bravo et al. 2016; Svenning et al. 2016). Although some evaluations have looked at the functional roles of the introduced populations, none have assessed the genetic variation within both wild and rewilded populations. Incorporating genetic monitoring in the evaluation of rewilding projects is critical to enable the long-term survival of translocated populations (Stronen et al. 2019). Furthermore, the mating system of A. gigantea remains largely unknown, but such knowledge would highly facilitate the efficient management of captive populations.

To address these needs, we performed double digest restriction sequencing (ddRad-seq; Peterson et al. [2012]). We extracted DNA from 35 blood samples, representing all main islands of Aldabra Atoll, where wild A. gigantea occur (Malabar, Grande Terre, and Picard, N = 33), and two captive populations in Mauritius and Rodrigues (N = 2, Supplementary Information 1&2). DNA extractions were performed using a sbeadex™ kit following the manufacturer’s protocol. A single ddRad-seq library was prepared with the restriction enzymes EcoRI and BfaI and sequenced on an Illumina Miseq platform with a 300 bp paired-end run. A total of 24,942,560 raw reads were generated for the 35 individuals of which 94.3% were retained after demultiplexing with Stacks v2.53 (Rochette et al. 2019) and quality filtering with Trimmomatic v0.39 (Bolger et al. 2014).

Although there is currently no reference genome available for A. gigantea, we chose a reference-based over a de novo SNP discovery. Our decision was mainly based on the high mapping rate of 96.5% (range: 94.7–97.0%) attained when aligning the quality-filtered data to the Gopherus evgoodei genome (NCBI Accession PRJNA559383, the phylogenetically closest species that has a chromosome-level assembled genome available [Kehlmaier et al. 2019], using default settings of BWA-MEM version 0.7.17 [Li and Durbin 2009]). Furthermore, the use of an annotated genome made it possible to exclude coding regions in order to focus on putatively neutral SNPs (see below). Sequencing depth per sample was 2.28x (range 0.2–6.1x) again making a reference-based SNP discovery (here with the HaplotypeCaller module of GATK v4.1.7 [McKenna et al. 2010] in GVCF mode) more suitable than a de novo SNP discovery. Genotypes of all 35 samples were called jointly, using GATK GenotypeGVCFs. SNPs were preliminarily filtered using the GATK VariantFiltration function by following the GATK Best Practices recommendations (Van der Auwera et al. 2013). Eventually, 5,442,728 SNPs were filtered with VCFtools 0.1.17 (Danecek et al. 2011) to only retain sites with coverage between 2 and 10, and a genotyping rate > 50%. This resulted in a minimum coverage per called SNP of 48x. Sites with PHRED quality < 30, minor allele count (MAC) < 3, and deviating from Hardy Weinberg Equilibrium (HWE) with a p-value < 0.00001 were excluded. Next, SNPs located in the non-coding regions of the reference genome (hence putatively neutral) and polymorphic in all three wild subpopulations, as well as both of the captive populations were selected as candidate SNP markers. To avoid putative paralogs, SNPs with observed heterozygosity > 0.5 were removed, and SNPs with minor allele frequency (MAF) < 0.2, and > 0.8 were filtered out to maximize the potential information content. LD pruning was performed with the default settings of PLINK v1.9 (Purcell et al. 2007) except for a window size of 4 Gbp (to include all possible pairwise LD estimates). Finally, we obtained a total of 1674 genome-wide and putatively neutral SNPs (Fig. 1). (See also Supplementary Information 3 for all the details of the bioinformatic analyses).

Fig. 1
figure 1

A flow diagram summarizing all the filtering steps that eventually yielded the total number of SNPs identified

The values of expected heterozygosity and the MAF ranged from 0.33 to 0.50 and from 0.20 to 0.50, respectively. The sequences spanning ∓ 300 bp of the SNP positions and the rest of the details about the 1674 genome-wide SNPs discovered for A. gigantea are summarized in Supplementary Information 4 and 5, respectively. These novel SNP markers will serve as a tool for the population structure analysis of A. gigantea and facilitate the management decisions regarding future rewilding plans using A. gigantea as important ecosystem engineers.