Introduction

In the context of climate change and human population growth, plant genetic resources (PGR) are of the utmost importance. They constitute the foundations of agricultural sustainability and global food safety (Ulukan 2011; Ogwu et al. 2014), but they are facing conservation issues. Regarding ex situ management of PGR, this mainly involves seeds and some collections are at risk, due in part to the fact that they are generally underfunded and that evaluation and characterization are often imprecise or inadequate (Fu 2017). In that respect, careful PGR management is crucial from storage to use (Maxted et al. 1997). For clonally propagated perennial species, such as fruit and nut tree species, PGR are generally conserved in ex situ orchards as grafted cultivars, which has its pros and cons: the main advantages are that they can be stored under the climate conditions of their intended use, and can be evaluated during storage; but on the other hand, they require a lot of space and the cost of conservation is significant (Hammer and Teklu 2008).

Nowadays, molecular tools contribute to PGR management (Wambugu et al. 2018), since they can help to find genetically close or synonymous accessions and to create “core collections” which will contain maximum genetic diversity within the smallest number of accessions, leading to a reduction in conservation costs, among other things. Simple Sequence Repeats (SSRs) were frequently used because of their high polymorphism, but the development of high-throughput genomics tools has led to the use of Single-Nucleotide Polymorphisms (SNPs). In the Persian walnut (Juglans regia L.), a species which is widely disseminated and grown in many temperate regions, more than 20 publications mention the use of SSRs (Bernard et al. 2018a). This type of markers was particularly used to decipher the genetic diversity and the structure of numerous walnut accessions at a continental scale (Ebrahimi et al. 2016). SSRs were also used to perform a landscape genetic analysis across the Asian range of Persian walnut, species native to the foothills of the Himalayas (Pollegioni et al. 2014). The authors showed that reservoirs of high genetic diversity still exist, confirming the importance of the conservation of walnut genetic resources in this area. Recently, a high-density Axiom™ J. regia 700K SNP genotyping array was developed and validated, initiating a novel genomic era in walnut (Marrano et al. 2019a). This genomics tool was used for instance to investigate the genome-wide population structure in Iranian walnut populations (Arab et al. 2019), and to perform genome-wide association studies (Bernard et al. 2020; Marrano et al. 2019b).

As a result, it is fair to wonder how to conduct the tasks related to germplasm management, such as core collections construction, and the choice of the type of marker to be used is one of the first questions. Neutral SSR loci, due to slippage during replication, usually mutate much more frequently than SNP loci, leading to population-specific alleles that are useful for revealing population structure (Tsykun et al. 2017). On the other hand, since they are usually developed in small numbers for one species, they therefore may not reflect genome-wide genetic diversity contrary to SNP loci that are much more frequent in the genome of most species (Ljungqvist et al. 2010). These two types of marker bring different views of the structure, and the merits of each are listed in Guichoux et al. (2011). SSRs and SNPs were compared in order to assess population structure and relatedness in short-lived species such as rice (Singh et al. 2013), maize (Hamblin et al. 2007; Van Inghelandt et al. 2010; Yang et al. 2011), sunflower (Filippi et al. 2015), bean (Müller et al. 2015), and cowpea (Desalegne et al. 2017). These comparisons are rarer for perennials (grapes (Emanuelli et al. 2013) and jujube (Chen et al. 2017)). A recent work in an apple gene bank compared population structure and relatedness using 15 SSRs and more than 15,000 SNPs. However, studies on core collections construction are lacking.

The aim of this paper is to identify the type of marker and the core collections construction methods more relevant for routine walnut germplasm management.

Materials and methods

Plant materials and genotypic data

The plant material consists in 150 unique accessions of Juglans regia from American, European, and Asian continents. The trees are maintained at the Prunus and Juglans Genetic Resources Center located in the Fruit Experimental Unit of INRAE in Toulenne (latitude 44° 34′ 37.442″ N–longitude 0° 16′ 51.48″ W), near Bordeaux, France (S1 Table). The 150 accessions were selected following two steps. (i) We created a GWAS panel of 170 accessions (Bernard et al. 2020) from SSR and SNP genotyping were described and made publicly available (Bernard et al. 2018b; Bernard et al. 2020).

Structure analyses

Principal Coordinate Analysis (PCoA) was used to determine the structure patterns among the 150 accessions (Fig. 1). Dissimilarities, based on allelic data, were calculated with 10,000 bootstraps, and transformed into Euclidean distances using a power transformation of 0.5. PCoA was performed using the DARwin 6.0.14 software (Perrier and Jacquemoud-Collet 2006), supplemented by the scatterplot3d R package for 3D visualization.

Fig. 1
figure 1

Workflow of the structure analyses and the core collections construction

As linked SNPs can account for too much in the population structure variance, particularly in linkage disequilibrium (LD) regions (Price et al. 2008), a pruned subset of SNPs was also used for the PCoA. This filtering was completed using the PLINK 1.9 software, keeping only the SNPs with the higher minor allele frequency, and based on a threshold of r2 = 0.2 (command--indep-pairwise 50 5 0.2). A subset of 100 SNPs, a number similar to that of SSR alleles, randomly selected, was also tested using the PLINK 1.9 software (command--thin-count 100) to compare PCoA results.

The genetic structure of our panel was also investigated using two types of analyses depending on the markers. Bayesian model–based analysis using the STRUCTURE 2.3.4 software was implemented for the SSR markers (Pritchard et al. 2000), following the method of our previous work (Bernard et al. 2018b). The ΔK method (Evanno et al. 2005), implemented in STRUCTURE harvester (Earl and von Holdt 2012), was used to determine the most likely K.

Using SNP markers, a sparse non-negative matrix factorization algorithm was implemented using the sNMF 2.0 software, available as a function of the LEA R package (Frichot and François 2015). This software has a fast and efficient program for estimating individual admixture coefficients from large genomic datasets, and produces results very close to Bayesian clustering programs such as STRUCTURE (Frichot et al. 2014). The choice of the best number of clusters (K) is based on a cross-entropy criterion implemented in the LEA R package. For SSR and SNP markers, thresholds of 0.8 and 0.7 for admixture coefficients, respectively, were chosen to consider one accession as admixed.

Next, the genetic relationships between the 150 accessions were also assessed with the Neighbor-Joining method (Saitou and Nei 1987) using the DARwin 6.0.14 software, for both the SSR and SNP datasets. The Unweighted Neighbor-Joining option was used to build the trees.

Core collections construction

Several core collections based on both marker types were constructed using two methods and two selection thresholds (Fig. 1). (i) The two methods: we used the “maximum length sub tree” function (Perrier et al. 2003) of the DARwin 6.0.14 software, which looks for a subset of accessions minimizing the redundancy between them, and limiting where possible the loss of diversity (diversity here is expressed by the tree built), and the “entry-to-nearest-entry” method (Odong et al. 2013), implemented in the Core Hunter 3 software (De Beukelaer et al. 2018), which looks for a subset of accessions as different as possible from each other, avoiding selecting a few clusters of similar accessions at the extreme ends of the distribution. (ii) The two selection thresholds: using the two methods mentioned above, we first used the SSR dataset to capture at least 80% or 90% of total allelic diversity, and the number of alleles captured in one SSR-based core collection was also estimated in the SNP dataset (selection threshold based on percentage of total allelic diversity). Secondly, we used a sampling intensity of 33% (n = 50/150) to observe the differences in the selected accessions, using these two methods (selection threshold based on a number of selected accessions).

Results

Comparison of the first level of structure using SSR and SNP markers

Using SSR markers, the higher drop of ΔK is for K = 2, followed by a rise for K = 6, and very similar results were obtained using SNP markers, since the higher drop of the cross-entropy criterion is for K = 2, with a curve slope starting for K = 6 (Fig. 2). To compare the results using SSR and SNP markers, we focused on the first level of structure for K = 2 (S2 Table). The individual admixture coefficients are shown for each of the 150 accessions (Fig. 3). For both marker types, the clustering is linked to the geographic origin of the accessions, as we already described (Bernard et al. 2018b; Bernard et al. 2020). Group A contains the accessions from Eastern Europe and Asia (named “E”), and group B contains the accessions from Western Europe and America (named “W”).

Fig. 2
figure 2

The most likely K subpopulations. K was evaluated with (a) the ΔK method using SSRs, and (b) the cross-entropy criterion using SNPs

Fig. 3
figure 3

Bar plots showing the individual admixture coefficients of the 150 accessions for K = 2. Structure was assessed (a) using SSRs, and (b) using SNPs. The accessions are ordered by their country of origin, in alphabetical order. Group A in red contains the accessions from Eastern Europe and Asia (“E”), whereas group B in yellow contains the accessions from Western Europe and America (“W”) and hybrids from INRAE

Using SSR markers, with a threshold of 0.8 for the individual admixture coefficient, we found 17 admixed accessions, so 88.7% of the accessions were assigned to a group (Table 1). They include 11 French and Californian modern hybrids and 6 accessions from Eastern Europe, Israel, and Central Asia, confirming our previous SSR analysis.

Table 1 Percentage of population assignment to a group from K = 2 to K = 6, using SSR and SNP markers

Using SNP markers and a threshold of 0.7, we found 24 admixed accessions (84% of assignment), including 9 of the 17 found using SSR markers (Fig. 4). From these admixed accessions, 7 are now clustered into group B, and “UK 215AG12” is now in group A. In addition, 14 accessions from group A, and 1 from group B, based on SSR clustering, are found admixed using SNP markers (Fig. 4). Therefore, only 23/150 accessions (15%) are differently clustered. At all events, we did not find any group exchange from A to B, or from B to A, when comparing the clustering based on SSR or on SNP markers.

Fig. 4
figure 4

SNP-based clustering results, compared with SSR-based clustering results. For a threshold of 0.7 for admixture, SNPs clustered 14 accessions found in group A, and one found in group B, compared with SSRs, into the admixed group. Conversely, SNPs clustered one and seven accessions found to be admixed, compared with SSRs, into groups A and B, respectively. There was no clustering exchange between groups A and B, when comparing both markers

When using a threshold of 0.8 for the SNPs, the percentage of population assignment for K = 2 is low (70.7%) but is still high for a threshold of 0.7 (84.0%) (Table 1), showing that 0.7 and 0.8 are more appropriate thresholds for SNPs and SSRs, respectively. In addition, we ran a Spearman rank correlation test for K = 2 and found that the clustering between SSR and SNP markers is highly correlated, up to 84%.

The PCoAs constructed in 2D and 3D show the clustering of the 150 accessions following the results obtained from K = 2 (Fig. 5). For K = 2, the PCoA results are in agreement with the structure based on SSR or SNP markers, since the scatterplots for groups A and B are well-defined by the first principal component. The admixed accessions are positioned mainly between the two groups. For both methods, the three Manregian walnuts (“Chase C7”, “Wepster W2,” and “Adams 10”) are found to be genetically diverse. Regarding the percentage of explained variation, they are also comparable using both types of markers. The first three axes (x, y, and z) explain 21.86% of the cumulative variation for SSRs, and 14.91% for SNPs.

Fig. 5
figure 5

Principal Coordinate Analysis scatterplots. PCoAs were constructed in 3D using (a) SSR, and (b) SNP markers, and in 2D using (c) SSR, and (d) SNP markers. The 150 accessions are colored following the K = 2 results: group A in red, group B in yellow, and admixed in grey

However, we found some differences between the two types of marker. Using SSR markers, the scatterplot corresponding to group A is more extensive, whereas group B is more scattered using SNP markers. Additionally, this group is split in two, showing the French accessions on one side and the Californian ones on the other.

The Neighbor-Joining method implemented in DARwin 6.0.14 was used to construct grouping trees with the 150 accessions (Fig. 6). The main branching groups of the trees obtained with both markers are in agreement with the structure results (K = 2), since they are mainly defined by groups A and B. The two accessions “Jin Long 1” from China and “PI 15 95 68” from Afghanistan are among the longest length branches. However, few differences were detected between the structure and the tree, with accessions from group A found in the branching group mainly corresponding to group B: 9 accessions using SSR markers, and 4 accessions using SNP markers.

Fig. 6
figure 6

Neighbor-Joining trees. Trees were constructed using (a) SSR, and (b) SNP markers. The 150 accessions are colored following the K = 2 results: group A in red, group B in yellow, and admixed in grey

We then inferred the individual admixture coefficients of the 150 accessions from K = 3 to K = 6 for both markers (S2 Table). Bar plots are given for information in S1-S4 Figs.

Comparison of three sets of SNPs for PCoA assessment

In addition, for PCoA assessment, we compared the entire set of 364,275 SNPs with a subset of SNPs filtered for LD, with a threshold of r2 = 0.2 (Fig. 7). We then retained 24,422 SNPs, or 6.7% of the entire set. Interestingly, the results found for both datasets are strongly similar, with the scatterplots well-distinguished, according to the K = 2 results (Fig. 7). We still distinguish the French accessions from the Californian accessions within group B. The main difference is that the variance is better explained by the first three axes using the LD-pruned set (18.90% vs. 14.91%). By comparing the entire set with a random subset of 100 SNPs, a range comparable with the total number of SSR alleles, we found that the scatterplots are less well-defined, but still in agreement with the K = 2 results (Fig. 7). In this case, we cannot distinguish the French accessions from the Californian accessions within group B.

Fig. 7
figure 7

Comparison of SNP sets for Principal Coordinate Analysis. PCoA were constructed using (a) the entire set of 364,275 SNPs, (b) the LD-pruned subset of 24,422 SNPs, and (c) the random set of 100 SNPs

Comparison of core collections using SSR and SNP markers

In light of the results obtained concerning structure analyses (13 SSRs give similar results compared to 364,275 SNPs), we decided to construct first four core collections based on SSR markers. We created them with both core collections construction methods and two thresholds, to capture at least 80% or 90% of total allelic diversity. To capture at least 80% of total SSR allelic diversity, we need 16 accessions using the method based on the “maximum length sub tree” function of DARwin 6.0.14, and 15 accessions using the “entry-to-nearest-entry” method of Core Hunter 3 (Table 2). Next, to capture at least 90% of allelic diversity, we need 27 and 32 accessions for each construction method, respectively. The accessions belong mainly to group A, from Eastern Europe and Asia, known to be more diverse: 11/16 and 12/15 respectively for 80%, and 20/27 and 21/32 respectively for 90%. They include, for instance, Iranian accessions, the Indian “Sopore” and several accessions from the Botanical Garden of Kiev. Regarding the French accessions, few landraces such as “Corne,” “Marbot” and “Grandjean1” were kept only when capturing 90% of allelic diversity using “Core Hunter 3”.

Table 2 Construction of the core collections using SSRs and two different methods of construction, to capture at least 80% or 90% of total allelic diversity

Moreover, the consistency of the results between both percentages of total captured allelic diversity was checked for the two core collection construction methods. Using Darwin 6.0.14, the 16 accessions selected to capture 80% of allelic diversity were found among the 27 selected to capture 90% (Table 2). However, using Core Hunter 3, 2/15 accessions selected to capture 80% were not found among the 32 selected to capture 90%. Hence, for 80%, 9 accessions (among 16 or 15) are in common between the two methods, and 19 accessions for 90% (among 27 or 32). We then estimated the number of alleles captured in the SNP dataset for each core collection. For the two SSR-based core collections capturing at least 80% of total allelic diversity, they also capture more than 99.5% of total SNP allelic diversity. For 90%, they reach more than 99.9%.

Next, we constructed four additional core collections using the same sampling intensity threshold of 33% to select 50/150 accessions, with both methods and both markers. Results show that 34/50 accessions are in common between the SSR and SNP markers using Darwin 6.0.14, and 32/50 accessions using Core Hunter 3 (Table 3). The accessions belong mainly to group A, from Eastern Europe and Asia, known to be more diverse (between 30 and 31/50 using SSRs, and between 22 and 23/50 using SNPs). They both include the Iranian accessions, the Chinese “Jin Long 1,” “PI 15 95 68” from Afghanistan, the Indian “Sopore,” the Bulgarian “Izvor 10” and “Plovdivski,” and several accessions from the Botanical Garden of Kiev.

Table 3 Construction of the core collections using both marker types, SSRs and SNPs, and two different methods, to keep n = 50 accessions

We also checked the consistency of the results between the two core collection construction methods for both markers. Using SSR markers, 39/50 accessions are in common between the two methods, and 45/50 accessions using SNP markers. Regarding the number of alleles, for SSR markers, we captured 97.1% and 94.3% of the 105 total alleles found within the entire collection, using the “maximum length subtree” and the “entry-to-nearest entry” methods, respectively. Using SNP markers, we captured more than 99.9% of the 728,550 total alleles found with the two methods (Table 3).

Discussion

The use of either SSR or SNP markers shows comparable results for the collection structure

By genotyping the panel of 150 J. regia accessions using 13 SSRs and more than 300,000 SNPs, we obtain similar results. Both types of marker showed a first level of structure for K = 2, with no exchange of accession between the main groups (A and B), which are related to the geographical origin of the accessions. Similar findings were reported in Ebrahimi et al. (2016), with K = 2 mainly corresponding to a group from Asia, and a group of European and North Africa. The exchanges observed between both types of marker only concern accessions that switched from one group to the admixed cluster, or vice versa. From K = 3 to K = 6, the results remain highly comparable, with the highlighting of substructures linked, for instance, to “Payne” pedigree for the Californian modern hybrids, or to geographical area (South-West vs. South-Est) for French landraces. When comparing structure results for K = 2 with PCoAs, and with grouping trees, the results are still consistent. More precisely, we found as the most diverse accessions, using both markers, the three Manregian walnuts originating from seed collected in north-eastern China and the two accessions “PI 15 95 68” and “Jin Long 1,” both coming from Asia (Bernard et al. 2018b). Consequently, SNP markers confirmed our results obtained using SSR markers (Bernard et al. 2018b). When considering the LD-pruned set of 6.7% of the entire set of SNPs, PCoA showed consistent clustering patterns and also consistency with those using the 13 SSRs.

These kinds of findings have previously been observed in other works, except that the number of markers needed to obtain a comparable resolution power would be different. For example, the broad patterns of PCoA were similar using 36 SSRs with 2.2 alleles per locus on average and 36 SNPs in 375 Indian accessions of rice (Singh et al. 2013). However, in local accessions of cowpea from East African countries, similar clustering patterns were found using more SNPs than SSRs (151 vs. 13) (Desalegne et al. 2017). In jujube, within a core-collection of 150 accessions, only 18 were classified into different groups based on the results of structure analysis using 24 SSRs and 4680 SNPs (Chen et al. 2017). Within various inbred maize lines, SSRs performed better at clustering accessions into groups using about 10 times more SNPs (Hamblin et al. 2007; Yang et al. 2011). In apple, another clonally propagated perennial, a comparison between 15 SSRs and more than 15,000 SNPs showed strong concordance between both markers for relatedness estimation and that about five to ten SNPs were equal to one SSR (Larsen et al. 2018). Other works suggest the use of three times (Müller et al. 2015) or seven to 11 times more SNPs to obtain comparable informative results (Van Inghelandt et al. 2010). In our study, we took care to choose highly polymorph and robust SSRs suitable for J. regia diversity studies, by reviewing the literature (Dangl et al. 2005; Woeste et al. 2002; Dang et al. 2016). When working with biallelic markers such as SNPs, it is known that the genetic distances can be equivalent to those calculated with SSRs using the formula n(k−1), where k is the average number of alleles per locus, and n is the number of loci (Laval et al. 2002). With 13 loci and 8.1 alleles per locus on average (four times that for SNPs), we theoretically need [13*(8.1−1)] = 93 SNPs to obtain equivalent genetic distances in our panel. Our findings based on PCoA clustering using the random set of 100 SNPs are consistent with this number of SNPs.

Moreover, SNPs globally tend to give higher proportions of inferred admixture, as observed in sunflower (Filippi et al. 2015). Regarding the admixture thresholds chosen, the results found when comparing 0.7 or 0.8 for SNPs also show that 0.7 is clearly more suitable for obtaining comparable structure. In this respect, we found that the percentage of assignment decreases as the K increases, particularly for the SNPs. Such differences were also reported in maize (Yang et al. 2011) and grape (Emanuelli et al. 2013).

The choice of marker type will depend on the necessary task related to germplasm conservation or utilization

The management of PGR comprises the following main steps: their conservation, which consists in acquisition of plant material (through the collection or protection of reserves in situ, or by exchange of ex situ material), their maintenance (such as storing and propagation), their characterization (based on both genotype and phenotype), and their utilization for research, breeding programs, or production (Bretting and Widrlechner 1995). Due to the increasing availability of genomics tools, a cross-disciplinary field which aims to use genomics in germplasm management has grown (Jia et al. 2017). But undoubtedly, the choice of SSR or SNP markers will depend on the purposes. For obvious reasons, a first choice criterion could be the cost of genotyping. New genomics technologies have a cost that has decreased dramatically in recent years. Apart from the DNA extraction, the cost of SNP vs. SSR genotyping was about 2600 times less in our study. For guidance only, we paid USD$8.80 for 13 SSR loci per sample (0.7 USD$/locus/sample) and USD$98.90 for one array of 364,275 robust loci per sample (2.7E-4 USD$/locus/sample). However, SSR genotyping is often more “flexible,” since we could choose precise numbers of loci and samples. In the case of a SNP array, all the loci available are assayed, for a 96-well DNA plate. In addition, only 59.8% of the available SNPs on the array were usable for the analyses after quality control. The cost comparison was also a topic of discussion in Larsen et al. (2018), in which SNP data was 10 times more expensive than SSR data. However, their SSR genotyping required much more manual preparation, such as PCR reactions.

A second choice criterion could be the nature of the plant material managed, particularly if the researcher works only on one crop, or with its wild species. In our case, the SSRs used were also highly transferable into wild species of the genus Juglans spp. (Bernard et al. 2018b). Conversely, the Axiom™ J. regia 700K SNP array used is valid on the cultivated species J. regia only and failed on our few wild species accessions tested. But this is not a general case. In Emanuelli et al.’s work (2013), the set of 384 SNPs used served to analyze 2273 accessions of grapes (Vitis spp.), including cultivated grapevines (V. vinifera ssp. sativa), wild grapevines (V. vinifera ssp. sylvestris), and non-Vitis vinifera species used as rootstocks.

A third reason to use either marker could be the main purpose of the genotyping. Well-chosen neutral SSRs would be sufficient for population structure and relationship determination, particularly in the first steps of germplasm management, since the computational time for analysis is lower. But SNPs, likely to be associated with functional variation, would be preferred for a genome-wide association study purpose (Vargas et al. 2016; Larsen et al. 2018).

Markers and construction methods must be compatible with the preservation of phenotypic variability

In light of our results, we found that SSR-based core collections could capture more than 99.5% of SNP total allelic diversity. SSR markers, when properly selected using the literature, may be the markers of choice to achieve this goal. Using SSRs, our analyses preferentially selected accessions from East Europe and Asia, as expected, because of their global higher genetic diversity, and helped us to understand that French landraces have a moderate level of genetic diversity. Given that French landraces represent 20% of the entire plant material panel, these findings confirm that their diversity is moderate. However, here it is important to note that the accessions selected in one core collection, even when slightly different, are not always the same in the repeated runs in Core Hunter 3, contrary to in Darwin 6.0.14. This feature, algorithm-dependent and more frequent when the panel has redundancy (Vargas et al. 2016), makes the comparison between construction methods more difficult.

In parallel with the preservation of allelic diversity, it is also necessary to take phenotypic variability into account. The INRAE walnut germplasm collection contains some accessions with unusual traits, such as weeping branches, laciniate leaves, or purple foliage, which may be used for ornamental purposes. Interestingly, the accession with weeping branches “Pleureur” is in the two core collections capturing 90% of SSR total allelic diversity, and the accession with laciniate leaves is in the two core collections constructed using Core Hunter 3. Based on chronological phenotypic data available (Bernard et al. 2019) and new data acquired, we also explored whether the core collections contain a high or low range of variability of some important traits. For the budbreak date for instance, crucial for climate change adaptation, the ten earlier accessions are “Early Ehrhardt,” “Mire,” “Payne,” “Serr,” “Kfar Hanania,” “IR 60-1,” “Sopore,” “Z 53,” “Ashley,” and “Lu Guang,” with a range of budbreak date from 65 to 75 Julian days. Five among the ten are found on the core collection “90% - Darwin.” Conversely, none of the ten later accessions (“Fertignac,” “Le Bordelais,” “St Jean n°1,” “Lalande,” “Candelou,” “Maribor,” “Semence Comité Dordogne,” “Ronde de Montignac,” “Culplat,” and “Romaine”), with a range of budbreak date from 110 to 122 Julian days, is found in the four core collections. Curators are well aware of the importance of phenotype knowledge for an effective PGR management. Currently, new ways to construct core collections are being evaluated to maximize the frequency of minor alleles and phenotypes (Vargas et al. 2016). An interesting application of such “custom methods” in carrot is well-described in Corak et al. (2019). The authors suggest that a core subset based on “custom methods,” e.g., core balancing functional and genetic diversity based on high-density genomic data and estimated breeding values, would be directly useful for breeders and more efficient for breeding programs.

Conclusion

In our comparison using 150 J. regia accessions, both SSR and SNP markers were highly efficient for highlighting the INRAE walnut collection structure, even though the number of SSRs is much lower than the number of SNPs. Moreover, similar results were obtained for core collections, irrespective of the construction method and the marker type used. The strategy for core collection construction can be based on several criteria and must meet germplasm management requirements (genetic diversity) and constraints (cost and surface of conservation). The lower the number of selected accessions, the higher the probability of discarding a genotype with traits of interest. Thus, this highlights the importance of phenotypic evaluation in germplasm management.

It is therefore important to consider the task of germplasm management when choosing the most appropriate marker. In general, few SSR markers are suitable and sufficient for obtaining a global idea of the structure and the genetic diversity of a germplasm, and for constructing a core collection.