Introduction

The transition from normoglycaemia to clinically diagnosed diabetes has a long subclinical ‘latent’ period, during which sufficient physiological reserve exists to counteract worsening peripheral insulin resistance and progressive pancreatic beta cell loss and failure. In this subclinical phase, circulating biomarkers of beta cell failure and molecular biomarkers of disease risk (e.g. genomics, metabolites and proteins) have been proposed to clarify early pathogenesis of diabetes. Limited data using genomics and metabolite profiling in young adults have suggested that a complex underlying molecular architecture of diabetes propensity and obesity may exist for over a decade prior to clinically overt diabetes [1, 2]. Furthermore, broad evidence supports the notion that structural phenotypes of ‘metabolic health’ (alterations in adipose tissue and muscle distribution/fat content) may inform diabetes risk before frank dysglycaemia occurs [3]. Nevertheless, most studies examining metabolite profiles have focused on mid-life, where much of the metabolic risk may already be ‘encoded’ in traditional clinical risk factors.

The primary hypothesis of this study was that metabolic phenotypes reflecting the earliest stages of diabetes development in young adulthood would identify biomarkers and pathways of diabetes risk. We establish molecular signatures of future diabetes risk based on association between circulating polar and lipid metabolites in young adulthood and multiple endophenotypes implicated in diabetes (e.g. adiposity, inflammation, muscle phenotypes) prospectively over a follow-up period of over 20 years, highlighting known and novel pathways implicated by associated metabolites. Finally, to study the early metabolic origins of diabetes, we investigate the association of these metabolite signatures with incident diabetes in two separate cohorts across the lifespan: a large cohort of young White and Black adults (Coronary Artery Risk Development in Young Adults study [CARDIA]); and a validation cohort of middle-aged adults (the Framingham Heart Study [FHS]).

Methods

The general flow of methods for the study is shown in Fig. 1. The datasets analysed during the current study are available from the CARDIA study (www.cardia.dopm.uab.edu) or FHS (www.framinghamheartstudy.org) directly.

Fig. 1 Study scheme. See Statistical methods for further details

CARDIA study

CARDIA is a prospective cohort study of 5115 White and Black participants (age 18–30 years, recruited in 1985–1986 from four field centres: Birmingham, AL; Chicago, IL; Minneapolis, MN; Oakland, CA). Here, we quantified metabolites in 2358 individuals in a fasting state (>8 h) at the year 7 visit, selected across a spectrum of metabolic risk with cases additionally selected to enrich for individuals with incident cardiovascular disease during follow-up [4]. We excluded individuals who had diabetes (fasting blood sugar ≥7 mmol/l or self-reported diabetes at the study visit or any CARDIA study examination prior to year 7) and those with missing contemporary (year 7) data on BMI, glucose or parental history of diabetes, or with missing diabetes status at or before year 7, yielding 2083 participants. The final study subsample had characteristics similar to those of the full cohort attending the year 7 visit (electronic supplementary material [ESM] Table 1). Assessment of standard cardiovascular risk factors as well as computed tomography (CT) characterisation of adipose, liver, and muscle tissue have been described previously [5,6,7].

Metabolite profiling was performed via LC-MS (Broad Institute, Cambridge, MA), in four platforms: (1) lipids (C8-positive); (2) a variety of amino acid metabolites, acylcarnitines, dipeptides and other cationic polar metabolites (hydrophilic interaction liquid chromatography [HILIC]-positive); (3) fatty acids, eicosanoids and bile acids (C18-negative); and (4) sugars, organic acids, nucleotides and anionic polar metabolites (HILIC-negative). Methods for metabolite quantification and analysis have been published previously [8, 9]. Metabolite profiling in the FHS has been described previously [1].

FHS

We studied 1797 participants from the FHS Second Generation (Offspring) cohort who underwent fasting metabolite profiling at the fifth examination cycle and were free of prevalent diabetes (defined as medication use for diabetes, fasting glucose ≥7 mmol/l or random glucose ≥11.1 mmol/l on or before examination 5; ESM Table 2) [1]. Metabolite profiling in the FHS was done previously in a case-cohort design, with a case–control match for selected conditions including incident diabetes to which a set of randomly selected control individuals was added [1]. For this study, we combined individuals across the subsamples. All individuals provided written informed consent, and CARDIA and FHS studies were approved by the Institutional Review Board at each participating institution.

Endophenotypes and clinical outcomes

Endophenotypes in CARDIA included waist circumference (at year 7, 10, 15, 20, 25 and 30 visits), C-reactive protein (year 7, 15, 20 and 25), IL-6 (year 20) and CT-based measures at year 25, including liver attenuation, pericardial fat volume, abdominal fat volume, visceral fat volume, subcutaneous fat volume, intermuscular fat volume, psoas fat and muscle volume, paraspinous fat and muscle volume, lateral oblique fat and muscle volume, rectus fat and muscle volume, and visceral/subcutaneous fat volume ratio. Of note, we included fat and muscle measures given their postulated connection to diabetes.

Our primary clinical outcome in CARDIA study participants was incident diabetes, defined as the composite of self-reported diabetes, fasting glucose ≥7 mmol/l or 2 h glucose of ≥11.1 mmol/l on an OGTT (if performed). Age of onset of diabetes was taken as self-reported age of onset of diabetes or, if not reported, age at first examination with self-reported diabetes or with plasma glucose measurements consistent with diabetes. Time to diabetes or censoring was measured as the difference in age from the year 7 study visit to censoring (loss to follow-up) or to the onset of diabetes. Of note, while we were unable to differentiate type 2 diabetes from type 1 diabetes, most cases of diabetes identified during the period of adulthood that we investigated are likely to be type 2 [10]. For FHS participants, our primary outcome for validation was diabetes (defined as medication use for diabetes, fasting glucose ≥7 mmol/l, random glucose ≥11.1 mmol/l or HbA1c ≥6.5%, where measured). Survival time was taken as the difference in age at the examination 5 study visit (for the Offspring cohort) to age at censoring or age of onset of diabetes.
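The CARDIA outcome definition and the time-to-event calculation described above can be illustrated with a minimal sketch. The function and argument names are hypothetical and are not taken from the study code; the thresholds are those stated in the text, and missing glucose measurements are assumed to be passed as None.

```python
def meets_diabetes_criteria(self_reported, fasting_glucose_mmol=None,
                            ogtt_2h_glucose_mmol=None):
    """Composite definition: self-report, fasting glucose >= 7 mmol/l,
    or 2 h OGTT glucose >= 11.1 mmol/l (if an OGTT was performed)."""
    if self_reported:
        return True
    if fasting_glucose_mmol is not None and fasting_glucose_mmol >= 7.0:
        return True
    if ogtt_2h_glucose_mmol is not None and ogtt_2h_glucose_mmol >= 11.1:
        return True
    return False


def time_to_event(age_at_year7, age_at_onset=None, age_at_censoring=None):
    """Return (years of follow-up, event indicator) measured from the
    year 7 visit. Onset takes precedence over censoring (assumption)."""
    if age_at_onset is not None:
        return age_at_onset - age_at_year7, True
    return age_at_censoring - age_at_year7, False


# Hypothetical participant: no self-report, fasting glucose 7.2 mmol/l
print(meets_diabetes_criteria(False, fasting_glucose_mmol=7.2))
print(time_to_event(32.0, age_at_onset=45.5))
```

The (time, event) pair is the usual input to Cox regression and Kaplan–Meier estimation.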

Statistical analysis

Exposure preparation

CT fat and muscle volumes were scaled by height raised to the 2.7 power. We selected the power of 2.7 for allometric height indexing, as this minimised the relationship between height and individual CT measures (relative to a linear or 1.7 power), thereby making them less body-size dependent. After scaling for body size where appropriate, distributions of continuous phenotypes and covariates were graphically examined. Continuous endpoints and covariates were subjected to hyperbolic arcsine transformation to improve normality. All continuous phenotypes were then mean-centred and standardised for further analysis.
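As an illustration of this preparation step, the following sketch applies allometric height indexing, hyperbolic arcsine transformation and standardisation to a handful of made-up values. The study analyses were run in R; this Python version, its function names and the example volumes and heights are assumptions for illustration only.

```python
import math
import statistics


def index_by_height(volume, height_m, power=2.7):
    """Allometric indexing: divide a CT volume by height raised to `power`."""
    return volume / height_m ** power


def asinh_standardise(values):
    """Hyperbolic arcsine transform, then mean-centre and scale to unit SD."""
    transformed = [math.asinh(v) for v in values]
    mean = statistics.fmean(transformed)
    sd = statistics.stdev(transformed)
    return [(t - mean) / sd for t in transformed]


# Hypothetical CT fat volumes and heights (metres) for four participants
volumes = [120.0, 95.0, 210.0, 160.0]
heights = [1.62, 1.75, 1.80, 1.68]

indexed = [index_by_height(v, h) for v, h in zip(volumes, heights)]
scaled = asinh_standardise(indexed)
print(scaled)
```

After this step each phenotype has mean 0 and SD 1, so regression coefficients are expressed per 1 SD of the transformed phenotype.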

Single metabolite to endophenotype regression

The first step was to examine relationships between metabolites and endophenotypes on a single-metabolite basis (using linear or logistic regression). Six sets of nested models were constructed with varying combinations of adjustments (across age, sex, race, year 7 BMI, year 7 glucose, parental history of diabetes, year 7 waist circumference, year 7 C-reactive protein, and/or the Framingham Diabetes Score [coefficients taken from https://framinghamheartstudy.org/fhs-risk-functions/diabetes/, accessed 14 January 2021]). In each case, a Benjamini–Hochberg false discovery rate (FDR) correction was applied within each combination of endophenotype and set of adjustments to control for multiplicity.
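The Benjamini–Hochberg adjustment applied within each combination of endophenotype and adjustment set can be sketched as follows. This is a generic implementation of the step-up procedure (equivalent in intent to R's p.adjust with method "BH"), not the study code, and the p values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p values (q values) in the original order."""
    m = len(pvalues)
    # Indices of p values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p values for four metabolites against one endophenotype
p = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(p)
print(q)
```

In the design described above, the function would be called once per endophenotype and adjustment set, so that the number of tests m is the number of metabolites in that family.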

Defining metabolite scores for diabetes

Metabolites across all four platforms were combined (duplicates included) to generate scores. To generate a multivariable score for diabetes susceptibility, we employed two approaches: (1) directly fitting a Cox survival model for incident diabetes; and (2) combining regression models that were fit on endophenotypes related to diabetes susceptibility to build a score for diabetes. In both approaches, we used least absolute shrinkage and selection operator (LASSO) regression to select metabolites most explanatory for each given outcome (either diabetes itself or each endophenotype), using repeated cross-validation to optimise hyperparameters. For the second approach, we then applied principal components analysis (PCA) to metabolite coefficients from LASSO models of endophenotypes to organise the strength of the metabolome–endophenotype relationships into a metabolic ‘signature.’ This two-staged approach (penalised regression followed by PCA) has been employed previously to generate summary scores based on high-dimensional metabolomic data across various, related clinical phenotypes [8]. The principal components that result are loaded on adiposity, muscle or inflammatory outcomes, with individual metabolites given a numerical weight for each principal component. Each score was then calculated as the linear sum, across all metabolites in that score, of the log-transformed and scaled metabolite level multiplied by its principal component-defined weight; this was done separately for each derived score. Scores were then mean-centred/standardised for analysis. The relationship between these scores and incident diabetes was evaluated using Cox regression with the same adjustment models described above in single-metabolite regressions. Unadjusted survival plots across tertiles of the scores for each outcome were generated using the Kaplan–Meier method. Effect modification was evaluated for age, sex and race on each of these scores. Model R2 was computed using the method of Xu and O’Quigley, and C statistic was computed by the method of Harrell. Net reclassification improvement (NRI) was computed at the 75th percentile of follow-up time using methods for censored data.
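To make the score calculation concrete, the sketch below computes a weighted linear sum of log-transformed, standardised metabolite levels and then standardises the result. The weights and metabolite levels are invented for illustration (the actual weights are in ESM Table 3), and the LASSO and PCA steps that produce the weights are not reproduced here.

```python
import math
import statistics


def standardise(values):
    """Mean-centre and scale to unit SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def metabolite_score(levels, weights):
    """levels: metabolite name -> list of levels (one per participant).
    weights: metabolite name -> principal component-defined weight."""
    names = list(weights)
    n = len(levels[names[0]])
    raw = [0.0] * n
    for name in names:
        # Log-transform and scale each metabolite before weighting
        z = standardise([math.log(v) for v in levels[name]])
        for i in range(n):
            raw[i] += weights[name] * z[i]
    return standardise(raw)


# Hypothetical relative abundances and weights for two metabolites
levels = {
    "glycine": [310.0, 250.0, 420.0, 275.0, 390.0],
    "DMGV": [1.2, 2.6, 0.9, 2.1, 1.0],
}
weights = {"glycine": -0.8, "DMGV": 1.5}
scores = metabolite_score(levels, weights)
print(scores)
```

Because the final score is standardised, a hazard ratio from a Cox model on this score is interpreted per 1 SD higher score.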

FHS validation

Because not all of the identified metabolites included in scores in CARDIA were measured in the FHS validation cohort, we identified matching metabolites and then fit ‘reduced’ metabolite scores in CARDIA to apply to FHS using linear regression. In this approach, the ‘full’ scores in CARDIA were the response (dependent) variables and metabolites in the full score that were also available for validation in FHS were predictor (independent) variables. We measured correlation of this ‘reduced’ score against the ‘full’ score in CARDIA (derivation cohort) to test whether a score based on the smaller set of metabolites carried to validation would still be highly correlated with the full score. These reduced metabolite scores were subsequently computed in the FHS, standardised, and validated against incident diabetes using Cox regression, adjusted for age, sex, fasting glucose and parental history of diabetes and either BMI or waist circumference. We explored effect modification by age, sex and race (race in the CARDIA cohort only) using multiplicative interaction terms. We evaluated NRI and model discrimination and fit as in CARDIA.
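The ‘reduced’ score idea can be sketched as an ordinary least squares fit of the full score on the metabolites shared between cohorts, followed by a correlation check in the derivation cohort. All data below are simulated, the helper names are hypothetical, and the small normal-equations solver is used only to keep the example free of dependencies.

```python
import statistics


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_reduced_score(full_score, shared):
    """Regress the full score on shared metabolites (intercept first)."""
    rows = [[1.0] + list(r) for r in shared]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, full_score)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def apply_reduced_score(coefs, shared):
    """Compute the reduced score in any cohort with the shared metabolites."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in shared]


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Simulated derivation cohort: two shared metabolites, one unshared
shared = [[0.2, 1.1], [1.5, -0.3], [-0.7, 0.8],
          [0.9, 0.4], [-1.2, -1.0], [0.3, -0.6]]
unshared = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2]
full = [2 * a - b + u for (a, b), u in zip(shared, unshared)]

coefs = fit_reduced_score(full, shared)
reduced = apply_reduced_score(coefs, shared)
print(pearson(full, reduced))
```

The fitted coefficients are then carried unchanged to the validation cohort, where only the shared metabolites are needed to compute the score.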

R (version 4.1.0; R Foundation for Statistical Computing, Vienna, Austria) was used for statistical analyses. All models were built on complete cases. A two-sided p value less than 0.05 was considered significant, with type 1 multiplicity corrections using the FDR method, as specified above.

Results

Clinical features of the study sample

The characteristics of the CARDIA subsample at the time of metabolite collection (year 7) are shown in ESM Table 1. The CARDIA subsample for this study was composed of young adults with an approximately equal distribution by sex and race (mean ± SD age 32.1 ± 3.6 years; 43.9% women; 42.7% Black) and a mean ± SD BMI of 25.6 ± 4.9 kg/m2. The study sample exhibited a low prevalence of hypertension and normal fasting blood glucose (mean ± SD 5.0 ± 0.4 mmol/l) and lipid levels. The characteristics of the FHS subsample are shown in ESM Table 2. In general, included FHS participants were older (mean ± SD age 54.7 ± 9.7 years), heavier (mean ± SD BMI 27.4 ± 4.8 kg/m2) and had a worse cardiometabolic risk profile.

Metabolites associated with adiposity and proinflammatory endophenotypes identify pathways of diabetes susceptibility in young adulthood

A large number of significant metabolite associations was shared across proinflammatory endophenotypes (waist circumference, visceral and intermuscular adiposity, hepatic steatosis, inflammation; ESM Fig. 1). Branched-chain amino acids (muscle insulin resistance [11]), dimethylguanidinovaleric acid (hepatic steatosis [12,13,14]), and glutamate and tyrosine (pancreatic beta cell dysfunction and decreased insulin secretion [15]) were associated with greater adiposity and/or inflammation, while glycine (associated with decreased oxidative stress and diabetes risk [16,17,18]), glutamine [19] and 1-methylnicotinamide (anti-inflammatory [20]) were lower in individuals with greater visceral adiposity (Table 1). We also observed many lipids related to endophenotypes in previous reports in smaller or older populations, including proinflammatory bioactive sphingomyelins and ceramides [21, 22], long-chain triacylglycerols and diacylglycerols of low saturation, and lipid plasmalogens linked to inflammation or insulin resistance (e.g. C36:2 phosphatidylcholine [PC] plasmalogen [23]). In addition, we identified several metabolites associated with adiposity or inflammation in young adults that have not been widely reported previously, including microbe- or diet-derived metabolites (cinnamoylglycine [24], hippurate [25], trigonelline [26, 27]) and several biologically active phospholipids (lysophosphatidylcholines [LPCs] and lysophosphatidylethanolamines [LPEs] [28,29,30,31]).

Table 1 Clinical/pathogenic correlates of metabolites associated with diabetes and related phenotypes

The metabolome in young adulthood identifies distinct phenotypic axes relevant to diabetes pathogenesis

We next sought to identify integrated relationships across multiple metabolites and phenotypes to define metabolic predisposition to diabetes using a two-stage LASSO–PCA approach (Fig. 1). Results of the LASSO regression for each endophenotype (subset to the top 100 largest magnitude penalised regression coefficients across endophenotypes) are shown in Fig. 2 and visually suggest that clusters of metabolites may be concordantly related to distinct proinflammatory phenotypes. We describe two major ‘phenotypic principal components’ (Fig. 3a), explaining 59.2% of the total variance in the metabolome–endophenotype relationship. Principal component 1 appeared to reflect proinflammatory adiposity, with high loadings on waist circumference, C-reactive protein and IL-6, and visceral-like fat. Principal component 2 was highly loaded on muscle volumes and appeared to reflect an android tissue distribution. The weightings for the 100 most heavily weighted metabolites in principal components are shown in Fig. 3b; correlations between metabolite scores and individual phenotypes are shown in ESM Fig. 2. We observed a slightly lower proinflammatory adiposity score in White people relative to Black people, and a widely different distribution of android scores in men vs women (consistent with greater muscle mass in men; ESM Fig. 3). We observed a mild to moderate degree of correlation between the metabolite scores and BMI or waist circumference (9–56% variance explained by BMI or waist circumference for both scores; ESM Fig. 4). Metabolites associated with proinflammatory adiposity and android distribution (summarised in Table 1) identified both familiar and novel pathways relevant to diabetes across a wide array of diabetes pathophysiology, including gut microbial products, muscle tissue insulin resistance and fatty liver disease.

Fig. 2 Heatmap of penalised regression coefficients (LASSO), representing the 100 largest magnitude penalised regression coefficients for metabolites across endophenotypes. Each column specifies a given endophenotype. Each row is a metabolite, ordered using complete-linkage clustering. Regression models are unadjusted and the metabolites shown are those that survive LASSO regression. These penalised regression coefficients therefore represent the joint multivariable relationship between the endophenotypes and the metabolome and are entered into PCA. This represents the first step in our analytical strategy (from Fig. 1). CE, cholesterol ester; DHA, docosahexaenoic acid; 9,10-diHOME, 9,10-dihydroxy-12-octadecenoic acid; DMGV, dimethylguanidinovalerate; ht., height; PE, phosphatidylethanolamine; PS, phosphatidylserine; SDMA, symmetric dimethylarginine; SM, sphingomyelin; TAG, triacylglycerol; Y, CARDIA examination year

Fig. 3 PCA of penalised regression coefficients. This approach effectively reduces the dimensionality of the metabolome–endophenotype relation, described by principal components having numerical loadings to describe their dependence on metabolism and phenotype. (a) Loadings of the first two principal components in a varimax rotation PCA of these regression coefficients (59.2% total variance explained in the metabolome–endophenotype relationship) on each phenotype: principal component 1 was labelled ‘proinflammatory adiposity’, given its high loadings on visceral and visceral-like depots; principal component 2 was labelled ‘android’, given its high loading on muscle depots. (b) Weights of selected metabolites. All weights shown in ESM Table 3. For weights that would extend beyond the horizontal axis limits (−6 to +6), we indicate the number, and the bar is at the limit. These ‘weights’ are then multiplied by each respective metabolite level and added across all constituent metabolites to generate a metabolite-based score of proinflammatory adiposity or android-like body composition principal component. ADMA, asymmetric dimethylarginine; CE, cholesterol ester; 9,10-diHOME, 9,10-dihydroxy-12-octadecenoic acid; 12,13-diHOME, 12,13-dihydroxy-9-octadecenoic acid; DMGV, dimethylguanidinovalerate; ht., height; PE, phosphatidylethanolamine; SM, sphingomyelin; SDMA, symmetric dimethylarginine; TAG, triacylglycerol

Coefficients for metabolites for each score are provided in ESM Table 3, and full regression results are shown in ESM Tables 4–30.

Composite metabolite-based scores in young adulthood are associated with long-term diabetes development over >20 years

Over a median follow-up of 23 years (IQR 18, 23 years) in CARDIA study participants, we observed 239 incident diabetes cases. As expected, in models fit on incident diabetes within CARDIA (Cox LASSO), we observed strong associations with incident diabetes across all adjustment models (Table 2). We examined the association of each metabolite-based score with diabetes and found that phenotype-based scores were prognostic for incident diabetes within CARDIA participants (Table 2, Fig. 4; HR of 2.10 per 1 SD higher proinflammatory adiposity score and HR of 2.24 per 1 SD higher android score). The scores improved risk discrimination by C-index, increasing from 0.762 to 0.781 (p=0.02) for the proinflammatory adiposity score and to 0.778 (p=0.03) for the android score. Continuous NRI was favourable, being strong for the proinflammatory adiposity score (NRI 0.404 [95% CI 0.253, 0.568]) and moderate for the android score (NRI 0.284 [95% CI 0.105, 0.417]), compared with a model with clinical risk factors including BMI (similar findings with models with waist circumference, Table 2). There was also a modest interaction between the two metabolomic scores developed on subclinical phenotypes in the CARDIA participants, with additive effects of adverse changes in both (p=0.01, ESM Fig. 5). No statistically significant interactions of the scores with age, race or sex were seen.

Table 2 Survival analysis for incident diabetes in CARDIA study participants

Fig. 4 Diabetes-free survival in CARDIA study participants by metabolite-based scores for endophenotypes and diabetes. Survival free of incident diabetes over time is shown by tertile of the diabetes (a), proinflammatory adiposity (b) and android metabolite (c) scores. The numbers in the table below the plots represent the numbers at risk. Tft, test for trend

We next proceeded to external validation of the metabolite scores in FHS participants. The Cox LASSO models fit on diabetes in CARDIA study participants were associated with diabetes in minimally adjusted (age- and sex-adjusted) but not fully adjusted models in FHS participants (Table 3). The distribution of each metabolite-based score across sex and age in FHS is shown in ESM Fig. 6, demonstrating similar patterns by sex and no large relationship with age. Correlation between the final (‘reduced’) and full scores was excellent for both the android and proinflammatory adiposity scores in CARDIA study participants (R=0.97 in both cases). We replicated the relationship between the proinflammatory adiposity score and incident diabetes in the FHS participants (283 incident cases over a median of 18 years [IQR 12, 20]; ESM Fig. 6). The proinflammatory adiposity score was associated with increased risk of incident diabetes (HR per 1 SD higher score 1.33, p=0.0004) after adjustment for risk factors (Table 3), with improvement of discrimination based on C-index increase from 0.836 to 0.841 (p=0.05). The relationship of the android score with diabetes in FHS participants was significant in models adjusted for age and sex but was not independent of all risk factors including BMI (in adjusted model: HR 1.19, p=0.09; Table 3). Interactions with age and sex were not significant.

Table 3 Survival analysis for incident diabetes in FHS participants

Discussion

The principal aim of this study was to identify comprehensive profiles comprised of known and novel metabolites associated with endophenotypes of diabetes susceptibility (e.g. systemic inflammation, body composition) in young Black adults and young White adults early in the course of disease. In addition to extending metabolites previously implicated in diabetes to a younger, biracial cohort (CARDIA study), we identified many novel metabolites in pathways implicated in diabetes not previously widely reported for further investigation (e.g. bioactive lipid species, microbe-derived/diet-derived metabolites). Multi-metabolite scores derived from phenotypes were independently associated with long-term diabetes development from young adulthood to mid-life and provided incremental risk stratification beyond clinical risk factors in the CARDIA study, with validation in an independent sample of individuals across a broad age range (FHS). Of note, the proinflammatory score was associated with diabetes across both cohorts, while the android score was only significant in the CARDIA study participants. In addition, metabolite scores fit directly on diabetes were associated with diabetes in the CARDIA study participants after full adjustment but not in the FHS participants. Collectively, the observation that metabolite-based signatures in young adulthood provide prognostic value for diabetes incremental to race, sex and traditional diabetes risk factors over two decades highlights a potential role for metabolic risk stratification at an early stage when intervention may be most likely to reap maximal benefits.

The current study offers several important areas of novelty to the burgeoning literature on translatable, precision approaches to diabetes prevention. First, quantification of the circulating metabolome in CARDIA study participants (a young adult sample of Black and White individuals at high lifetime metabolic risk) extends the current literature on metabolomic risk prediction to an early period in diabetes development. Indeed, much of the work in large cohorts examining metabolite profiles has focused on mid-life, where much of the metabolic risk may already be ‘encoded’ in traditional clinical risk factors. Accordingly, the predictive performance of prior multi-metabolite scores in diabetes prediction in large, heterogeneous community-based samples across a wide age spectrum has been modest, with limited added prognostic value for metabolites [1, 2]. Of note, recent seminal studies have suggested that metabolite-based diabetes prediction may offer greater value in individuals who are younger [32] or who are without prevalent dysglycaemia [33], possibly due to decreased sensitivity of clinical factors alone, differences among cohorts (race, age, geographic location), analytical follow-up or methodology. Indeed, seminal trans-cohort work by Ahola-Olli and colleagues [32] involving 229 metabolites (polar/lipid metabolites, fatty acids) measured by NMR demonstrated a similar effect size for diabetes, though restricted to White European study participants. The results in the CARDIA study participants are fundamentally useful: (1) they extend the utility of metabolite-based markers of diabetes susceptibility across race, sex and geography; and (2) they demonstrate further comprehensive discovery of many new metabolites and metabolic pathways (due to a broader metabolome coverage) not previously implicated in diabetes in human investigation. 
The consistency of our findings with previously published work (see review [34]) lends support to the validity of the approach and the importance of the shared metabolites in the pathogenesis of diabetes. Further studies based on these results can therefore prioritise metabolites for downstream mechanistic discovery and clinical application, thereby limiting the cost and scope of metabolite profiling performed in the clinical context.

In addition to a broader metabolome coverage in a young adult cohort not previously reported, we used integrative statistical techniques to develop multi-metabolite scores rooted in precise measures of body composition, inflammation and anthropometry (endophenotypes) to focus discovery on preclinical, quantitative pathophysiology heterogeneous across individuals and central to diabetes. The use of continuous phenotypes as outcomes to guide metabolite discovery (as opposed to dysglycaemia alone) not only improves statistical power but also may uncover novel metabolite patterns related to diabetes susceptibility. Accordingly, many identified metabolites in the identified pathophenotypic principal components in the CARDIA study participants have not previously been widely associated with diabetes (Table 1). Certainly, this pattern is consistent with observations from human genetics suggesting that diabetes susceptibility loci may not necessarily reflect glycaemic traits alone [35]. Indeed, a heterogeneity of metabolic alterations that precede diabetes (e.g. microbial diversity, intestinal function, innate immunity) are uniquely captured by the approach here, highlighting opportunities for broad downstream mechanistic discovery.

Furthermore, apart from metabolites with well-known association with diabetes (e.g. branched-chain amino acids; Table 1), we uncovered metabolites implicated in traditional mechanisms of diabetes pathogenesis across multiple tissues, including muscle insulin resistance (e.g. asymmetric dimethylarginine [36, 37]), oxidative stress and inflammation (bilirubin/biliverdin [38], glycine [16, 17, 39]), and intestinal inflammation and permeability (asparagine [40]). Moreover, we identified several additional metabolite classes with a suspected mechanistic role in diabetes not previously widely reported, including microbe- or food-derived products and bioactive lipids. Sphingomyelins and ceramide species (mostly related to inflammation/adiposity in the CARDIA study) have been implicated in insulin resistance [41], though with variable associations in humans [42]. Several LPCs and LPEs (biologically active lipid byproducts of plasma membrane metabolism) were associated with inflammation and adiposity in CARDIA study participants. Despite known roles of some LPCs in innate immune activation [43], LPCs associated with lower inflammation or lower incident diabetes risk in CARDIA participants are modifiable with weight loss in studies of obesity (e.g. LPC 18:1, LPC 20:4 [44]), underscoring the complexity and context-dependence of lipid signalling. Select plasmalogens (a suspected buffer to proinflammatory lipid oxidation [45] associated with insulin resistance in animal models [23]) were also associated with adiposity phenotypes in the CARDIA study participants. Moreover, eicosanoid metabolic byproducts of arachidonic acid linked to diabetes complications [46, 47] were associated with diabetes in CARDIA study participants (e.g. 5-hydroxyeicosatetraenoic acid [5-HETE]). 
In addition, 12,13-diHOME (12,13-dihydroxy-9-octadecenoic acid; a lipokine released by brown adipose tissue during exercise [9, 48], implicated in brown fat activation [49] and skeletal muscle metabolic efficiency [48]) exhibited favourable weighting in the proinflammatory metabolite score. Finally, we identified microbe- and food-derived metabolites, some of which may be functional or modifiable by dietary intake, that are linked to obesity and insulin sensitivity (cinnamoylglycine, hippurate, trigonelline, histidine). Additional work to understand mechanistic implications of these physiologically plausible associations is warranted.

The novelty of this study lies in the uniqueness of the sample (one of the largest young White and Black American adult populations with decades of follow-up for events), broad metabolome coverage (permitting discovery of novel metabolite–phenotype associations central to diabetes risk) and statistical approaches used to unify phenotypes, metabolites and clinical risk. Nevertheless, these results have important limitations. While our replication was not across race (FHS), we observed no evidence of effect modification by race in CARDIA study participants, suggesting that the metabolic abnormalities present in young adulthood may be similarly associated with diabetes risk in Black adults and White adults. We acknowledge that the use of CT-defined fat distributions at year 25 to derive metabolite-based scores for diabetes that are then applied across the entirety of follow-up does introduce some bias, in that adiposity itself may influence and be influenced by metabolic dysfunction and diabetes. In addition, given that metabolites are quantified on a relative scale, clinical translation of metabolic risk scores requires establishing absolute quantification, normative bounds across age and sex (among other confounders of interest) and testing across a broad array of samples. Nevertheless, identifying robust, minimal signatures of risk that are biologically plausible in the pathogenesis of diabetes is an important first step to prioritising those metabolites relevant for absolute quantification and further downstream validation. We recognise that the sample studied here did not have a high proportion of individuals with severe obesity and that we did not study longitudinal modification of activity and diet as a method to alter the metabolome and mitigate diabetes risk. Of note, recent reports suggest that acute exercise may alter the circulating levels of some metabolites implicated in diabetes risk [9]. 
Finally, using novel techniques, we observed validation of the proinflammatory (but not the android) score across both cohorts. These findings may represent the differences between the two cohorts (e.g. age, sex, distribution of adiposity), and further validation by different methods and in larger, diverse cohorts would be necessary.

In conclusion, in a large, biracial cohort of young adults, we used a broad metabolite platform and integrative analytic approaches across multiple, diverse phenotypes of metabolic risk to uncover novel associations between the circulating metabolome and selected endophenotypes reflecting metabolic risk, specifying pathways of insulin resistance, oxidative stress, hepatic steatosis, and inflammatory lipid signalling. Multi-metabolite scores derived from these approaches (specifically proinflammatory adiposity) were associated with incident diabetes, suggesting the importance of inflammatory metabolic alterations early in adulthood to diabetes risk. Given the potential impact of nutrition- and activity-based interventions on several identified metabolic pathways, further work targeting precision clinical applications (both diagnostic and therapeutic) and molecular discovery based on metabolic profiles early in adulthood is warranted.