Nature Medicine, 2024. Open Access.

Data-driven cluster analysis identifies distinct types of metabolic dysfunction-associated steatotic liver disease

Violeta Raverdy, Federica Tavaglione, Estelle Chatelain et al.

Research Article — Peer-Reviewed Source

Original research published by Raverdy et al. in Nature Medicine. Redistributed under Open Access — see publisher for license terms. MedTech Research Group provides these references for informational purposes. We do not conduct original research. All studies are the work of their respective authors and institutions.

Abstract

Metabolic dysfunction-associated steatotic liver disease (MASLD) exhibits considerable variability in clinical outcomes. Identifying specific phenotypic profiles within MASLD is essential for developing targeted therapeutic strategies. Here we investigated the heterogeneity of MASLD using partitioning around medoids clustering based on six simple clinical variables in a cohort of 1,389 individuals living with obesity. The identified clusters were applied across three independent MASLD cohorts with liver biopsy (totaling 1,099 participants), and in the UK Biobank to assess the incidence of chronic liver disease, cardiovascular disease and type 2 diabetes. Results unveiled two distinct types of MASLD associated with steatohepatitis on histology and liver imaging. The first cluster, liver-specific, was genetically linked and showed rapid progression of chronic liver disease but limited risk of cardiovascular disease. The second cluster, cardiometabolic, was primarily associated with dysglycemia and high levels of triglycerides, leading to a similar incidence of chronic liver disease but a higher risk of cardiovascular disease and type 2 diabetes. Analyses of samples from 831 individuals with available liver transcriptomics and 1,322 with available plasma metabolomics highlighted that these two types of MASLD exhibited distinct liver transcriptomic profiles and plasma metabolomic signatures, respectively. In conclusion, these data provide preliminary evidence of the existence of two distinct types of clinically relevant MASLD with similar liver phenotypes at baseline, but each with specific underlying biological profiles and different clinical trajectories, suggesting the need for tailored therapeutic strategies.

Full Text

Main

Nonalcoholic fatty liver disease, now referred to as metabolic dysfunction-associated steatotic liver disease (MASLD) 1 , 2 , is currently the most common chronic liver disease worldwide, with an estimated global prevalence of approximately 30% (ref. 3 ). MASLD comprises a spectrum of disorders ranging from isolated steatosis to metabolic dysfunction-associated steatohepatitis (MASH), ultimately leading to advanced fibrosis, cirrhosis and hepatocellular carcinoma 4 . However, not every individual diagnosed with MASLD will progress to MASH and later stages of liver disease, indicating the presence of a substantial interindividual variation in the disease progression 5 . Furthermore, MASLD harbors an increased risk of cardiovascular disease and type 2 diabetes 6 , 7 , which also widely varies among individuals. This interindividual variability in the severity and progression of MASLD and its extrahepatic consequences, together with the challenges of finding a specific drug treatment, highlights the need for more personalized approaches 8 – 10 . Given this context, advancements in diagnostic strategies for risk stratification and efficient testing of new drugs in at-risk populations are urgently needed 11 . Emerging evidence points to the clinical relevance of distinguishing different types of MASLD on the basis of distinct pathophysiological mechanisms and rates of disease progression 5 . For example, genetic predisposition to hepatic steatosis is associated with increased risk of liver-related events, while offering protection against coronary artery disease 12 , 13 . Specifically, PNPLA3 rs738409 (p.I148M), the strongest genetic variant predisposing to MASLD, is associated with a reduction in intrahepatic turnover of lipid droplets but is not causally linked to ischemic heart disease in individuals with MASLD 14 .
In contrast, other mechanisms central to MASLD pathophysiology, such as hepatic de novo lipogenesis or adipose tissue dysfunction, have been associated with insulin resistance and a higher risk for type 2 diabetes and cardiovascular disease, but with only a moderate risk of liver-related events 10 . In the present study, we identified two types of MASLD by using a data-driven clustering approach focused on key hepatic and cardiometabolic traits. These two MASLD types have distinct biological profiles and risks for cardiometabolic disease and diabetes, despite having the same severity of MASLD on liver histology. We then clustered four independent cohorts of individuals at-risk for MASLD from Italy, Finland, Belgium and the United Kingdom, with consistent results, supporting the validity of the proposed clustering.
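
The abstract names the data-driven approach as partitioning around medoids (PAM) on six clinical variables. As a rough illustration only, the sketch below runs a greedy medoid-swap search on toy two-dimensional points. The naive initialisation (the first k points), the Euclidean distance, the function names and the toy data are all assumptions made here for illustration; they are not the authors' pipeline, which this section does not describe, and real clinical variables on different scales would normally be standardised first.

```python
from itertools import product


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(euclidean(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Greedy medoid-swap search (hypothetical helper, simplified PAM)."""
    medoids = list(range(k))  # assumption: naive start, not the BUILD step
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        # Try replacing each medoid with each non-medoid point.
        for i, cand in product(range(k), range(len(points))):
            if cand in medoids:
                continue
            trial = medoids[:i] + [cand] + medoids[i + 1:]
            cost = total_cost(points, trial)
            if cost < best - 1e-12:  # accept strictly improving swaps only
                medoids, best, improved = trial, cost, True
    # Assign each point to the position of its nearest medoid.
    labels = [
        min(range(k), key=lambda j: euclidean(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels


# Toy data: two well-separated groups of three points each.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels = pam(toy, k=2)
```

Unlike k-means, each cluster centre here is an actual observation (a medoid), which is what makes the resulting clusters easy to describe in terms of a representative individual.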

Results

Cluster analysis identifies two distinct types of MASLD

Cluster analysis and identification of MASLD types were performed on the basis of the data of 1,389 French participants from the Atlas Biologique de l’Obésité Sévère (ABOS) cohort (Extended Data Fig. 1 ). Overall, we identified six clusters with distinctive patterns of the six clustering variables in the ABOS cohort (Fig. 1 ). We then added patients from three independent cohorts to these clusters, namely, the Universitair Ziekenhuis Antwerpen (UZA) cohort from Belgium ( n = 463), the Molecular Architecture of FAtty Liver Disease in individuals with obesity undergoing bAriatric surgery (MAFALDA) cohort from Italy ( n = 261) and the Helsinki cohort from Finland ( n = 375) (Extended Data Fig. 2 ). Due to the low number of participants in some individual clusters across cohorts, we pooled the three cohorts for the following analyses, resulting in a consolidated cohort of 1,099 individuals, referred to hereafter as the validation cohort (Fig. 1 ).

Fig. 1 Characteristics of the six data-driven clusters in the ABOS cohort and in the validation cohort. a , b , The distribution of data-driven clusters in the ABOS cohort ( a ) and the validation cohort ( b ). c , d , Radar charts representing the median values of age, BMI, HbA1c, LDL, triglycerides and ALT for each cluster in the ABOS cohort ( n = 1,389) ( c ) and the validation cohort ( n = 1,099) ( d ). The dark gray line represents the 95th percentile observed in the ABOS cohort. e , f , Bar plots representing the proportion of patients with MASH at histology in the ABOS cohort ( n = 1,325) ( e ) and the validation cohort ( n = 1,099) ( f ). Statistical tests used include either a chi-squared test or Fisher’s exact test, both two-sided with Bonferroni correction. Significance levels are indicated as follows: *** P < 0.001, $ indicates P = 0.011 ( e ); $ indicates P = 0.0052, @ indicates P = 0.0046, *** P < 0.001 ( f ). g , h , Radar charts represent the proportion of patients with NAS ≥4, steatosis grade ≥1, lobular inflammation grade ≥1, ballooning grade ≥1, and fibrosis stage ≥1 and ≥2 for each cluster in the ABOS cohort ( g ) and the validation cohort ( h ).

Fig. 2 Characteristics of the three clusters across the ABOS cohort, validation cohort and UK Biobank. a – i , Characteristics of the liver-specific, cardiometabolic and control clusters in the ABOS cohort ( a – c ), in the validation cohort ( d – f ) and in the UK Biobank ( g – i ). In a , d and g , the distribution of data-driven clusters is presented. The radar charts represent the median values of age, BMI, HbA1c, LDL, triglycerides and ALT for each cluster in the ABOS cohort ( b ), validation cohort ( e ) and UK Biobank ( h ). The dark gray line represents the 95th percentile observed in the ABOS cohort. The bar plots represent the proportion of patients with MASH at histology in the ABOS cohort ( n = 1,325) ( c ) and the validation cohort ( n = 1,099) ( f ), or at-risk MASH on MRI in the UK Biobank ( n = 6,792) ( i ). Statistical tests used include either a chi-squared test or Fisher’s exact test, both two-sided with Bonferroni correction. Significance levels are indicated as follows: *** P < 0.001 ( c ); $ P = 0.0011, *** P < 0.001 ( f ); *** P < 0.001 ( i ). cT1, iron-corrected T1; adj-p, adjusted P value.

In the ABOS cohort, cluster 1 contained 18% of participants and was characterized by older age and hypertension; cluster 2 included 11% of participants and had the highest hemoglobin A1c (HbA1c), high triglycerides and hypertension; cluster 3 had 13% of participants, young age and the highest body mass index (BMI); cluster 4 had 26% of participants and the highest low-density lipoprotein (LDL) cholesterol levels; cluster 5 had 7% of participants and the highest alanine aminotransferase (ALT) levels; and cluster 6 had 24% of participants and a majority of females with a more favorable metabolic profile (Fig. 1 and Extended Data Table 1 ). Despite marked differences in age and prevalence of type 2 diabetes between clusters 2 and 5, liver histology revealed high prevalence of MASH and advanced fibrosis ( F ≥ 3) in these two subgroups, as compared with other clusters combined: 33.6% and 24.2% versus 5.0%, and 21.8% and 15.8% versus 3.4%, respectively (all adjusted P < 0.001 versus other clusters combined). To further examine the potential differences in mechanisms driving MASH, we pooled the clusters with lower severity of MASLD (clusters 1, 3, 4 and 6) in a ‘control’ cluster, which was compared with cluster 2 and cluster 5 (Fig. 2 and Table 1 ).

Table 1 Patient characteristics based on cluster allocation in the ABOS cohort ( n = 1,389)

Columns: Control; Cardiometabolic; Liver-specific; Adjusted P; Adjusted P cardiometabolic versus liver-specific; Adjusted P cardiometabolic versus control; Adjusted P liver-specific versus control
N: 1,132; 158; 99; −; −; −
Clinical data
Age (years): 41 (18); 52 (11.75); 37 (15); <0.001; <0.001; <0.001; 0.75
Women (

The liver-specific cluster is enriched in at-risk genetic variants

MASLD has a strong genetic component with variants in PNPLA3 , TM6SF2 , MBOAT7 and GCKR accounting for a large fraction of its heritability and accelerating liver disease progression to MASH, cirrhosis and hepatocellular carcinoma 15 – 17 . We hypothesized that the liver-specific cluster could be enriched in these genetic variants. Therefore, we examined the difference of polygenic risk score of hepatic fat content (PRS-HFC) distribution in the liver-specific cluster 5 compared with the cardiometabolic and control clusters in ABOS, finding an enrichment of PRS-HFC in this cluster (adjusted P = 0.034 and adjusted P < 0.001 versus the cardiometabolic and control clusters, respectively) (Table 1 ). Results were similar when we considered only the PNPLA3 rs738409 variant ( P < 0.01 and P < 0.001 versus the cardiometabolic and control clusters, respectively) (Fig. 3 ). These results were confirmed in UK Biobank participants (Extended Data Table 2 ).

Fig. 3 Genotype distribution of the PNPLA3 rs738409 C > G stratified by clusters in the ABOS cohort and UK Biobank and differential hepatic gene expression and plasma metabolomics across clusters in the ABOS cohort. a , Genotype distribution of the PNPLA3 rs738409 C > G stratified by clusters in the ABOS cohort. The bar graph shows the percentages of homozygotes (GG) and heterozygotes (CG) patients at risk across liver-specific (LS), cardiometabolic (CM) and control clusters. Statistical tests were chi-squared test or Fisher exact test as appropriate, two-sided with Bonferroni correction. Significance levels are indicated as follows: $ indicates P = 0.0079, *** P < 0.001. b , c , Differential hepatic gene expression and plasma metabolomics across clusters. The Euler diagrams illustrate the differential gene expression in liver tissue ( b ) and plasma metabolomics ( c ), across the three clusters: cardiometabolic (CM), liver-specific (LS) and control (CTRL). The sizes of the areas in the Euler diagram are proportional to the number of differentially expressed features they represent.
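
The figure legends report two-sided chi-squared or Fisher's exact tests with Bonferroni correction. As a minimal sketch of that kind of comparison, the code below computes a Pearson chi-squared test on a 2×2 table (without continuity correction) and multiplies the P value by the number of comparisons. The counts are hypothetical, the choice of three pairwise cluster comparisons is an assumption, and the function names are illustrative; none of this reproduces the authors' actual analysis code.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and two-sided P value for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


def bonferroni(p, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical counts: rows are two clusters, columns are MASH yes / no.
stat, p = chi2_2x2(10, 20, 20, 10)
# Assumption: three pairwise comparisons among three clusters.
p_adj = bonferroni(p, 3)
```

Fisher's exact test would replace the chi-squared approximation when expected cell counts are small, which is presumably why the legends mention both.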

Risk of liver and cardiovascular outcomes, and type 2 diabetes

In the UK Biobank, individuals allocated in the six clusters exhibited similar characteristics to those observed in the ABOS cohort (Extended Data Table 2 and Extended Data Fig. 4 ). During a median (interquartile range) follow-up of 13.4 (12.6–14.1) years, there were 2,676 (1.12%) individuals who developed chronic liver disease, with the liver-specific and cardiometabolic clusters being the ones with the highest cumulative incidence (both P < 0.001 versus control cluster) (Fig. 4 and Extended Data Table 2 ). Following adjustment for age, sex and alcohol intake, the liver-specific and cardiometabolic clusters had a more than fourfold increased risk of chronic liver disease compared with the control cluster (adjusted hazard ratio (HR) 4.52, 95% confidence interval (CI) 3.88–5.26, P < 0.001, and adjusted HR 4.04, 95% CI 3.50–4.66, P < 0.001, respectively) (Fig. 4 ).

Fig. 4 Cumulative incidence of chronic liver disease, cardiovascular disease and type 2 diabetes across clusters in the prospective UK Biobank. a – c , Cumulative incidence of chronic liver disease ( a ), cardiovascular disease ( b ) and type 2 diabetes ( c ) across clusters in the prospective UK Biobank. In each panel, the lines represent the cumulative incidence in the different clusters (cardiometabolic (CM) in red, liver-specific (LS) in blue and control in gray), with the shaded area representing 95% CI. HRs with 95% CIs and corresponding P value were calculated by Cox proportional hazards models for cardiometabolic (in red) and liver-specific (in blue) clusters versus control cluster (in gray), adjusted for age, sex and alcohol intake (g per day). Survival curves were compared using the pairwise log-rank test, with Holm correction.

During a median (interquartile range) follow-up of 13.4 (12.7–14.1) years, there were 20,721 (10.59%) individuals who developed cardiovascular disease, with the cardiometabolic cluster being the one with the highest cumulative incidence: 21.88% in the cardiometabolic cluster versus 10.37% in the control cluster (HR 2.31, 95% CI 2.16–2.47; P < 0.001 versus control), and 9.52% in the liver-specific cluster (HR 0.91, 95% CI 0.82–1.00; P = 0.054 versus control) (Fig. 4 and Extended Data Table 2 ). When the analysis was adjusted for age, sex and alcohol intake, the cardiometabolic cluster had a significantly increased risk of experiencing cardiovascular disease compared with the control cluster (adjusted HR 1.80, 95% CI 1.68–1.93; P < 0.001), which was also significantly higher than the increase in risk of the liver-specific cluster compared with the control cluster (adjusted HR 1.18, 95% CI 1.07–1.31; P = 0.001) (Fig. 4 ). During a median (interquartile range) follow-up of 13.3 (12.6–14.1) years, there were 8,563 (4.35%) individuals who developed type 2 diabetes, with the cardiometabolic cluster being the one with the highest cumulative incidence ( P < 0.001 versus both liver-specific and control clusters) (Fig. 4 and Extended Data Table 2 ). Following adjustment for age, sex and alcohol intake, the cardiometabolic cluster had a nearly sevenfold increased risk of developing type 2 diabetes compared with the control cluster (adjusted HR 6.82, 95% CI 6.01–7.73; P < 0.001), which was higher than the increase in risk of the liver-specific cluster compared with the control cluster (adjusted HR 2.91, 95% CI 2.62–3.23; P < 0.001) (Fig. 4 ). Of note, a majority of participants from the cardiometabolic cluster also presented with type 2 diabetes, which may explain the higher risk of cardiovascular disease observed in this cluster. Likewise, the mean HbA1c level remained higher in the cardiometabolic cluster after excluding patients with preexisting type 2 diabetes for analyzing incident diabetes (Extended Data Table 2 ). However, adjusting for HbA1c did not fully remove the association of the cardiometabolic cluster with type 2 diabetes risk. Sensitivity analyses excluding individuals with BMI <27 kg m⁻² or those with excessive alcohol consumption (>50/60 g per day for women/men) showed similar results to the main analysis (Extended Data Table 3 ). In summary, the cardiometabolic cluster had a higher risk of developing cardiovascular disease and type 2 diabetes, and a similar risk of developing chronic liver disease, as compared with the liver-specific cluster.
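
For readers who want to sanity-check the adjusted hazard ratios quoted above, the following sketch assumes the confidence intervals are Wald-type intervals that are symmetric on the log scale, as is usual for Cox models but not stated explicitly in this section. Under that assumption, the geometric midpoint of each interval should match the reported HR, and the standard error of the log HR can be recovered from the interval width. The helper names are illustrative.

```python
import math


def log_se_from_ci(lower, upper, z=1.96):
    """Standard error of log(HR) implied by a 95% CI (assumes a Wald interval)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def geometric_midpoint(lower, upper):
    """Centre of the interval on the log scale, back-transformed."""
    return math.sqrt(lower * upper)


# Adjusted HRs versus the control cluster, copied from the text above.
reported = {
    "liver-specific, chronic liver disease": (4.52, 3.88, 5.26),
    "cardiometabolic, chronic liver disease": (4.04, 3.50, 4.66),
    "cardiometabolic, cardiovascular disease": (1.80, 1.68, 1.93),
    "cardiometabolic, type 2 diabetes": (6.82, 6.01, 7.73),
    "liver-specific, type 2 diabetes": (2.91, 2.62, 3.23),
}

for label, (hr, lo, hi) in reported.items():
    mid = geometric_midpoint(lo, hi)
    se = log_se_from_ci(lo, hi)
    print(f"{label}: HR {hr}, midpoint {mid:.2f}, SE(log HR) {se:.3f}")
```

All five reported estimates fall within rounding distance of their interval midpoints, which is consistent with the log-scale assumption.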

The added value of clustering beyond individual variables

We then explored the added value of the proposed clustering, beyond each of its individual components, to predict the various clinical outcomes. For that purpose, for each outcome, we first examined the overall predictive power of each variable of interest compared with clustering alone. No individual variable performed better than clustering at predicting simultaneously the three clinical outcomes (Extended Data Table 4 ). For example, ALT alone predicted incident chronic liver disease better than clustering, but clustering was superior at predicting cardiovascular disease. In contrast, HbA1c predicted incident cardiovascular disease better than clustering, but clustering performed better in the prediction of chronic liver disease. Likewise, among patients without diabetes at the time of inclusion, age, BMI, HbA1c, ALT and triglycerides each predicted the risk of incident diabetes better than clustering alone. In contrast, clustering did better than LDL cholesterol alone at predicting all outcomes. Second, we performed multivariable analyses, in which the clustering model was first adjusted for sex, age and alcohol use, and second, one by one, ALT, HbA1c, triglycerides, BMI or LDL cholesterol (Fig. 5 ). Although in most cases the HR estimates of at-risk clusters were reduced after further adjustment for one other clustering variable, all values remained statistically significant compared with the control cluster in at least one at-risk cluster for each outcome. Collectively, these data show that clustering was superior to each individual variable in predicting simultaneously all three clinical trajectories.

Fig. 5 Added value of the clustering model to predict cumulative incidence of chronic liver disease, cardiovascular disease and type 2 diabetes, among UK Biobank participants. a – c , Added value of the clustering model to predict cumulative incidence of chronic liver disease ( a ), cardiovascular disease ( b ) and type 2 diabetes ( c ), among UK Biobank participants. Multivariable analyses evaluate the predictive value of the clustering model, adjusted for age, sex and alcohol, independently of each additional individual variable included in the clustering. TG, triglycerides. The dots represent the HR estimates, and the error bars represent the 95% CIs.

Differential liver transcriptomic analysis across clusters

To gain insights into the biological differences between the cardiometabolic and liver-specific clusters, we performed differential gene expression analysis in the liver in a subset of the ABOS cohort participants, including 97 individuals from the cardiometabolic cluster, 63 from the liver-specific cluster and 671 from the control cluster. The comparison of the cardiometabolic and the liver-specific clusters showed upregulation of genes involved in cholesterol metabolism and biosynthesis (for example, HMGCS1 , MVD , CYP51A1 , LSS , SC5D and LDLR ) and glycolysis (for example, ALDOC ) in the cardiometabolic cluster (Fig. 3 and Supplementary Table 1 ), which were identified as enriched pathways also by Gene Ontology biological processes (GO-BP) analysis, together with alcohol metabolic processes (Extended Data Fig. 3 ). The chitinase 3-like 1 ( CHI3L1 ) gene, linked to liver fibrogenesis 18 , was the most highly differentially expressed, possibly reflecting a slightly higher albeit not significantly different fibrosis stage in the individuals in this cluster as well as an older age (Table 1 ). Similar results were obtained when comparing the cardiometabolic and the control clusters, confirming the upregulation of genes involved in cholesterol metabolism and synthesis in the cardiometabolic cluster (Extended Data Fig. 3 ), mirroring the higher metabolic dysfunction, type 2 diabetes and cardiovascular risk observed in this cluster. When comparing the liver-specific and the control clusters, we observed upregulation of genes involved in lipid droplet homeostasis and intrahepatic lipid transport, including FABP4 and FABP5 , in the liver-specific cluster. This cluster also showed upregulation of genes implicated in inflammation, including CXCL9 and SPP1 , and liver carcinogenesis, including ANXA2P1 and HULC (Extended Data Fig. 3 and Supplementary Table 1 ). 
GO-BP analysis confirmed these results, showing an upregulation of lipid localization, immunoregulatory, inflammatory and wound healing processes 19 and mirroring the elevated liver enzymes observed in this cluster as well as a higher risk of progressive liver disease in UK Biobank (Extended Data Fig. 3 ).

Differential metabolomic analysis across clusters

To further elucidate biological differences between the cardiometabolic and liver-specific clusters, we analyzed the metabolomics data available in ABOS (Fig. 3 ). When comparing the cardiometabolic and liver-specific clusters, we observed increased concentrations of carbohydrates in the cardiometabolic cluster (Extended Data Fig. 3 ), reflecting the dysglycemic state (Table 1 ). However, most differences concerned amino acid and lipid metabolites, and particularly the amino acid metabolites tyramine O -sulfate, homocitrulline, p -cresol glucuronide, phenylacetylglutamine, phenylacetylglutamate, 4-hydroxyphenylacetylglutamine, 4-hydroxyphenylacetate and imidazole propionate, previously associated with the gut microbiota 20 – 22 , had the highest and most significant increase in the cardiometabolic cluster. Deoxycholate, a secondary bile acid, was also elevated, suggesting changes in lipid metabolism and liver function. These metabolites were also differentially abundant between the cardiometabolic and control clusters (Extended Data Fig. 3 and Supplementary Table 1 ) and, therefore, probably linked to the dysmetabolic state. Differences were also observed in the comparison between the liver-specific and control clusters, with elevated levels of 5α-androstan-3α,17β-diol monosulfate, its disulfate form, glycoursodeoxycholic acid sulfate, and taurochenodeoxycholic acid 3-sulfate suggesting changes in steroid processing. Furthermore, higher levels of ursodeoxycholate, glycochenodeoxycholate glucuronide and glycochenodeoxycholate 3-sulfate and decreased levels of cysteine-glutathione disulfide were observed in both the liver-specific and cardiometabolic clusters compared with the control cluster (Extended Data Fig. 3 and Supplementary Table 1 ). 
Possibly linked to oxidative stress and liver function, we observed decreased levels of cysteine-glutathione disulfide both in the liver-specific and in the cardiometabolic cluster compared with the control cluster, thus indicating that reduced antioxidant capacity might be a common feature in the two MASH subtypes or a consequence of the severe phenotype. Taken together, these transcriptomics and metabolomics analyses support the existence of two biologically distinct types of severe MASLD.

Molecular features of the cardiometabolic cluster versus dysglycemia

Since a majority of individuals in the cardiometabolic cluster have type 2 diabetes, we also investigated if the molecular features of that cluster differ from those merely associated with dysglycemia. For that purpose, we analyzed liver gene transcripts and metabolites that were differentially abundant between the cardiometabolic cluster versus the control cluster, as compared with those that were differentially abundant between individuals with type 2 diabetes versus nondiabetic controls. We found that the cardiometabolic cluster differentially exhibited a set of 199 unique liver transcripts that were not overexpressed in the type 2 diabetes group, indicating a distinctive transcriptional signature corresponding to 58 pathways expressed in the cardiometabolic cluster but not present in the type 2 diabetes group. Specifically, the cardiometabolic cluster shows distinct molecular pathways that involve unique aspects of lipid transport and metabolism, immune response modulation, oxidative stress and extracellular matrix remodeling, suggesting a heightened state of metabolic activity and cellular defense, as well as active involvement in managing inflammation (Supplementary Table 1 ). Regarding metabolites, our analyses also revealed a significant overlap between type 2 diabetes and cardiometabolic cluster, with 151 metabolites that were differentially abundant in both subgroups, many being directly linked to dysglycemia, such as monosaccharides and disaccharides (for example, glucose and sucrose). However, we identified a distinctive subset of 88 metabolites unique to the cardiometabolic cluster. These ‘cardiometabolic-specific’ metabolites include glycerophospholipids, sphingolipids, amino acid derivatives, protein metabolism and metabolites of bile acids unveiling a metabolic signature particular to this cluster at risk for MASH. 
These metabolites highlight disturbances in lipid processing, protein and energy metabolism, inflammatory profile and potential gut microbiome interactions that are not present in the type 2 diabetes profile (Supplementary Table 1 ).
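
The comparison described in this section amounts to a set operation: features differentially abundant in the cardiometabolic-versus-control contrast are split into those also found in the type 2 diabetes contrast and those found only in the cluster. The sketch below illustrates that logic with Python sets. Glucose and sucrose are named in the text as shared metabolites; the other identifiers are hypothetical placeholders, and the function name is illustrative.

```python
def split_signature(cluster_hits, diabetes_hits):
    """Return (shared, cluster_only) features from two sets of hits."""
    shared = cluster_hits & diabetes_hits
    cluster_only = cluster_hits - diabetes_hits
    return shared, cluster_only


# Placeholder feature names; only glucose and sucrose come from the text.
cm_vs_control = {"glucose", "sucrose", "sphingolipid_x", "bile_acid_y"}
t2d_vs_nondiabetic = {"glucose", "sucrose", "metabolite_z"}

shared, cm_specific = split_signature(cm_vs_control, t2d_vs_nondiabetic)
```

Read this way, the 151 shared and 88 cluster-specific metabolites reported above would imply 239 differentially abundant metabolites in the cardiometabolic-versus-control contrast, assuming the two groups are disjoint and exhaustive.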

Discussion

In the present study, using unsupervised hard clustering, we identified two distinct endotypes of at-risk MASLD, namely, cardiometabolic MASLD and liver-specific MASLD. Both types were characterized by a severe liver phenotype at baseline; however, they showed different underlying biological profiles and distinct clinical progression patterns. These two newly defined types of MASLD could be robustly identified in several independent and well-characterized cohorts, using a simple algorithm based on six widely available traits: age, BMI, HbA1c, ALT, LDL cholesterol and triglycerides ( https://ulr-metrics.univ-lille.fr/masldclusters/ ). The two types of at-risk MASLD could not be distinguished by their liver phenotype assessed by histology nor by MRI, and they were both associated with an increased risk of incident chronic liver disease. The cardiometabolic MASLD was, however, specifically characterized by a higher prevalence of dyslipidemia, hypertension and dysglycemia, resulting in a high risk of incident cardiovascular disease and type 2 diabetes. In contrast, the liver-specific MASLD was characterized by a more pronounced elevation of liver enzymes at a younger age and showed limited risk of diabetes progression and incident cardiovascular disease. The liver-specific MASLD was also characterized by a specific genetic background with a higher frequency of the minor allele of PNPLA3 rs738409 and a higher polygenic risk score for hepatic fat content. Importantly, the proposed clustering outperformed its individual components in simultaneously predicting liver phenotype and future risk of the different clinical outcomes. As expected, several individual continuous variables also showed a good predictive value for predicting specific clinical outcomes in the overall UK Biobank population, namely, ALT for chronic liver disease and HbA1c for cardiovascular disease and incident diabetes. 
In contrast, the clustering approach surpassed all individual variables for simultaneously predicting the three outcomes. Of note, after adjustment for ALT in multivariable analysis, the risk of chronic liver disease became lower in the liver-specific cluster than in the control cluster, while it remained increased in the cardiometabolic cluster. Confirming the strong association between the risk of liver disease and ALT in the liver-specific cluster, this result also indicates that ALT may overestimate the risk of chronic liver disease when other clustering variables are not considered. Similarly, the positive association between the cardiometabolic cluster and cardiovascular risk became negative after adjustment for HbA1c, suggesting that HbA1c alone may overestimate the risk of cardiovascular disease, in which other clustering variables such as triglycerides or age may favor cardiovascular disease, independently of dysglycemia. Finally, in the liver-specific cluster, the elevated risk of incident diabetes was eliminated after adjustment for ALT, underscoring the specific role played by the liver in the physiopathology of dysglycemia 23 . Taken together, our findings highlight the potential of clustering to provide a more comprehensive risk assessment, identifying patients at risk for a range of liver and cardiometabolic diseases rather than focusing on a single condition. In addition, the resulting assignment of individuals into two clearly labeled clusters of at-risk MASLD facilitated the exploration of their biological nature. Specifically, the cardiometabolic cluster exhibited unique liver gene transcripts and pathways not present in type 2 diabetes, involving lipid transport, immune response and inflammation and vascular function-related pathways.
In addition, metabolomic analyses identified numerous metabolites common to both type 2 diabetes and the cardiometabolic cluster, mostly linked to dysglycemia but also some metabolites uniquely associated with the cardiometabolic cluster. These unique metabolites, including glycerophospholipids, sphingolipids and bile acid metabolites, indicate specific disturbances in lipid processing, protein and energy metabolism, and inflammation. The cardiometabolic cluster was also characterized by an increase of several gut microbiota metabolites previously linked to insulin resistance and diabetes pathogenesis, such as imidazole propionate, p -cresol glucuronide, phenylacetylglutamine, 4-hydroxyphenylacetylglutamine and phenylacetylglutamate 20 – 22 . Similarly, higher levels of p -cresol glucuronide and 4-hydroxyphenylacetylglutamine have been linked to cardiovascular toxicity and mortality 22 , 24 , 25 . These metabolites, which are produced by the gut microbiota from aromatic amino acids, might explain at least in part the increased cardiovascular risk observed in this cluster. In contrast, the liver-specific MASLD was more related to changes in lipid metabolism confined to the hepatocyte, in line with its specific genetic background. In this study we identify distinctive endotypes of at-

Methods

Study cohorts

ABOS cohort

ABOS is a prospective study ( NCT01129297 ) aiming to identify the key factors influencing the outcomes of bariatric surgery. A total of 1,545 participants enrolled between 2006 and 2021 at the Lille University Hospital, Lille, France, were included in the present analysis. All individuals provided written informed consent before inclusion. Ethical approval for the study was granted by the Comité de Protection des Personnes Nord Ouest VI (Lille, France). Demographic characteristics, anthropometric measurements, medical history, concomitant medication and laboratory tests were collected before surgery as previously described 37 – 40 . A 75 g oral glucose tolerance test was performed after overnight fasting at baseline and 1 year after surgery. Type 2 diabetes status was defined at baseline on the basis of a previous history of diabetes, use of antidiabetic medications, fasting plasma glucose ≥126 mg dl⁻¹ (7.0 mmol l⁻¹) and/or 2 h plasma glucose ≥200 mg dl⁻¹ (11.1 mmol l⁻¹) during oral glucose tolerance test, and/or HbA1c ≥6.5% (48 mmol mol⁻¹) 41 . Liver histology was obtained at baseline through a percutaneous liver needle biopsy performed during surgery as previously described 42 – 44 . All liver biopsies were analyzed at Lille University Hospital by two expert liver pathologists, according to the NASH Clinical Research Network (NASH CRN) scoring system, as previously described 45 , 46 . Briefly, pathologists were blinded to the patient’s clinical and biological data. The reports were drawn up using a standardized template adapted to the recommendations of the NASH CRN group. All biopsies obtained before 2011 were reanalyzed and adapted to NASH CRN recommendations. Liver biopsies from patients with ‘borderline NASH’ histology, or with borderline size or length, were reanalyzed by two expert pathologists. The diagnosis of MASH was made by pathologists in the simultaneous presence of steatosis, inflammation and ballooning.
Disease activity was subsequently graded with the nonalcoholic fatty liver disease activity score (NAS) according to specific histological features, as the unweighted sum of the scores for steatosis (0–3), lobular inflammation (0–3) and ballooning (0–2) ranging from 0 to 8. Liver fibrosis was scored from F0 to F4 (ref. 45 ).
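As an illustration of the scoring described above, the following minimal Python sketch computes the NAS as the unweighted sum of the three component grades and flags MASH when all three features are present. The function names are hypothetical, and treating "presence" as a grade of at least 1 is an assumption made for this sketch, not a rule stated in the text.

```python
def nas_score(steatosis, lobular_inflammation, ballooning):
    """Unweighted sum of the three NASH CRN component grades (0 to 8)."""
    limits = {
        "steatosis": (steatosis, 3),
        "lobular_inflammation": (lobular_inflammation, 3),
        "ballooning": (ballooning, 2),
    }
    for name, (grade, upper) in limits.items():
        if not 0 <= grade <= upper:
            raise ValueError(f"{name} grade must be between 0 and {upper}")
    return steatosis + lobular_inflammation + ballooning


def is_mash(steatosis, lobular_inflammation, ballooning):
    """Assumption: a feature counts as present when its grade is >= 1."""
    return steatosis >= 1 and lobular_inflammation >= 1 and ballooning >= 1
```

Under these assumptions, a biopsy graded steatosis 2, lobular inflammation 1 and ballooning 0 has a NAS of 3 but does not meet the MASH definition, because ballooning is absent.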

UZA cohort

The UZA cohort included 467 patients referred to the Obesity Clinic at Antwerp University Hospital, Edegem, Belgium, for suspected MASLD based on imaging and biochemistry data. The collection of clinical, anthropometric and histological data has been previously described 47,48. A percutaneous or laparoscopic-guided percutaneous liver needle biopsy was performed on participants with overweight/obesity as part of the Hepatic and Adipose Tissue and Functions in Metabolic Syndrome (HEPADIP) study (Belgian registration number B30020071389, Antwerp University Hospital File 6/25/125) as previously described 47. Liver histology was assessed according to the NASH CRN 45,46. Individuals with alcohol consumption above 30/20 g per day in men/women were excluded from the analysis. Written informed consent was obtained from all patients in both cohorts, and the studies were conducted in conformity with the Declaration of Helsinki.

MAFALDA cohort

A total of 264 participants with liver biopsy data from the MAFALDA cohort were included in the analyses 49. Briefly, consecutive individuals with morbid obesity eligible for bariatric surgery were recruited from May 2020 to June 2021 at Fondazione Policlinico Universitario Campus Bio-Medico, Rome, Italy. Preoperative clinical and laboratory data were collected using standardized procedures. An intraoperative liver biopsy was obtained. Liver histology was assessed according to the NASH CRN 45,46, as described above. Individuals with alcohol consumption above 30/20 g per day in men/women were excluded from the analysis. The MAFALDA study has been approved by the Local Research Ethics Committee (no. 16/20), and it was conducted in accordance with the principles of the Declaration of Helsinki. All participants gave written informed consent to the study.

Helsinki cohort

The Helsinki cohort enrolled 343 consecutive individuals with morbid obesity eligible for bariatric surgery and 42 consecutive individuals with a BMI ≥25 kg m−2 undergoing liver biopsy for suspected MASH, all recruited between 2006 and 2018 at the Helsinki University Hospital, Helsinki, Finland. A week before the liver biopsy, participants underwent clinical examination and blood sampling as previously described 50. Liver histology was assessed according to the NASH CRN 45,46, as described above. Individuals with alcohol consumption above 30/20 g per day in men/women were excluded from the analysis. The study was approved by the Local Research Ethics Committee at Helsinki University Hospital. All participants gave written informed consent to the study.

UK Biobank cohort

The UK Biobank is a large prospective cohort study that recruited approximately 500,000 participants (age 40–69 years) between 2006 and 2010 throughout the United Kingdom 51. Clinical and laboratory data were collected using highly standardized procedures. Medical diagnoses were obtained through linkage of hospital admissions, death and cancer registers from the National Health Service records (data fields 41270, 40001, 40002 and 40006). The UK Biobank study has been approved by the NorthWest Multicenter Research Ethics Committee (no. 21/NW/0157). All participants gave written informed consent to the study. Data used in this study were obtained under application number 37142. In the current study, we selected unrelated UK Biobank participants of European ancestry on the basis of our quality control pipeline, which has been described in detail previously 15,52,53, and we included individuals with BMI ≥25 kg m−2 and/or with type 2 diabetes as defined elsewhere 15. Participants were scanned at the UK Biobank Imaging Centre in Cheadle (United Kingdom) using a Siemens 1.5T MAGNETOM Aera as described in detail elsewhere 54,55. Briefly, a shortened modified Look-Locker inversion recovery (ShMOLLI) sequence was used to quantify liver T1, and a multi-echo spoiled gradient echo sequence was used to quantify liver iron and fat. Data were analyzed using LiverMultiScan Discover 4.0 software. Hepatic steatosis was defined by a proton density fat fraction (PDFF) >5.5% (ref. 54), and MASH by PDFF >5.5% together with iron-corrected T1 (cT1) >800 ms (refs. 54,56).

Cluster analysis

Six variables associated with MASLD physiopathology and increased risk of MASH were selected for clustering in ABOS, namely, age, BMI, HbA1c, ALT, LDL cholesterol and circulating triglycerides. Cluster analysis and identification of MASLD subtypes were performed on 1,389 ABOS participants (Fig. 1), after the exclusion of 54 patients with self-declared alcohol consumption above 50/60 g per day for women and men, respectively, at the first visit, to avoid any risk of inclusion of patients with alcohol-related liver disease; 58 participants with a BMI ≤30 kg m−2; 27 participants with missing values in clustering traits (that is, age, BMI, HbA1c, ALT, LDL cholesterol and circulating triglycerides); and 17 participants having absolute standardized values of 5 or higher in at least one of the clustering traits (Extended Data Fig. 1). The analysis was performed using the partitioning around medoids method in R (package ‘cluster’, version 2.1.4) 57, which is a more robust version of k-means clustering. Distances were computed as Euclidean distances using standardized variables scaled to a mean of 0 and a standard deviation of 1. To estimate the optimal number of clusters, we evaluated the silhouette widths 58 for each clustering, varying the number of clusters from three to ten. We determined the optimal number of clusters by choosing the configuration that yielded the highest silhouette coefficients, signifying well-delineated clusters whose members are closely related to one another and distinctly separate from individuals in other clusters. We then assessed the stability of the resulting clusters using the R function clusterboot from the fpc package (v.2.2-12), by resampling the original data 2,000 times and computing the Jaccard similarities of the original clusters to the most similar clusters in the resampled data. The mean (standard deviation) Jaccard similarity was 0.73 (0.07) across all clusters.
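To make the two cluster-quality measures mentioned above concrete, here is a minimal pure-Python sketch of the average silhouette width and of the best-match Jaccard similarity. It is not the authors' R code (they used the cluster and fpc packages). The function names are hypothetical, and the stability helper is a simplification: it assumes the caller has already restricted each original cluster to the participants present in the resample.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette width for one clustering (needs >= 2 clusters)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette widths need at least two clusters")

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:  # singleton cluster: width is conventionally 0
            widths.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


def jaccard(cluster_a, cluster_b):
    """Jaccard similarity between two clusters given as sets of participant IDs."""
    union = cluster_a | cluster_b
    return len(cluster_a & cluster_b) / len(union) if union else 1.0


def best_match_jaccard(original, resampled):
    """For each original cluster, similarity to its closest resampled cluster."""
    return [max(jaccard(o, r) for r in resampled) for o in original]
```

In a model-selection loop, one would compute `mean_silhouette` for each candidate number of clusters and keep the configuration with the highest value. Averaging `best_match_jaccard` over many resamples gives a per-cluster stability estimate of the kind reported in the text.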
Data from the UZA, MAFALDA and Helsinki cohorts were normalized using ABOS values for centering and scaling. Then, after the exclusion of participants having absolute standardized values of 5 or higher in at least one of the clustering traits, participants were allocated to the cluster they were most similar to, that is, the cluster whose ABOS-derived medoid was nearest in Euclidean distance. Data from the UK Biobank cohort were normalized using ABOS values for centering and scaling. Participants were allocated to the nearest cluster in the same way, after the exclusion of those with self-reported history or medical diagnosis of other causes of liver disease, with a medical diagnosis of the target longitudinal outcome at baseline, or having absolute standardized values of 5 or higher in at least one of the clustering traits. The Calinski–Harabasz Index was 263 for the ABOS cohort and reached 174 in the validation cohort, indicating well-defined clusters and confirming the transportability of the proposed stratification in diverse populations. In the UK Biobank cohort, encompassing a broader BMI range and less clinically extreme cases, the Calinski–Harabasz Index increased even further to 18,774, probably due to the larger and more diverse sample.
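The allocation rule for the validation cohorts can be sketched as follows: standardize the six traits with the reference (ABOS) mean and standard deviation, drop participants with any absolute standardized value of 5 or higher, then pick the cluster whose medoid is nearest in Euclidean distance. All numeric values below (reference means, standard deviations and medoid coordinates) are invented placeholders for illustration, as are the function and variable names. The actual ABOS parameters are not given in this section.

```python
import math

TRAITS = ("age", "bmi", "hba1c", "alt", "ldl", "triglycerides")

# Hypothetical reference statistics and medoids, NOT the published ABOS values.
REF_MEAN = {"age": 40.0, "bmi": 45.0, "hba1c": 6.0, "alt": 30.0,
            "ldl": 3.0, "triglycerides": 1.5}
REF_SD = {"age": 10.0, "bmi": 5.0, "hba1c": 1.0, "alt": 15.0,
          "ldl": 1.0, "triglycerides": 0.5}
MEDOIDS = {
    "A": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "B": [0.0, 0.0, 3.0, 0.0, 0.0, 2.0],
}


def standardize(person, ref_mean, ref_sd):
    """Center and scale each trait with the reference cohort's statistics."""
    return [(person[t] - ref_mean[t]) / ref_sd[t] for t in TRAITS]


def assign_cluster(person, ref_mean, ref_sd, medoids, max_abs_z=5.0):
    """Return the label of the nearest medoid, or None if the person is excluded."""
    z = standardize(person, ref_mean, ref_sd)
    if any(abs(value) >= max_abs_z for value in z):
        return None  # excluded as an extreme value, as described in the text
    return min(medoids, key=lambda label: math.dist(z, medoids[label]))


participant = {"age": 40, "bmi": 45, "hba1c": 9.0, "alt": 30,
               "ldl": 3.0, "triglycerides": 2.5}
print(assign_cluster(participant, REF_MEAN, REF_SD, MEDOIDS))
```

Reusing the reference cohort's centering and scaling, instead of re-standardizing each new cohort, keeps the medoid coordinates comparable across cohorts.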

Visualizing individual risk in relation to their phenotype

As a potential aid for assisting clinicians in defining individual profiles of patients with MASLD, we developed an app ( https://ulr-metrics.univ-lille.fr/masldclusters/ ).

Genotyping

In the ABOS cohort, genotyping was available for 1,259 participants and was performed using the Illumina Infinium assay 59. This analysis was conducted at the SNP&SEQ Technology Platform, Molecular Medicine, BMC, Husargatan 3, Uppsala, Sweden. Results were analyzed using the software GenomeStudio 2.0.3. The following variants were assessed: PNPLA3 rs738409 C>G (p.I148M), TM6SF2 rs58542926 C>T (p.E167K), MBOAT7 rs641738 C>T and GCKR rs1260326 C>T (p.P446L). In the UK Biobank, genotyping was available for approximately 490,000 individuals and was performed using two similar genotyping arrays (that is, the Affymetrix UK BiLEVE and UK Biobank Axiom arrays) as described elsewhere 60. The following variants were assessed: PNPLA3 rs738409 C>G (p.I148M), TM6SF2 rs58542926 C>T (p.E167K), MBOAT7 rs641738 C>T and GCKR rs1260326 C>T (p.P446L). The PRS-HFC was computed according to the originally reported formula 61.
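The weights of the PRS-HFC are listed in the footnotes of Extended Data Table 1, so the score can be sketched in a few lines of Python. The function name and the dictionary layout are hypothetical. Reading the "_012" suffix as the number of risk alleles carried (0, 1 or 2) is an assumption based on the usual additive coding.

```python
# Weights taken from the footnotes of Extended Data Table 1.
PRS_WEIGHTS = {"PNPLA3": 0.266, "TM6SF2": 0.274, "GCKR": 0.065, "MBOAT7": 0.063}


def prs_hfc(risk_allele_counts, include_pnpla3=True):
    """Weighted sum of risk-allele counts (assumed coding: 0, 1 or 2 per variant)."""
    score = 0.0
    for gene, weight in PRS_WEIGHTS.items():
        if gene == "PNPLA3" and not include_pnpla3:
            continue  # PRS-HFC without PNPLA3, the second score in the table
        count = risk_allele_counts[gene]
        if count not in (0, 1, 2):
            raise ValueError(f"{gene}: allele count must be 0, 1 or 2")
        score += weight * count
    return score


carrier = {"PNPLA3": 1, "TM6SF2": 0, "GCKR": 1, "MBOAT7": 1}
print(round(prs_hfc(carrier), 3))                         # with PNPLA3
print(round(prs_hfc(carrier, include_pnpla3=False), 3))   # without PNPLA3
```

For this hypothetical carrier the two scores are 0.394 and 0.128, which is in the range of the cluster medians reported in Extended Data Table 1.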

Long-term longitudinal outcomes

We analyzed the risk of developing hepatic and extrahepatic outcomes and overall mortality in the UK Biobank cohort. To estimate the incidence of liver outcomes, we selected 213,180 individuals without self-reported history or medical diagnosis of any liver disease (International Classification of Diseases 10th edition (ICD-10) B18, B19, C22.0, E83.0, E83.1, E88.0, I82.0, I85.0, I85.9, K70, K71, K72.1, K72.9, K74.1, K74.2, K74.3, K74.4, K74.5, K74.6, K75.2, K75.3, K75.4, K75.8, K75.9, K76.5, K76.6, K76.7, K76.8, K76.9, K83.0, R18 and Z94.4) at baseline and identified those who developed chronic liver disease (ICD-10 C22.0, I85.0, I85.9, K70, K72.1, K72.9, K73, K74.0, K74.1, K74.2, K74.6, K76.0, K76.6, K76.7, K76.8, K76.9 and Z94.4) across the clusters. Participants were excluded from the analyses if they received a medical diagnosis of competing liver diseases (ICD-10 B18, B19, E83.0, E83.1, E88.0, I82.0, K71, K74.3, K74.4, K74.5, K75.2, K75.3, K75.4, K75.8, K75.9, K76.5 and K83.0) before the diagnosis of liver outcome. To estimate the incidence of cardiovascular outcomes, we selected 195,739 individuals without self-reported history or medical diagnosis of chronic viral hepatitis (ICD-10 B18 and B19), other causes of liver disease (ICD-10 E83.0, E83.1, E88.0, I82.0, K70, K71, K74.3, K74.4, K74.5, K75.2, K75.3, K75.4, K75.8, K75.9, K76.5, K76.8, K76.9 and K83.0) and cardiovascular disease (ICD-10 I20–I25, I60–I64, I69 and G45) at baseline, and identified those who developed cardiovascular disease across the clusters. 
To estimate the incidence of type 2 diabetes, we selected 196,791 individuals without self-reported history or medical diagnosis of chronic viral hepatitis (ICD-10 B18 and B19), other causes of liver disease (ICD-10 E83.0, E83.1, E88.0, I82.0, K70, K71, K74.3, K74.4, K74.5, K75.2, K75.3, K75.4, K75.8, K75.9, K76.5, K76.8, K76.9 and K83.0) and type 2 diabetes as defined elsewhere 53 at baseline, and identified those who developed type 2 diabetes (ICD-10 E11 and E14) across the clusters. Detailed information about the UK Biobank methods and clinical diagnosis is provided in Supplementary Table 2 .
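Outcome definitions of this kind come down to matching recorded diagnosis codes against a list of ICD-10 categories and subcodes. The sketch below shows one way to do that in Python, using the chronic liver disease outcome list given above. The function names are hypothetical, and the normalization step assumes that recorded codes may arrive with or without the dot (for example "K76.0" or "K760"). Code ranges such as I20–I25 would need to be expanded into individual categories first.

```python
# Chronic liver disease outcome codes as listed in the text above.
CLD_OUTCOME_CODES = (
    "C22.0", "I85.0", "I85.9", "K70", "K72.1", "K72.9", "K73", "K74.0",
    "K74.1", "K74.2", "K74.6", "K76.0", "K76.6", "K76.7", "K76.8", "K76.9",
    "Z94.4",
)


def normalize_icd10(code):
    """Upper-case the code and drop the dot so both spellings compare equal."""
    return code.strip().upper().replace(".", "")


def matches_any(recorded_code, code_list):
    """True if the recorded code equals, or is a subcode of, any listed code."""
    recorded = normalize_icd10(recorded_code)
    return any(recorded.startswith(normalize_icd10(c)) for c in code_list)


def has_outcome(recorded_codes, code_list=CLD_OUTCOME_CODES):
    """True if at least one of a participant's recorded codes is in the list."""
    return any(matches_any(code, code_list) for code in recorded_codes)
```

Prefix matching means that a three-character category such as K70 captures all of its subcodes, while a fully specified code such as K74.6 matches only itself and any finer subdivisions.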

Liver transcriptomic data generation and normalization

Liver transcriptomic data were available for a subset of 831 participants from the ABOS cohort, as previously described 62 . Total RNA was extracted from 30 mg frozen liver biopsies for Affymetrix microarray analysis using TRIzol reagent (Thermo Fisher Scientific), followed by purification on RNeasy columns (Qiagen). RNA purity and quantity were assessed using a Nanodrop spectrometer (Thermo Fisher Scientific). RNA integrity was quantified using the Agilent RNA6000 Nano assay and an Agilent 2100 BioAnalyzer. Raw data from Affymetrix microarrays were first processed with robust multi-array average (RMA) with GC correction and scale intensities (CG-RMA-scale) as a normalization method.

Metabolomic data generation and normalization

In the ABOS cohort, nontargeted global metabolomic analysis was performed on plasma samples in 1,322 participants by Metabolon, using two independent platforms: ultrahigh performance liquid chromatography/tandem mass spectrometry optimized for basic species or acidic species, and gas chromatography–mass spectrometry. Raw data for metabolomics were transformed using log transformation and imputation with minimum observed values for each compound.
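The preprocessing step for one metabolite can be sketched as below: missing measurements are replaced by the minimum observed value for that compound, and values are then log-transformed. The function name is hypothetical, and the natural logarithm is an assumption, since the base is not stated in the text. Because the logarithm is monotonic, imputing before or after the transformation gives the same result.

```python
import math


def impute_min_and_log(values):
    """Replace missing values (None) by the compound minimum, then take the log."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("compound has no observed values")
    if min(observed) <= 0:
        raise ValueError("log transformation needs strictly positive values")
    floor = min(observed)
    return [math.log(floor if v is None else v) for v in values]


# One compound measured in four samples, one of them below detection (None).
print(impute_min_and_log([1.0, None, 100.0, 10.0]))
```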

Statistical analysis

Data were reported as median (interquartile range) for continuous variables and frequencies (percentages) for categorical variables. Clusters were compared using the Kruskal–Wallis test, chi-squared test or Fisher’s exact test, as appropriate. Raw P values were adjusted for multiple testing separately for clinical data, histological data and genetic data, using the Bonferroni method to control the family-wise error rate. Differences were considered statistically significant when adjusted P values were less than 0.05. For statistically significant variables, post hoc analysis was performed comparing pairwise the MASH-enriched MASLD clusters (2 and 5) and the combined nonenriched MASLD clusters (1, 3, 4 and 6) using the Dunn test, chi-squared test or Fisher’s exact test, as appropriate, with Bonferroni adjustment. Differential analysis of liver transcriptomic data across the clusters was performed using moderated t-tests from the R Bioconductor package limma v.3.60.4. The same methodology was also applied to metabolomic data after exclusion of xenobiotics. Differences were considered statistically significant when P values adjusted for multiple comparisons using the Benjamini–Hochberg correction (to control the false discovery rate) were less than 0.05 and the absolute value of the log2 fold change was greater than 0.26. Group comparisons for genes were represented using volcano plots. The numbers of differentially expressed genes between the various clusters were reported through Euler diagrams. Pathway enrichment on the transcriptome was performed with the R package clusterProfiler (v.4.7.1), based on GO-BP pathways. The GSEA method was run with the absolute value of the moderated t-test statistic as ranking metric. The P values of enriched pathways were adjusted using the Benjamini–Hochberg procedure, and an adjusted P value <0.05 was considered significant.
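The significance rule for the omics analyses combines two filters: a Benjamini–Hochberg adjusted P value below 0.05 and an absolute log2 fold change above 0.26. The sketch below implements the standard Benjamini–Hochberg step-up adjustment in plain Python and applies both thresholds. The function names are hypothetical, and the moderated t-tests that would produce the raw P values (limma in the original analysis) are not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_features(p_values, log2_fold_changes,
                         alpha=0.05, min_abs_log2_fc=0.26):
    """Flag features passing both the adjusted-P and the effect-size filter."""
    adjusted = benjamini_hochberg(p_values)
    return [
        q < alpha and abs(fc) > min_abs_log2_fc
        for q, fc in zip(adjusted, log2_fold_changes)
    ]


raw_p = [0.01, 0.04, 0.03, 0.5]
log2_fc = [0.5, 1.0, -1.0, 2.0]
print(benjamini_hochberg(raw_p))
print(significant_features(raw_p, log2_fc))
```

An absolute log2 fold change of 0.26 corresponds to a ratio of about 1.2 (or its inverse), so the effect-size filter discards features that differ by less than roughly 20% between groups.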
In the UK Biobank, clusters were compared using analysis of variance, Kruskal–Wallis test, chi-squared test or Fisher’s exact test, as appropriate, adjusted for multiple testing separately for clinical data and genetic data, using the Bonferroni method. Similarly, post hoc comparisons were carried out with Bonferroni correction. The incidence of chronic liver disease, cardiovascular disease and type 2 diabetes was defined as the composite occurrence of the clinical event or event-related death during follow-up. Then, the cumulative incidence of the clinical outcomes was computed according to the Aalen–Johansen method for chronic liver disease, cardiovascular disease and type 2 diabetes, taking into account the competing occurrence of other-cause death, and of selected liver disease (only in the case of chronic liver disease; see above for ICD-10 codes). Cause-specific HRs were calculated through Cox regressions, adjusted for age, sex and alcohol intake. The proportional hazard assumption was verified through the inspection of the Schoenfeld residuals. Sensitivity analyses were performed (1) including only individuals with BMI ≥27 kg m−2 and (2) excluding those with harmful alcohol consumption (>50/60 g per day for women/men). Statistical analyses and graphical representations were performed using R statistical software v.4.4.1 (R Foundation for Statistical Computing, Vienna, Austria).
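For readers unfamiliar with competing-risk estimation, the following minimal Python sketch shows the Aalen–Johansen cumulative incidence estimator for a single cause. At each event time, the probability of still being event-free just before that time is multiplied by the cause-specific hazard and accumulated. The function name and the status coding (0 = censored, 1 = event of interest, any other value = competing event) are assumptions of this sketch. It does not cover the Cox regressions or the covariate adjustment described in the text.

```python
def cumulative_incidence(times, statuses, cause=1):
    """Aalen-Johansen cumulative incidence for one cause.

    statuses: 0 = censored, `cause` = event of interest,
    any other nonzero value = competing event (assumed coding).
    Returns a list of (time, cumulative incidence) at each event time.
    """
    data = sorted(zip(times, statuses))
    n_at_risk = len(data)
    event_free = 1.0  # probability of no event of any type before current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = []
        while i < len(data) and data[i][0] == t:
            tied.append(data[i][1])
            i += 1
        d_cause = sum(1 for s in tied if s == cause)
        d_any = sum(1 for s in tied if s != 0)
        if d_any:
            cif += event_free * d_cause / n_at_risk
            event_free *= 1 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= len(tied)  # events and censored subjects leave the risk set
    return curve


# Toy follow-up data: event, competing death, censoring, event.
print(cumulative_incidence([1, 2, 3, 4], [1, 2, 0, 1]))
```

Unlike one minus the Kaplan–Meier estimate with competing events treated as censoring, this estimator does not overstate the probability of the outcome, because participants who die from another cause are no longer counted as being at risk of it.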

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Online content

Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41591-024-03283-1.

Supplementary information

Supplementary Information
Supplementary Table 1. The liver gene expression and plasma metabolites with significant differences between the cardiometabolic and liver-specific clusters (A), cardiometabolic and control (B) and liver-specific and control (C). Metabolites with significant differences between the cardiometabolic and liver-specific clusters (D), cardiometabolic and control (E) and liver-specific and control (F). Molecular features that were differentially expressed between type 2 diabetes and non-T2D groups, and between cardiometabolic and control clusters (G).
Supplementary Table 2. Definition of self-reported history of liver disease, cardiovascular disease and type 2 diabetes (UK Biobank data-field 20002) and ICD-10 codes used to define liver disease, cardiovascular disease and type 2 diabetes.
Reporting Summary

Extended data

Extended Data Table 1. Patient characteristics based on cluster allocation in the ABOS cohort (n = 1,389)

Variable	Cluster 1	Cluster 2	Cluster 3	Cluster 4	Cluster 5	Cluster 6	Adj-p Cluster 1–6
N	256	158	180	361	99	335	−
Clinical data
Age (years)	53 (10)	52 (11.75)	34 (16)	46 (11)	37 (15)	30 (10)	<0.001
Women n (%)	175 (68.4)	86 (54.4)	123 (68.3)	290 (80.3)	57 (55.6)	310 (92.5)	<0.001
BMI (kg/m²)	45.5 (7.5)	44.85 (9.4)	59.7 (7.93)	44.4 (7.2)	43.9 (6.5)	43.8 (5.9)	<0.001
Waist circumference (cm)	140 (19.75)	134 (20.5)	161 (20.25)	138 (17)	137 (15)	139 (14)	<0.001
Significant alcohol intake¹ (n)	9 (6.9)	7 (8.0)	6 (6.0)	15 (8.4)	3 (6.7)	12 (6.7)	1
Glucose profile
HbA1c (%)	6.2 (1.03)	9.2 (2.28)	5.8 (0.8)	5.8 (0.7)	5.9 (1.05)	5.4 (0.5)	<0.001
Fasting glucose (mmol/L)	6.1 (1.79)	10.24 (5.3)	5.49 (1.22)	5.55 (1.14)	5.83 (1.72)	5.11 (0.61)	<0.001
Fasting insulin (UI/L)²	13.9 (11.7)	15.1 (16.05)	16.75 (10.35)	13.7 (9.8)	19.65 (15.23)	13.7 (9.62)	<0.001
Lipid profile
Total cholesterol (mmol/L)	4.37 (0.84)	4.47 (1.33)	4.7 (0.96)	5.86 (0.86)	5.09 (0.89)	4.6 (0.89)	<0.001
HDL cholesterol (mmol/L)	1.16 (0.36)	0.98 (0.29)	1.11 (0.31)	1.16 (0.31)	1.01 (0.31)	1.14 (0.34)	<0.001
LDL cholesterol (mmol/L)	2.47 (0.75)	2.53 (1.05)	2.97 (0.81)	3.85 (0.7)	3.33 (0.9)	2.9 (0.77)	<0.001
Triglycerides (mmol/L)	1.4 (0.77)	2.34 (1.56)	1.27 (0.68)	1.49 (0.73)	1.61 (0.8)	1.11 (0.6)	<0.001
Liver function tests
AST (UI/L)	22 (10)	30 (18)	22 (11)	23 (8)	44 (20.75)	21 (9)	<0.001
ALT (UI/L)	25 (15)	39 (26)	26 (17)	26 (14)	75 (26.5)	21 (15)	<0.001
GGT (UI/L)	31 (24.25)	58 (71.75)	28.5 (21.25)	30 (25)	53.5 (47.75)	22 (16)	<0.001
Comorbidities
Hypertension n (%)	201 (78.5)	138 (87.3)	109 (60.6)	200 (55.4)	55 (55.6)	107 (31.9)	<0.001
Type 2 diabetes n (%)	140 (54.7)	156 (98.7)	50 (27.8)	98 (27.1)	41 (41.4)	23 (6.9)	<0.001
Dyslipidemia n (%)	137 (53.5)	132 (83.5)	75 (41.7)	332 (92.0)	59 (59.6)	83 (24.8)	<0.001
Medications
Anti-hypertensive drugs n (%)	180 (70.3)	125 (79.1)	62 (34.4)	139 (38.5)	34 (34.3)	37 (11)	<0.001
Oral glucose-lowering drugs n (%)	122 (47.8)	148 (94.3)	34 (18.9)	63 (17.5)	29 (29.3)	14 (4.2)	<0.001
Insulin n (%)	30 (11.8)	83 (52.5)	5 (2.8)	9 (2.5)	3 (3.0)	2 (0.6)	<0.001
Lipid-lowering drugs n (%)	112 (43.8)	95 (60.1)	18 (10.0)	52 (14.4)	10 (10.1)	9 (2.7)	<0.001
Statins n (%)	104 (40.6)	81 (51.3)	11 (6.1)	42 (11.6)	5 (5.1)	8 (2.4)	<0.001
Liver histology³
Steatosis grade ≥1 n (%)	213 (85.9)	150 (97.4)	150 (85.2)	303 (85.8)	90 (92.8)	213 (64.5)	<0.001
Lobular inflammation grade ≥1 n (%)	76 (31.4)	83 (54.6)	51 (30.4)	105 (30.1)	53 (55.8)	79 (24.6)	<0.001
Ballooning grade ≥1 n (%)	29 (12.0)	59 (38.8)	20 (11.8)	23 (6.6)	24 (25.3)	15 (4.7)	<0.001
MASH n (%)	16 (6.6)	51 (33.6)	14 (8.3)	16 (4.6)	23 (24.2)	8 (2.5)	<0.001
Fibrosis stage ≥2 n (%)	26 (11.3)	49 (33.3)	22 (13.3)	21 (6.3)	19 (20.0)	12 (3.9)	<0.001
Fibrosis stage 3–4 n (%)	15 (6.5)	32 (21.8)	7 (4.2)	9 (2.7)	15 (15.8)	4 (1.3)	<0.001
NAS score	2 (2)	3 (3)	1 (2)	1 (1)	3 (2.5)	1 (2)	<0.001
Genetics
PNPLA3 rs738409 n (CC/CG+GG)	129 (54.9)	79 (57.7)	95 (59.0)	195 (59.1)	31 (36.0)	189 (61.0)	0.009
TM6SF2 rs58542926 n (CC/CT+TT)	197 (84.9)	118 (86.8)	147 (90.7)	298 (90.0)	69 (80.2)	273 (87.2)	0.42
MBOAT7 rs641738 n (CC/CT+TT)	74 (32.2)	38 (27.5)	48 (29.8)	109 (32.8)	21 (24.4)	104 (33.3)	1
GCKR rs1260326 n (CC/CT+TT)	76 (32.9)	41 (29.7)	54 (33.5)	111 (33.6)	23 (26.7)	105 (33.5)	1
PRS-HFC+⁴	0.27 (0.27)	0.26 (0.27)	0.19 (0.33)	0.26 (0.27)	0.39 (0.41)	0.19 (0.27)	<0.001
PRS-HFC−⁵	0.13 (0.13)	0.13 (0.13)	0.13 (0.13)	0.13 (0.13)	0.13 (0.07)	0.13 (0.13)	1

Data were reported as median (interquartile range) for continuous variables and frequencies (percentages) for categorical variables. Clusters were compared using the Kruskal–Wallis test, chi-squared test or Fisher’s exact test, as appropriate. Differences were considered statistically significant when P values adjusted for multiple comparisons using Bonferroni correction, performed separately for clinical data, histological data and genetic data, were less than 0.05.
¹ Significant alcohol intake was defined as a daily consumption above 20 g in women and 30 g in men.
² Patients receiving insulin were excluded.
³ Liver histology was available from 1,325 participants.
⁴ PRS-HFC+: polygenic risk score calculated with the formula: prs = 0.266 × PNPLA3_012 + 0.274 × TM6SF2_012 + 0.065 × GCKR_012 + 0.063 × MBOAT7_012.
⁵ PRS-HFC−: polygenic risk score calculated without PNPLA3 with the formula: prs = 0.274 × TM6SF2_012 + 0.065 × GCKR_012 + 0.063 × MBOAT7_012.
Abbreviations: ALT, alanine aminotransferase; AST, aspartate aminotransferase; BMI, body mass index; eGFR, estimates of glomerular filtration rate; GCKR, glucokinase regulator; GGT, gamma glutamyltransferase; HbA1c, hemoglobin A1c; HDL, high-density lipoprotein; HOMA2-B, homeostasis model assessment 2 estimates of beta-cell function; HOMA2-IR, homeostasis model assessment 2 estimates of insulin-resistance; LDL, low-den

Extended data is available for this paper at 10.1038/s41591-024-03283-1.

Supplementary information

The online version contains supplementary material available at 10.1038/s41591-024-03283-1.

Peer review information

Nature Medicine thanks Ewan Pearson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Anna Maria Ranzoni, in collaboration with the Nature Medicine team.

Article Details

DOI	10.1038/s41591-024-03283-1
PubMed ID	39653777
PMC ID	PMC11645276
Journal	Nature Medicine
Year	2024
Authors	Violeta Raverdy, Federica Tavaglione, Estelle Chatelain, Guillaume Lassailly, Antonio De Vincentis, Umberto Vespasiani‐Gentilucci, S. U. Qadri, Robert Caïazzo, Hélène Verkindt, Chiara Saponaro, Julie Kerr‐Conte, Grégory Baud, Camille Marciniak, Mikaël Chetboun, Naima Oukhouya‐Daoud, Samuel Blanck, Jimmy Vandel, Lisa Olsson, Rima Chakaroun, Viviane Gnemmi, Emmanuelle Leteurtre, Philippe Lefèbvre, Joel T. Haas, Hannele Yki‐Järvinen, Sven Francque, Bart Staels, Carel W. le Roux, Valentina Tremaroli, Philippe Mathurin, Guillemette Marot, Stefano Romeo, François Pattou
License	Open Access — see publisher for license terms
Citations	108