Introduction

Systemic lupus erythematosus (SLE) is an autoimmune disease with a wide variety of clinical manifestations that can affect any organ. Approximately 50% of patients with SLE have lupus nephritis (LN) during the course of the disease, and up to 10% of patients with LN develop end-stage renal disease1,2. The mortality rate in patients with LN is higher than that in patients without LN2,3. Previous reports have demonstrated that renal function decline at baseline4,5 and delayed treatment responsiveness6,7,8 were independent risk factors for poor renal prognosis. Therefore, it is crucial to identify factors that can predict early treatment responsiveness. According to the 2003 classification by the International Society of Nephrology/Renal Pathology Society (ISN/RPS), LN was classified into six classes based solely on the degree of glomerular injury seen on renal histopathology9. Of the six classes, classes III and IV are especially important because of their high disease activity and poor renal prognosis1. The major scope of the 2003 classification was standardizing the definitions of pathologic findings, emphasizing clinically relevant lesions, and encouraging uniform and reproducible reporting across clinical centres. After the 2003 classification was published, various verification studies demonstrated its clinical usefulness10,11,12,13 and its high interobserver reproducibility in diagnosing LN14. However, several studies have suggested that further improvements to the 2003 classification are needed15,16,17. Non-glomerular lesions, such as vascular18 and tubulointerstitial lesions19,20,21,22,23, which were not included in the 2003 classification, were found to be important in predicting the prognosis in LN. Subsequently, the classification was revised by ISN/RPS in 2016 and published in 201824.

One of the major changes in the 2016 classification was the introduction of the modified semi-quantitative scoring system that included activity index (AI) and chronicity index (CI), which were originally published in 198325. AI and CI were introduced instead of the subclasses A, A/C, and C used for qualitative assessment of active or chronic lesions in the 2003 classification; subclass A was for purely active lesions, subclass A/C was for any combination of active and chronic lesions, and subclass C was for purely chronic lesions9. AI includes the following pathological findings: endocapillary hypercellularity, neutrophils/karyorrhexis, fibrinoid necrosis, hyaline deposits, cellular and/or fibrocellular crescents, and interstitial inflammation. CI includes the following pathological findings: global/segmental sclerosis, fibrous crescents, interstitial fibrosis (IF), and tubular atrophy (TA). Of all these parameters, the scores of fibrinoid necrosis and cellular/fibrocellular crescents are given double weight. Notably, the 2016 classification incorporated the evaluation of tubulointerstitial lesions into the quantitative scoring system in the form of AI for interstitial inflammation and CI for IF/TA, as opposed to the 2003 classification, which was based on glomerular lesions alone. Several definitions of the pathological findings have also been revised. To date, the clinical utility of the 2016 classification has not been fully investigated.
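The scoring rule described above can be illustrated with a short sketch. It assumes that every component is scored as an integer from 0 to 3 (consistent with the maximum component scores reported in the Results) and that only fibrinoid necrosis and cellular/fibrocellular crescents are doubled; the function and key names are hypothetical and are not taken from the ISN/RPS publication.

```python
# Minimal sketch of the AI/CI totals. Names are illustrative assumptions;
# each component score is assumed to be an integer from 0 to 3.
AI_WEIGHTS = {
    "endocapillary_hypercellularity": 1,
    "neutrophils_karyorrhexis": 1,
    "fibrinoid_necrosis": 2,                # double weight
    "hyaline_deposits": 1,
    "cellular_fibrocellular_crescents": 2,  # double weight
    "interstitial_inflammation": 1,
}
CI_WEIGHTS = {
    "global_segmental_sclerosis": 1,
    "fibrous_crescents": 1,
    "interstitial_fibrosis": 1,
    "tubular_atrophy": 1,
}


def weighted_index(scores, weights):
    """Sum the component scores after applying each component's weight."""
    total = 0
    for name, weight in weights.items():
        score = scores[name]
        if score not in (0, 1, 2, 3):
            raise ValueError(f"{name} must be scored 0-3, got {score!r}")
        total += weight * score
    return total


def activity_index(scores):
    """Activity index under the stated assumptions (range 0-24)."""
    return weighted_index(scores, AI_WEIGHTS)


def chronicity_index(scores):
    """Chronicity index under the stated assumptions (range 0-12)."""
    return weighted_index(scores, CI_WEIGHTS)
```

Under these assumptions a fibrinoid necrosis score of 2 contributes 4 points to AI, which is why the Results refer to fibrinoid necrosis "of 4 or 6 points".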

The aim of the present study was to compare the clinical usefulness of the 2016 classification with that of the 2003 classification by evaluating the achievement of complete remission (CR) and renal function decline in adult patients with first-onset class III/IV LN based on the Nagoya Kidney Disease Registry (N-KDR).

Results

Study participants

We screened 233 consecutive patients with LN in our renal biopsy registry between January 2004 and December 2014. We enrolled patients who underwent their first renal biopsy, were ≥ 16 years of age, met ≥ 4 American College of Rheumatology (ACR) criteria26 for SLE, and were classified as having class III or IV LN. We excluded patients with missing medical or pathological records (n = 9), a history of renal function deterioration (n = 4), conservative treatment without immunosuppressive therapy (n = 1), immunosuppression before induction therapy for LN (n = 49), an observational period of less than a month (n = 1), and fewer than six evaluable glomeruli (n = 1). Finally, 91 patients were enrolled in this study. We assessed their pathological findings and renal function decline during the observational duration (Analysis 1). Of these, six patients were excluded because adequate follow-up data were missing, and 85 were assessed for CR (Analysis 2). The detailed flowchart is shown in Fig. 1.

Figure 1

Flow chart of patient selection. Ninety-one patients with first-onset lupus nephritis were enrolled in this study and assessed for pathological findings and renal function decline during the observational duration (Analysis 1). After excluding 6 patients with missing follow-up data, achievement of complete remission was evaluated for 85 patients (Analysis 2). LN lupus nephritis; ACR American College of Rheumatology.

Baseline characteristics

Baseline characteristics are summarized according to the eGFR27 level at baseline for the lower eGFR group (eGFR < 60 ml/min/1.73 m2, n = 42 [46%]) and the higher eGFR group (eGFR ≥ 60 ml/min/1.73 m2, n = 49 [54%]) (Table 1). Patients in the lower eGFR group were older and had heavier proteinuria, more severe haematuria, and a higher proportion of nephrotic syndrome than those in the higher eGFR group. Anti-dsDNA antibody levels, serum C3 levels, and SLE disease activity index (SLEDAI)28 scores were not significantly different between the groups.

Table 1 Baseline characteristics (N = 91).

Pathological findings according to the 2003/2016 classification

The proportion of patients with class IV LN was higher in the lower eGFR group than in the higher eGFR group (71% [30/42] and 33% [16/49], respectively) (Fig. 2a), while there was no difference in the A and A/C subclasses (Fig. 2b). Both AI and CI were higher in the lower eGFR group than in the higher eGFR group (Fig. 2c,d). Among the pathological components of AI (Fig. 2e–j), patients in the lower eGFR group had higher scores of cellular/fibrocellular crescents (Fig. 2i) and interstitial inflammation (Fig. 2j) than those in the higher eGFR group. Among the pathological components of CI (Fig. 2k–n), patients in the lower eGFR group had higher scores of IF (Fig. 2m) and TA (Fig. 2n) than those in the higher eGFR group. Of all the pathological components, fibrinoid necrosis of 4 or 6 points and global/segmental sclerosis and fibrous crescents of 3 points were not observed in any of the patients (Fig. 2g,k,l). We analysed the relationship between AI/CI and class III/IV LN. The median AI in class IV was higher than that in class III (9 [interquartile range, IQR: 7–13] vs. 4 [IQR: 3–6], respectively). All patients with ≥ 11 points in AI had pathological class IV (n = 19) (Fig. 2o). The median CI in both class III and IV was 2 points, and there was no statistically significant difference between the classes (Fig. 2p).

Figure 2

Pathological findings according to the 2003/2016 classification. Baseline pathological findings are described according to the baseline eGFR levels (eGFR < 60 ml/min/1.73 m2, n = 42 and eGFR ≥ 60 ml/min/1.73 m2, n = 49). (a,b) are based on the 2003 classification: class III or IV (a) and subclass of A, C, and A/C (b). (c,d) are based on the 2016 classification: activity index (c) and chronicity index (d). (e–j) represent the following pathological components of activity index in the 2016 classification: endocapillary hypercellularity (e), neutrophils/karyorrhexis (f), fibrinoid necrosis (g), hyaline deposits (h), cellular/fibrocellular crescents (i), and interstitial inflammation (j). (k–n) represent the following pathological components of chronicity index in the 2016 classification: global/segmental sclerosis (k), fibrous crescents (l), interstitial fibrosis (m), and tubular atrophy (n). (o,p) are distributions of activity index (o) and chronicity index (p) in patients with class III and IV LN, respectively. N number, eGFR estimated glomerular filtration rate, LN lupus nephritis. *p < 0.01. **p < 0.001.

Correlation between the baseline characteristics and pathological findings

AI was inversely correlated with eGFR (Spearman’s correlation: Rs = − 0.40) and directly with urinary protein levels (Rs = 0.36), severity of haematuria (Rs = 0.40) and anti-dsDNA antibody level (Rs = 0.35). CI was inversely correlated with eGFR (Rs = − 0.52). Endocapillary hypercellularity was inversely correlated with serum C3 levels (Rs = − 0.43). Cellular crescents were inversely correlated with eGFR (Rs = − 0.34). Interstitial inflammation and IF/TA were also inversely correlated with eGFR (Rs = − 0.55, − 0.56, and − 0.53, respectively) (Table 2). AI was highly correlated with the scores of cellular/fibrocellular crescents (Rs = 0.84). CI was strongly correlated with the scores of interstitial inflammation (Rs = 0.91), IF (Rs = 0.95), and TA (Rs = 0.95). They demonstrated high correlation with each other as well (interstitial inflammation and IF, Rs = 0.93; interstitial inflammation and TA, Rs = 0.95; and IF and TA, Rs = 0.98) (see Supplementary Table S1 online).
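For readers unfamiliar with the statistic, the Spearman coefficient Rs reported here is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below is a plain-Python illustration of that definition, not the Stata routine used in this study; the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

Because the pathological scores are small integers with many ties, the average-rank handling matters more here than it would for continuous laboratory values.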

Table 2 Correlations between the baseline characteristics and pathological findings.

Medications during the induction therapy, clinical outcomes, and adverse events

The overall median observation period was 51 (IQR: 23–77) months, and there was no statistically significant difference between the groups (p = 0.50) (Table 3). The median interval from renal biopsy to the start of induction therapy was 1 (IQR: − 7–9) day, and there was no statistically significant difference between the two groups (p = 0.85). During induction therapy, prednisolone was prescribed for all patients. The proportion of patients in the lower eGFR group who received methylprednisolone pulse therapy was higher than that in the higher eGFR group (62% [26/42] vs. 51% [25/49], respectively). However, the proportion of patients who received any type of immunosuppressant was not statistically different between the two groups. Of all patients, five were lost to follow-up and four died during induction therapy. The remaining 82 patients received maintenance therapy, and of these, 66 had responded to induction therapy29. There was no statistically significant difference in the content of maintenance treatment between the groups. Overall, 54/85 patients achieved CR; the cumulative incidence of CR in the lower eGFR group was lower (38%, 15/39) than that in the higher eGFR group (85%, 39/46). Overall (n = 91), 16 patients developed a 1.5-fold increase in sCr, eight patients had doubling of sCr, and two patients reached end-stage renal disease (ESRD) during the entire observation period. Six patients died, and all of them were in the lower eGFR group. Regarding the adverse events after the initiation of induction therapy, the incidence of steroid-induced diabetes was significantly higher in the lower eGFR group (52%, 22/42) than in the higher eGFR group (31%, 15/49) (see Supplementary Table S2 online).

Table 3 Medication during 6-month induction therapy and clinical outcomes.

Survival curves for renal function decline and CR

The cumulative incidence of renal event (1.5-fold increase in sCr)-free survival and CR are illustrated in Fig. 3. Time to CR was assessed within 5 years from the initiation of induction therapy because none of the patients achieved CR after 5 years. The baseline eGFR levels were not associated with renal function decline (p = 0.80) (Fig. 3a), but patients in the higher eGFR group were more likely to achieve CR than were those in the lower eGFR group (p < 0.001) (Fig. 3b). Similarly, the presence of nephrotic syndrome was not associated with renal function decline (p = 0.84) (Fig. 3c), but patients without nephrotic syndrome were also more likely to achieve CR than those with nephrotic syndrome (p = 0.006) (Fig. 3d).

Figure 3

Survival curves of clinical outcomes. Kaplan–Meier plots are described according to the baseline eGFR levels (a,b), and with or without nephrotic syndrome (c,d). The cumulative incidence of renal event (1.5-fold increase in serum creatinine, sCr)-free survival is indicated on the y-axis (a,c), and that of complete remission is indicated on the y-axis (b,d). Duration (months) from the initiation of the induction therapy is indicated on x-axis. eGFR estimated glomerular filtration rate; NS nephrotic syndrome.

Clinical predictors of renal function decline

Baseline disease activity metrics (i.e. eGFR less than 60 ml/min/1.73 m2, presence of nephrotic syndrome, and anti-dsDNA antibody and serum C3 levels) were not associated with renal function decline (Table 4). Patients with class IV LN did not differ significantly from those with class III LN in terms of renal function decline (hazard ratio, HR [95% confidence interval, CI] 0.58 [0.21–1.60]). Similarly, patients with the A/C subclass did not differ significantly from those with the A subclass (HR [95% CI] 0.90 [0.33–2.42]). Regarding the 2016 classification, higher CI tended to be associated with renal function decline (HR [95% CI] 1.18 [0.99–1.40]), whereas higher AI was not associated with it (HR [95% CI] 1.00 [0.88–1.15]). Higher CI was identified as an independent predictor of renal function decline after adjusting for eGFR and urinary protein level (adjusted HR [95% CI] 1.24 [1.01–1.53]) (Model 1 in Table 4). Higher scores of IF and lower scores of hyaline deposits, which were chosen via the forward–backward stepwise selection method, were identified as independent predictors of renal function decline (adjusted HR [95% CI] 2.66 [1.43–4.93] and 0.45 [0.21–0.97], respectively). Scores of global/segmental sclerosis were not associated with renal function decline after adjustments for eGFR, urinary protein levels, and pathologically relevant factors (adjusted HR [95% CI] 0.40 [0.14–1.13]) (Model 2 in Table 4).

Table 4 Associated factors for renal function decline.

Identification of clinical predictors of CR

Baseline renal function decline (eGFR < 60 ml/min/1.73 m2) was associated with a lower likelihood of achieving CR, while nephrotic syndrome, anti-dsDNA antibody, and serum C3 levels were not associated with CR (Table 5). Patients with class IV LN did not differ significantly from those with class III LN in terms of achieving CR (HR [95% CI] 0.67 [0.39–1.15]). Similarly, patients with the A/C subclass did not differ significantly from those with the A subclass (HR [95% CI] 0.82 [0.48–1.40]). Regarding the 2016 classification, higher AI and higher CI were each associated with failure in achieving CR (HR [95% CI] 0.89 [0.82–0.96] and 0.70 [0.67–0.82], respectively). AI/CI was adjusted for clinically relevant factors, such as baseline eGFR levels and presence of nephrotic syndrome (Model 1 in Table 5). The association between AI and CR was no longer significant after adjustments for eGFR and urinary protein levels (adjusted HR [95% CI] 0.99 [0.91–1.08]). Higher CI was identified as an independent predictor of failure in achieving CR (adjusted HR [95% CI] 0.75 [0.64–0.88]). Cellular crescents were associated with CR; however, they were not selected by the forward–backward stepwise selection method. The scores of interstitial inflammation were also adjusted for eGFR and urinary protein levels (Model 2 in Table 5). A higher interstitial inflammation score was identified as an independent predictor of failure in achieving CR (adjusted HR [95% CI] 0.39 [0.25–0.61]).

Table 5 Associated factors for complete remission.

Discussion

We demonstrated the clinical usefulness of the 2016 classification using a multivariable model approach in which clinically relevant factors, such as eGFR and urinary protein levels, were taken into consideration. Detailed analysis of the 2016 classification allowed us to better comprehend the clinical importance of evaluating the interstitial lesions. This is the first study to evaluate the utility of the 2016 classification in patients with first-onset class III/IV LN by comparison with the 2003 classification in terms of predicting two clinically important outcomes, CR and renal function decline.

In the present study, CI was associated with renal function decline and CR independently of eGFR and urinary protein levels, mainly due to its high correlation with the scores of interstitial lesions. Both AI and CI were predictive of CR in the univariable analysis. Of the components of AI, interstitial inflammation was associated with CR, and of the components of CI, IF was independently associated with renal function decline. Therefore, it is crucial to assess interstitial lesions in order to predict renal prognosis in patients with LN. In contrast, AI was not associated with CR after adjusting for eGFR and urinary protein levels. Cellular crescents, which were highly correlated with AI, had a moderate correlation with eGFR and urinary protein levels. These correlations probably attenuated the association between AI and CR. Thus, we demonstrated the utility of CI and the importance of assessing interstitial regions in predicting renal prognosis, as previously reported19,20,21,22,23.

In our study, however, we did not identify active glomerular lesions as potential risk factors for poor renal prognosis. Crescentic lesions and fibrinoid necrosis were not associated with renal function decline in our study, although previous reports showed them to be indicators of poor renal prognosis4,19,23,29,30. Hyaline deposits were, if anything, inversely associated with renal function decline in the present study, whereas Austin et al.25 adopted them as an active indicator associated with prognosis. A recent study of clinical and histopathologic predictors of renal outcomes in LN demonstrated that wire loops, or hyaline deposits, were associated with eGFR recovery rather than decline23. This is consistent with our results. There are two possible reasons for these discrepancies. One is the improvement of treatment for LN over time. Our patients received immunosuppressant therapy depending on their disease activity, and as many as 80.5% of them responded to the treatment. Among active glomerular lesions, hyaline deposits, or subendothelial deposits, might represent an early pathological change of LN that is likely to heal easily with immunosuppressive treatment. The other reason is the difference in patient background. Most of the previous studies included first-onset LN patients as well as those who had already been treated for SLE. In contrast, we included only first-onset LN patients without previous immunosuppressive treatment. Because active glomerular lesions of LN are considered to be reversible, we believe that they did not reflect the long-term renal prognosis in our study.

We suggest that the pathological classification system should be improved by investigating the effects of each pathological component through an evidence-based process, such as that used for the MEST score in the Oxford classification of IgA nephropathy31. Our results suggest that treatment resistance factors, such as interstitial lesions, and treatment response factors, such as hyaline deposits, should be considered separately. Further investigations are required to identify the pathological findings that are associated with the clinical outcomes and to determine their weights in the scoring system.

There were several limitations to this study. First, this was a retrospective observational study. However, to the best of our knowledge, this is the largest multi-centre cohort study of adult patients with first-onset class III/IV LN, and the results are likely to be generalizable to various clinical settings. Second, there might have been substantial differences in the treatment strategies between the hospitals. There was no unified treatment protocol, and treatment was decided at the discretion of the attending doctors. Potential changes in the treatment strategies over the course of the study period might also have affected the clinical course of LN. However, these results reflect real-world data.

In conclusion, we demonstrated that comprehensive and quantitative assessments of the renal biopsy specimen based on the 2016 classification can provide useful information to predict the renal prognosis in patients with first-onset class III/IV LN. Of the pathological findings, interstitial lesions were strong predictors of both short- and long-term renal prognoses. Further prospective validation studies are currently underway.

Methods

Patient selection and study design

This study was a retrospective, multi-centre cohort study. Primary LN was diagnosed in 233 consecutive patients from N-KDR between January 2004 and December 2014. Inclusion criteria were as follows: (1) diagnosed at first biopsy, (2) aged ≥ 16 years, (3) fulfilled ≥ 4 ACR criteria26, and (4) diagnosed with class III/IV LN. Exclusion criteria were as follows: (1) no medical or pathological records, (2) history of renal function decline, (3) no induction therapy, (4) previous immunosuppression, (5) less than 1-month observation period, and (6) total evaluable number of glomeruli less than 6. A history of renal function deterioration was defined as follows: (1) renal atrophy at diagnosis or (2) continuous decline in estimated glomerular filtration rate (eGFR) < 60 ml/min/1.73 m2 within 3 months prior to diagnosis. Induction therapy was defined as the 6-month immunosuppressive medications for remission induction for LN. Previous immunosuppression was defined as a history of other immunosuppressive therapies ≥ 2 weeks before the initiation of induction therapy for LN. Overall, 91 patients with first-onset class III/IV LN and new prescriptions of any immunosuppression were observed between January 2004 and July 2016; they were followed until ESRD or death, whichever came first, or until the last available data on urinary protein or sCr. All of them were followed up at the following 16 nephrology centres: Nagoya University Hospital, Anjyo Kosei Hospital, Ogaki Municipal Hospital, Kasugai Municipal Hospital, Ichinomiya Municipal Hospital, Konan Kosei Hospital, Japanese Red Cross Nagoya Daiichi Hospital, Yokkaichi Municipal Hospital, Handa City Hospital, Tosei General Hospital, Chubu Rosai Hospital, Chutoen General Medical Center, Toyota Kosei Hospital, Gifu Prefectural Tajimi Hospital, Tsushima City Hospital, and Nagoya Memorial Hospital. All patients provided written informed consent. The study was approved by the Ethics Committee of the Nagoya University (approval number: 2010-1135-4) and adhered to the Declaration of Helsinki.

Baseline characteristics

The baseline was defined as the time just prior to the initiation of induction therapy for LN. The clinical data included sex, age, sCr, eGFR, anti-dsDNA antibody level, serum C3 level, 24-h urinary protein excretion (g/day) or urinary protein-to-creatinine ratio (g/gCr), haematuria, and SLEDAI score28. eGFR was estimated using the equation proposed by the Japanese Society of Nephrology: $\mathrm{eGFR}\ [\mathrm{ml/min/1.73\ m^2}] = 194 \times \mathrm{sCr}^{-1.094} \times \mathrm{Age}^{-0.287} \times 0.739\ \text{[if female]}$27. The severity of haematuria was expressed as −/+/++/+++. Nephrotic syndrome was defined as urinary protein ≥ 3.5 g/day or urinary protein-to-creatinine ratio ≥ 3.5 g/gCr, together with serum albumin < 3.0 g/dl.
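The baseline definitions above can be expressed as a small sketch. It assumes the coefficient 194 of the published Japanese eGFR equation, sCr in mg/dl, age in years, and serum albumin in g/dl; the function names are hypothetical and are not part of the study's analysis code.

```python
def egfr_japanese(scr_mg_dl, age_years, female):
    """Estimate eGFR (ml/min/1.73 m2) with the Japanese equation.

    Assumes sCr in mg/dl and age in years; 0.739 is the female factor.
    """
    if scr_mg_dl <= 0 or age_years <= 0:
        raise ValueError("sCr and age must be positive")
    egfr = 194.0 * scr_mg_dl ** -1.094 * age_years ** -0.287
    return egfr * 0.739 if female else egfr


def egfr_group(egfr):
    """Assign the baseline group used in this study (cut-off 60)."""
    return "lower" if egfr < 60 else "higher"


def has_nephrotic_syndrome(urinary_protein, serum_albumin_g_dl):
    """Apply the study definition of nephrotic syndrome.

    urinary_protein is given in g/day or g/gCr; albumin is assumed in g/dl.
    """
    return urinary_protein >= 3.5 and serum_albumin_g_dl < 3.0
```

For example, a 40-year-old man with an sCr of 1.0 mg/dl has an estimated eGFR of roughly 67 ml/min/1.73 m2 and would fall into the higher eGFR group, whereas a woman with the same values would fall into the lower eGFR group.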

Pathological findings

The renal pathological findings of all patients (n = 91) were assessed according to the 2003 and 2016 ISN/RPS classifications9,24. All of the biopsy samples from the 16 facilities were processed at the Department of Nephrology in Nagoya University Hospital. Renal biopsy specimens were evaluated under light microscopy separately by two nephrologists (A.H and M.K) under the supervision of one experienced nephropathologist (M.N). The stains used included periodic acid Schiff (PAS), periodic acid-methenamine-silver (PAMS), and Masson’s trichrome stains. In cases of conflicting interpretations, a conclusion was reached through discussion. The scores of AI and CI were calculated based on the 2016 classification.

Medications during induction and maintenance therapy

All drugs used during induction and maintenance therapy were investigated. Induction therapy was defined as the immunosuppressive therapy given during the first 6 months of treatment for LN. Maintenance therapy was defined as the immunosuppressive therapies administered after the 6-month induction therapy. The drugs included prednisolone, methylprednisolone pulse, calcineurin inhibitors (cyclosporine or tacrolimus), cyclophosphamide, azathioprine, mizoribine, mycophenolate mofetil and rituximab.

Adverse events

Adverse events after the initiation of induction therapy included cardiovascular disease, cerebrovascular disease, femoral head osteonecrosis, steroid-induced diabetes, gastric ulcers, first infectious disease that required hospitalization, herpes zoster or cytomegalovirus infections that required medications, and cancer. Steroid-induced diabetes was defined as the initiation of new antidiabetic medications after the start of induction therapy.

Definition of clinical outcomes

The primary outcome was renal function decline, which was defined as a 1.5-fold increase in sCr (i.e. a 50% increase) from the baseline level. The secondary outcome was the achievement of CR, which was defined as achievement of both proteinuria < 0.5 g/gCr or g/24 h and recovery of normal renal function32. Normal renal function was defined as (1) returning to the sCr level before the onset of LN or (2) sCr < 1.0 mg/dl (if male) or < 0.7 mg/dl (if female) if the past sCr level was unknown. Treatment response to induction therapy was assessed at 6 months after the initiation of induction therapy and was defined as both a ≥ 50% decrease in proteinuria from the baseline to at least sub-nephrotic levels and stabilization (± 25%) or improvement in sCr (but not completely reverting to normal)33. Doubling of sCr was defined as a doubling of the sCr level from the baseline value. ESRD was defined as the disease stage that required initiation of dialysis or renal transplantation.
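The outcome definitions above translate into simple threshold checks, sketched below. Two interpretations are assumptions of this sketch and are not stated in the text: an "increase" is treated as reaching or exceeding the threshold (≥), and "returning to" the pre-onset sCr is treated as being at or below it. All function names are hypothetical.

```python
def renal_function_decline(baseline_scr, current_scr):
    """Primary outcome: sCr has reached 1.5 times the baseline value."""
    return current_scr >= 1.5 * baseline_scr


def doubling_of_scr(baseline_scr, current_scr):
    """sCr has reached twice the baseline value."""
    return current_scr >= 2.0 * baseline_scr


def normal_renal_function(current_scr, female, pre_onset_scr=None):
    """Normal renal function as defined for complete remission.

    If the pre-onset sCr is known, the current value must have returned to it
    (assumed here to mean <=); otherwise the sex-specific cut-off applies.
    """
    if pre_onset_scr is not None:
        return current_scr <= pre_onset_scr
    return current_scr < (0.7 if female else 1.0)


def complete_remission(proteinuria, current_scr, female, pre_onset_scr=None):
    """Secondary outcome: proteinuria < 0.5 (g/gCr or g/24 h) with normal function."""
    return proteinuria < 0.5 and normal_renal_function(
        current_scr, female, pre_onset_scr
    )
```

Both conditions must hold at the same assessment for CR, so a patient whose proteinuria has resolved but whose sCr remains above the cut-off is not counted as being in CR.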

Statistical analysis

Continuous variables with asymmetric distribution are presented as median [IQR]. Categorical variables are expressed as percentages. Spearman’s correlation coefficients (Rs) were used to examine the relationships between the continuous variables. The cumulative probability of attaining the outcomes was calculated using the Kaplan–Meier method, and the log-rank test was employed for hypothesis testing. The time to each clinical outcome was calculated from the date of the initiation of induction therapy to the date of the clinical outcome. Loss to follow-up, ESRD, and all-cause death were censored. In order to use the 2016 classification for quantitative prognostic evaluation, we performed an exploratory investigation of the mutual correlations of its components and their relevance to the renal prognosis using Rs. The proportional hazards assumption for covariates was tested using scaled Schoenfeld residuals. Both baseline and pathological data were examined using univariable and multivariable Cox proportional hazards models in order to identify independent predictors associated with the clinical outcomes. Covariates included both the clinical and pathological findings, and we selected pathological components using a stepwise method to avoid multicollinearity among these findings. All statistical models were fitted using complete case analysis. The level of statistical significance was set at p value < 0.05. All statistical analyses were performed using Stata SE v14.0 (STATA Corp, 4905 Lakeway Drive College Station, Texas 77845-4512, USA, www.stata.com).
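To make the survival analysis concrete, the following is a minimal plain-Python sketch of the Kaplan–Meier product-limit estimator. It is an illustration of the method only, not the Stata procedure used in this study, and the function name and input layout are assumptions. Each patient contributes a follow-up time and a flag that is true when the outcome occurred and false when the observation was censored (loss to follow-up, ESRD, or death in this study).

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with an event.

    times:  follow-up durations (e.g. months from induction therapy)
    events: True if the outcome occurred at that time, False if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        occurred = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            occurred += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if occurred:
            survival *= 1 - occurred / at_risk
            curve.append((t, survival))
        # Events and censored observations both leave the risk set.
        at_risk -= removed
    return curve
```

For an outcome such as CR, the cumulative incidence at each time is one minus the event-free probability returned by this estimator.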