Introduction

Although differentiated thyroid carcinoma (DTC) is the most frequent pediatric endocrine carcinoma, it is a rare disease in childhood and adolescence [1]. Its incidence increases throughout adolescence, particularly among females [2]. At the onset of DTC, pediatric patients have a higher risk of cervical lymph node and lung metastases than their adult counterparts [3]. However, the prognosis is favorable, and the mortality rate is very low [4,5,6]. In this particular setting, tailoring management to each patient's individual risk of relapse may avoid unnecessary treatment.

For the first time in 2015, the American Thyroid Association (ATA) published specific guidelines for pediatric patients affected by DTC [7]. These guidelines introduced a new classification, based on post-surgical features (i.e., TNM), which divides patients with DTC into low, intermediate, and high risk of relapse. The goal of this stratification system was not strictly to define the risk of disease mortality but rather to identify those patients at higher risk of disease persistence/recurrence, in order to recognize who should undergo further treatment. However, this risk classification, derived directly from the adult guidelines [3], has never been extensively validated in a large pediatric population.

The initial treatment consists of total thyroidectomy followed by postoperative radioactive iodine therapy (RAIT) whenever indicated. The effective role and benefit of RAIT has never been fully clarified, especially in children at low and intermediate risk, whereas a significant benefit in terms of recurrence rate and overall survival was reported in DTC patients at highest risk [8]. The goal would be to maintain the low disease-specific mortality currently experienced by children with DTC as well as to reduce the potential complications of therapy and over-treatment [9].

Since an increasing incidence of DTC has been observed [10], the detection of factors predicting treatment response and survival is of clinical interest.

In adults, the pre-ablation stimulated thyroglobulin (sTg) has shown high predictive value in identifying the risk of disease persistence/recurrence after initial treatment and overall survival [3, 8, 11], regardless of the DTC stage. In contrast, the role of sTg in pediatric DTC patients remains to be fully established [12,13,14,15]. Likewise, the usefulness of the ATA risk classification in children with DTC has been evaluated in only a few studies [16,17,18,19,20,21], and the 1-year response-to-initial-therapy categories presented in the 2015 adult guidelines (i.e., excellent response, biochemical incomplete response, structural incomplete response, and indeterminate response) [3] have been tested in only a few pediatric series.

In this clinical scenario, the aim of this study was to investigate the role of some risk factors, such as sTg and ATA risk classification, in predicting treatment response at 1 year after RAIT.

The second aim was to investigate whether the same variables and the dynamic risk classification assessed 1 year after RAIT, derived from the adult ATA guidelines [3], may predict the last disease status.

Materials and methods

Patients

The DTC databases of six Italian Nuclear Medicine Departments were retrospectively screened to retrieve all DTC pediatric patients (age 0–18) treated between 1990 and 2020.

Inclusion criteria were histological diagnosis of DTC; age less than or equal to 18 years at the time of diagnosis; surgery (thyroidectomy plus lymph-node dissection, if necessary) and RAIT performed; and at least 12 months of follow-up after RAIT.

Exclusion criteria were age above 18 years at diagnosis; the absence of at least 12 months of follow-up; and the lack of RAIT. All selected DTCs were reclassified according to the latest TNM version [22].

A previous cancer was not an exclusion criterion: we decided to include patients with previous radiotherapy or chemotherapy treatments to evaluate the potential predictive and prognostic role of this feature.

All patients were admitted to our Nuclear Medicine Departments for the ablation of thyroid remnant and any subsequent radiometabolic therapies if needed, according to the AIMN (Italian Association of Nuclear Medicine), EANM (European Association of Nuclear Medicine), and ATA guidelines in force during each patient’s treatment period [3, 7, 22]. The administered activity of RAI was established empirically according to the risk class based on the TNM staging of the American Joint Committee on Cancer/International Union against Cancer in use at the time, the status of the disease, and a scintigraphic evaluation. In selected cases, scintigraphy was complemented by dosimetry (when feasible and available), and the RAI activity to be administered was estimated with a dosimetric approach.

The main epidemiological features (gender, age at diagnosis, puberty status, family history of DTC, previous oncological diseases, and related therapies), post-surgical histopathological data (i.e., tumor histology, tumor size, capsular invasion, vascular invasion, presence of Hashimoto thyroiditis, multicentricity, lymph node involvement, and distant metastases), biochemical data at the time of the first RAIT (TSH, Tg, anti-Tg antibody measurements), ATA risk class (low, intermediate, or high) [7], and information related to the RAI administration (first RAI activity administered, total amount of RAI administered, total number of radiometabolic therapies) were collected.

This study was approved by the ethics committee of ASST Spedali Civili di Brescia Hospital (NP 4258), as the principal investigating center, and by the ethics committees of the other participating hospitals.

Laboratory analysis

Serum Tg was measured using different immunoassays (radioimmunoassay, electrochemiluminescence, and chemiluminescence-ECLIA) according to each local institutional protocol and expressed in ng/ml. Normal ranges were 0–40 ng/ml (radioimmunoassay), < 55 ng/ml (electrochemiluminescence), and 3.5–77 ng/ml (ECLIA), respectively. Tg antibodies (TgAb) were measured using the passive agglutination method or chemiluminescence and dichotomized as positive (interfering with Tg measurement) or negative (not interfering) according to the normal range of each local laboratory. TSH levels were measured by electrochemiluminescence immunoassay and expressed in mIU/l.

Follow-up and outcomes

The median follow-up time was 133 ± 100 months (range 12–360 months). Patients were followed every 6–12 months with physical examinations and laboratory measurements (Tg on levothyroxine therapy or sTg in selected cases, TgAb, and TSH level) and with several imaging procedures, such as neck ultrasound, RAI diagnostic whole body scan (DxWBS), or others as appropriate. The disease status was recorded and updated at each evaluation. DxWBS was performed after the administration of 370 MBq of RAI about 12 months after the first radiometabolic therapy in all cases, except for patients who received a new RAIT for persistence of disease or incomplete response.

The clinical status of each patient 1 year after the first RAIT was reclassified according to the response-to-therapy categories of the 2015 ATA guidelines [3], which identify four classes of response (excellent response, biochemical incomplete response, structural incomplete response, and indeterminate response) based on a combination of imaging (RAI WBS, neck ultrasound, and any additional imaging exams) and biochemical (TSH, Tg, and TgAb) findings.

Any additional therapies (surgery or further RAIT) were decided based on either radiological evidence of persistent/progressive disease (structural disease), evidence of persistent positive Tg, or rising Tg/TgAb (biochemical disease) in compliance with the guidelines in force at that time.

Furthermore, at the last control, the patients were classified as having no evidence of disease (NED) or persistent disease, based on a combination of laboratory and imaging features. NED was defined by the absence of cervical lymph node metastases or local relapse on a recent neck ultrasound, no evidence of structural disease, and either suppressed Tg < 1 ng/ml or sTg < 2 ng/ml (without the presence of TgAb) [23]. In case of positive TgAb, NED was assigned if anti-Tg antibodies were declining with a concomitant negative neck ultrasound and/or negative DxWBS [23]. Pediatric patients who did not fulfill the aforementioned criteria for NED at the last follow-up were defined as having persistent disease.
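The NED definition above can be read as a small decision rule. The following is only an illustrative sketch of one possible reading, not the procedure actually used in the study: the class `LastVisit`, its field names, and the function `is_ned` are hypothetical, and the handling of the TgAb-positive branch (declining antibodies plus at least one negative imaging exam, after excluding structural disease) is an interpretation of the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LastVisit:
    """Hypothetical record of the findings at the last control."""
    structural_disease: bool               # any structural disease, incl. nodal/local relapse on neck ultrasound
    tgab_positive: bool                    # TgAb interfering with the Tg measurement
    suppressed_tg: Optional[float] = None  # ng/ml, measured on levothyroxine
    stimulated_tg: Optional[float] = None  # ng/ml, measured under TSH stimulation
    tgab_declining: bool = False
    neck_us_negative: bool = False
    dxwbs_negative: bool = False


def is_ned(v: LastVisit) -> bool:
    """Return True for 'no evidence of disease', False for persistent disease."""
    # Assumption: structural disease rules out NED in every branch.
    if v.structural_disease:
        return False
    if v.tgab_positive:
        # Tg is unreliable here: rely on the antibody trend plus imaging.
        return v.tgab_declining and (v.neck_us_negative or v.dxwbs_negative)
    # TgAb-negative: suppressed Tg < 1 ng/ml or stimulated Tg < 2 ng/ml.
    low_suppressed = v.suppressed_tg is not None and v.suppressed_tg < 1.0
    low_stimulated = v.stimulated_tg is not None and v.stimulated_tg < 2.0
    return low_suppressed or low_stimulated
```

Under this reading, a TgAb-negative patient with no structural disease and a suppressed Tg of 0.4 ng/ml would be labeled NED, whereas a TgAb-positive patient without a documented antibody decline would be labeled as having persistent disease whatever the Tg value.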

Statistical analysis

Statistical analysis was carried out using the Statistical Package for the Social Sciences (SPSS) version 23.0 for Windows (IBM, Chicago, Illinois, USA) and MedCalc Software version 18.1 (Ostend, Belgium). Categorical variables were summarized as absolute and relative frequencies, and continuous variables as median, mean, standard deviation, and range. Differences in continuous variables were tested with Student’s t-test or the Mann–Whitney U-test, and a χ2 test was performed for categorical variables. A p value of ≤ 0.05 was considered statistically significant. The Youden index from the receiver operating characteristic (ROC) curve was applied as the criterion for selecting the best sTg threshold to predict 1-year treatment response (excellent response vs not excellent response), and this result was compared with several sTg thresholds (2, 5, and 10 ng/ml) suggested in the ATA guidelines [7]. Hazard ratios with 95% confidence intervals (CI) were calculated by univariate and multivariate Cox regression analysis.
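As an illustration of the Youden criterion mentioned above, the sketch below scans every observed value as a candidate threshold and keeps the one maximizing J = sensitivity + specificity − 1. It is a minimal example and not the SPSS/MedCalc procedure used in the study: the function name `youden_threshold` is hypothetical, the toy values are invented and are not study data, and it assumes that "not excellent response" is the positive class and that a patient is test-positive when sTg is greater than or equal to the threshold.

```python
def youden_threshold(values, labels):
    """Return (threshold, J, sensitivity, specificity) maximizing the Youden index.

    values: marker measurements (e.g., sTg in ng/ml)
    labels: 1 = positive class (assumed: not excellent response), 0 = negative
    A case is called test-positive when its value is >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best = None
    for t in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= t and y == 1)
        fp = sum(1 for v, y in zip(values, labels) if v >= t and y == 0)
        sens = tp / n_pos
        spec = 1 - fp / n_neg
        j = sens + spec - 1
        # Strict '>' keeps the lowest threshold when several share the same J.
        if best is None or j > best[1]:
            best = (t, j, sens, spec)
    return best


# Invented toy data, for illustration only (NOT study data).
toy_stg = [0.5, 1.2, 3.0, 8.0, 15.0, 30.0, 45.0, 120.0]
toy_not_excellent = [0, 0, 0, 0, 1, 0, 1, 1]
threshold, j, sens, spec = youden_threshold(toy_stg, toy_not_excellent)
```

The same function body can be reused to obtain sensitivity and specificity at a fixed, guideline-suggested cutoff by evaluating a single value of `t` instead of scanning all candidates.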

For the evaluation of the last disease status, only patients with more than 3 years of follow-up were included.

Univariate logistic regression analysis was applied to evaluate the ability of the main clinicopathological factors to predict long-term outcome (NED vs persistence of disease). Multivariate regression analysis was then applied to evaluate the independent predictive ability of the aforementioned features. A two-tailed p < 0.05 was considered statistically significant.
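For a single binary predictor (for example, sTg above vs below a cutoff), the odds ratio estimated by univariate logistic regression coincides with the cross-product ratio of the corresponding 2 × 2 table, and the Wald 95% CI coincides with the classical log-based (Woolf) interval. The sketch below illustrates this under stated assumptions: the function name `odds_ratio_ci` is hypothetical, and the counts are invented and are not study data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and log-based (Woolf) confidence interval from a 2x2 table.

    a: predictor present, outcome present   b: predictor present, outcome absent
    c: predictor absent,  outcome present   d: predictor absent,  outcome absent
    z: normal quantile (1.96 for a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Invented counts, for illustration only (NOT study data).
or_value, ci_low, ci_high = odds_ratio_ci(12, 28, 6, 54)
```

With several predictors entered together, as in the multivariate analysis, the adjusted odds ratios no longer reduce to a single table and require fitting the full regression model in statistical software.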

Results

Patient selection

A total of 303 pediatric patients with a histological diagnosis of DTC were initially recruited in the study. Eighteen patients were lost immediately after the diagnosis and were excluded from the analysis. All the remaining 285 patients underwent a near-total thyroidectomy, associated with central neck dissection in 170 cases and lateral neck dissection in 102 cases.

All patients underwent total thyroidectomy; all operations were performed by local experienced thyroid surgeons (all with more than 10 years of experience in thyroid surgeries). Prophylactic central lymph node dissection was not routinely performed, but only in case of suspected or pathologically confirmed N1a disease or when advanced primary tumors (T3 or T4) were noted. Lateral compartment neck dissection was performed in case of clinically suspicious or pathologically (based on imaging and/or cytology) confirmed N1b disease.

After surgery, 276 out of 285 (97%) patients received RAIT; the remaining nine did not (eight patients affected by papillary microcarcinoma and one who died a few days after thyroidectomy from surgical complications) (Supplemental Fig. 1).

Fig. 1 ROC curve analysis evaluating the role of stimulated Tg (ng/ml) in predicting treatment response 1 year after RAIT

A second operation was performed in only six cases, when local relapse or nodal disease was discovered.

Before the first RAI administration, levothyroxine was discontinued for 20–30 days in 247 patients, while in the remaining 29 patients, recombinant human thyrotropin (rhTSH) (Genzyme Corporation) was administered intramuscularly at a dose of 0.9 mg on 2 consecutive days during treatment with levothyroxine; in these patients, RAIT was administered the day after the second rhTSH injection. The use of rhTSH was off-label, authorized by the local center, and had previously been investigated by other authors in the diagnostic field [24]; a specific informed consent was signed by each patient.

Patient features

The patients’ age ranged from 4 to 18 years with a median age of 15; there was a female predominance (F:M = 2.2:1).

Most frequently, the diagnosis of DTC was made after 2010 (n = 120), with a higher frequency than in the previous 2 decades (1990–2002 and 2001–2010).

All pediatric patients had a histopathological diagnosis of DTC: 154 classic variant of papillary carcinoma, 59 follicular variant of papillary carcinoma, 41 aggressive variants (17 tall cell variant, 11 diffuse sclerosing variant, 2 columnar-cell variant, 7 solid variant, and 1 hobnail variant of papillary carcinoma, and 3 poorly differentiated carcinoma), 18 follicular carcinoma (10 minimally invasive and 8 widely invasive), 3 Hürthle cell carcinoma, and one case of noninvasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP).

Tumor size of primary carcinoma was 22 ± 14 mm (range 9–90 mm). Multicentricity of neoplastic lesions was present in 100 cases (37%), capsular invasion in 138 (50%), and vascular invasion in 89 (32%).

The median administered activity of the first RAIT was 2.7 GBq (interquartile range 1–3.7), while the median cumulative RAI activity administered per patient was 3.7 GBq (interquartile range 1.1–37). The RAI activities administered depended on several factors such as surgery extent, tumor size, presence of metastases, pubertal stage, and body weight. Often, the choice of activity was discussed by the local multidisciplinary tumor board. For children in pre-puberty or post-puberty with low-risk disease, the activity administered was usually 1.1 GBq; in post-pubertal patients with intermediate-high risk, the median activity was 2.5 GBq with a maximum of 3.7 GBq.

Thirty-four out of 276 patients had a previous history of oncological disease: chronic lymphatic leukemia (CLL) in 19 cases, Hodgkin lymphoma in 4, medulloblastoma in 2, neuroblastoma in 2, non-Hodgkin lymphoma in 1, breast cancer in 1, Wilms kidney tumor in 1, astrocytoma in 1, acute myeloid leukemia in 1, dysgerminoma in 1, and a concomitant breast and clear cell renal carcinoma in 1. Among them, 32 underwent radiotherapy and/or chemotherapy and two underwent surgery. In patients without TgAb interference (n = 203), sTg at the time of the first radioiodine treatment was 92 ± 482 ng/ml (range 0.04–5940); TgAb were present in 73 patients (26%). The patients’ features are summarized in Table 1.

Table 1 Baseline features of our 276 patients

Treatment response after 1 year

At 1 year after the first RAIT, 146 (53%) children showed excellent response, 37 (14%) indeterminate response, and 91 (33%) incomplete response. The remaining 2 patients were lost to follow-up before the first year (Table 1). Among patients with incomplete response, 79 had both biochemical and structural incomplete response, 7 only biochemical incomplete response, and 4 only structural incomplete response.

At univariate analysis, compared with the not excellent response group (including indeterminate and incomplete response), children with excellent response 1 year after RAIT were significantly younger (p = 0.043), less frequently had capsular (p < 0.001) and vascular invasion (p < 0.001), had smaller tumor size (p = 0.007) and lower sTg (p < 0.001), less frequently had TgAb (p = 0.036), and more frequently presented with ATA low-risk disease (p < 0.001) (Table 2). At multivariate analysis, only sTg was confirmed as an independent predictor of treatment response (HR 5.02, p < 0.001). However, baseline anti-Tg antibodies were also confirmed to be more frequent in the not excellent response group than in the excellent response group (HR 2.22, p = 0.040). ROC analysis showed that a pre-ablation sTg ≥ 27.2 ng/ml was the best threshold to discriminate excellent response from not excellent response, with a sensitivity of 65.1% (95% CI 54.1–75.1), a specificity of 86.7% (95% CI 79.1–92.4), and an AUC of 0.747 (Fig. 1). In fact, the 81 patients with sTg ≥ 27.2 ng/ml showed a higher rate of not excellent response 1 year after RAIT (79%) than the 182 patients with sTg < 27.2 ng/ml (not excellent response rate 23%) (Fig. 2).

Table 2 Comparison between DTC patients with excellent response and not excellent response 1 year after RAIT
Fig. 2 Comparison of 1-year response rates between patients with high and low baseline stimulated Tg

When the principal sTg thresholds reported in the literature (such as 10 and 2 ng/ml) [7] were investigated, their diagnostic performance in predicting treatment response was less accurate (Table 3).

Table 3 Comparison between diagnostic accuracy of different sTg thresholds for predicting treatment response 1 year after RAIT

Last disease status

After a median follow-up of 133 ± 100 months (range 12–360 months), NED was reported in 240 cases (87%), while persistent disease was observed in the remaining 34 (13%). Except for one patient who died after surgery, all the others included in the study were alive at the last control.

Overall, of the 34 patients with persistent disease at last follow-up, 9 had evidence of structural disease, 9 of biochemical disease, and the remaining 16 both biochemical and structural disease.

At univariate analysis, sTg (as a continuous variable and dichotomized at 27.2 ng/ml), 1-year treatment response categories, first RAI activity administered, total amount of RAI administered, and total number of RAITs were significantly correlated with the last disease status (Table 4). At multivariate analysis, only sTg (as a continuous variable and dichotomized at the derived cutoff) and 1-year treatment response categories were significantly associated with the risk of persistent disease (p = 0.023, p = 0.029, and p < 0.001, respectively).

Table 4 Univariate and multivariate analyses for predictors of NED

In particular, 25% of patients with baseline sTg ≥ 27.2 ng/ml showed structural or biochemical disease, while only 8% of patients with sTg < 27.2 ng/ml showed disease persistence at the end of follow-up (p < 0.001). Moreover, among patients with excellent response 1 year after the first RAIT, only 1% showed disease persistence, while among patients with not excellent response, 26% showed structural or biochemical disease at the end of follow-up (p < 0.001) (Fig. 3).

Fig. 3 Evaluation of last disease status in patients with high or low sTg (A) and with excellent or not excellent response 1 year after RAIT (B)

Among the 146 pediatric patients with excellent response 1 year after RAIT, only two had structural recurrences (appearance of lung metastases in both cases), 2 years and 6 years after the diagnosis. They underwent four and five radiometabolic therapies for a total of 21.9 and 23.6 GBq of RAI, respectively, and showed persistent disease at the last follow-up.

Among patients with 1-year incomplete response, a new surgery was performed in 6 cases, a second RAIT in 26 patients, and more than 2 RAITs in 60 cases. At the last follow-up, 62 (68%) were NED, and the remaining 29 (32%) had persistent disease.

In the indeterminate response group (n = 37), only 10 patients followed a conservative protocol without receiving new therapies, while the other 27 underwent a new RAI treatment (one additional radiometabolic therapy in 22 cases and more than 2 treatments in the remaining 5) (Fig. 4).

Fig. 4 Flowchart of patient management after first RAIT

Discussion

In this multicentric study, we collected data from six different Italian centers widely distributed from Northern to Southern Italy. Despite the relatively long period included (1990–2020) and the different geographical distribution, all centers followed the same international guidelines to manage DTC patients, which allowed their data to be pooled for these analyses.

The main aim of this study was to investigate the reliability of epidemiological, clinical, pathological, biochemical, and RAI-related variables to predict disease persistence/relapse 1 year after initial treatment (i.e., thyroidectomy + RAIT) and long-term survival in pediatric patients affected by DTC.

As the main finding, we demonstrated that sTg can predict excellent response 1 year after the first RAIT, with a best cutoff value of 27.2 ng/ml. In this context, sTg seems to be the principal predictor of disease response, with a significantly higher impact on risk assessment than the pediatric 2015 ATA guidelines risk classification. Moreover, we found that high levels of sTg and lack of excellent response 1 year after RAIT are significantly associated with the final outcome.

A risk-stratification system for recurrence/persistence of disease and a system of treatment response categories after RAIT are the cornerstones of the recent ATA guidelines [3, 7] and are essential for providing the best care available for DTC patients.

DTC is usually a disease with good prognosis and optimal therapy response but may be associated with an excess of follow-up procedures, such as unnecessary surveillance, diagnostic tests, and medical appointments and even with overtreatment. This issue may have even more impact in pediatric patients, since these patients are expected to have lifelong follow-up and higher risk of RAIT-related side effects.

On the other hand, the early detection of pediatric patients with aggressive disease and poor prognosis is consistent with the goal of personalized therapy and management in the light of precision medicine, maximizing the efficacy of RAIT and reserving it for those who need it.

In this context, the dynamic risk stratification and the treatment response categories proposed by the 2015 adult guidelines are particularly helpful [3]. However, these classification systems have been validated and adopted mainly in adults [3] and not yet in the pediatric population.

One of the major limitations of pediatric DTC guidelines is the paucity of specific data and studies available, despite the fact that the peculiarities of the presentation, prognosis, and management of DTC in children have been clarified [4,5,6]. Thus, extrapolating results from studies of adult populations and applying them to the pediatric field might lead to equivocal conclusions.

It is well known that total thyroidectomy and RAIT have a positive impact on disease-free survival in pediatric patients [25,26,27,28], but the indication for RAIT is currently under debate, especially in low-risk patients [7].

Awareness of the complications of treatment assumes increasing importance, making it imperative to balance the risks of treatment against potential gains from aggressive therapies; moreover, it makes mandatory a discussion of these potential risks with the patient and their parents throughout the course of their treatments.

To date, only a few studies have investigated the role of dynamic risk stratification in pediatric patients, with controversial results [12, 17, 20]. First, Lazar et al. [20] demonstrated that patients with excellent response had a better final prognosis than those with not excellent response; moreover, all patients with incomplete response remained with persistent disease at the last control. Sung et al. [17] showed similar evidence, reporting a higher risk of recurrent/persistent disease in the indeterminate and incomplete response groups compared to excellent response. Our findings are in partial agreement with these papers [17, 20]. In a recent paper [29], response to therapy, together with age and ATA risk, significantly predicted event-free survival.

We found that patients with excellent and indeterminate response to initial treatment had a similarly good prognosis and that a significant number of patients with incomplete response were NED at the last follow-up.

Our results are concordant with those of Zanella et al. [12], who showed, in a relatively small sample of patients who underwent RAIT, that only the dynamic risk stratification was associated with the last disease status (although only a univariate statistical model was evaluated).

Beyond the importance of the 1-year treatment response classification, we demonstrated that sTg is associated with the final outcome and may predict response to RAIT. Tg is a specific and sensitive protein used as a marker for the presence of follicular thyroid cells, and serum Tg measurement is a cornerstone in the management of DTC patients and is regarded as the most sensitive method to detect persistent or recurrent disease. A few studies demonstrated the usefulness of sTg as a prognostic factor in small and heterogeneous pediatric DTC populations and/or populations mixed with adults [12, 14, 16, 21, 30] and suggested various sTg cutoff values ranging from 10 to 37.8 ng/ml. This wide interval is directly related to the features of the patients recruited. For example, Klain et al. [14], including 45 patients with low- to intermediate-risk DTC, demonstrated that an sTg of 10 ng/ml had the best accuracy in predicting persistence of disease (sensitivity of 81%, specificity of 100%). In contrast, Zanella et al. [12] proposed an sTg of 37.8 ng/ml as a reliable cutoff, with a sensitivity of 81% and a specificity of 100%. In our study, we obtained a threshold of 27.2 ng/ml, which predicted excellent response 1 year after RAIT and was associated with disease outcome. This value seems to be closer to clinical practice, being derived from a larger sample and from different centers.

To validate the usefulness of this value, we compared it with other cutoff values proposed in the literature, such as 2 and 10 ng/ml [7]. In particular, 10 ng/ml is a frequently recurring threshold and is also suggested by Francis et al. [7], although this value is not directly derived from a pediatric population. Moreover, according to a meta-analysis [8], the best sTg cutoff for predicting persistent disease in adult DTC patients is 10 ng/ml. The threshold proposed by us (27.2 ng/ml) is relatively higher than that proposed for adults, probably because of the higher tumor burden usually present at diagnosis in children, the more frequent presence of well-differentiated (and therefore more iodine-avid) thyroid cells, and the consequently better response to RAIT. Nevertheless, the treatment response is optimal, with an excellent response rate after 1 year in more than 50% of patients and complete remission at the last control in 87%. Furthermore, a direct comparison between 10 and 27.2 ng/ml in the evaluation of the last disease status demonstrated a better performance of 27.2 ng/ml.

These observations might help to guide the follow-up of young patients with DTC, differentiating those who require a less intensive treatment from those who are candidates for more aggressive therapies and intense follow-up. However, it is fundamental to underline that all our patients received RAIT, and it is reasonable to believe that the optimal response to therapy and the good prognosis achieved are related to RAIT. Therefore, our data do not allow us to propose omitting RAI in patients with low sTg at ablation.

In contrast, ATA risk class stratification does not seem to have a predictive role in our analysis, and this evidence is similar to that of others [12, 20]. Thus, regardless of the initial risk, the patients who received RAIT had a good response and optimal prognosis.

Our study has several limitations, including its retrospective design, the potential use of heterogeneous management approaches over the relatively long period included, and the heterogeneity of the laboratory methods applied under each institutional protocol.

On the other hand, we must acknowledge that homogeneity of approaches, methodologies, and tools was impossible to achieve over the 30-year period covered by this retrospective analysis. Despite the evident and significant methodological limitations, which are unavoidable, we still consider the results obtained from real-life clinical practice in such a delicate field as pediatrics to be interesting and possibly useful. Further studies are desirable, but the data obtained can be considered a new and possibly useful piece in the mosaic of the clinical scenario of DTC in the pediatric field.

In conclusion, in pediatric DTC, a pre-ablation sTg ≥ 27.2 ng/ml is significantly associated with 1-year treatment response and with the risk of long-term persistent disease and should therefore be considered the principal factor for identifying patients who may need more intensive surveillance. In addition, 1-year response categories may also serve to predict the disease status at the last control.