Introduction

Organ preservation constitutes a paradigm shift in the management of patients with rectal cancer. One of the main reasons for exploring organ preservation strategies is the potential to preserve anorectal function, thus avoiding the need for a permanent colostomy and maintaining quality of life (QoL)1. Patients with rectal cancer who undergo low anterior resection after neoadjuvant chemoradiotherapy (CRT) can experience deterioration, with variable frequency, in several parameters of bowel function, including urgency, frequency, incontinence and clustering of bowel movements. The number of clinical trials examining organ preservation strategies, such as nonoperative management (NOM) or local excision (LE) alone after CRT, in patients with rectal cancer is progressively increasing1. Habr-Gama and colleagues were the first to implement a selective NOM approach in patients with resectable rectal cancers who had a clinical complete response (cCR) following CRT2. Since this initial study, data from several studies, including the International Watch and Wait database analysis, indicate that deferral of surgery in patients with a cCR appears to be oncologically safe, although more randomized data are needed to confirm both the long-term oncological outcomes and the superiority of organ preservation in terms of QoL, as assessed using patient-reported outcomes (PROs)3,4,5,6,7,8,9,10,11. LE, using either transanal endoscopic microsurgery or transanal minimally invasive surgery, is an alternative organ preservation approach for selected patients with small T1–T3 rectal cancers and a good response after CRT, as demonstrated in the CARTS, TREC and GRECCAR2 trials9,12,13,14. The ongoing STAR-TREC trial (NCT02945566) is exploring the value of NOM and LE, depending on the degree of response after neoadjuvant treatment, in patients with early-stage (cT1–T3bN0) disease.
LE alone is an effective primary treatment option for selected patients with certain early-stage rectal cancers (such as stage cT1N0 without adverse histopathological features) that has been shown to reduce the risk of morbidity without jeopardizing long-term oncological outcomes15,16,17.

In accordance with the recommendations of the Definition for the Assessment of Time-to-event Endpoints in Cancer trials (DATECAN) initiative18, in 2020 we provided recommendations on the use of clinical and surrogate end points in the different phases (I–III) of rectal cancer trials (ref.19). However, the key outcome measures for trials involving organ preservation approaches have not yet been standardized. To date, such trials have been characterized by marked heterogeneity in selection criteria, treatment strategies, choice of end points and design, all of which limit the accuracy of both data interpretation and comparisons between studies. An international consensus is therefore needed to ensure consistency and to facilitate appropriate data collection, interpretation and comparison of organ preservation outcomes, either as part of a trial (‘intended’ organ preservation) or outside a trial (‘incidental’ organ preservation), in patients with a cCR after standard neoadjuvant treatment, an approach now permitted by several guidelines, including those provided by ESMO17, the NCCN20 and ASTRO21. Here, to our knowledge, we provide the first expert Consensus Statement on key outcome measures for organ preservation in patients with rectal cancer, with a particular focus on NOM. We convened an international group of clinical trialists with extensive experience in rectal cancer, including organ preservation strategies, and used the Delphi process to collect opinions, with the aim of providing a standardized approach to outcome measurement and reporting in this setting.

Methods

Search strategy and selection criteria

References were retrieved from several electronic databases (PubMed/MEDLINE, Web of Science, the Cochrane Library and Google Scholar), which were searched for published articles and for abstracts from international meetings containing data from retrospective, prospective and randomized clinical studies investigating any organ preservation approach for patients with rectal cancer, published from inception to 1 April 2020 (Supplementary information). Two investigators (authors E.F. and C.R.) reviewed the list of retrieved articles, selected potentially relevant articles and extracted data on the key outcome measures of organ preservation from all selected studies for inclusion in the Delphi process (Supplementary Fig. 1).

Establishing a consensus

The guideline panel comprised a multidisciplinary and interprofessional team, including clinical oncologists, radiation oncologists, medical oncologists, surgical oncologists, a pathologist, radiologists with expertise in rectal cancer and a bioinformatician. A Delphi method was used to achieve consensus recommendations based on votes from all panelists, recorded using the SurveyMonkey platform (https://www.surveymonkey.com), with additional information shared via e-mail. Consensus on each item required ≥70% agreement. Further details on the formation of the consensus panel and the Delphi method are provided in the Supplementary information.
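To make the voting rule concrete, the following Python sketch applies the ≥70% threshold to the votes cast on a single Delphi item. It is a minimal illustration, not the procedure actually used by the panel: it assumes that each panelist casts one vote per item and that agreement is measured as the share of votes for the most frequent answer. The function name `consensus` and the example votes are hypothetical.

```python
from collections import Counter

CONSENSUS_THRESHOLD = 0.70  # >=70% agreement required, as stated in the Methods


def consensus(votes):
    """Return (most frequent answer, its share of votes, whether consensus was reached).

    Assumption (hypothetical): agreement is the share of panelists choosing the
    single most frequent answer for the item.
    """
    if not votes:
        raise ValueError("no votes recorded for this item")
    answer, count = Counter(votes).most_common(1)[0]
    share = count / len(votes)
    return answer, share, share >= CONSENSUS_THRESHOLD


# Hypothetical 10-member panel: 8 agree, 2 disagree.
item_votes = ["agree"] * 8 + ["disagree"] * 2
print(consensus(item_votes))  # ('agree', 0.8, True)
```

Under this rule, an item supported by only 42% of panelists (as for the additional use of the EORTC QLQ-CR29 reported below) would not reach consensus.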

Results

Literature search and review

A total of 3,090 publications were retrieved from the literature search. After removal of duplicates and screening of titles and abstracts, 667 publications were selected for full-text assessment (Supplementary Fig. 1). After full-text review and exclusion of manuscripts that were unrelated to the topic or not written in English, 396 manuscripts were considered relevant to the scope of the present study. We identified the following seven outcome measures as key to an organ preservation strategy: definition of end points (methodology and criteria to define response, unequivocal nomenclature); choice of primary end point according to trial phase and design; time point of tumour response assessment (RA) and determination of a cCR; response-based decision algorithms and the use of biopsy sampling; follow-up methods (schedules and timelines); organ preservation-specific anorectal function tests; and QoL assessment and PROs. These seven outcome measures were then developed into 32 clinical questions for the Delphi survey (Supplementary Table 1).

Consensus procedures and Delphi rounds

The questionnaires used in the first and second Delphi rounds on the seven key outcome measures of organ preservation, together with the corresponding answers, are provided (Supplementary Tables 1 and 2). In a third Delphi round, the final consensus manuscript recommendations on key outcome measures were prepared and agreed on by all members (100%) of the panel. The flow diagram of the study procedures used to establish an international consensus, including rounds 1–3, is provided (Supplementary Fig. 2). The results of the consensus procedure and individual Delphi rounds are described in detail in the Supplementary information.

Recommendations

Criteria and definitions of end points

As part of the Delphi process, the panel reached a consensus on the definitions of organ preservation, locoregional regrowth after NOM and locoregional recurrence after LE or total mesorectal excision (TME) (Box 1). Definitions of an incomplete/poor response, local regrowth and local recurrence are provided separately for clarity. The various criteria used to define a cCR in the literature are described in Supplementary Table 4. The panel concluded that the ‘Amsterdam/Maastricht’ criteria4 are best suited to define cCR and near cCR (ncCR). The panel also agreed with the definition of organ preservation-adapted disease-free survival (DFS) originally proposed in 2020 (ref.19). The definition of TME-free DFS used in the OPRA trial was first introduced at the ASCO annual meeting 2020 (refs22,23), which explains why consensus was not reached for this end point. This definition was provided separately by the principal investigator of the OPRA trial (author J.G.-A.).

Choice of primary end point

A comparison of randomized studies of organ preservation strategies reveals substantial variability in both the RA time point used to determine cCR and the primary end point selected (Table 1). The panel recommended that different primary end points should be used according to trial design and phase, taking into consideration the initial tumour stage, the use of standard or intensified experimental treatment regimens, intended or incidental organ preservation, NOM or LE strategies, and the overall aim of treatment. Consensus was reached on several primary end points.

Table 1 Variations in outcome measures across different randomized trials of organ preservation approaches in rectal cancer

Recommendations

  • Early measures of tumour response (such as the cCR rate) should be used as primary end points for early phase I/II trials that test more intensive radiotherapy, CRT or total neoadjuvant treatment (TNT) regimens designed to increase cCR rates and enable NOM or LE, with the aim of selecting tolerable and locally effective regimens for further testing in larger cohorts, such as the Danish trial7 or the recently completed CAO/ARO/AIO-16 trial (NCT03561142). Nonetheless, the risks and benefits of treatment intensification should be weighed carefully in these contexts.

  • Organ preservation assessed at 30–36 months after commencing treatment should be the primary intermediate end point for randomized phase II/III trials using either NOM or LE (for patients with a cCR or ncCR), such as WW3 (NCT04095299), STAR-TREC (NCT02945566) or ACO/ARO/AIO-18.1 (NCT04246684). Rectal function, toxicities and QoL should be regarded as pivotal secondary outcomes, to be considered for inclusion as composite or co-primary end points, as in the GRECCAR2 trial9,12.

  • Organ preservation-adapted DFS at 3 years19 should be used as a primary end point if organ preservation is permitted within (but is not the primary purpose of) a phase III trial, especially in trials enrolling patients with locally advanced tumours.

Time points to determine cCR

Evidence on the optimal timing of RA to determine a cCR is still emerging and can be influenced by many variables (such as initial tumour stage, biology, treatment duration and intensity, time since treatment completion, and the methodology used to assess response); nevertheless, the panel considered it important to provide clear consensus-based recommendations for future trials and routine clinical practice. Representative examples of trial designs illustrate how the highly variable designs and treatment durations of the clinical trials conducted in this area complicate identification of the optimal timing for accurate RA (Fig. 1; Table 1).

Fig. 1: Representative examples of RA time points to determine cCR and primary end points in organ preservation trials involving patients with rectal cancer.

The different preoperative or definitive treatment options are characterized by variable durations and times to response assessment (RA), and therefore variable times to a decision on organ preservation (OP) strategies versus total mesorectal excision (TME) surgery, as illustrated below the x axis. Examples of corresponding clinical trials, including details of the tumour/nodal/metastasis (TNM) stages of the enrolled patients and the treatment arms are shown on the left side in dark blue boxes (also summarized in Table 1, which, as in this figure, only includes randomized studies). The time point of RA and, hence, the determination of clinical complete response used in the different trials is indicated by orange boxes. The primary end point of the trials is shown on the right side in light blue boxes. The advent of total neoadjuvant therapy, often with highly variable treatment duration, has added to the complexity of selecting the optimal RA time point. AV, anal verge; CRT, chemoradiotherapy; cTNM, clinical TNM staging; DFS, disease-free survival; DRE, digital rectal examination; LE, local excision; NOM, nonoperative management; SCRT, short-course radiotherapy; SIB, simultaneous integrated boost of radiotherapy.

Recommendations

Time points for RA and determining cCR should be selected according to trial design and treatment strategies (Box 2).

Response-based decisions and biopsy use

A question often raised is whether clinicians should wait longer before deciding on surgery if restaging after preoperative treatment reveals an ncCR. The optimal timing for evaluation of a cCR depends greatly on the treatment design. No consensus was reached on the timing of the second assessment, although the panel supported waiting longer in this setting. Notably, the decision on whether to proceed to surgery or wait longer should also take into account initial tumour stage, treatment approach and the RA time points, as described above.

Another important point concerns the role of biopsy sampling in patients with an ncCR or cCR. In both scenarios, consensus was reached that biopsy sampling does not provide additional value and could lead to false-negative results. Long-term follow-up data from a prospective study assessing the watch-and-wait strategy after CRT in this setting indicate that biopsies have only limited clinical value for ruling out residual cancer5. A further analysis of data from this study indicates that biopsy samples provide no added diagnostic value, especially when the criteria for a cCR are fulfilled5,24. In contrast to the original study, in which biopsy sampling was indicated in cases of ncCR5, the panel did not recommend biopsy as mandatory for ncCR, for the reasons outlined above.

Recommendations

  • The panel does not recommend biopsy sampling as mandatory for patients with either a cCR or an ncCR, as it does not provide any additional diagnostic value and could lead to false-negative results.

  • Where a biopsy sample is nevertheless obtained from a patient with an ncCR and is negative on analysis, the panel recommends that extended waiting and reassessment after 6–12 weeks should be considered, again depending on the treatment approach.

Follow-up procedures and schedule

The panel reached a consensus that serum carcinoembryonic antigen tests, digital rectal examination (DRE), rectoscopy, pelvic MRI, and chest and/or abdominal CT should all be part of the follow-up of patients treated using an organ preservation approach (Table 2). The majority of panelists indicated that serum carcinoembryonic antigen levels should be assessed every 3 months during the first 3 years after completion of treatment, and then every 6 months during years 4–5. Consensus was also established that DRE, endoscopy and MRI should be conducted every 3–4 months during the first 2 years after completion of treatment, and then every 6 months during years 3–5. Finally, the preferred schedule for CT of the chest and/or abdomen is every 6–12 months during the first year after completion of treatment, and annually during years 2–5 (Table 2).

Table 2 Consensus follow-up methods and intervals for organ preservation strategies
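The follow-up intervals described above can be expanded into explicit visit months, as in the Python sketch below. It is an illustration under stated assumptions, not part of the consensus: months are counted from completion of treatment, the first visit of each series is placed one interval after completion, and where the panel gave a range (3–4 months for DRE, endoscopy and MRI; 6–12 months for CT in year 1) the shorter interval is assumed. The names `SCHEDULE` and `visit_months` are hypothetical.

```python
# Hypothetical encoding of the consensus follow-up intervals (Table 2).
# Each rule is (first_month, last_month, interval_months), with months counted
# from completion of treatment. Where the panel gave a range, the shorter
# interval is assumed (3 of "3-4 months"; 6 of "6-12 months").
SCHEDULE = {
    "serum CEA": [(3, 36, 3), (42, 60, 6)],                      # 3-monthly in years 1-3, 6-monthly in years 4-5
    "DRE, endoscopy and pelvic MRI": [(3, 24, 3), (30, 60, 6)],  # 3-monthly in years 1-2, 6-monthly in years 3-5
    "chest and/or abdominal CT": [(6, 12, 6), (24, 60, 12)],     # 6-monthly in year 1, annually in years 2-5
}


def visit_months(rules):
    """Expand (first, last, step) rules into a sorted list of visit months."""
    months = []
    for first, last, step in rules:
        months.extend(range(first, last + 1, step))
    return sorted(months)


for modality, rules in SCHEDULE.items():
    print(f"{modality}: {visit_months(rules)}")
```

Under these assumptions, serum carcinoembryonic antigen is measured 16 times over 5 years, DRE, endoscopy and MRI are performed 14 times, and CT is performed 6 times; choosing the longer intervals where a range was given would reduce these counts.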

Anorectal function measurement

The panel was asked to select the optimal method of measuring anorectal function from several frequently used tests, including both clinician-reported and patient-reported instruments. These comprised the Wexner score25, the Low Anterior Resection Syndrome (LARS) score26, the MSKCC Bowel Function Instrument (MSKCC BFI) score27, the Vaizey score28 and manometry (Supplementary Table 5).

Recommendations

  • Patient-reported LARS score is recommended as the best-available method of measuring anorectal function.

  • A new organ preservation-specific score should be developed that also captures other functional aspects, such as urinary and sexual dysfunction, in addition to bowel dysfunction.

QoL assessment and PROs

The panel reached a consensus that the European Organisation for Research and Treatment of Cancer Core Quality of Life Questionnaire (EORTC QLQ-C30) is the standard method of QoL assessment and should always be used. The panel was asked to vote on five proposed QoL and function scales: overall QoL, physical function, role function, social function and emotional function. Consensus was reached on all five proposed scales.

The panel also agreed on the ten most important symptomatic toxicity items, selected from a list of 20 proposed items, for evaluation as part of a patient-reported assessment: bowel urgency, faecal incontinence, bowel frequency, diarrhoea, tenesmus, toilet dependency, night-time bowel opening, urinary urgency, impotence and pain. Only 42% of the panel voted for the use of the EORTC QLQ-CR29 in addition to the QLQ-C30, below the 70% consensus threshold. The EORTC QLQ-CR29 covers many aspects of bowel, urinary, stoma and sexual function, although it does not include all bowel symptoms that can occur following NOM or LE; in particular, it does not collect information on bowel urgency or toilet dependency. These bowel symptoms are included in the LARS score, although this score lacks items relating to urinary and sexual dysfunction as well as stoma-related items for patients in whom organ preservation was not possible. Thus, all panel participants indicated a need to develop a new, validated NOM-specific and LE-specific PRO measure (or extension) (Supplementary Table 1).

Finally, the panel was provided with a list of different time points and asked to vote on the optimal timings for measurements of symptomatic toxicities, QoL and anorectal function. The panel recommended that toxicities should be measured at baseline and at 3, 12, 24, 36 and 60 months after the decision on whether to undergo NOM or LE. Consensus was also reached on using the same time points for measurements of QoL and anorectal function.
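As an illustration of how the agreed time points translate into calendar dates for an individual patient, the Python sketch below shifts the date of the decision on NOM or LE by whole calendar months. It is a hypothetical helper, not a tool described by the panel: the function names are invented for illustration, the sketch assumes that an assessment falling in a shorter month is scheduled on that month's last day, and baseline is omitted because its timing relative to treatment is not specified here.

```python
import calendar
from datetime import date
from typing import List

# Months after the decision on NOM or LE at which symptomatic toxicities, QoL and
# anorectal function are measured (baseline is recorded separately).
PRO_FOLLOW_UP_MONTHS = (3, 12, 24, 36, 60)


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def pro_schedule(decision_date: date) -> List[date]:
    """Return the post-decision assessment dates for one patient (hypothetical helper)."""
    return [add_months(decision_date, m) for m in PRO_FOLLOW_UP_MONTHS]


for due in pro_schedule(date(2021, 1, 31)):
    print(due.isoformat())  # first line printed: 2021-04-30 (April has 30 days)
```

Anchoring every assessment to the decision date, rather than to each previous visit, keeps delays at one visit from shifting all later time points.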

Recommendations

  • Overall QoL, physical function, role function, social function and emotional function should be assessed to document adverse events and their effects on patients.

  • Ten symptomatic toxicity items (bowel urgency, faecal incontinence, bowel frequency, diarrhoea, tenesmus, toilet dependency, night-time bowel opening, urinary urgency, impotence and pain) were selected as the highest priorities for evaluation and should be measured according to the agreed time schedule.

  • A new, validated PRO scale should be developed specifically for patients undergoing treatment with organ preservation approaches.

Discussion and future perspectives

Here, we provide the first international consensus recommendations on key outcome measures for organ preservation strategies in patients with rectal cancer. These strategies are undoubtedly still in a transitional phase, and we are only at the beginning of a new era in which evidence regarding many aspects of organ preservation is far from complete1. The incompleteness of such data is reflected by the lack of consistency in outcome measurement and reporting in clinical trials and in retrospective or population-based series, which underlines the importance of these consensus recommendations. Ambiguous clinical outcomes have also often been reported, reflecting the heterogeneity of patient inclusion criteria for specific interventions, including various radiotherapy and/or chemotherapy regimens. We recommend that investigators use these consensus recommendations as a framework when designing studies involving organ preservation approaches for patients with rectal cancer.

The use of ambiguous language in definitions of clinical end points, such as cCR, tumour regrowth, disease recurrence, organ preservation and DFS with or without consideration of tumour regrowth, has often led to confusion. The term ‘local regrowth’, rather than local recurrence, was adopted at the Champalimaud (Lisbon, Portugal) meeting in 2014 to describe tumour regrowth occurring after an initial cCR, owing to differences in time course, salvageability and the more favourable prognosis of local regrowth compared with local recurrence29. Nevertheless, the distinction between local (or locoregional) regrowth and local (or locoregional) recurrence has often been far from clear, and rigorous definitions are often not provided. Here, consensus was reached on precise definitions of several end points, which will hopefully avoid such disparities and enable future cross-trial comparisons. Consensus was also reached on the improved definition of DFS (organ preservation-adapted DFS)19 proposed in 2020, which incorporates both NOM and LE. TME-free DFS was only recently introduced as an end point, first reported in a presentation of data from the OPRA trial at the ASCO annual meeting 2020 (refs22,23); a definition of this term is nevertheless provided for future reference.

The choice of the most appropriate outcome measure is a crucial component of trials involving organ preservation approaches30. The selection of primary end points in prospective studies has often been rather arbitrary. Owing to differences in both the treatment strategies selected and their durations, the panel acknowledged that ‘one size does not fit all’ for organ preservation strategies, and recommended the use of specific end points according to the clinical scenario. Similar to the pathological complete response (pCR) end point used in trials involving radical surgery after neoadjuvant treatment31, cCR was suggested as an end point for small-cohort phase I/II trials testing intensified treatment regimens, with the aim of identifying tolerable and locally effective regimens for further testing in larger cohorts (such as the observational study conducted by Appelt et al.7, in which CRT was combined with radiotherapy dose escalation using brachytherapy). Of note, sustained cCR at 12 months is a component of the organ preservation end point and was thus not recommended as a separate end point in this Consensus Statement. Organ preservation at 30–36 months after the start of treatment was agreed on as the primary end point for phase II/III trials involving the use of NOM and/or LE to achieve organ preservation; this end point is being used in the ongoing STAR-TREC (NCT02945566), OPERA (NCT02505750) and ACO/ARO/AIO-18.1 (NCT04246684) trials. The time point for defining organ preservation varies among studies (Table 1); we recommend a 30–36-month window after the start of treatment, reflecting the prolonged treatment time of TNT and the fact that tumour regrowth mostly occurs within 24–30 months after completion of treatment8,32.
Organ preservation-adapted DFS is recommended for use in phase III trials that allow organ preservation but specifically aim to improve oncological outcomes, and especially to reduce the risk of distant metastases (such as the TRIGGER trial33).

No perfect primary end point exists for organ preservation approaches, and all end points are susceptible to certain pitfalls34. Furthermore, the choice of primary end point serves the statistical purposes of trial design, whereas secondary end points, especially QoL, PROs and anorectal function (one of the main arguments for deferring surgery), should be regarded as equally important13,35,36,37. Shared decision-making with patients and risk:benefit analyses (such as those exploring the balance between NOM or LE and treatment toxicity) should be considered for trials involving ‘intended’ organ preservation. The risk of overtreating poor responders, who might receive intensified CRT followed by TME after a lack of response, should also not be underestimated, as shown in the GRECCAR2 trial, in which many patients in the LE group required completion TME, resulting in increased morbidity and adverse events9,12. In this context, future studies should aim to clarify which inclusion criteria should be used to select patients for LE, the optimal timing of LE depending on tumour response (cCR versus ncCR versus residual disease), and how this relates to pretreatment disease staging38,39,40.

The optimal time point for determining achievement of a cCR constitutes one of the biggest challenges in testing organ preservation approaches, given that tumour response to treatment is a dynamic phenomenon affected by tumour size, histology, biology, treatment strategy and the time interval between preoperative and/or definitive treatment and the decision to proceed to NOM or LE (or TME)19. This complexity is reflected in the variable RA time points used to determine cCR across different studies, owing to variations in treatment schedule and design (Fig. 1). Knowledge of the kinetics of tumour response has mainly been derived from the operative setting. In a pooled analysis of data from 4,431 patients, pCR rates increased at intervals >6–7 weeks post-CRT, whereas a Dutch Surgical Colorectal Audit analysis comprising 1,593 patients revealed a peak in the percentage of patients with a pCR at 10 weeks post-CRT (16 weeks after commencing treatment)41. The advent of TNT, with highly variable treatment durations across different trials, has added to the complexity of this issue. For example, in a phase II trial, patients received zero, two, four or six cycles of folinic acid, 5-fluorouracil and oxaliplatin (FOLFOX) chemotherapy after standard CRT and underwent surgery at 6, 11, 15 or 19 weeks after completion of CRT; the corresponding pCR rates were 18%, 25%, 30% and 38%42. Whether these differences can be explained by the use of intensified chemotherapy or by the prolonged interval before surgery remains uncertain. The CAO/ARO/AIO-12 trial compared two TNT sequences, induction chemotherapy followed by CRT versus CRT followed by consolidation chemotherapy, and demonstrated a pCR in 17% and 25% of patients, respectively43. Similar data favouring CRT plus consolidation chemotherapy were reported in the OPRA trial, which showed a 3-year TME-free survival of 59%, versus 43% for induction chemotherapy plus CRT22.

The panel agreed that defining one specific time point for assessing cCR is impossible, considering the range of different treatment strategies used; initial tumour stage and risk features should also be considered. In a meta-analysis comprising data from 602 patients from 11 series, advanced cT stage (cT3–4 versus cT1–2) predicted a worse response and a higher risk of local regrowth32. Thus, for patients with early-stage tumours receiving CRT or short-course radiotherapy (SCRT), we recommend the two-step approach adopted in the STAR-TREC trial, which involves RAs at 12 weeks and at 16–20 weeks after starting treatment, analogous to the approach used for patients with anal cancer44. Following publication of data from the phase III RAPIDO45 and PRODIGE46 trials demonstrating improvements in the primary end points of disease-related treatment failure and DFS, respectively, TNT is expected to be integrated into the management of patients with locally advanced rectal cancer in the next updates of treatment guidelines. The panel recommends that the timing of cCR assessments should be adapted according to the duration of TNT, that is, 20–38 weeks after commencing treatment, as in various current trials, including OPERA (NCT02505750), ACO/ARO/AIO-18.1 (NCT04246684), GRECCAR12 (NCT02514278), OPRA22 and TRIGGER33 (Fig. 1). The optimal length of time between commencing treatment and determining cCR, in terms of both oncological safety and clinical effectiveness, remains unclear and is particularly relevant in patients receiving prolonged TNT. In the RAPIDO trial45, the investigators suggested that early response imaging could be used to identify patients with disease progression during preoperative treatment47. Indeed, close monitoring is important to identify poor responders early enough to offer immediate surgery.
The panel provided these practical recommendations but acknowledged that evidence for the optimal timing of cCR monitoring is far from complete.

The Amsterdam/Maastricht criteria were selected as the recommended method of defining cCR and ncCR4. The diagnosis of ncCR poses a challenge to clinical decision-making owing to the nonbinary nature of this end point and the role of the disease trajectory, which can make imaging-based assessments difficult. The panel recommends that longer intervals after commencing treatment should be considered, as in several studies in which RA was repeated 3 months later3,5, although for assessments of ncCR this decision should also take into account treatment duration. Importantly, on the basis of data from previous studies5,24, the panel did not recommend biopsy sampling, which should not be routinely performed owing to the risk of false-negative findings (for example, owing to sampling from a fibrotic area) and a lack of evidence of its value, especially when the DRE, endoscopy and MRI criteria for cCR are all fulfilled1,48. Indeed, residual cancer cells are often found in the muscularis propria, which could explain the high risk of false-negative biopsy results, as samples are often obtained from more superficial layers49. In contrast to an original study, in which biopsy sampling was indicated in patients with an ncCR5, the panel does not recommend mandatory biopsy sampling to define ncCR. The definition of ncCR requires consideration of both lymph node regression and the presence of morphological features associated with node positivity (such as a round shape, irregular border and heterogeneous signal) combined with a diameter of ≥5 mm (refs50,51,52,53). LE can be used in patients with ncCR for both diagnostic and therapeutic purposes13,54, although this approach is also associated with increased morbidity if completion TME is required9,12. The criteria for completion TME after initial LE need to be further elucidated.

For patients with early-stage rectal cancers with an adenomatous component, accurately diagnosing a residual adenomatous polyp after CRT poses a major challenge to organ preservation approaches. Previous data indicate that these tumours might be suitable for primary treatment with CRT and organ preservation; however, residual adenomatous polyps often include high-grade dysplastic components and should therefore be removed using full-thickness LE55,56.

Diagnostic imaging can be notoriously inaccurate in determining the extent of locally advanced disease at initial diagnosis, and further research is needed in this area. Nonetheless, staging is highly relevant in the context of organ preservation, as previous studies have indicated that increasing cT stage, tumour volume or, alternatively, tumour length and circumferential extent within the bowel wall at baseline are the most important predictors of a cCR11,57,58,59. Furthermore, inaccurate staging of cT1 tumours as cT2 rectal cancers (upstaging) can lead to unnecessary treatment with CRT within clinical trials. LE alone without CRT is considered sufficient and can reduce the risk of morbidity without jeopardizing long-term oncological outcomes60,61,62,63 for patients with pT1 tumours and no adverse-risk features15,16. However, completion TME is recommended for patients with adverse histopathological features detected in the resected LE specimen: invasion into the middle or lower third of the submucosa (SM ≥ 2), grade 3 disease, venous invasion or lymphatic invasion. Alternatively, for patients with pT1 tumours and adverse histopathological features, LE plus adjuvant CRT has been explored, although further studies are required to clarify the role of CRT in this setting17,64.

Retrospective and prospective studies have explored various follow-up methods and schedules, most of which were designed empirically and extrapolated from guidelines on operative management2,3,4,6,7,10,54,65. This heterogeneity was reflected in the wide divergence of panel votes on the most appropriate follow-up schedule after Delphi round 1. The panel recommended that follow-up should comprise serum carcinoembryonic antigen testing, DRE, rectoscopy, pelvic MRI and chest and abdominal CT, and agreed on a specific follow-up schedule in order to avoid inconsistencies (Table 2). Local regrowth after an initial cCR typically occurs within the first 2–3 years after treatment; therefore, 3 years of monitoring using all available methods was strongly recommended in order to capture as many events as possible. Precautionary further monitoring in the fourth and fifth years was also recommended.

Regarding individual follow-up methods, a meta-analysis of data from 602 patients32 indicates that serum carcinoembryonic antigen level is not predictive of local regrowth after an initial cCR; however, serum carcinoembryonic antigen values were missing for 45% of patients, which should be considered when interpreting these findings. Thus, the predictive value of serum carcinoembryonic antigen remains unclear, and more prospective studies are required to clarify any possible role. MRI and endoscopy have been shown to have complementary roles in determining cCR and predicting local regrowth, although diagnostic failures have also been reported66,67,68,69. The role of chest and abdominal CT requires further exploration. We recommend CT every 6–12 months during the first year after treatment and annually during years 2–5, partly because the watch-and-wait strategy is not yet routinely established and long-term safety data from randomized studies are currently unavailable. In the International Watch and Wait database registry analysis, distant metastases were diagnosed in only 8% of 880 patients with a cCR following CRT, mostly during the first 3 years after treatment8. In a systematic review of data from 17 (mostly retrospective) studies including a total of 1,387 patients who received NOM, the maximum risk of distant metastases was 5.5% in patients with a sustained cCR but 23.1% in those with regrowth after an initial cCR, a scenario requiring a high level of caution70; similar data were reported in a retrospective comparison of these two patient groups10. Furthermore, the 5-year incidence of metastases was 28% among poor responders (ypT2–3) after CRT in the GRECCAR2 trial12; thus, special caution is also required in this patient subgroup if LE is explored.
Of note, in the updated International Watch and Wait database report published in December 2020 (after completion of the second round of the Delphi process for this Consensus Statement), the probability of remaining free from local regrowth for an additional 2 years was 88.1% for patients with a sustained cCR for 1 year and 97.3% for those with a sustained cCR for 3 years, after a median follow-up of 55.2 months71. These data suggest that the intensity of watch-and-wait surveillance can safely be reduced in patients who have had a sustained cCR for the first 3 years after treatment.

One of the main arguments for exploring the efficacy of NOM is the potential to preserve both the sphincter and anorectal function. Previous studies have reported worse anorectal function after CRT plus surgery than after CRT alone, with major LARS in up to 67% and up to 36% of patients, respectively; however, comparisons between studies are complicated by the seemingly arbitrary use of different anorectal function scores35,36,37,72. Despite the lack of evidence from randomized cohorts comparing TME with NOM or LE, the panel considered the LARS score26 to be the most practical PRO measure for routine use. The panel also acknowledged the limitations of the LARS score (including a lack of specific validation for organ preservation approaches and its restriction to symptoms of bowel dysfunction) and recommended the development of a new PRO measure designed and validated specifically for patients with rectal cancer undergoing organ preservation approaches.
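For readers less familiar with the instrument, the total LARS score (range 0–42) is conventionally grouped into three categories, and "major LARS" figures such as those quoted above refer to the highest band. The Python sketch below is illustrative only: it assumes the standard cut-offs of 0–20 (no LARS), 21–29 (minor LARS) and 30–42 (major LARS), does not reproduce the item-level scoring of the questionnaire, and uses hypothetical function names (`classify_lars`, `major_lars_proportion`).

```python
def classify_lars(total_score: int) -> str:
    """Map a total LARS score (0-42) to its conventional category.

    Assumes the standard cut-offs: 0-20 no LARS, 21-29 minor LARS,
    30-42 major LARS. Item-level scoring is not reproduced here.
    """
    if not 0 <= total_score <= 42:
        raise ValueError("LARS total score must be between 0 and 42")
    if total_score <= 20:
        return "no LARS"
    if total_score <= 29:
        return "minor LARS"
    return "major LARS"


def major_lars_proportion(scores: list[int]) -> float:
    """Fraction of a (hypothetical) cohort whose total score indicates major LARS."""
    if not scores:
        raise ValueError("at least one score is required")
    n_major = sum(1 for s in scores if classify_lars(s) == "major LARS")
    return n_major / len(scores)


# Made-up score, for illustration only
print(classify_lars(33))  # major LARS
```

A cohort-level proportion computed this way is only comparable across studies if the same instrument and cut-offs are used, which is the harmonization problem the panel highlights.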

Improved QoL is one of the main arguments for avoiding surgery, although randomized evidence that organ preservation with SCRT and/or CRT is superior in this respect is lacking, apart from the TREC study, in which organ preservation was achieved in 19 of 27 randomized patients (70%) and improvements in QoL after SCRT compared with surgery were sustained at 36 months of follow-up14. Other data are mostly derived from series that used a wide variety of questionnaires to assess QoL and PROs, none of which are validated for use in an organ preservation setting35,36,37,72. Therefore, the panel agreed on several recommendations for future studies: (1) five QoL and function scales should always be used to document adverse events and their effects on patients; (2) ten symptomatic toxicity items should be prioritized for evaluation; (3) a specific time schedule for measurement should be followed; and (4) a new validated questionnaire, or a short extension to an existing instrument (such as the EORTC QLQ-CR29 or the LARS score), should be developed specifically for patients with rectal cancer undergoing organ preservation approaches, designed to capture both symptomatic toxicities (bowel, urinary and sexual dysfunction) and the effects of more intensive active surveillance protocols on QoL, for use both within trials and in clinical practice. Importantly, the recommendations on QoL and PROs presented here constitute the first international consensus on this topic and provide an important foundation for efforts to harmonize outcome measures and data documentation.

Our study has several limitations. First, the panel of trialists was selected non-systematically, on the basis of their international profile in the field, which could have introduced selection bias. Second, the consensus recommendation process was based on online surveys. Holding face-to-face meetings to discuss discrepancies that arose during the process was not possible; instead, such issues were clarified through e-mail correspondence. Third, although the 70% threshold required to reach a consensus has been used previously in several other statements73,74,75, it remains an arbitrary threshold, which constitutes a methodological limitation of Delphi surveys76. Fourth, only physicians and researchers participated in the surveys, whereas other stakeholders (such as industry sponsors and patient representatives) were not involved. This restriction was considered necessary given that organ preservation constitutes a new area of clinical investigation and that consensus on several highly complex key outcome measures was needed as a first step. In the near future, this project will be extended to a wider group of stakeholders, including patients and/or patient representatives, in order to achieve broader consensus; this extension will also include the development of a new EORTC organ preservation-specific QoL module.
Indeed, patients often have differing perceptions of what they consider most relevant in discussions about their treatment, and differences have been described between the importance assigned by patients and by clinicians to specific clinical and functional outcomes in the context of organ preservation35,77,78. Moreover, prospective evidence on the safety and effectiveness of organ preservation is continuously emerging, which will probably mean that certain outcome measures need to be adapted in the future. Therefore, the present consensus should serve as a guide that supports, rather than replaces, clinical judgment. Several key questions and uncertainties regarding organ preservation approaches for patients with rectal cancer remain to be addressed (Box 3).
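The 70% agreement rule referred to among the limitations can be expressed as a simple check applied to each Delphi item. The Python sketch below is illustrative only: it assumes that each respondent casts one categorical vote per item and that consensus means at least 70% of respondents choosing the same option; the function name `reaches_consensus` and the vote labels are hypothetical and do not reproduce the survey tooling or exact rules used for this Consensus Statement.

```python
from collections import Counter
from fractions import Fraction

# Agreement level used to define consensus in this Consensus Statement;
# Fraction avoids floating-point rounding at the exact 70% boundary.
CONSENSUS_THRESHOLD = Fraction(70, 100)


def reaches_consensus(votes: list[str], threshold: Fraction = CONSENSUS_THRESHOLD):
    """Return (reached, leading_option, share) for a single Delphi item.

    Assumption: each panel member casts one categorical vote, and consensus
    means the most frequent option was chosen by at least `threshold` of
    all respondents.
    """
    if not votes:
        raise ValueError("no votes recorded for this item")
    # most_common(1) yields the single most frequent option and its count
    leading_option, count = Counter(votes).most_common(1)[0]
    share = Fraction(count, len(votes))
    return share >= threshold, leading_option, share


# Hypothetical round: 7 of 10 panel members agree
print(reaches_consensus(["agree"] * 7 + ["disagree"] * 3))
# (True, 'agree', Fraction(7, 10))
```

Because the cut-off is a single fixed number, an item supported by 69% of respondents fails while one at 70% passes, which illustrates why the threshold is described above as arbitrary.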

Conclusions

In summary, to the best of our knowledge, this is the first international expert Consensus Statement to provide comprehensive and rigorous recommendations on the key outcome measures to be assessed and reported, both in trials and in routine clinical practice, for patients with rectal cancer who are eligible for organ preservation. Implementation of this consensus should promote the harmonized recording and reporting of data on organ preservation strategies in patients with rectal cancer, thereby improving the interpretation and comparison of new trial findings and supporting the standardization of routine clinical practice.