Integrating digital pathology and mathematical modelling to predict spatial biomarker dynamics in cancer immunotherapy

Hutchinson, L. G.; Grimm, O.

doi:10.1038/s41746-022-00636-3

Download PDF

Article
Open access
Published: 12 July 2022

Integrating digital pathology and mathematical modelling to predict spatial biomarker dynamics in cancer immunotherapy

npj Digital Medicine volume 5, Article number: 92 (2022) Cite this article

5886 Accesses
6 Citations
13 Altmetric
Metrics details

Subjects

Abstract

In oncology clinical trials, on-treatment biopsy samples are taken to confirm the mode of action of new molecules, among other reasons. Yet, the time point of sample collection is typically scheduled according to 'Expert Best Guess'. We have developed an approach integrating digital pathology and mathematical modelling to provide clinical teams with quantitative information to support this decision. Using digitised biopsies from an ongoing clinical trial as the input to an agent-based mathematical model, we have quantitatively optimised and validated the model demonstrating that it accurately recapitulates observed biopsy samples. Furthermore, the validated model can be used to predict the dynamics of simulated biopsies, with applications from protocol design for phase 1–2 studies to the conception of combination therapies, to personalised healthcare.

Computational approaches to modelling and optimizing cancer treatment

Article 19 July 2023

Characterisation of tumour microenvironment remodelling following oncogene inhibition in preclinical studies with imaging mass cytometry

Article Open access 08 October 2021

Evaluating and comparing immunostaining and computational methods for spatial profiling of drug response in patient-derived explants

Article 14 December 2020

Introduction

In 2018, 9.6 million people died of cancer despite huge and long-standing efforts to develop a cure¹. One challenge has been the lack of predictive quantitative methodology that improves clinical trial design, including the scheduling of on-treatment biopsies, identifying the best combination treatments, and patient selection. In particular, during early phase oncology clinical trials one aim is to verify the mode of action of the drug by investigating treatment effects in the tumour tissue by histological methods. To ensure meaningful conclusions regarding the mode of action, the time point at which samples (i.e. tumour biopsies) are taken is critical. Typically a single core-needle biopsy is taken at baseline, and another during the treatment period. Given the ordeal for the patient, the number of biopsy samples taken must be minimised, and the time point needs to be chosen wisely. However, currently, this time point is chosen by clinical experts based on ‘best-guess’ rather than being informed by quantitative scientific approaches. Such approaches can also be used to identify appropriate combination treatment partners as well as to select patients to optimise the likelihood of successful therapy both during clinical trials and routine medical procedures. The impact on and cost for patients, drug developers and healthcare systems are obvious.

Here we describe a quantitative method that integrates digital pathology and mathematical disease modelling to address the issues mentioned above. We have developed a simulation model that is capable of predicting a full-time course of spatial biomarkers, which can be used to identify the most informative time point at which to take an on-treatment biopsy sample in the context of oncology clinical trials.

Spatial biomarkers derived from immunohistochemistry (IHC) images have been shown to contain information that is predictive of prognosis, tumour recurrence or response to treatment via statistical analysis^2,3,4, and more recently, machine-learning methods^5,6. Here, we use a mechanistic modelling approach to utilise the full spatiotemporal resolution of paired biopsy samples to gain insights based on biological hypotheses and to perform predictive simulations.

The underlying algorithm is an agent-based model implemented in Matlab, based on that described by Kather et al.⁷. The model comprises two populations of agents; tumour cells and immune cells laid out on a grid, which behaves according to prescribed rules and ultimately captures emergent spatial dynamics. There are several examples of agent-based models representing tumour-immune interactions which are capable of reproducing realistic spatial features and were recently reviewed⁸. However, the validation of these models has been mostly qualitative, i.e. visual assessment or comparison of simulation results to images of tissue sections at a single time point^7,9,10. In one example, spatial distributions of cells in simulated samples are quantitatively compared using the radial distribution function¹¹. In this work, we use paired clinical samples that were taken at two timepoints for each patient: pretreatment (baseline) and on-treatment, allowing us to validate the dynamic components of the model using the spatial and temporal resolution.

We used data from phase 1 clinical trials for Simlukafusp, an immunomodulatory molecule which localises in the tumour due to its FAP binding, and stimulates the IL2beta/gamma receptors expressed on immune cells. We note that inter-patient variability is significant in such a dataset and that we do not have information on previous treatments or the evolutionary status of individual tumours. However, we assume from¹² that even the small samples from core-needle biopsies are representative of intra-tumour heterogeneity for a given patient. Due to the lack of a widely accepted comparative measure to compare the spatial distributions of cells in sets of clinical images, we derived a spatial agreement measure (SAM), which we used to optimise and test the model.

To this end, we have simulated the on-treatment immune cell distribution using the baseline biopsies as input and compared the predictions to the observed immune cell distribution from the on-treatment samples, achieving a mean accuracy of 77%. It is striking that a single baseline feature, the spatial distribution of immune cells, is predictive of the on-treatment immune cell distribution with such high accuracy.

Finally, we provide two examples for an application of this method - clinical protocol design (choice of biopsy scheduling) and assessment of potential combination therapy partners.

Box

Integration of digital pathology and mathematical modelling.

The input data to the model is a spatial map of CD8+T-cells (CD8: a cluster of differentiation 8) which is derived from IHC slides from patient samples. Specifically, using a machine-learning algorithm trained by a human expert to detect cytotoxic CD8+T-cells, IHC stained tissue sections are analysed. This algorithm provides the type (i.e. proliferating or non-proliferating CD8+ cell) and spatial coordinates of each cell in the tumour tissue. This spatial map of CD8 cells is mapped onto a grid and provided to the mathematical model as an input. In mathematical model realisations, each cell on the grid can proliferate, migrate, die or interact with other cells according to probabilistic rules initiated at each time step. The likelihood for a cell to perform an action at any given time step is calculated based on probability distributions and its local neighbourhood. For example, in order for an immune cell to kill a tumour cell, it must be in physical proximity. Thus, spatially resolved input data are key.

Results

In order to train the model to reproduce biopsy images that are in spatial agreement with observed on-treatment biopsies, we performed a local sensitivity analysis and parameter optimisation. For quantitative comparison between the observed biopsy samples and our model simulations at the corresponding time point, we computed spatial summary statistics and designed an agreement score called the spatial agreement measure (SAM). The model optimisation step revealed the parameter values that performed best according to the SAM. The performance of these parameter sets was validated using a holdout set of patient samples which were not used to train the model. Furthermore, we used the validated model to show that samples taken at around 30 days would provide information on maximum CD8 infiltration at a population level. We also predicted the time course of tissue biomarkers based on two different theoretical immunotherapy combination partners by incorporating their modes of action into the model parameters. Finally, we demonstrate how spatial features vary widely based on initial conditions, i.e. the baseline biopsy. Therefore we propose that the model can also be used in a personalised fashion to predict the optimal time point for biopsy scheduling for individual patients.

Preprocessing of the images for model input, optimisation and validation

From the two clinical studies, there were a total of 71 patients from whom paired biopsies were available at the time of data processing. Of these, 44 paired samples included enough tissue in both the pre- and on-treatment biopsy samples to use as an input for the model. These 44 patients were split at random into a training set (n = 37) and a test set (n = 7). Each pre- and on-treatment biopsy sample was processed using our purpose-built algorithm, and tiles were selected to maximise fields of view and minimise the inclusion of edges and overlaps, as illustrated in Fig. 1. A summary of the training and testing datasets is presented in Tables 1, 2.

**Fig. 1: Integrating digital pathology and mathematical modelling.**

Table 1 Patient and sample information for patients in the training set and test set.

Full size table

Table 2 Indications and biopsy locations are represented in the dataset.

Full size table

Using the baseline tiles as the initial condition for model realisations, the model simulates the behaviour of every cell on the grid for a specified period of time-based on rules of proliferation, migration, killing and death. Simulated biopsies are generated and saved at 24 h intervals, creating a rich time course. A schematic that indicates how the data were preprocessed and used as an input to the model simulations and for model optimisation and validation is shown in Fig. 1.

Derivation of a spatial agreement measure (SAM) based on the radial distribution function

The spatial distribution of T-cells is critical for tumour-immune interactions both in biology and in the computational model. Given the stochastic nature of biological events, and indeed the stochastic nature of the computational realisations, an image-to-image comparison must be performed on derived spatial statistics, rather than comparing images directly. Therefore, a comparison of simulated biopsies to real on-treatment biopsies requires a measure of spatial agreement. Since each observed or simulated patient sample is represented by a set of tiles, we wish to compare the spatial distributions of two sets of images to one another. To this end, we developed a method based on the radial distribution function (RDF) as shown in Fig. 2a. The spatial agreement measure (SAM) is a spatial statistic that describes the agreement between two sets of biopsy images, for example, simulated and observed biopsies, by quantifying the overlap of the respective RDF ranges.

**Fig. 2: Development of a spatial agreement measure (SAM) for model optimisation and validation.**

In order to capture the variability in the spatial distributions, we provide extra weight to the observed cell density within a short radius of any given CD8 cell, via the VarSAM. The VarSAM is designed to identify artificially high SAM values that result from RDFs that have a narrow distribution. The SAM and VarSAM are summarised pictorially in Fig. 2b.

Sensitivity analysis: an exploration of model sensitivity

As shown in Fig. 3, the parameters that displayed the highest sensitivity were those related to the immune cell dynamics, namely the probabilities of CD8 cell proliferation, death, and killing as well as the randomness of CD8 cell migration and the influx rate of immune cells (see Table 4 for the definitions of model parameters). In contrast, several model parameters did not appear to have a strong effect on the numbers of CD8 cells and tumour cells, for example, the proliferation, migration and death rates of tumour cells. Moreover, we note that the number of CD8 cells at baseline influences the sensitivity of the model parameters. It is worth pointing out that our sensitivity analysis is not exhaustive since it is centred on one region of high dimensional parameter space, and therefore does not capture co-dependencies between model parameters. Supplementary Fig. S1 contains further visualisations of the sensitivity analysis results.

Model optimisation: a set of model parameters can reliably reproduce spatial features of biopsy images

The model optimisation (summarised in Fig. 4) identified 12 parameter sets that met the criteria for spatial agreement, and these were then used to test the model performance on the holdout dataset and are hereafter referred to as the population parameters. To capture inter-individual variability, we accepted all parameter sets that met the criteria, rather than identifying one top-performing parameter set. We note that the accepted parameters tend to feature a low value for IMpdeath, the probability of spontaneous CD8 cell apoptosis. In contrast, there does not appear to be a clear trend in the values accepted for the other parameters.

The relationships between pairs of parameters and their performance is shown in Fig. 4b. The definitions of model parameters are provided in Table 4. We note that overall there is no noticeable association between accepted parameter values. We also note that a very small proportion of parameter values met our criteria for acceptance (1.2%).

Null benchmark for predictive accuracy

In order to establish if the agent-based model does indeed add an improvement in the prediction of spatial features compared to predictions in the absence of an underlying model, we wish to determine a reference value for predictive accuracy that the model must exceed in order to be of incremental predictive value. This reference value will be calculated using the last observation carried forward method (LOCF). To this end, we computed the RDFs for the baseline and on-treatment biopsies and calculated the SAM and VarSAM for each patient in the dataset (n = 44). Using the same thresholds that were used to accept or reject parameter sets in the model optimisation, we found that for 41% of patients (18 out of 44) the distribution of CD8 cells in the on-treatment sample were well represented by the baseline sample. Therefore in order for the agent-based model to be of predictive value, the final parameter set must produce simulated biopsies that agree with the observed biopsies in more than 41% of cases.

Model validation: the trained model accurately reproduces spatial biopsy features

To test the model performance, we ran simulations on the unseen hold-out dataset comprising seven patients using the population parameter values identified in the optimisation step. The results are shown in Fig. 5. Using the SAM to deduce whether simulated biopsies were a faithful representation of the observed on-treatment samples, we found that the population parameter sets were able to reproduce spatial features of on-treatment biopsy samples with a mean accuracy of 77% (and median accuracy of 100%) compared to 35% mean accuracy with randomly selected control parameter sets and the 41% acceptance rate from the LOCF method described above (note that the LOCF was calculated using the full dataset to compensate for the lack of repeats using different parameter sets). For one of the patients in the test set, none of the simulations with the control parameters nor the population parameters were accepted by SAM, which is likely due to the small number of tiles in the on-treatment sample (4) compared to the baseline sample (53). We consider our findings robust to different sample collection times since the on-treatment timepoints for the training set varied between 15–113 days after the first sample was taken. Therefore we are confident that the model is able to reliably predict a full time course of biopsy images including spatial features. It is also noteworthy that a single baseline feature (CD8 spatial distribution) is sufficient model input to predict on-treatment biopsy samples. We note that there is room for improvement for predicting CD8 cell numbers, and this is to be expected since we optimised the model to reproduce the spatial arrangement of CD8 cells rather than CD8 numbers per tile. By performing stochastic repeats and simulations with all accepted parameter sets, we believe we have included sufficient variability to represent inter- and intra-patient variability that arises from, for example, taking biopsy samples from different locations.

Model application: the model predicts spatial CD8 dynamics for monotherapy and hypothetical combination therapies

An example of the predicted CD8 cell time course for patients in the test set is shown in Fig. 5c. For some patients, the model predicts that the number of CD8 cells levels off or begins to decrease after 30 days, whereas for other patients, the number of CD8 cells continues to increase. Depending on the purpose of biopsy collection, drug development teams may choose to propose a cohort or personalised approach for biopsy collection. Simulations generated with this model can be used to inform the optimal time point for biopsy collection in both cases.

We present two examples of hypothetical combination therapies as shown in Fig. 6 compared to the monotherapy example which was used to train the model. The first hypothetical combination partner (scenario 2) increases the proliferation probability of CD8 cells and the second (scenario 3) leads to an increase in CD8 influx into the tumour. While both hypothetical drugs lead to an initial increase in CD8 cell number compared to the initial scenario, the molecule that increases the proliferation of CD8 cells, perhaps counterintuitively, leads to a later decrease in CD8 cell number. In contrast, the molecule that increases the influx rate of CD8 cells leads to a sustained increase in CD8 cell number. These emergent phenomena are a major reason to use mathematical and computational models to explore complex biological processes.

**Fig. 6: Use of model predictions for clinical trials.**

In many cases, the spatial distribution of CD8 cells may be a more important biomarker than the total number of CD8 cells. In Fig. 6b we illustrate how simulated biopsies and spatial metrics can also be extracted from the simulations at any chosen time point in order to inform decision-making.

Discussion

In this article, we have described a predictive scientific approach integrating mathematical modelling and digital pathology. The model has been optimised and tested quantitatively using a novel spatial agreement measure and accurately (mean accuracy 77%) predicts the spatiotemporal distribution of cytotoxic CD8+T cells in tumour tissue. These results encourage us to apply this model to address some of the challenges faced in clinical trials, namely the scheduling of on-treatment biopsies. We view this model as a tool to inform the human decision-maker for biopsy scheduling, perhaps revealing some hidden features that may influence their decision. We assert that such a model may also be used to investigate combination therapy partners or novel targets for immunotherapeutic drugs by exploring parameter regimes which correspond to pharmacological interventions. In addition, we believe that the spatial agreement measure (SAM) we have derived to compare spatial biomarkers of simulated and observed data has broad applications in the fields of digital pathology and mathematical modelling.

Although the mathematical model appears to reliably reproduce the spatiotemporal data on CD8 distribution, naturally it has some limitations. These limitations fall into two classes: clinical operations and model implementation. Given the number of patients typically enroled in early phase clinical trials, the sample size is necessarily small. Furthermore, the size of the tissue sample obtained by needle biopsies is small and often fragmented, which renders some of the datasets unusable. However, for the usable samples, it has been shown that even these small tissue sections usually provide a reliable representation of intra-tumour heterogeneity¹². Relating to the model implementation, we found that the simulations were heavily influenced by the number of CD8 cells present, and our optimisation process appears to be slightly biased towards choosing parameter sets which lead to an increase in CD8 number. This arises from using a measure to compare the spatial distributions, which is dominated by noise in the case of too few CD8s. In addition, due to computation time and memory restrictions, we were only able to sample 1000 parameter sets for the optimisation with 5 stochastic repeats per simulation. Finally, for the development described in this article, the concentration of the drug in the tumour and its effect on the model parameters was out of scope due to lack of data, however, this is planned for a future version.

Investigations into the dynamic behaviour of complex mechanistic models can reveal both expected and unexpected parameter dependencies, which can be interpreted in terms of their physiological meaning. In the best case, elucidating the effects of physiologically relevant parameters on model outcomes can lead to hypothesis generation for novel modes of action of new drugs. For example, an expected outcome of a sensitivity analysis such as the one performed in this article may be to find that the rate at which CD8 cells kill tumour cells has a strong influence on the numbers of tumour and immune cells. Indeed, there are numerous molecules in development which are intended to increase killing events. An unexpected outcome of our sensitivity analysis was that the randomness of CD8 cell motility also has a notable effect on the model outcome. Mechanistic models such as the one presented here are a valuable tool for investigating novel modes of action for new molecules alone and in combination.

Big data approaches in which labelled data is used to train a machine-learning algorithm are a current trend in personalised healthcare. The advantage of these approaches is, if sufficient data are available, to be able to make a prediction of some future event such as clinical response without mechanistic knowledge of the underlying biological process. On the other hand, using mechanistic modelling one can obtain a full-time course of events (and not just the time point on which the algorithm was trained) and understand the mechanism by which the prediction is made. For example, we have seen that a single feature, the spatial distribution of CD8+T-cells, is sufficient to predict the ground truth with high (77%) accuracy. Machine-learning approaches show a similar performance but the biological or mechanistic interpretation of features is more challenging. The two approaches can be complementary to one another such that a mechanistic model like the one described here generates a full-time course of biomarkers, and the accordingly enriched dataset can then be used to train an ML algorithm.

Regarding the clinical utility of the described approach, a major question during early drug development relates to the mode of action (MoA) of the drug and if indeed it works as it has been designed to do. Researchers investigating or wishing to confirm the MoA of such a new investigational drug could use this model to obtain quantitative guidance on the most informative time point for biopsy collection. For example, the effect of molecules aimed at increasing the number of T-cells in the tumour should be assessed when the assumed effect has resulted in a measurable increase in the latter. To this end, researchers could use CD8 maps from baseline biopsies of the disease of interest and run simulations on different baseline distributions. When additional data becomes available as the clinical trial proceeds, the model can be updated.

The vision we have for drug development is one where tools and approaches such as the one described here are a routine part of all steps in the drug development pipeline guiding human experts in their decisions. As a result, decisions can be reached earlier and more efficiently reducing the cost of drug development and, most importantly, improving patient benefit. Regulators have already begun to appreciate the value of computational approaches and in the future, they may demand quantitative data for decision support.

Methods

Data

At the time of data curation, there were 71 patients enroled in two clinical trials for Simlukafusp (NCT03063762 and NCT02627274) who had had baseline and on-treatment biopsy samples. Of these, 44 biopsy pairs contained enough tissue (at least four 100 × 100 cell width fields of view fit within the tissue annotations for baseline and on-treatment samples) to use as an input to the model simulations. In the NCT02627274 trial patients enroled had advanced solid tumours, and in the NCT03063762 trial patients had advanced renal cell carcinoma. Baseline biopsies were usually collected within 7 days before the start of treatment. On-treatment biopsies were obtained at a range of timepoints as described in Table 1.

IHC stained tissue sections (with antibodies against CD8, a marker of T-cells, and Ki67, a marker of proliferation) are digitised and analysed with an in-house developed machine-learning algorithm¹³. First, the colour of images is deconvolved¹⁴. Next, candidate objects (i.e. parts of the image that might conceivably be a tumour or T cell based on morphological, intensity and colour criteria) are detected and then classified by an expert (i.e. pathologist) trained classifier. To this end, the human expert reviews images where candidate cells have been marked and either accepts or rejects the suggestion. This generated training data is used to train the classifier. In the last step, the performance of the trained algorithm is tested on a hold-out set (unseen images of the same kind). The typical classification accuracy is ~90%. The spatial coordinates (in a whole slide coordinate system) of detected objects, CD8+T cells (proliferating and non-proliferating based on the expression of Ki67) and proliferating tumour cells, is stored in a database. The resulting spatial map of cells from the whole sample is provided as an input to the preprocessing algorithm.

Technical details

The tumour cell detection algorithm described above only detects tumour cells at a specific stage in their cell cycle (when expressing Ki67). Assuming a typical diameter of a tumour cell to be comparable to that of a CD8 T-cell (~5 µm), we populated empty space within the tumour annotations (which were hand-drawn by a pathologist) with tumour cells up to a density of 70% of the maximum possible with tight packing of cells. The size of each grid cell in our simulations is considered to be one cell width, which we assume is on the order of magnitude of 5 µm. Although alternative grid definitions are possible (such as off-lattice¹⁵ or hexagonal¹⁶), in order to be consistent with the original model, we opted to work with a square lattice and single occupancy of grid spaces. We did not observe any artefacts in our simulations due to the lattice choice.

We subdivide each biopsy into as many minimally overlapping square tiles (of dimension 100 × 100 cell widths) as possible. Each of these tiles is passed to the model as an input. The tiling algorithm that we developed selects tiles from the images while obeying the following rules: at least 90% overlap with tissue annotations and tile overlap of ≤10%.

As well as the positions of CD8 and tumour cells in the tiles, the model requires physiological properties of the cells as an input such as proliferation capacity and stem cell-ness. To this end, for each cell at each position, a number of different properties are stored and updated with each iteration of the model. Since there is no way to obtain these properties from the real biopsy images, we assigned them according to realistic estimations. The properties and our allocations at baseline are listed in Table 3.

Table 3 Assigned properties of the tumour cells and immune cells in the preprocessing script.

Full size table

For each patient in the dataset, the input to the simulation model consists of a stack of tiles extracted from the full pretreatment biopsy image, each of size 100 × 100 cell widths, and a structure containing the properties of the cells in each tile. Depending on the size of the original sample, the number of tiles per patient ranged from 1–65 and the number of tiles per patient are listed in Table 1. We excluded samples from which less than four tiles were extracted to increase statistical confidence in our results.

Model extension

The mathematical model is based on that from ref. ⁷, which we extended to suit our purpose. Specifically, since we are using tiles from real biopsy samples as an input to the model we replaced the original no-flux boundary conditions with periodic boundary conditions which better reflect the physiological scenario. In addition, we increased the resilience of tumour cells such that several attacks by immune cells were necessary to lead to tumour cell death based on in vivo imaging data¹⁷. A description of the cell properties assigned for simulations is provided in Table 3 and the model parameters with descriptions are provided in Table 4.

Table 4 Parameter names, descriptions and ranges scanned for the sensitivity analysis and optimisation steps.

Full size table

Spatial agreement evaluation

In order to evaluate the agreement between real and simulated biopsy samples, we developed a spatial agreement measure (SAM). The spatial distribution of CD8 cells on a given tile is summarised by the normalised RDF, defined in¹⁸ as the square taxicab paired correlation function. This normalised RDF, which comprises a vector of relative cell densities at each possible inter-cell distance, can be derived from any simulated or observed tile. We assess agreement between collections of tiles from simulated and observed biopsies using these RDF vectors (see Fig. 2a). At each possible inter-cell distance, the relative cell density values in observed and simulated biopsy tiles are compared. Specifically, the proportion of relative cell densities in simulated tiles that fall within the range of relative cell densities from observed tiles ±20% is calculated and compared to a threshold value (threshSAM) at each possible inter-cell distance. The calculation is performed for all possible inter-cell distances and the SAM is defined as the proportion of inter-cell distances for which the threshold was met.

Specifically, the SAM is calculated according to the following four steps.

1.
Consider the matrices Ro and Rs such that the rows of Ro represent the normalised RDFs for each observed tile of the on-treatment biopsy for a given patient and the rows of Rs represent the normalised RDFs for all stochastic repeats for simulated tiles for the same patient at the appropriate time point. The number of rows of Ro is the number of tiles identified on the on-treatment biopsy sample and the number of rows of Rs is the number of tiles identified on the pretreatment biopsy sample multiplied by the number of stochastic repeats. Both Ro and Rs have the same number of columns, which is defined by the length of the RDF, M, in this case, half of the computational domain length, equal to 50 distance units. To summarise Ro_ij is the normalised paired correlation function at distance j for the ith tile from the observed on-treatment biopsy and Rs_kj is the normalised paired correlation function at distance j for the kth simulation of the baseline biopsy.
2.
Identify the upper (u) and lower (l) threshold vectors for RDF acceptance from the RDFs of observed tiles Ro such that
$$u_j = \mathop {{\max }}\limits_i Ro_j + 0.2 \times \mathop {{{{{\mathrm{range}}}}}}\limits_i \;Ro_j,$$
$$l_j = \max \left( {\mathop {{\min }}\limits_i Ro_j - 0.2 \times \mathop {{{{{\mathrm{range}}}}}}\limits_i Ro_j,0} \right).$$
3.
For each RDF distance j, find the number of RDFs from simulated tiles Rs that lie within the bounds defined by u and l and divide by the total number of simulations, N, to obtain the vector of a proportion of accepted simulated RDFs for each distance, A:
$$A_j = \frac{{\mathop {\sum }\nolimits_k \left( {l_j \,<\, Rs_{kj} \,<\, u_j} \right)}}{N}.$$
4.
Find the number of RDF distances for which the proportion of simulated RDFs that fall within the accepted range is at least threshSAM and divide by the maximum distance M to obtain the SAM.
$${SAM} = \frac{{\mathop {\sum }\nolimits_j A > {threshSAM}}}{M}.$$

The distribution of CD8 cells in the immediate vicinity provides important information regarding the size and frequency of CD8 clusters¹⁸. To ensure that the vicinity is given appropriate weight, we perform a second SAM step to quantify the agreement between the ranges of RDF values within the first 15 distance units. The VarSAM is the ratio of the ranges of the RDFs of simulated and observed biopsies in the first 15 distance units. Figure 2a summarises the SAM and VarSAM.

Mathematically, the VarSAM is defined as

$${{VarSAM}} = \min \left( {\frac{{{{range}}\left( {Rs\left( {i\, <\, 16} \right)} \right)}}{{{{range}}\left( {Ro\left( {i \,< \,16} \right)} \right)}},\frac{{{{range}}\left( {Ro\left( {i \,<\, 16} \right)} \right)}}{{{{range}}\left( {Rs\left( {i \,< \,16} \right)} \right)}}} \right).$$

Sensitivity analysis methodology

For the model sensitivity analysis, simulations were performed using parameter values within realistic physiological ranges, sampled either linearly or logarithmically as summarised in Table 4. In light of the computing power available, the parameters were varied independently. The simulations were performed for a hand-selected set of six tiles from different patients which had different numbers of CD8 cells at baseline, covering a range between 7 and almost 800 CD8 cells per tile. Every simulation was repeated five times with different random seeds, and an average of the simulation results was calculated. The sensitivity of CD8 cell number and tumour cell number to changes in the parameter values was calculated at the 20-day time point of the simulations according to

$${\mathrm{CD8}}\;{\mathrm{sensitivity}} = \frac{{({{CD8}}({{max}}) - {{CD8}}({{min}}))/{{CD8}}({{baseline}})}}{{({{par}}({{max}}) - {\mathrm{par}}({{min}}))/{{par}}({{mid}})}}$$

such that par(max) and par(min) are the maximum and minimum values of the explored parameter range and CD8(max) and CD8(min) CD8 cell number are the corresponding CD8 cell number for simulations using those parameter values. The CD8 cell number is normalised by the number of CD8 cells at baseline and the parameter range is normalised by the midpoint of the range scanned (par(mid)). An analogous expression was used to calculate tumour cell sensitivity.

Longitudinal plots of cell numbers against time for the sensitivity analysis simulations are provided in Supplementary Fig. 1.

Optimisation methodology

For the model optimisation, we selected 1000 sets of the four chosen parameters (IMpprol, IMpdeath, IMrwalk and IMpkill) from uniform or logarithmic distributions within realistic physiological ranges, as indicated in Table 4. Model simulations were performed using each parameter set for every tile, for every patient in the training set, and the simulations were repeated five times to allow averaging over stochastic runs. The end-time of the simulation was set according to the real time-point of the on-treatment biopsy sample for each patient. The spatial distribution of CD8 cells from observed biopsies was compared with that of the simulated biopsies via the SAM, providing a score between 0–1 for every parameter set, for every patient as summarised in the heatmap in Fig. 4. There are several circumstances in which there are exceptions to this method which are handled as follows:

1.
In cases where the simulations ended before the prescribed end-time (usually due to the tumour being eradicated), the last simulated time point was used to calculate the SAM.
2.
In cases where there are very few CD8 cells on either the simulated or observed biopsy sample, the RDFs are dominated by noise and the SAM is no longer considered representative.
1. a.
  Whenever the majority of tiles from the observed on-treatment biopsy for any given patient contained less than ten CD8 cells, we excluded these samples from the SAM calculations.
2. b.
  In the case that there were less than 10 CD8 cells in the majority of simulated tiles for any given parameter set, the SAM was not calculated and a NaN was recorded. In these cases, rather than comparing the spatial distribution of CD8 cells in collections of tiles, the number of CD8 cells per tile is compared instead. Specifically, the two-sample t-test was calculated to evaluate whether the total CD8 cells per tile in observed and simulated tiles could belong to the same underlying distribution, and therefore whether the parameter set should be accepted.

In order to identify parameter sets that generated realistic on-treatment biopsies across the population, we ranked the parameter sets by the mean SAM across all patients and accepted all parameter sets where the SAM was >0.7. From the selected parameter sets, we excluded those that would not yield sufficient variability as measured by the VarSAM (explained above). The threshold for VarSAM was 0.3. The resulting final population parameter set contains 12 parameter combinations that we consider to produce the most realistic biological behaviour.

Model validation

For the model validation, we used the 12 population parameters identified through the model optimisation, along with the baseline CD8 distributions for the unseen holdout dataset of seven patients to perform simulations (with five stochastic repeats per parameter set). As a control, we repeated this exercise five times with 12 randomly chosen parameter sets. To compare the simulated spatial distributions of CD8 cells with the observed distributions of CD8 cells for the test patients, we calculated the SAM as described in the previous section. We consider the model accuracy to be defined as the percentage of accepted simulations according to the same thresholds that were used for the optimisation, (SAM > 0.7 and VarSAM > 0.3).

Ethics statement

All participants provided written informed consent to take part in the study and the study was approved by the ethics committee of Hoffman-La Roche.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The cell position (gridstack) data is available as a zipped file at the following location https://doi.org/10.5281/zenodo.6631677.

For up-to-date details on Roche’s Global Policy on the Sharing of Clinical Information and how to request access to related clinical study documents, see here:

https://go.roche.com/data_sharing.

Code availability

All code is located at the following public repository https://doi.org/10.5281/zenodo.6631666.

References

World Health Organisation. WHO: cancer. https://www.who.int/health-topics/cancer#tab=tab_1 (2021).
Barua, S. et al. Spatial interaction of tumor cells and regulatory T cells correlates with survival in non-small cell lung cancer. Lung Cancer 117, 73–79 (2018).
Article Google Scholar
Corredor, G. et al. Spatial architecture and arrangement of tumor-infiltrating lymphocytes for predicting likelihood of recurrence in early-stage non–small cell lung cancer. Clin. Cancer Res. 25, 1526–1534 (2019).
Article CAS Google Scholar
Brown, J. R. et al. Multiplexed quantitative analysis of CD3, CD8, and CD20 predicts response to neoadjuvant chemotherapy in breast cancer. Clin. Cancer Res. 20, 5995–6005 (2014).
Article CAS Google Scholar
Saltz, J. et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Rep. 23, 181–193.e7 (2018).
Article CAS Google Scholar
Schwen, L. O. et al. Data-driven discovery of immune contexture biomarkers. Front. Oncol. 8, 627 (2018).
Article Google Scholar
Kather, J. N. et al. In silico modeling of immunotherapy and stroma-targeting therapies in human colorectal cancer. Cancer Res. 77, 6442–6452 (2017).
Article CAS Google Scholar
Norton, K. A., Gong, C., Jamalian, S. & Popel, A. S. Multiscale agent-based and hybrid modeling of the tumor immune microenvironment. Processes 7, 1–23 (2019).
Article Google Scholar
Gong, C. et al. A computational multiscale agent-based model for simulating spatio-temporal tumour immune response to PD1 and PDL1 inhibition. J. R. Soc. Interface 14, 20170320 (2017).
Kather, J. N. et al. High-throughput screening of combinatorial immunotherapies with patient-specific in silico models of metastatic colorectal cancer. Cancer Res. 78, 5155–5163 (2018).
Article CAS Google Scholar
Alfonso, J. C. L. et al. In-silico insights on the prognostic potential of immune cell infiltration patterns in the breast lobular epithelium OPEN. https://doi.org/10.1038/srep33322 (2016).
Mani, N. L. et al. Quantitative assessment of the spatial heterogeneity of tumor-infiltrating lymphocytes in breast cancer. Breast Cancer Res. 18, 78 (2016).
Article Google Scholar
Lorsakul, A. et al. Automated wholeslide analysis of multiplex-brightfield IHC images for cancer cells and carcinoma-associated fibroblasts. In Proc. Volume 10140, Medical Imaging 2017: Digital Pathology 41–46 (SPIE, 2017).
Ruifrok, A. C. & Johnston, D. A. Quantification of histochemical staining by color deconvolution. Anal. Quant. Cytol. Histol. 23, 291–299 (2001).
CAS PubMed Google Scholar
Macnamara, C. K., Caiazzo, A., Ramis-Conde, I. & Chaplain, M. A. J. Computational modelling and simulation of cancer growth and migration within a 3D heterogeneous tissue: the effects of fibre and vascular structure. J. Comput. Sci. 40, 101067 (2020).
Article Google Scholar
Birch, C. P. D., Oom, S. P. & Beecham, J. A. Rectangular and hexagonal grids used for observation, experiment and simulation in ecology. Ecol. Modell. 206, 347–359 (2007).
Article Google Scholar
Lehmann, S. et al. In vivo fluorescence imaging of the activity of CEA TCB, a novel T-cell bispecific antibody, reveals highly specific tumor targeting and fast induction of T-cell–mediated tumor killing. Clin. Cancer Res. 22, 4417–4427 (2016).
Gavagnin, E., Owen, J. P. & Yates, C. A. Pair correlation functions for identifying spatial correlation in discrete domains. Phys. Rev. E 97, 062104 (2018).
Article CAS Google Scholar

Download references

Acknowledgements

We are grateful to Roisin Stephens and Antoine Soubret for useful scientific discussions and to Stefan Evers, Christophe Boetsch and Michael Bensch for reviewing the final draft. We also thank Helen Byrne, Philip Maini, Alex Phipps, Benjamin Ribba and William Pao for their encouragement and useful discussions. Most importantly, we would like to thank the patients for their participation in the clinical trial.

Author information

Authors and Affiliations

Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, 4070, Basel, Switzerland
L. G. Hutchinson & O. Grimm

Authors

L. G. Hutchinson
View author publications
You can also search for this author in PubMed Google Scholar
O. Grimm
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

L.G.H. and O.G. contributed equally to this work. Specifically, L.G.H. and O.G. conceived, implemented and wrote the article jointly.

Corresponding author

Correspondence to L. G. Hutchinson.

Ethics declarations

Competing interests

L.G.H. and O.G. are full-time employees of Hoffman-La Roche Ltd. The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental Material

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Hutchinson, L.G., Grimm, O. Integrating digital pathology and mathematical modelling to predict spatial biomarker dynamics in cancer immunotherapy. npj Digit. Med. 5, 92 (2022). https://doi.org/10.1038/s41746-022-00636-3

Download citation

Received: 10 November 2021
Accepted: 24 June 2022
Published: 12 July 2022
DOI: https://doi.org/10.1038/s41746-022-00636-3

This article is cited by

A new treatment for breast cancer using a combination of two drugs: AZD9496 and palbociclib
- Ophir Nave
- Yehuda Shor
- Moriah Sigron
Scientific Reports (2024)

Subjects

Abstract

Similar content being viewed by others

Computational approaches to modelling and optimizing cancer treatment

Characterisation of tumour microenvironment remodelling following oncogene inhibition in preclinical studies with imaging mass cytometry

Evaluating and comparing immunostaining and computational methods for spatial profiling of drug response in patient-derived explants

Introduction

Results

Preprocessing of the images for model input, optimisation and validation

Derivation of a spatial agreement measure (SAM) based on the radial distribution function

Sensitivity analysis: an exploration of model sensitivity

Model optimisation: a set of model parameters can reliably reproduce spatial features of biopsy images

Null benchmark for predictive accuracy

Model validation: the trained model accurately reproduces spatial biopsy features

Model application: the model predicts spatial CD8 dynamics for monotherapy and hypothetical combination therapies

Discussion

Methods

Data

Technical details

Model extension

Spatial agreement evaluation

Sensitivity analysis methodology

Optimisation methodology

Model validation

Ethics statement

Reporting summary

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Supplementary information

Supplemental Material

Reporting Summary

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

A new treatment for breast cancer using a combination of two drugs: AZD9496 and palbociclib

Search

Quick links