Introduction

Invasive breast cancer is the most commonly diagnosed cancer among women in most countries worldwide.1 Increased mammographic breast density, which describes the tissue that appears white on a mammogram, is one of the strongest breast cancer risk factors.2 A recent meta-analysis found that percent density, which reflects the proportion of total breast area comprised of dense fibroglandular tissue, is a stronger predictor of risk than absolute dense area.3 It is estimated that 43% of US women 40–74 years of age have dense breasts,4 but the mechanisms accounting for the relationship between elevated density and breast cancer risk remain ill-defined.

Studies highlight that pre-cancerous lesions5 and breast tumors6 are more likely to occur in mammographically dense regions within the breast, suggesting the relevance of localized as well as global density measures in cancer development. The few studies that have examined histological correlates of breast density have suggested that higher breast density is associated with greater epithelial cell content and non-fatty stroma.7,8 While most studies to date have utilized quantitative microscopy to characterize breast tissue from women undergoing procedures for suspect lesions, one study of non-cancerous autopsy breast tissues also showed positive relationships between epithelial and non-fatty stromal tissue, particularly stromal collagen area, and percent density.9

Advancements in automated digital pathology now allow increased opportunities for characterization and quantification of breast tissue organization that can complement traditional microscopic assessments. Moreover, studies are increasingly using automated digital tools for complex tissue pathology assessment of breast cancer outcomes.10,11 The recent incorporation of artificial intelligence platforms into digital pathology work systems now allows these approaches to be extended to larger scale molecular epidemiological studies. Specifically, deep learning methods such as convolutional neural networks12 are increasingly being employed for histological image recognition with high accuracy and reproducibility.13,14,15 We previously developed a deep learning convolutional neural network model for the assessment of tissue characteristics in hematoxylin and eosin (H&E)-stained whole slide breast tissue images,16,17 which classified whole slide images as epithelial, stromal and fat tissue. In the current study, we hypothesized that application of this model to whole slide images of H&E-stained fixed tissue specimens collected from diagnostic image-guided breast biopsies might enable identification of specific histologic correlates that underpin breast density, including both global and localized (peri-lesional) measures. In addition, as more than 25 million women in the US have dense breasts,4 and because only a small proportion of these women will develop breast cancer, we also aimed to identify tissue correlates of breast density that may be important for distinguishing malignant from benign biopsy diagnoses separately among women with high and low breast density, to help inform cancer risk stratification among women undergoing a biopsy following an abnormal mammogram.

Results

Patient characteristics

Overall, patient characteristics were largely similar between the training (n = 588) and testing (n = 264) sets (Table 1). The mean age was 50 years, and most women were of white race (91.3%), college educated (82.3%), of normal weight (50.4%) and premenopausal (58.1%). Most mammograms were categorized after work-up as suspicious abnormality (BI-RADS diagnostic category 4: 83.7%). The remainder were categorized as probably benign (BI-RADS diagnostic category 3: 5.9%) or highly suggestive of malignancy (BI-RADS diagnostic category 5: 10.5%). A little over half of the core needle biopsies were ultrasound-guided (54.6%), with the remainder being stereotactic-guided (45.3%). Median global fibroglandular volume was 34.4%, and median localized fibroglandular volume was 40.0%. No difference was observed for global and localized fibroglandular volume between the training and testing sets. Among the n = 1036 biopsy targets, most biopsy diagnoses were benign (78.2%). Benign breast disease diagnoses were categorized according to benign non-proliferative (including non-proliferative fibrocystic change and other benign and discrete entities), proliferative without atypia (including ductal hyperplasia and sclerosing adenosis) and proliferative with atypia (including atypical ductal and lobular hyperplasia). Further, 8.0% of all biopsies yielded in-situ lesions, and 13.8% were invasive carcinoma (Table 1).

Table 1 Selected characteristics of study participants from the BREAST-Stamp Project, who were referred for an image-guided breast biopsy, stratified by the training and testing sets (n = 852)

Associations between histologic features and breast density (global and localized fibroglandular volume)

As mentioned in the methods, 37 features were extracted from the output of the convolutional neural network model. Using these identified features in separate random forest regression models trained to predict global and localized fibroglandular volume, the correlations between predicted and actual fibroglandular volume measurements were 0.94 for global and 0.93 for localized fibroglandular volume, respectively. The top 10 correlates identified as most important for predicting both fibroglandular volume measurements are shown in Table 2, and the corresponding Gini index plots for global and localized fibroglandular volume are shown in Supplementary Fig. 2. Overall, similar features were identified as correlates of global and localized fibroglandular volume measures; however, some differences were noted. Normalized non-fatty stromal tissue quantity (i.e., stromal tissue quantity normalized to total breast tissue area on the whole slide image) and normalized fat quantity (i.e., fat tissue quantity normalized to breast tissue area on the whole slide image) were the strongest predictors of both global and localized fibroglandular volume. Of note, epithelium quantity did not rank among the top 10 features for global fibroglandular volume and was ranked 8th for localized fibroglandular volume. Features characterizing the spatial arrangement of the epithelial regions assessed using an area-Voronoi diagram18,19 were among the top 10 features ranked for prediction of both global and localized fibroglandular volume.
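The prediction-and-ranking step described above can be illustrated with a minimal sketch. It assumes scikit-learn's RandomForestRegressor and its impurity-based feature_importances_ as a stand-in for the Gini index ranking (for regression trees, scikit-learn measures impurity as variance reduction). The function name fit_and_rank, the hyperparameters and the synthetic data are hypothetical and are not taken from the study.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestRegressor


def fit_and_rank(X_train, y_train, X_test, y_test, feature_names, top_k=10, seed=0):
    """Fit a random forest regressor, score it on held-out data, and rank features."""
    # Hyperparameters are illustrative assumptions, not the study's settings.
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    # Spearman rank correlation between predicted and measured values.
    rho, _ = spearmanr(model.predict(X_test), y_test)
    # Impurity-based importances, sorted from most to least important.
    order = np.argsort(model.feature_importances_)[::-1][:top_k]
    ranking = [(feature_names[i], float(model.feature_importances_[i])) for i in order]
    return model, float(rho), ranking


# Synthetic stand-in data (NOT study data): 37 features, 588 training and 264 testing rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(852, 37))
y = 30 + 8 * X[:, 0] - 6 * X[:, 1] + rng.normal(scale=2.0, size=852)
names = [f"feature_{i}" for i in range(37)]
model, rho, ranking = fit_and_rank(X[:588], y[:588], X[588:], y[588:], names)
```

In this toy setting the two features that drive the synthetic outcome should appear at the top of the ranking; with real data, correlated features can share importance, which is one motivation for the sensitivity analysis on highly correlated feature pairs.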

Table 2 Summary of top 10 ranked histologic features identified in the random forest model for the prediction of global and localized % fibroglandular volume (FGV)

Sensitivity analyses were conducted to examine the influence of body mass index (BMI) and menopausal status on the predictions, and results from these investigations are detailed in Supplementary Table 2. BMI was consistently ranked as the strongest predictor of fibroglandular volume when included in the model. Interestingly, in this model, the normalized fat quantity was the next most important feature for both global and localized fibroglandular volume, followed by normalized non-fatty stroma quantity. When analyses were stratified by menopausal status, some differences in top ranking features were noted as outlined in Supplementary Table 2. For global fibroglandular volume prediction, the top-ranked features were similar; however, for localized fibroglandular volume, fat-related variables ranked lower among postmenopausal women than for premenopausal women.

Exploratory investigation relating histologic features to biopsy diagnosis among patients with high and low fibroglandular volume

As elevated breast density is common among women,4 yet only a small proportion will develop invasive breast cancer, we aimed to identify histological correlates that could inform future breast cancer risk stratification among women undergoing diagnostic biopsy with either high or low breast density. The main objective of this exploratory investigation was to examine if the histologic features that were associated with cancer status were similar and/or different among women with low vs. high fibroglandular volume. Thus, using the 37 features, a random forest classifier was trained to predict invasive cancer vs. benign breast disease among women stratified into high or low fibroglandular volume (using the median cut-point of global (34.4%) and localized (40%) fibroglandular volume from the training population). The top-ranked features for predicting invasive cancer status separately among women with high vs. low fibroglandular volume are shown in Table 3. Firstly, features associated with the spatial arrangements of the epithelial regions were ranked most important (top two features) for predicting cancer status among women, irrespective of global fibroglandular volume (Table 3). H&E images highlighting examples of the top-ranked epithelial region spatial arrangement features, with corresponding mammograms from patients whose biopsies yielded diagnoses of atypical ductal hyperplasia and invasive carcinoma, are shown in Fig. 2a, b, respectively. Despite similar radiological global fibroglandular volume on both mammograms, the H&Es from each diagnostic biopsy, targeted to locally dense regions within the breast, reflect differences in the spatial arrangement of epithelium (Fig. 2a, b). Within Fig. 2, two features are highlighted: the mean and median area ratio of each epithelial region to its Voronoi region. Figure 2a represents a H&E whole slide image with low mean and median area ratio of each epithelial region to its Voronoi region. 
This slide has a diagnosis of atypical ductal hyperplasia and has both global and localized fibroglandular volume > median (global fibroglandular volume: 45%; localized fibroglandular volume: 61%). In contrast, Fig. 2b represents a H&E whole slide image with higher mean and median area ratio of each epithelial region to its Voronoi region. This slide has a diagnosis of invasive carcinoma and has both global and localized fibroglandular volume > median (global fibroglandular volume: 49%; localized fibroglandular volume: 49%). Features of epithelial regions were also strongly associated with invasive cancer status in models stratified by localized fibroglandular volume. Among women with high localized fibroglandular volume, epithelial morphology features ranked as the most important (4 out of the top 5). Among women with low localized fibroglandular volume, epithelium quantity and the median number of epithelial regions were the top two ranked features, followed by normalized stroma quantity.

Table 3 Summary of top 10 ranked histologic features identified in the random forest model for the prediction of invasive cancer status among women with high and low % fibroglandular volume

The performance of the model for predicting invasive cancer among women with high vs. low global fibroglandular volume in the testing set is shown in Fig. 3a, b. An AUC of 0.92 (95% CI: 0.80–0.99) was achieved for predicting invasive cancer diagnosis among women with high global fibroglandular volume, and an AUC of 0.84 (95% CI: 0.71–0.94) was reached for predicting an invasive cancer diagnosis among women with low global fibroglandular volume. For cancer detection stratified according to high and low localized fibroglandular volume, similar prediction values were observed, as shown in Fig. 3c, d (high localized fibroglandular volume: AUC: 0.92 (95% CI: 0.79–0.99); low localized fibroglandular volume: AUC: 0.81 (95% CI: 0.65–0.96)). No significant differences were observed between the AUCs for high vs. low global (p = 0.24) or localized fibroglandular volume (p = 0.24).
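The text does not state how the 95% confidence intervals around each AUC were obtained. One common option is a percentile bootstrap over the testing set, sketched below as an assumption rather than a description of the study's procedure. The function name bootstrap_auc_ci and the synthetic labels and scores are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate of the AUC with a percentile bootstrap confidence interval."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue  # AUC is undefined when a resample contains a single class
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lower, upper = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return float(roc_auc_score(y_true, y_score)), float(lower), float(upper)


# Synthetic stand-in (NOT study data): binary diagnosis labels and classifier scores.
rng = np.random.default_rng(42)
labels = rng.integers(0, 2, size=120)
scores = labels + rng.normal(scale=1.0, size=120)
auc, lower, upper = bootstrap_auc_ci(labels, scores, n_boot=500)
```

Applied separately to the high and low fibroglandular volume strata, such intervals widen as the number of cancer cases in a stratum falls, which is consistent with the wider intervals reported for some subgroups.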

Discussion

We report that we can predict global and local mammographic fibroglandular volume by applying a deep convolutional neural network model to H&E-stained sections of image-guided breast biopsies prompted by an abnormal mammogram. Specifically, we show that greater non-fatty stromal and adipose tissue content and the spatial distribution of epithelial regions in tissues, rather than total epithelial quantities, were the strongest correlates of % fibroglandular volume. The cardinal histopathologic feature of breast cancer on low magnification is ‘invasion’, characterized by irregular epithelial growth with incursion of cells into normal structures. As anticipated, features extracted from the output of the convolutional neural network indicated that epithelial organization is the strongest correlate of invasive cancer irrespective of fibroglandular volume. Thus, we hypothesize that more complex analyses of dense tissue using convolutional neural networks or other imaging technologies may enable radiological recognition of textural patterns that reflect the epithelial disorganization characteristic of breast cancer. Recent preliminary analyses using convolutional neural networks suggest the potential of this approach.20

Our findings agree with prior literature using quantitative microscopy9 to understand histological correlates of breast density. Similarly, our findings support prior studies that suggest radiological density is largely non-fatty stroma, with relatively little variation in epithelial content by mammographic density.8,9 Further, we showed that other quantitative measures of fat tissue were also highly ranked as being important for the prediction of % fibroglandular volume. The heterogeneous nature of the top-ranked histologic features further supports the complexity of quantitative measures of breast density. A novel finding of our study was the identification of the spatial arrangement of epithelial regions as ranking among the top 10 correlates of fibroglandular volume. To define spatial arrangements, we used an area-Voronoi diagram and Delaunay triangulation, which are approaches that would be very difficult to reproduce using visual assessment. Voronoi decomposition is a method whereby an area is partitioned into smaller areas that surround regions that are closest to pre-specified points.19,21 In essence, our results suggest that tissues that display a high ratio of epithelial area to its corresponding areas of influence are characteristic of cancer in both high and low global fibroglandular volume contexts. The identified Voronoi area along with the area ratio of each epithelial region to its Voronoi region ranked among the top 10 correlates for both global and localized fibroglandular volume measures.

Beck and colleagues were among the first to highlight the potential of digital image analysis for examining histological features of breast cancer. They developed and utilized C-Path (Computational Pathologist), a machine learning tool, which identified features of stromal morphology that were especially important for predicting breast cancer prognosis.10 Although prognosis was not the focus of our analyses, using a similar approach, we also found that the quantity of non-fatty breast stromal tissue was among the top-ranked predictors of fibroglandular volume, supporting the contributory role of stroma to fibroglandular volume. This study highlights the importance of examining the tissue microenvironment of dense tissue in more detail, including conducting in-depth analysis of stromal components17 including collagen.22,23

A major clinical challenge is differentiating between the non-fatty stroma and at-risk epithelium that together constitute the ‘white’ dense areas that appear on a mammogram. Thus, despite similar measures of breast density for a radiologically dense breast, there could be considerable heterogeneity of tissue composition within the dense regions. As density alone may not be capable of defining epithelial organization, other techniques are needed. Potential solutions could be alternative imaging or further classification of density using neural networks.20 Findings from our exploratory analysis relating histologic correlates to biopsy diagnosis highlight the interindividual heterogeneity that may be apparent at the histological level despite having comparable radiological densities. Interestingly, we found that irrespective of fibroglandular volume, spatial arrangement of epithelium was the most predictive of a cancer diagnosis, showing that deciphering composition of the mammographic fibroglandular volume is important for identifying abnormalities at the histological level. Of note, the performance of the model was better in detecting cancer status among women with high fibroglandular volume (both global and localized) than among women with low fibroglandular volume, though this difference was not statistically significant. This could be an artifact of the model, i.e., a challenge of recognizing spatial patterns in low density. However, this finding could also support the concept of epithelial-stromal interaction in the progression of invasive cancer. Understanding the heterogeneity24 and significance of the epithelial region spatial arrangement and organization may provide important etiological clues for tumorigenesis, and additional assessment of these features is needed to examine their relationships with other epithelial histological risk markers including terminal duct lobular units.25

Since the publication by Beck and colleagues, there have been substantial advancements in digital pathology methodology, particularly with the advent of deep learning. For example, our investigation complements and expands on existing studies that have highlighted the potential of deep learning for identifying factors associated with breast cancer diagnosis.14,15,26,27,28 The publication of the CAMELYON16 challenge winners showed the ability of deep learning algorithms to detect lymph node metastasis with high accuracy with a comparable AUC to that obtained following pathological assessment (AUC = 0.96).15 While our limited sample size and the cross-sectional nature of the study design prevented detailed investigation of features associated with breast biopsy diagnoses, our preliminary findings also support the need for further investigations of biopsy tissue using deep learning algorithms.

This study has many important clinical implications and considerations. Firstly, the ability to make predictions using feature assessment alone and without the inclusion of additional breast cancer risk factor information suggests the utility of deep learning approaches for the clinical setting. However, to investigate potential influences of patient characteristics, we conducted sensitivity analyses. As expected given its well-established strong inverse association with % fibroglandular volume,29 BMI was the highest ranked feature for predicting % fibroglandular volume for models in which it was included. While recognition of clinical and participant characteristics is important, the inclusion of such factors in analytical models may mask lesser associations identified by the random forest approach. Secondly, clinically relevant histological features of biopsy tissue accompanied by radiological information may be of benefit to integrate into breast cancer risk models,30 which are increasingly being used in clinical practice for determining risk of invasive breast cancer. Our findings are of particular relevance for women with elevated breast density who have had a prior breast biopsy and as such are at elevated risk of developing invasive breast cancer. By identifying validated histological features at the time of clinical biopsy following an abnormal mammogram, we aim to discriminate women at highest risk. Increased efforts are ongoing to include histological information, as well as mammographic density, in risk prediction tools as evidenced by the BCSC-BBD model.31 However, these current risk models do not yet incorporate detailed histology in risk estimates. 
The integration of biopsy histological features into current risk models that assess radiological and risk factor information may ultimately improve risk assessment and inform clinical management strategies by providing additional risk information on the increasing number of women undergoing breast biopsies after a mammogram. Furthermore, the application of deep learning models that can utilize histological breast biopsy features to predict future risk of breast cancer among women with dense breasts will be important among the growing population of women who experience an initial benign breast biopsy diagnosis. Future expanded studies will address these questions.

Our study has many strengths. Firstly, this analysis is one of the largest breast tissue studies to date to apply convolutional neural network models for the identification of tissue correlates of mammographic breast density. Further, from a biological mechanistic perspective, the ability to examine relationships between breast tissue features and localized fibroglandular volume measures allows the additional assessment of characteristics of the microenvironment of the suspect lesion, particularly factors that cannot be quantified by visual assessment but that may be important markers of cancer. Of note, we observed similarities in the top identified histologic correlates of both global and localized % fibroglandular volume, supporting the utility of biopsy tissues in understanding the global breast milieu. Further strengths of this study included the use of deep learning for delineating characteristics of tissue organization as well as for quantification of tissue components. Additionally, the utilization of diagnostic H&E whole slide images supports investigations of samples that are routinely collected during the clinical investigation following a biopsy, which suggests this approach may have clinical applicability and could complement routine diagnostic assessment. This study related volumetric measures of breast density, determined from FFDM images, to 2D histological images from FFPE tissues, providing an important step toward a novel and complex approach to understanding breast cancer lesions and their relationships with breast density. Additional understanding of volumetric breast density would be gained by examining the 3D architecture of the BBD and breast cancer diagnoses. For example, future studies that incorporate volumetric density measures from 3D imaging modalities along with fresh tissues will provide a complementary extension to these findings.

However, this study also has limitations. While random forest approaches are effective in deciphering which histological features contribute most to model prediction, they do not yield easily quantifiable results for strengths of association. Our investigation of deep learning approaches to identify histologic features associated with cancer among women with high versus low breast density, while promising, was hampered by sample size. In our current sample set, the number of cancer cases within the testing dataset was limited in order to maximize the reliability of model training. Thus, additional, larger prospective studies are needed to identify biomarkers for cancer risk stratification among women with high breast density who may be referred to diagnostic biopsy following an abnormal mammogram. While the BREAST-Stamp participants are a representative sample of the population of women undergoing diagnostic investigation after an abnormal breast imaging exam, the women enrolled within the study were primarily white (91.3%), which is reflective of the catchment area of the University of Vermont Cancer Center. Further, detailed information on lifestyle factors including alcohol consumption and smoking was not available for the full study population in this analysis. Thus, additional studies among more diverse populations are warranted to determine the generalizability of study findings and to determine whether tissue correlates of mammographic density vary by race and also by lifestyle breast cancer risk factors. In addition, our analysis was restricted to H&E-stained tissue sections. While using H&E sections is important as they are clinically meaningful and routinely prepared following biopsy, investigation of features associated with complementary histological stains to characterize the breast microenvironment may also be informative. An additional consideration is the applicability of this approach to other populations. 
This investigation included breast tissue sections from a single cross-sectional study, for which standardized protocols were followed for specimen preparation, tissue sectioning and staining, all of which were completed in the same laboratory at the University of Vermont Medical Center. While this rigorous methodology reduced potential variability in the tissue samples being assessed, it may limit the generalizability of the findings. The approach applied in this current study used extensive contrast and color augmentation during training. This method increases the robustness of the deep learning model against staining variations, but may not be sufficient when dealing with external datasets with significant staining variations. Therefore, additional validation studies are needed that include tissue sections prepared in multiple laboratories. Such studies would be highly informative for determining the robustness of deep learning within diverse pathological clinical settings.

In conclusion, we highlight the potential of applying convolutional neural network models to digital pathology to gain insights into histological correlates that correspond to radiologic measures of breast fibroglandular volume, and to cancer risk. In doing so, in a population of women undergoing diagnostic breast biopsy, we found that epithelial organization was the strongest correlate of invasive cancer irrespective of fibroglandular volume. In addition, we found in agreement with prior studies that fat and non-fatty stromal features were important determinants of radiologic fibroglandular volume. As radiologic density alone may not be capable of defining epithelial organization, these findings suggest opportunities for future efforts using neural networks for enhanced capture of novel histologic as well as breast imaging features that may advance our understanding of breast tumorigenesis.

Methods

Study population

This study included women who were referred for diagnostic image-guided breast biopsy after an abnormal breast imaging exam between October 2007 and June 2010 at the University of Vermont Medical Center and who were enrolled as part of the National Cancer Institute’s (NCI) cross-sectional, molecular epidemiologic Breast Radiology Evaluation and Study of Tissues (BREAST)-Stamp Project. Details of the BREAST-Stamp Project and study eligibility characteristics have been described previously.25,29,32 Eligible participants were women aged 40–65 years referred for image-guided biopsy who did not have breast implants, had not been diagnosed with breast cancer or received cancer treatments, had not undergone breast surgery within one year and had not received chemoprevention. During the enrollment period, mammography registry data indicated that 1227 patients met these eligibility criteria. Information supplied by the radiology facility included final assessment of the mammogram, in BI-RADS categories: 3, “probably benign finding”; 4, “suspicious abnormality”; and 5, “highly suggestive of malignancy”.33 A standard health history questionnaire which assessed established breast cancer risk factors was collected at the time of the mammogram,34 and upon providing consent to be enrolled in the study, additional detailed breast cancer risk factor information was collected by the research coordinator.29 The distribution of the collected breast cancer risk factor information, including the demographic and lifestyle characteristics of the enrolled BREAST-Stamp study population, has been previously described.25,29,32 Details of the analytical population included in this current analysis are outlined in more detail below and described in Table 1. The Institutional Review Boards at the NCI and the University of Vermont approved the protocol for this project for either active consenting or a waiver of consent to enroll participants, link data and perform analytical studies.

Breast biopsy specimens

Breast tissues obtained from ultrasound-guided core needle (14-gauge) or stereotactic-guided vacuum-assisted (9-gauge) biopsy were routinely processed, and representative H&E-stained breast tissue sections were obtained from the formalin-fixed paraffin-embedded target blocks for each biopsy and, when collected during biopsy, from non-target blocks representing surrounding non-target tissue. The diagnosis was confirmed following pathological report review. For women who had ≥ two unilateral biopsy targets, the two targets with the most severe diagnoses were selected. If there were ≥ two bilateral targets, then one target from each breast was selected, sampling the tissues with the most severe diagnoses. H&E-stained breast biopsy tissue sections were digitized at ×20 magnification using the Aperio (47.7%) or Hamamatsu scanning systems (52.3%).

Assessment of breast density

Assessment of breast density was conducted at the University of California, San Francisco on pre-biopsy raw digital mammograms from full-field digital mammography systems.25,29,32,35,36 Briefly, quantitative global29 and localized25 fibroglandular tissue volume (cm³) measures were determined using craniocaudal mammograms of the ipsilateral breast, taken at the time-point prior but nearest to the biopsy date. Percent (%) global fibroglandular volume was estimated using Single X-ray Absorptiometry, which utilized a breast density phantom attached to the compression paddle of the mammography machine.25,29,32,35,36 For the assessment of % localized peri-lesional fibroglandular volume measurements, the biopsy location and radius were identified on the pre-biopsy mammogram by the study radiologist.25 Localized % fibroglandular volume measurements at a volume ~0–2 mm³ surrounding but excluding the biopsy target location were utilized in this analysis.

Analytical population

Of the women eligible for this study, 882 (69%) had Single X-ray Absorptiometry fibroglandular volume results available for the ipsilateral breast within the year before their breast biopsy. Of these, 852 women had target and non-target H&E slides from 1036 breast biopsies available for assessment. For convolutional neural network model training and assessment, as outlined in more detail below, the study population was randomly subdivided into a training dataset (n = 588; 69%) and a testing dataset (n = 264; 31%). Overall, the 588 women in the training set had 687 biopsies which encompassed 1587 H&E stained sections (667 from the target and 920 from the non-target blocks). For the testing group of 264 women, there were 349 biopsies (454 sections from non-target blocks). An overview of the study design is shown in Fig. 1.
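Because the population was subdivided at the level of women while each woman can contribute several biopsies and sections, a random split needs to keep all of a participant's slides in the same set. The following is a minimal sketch of one way to do this; the function name split_by_participant, the seed and the simulated participant IDs are hypothetical, and the study's actual randomization procedure is not described in the text.

```python
import numpy as np


def split_by_participant(participant_ids, n_train, seed=0):
    """Return boolean row masks (train, test) so that each participant is in one set only."""
    participant_ids = np.asarray(participant_ids)
    unique_ids = np.unique(participant_ids)
    rng = np.random.default_rng(seed)
    # Draw the training participants without replacement; the rest form the testing set.
    train_ids = rng.choice(unique_ids, size=n_train, replace=False)
    train_mask = np.isin(participant_ids, train_ids)
    return train_mask, ~train_mask


# Simulated example (NOT study data): 852 participants with one to three slide rows each.
rng = np.random.default_rng(1)
slides_per_woman = rng.integers(1, 4, size=852)
row_ids = np.repeat(np.arange(852), slides_per_woman)
train_mask, test_mask = split_by_participant(row_ids, n_train=588)
```

Splitting on participant IDs rather than on slides avoids placing sections from the same woman in both sets, which would otherwise inflate apparent performance on the testing set.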

Fig. 1 Workflow overview utilizing training and testing sets for the prediction of global and localized fibroglandular volume (FGV) measures from identified convolutional neural network model features

Development of the deep learning convolutional neural network model

Using the digitized H&E whole slide images from 588 women included in the training set, a deep convolutional neural network was trained to generate maps of tissue composition that classified whole slide images as epithelial, stromal and fat tissue.16,17 For model training, both target and non-target slides were included. The trained model was an 11-layer fully convolutional VGG-like network, a neural network architecture developed by Oxford’s Visual Geometry Group (VGG).37 The performance of the convolutional neural network model for generating whole slide image maps of epithelial, stromal and fat tissue has been outlined previously,16 and an example of the classification is shown in Supplementary Fig. 1. Briefly, the initial classification of the breast tissue (epithelial, stromal and fat composition) was completed through training of the convolutional neural network model based on manual annotation of these regions in 100 whole slide images by trained students; these annotations were further reviewed by a pathologist. The AUC of the model for the classification of the breast tissue was 0.95.16 Following the generation of the whole slide image maps, features were extracted from the output of the convolutional neural network. These features were grouped into three main categories, describing global tissue quantities, the morphology of the epithelial regions, and spatial arrangements of epithelial regions. To examine spatial arrangements of epithelial regions, region adjacency graphs were used, including area-Voronoi diagrams and Delaunay triangulation.18,19 The area-Voronoi diagram was utilized in the context of spatial distribution analysis to define areas of influence of epithelial regions in the image. Given a set of segmented epithelial regions A1, …, An in a whole slide image, the area-Voronoi region Va(Ai) of a region Ai is defined as the set of pixels in the image whose distance to Ai is less than or equal to their distance to any other region in the image. 
Overall, 37 features were extracted within these three categories; a description of the 37 features and their distributions in the training and testing sets are shown in Supplementary Table 1.
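The area-Voronoi definition above can be illustrated with a minimal sketch. It is not the pipeline used in the study: it assumes a binary epithelium mask, treats connected components as the regions A1,…,An, and uses a Euclidean distance transform from SciPy to assign each pixel to its nearest region. The toy mask, the function name `area_voronoi` and the two summary quantities (zone areas and mean Delaunay edge length) are hypothetical choices made for illustration only.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import Delaunay


def area_voronoi(epithelium_mask):
    """Return (labels, zones, n_regions) for a binary epithelium mask.

    zones[p] is the label of the epithelial region nearest to pixel p,
    i.e. a discrete area-Voronoi diagram of the labelled regions.
    """
    # Connected components play the role of the regions A_1..A_n (0 = background).
    labels, n_regions = ndimage.label(epithelium_mask)
    # For every non-epithelial pixel, locate the closest epithelial pixel.
    _, nearest = ndimage.distance_transform_edt(labels == 0, return_indices=True)
    # Copy that pixel's region label; equidistant pixels go to a single region here.
    zones = labels[nearest[0], nearest[1]]
    return labels, zones, n_regions


# Toy 40 x 40 "slide" with four rectangular epithelial regions (hypothetical data).
mask = np.zeros((40, 40), dtype=bool)
mask[3:8, 3:8] = True
mask[4:9, 28:34] = True
mask[28:34, 5:10] = True
mask[24:30, 26:31] = True

labels, zones, n_regions = area_voronoi(mask)

# Area of influence (in pixels) of each epithelial region.
zone_areas = np.bincount(zones.ravel(), minlength=n_regions + 1)[1:]

# Delaunay triangulation of the region centroids gives a simple adjacency graph.
centroids = np.array(ndimage.center_of_mass(mask, labels, range(1, n_regions + 1)))
triangulation = Delaunay(centroids)
edges = {
    tuple(sorted((int(a), int(b))))
    for simplex in triangulation.simplices
    for a, b in zip(simplex, np.roll(simplex, -1))
}
mean_edge_length = np.mean(
    [np.linalg.norm(centroids[a] - centroids[b]) for a, b in edges]
)
```

Note that the definition in the text assigns equidistant pixels to every region they are closest to, whereas this discrete sketch assigns each such pixel to one region only.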

Statistical analysis

Patient characteristics were compared between the training and testing sets using chi-square or Fisher’s exact tests for categorical variables and Wilcoxon rank sum tests for continuous variables. Using the 37 features extracted from the output of the convolutional neural network, a random forest regression model was used to predict global fibroglandular volume (%) and a separate random forest model was used to predict localized fibroglandular volume (%) (i.e., in the region of the biopsy target). The scikit-learn Python library21 was used to train the random forest models. These models were then applied to the independent testing set to predict the fibroglandular volume measures. We chose random forests because this approach can account for non-linear relationships between the features and has been shown to work well even when the number of features exceeds the number of observations.38 The output from the random forest model includes the Gini index as a measure of the predictive importance of each feature; Supplementary Fig. 2 shows the Gini index results for features associated with global and localized % fibroglandular volume. Relationships between the predicted and radiologically quantified (actual) fibroglandular volume measures were assessed using Spearman rank correlations (r).

Several sensitivity analyses examined the potential influence on the observed findings of participant characteristics known to be associated with fibroglandular volume: (a) we additionally included body mass index (BMI) in the random forest regression model; and (b) we stratified analyses by menopausal status. We also assessed the potential influence of histologic features that were strongly correlated with each other in the prediction model. For each highly correlated feature pair (Spearman correlation: r ≥ 0.85), one feature was randomly selected for exclusion from the model. We then retrained the random forest models on the remaining 25 features to separately predict each fibroglandular volume measure. When the prediction model was reduced in this way to include only one feature from each set of highly correlated features, the top selected features for fibroglandular volume prediction were similar; therefore, we present results from random forest analyses including all 37 features.
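As an illustration of the workflow described above, the following minimal sketch trains a scikit-learn random forest regressor on a training set, correlates its test-set predictions with the observed values using Spearman's r, and repeats the fit after randomly excluding one member of each feature pair with r ≥ 0.85. The data are synthetic, and the hyperparameters (100 trees, fixed seeds) and function names are assumptions for illustration; they are not taken from the study.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the 37 histologic features (hypothetical data).
n_train, n_test, n_features = 300, 150, 37
n = n_train + n_test
X = rng.normal(size=(n, n_features))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)  # near-duplicate of feature 0
# Simulated fibroglandular volume (%) with a non-linear dependence on feature 2.
y = 35 + 8 * X[:, 0] + 5 * np.tanh(X[:, 2]) + rng.normal(scale=2, size=n)
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]


def fit_and_evaluate(columns):
    """Train on the training set, then correlate predictions with test-set values."""
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train[:, columns], y_train)
    rho, _ = spearmanr(model.predict(X_test[:, columns]), y_test)
    return model, rho


def drop_one_of_each_correlated_pair(features, threshold, rng):
    """For each pair with Spearman r >= threshold, randomly exclude one member."""
    corr, _ = spearmanr(features)  # feature-by-feature correlation matrix
    keep = np.ones(features.shape[1], dtype=bool)
    for i in range(features.shape[1]):
        for j in range(i + 1, features.shape[1]):
            if keep[i] and keep[j] and corr[i, j] >= threshold:
                keep[rng.choice([i, j])] = False
    return np.flatnonzero(keep)


all_columns = np.arange(n_features)
full_model, rho_full = fit_and_evaluate(all_columns)
# Impurity-based importances, ranked from most to least important.
top_features = np.argsort(full_model.feature_importances_)[::-1][:5]

kept_columns = drop_one_of_each_correlated_pair(X_train, 0.85, rng)
_, rho_reduced = fit_and_evaluate(kept_columns)
```

In this sketch the correlated pairs are identified on the training set only, which keeps the testing set independent; whether the study did the same is not stated in the text.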

In an exploratory analysis, we examined the potential of the extracted histologic features for predicting cancer status (benign vs. invasive biopsy diagnosis) among women with high and low fibroglandular volume. First, the patient population was stratified by fibroglandular volume (high vs. low), using the median cut points of global (34.4%) and localized (40%) fibroglandular volume from the training population. For this analysis, all in-situ diagnoses were excluded from both model training and testing. Thus, the cancer group was restricted to biopsy diagnoses of invasive carcinoma, and the benign group included diagnoses of non-proliferative and proliferative benign breast disease (with and without atypia). Using the 37 features previously extracted from the convolutional neural network output, a random forest classifier was trained to predict cancer status separately among women with high and low fibroglandular volume. Classifier performance for cancer status prediction was assessed using area under the receiver-operating characteristic (ROC) curve (AUC) analysis of the probabilities generated by the random forest classifier. We generated 95% confidence intervals (CIs) using a patient-stratified percentile bootstrapping method.39 ROC curves of the cancer status classifiers among patients with high and low global or localized % fibroglandular volume were compared using the bootstrap method in the R package “pROC”, which computes, stores and compares the AUC of each ROC curve.40
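The stratified classification analysis can be sketched as follows. This is a minimal illustration on synthetic data, not the study code: the data, the hyperparameters and the function name `bootstrap_auc_ci` are hypothetical. Two interpretive assumptions are made and flagged in the comments: women at or above the training-set median are treated as "high" density, and the stratified bootstrap resamples patients with replacement within the invasive and benign groups separately.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# Synthetic stand-in data (hypothetical): 37 features, density (%), and outcome.
n_train, n_test, n_features = 400, 200, 37
n = n_train + n_test
X = rng.normal(size=(n, n_features))
density = rng.uniform(5, 80, size=n)
logit = 2.5 * X[:, 0] - 2.0 * X[:, 3]
invasive = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)  # 1 = invasive
is_train = np.arange(n) < n_train

# Median cut point taken from the training population only.
cut_point = np.median(density[is_train])


def bootstrap_auc_ci(y_true, scores, n_boot=500, seed=0):
    """Percentile 95% CI for the AUC.

    Assumption: one row per patient, resampled within each outcome group.
    """
    boot_rng = np.random.default_rng(seed)
    cases, controls = np.flatnonzero(y_true == 1), np.flatnonzero(y_true == 0)
    aucs = np.empty(n_boot)
    for b in range(n_boot):
        idx = np.concatenate([
            boot_rng.choice(cases, size=cases.size, replace=True),
            boot_rng.choice(controls, size=controls.size, replace=True),
        ])
        aucs[b] = roc_auc_score(y_true[idx], scores[idx])
    return np.percentile(aucs, [2.5, 97.5])


# Assumption: "high" means at or above the training-set median.
strata = {"high": density >= cut_point, "low": density < cut_point}
results = {}
for name, in_stratum in strata.items():
    train, test = in_stratum & is_train, in_stratum & ~is_train
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X[train], invasive[train])
    prob_invasive = clf.predict_proba(X[test])[:, 1]  # probability of class 1
    auc = roc_auc_score(invasive[test], prob_invasive)
    results[name] = (auc, *bootstrap_auc_ci(invasive[test], prob_invasive))
```

Each entry of `results` holds the AUC with its lower and upper 95% bounds for one density stratum. The comparison of ROC curves between strata, performed in the study with the R package “pROC”, is not reproduced here.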