Introduction

Mammographic density is one of the strongest risk factors for breast cancer, with a four- to six-fold greater breast cancer risk in women with the highest vs. lowest levels of density1,2,3. Research identifying mechanisms for these associations, or how changes in density affect risk, are limited by our reliance on visual estimation (i.e., Breast Imaging Reporting Data and Reporting System (BI-RADS)) or operator-assisted thresholding methods which require inputs by a trained user (e.g., Cumulus)4, that are labor intensive and prone to intra- and inter-reader variability5. To address this need we developed APD, an automated approach to estimate percent mammographic density (PMD). APD is moderately correlated with operator-assisted thresholding methods (r = 0.70), and in a Mayo Clinic case-control study yielded stronger risk estimates when comparing extreme quartiles (operator-assisted odds ratio [OR]: 3.8, 95% confidence interval [CI] 2.4–6.0 vs. automated OR: 5.2, 95% CI 3.3–8.2)6 may help us to better understand how dense tissue is influencing breast cancer risk.

Comparing mammograms from women with similar PMD, there may be considerable heterogeneity in the appearance of dense tissue known as texture. This information is ignored in standard measurements of PMD, yet, emerging evidence demonstrates that it is related to breast cancer risk7,8. In addition, several studies have demonstrated that texture features are associated with breast cancer risk, independent of PMD7,9,10,11,12,13. Using an automated image analysis system, Manduca et al. identified features within several texture classes whose association with breast cancer was of similar magnitude to PMD10. A texture summary measure called ‘V’ captures gray-scale variation in mammograms and was a significant predictor of breast cancer in three Mayo Clinic cohort studies14. Comparing extreme quartiles, V was more strongly associated with breast cancer (relative risk [RR]: 3.5, 95% CI, 1.9–6.4) than was PMD (RR: 2.2, 95% CI, 1.8–2.6). While studies have demonstrated independent associations of texture features and PMD, the interrelationship between texture, automated and manual density, with respect to breast cancer risk remains unclear.

In the current nested case-control study, we evaluated the independent and joint associations of PMD, APD, and V with breast cancer risk, using data on 1900 breast cancer cases and 3921 matched controls in the Nurses’ Health Studies. Given demonstrated positive associations between PMD and breast cancer risk, one of our primary goals was to determine whether APD performed similarly and could therefore be a more accessible and reproducible mammographic breast density measure for use in research. Another primary goal was to determine whether APD, PMD, and V were each independently associated with breast cancer risk. Finding independence between these measures would provide further information about the interrelationships between their underlying phenotypes.

Results

Participant characteristics

Table 1 presents participant characteristics at time of mammogram according to case/control and menopausal status. The median time from mammogram to diagnosis was 4.1 years with an interquartile range from 1.6 to 6.8 years. Participant characteristics according to image resolution and by exposure quartile are presented in Supplementary Tables 1 and 2. Cases had a mean age of 53.3 years compared to 52.6 years among controls. Cases had a higher mean PMD (37.6% vs. 32.1%), APD (18.5% vs. 17.5%), and V (0.1 vs. −0.1) as compared with controls. Similar differences were observed among pre- and postmenopausal women. Hormone receptor status among cases was predominantly ER+/PR+ (54.5%).

Table 1 Participant characteristics at time of mammogram for breast cancer cases and controls by menopausal status.

PMD, APD, V, and breast cancer risk by menopausal status

PMD, APD, and V were moderately correlated (Supplementary Fig. 3) and each was positively associated with breast cancer risk (Table 2). V was more strongly correlated to APD than PMD (r = 0.83 vs. r = 0.61). Compared to the lowest quartile (Q1), individuals in the highest quartile (Q4) of PMD were almost three times more likely to develop breast cancer (OR: 2.90, 95% CI: 2.39, 3.52; p trend < 0.01). PMD associations were stronger among premenopausal (Q4 vs. Q1 OR: 3.56, 95% CI: 2.49, 5.08; p trend < 0.01) compared to postmenopausal women (Q4 vs. Q1 OR: 2.32, 95% CI: 1.78, 3.04; p trend < 0.01). With adjustment for V, the association was somewhat attenuated, but remained significant overall (Q4 vs. Q1 OR: 2.36, 95% CI: 1.90, 2.94; p trend < 0.01), and among premenopausal (Q1 vs. Q4 OR: 2.86, 95% CI: 1.94, 4.21; p trend < 0.01) and postmenopausal women (Q4 vs. Q1 OR 1.91, 95% CI: 1.41, 2.59; p trend < 0.01). For APD, compared to Q1, women in Q4 were twice as likely to develop breast cancer (overall: OR: 2.10, 95% CI: 1.75, 2.51; p trend < 0.01; premenopausal: OR: 2.27, 95% CI: 1.67, 3.08; p trend < 0.01; postmenopausal OR: 2.04, 95% CI: 1.59, 2.62; p trend < 0.01). When adjusted for V, comparing Q4 vs. Q1 was associated with a 36% increased breast cancer risk overall (OR: 1.36, 95% CI: 1.04, 1.77; p trend = 0.05), 55% among premenopausal (OR: 1.55, 95% CI: 1.03, 2.32; p trend = 0.05), and 25% among postmenopausal women (OR: 1.25, 95% CI: 0.85, 1.84; p trend = 0.53). Lastly, for V, compared to women in Q1, those in Q4 were more than twice as likely to develop breast cancer (overall OR: 2.16, 95% CI: 1.81, 2.58; p trend < 0.01; premenopausal OR: 2.46, 95% CI: 1.83, 3.31; p trend < 0.01; postmenopausal OR: 2.09, 95% CI: 1.64, 2.67; p trend < 0.01). Adjustment for PMD or APD attenuated the associations, but V remained associated with breast cancer risk. Associations did not differ by tumor hormone receptor status (Supplementary Table 3) and were strongest among high-resolution images (Supplementary Table 4).

Table 2 ORs and 95% CIs for the associations of PMD, APD, V, with breast cancer risk by menopausal status in the NHS/NHSII.

PMD, APD cross-classified with V and breast cancer risk

V was positively associated with breast cancer risk within each quartile of either PMD or APD (Table 3). Women in Q4 of PMD and V had more than three times higher risk of breast cancer compared to women in Q1 of PMD and V (OR: 3.57; 95% CI: 2.79, 4.58; p interaction = 0.75). Women in Q4 of PMD, but Q1 of V had more than twice the risk of breast cancer compared to those who were low on both measures (OR: 2.50, 95% CI: 1.51, 4.14). High V (Q4) coupled with low PMD (Q1) was not associated with increased risk of breast cancer (OR: 1.40, 95% CI: 0.48, 4.07). Similarly, women in the highest APD and V quartiles, had two and a half times greater risk of breast cancer compared to those with the lowest APD and V (OR: 2.49, 95% CI: 2.02, 3.06; p heterogeneity = 0.75). The patterns were similar, though the magnitude of association was stronger when analyses were restricted to high-resolution images only (Supplementary Table 5).

Table 3 ORs and 95% CIs for the associations between breast cancer risk and APD and PMD, cross-classified with V, in NHS/NHSIIa.

Independent associations of PMD and APD with breast cancer risk

PMD and APD measures were independently associated with breast cancer risk (Table 4). Without adjustment for APD, each SD increase in PMD was associated with a 52% increased breast cancer risk (OR: 1.52, 95% CI: 1.42, 1.63). With adjustment for APD, the association was attenuated but remained significant (OR: 1.43, 95% CI: 1.33, 1.55). Similarly, without adjustment for PMD, APD was associated with a 33% increased breast cancer risk per SD. After adjustment for PMD, there was a 14% increase in breast cancer risk per SD (OR: 1.14, 95% CI: 1.06, 1.22).

Table 4 ORs and 95% CIs for the associations between breast cancer risk and APD and PMD with mutual adjustment, in NHS/NHSIIa.

Discussion

In this large, nested case-control study, we investigated the associations of PMD, APD, and V, a summary measure of mammographic greyscale variation, with breast cancer risk among 1900 breast cancer cases and 3921 controls. This study adds to the literature by simultaneously evaluating two density breast measures (PMD and APD) and a texture measure V. We demonstrated that PMD, APD, and V were independently associated with breast cancer risk. However, PMD was more strongly associated with breast cancer risk than were APD or V. When PMD and APD were modeled together, the association of APD with breast cancer risk was more attenuated than PMD. Associations were generally stronger among premenopausal women and high-resolution images but did not vary by hormone receptor status.

We observed independent and joint effects of V, APD, and PMD on breast cancer risk demonstrating that both the relative amount of fibroglandular tissue and its greyscale variation contribute to breast cancer risk. While all three measures were associated with breast cancer risk, PMD had the strongest magnitude of association. This differs from the Mayo Clinic Studies, where V yielded higher risk estimates than PMD in two of three included cohorts, and in one they were equivalent6,14. In that study the authors conclude that V and PMD are at least equivalent and our findings are consistent with an interpretation that these factors are at least equivalent in their association with breast cancer risk. Further study is required to determine their precise interrelationship. As reviewed in Gastounioti et al. (2016) parenchymal texture classifiers have been assessed with respect to breast cancer risk in at least 20 studies8. A prospective cohort study by Wanders et al. (2018) found that percent dense volume and texture, as assessed using an algorithm the authors previously developed15, were each associated with breast cancer risk12. While risk prediction was not the goal of our study, Winkel et al. (2016) found that combining measures of PMD (BI-RADS), parenchymal patterns (Tabar’s classification), and an automated texture measure, improved breast cancer risk prediction. The area under the curve (AUC) for either measure alone ranged from 0.63 to 0.65, while inclusion of all three yielded an AUC of 0.6916. Importantly, while these features may ultimately improve breast cancer risk prediction, they can also yield insights into breast cancer etiology by identifying specific breast structures implicated in cancer development.

Despite the somewhat stronger associations observed for PMD, validation of APD has important implications for future research17,18. Manual measures are subject to reader differences19, measurement error20, and are time-intensive. An automated measure, such as APD, can provide a more reliable measure, suitable for use in risk assessment, mandatory breast density notification, and measuring changes in density. Inconsistent breast density measurement could bias study results toward the null21, lead to unreliable risk stratification if included in risk assessment models, or misinformed decision-making regarding screening intervals and modalities after legally mandated breast density notification22,23, and reduce reliability when assessing change in PMD24. We have demonstrated that including breast density in a breast cancer risk assessment model improves discriminatory accuracy25,26 and a recent paper shows that the Tyrer–Cuzick model with density can provide useful data at on risk for at least 10 years27. However, those estimates were based on a single baseline breast density assessment. Cuzick et al24. noted that the reliability for change in BI-RADS density was only moderate (r = 0.48–0.67) when evaluating mammograms 10 years apart. As changes in PMD over time are likely to be small28, a continuous measure, more sensitive than BI-RADS which has just four categories, is needed. In recent years, several automated measurement tools have been developed to assess volumetric or area-based breast density18,29,30,31,32. These new measurement approaches are essential for PMD use in breast cancer risk assessment and screening decision-making. Yet, our finding that PMD yielded stronger associations with breast cancer risk than did APD or V, two automated measures, suggests that there is information captured through operator-assisted thresholding that is not captured through automation. Continued effort is needed to compare the strengths and weaknesses of these approaches and develop standardized methods for clinical PMD assessment.

Study strengths include the use of prospective data from the Nurses Health Studies, two cohorts with validated disease ascertainment, and comprehensive data on breast cancer risk factors, tumor hormone receptor status, measures of PMD, APD, and texture features. This study has several important limitations. Our study focused on V, a summary texture measure, but there are many other features that have potential implications. For example, Malkov et al. (2016) examined 46 breast texture features and identified 15 that were significantly associated with breast cancer risk (p < 0.05), several of which were only weakly correlated with PMD9. In future studies, we will assess the relative importance of multiple features, independently and in combination with PMD. There is potential measurement error in the exposure assessment, particularly given that mammograms in this study were collected from across the United States across many years and images had multiple image resolution levels. To address this, we conducted sensitivity analyses and found the strongest associations among high-resolution images. This study utilized digitized film mammograms. As of November 1, 2019, 99.9% (21,156/21,182) of all accredited mammography units in the US are digital33. However, Nielsen et al. found that while differences in population characteristics and imaging technology did affect texture feature measurement, these factors did not impact the association between texture and breast cancer risk7. Further, Vachon et al. showed that associations with breast cancer were similar between full field digital mammogram image types (raw or processed) and digitized film34, demonstrating that these measures are stable phenotypes across image acquisition approaches and our findings are valid despite the use of digitized film mammograms. The relative consistency of our results compared to other studies with different imaging modalities confirms this assertion. Lastly, the Nurses’ Health Studies consist of predominantly white women. Future studies should assess associations among more diverse populations.

In conclusion, we demonstrate that APD is feasible and yields risk estimates that are similar to, but somewhat weaker than, a more labor-intensive and less reproducible manual measure. Our finding of independent and joint associations of PMD, APD, and V provides insights into breast cancer etiology. This study supports existing evidence that the amount dense tissue and its heterogeneity are both important factors in breast cancer risk.

Methods

Study population

Our study population includes 1900 breast cancer cases and 3921 matched controls from the Nurses’ Health Study (NHS) and Nurses’ Health Study II (NHSII). NHS began in 1976 with 121,700 female registered nurses aged 30–55 from 11 states. NHSII began in 1989 with 116,429 female registered nurses, aged 25–42, from 14 states. Participants in each cohort completed a baseline questionnaire at enrollment and are followed by biennially mailed questionnaires to collect information on newly diagnosed diseases, exposures, and covariates.

Mammogram collection and processing

Mammogram collection was conducted within the NHS and NHSII breast cancer case-control studies nested in the blood subcohorts35. One or two controls were matched to breast cancer cases on age, menopausal status at blood draw and diagnosis, current hormone therapy use, month, time of day, fasting status at time of blood collection, and luteal day (NHSII timed samples only). We collected pre-diagnostic screening mammograms conducted as close as possible to the blood draw date, but before June 1, 2004 (NHS) or June 1, 2007 (NHSII). We collected additional mammograms conducted around 1997 from NHSII cases and controls who participated in the NHSII cheek cell collection. In total, mammograms were collected from 2062 breast cancer cases and 4196 matched controls. We excluded 162 cases and 275 controls due to missing data on V or BMI. Film mammogram cranio-caudal (CC) views of each breast were digitized with a Lumysis 85 laser film scanner or a VIDAR CAD PRO Advantage scanner (VIDAR Systems Corporation, Herndon, VA, USA). The correlation between percent density measures from the two scanners was 0.8836. Cohort participants provided written informed consent. The study protocol was approved by the institutional review boards of the Brigham and Women’s Hospital and Harvard T.H. Chan School of Public Health, and those of participating registries as required.

Manual percent mammographic density (PMD)

PMD was assessed using Cumulus, a computer-assisted thresholding software program (University of Toronto, Toronto, Canada), from digitized film mammograms (craniocaudal view)37. PMD was calculated as dense breast area divided by the total breast area. All images were assessed by a single reader (within-person intra-class correlation coefficient > 0.90)38. Observed inter-batch variability was accounted for using methods described elsewhere39,40. PMD measures in the left and right breast were averaged.

Automated percent mammographic density (APD)

This APD method detects small dense regions on a mammogram after applying a wavelet high-pass filter and produces an output analogous to that of the operator-assisted PMD. The basic algorithm and its validation were described previously and details are provided in Supplementary Methods6,41,42. To avoid the chest wall, detection is performed in cranioclaudial views only. First, the breast area was detected creating a binary mask shown in Supplementary Fig. 1 using a method described in related work. Because the images in this study had high variability and contained many artifacts in the non-breast area, automated segmentation for each image was evaluated visually. When the breast area segmentation was deemed not appropriate for further processing, manual intervention was applied. The breast area detection performance is provided in the next section. After this step, modifications to the APD algorithm were required to account for the relatively low resolution of the mammograms used in this study (171, 232, or 300 µm) described in detail in the next section; the main modification is based on multiplying a given mammogram with a noise field producing an image illustrated in Supplementary Fig. 2 and then filtering this image in place of the raw image.

Automated percent mammographic density (APD) technique and modifications

The density detection is based on the signal dependent noise characteristic in mammograms, analyzed in the high-pass wavelet filtered image. Due to the low resolution, the noise characteristic was not strong enough to characterize the density in the filtered images, requiring algorithm modifications. These modifications are described here within the context of the algorithm flow described in the main report. In the unmodified algorithm, the digitized images were first transformed (a pixel mapping) described previously6 [and then processed with a high-pass wavelet filter. The density detection is performed in two stages in the wavelet filtered image, differing in thresholds based on predefined significance levels (i.e., operating parameters). To boost the noise signal, each transformed image was multiplied with a different realization of a zero mean unit variance Gaussian noise field and then scaled to 14 bit (integer) prior to applying the wavelet high-pass filter. Examples of this process are shown in Supplementary Fig. 2. The automated density detection was then performed by replacing a given transformed image with its corresponding noise field exemplified in Supplementary Fig. 2. These modified images were then filtered and the density detection was constrained to the breast area using the algorithm described previously6,41,42. In the first stage, a reference variance (i.e., the global adipose variance) is estimated using the entire breast area in the filtered image. A small search window (4 × 4 pixels) is scanned across the filtered image. At each window location, a chi-square statistical test is performed by comparing the local variance calculated within the window with the reference variance to decide if the region is dense or not as represented in the raw image (i.e., larger local variation is more likely to correspond to a region with high breast density in the raw image). The connection between the local variation in the filtered image and the digitized (raw) image density characteristics was demonstrated previously41,43. This detection method is applied a second time in the filtered image by updating (refining) the reference variance. The reference variance is updated by calculating the variance across the breast area in the filtered image using regions not labeled as dense in the first detection stage. Because of the noise field multiplication modification, we used 300 images (100 images from each resolution) as a test set to adjust the algorithm’s parameters. Each training image had the corresponding density determined with Cumulus. The operating points were determined by finding the parameters that gave the optimal correlation with the Cumulus measures. These correspond to significance levels used in the first and second detection stages discussed above respectively: 0.001 and 0.0001. The entire dataset was then processed.

Breast area segmentation performance

This algorithm produced acceptable breast area detection in most samples. However, there were both total and partial failures that were corrected manually. In some instances, artifacts external to the breast area remained that induced failures in subsequent segmentation processing steps; these failures occurred ~3% of the images, which were corrected manually. Approximately 6% on the images had a digitizing process anomaly (i.e., the film was not position correctly when fed into the digitizer) noted as bright wedges on the vertical borders of the images. After the breast area detection, these over contrasted areas were removed manually.

Automated summary measure of texture features (‘V’)

V is an automated measure that captures gray-scale variation on a mammogram. The algorithm was described previously and is briefly outlined here14,42. The breast is first segmented from the background. The breast area is then eroded by 25% along a radial direction to eliminate the region corresponding to where the breast was not in contact with the compression paddle (an approximation) during the image acquisition44. The erosion process is illustrated in Supplementary Fig. 1. This erosion step reduces unwanted variation in the V calculation. V is calculated as the standard deviation (SD) of the pixel values within the eroded breast area. This is a continuous measure that is not synthetically normalized. There are no operating parameters or thresholds required for this measure (i.e., training data is not required), although the background segmentation processing required preliminary analyses. Because the mammograms were at various resolutions, they were resolution-normalized prior to generating V. The V distributions were also normalized to account for intensity scale differences across the various forms of digitized mammograms. Extensive additional processing steps were required and are described in related work45. Example images with high and low levels of PMD and V are shown in Supplementary Fig. 2.

Statistical analyses

We used unconditional logistic regression to determine the association between PMD measures, V, and breast cancer risk, while adjusting for matching factors and the following potential confounders: age at mammogram (years), body mass index (kg/m2), menopausal status (premenopausal, postmenopausal, unknown), hormone therapy use (never, past, current, unknown), mammogram read batch (batch 1, batch 2, batch 3). APD, PMD, and V were categorized into quartiles based on their distribution in controls. We tested for linear trend using category medians as a continuous variable. We also evaluated each measure as a continuous variable and present effect estimates for a one SD increase to account for scale differences. To determine whether each measure was independently associated with breast cancer risk, we present models for PMD or APD adjusted for V, V adjusted for PMD or APD, and APD adjusted for PMD and vice versa. To assess potential interaction, we used likelihood ratio tests with nine degrees of freedom to compare a model with cross-classified quartiles of PMD (or APD) and V to a model with PMD (or APD) and V in quartiles. We conducted secondary analyses: (1) by estrogen (ER) and progesterone (PR) receptor status (ER+/PR+, ER+/PR−, or ER−/PR−) and (2) restricted to high-resolution images. All analyses use α = 0.05 for statistical significance. All statistical tests were two sided. All analyses were performed using SAS software (version 9.4 SAS Institute, Cary, NC, USA).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.