A model-based approach for imputing censored data in source apportionment studies

Environmental and Ecological Statistics

Abstract

Sources of particulate matter (PM) air pollution are generally inferred from PM chemical constituent concentrations using source apportionment models. Concentrations of PM constituents are often censored below minimum detection limits (MDL), and most source apportionment models cannot handle these censored data. Frequently, censored data are either substituted by a constant proportion of the MDL or removed to create a truncated dataset before sources are estimated. When estimating the complete data distribution, these commonly applied censoring adjustment methods perform poorly compared with model-based imputation methods. Model-based imputation has not been used in source apportionment and may lead to better source estimation. However, if the censored chemical constituents are not important for estimating sources, censoring adjustment methods may have little impact on source estimation. We focus on two source apportionment models applied in the literature and provide a comprehensive assessment of how censoring adjustment methods, including model-based imputation, affect source estimation. A review of censoring adjustment methods informs how censored data should be handled in these source apportionment models. In a simulation study, we demonstrated that model-based multiple imputation frequently leads to better source estimation than commonly used censoring adjustment methods. We estimated sources of PM in New York City and found that the estimated source distributions differed by censoring adjustment method. In this study, we provide guidance for adjusting censored PM constituent data in common source apportionment models, which is necessary for estimating PM sources and their subsequent health effects.
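The three kinds of censoring adjustment named in the abstract (substitution by a proportion of the MDL, truncation, and model-based imputation) can be illustrated with a small simulation. The sketch below is not the authors' method, whose details are not given in this preview. It assumes a single constituent with lognormal concentrations, a hypothetical MDL of 0.6, and MDL/2 as the substituted value. It fits a left-censored normal model to the log concentrations with a basic EM iteration, and the function names `fit_censored_normal` and `impute_below` are illustrative only.

```python
import math
import random
from statistics import NormalDist, fmean

STD = NormalDist()  # standard normal, used for pdf, cdf and inverse cdf


def fit_censored_normal(obs, n_cens, c, iters=200):
    """EM estimate of (mu, sigma) for normal data left-censored at c."""
    n = len(obs) + n_cens
    s1 = sum(obs)
    s2 = sum(x * x for x in obs)
    mu = fmean(obs)
    sigma = max(math.sqrt(fmean((x - mu) ** 2 for x in obs)), 1e-6)
    for _ in range(iters):
        a = (c - mu) / sigma
        lam = STD.pdf(a) / max(STD.cdf(a), 1e-300)
        # E-step: moments of a normal variable truncated to lie below c
        e1 = mu - sigma * lam
        e2 = sigma ** 2 * (1 - a * lam - lam ** 2) + e1 ** 2
        # M-step: update parameters from expected sufficient statistics
        mu = (s1 + n_cens * e1) / n
        sigma = math.sqrt(max((s2 + n_cens * e2) / n - mu ** 2, 1e-12))
    return mu, sigma


def impute_below(mu, sigma, c, k, rng):
    """Draw k values from N(mu, sigma^2) restricted to values below c."""
    p = STD.cdf((c - mu) / sigma)
    draws = []
    for _ in range(k):
        u = min(max(rng.uniform(0, p), 1e-12), 1 - 1e-12)
        draws.append(mu + sigma * STD.inv_cdf(u))
    return draws


rng = random.Random(0)
mdl = 0.6            # hypothetical detection limit (assumption)
c = math.log(mdl)    # censoring point on the log scale
logs = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # true log-mean 0, log-sd 1
obs = [x for x in logs if x >= c]
n_cens = len(logs) - len(obs)

# 1. Substitution: every censored value becomes log(MDL / 2)
sub_mean = (sum(obs) + n_cens * math.log(mdl / 2)) / len(logs)
# 2. Truncation: censored values are simply dropped
trunc_mean = fmean(obs)
# 3. Model-based imputation, repeated M times and averaged. A full multiple
#    imputation would also redraw (mu, sigma) for each imputed dataset.
mu_hat, sigma_hat = fit_censored_normal(obs, n_cens, c)
M = 5
imputed_sets = [impute_below(mu_hat, sigma_hat, c, n_cens, rng) for _ in range(M)]
mi_mean = fmean(fmean(obs + imp) for imp in imputed_sets)

print(f"substitution {sub_mean:.3f}  truncation {trunc_mean:.3f}  imputation {mi_mean:.3f}")
```

In this toy setting truncation overstates the log-scale mean because the lowest values are discarded, while the imputed datasets are drawn from a fitted distribution that accounts for the censored tail. How such differences carry over to multivariate source apportionment is the question the article itself addresses.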

Acknowledgments

Funding for Dr. Krall was provided by National Institute on Aging (T32AG000247). This work was supported by awards R01ES019560 and R21ES020152 from the National Institute of Environmental Health Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Environmental Health Sciences or the National Institutes of Health.

Author information

Corresponding author

Correspondence to Roger D. Peng.

Additional information

Handling Editor: Bryan F. J. Manly.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 191 KB)

About this article

Cite this article

Krall, J.R., Simpson, C.H. & Peng, R.D. A model-based approach for imputing censored data in source apportionment studies. Environ Ecol Stat 22, 779–800 (2015). https://doi.org/10.1007/s10651-015-0319-6

  • DOI: https://doi.org/10.1007/s10651-015-0319-6
