Abstract
Sources of particulate matter (PM) air pollution are generally inferred from PM chemical constituent concentrations using source apportionment models. Concentrations of PM constituents are often censored below minimum detection limits (MDLs), and most source apportionment models cannot handle censored data. Frequently, censored values are first substituted by a constant proportion of the MDL, or are removed to create a truncated dataset, before sources are estimated. When estimating the complete-data distribution, these commonly applied censoring adjustment methods perform poorly compared with model-based imputation methods. Model-based imputation has not been used in source apportionment and may lead to better source estimation. However, if the censored chemical constituents are not important for estimating sources, censoring adjustment methods may have little impact on source estimation. We focus on two source apportionment models applied in the literature and provide a comprehensive assessment of how censoring adjustment methods, including model-based imputation, impact source estimation. Our review of censoring adjustment methods informs how censored data should be handled in these source apportionment models. In a simulation study, we demonstrated that model-based multiple imputation frequently leads to better source estimation than commonly used censoring adjustment methods. We estimated sources of PM in New York City and found that estimated source distributions differed by censoring adjustment method. In this study, we provide guidance for adjusting censored PM constituent data in common source apportionment models, which is necessary for estimating PM sources and their subsequent health effects.
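The three censoring adjustment approaches contrasted above (substitution by a constant proportion of the MDL, truncation, and model-based multiple imputation) can be illustrated for a single constituent. The sketch below is not the authors' method, which operates inside multivariate source apportionment models. It assumes a hypothetical univariate lognormal concentration model, estimates its parameters by EM for left-censored data, and imputes each nondetect by drawing from the fitted distribution truncated at the MDL. Each imputation refits on a bootstrap resample, a common way to approximate proper multiple imputation. All function names and parameter values (`simulate`, `substitute`, `truncate`, `fit_censored_lognormal`, `impute`, `multiply_impute`, the MDL, sample size, and seed) are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean

# Hypothetical single-constituent example: concentrations are assumed lognormal.
# Censored (below-MDL) values are represented as None in the observed data.


def simulate(n, mu, sigma, mdl, rng):
    """Draw complete lognormal data, then censor every value below the MDL."""
    complete = [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]
    observed = [x if x >= mdl else None for x in complete]
    return complete, observed


def substitute(observed, mdl, frac=0.5):
    """Replace each censored value with a constant proportion of the MDL."""
    return [frac * mdl if x is None else x for x in observed]


def truncate(observed):
    """Drop censored values, keeping only detected concentrations."""
    return [x for x in observed if x is not None]


def fit_censored_lognormal(observed, mdl, tol=1e-8, max_iter=1000):
    """EM estimates of (mu, sigma) for log-concentrations left-censored at log(mdl)."""
    logs = [math.log(x) for x in observed if x is not None]
    n, n_cens = len(observed), len(observed) - len(logs)
    c = math.log(mdl)
    sum_obs, sumsq_obs = sum(logs), sum(y * y for y in logs)
    mu, sigma = fmean(logs), 1.0  # crude starting values
    std = NormalDist()
    for _ in range(max_iter):
        # E-step: first two moments of N(mu, sigma^2) truncated above at c.
        a = (c - mu) / sigma
        lam = std.pdf(a) / std.cdf(a)
        e1 = mu - sigma * lam
        e2 = sigma**2 * (1 - a * lam - lam**2) + e1**2
        # M-step: ML updates from the expected sufficient statistics.
        new_mu = (sum_obs + n_cens * e1) / n
        new_sigma = math.sqrt((sumsq_obs + n_cens * e2) / n - new_mu**2)
        done = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return mu, sigma


def impute(observed, mdl, mu, sigma, rng):
    """Fill censored values with draws from the fitted lognormal truncated at the MDL."""
    dist = NormalDist(mu, sigma)
    p_max = dist.cdf(math.log(mdl))
    # Inverse-CDF sampling with probabilities in (0, p_max] keeps draws at or below log(mdl).
    return [math.exp(dist.inv_cdf(rng.uniform(1e-12, p_max))) if x is None else x
            for x in observed]


def multiply_impute(observed, mdl, m, rng):
    """Create m completed datasets, refitting on a bootstrap resample each time
    so that uncertainty in (mu, sigma) propagates into the imputations."""
    datasets = []
    for _ in range(m):
        boot = [rng.choice(observed) for _ in observed]
        mu, sigma = fit_censored_lognormal(boot, mdl)
        datasets.append(impute(observed, mdl, mu, sigma, rng))
    return datasets


rng = random.Random(2015)
mdl = math.exp(-0.5)  # illustrative MDL: about 31% of values fall below it
complete, observed = simulate(5000, 0.0, 1.0, mdl, rng)

true_mean = fmean(complete)
sub_mean = fmean(substitute(observed, mdl))
trunc_mean = fmean(truncate(observed))
# Point estimates from multiple imputations are combined by averaging (Rubin's rules).
mi_mean = fmean([fmean(d) for d in multiply_impute(observed, mdl, 5, rng)])

for name, est in [("MDL/2", sub_mean), ("truncation", trunc_mean), ("MI", mi_mean)]:
    print(f"{name:>10}: mean {est:.3f} vs complete-data mean {true_mean:.3f}")
```

Truncation can only raise the sample mean, because every discarded value lies below every retained one. Substitution keeps the sample size but places every nondetect at a single point, which distorts the lower tail and the variance even when the mean looks reasonable. In a source apportionment analysis one would presumably impute all constituents jointly, so that the correlations that drive estimated source profiles are preserved; this univariate sketch does not attempt that.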
Acknowledgments
Funding for Dr. Krall was provided by the National Institute on Aging (T32AG000247). This work was supported by awards R01ES019560 and R21ES020152 from the National Institute of Environmental Health Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Environmental Health Sciences or the National Institutes of Health.
Additional information
Handling Editor: Bryan F. J. Manly.
About this article
Cite this article
Krall, J.R., Simpson, C.H. & Peng, R.D. A model-based approach for imputing censored data in source apportionment studies. Environ Ecol Stat 22, 779–800 (2015). https://doi.org/10.1007/s10651-015-0319-6