Abstract
The functional data analysis literature has largely overlooked data whose mean exhibits spikes, such as a salesperson's weekly sporting goods sales, which spike around holidays. For such functional data, two-step estimation procedures are formulated for the population mean function and the holiday effect parameters, which correspond to the population sales curve and the spikes in sales during holiday periods, respectively. The estimators are based on spline smoothing of individual trajectories using non-holiday observations, and are shown to be oracally efficient in the sense that both the mean function and the holiday effects are estimated as efficiently as if all individual trajectories were known a priori. Consequently, an asymptotic simultaneous confidence band is established for the mean function, along with confidence intervals for the holiday effects. Two-sample extensions are also formulated, and simulation experiments provide strong evidence corroborating the asymptotic theory. An application to sporting goods sales data has led to a number of new discoveries.
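The two-step idea summarized above can be illustrated with a minimal sketch. It is not the authors' exact estimator: the truncated power spline basis, the knot count, the simulated curves, the holiday positions, and the names `spline_basis` and `two_step_estimate` are all assumptions made for illustration. Each trajectory is smoothed using only its non-holiday observations, the smoothed curves are averaged to estimate the mean function, and each holiday effect is taken as the average gap between the observed holiday value and the smoothed curve at that point. Confidence bands and intervals are not covered here.

```python
import numpy as np

def spline_basis(x, knots, degree=3):
    """Truncated power basis: 1, x, ..., x^degree, then (x - k)_+^degree per knot."""
    x = np.asarray(x, dtype=float)
    cols = [x ** p for p in range(degree + 1)]
    cols += [np.clip(x - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)

def two_step_estimate(Y, x, holiday_mask, n_knots=5, degree=3):
    """Y: (n, N) trajectories on grid x; holiday_mask: (N,) boolean array."""
    # Equally spaced interior knots (an illustrative choice).
    knots = np.linspace(x.min(), x.max(), n_knots + 2)[1:-1]
    B = spline_basis(x, knots, degree)
    # Step 1: least-squares spline fit per trajectory, non-holiday points only.
    coef, *_ = np.linalg.lstsq(B[~holiday_mask], Y[:, ~holiday_mask].T, rcond=None)
    fitted = (B @ coef).T  # smoothed trajectories, shape (n, N)
    # Step 2: average the smoothed trajectories to estimate the mean function;
    # holiday effects are mean gaps between observed and smoothed values.
    mean_hat = fitted.mean(axis=0)
    effects_hat = (Y[:, holiday_mask] - fitted[:, holiday_mask]).mean(axis=0)
    return mean_hat, effects_hat

# Simulated example (all values hypothetical).
rng = np.random.default_rng(0)
n, N = 200, 100
x = np.arange(1, N + 1) / N
holiday_mask = np.zeros(N, dtype=bool)
holiday_mask[[24, 59, 89]] = True
true_mean = np.sin(2 * np.pi * x)
true_effects = np.array([2.0, 1.5, 3.0])
scores = rng.normal(size=(n, 1))  # subject-level random variation
Y = true_mean + scores * np.cos(np.pi * x) + 0.1 * rng.normal(size=(n, N))
Y[:, holiday_mask] += true_effects  # add spikes at the holiday positions

mean_hat, effects_hat = two_step_estimate(Y, x, holiday_mask)
print(np.round(effects_hat, 2))
```

Excluding holiday points from the smoothing step keeps the spikes from distorting the fitted curves, which is why the holiday effects can then be read off as residuals at those points.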
Acknowledgements
This research was supported in part by National Natural Science Foundation of China Awards 11371272 and 11771240, and by the Tsinghua University Center for Data-Centric Management in the Department of Industrial Engineering. Part of the research was carried out while the first author was a visitor at the Department of Statistics, Texas A&M University. The first author thanks the China Scholarship Council (CSC) for providing financial support to visit Texas A&M University. The helpful comments from Editor-in-Chief Lola Ugarte, an Associate Editor and two Reviewers are gratefully acknowledged.
Author information
Li Cai, Lisha Li: Co-first authors.
About this article
Cite this article
Cai, L., Li, L., Huang, S. et al. Oracally efficient estimation for dense functional data with holiday effects. TEST 29, 282–306 (2020). https://doi.org/10.1007/s11749-019-00655-5