Abstract
In this article, we focus on the estimation of varying-coefficient mixed effects models for longitudinal and sparse functional response data, using the generalized least squares method coupled with a modified local kernel smoothing technique. This approach provides a useful framework that simultaneously takes into account the within-subject covariance and all observation information in the estimation to improve efficiency. We establish both uniform consistency and pointwise asymptotic normality for the proposed estimators of the varying-coefficient functions. Numerical studies are carried out to illustrate the finite sample performance of the proposed procedure. An application to the white matter tract dataset obtained from the Alzheimer’s Disease Neuroimaging Initiative study is also provided.
1 Introduction
Longitudinal and functional data are frequently encountered in biomedicine, epidemiology, and other fields of the natural and social sciences; see Hand and Crowder (1996), Ramsay and Silverman (2005) and Ferraty (2011) for many interesting examples. In recent years, various functional regression models have been proposed to better describe and analyze such complicated data, since they allow the responses, the regressors, or both to be curves. According to the concrete forms of the responses and regressors, functional regression models are usually subdivided into three categories: scalar-on-function regression (see, e.g., Cai and Hall 2006; Chen, Hall and Müller 2011; Sang et al. 2018), function-on-function regression (Yao et al. 2005a; He et al. 2010; Wang et al. 2018) and function-on-scalar regression (Wu et al. 2013; Zhang and Wang 2015). For more details about functional regression models and their applications, one can see Morris (2015), Wang et al. (2016) and Reiss et al. (2017) for an overview.
Motivated by the regression analysis of imaging (or functional) data, such as diffusion tensor imaging (DTI) and positron emission tomography (PET) (Zhu et al. 2007; Friston 2009), many nonparametric and semi-parametric regressions for functional responses have been proposed; see, for example, Luo et al. (2016) and Li et al. (2017). A functional varying-coefficient mixed effects model, which can be used for modeling the dynamic impacts of the covariates of interest on the response, is one of the most powerful tools in addressing these scientific problems for imaging data. Let \(Y_{i}(t)\) be the functional response variable for subject i, \(i=1,\ldots ,n\), and \({{\mathbf {x}}}_{i}=(x_{i1},\ldots ,x_{ip})^{T}\) be the associated p-dimensional covariate vector; then the functional varying-coefficient mixed effects model takes the form
$$\begin{aligned} Y_{i}(t)=\mathbf {x}_{i}^{T}{{\varvec{\alpha }}}(t)+\eta _{i}(t)+\varepsilon _{i}(t), \end{aligned}$$(1)
where \({{\varvec{\alpha }}}(t)=(\alpha _{1}(t),\ldots ,\alpha _{p}(t))^{T}\) is the p-dimensional unknown but smooth coefficient function vector, \(\varepsilon _{i}(t)\) are measurement errors, and \(\eta _{i}(t)\) is the random effect function that characterizes individual curve variation (subject-effect) from \(\mathbf {x}_{i}^{T}{{\varvec{\alpha }}}(t)\). Without loss of generality, throughout this paper we assume that \(t\in \mathcal {T}=[0,1]\). Moreover, {\(\eta _{i}(t): t\in \mathcal {T}\)} is a stochastic process indexed by \(t\in \mathcal {T}\) that captures the within-curve dependence. We assume that \(\eta _{i}(t)\) are independent copies of \(\eta (t)\sim \mathrm {SP}(0, \Sigma )\), where \(\mathrm {SP}(\mu , \Sigma )\) denotes a stochastic process with mean function \(\mu (t)\) and covariance function \(\Sigma (s,t)\).
Model (1), as a classical function-on-scalar regression, has been widely applied in functional data analysis and extensively studied in the existing literature. For instance, for model (1) without the random effect term, Wu and Chiang (2000) proposed two kernel estimators based on componentwise local least-squares criteria to estimate the varying-coefficient functions; Chiang et al. (2001) employed a componentwise smoothing spline method to estimate the time-varying coefficients and studied the corresponding asymptotic properties. Zhang and Chen (2007) adopted the idea of “smoothing first, then estimation” to construct local polynomial kernel reconstruction-based estimators of the mean, covariance and error variance functions for model (1) and derived their asymptotics. Zhu et al. (2012) extended model (1) to a multivariate case and systematically studied the theoretical properties of the resulting estimators obtained by a local linear smoother. Zhu et al. (2014) further proposed a spatially varying coefficient model (SVCM) for neuroimaging studies and investigated the asymptotic properties of the parameter estimators. For more studies on varying-coefficient models and mixed effects models for longitudinal and functional data, readers may refer to Fan and Zhang (2008) and Chen and Wang (2011) for comprehensive discussions.
However, the aforementioned estimation approaches for model (1) are mainly based on kernel smoothing and smoothing splines, both of which ignore the within-subject correlation in the estimation procedures. For longitudinal and functional data, it is well known that there exists high correlation between the observations on the grid of each datum. As a result, these existing methods might suffer a loss of efficiency. Additionally, the kernel smoothing method utilizes only observations in the neighborhood of the target point for estimation. Notably, these neighborhoods usually contain only a few observations in longitudinal studies, and the within-subject correlation may cause observations outside the neighborhoods to carry significant information. In view of this, the local kernel method might result in inefficient estimators. Therefore, new techniques that improve the efficiency of the conventional local kernel method are highly desirable in longitudinal and functional data analysis. Chen et al. (2010) proposed a global partial likelihood method to study the nonparametric proportional hazards model. The main idea of their global approach is to use all observations, instead of only local observations near the target point, for estimation to improve efficiency. Zhou et al. (2018) combined this global smoothing idea with the full quasi-likelihood method to estimate mean and covariance functions for sparse functional data. Motivated by the idea of global smoothing, we present an efficient procedure for estimating the varying-coefficient functions in model (1) for longitudinal and sparse functional response data. This procedure combines the generalized least squares method with a modified local kernel smoothing criterion that uses the observations on the whole grid of each subject in estimation. A simple iterative algorithm is proposed to compute the estimator.
The proposed method overcomes the potential drawbacks of existing local kernel methods and improves the efficiency of the resulting estimators. Specifically, the new method has several advantages: (i) the proposed estimation procedure involves the global smoothing idea, which not only absorbs the advantages of the local linear smoother by using the observations near the target point for estimation, but also utilizes the information available at data outside the neighborhood, which is not used under the local kernel smoothing principle; (ii) the within-subject covariance structure is also incorporated into the estimation procedure to improve the estimators; and (iii) the iterative procedure is easy to implement, so it can be applied to various application problems. We derive both uniform consistency and asymptotic normality for the estimated varying-coefficient functions. Simulation studies are conducted to compare the performance of our method with that of the local linear method in terms of root mean squared error, and the results reveal that the proposed method has a smaller root mean squared error than its competitor. This further indicates that the results obtained by our method can more accurately characterize the dynamic relationship between a set of covariates and the functional response variable in applications. Finally, a real data example is presented to illustrate the newly proposed approach.
The rest of this article is organized as follows. Section 2 introduces the estimation procedure. The theoretical properties including consistency and asymptotic normality of the estimators are established in Sect. 3. Section 4 presents the simulation studies which are performed to evaluate the performance of the method. Section 5 is devoted to applying this method to analyze the white matter tract dataset obtained from Alzheimer’s Disease Neuroimaging Initiative (ADNI). Section 6 concludes the paper and discusses future work. Technical proofs are given in the “Appendix”.
2 Methods
In this paper, we focus on sparse data situations. Suppose that we observe the random functions \(Y_{i}\) at discrete points \(t_{ij}\), that is,
$$\begin{aligned} Y_{ij}=Y_{i}(t_{ij})=\mathbf {x}_{i}^{T}{{\varvec{\alpha }}}(t_{ij})+\eta _{i}(t_{ij})+\varepsilon _{ij},\quad j=1,\ldots ,m_{i};\ i=1,\ldots ,n, \end{aligned}$$(2)
where the observation locations \(t_{ij}\) are independent and identically distributed random variables with density function \(f(\cdot )\). Throughout this paper, we shall assume that \(\mathbf {x}_{i}\), \(t_{ij}\), \(\eta (\cdot )\) and \(\varepsilon (\cdot )\) are all independent and the noise terms, \(\varepsilon _{ij}\), are independent and identically distributed random errors with mean zero and variance \(\sigma ^{2}\). Suppose that \(2\le m_{i}\le M\), for some finite constant M, for all \(i=1,\ldots ,n\). Our proposed estimation procedure contains two key steps: \((\mathrm{I})\) calculate the initial estimate \(\tilde{{\varvec{\alpha }}}(t)\) of \({{\varvec{\alpha }}}(t)\) and construct the nonparametric covariance estimation for \(\Sigma (s,t)\); \((\mathrm{II})\) use the estimated covariance function and the initial estimate \(\tilde{{\varvec{\alpha }}}(t)\) to develop a refined estimate for \({{\varvec{\alpha }}}(t)\), denoted by \(\widehat{{\varvec{\alpha }}}(t)\).
2.1 Initial estimation
We adopt the local linear scatter-plot smoother (Fan and Gijbels 1996) to estimate the varying-coefficient function \({{\varvec{\alpha }}}(t)\) and the covariance function \(\Sigma (s,t)\). For any T in a small neighborhood of t, applying a Taylor expansion, we have \(\alpha _{k}(T)\approx \alpha _{k}(t)+\dot{\alpha }_{k}(t)(T-t)\equiv a_{k}+b_{k}(T-t)\) for \(k=1,\ldots ,p\), where \(\dot{\alpha }_{k}(t)=d{\alpha }_{k}(t)/dt\). Denote \({\varvec{\mathcal {A}}}_{0}=(a_{1},\ldots ,a_{p})^{T}\) and \({\varvec{\mathcal {A}}}=(a_{1},\ldots ,a_{p},b_{1},\ldots ,b_{p})^{T}\). Let \(K(\cdot )\) be a kernel function and \(K_{h}(\cdot )=h^{-1}K(\cdot /h)\) be a scaled kernel with bandwidth h. Then the local linear estimate of the regression coefficients, \({{\varvec{\alpha }}}(t)\), is simply \(\tilde{{\varvec{\alpha }}}(t)=\widehat{{\varvec{\mathcal {A}}}}_{0}\), where
$$\begin{aligned} \widehat{{\varvec{\mathcal {A}}}}=\arg \min _{{\varvec{\mathcal {A}}}}\sum _{i=1}^n\sum _{j=1}^{m_{i}}K_{h}(t_{ij}-t)\left\{ Y_{ij}-\sum _{k=1}^{p}x_{ik}\left[ a_{k}+b_{k}(t_{ij}-t)\right] \right\} ^{2}. \end{aligned}$$(3)
Let \(C_{ijl}=(Y_{ij}-\mathbf {x}_{i}^{T}\tilde{{\varvec{\alpha }}}(t_{ij})) (Y_{il}-\mathbf {x}_{i}^{T}\tilde{{\varvec{\alpha }}}(t_{il}))\) be the “raw” covariances, where \(\tilde{{\varvec{\alpha }}}(\cdot )\) is the estimated varying-coefficient function obtained from (3). Note that \(E(C_{ijl}|\mathrm{\mathbf { x}}_{i},t_{ij},t_{il})\approx \Sigma (t_{ij},t_{il})+\sigma ^2I(t_{ij}=t_{il})\), where \(I(\cdot )\) is an indicator function. Therefore the local linear surface smoother for the covariance function \(\Sigma (s,t)\) can be obtained from the raw covariances with their diagonal elements removed; this yields \(\widehat{\Sigma }(s,t)=\widehat{\beta }_{0}\), where \(\widehat{{\varvec{\beta }}}=(\widehat{\beta }_{0},\widehat{\beta }_{1},\widehat{\beta }_{2})^{T}\) and
$$\begin{aligned} \widehat{{\varvec{\beta }}}=\arg \min _{{\varvec{\beta }}}\sum _{i=1}^n\sum _{j\ne l}K_{h}(t_{ij}-s)K_{h}(t_{il}-t)\left\{ C_{ijl}-\beta _{0}-\beta _{1}(t_{ij}-s)-\beta _{2}(t_{il}-t)\right\} ^{2}. \end{aligned}$$(4)
In order to estimate \(\sigma ^{2}\), we first estimate the variance \(R(t)=\Sigma (t,t)+\sigma ^{2}\) of Y(t). Let \(\widehat{R}(t)\) be the local linear smoother of R(t); then \(\widehat{R}(t)=\widehat{\beta }_{0}\), where \(\widehat{{\varvec{\beta }}}=(\widehat{\beta }_{0},\widehat{\beta }_{1})^{T}\) and
$$\begin{aligned} \widehat{{\varvec{\beta }}}=\arg \min _{{\varvec{\beta }}}\sum _{i=1}^n\sum _{j=1}^{m_{i}}K_{h}(t_{ij}-t)\left\{ C_{ijj}-\beta _{0}-\beta _{1}(t_{ij}-t)\right\} ^{2}. \end{aligned}$$(5)
The estimate of the variance \(\sigma ^{2}\) is then
$$\begin{aligned} \widehat{\sigma }^{2}=|\mathcal {T}|^{-1}\int _{\mathcal {T}}\{\widehat{R}(t)-\widehat{\Sigma }(t,t)\}dt, \end{aligned}$$(6)
where the notation \(|\mathcal {T}|\) denotes the length of the interval \(\mathcal {T}\). For a more detailed account of the estimation of mean and covariance functions for longitudinal and sparse functional data, one can refer to Yao et al. (2005b) and Li and Hsing (2010).
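To make the initial step concrete, the following Python sketch (all function and variable names are ours, not from the paper) computes the pooled local linear estimate \(\tilde{{\varvec{\alpha }}}(t)\) at a single target point by solving the kernel-weighted least squares problem in \((a_{k},b_{k})\):

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def local_linear_alpha(t, T, Y, X, h):
    """Initial local linear estimate alpha_tilde(t) at a single point t.

    T : (N,) pooled observation times t_ij
    Y : (N,) pooled responses Y_ij
    X : (N, p) covariate row x_i^T repeated for every observation of subject i
    h : bandwidth
    Minimizes sum_ij K_h(t_ij - t) {Y_ij - x_i^T [a + b (t_ij - t)]}^2
    and returns the intercept part a = alpha_tilde(t).
    """
    w = epanechnikov((T - t) / h) / h            # kernel weights K_h(t_ij - t)
    D = np.hstack([X, X * (T - t)[:, None]])     # rows (x^T, (t_ij - t) x^T)
    A = D.T @ (w[:, None] * D)                   # weighted normal equations
    b = D.T @ (w * Y)
    return np.linalg.solve(A, b)[: X.shape[1]]   # first p entries = a
```

Because a local linear fit reproduces linear functions exactly, coefficient functions that are linear in t are recovered without bias in the noise-free case, which makes the sketch easy to sanity-check.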
2.2 Refined estimation for varying-coefficient functions
The local kernel smoothing uses only local observations near the target point for estimation. However, the within-subject correlation present in longitudinal and functional data may result in observations outside the neighborhood of the target point carrying relevant information. Therefore, the local linear method used in Sect. 2.1 for estimating the varying-coefficient functions may suffer a loss of efficiency. In this subsection, we employ the generalized least squares method, incorporating a modification of local linear smoothing, to construct a refined estimator of \({{\varvec{\alpha }}}(\cdot )\). This new method not only uses all observations, instead of only the local observations near the target point, for estimation, but also incorporates the estimated covariance structure, which accounts for the within-subject correlation, into the estimation process to improve the resulting estimators.
Write \({{\varvec{Y}}}_{i}=(Y_{i1},\ldots ,Y_{i,m_{i}})^{T}\), \({{\varvec{t}}}_{i}=(t_{i1},\ldots ,t_{i,m_{i}})^{T}\), \({{\varvec{\alpha }}}({{\varvec{t}}}_{i})=(\alpha _{1}({{\varvec{t}}}_{i}),\ldots ,\alpha _{p}({{\varvec{t}}}_{i}))^{T}\) and \(\alpha _{j}({{\varvec{t}}}_{i})=(\alpha _{j}(t_{i1}),\ldots ,\alpha _{j}(t_{i,m_{i}}))^{T}\) for \(j=1,\ldots ,p\). Note that \(E({{\varvec{Y}}}_{i}|\mathbf {x}_{i},{{\varvec{t}}}_{i})={{\varvec{\alpha }}}^{T}({{\varvec{t}}}_{i})\mathbf {x}_{i}\) and \(\mathrm{cov}({{\varvec{Y}}}_{i}|\mathbf {x}_{i},{{\varvec{t}}}_{i})={{\varvec{\Sigma }}}_{i}+\sigma ^{2}I_{m_{i}}\), where \({\varvec{\Sigma }}_{i}=\{{\varvec{\Sigma }}(t_{ij},t_{il})\}^{m_{i}}_{j,l=1}\) is the covariance matrix of \(\eta _{i}({{\varvec{t}}}_{i})=(\eta _{i}(t_{i1}),\ldots ,\eta _{i}(t_{i,m_{i}}))^{T}\) and \(I_{m_{i}}\) is the identity matrix of order \(m_{i}\) for \(i=1,\ldots ,n\). For convenience, let \({\varvec{V}}_{i}={{\varvec{\Sigma }}}_{i}+\sigma ^{2}I_{m_{i}}\). Then the generalized least squares estimator of \({{\varvec{\alpha }}}(t)\) is obtained by minimizing the following objective function
$$\begin{aligned} \sum _{i=1}^n\{{{\varvec{Y}}}_{i}-{{\varvec{\alpha }}}^{T}({{\varvec{t}}}_{i})\mathbf {x}_{i}\}^{T}{{\varvec{V}}}_{i}^{-1}\{{{\varvec{Y}}}_{i}-{{\varvec{\alpha }}}^{T}({{\varvec{t}}}_{i})\mathbf {x}_{i}\}. \end{aligned}$$(7)
Firstly, we replace \({{\varvec{V}}}_{i}\) with its estimate. Since we do not assume any structure on the covariance function of Y(t) other than that it is a smooth function, the local linear estimates presented in Sect. 2.1 are recommended. Let \(\widehat{{\varvec{\Sigma }}}_{i}\) be the estimate of the covariance matrix \({\varvec{\Sigma }}_{i}\) and \(\widehat{\sigma }^{2}\) be the estimated variance of the measurement errors obtained from (4), (5) and (6). Then \({{\varvec{V}}}_{i}\) is substituted by \(\widehat{{\varvec{V}}}_{i}=\widehat{{{\varvec{\Sigma }}}}_{i}+\widehat{\sigma }^{2}I_{m_{i}}\).
Secondly, we adopt a strategy similar to that of Chen et al. (2010) and Zhou et al. (2018) to approximate \({{\varvec{\alpha }}}(t)\). Specifically, let \(B_{n}(t)\) be a neighborhood of t; it is easy to see that
$$\begin{aligned} {{\varvec{\alpha }}}(t_{ij})={{\varvec{\alpha }}}(t_{ij})I(t_{ij}\in B_{n}(t))+{{\varvec{\alpha }}}(t_{ij})I(t_{ij}\notin B_{n}(t)). \end{aligned}$$(8)
Based on the idea of the local linear smoother, we use the linear function \({{\varvec{\zeta }}}+\dot{{{\varvec{\zeta }}}}(t_{ij}-t)\) to approximate \({{\varvec{\alpha }}}(t_{ij})\) in the first term of (8) and replace \(I(t_{ij}\in B_{n}(t))\) with a kernel weight \(K\{(t_{ij}-t)/h\}\). Then for \(t_{ij}\) in a neighborhood of t, we have
$$\begin{aligned} {{\varvec{\alpha }}}(t_{ij})\approx hK_{h}(t_{ij}-t)\{{{\varvec{\zeta }}}+\dot{{{\varvec{\zeta }}}}(t_{ij}-t)\}+\{1-hK_{h}(t_{ij}-t)\}{{\varvec{\alpha }}}(t_{ij}), \end{aligned}$$
where \({{\varvec{\zeta }}}=(\zeta _{1},\ldots ,\zeta _{p})^{T}\), \(\dot{{{\varvec{\zeta }}}}=(\dot{\zeta }_{1},\ldots ,\dot{\zeta }_{p})^{T}\) and \(K _{h}(\cdot )=h^{-1}K(\cdot /h)\) is a scaled kernel function with bandwidth h. This expansion is of great use in longitudinal and functional data analysis because it not only absorbs the advantage of the local linear smoother but also utilizes the information available at data outside the neighborhood region \(B_{n}(t)\). Substituting this approximation and the estimated covariance matrix into (7), we obtain the refined estimator of \({{\varvec{\vartheta }}}=({{\varvec{\zeta }}}^{T},\dot{{{\varvec{\zeta }}}}^{T})^{T}\) by solving the following equation
$$\begin{aligned} \sum _{i=1}^n {{\varvec{\Theta }}}^{T}_{i}{{\varvec{K}}}_{i,h}\widehat{{\varvec{V}}}_{i}^{-1}\{{{\varvec{Y}}}_{i}-h{{\varvec{K}}}_{i,h}{{\varvec{\Theta }}}_{i}{{\varvec{\vartheta }}}-(I_{m_{i}}-h{{\varvec{K}}}_{i,h}){{\varvec{\alpha }}}^{T}({{\varvec{t}}}_{i})\mathbf {x}_{i}\}=0, \end{aligned}$$(9)
where \({{\varvec{K}}}_{i,h}=\mathrm{diag}(K_{h}(t_{i1}-t),\ldots ,K_{h}(t_{i,m_{i}}-t))\) and \({{\varvec{\Theta }}}_{i}\) is the \(m_{i}\times 2p\) matrix whose jth row is \((\mathbf {x}_{i}^{T},(t_{ij}-t)\mathbf {x}_{i}^{T})\) for \(j=1,\ldots ,m_{i}\).
Note that the refined estimator cannot be directly obtained by solving Eq. (9), as it still involves the unknown function vector \({{\varvec{\alpha }}}({{\varvec{t}}}_{i})\). Therefore, we present an iterative procedure that substitutes \({{\varvec{\alpha }}}({{\varvec{t}}}_{i})\) with its previous estimate to address this problem. Let \(t_{1}<t_{2}<\cdots <t_{M}\) be the distinct points among the \(t_{ij}\) for \(i=1,\ldots ,n\) and \(j=1, \ldots ,m_{i}\). The iterative algorithm is described as follows.
-
Step 0 Take the estimator obtained from (3) as the initial value of \({{\varvec{\alpha }}}(t)\), denoted by \({{\varvec{\alpha }}}_{[0]}(t)=\tilde{{\varvec{\alpha }}}(t)\) for \(t=t_{1},\ldots ,t_{M}\).
-
Step 1 For each \(t=t_{1},\ldots ,t_{M}\), solve Eq. (9) in which \({{\varvec{\alpha }}}(t)\) is replaced by its previous estimate, that is,
$$\begin{aligned} \sum _{i=1}^n {{\varvec{\Theta }}}^{T}_{i}{{\varvec{K}}}_{i,h}\widehat{{\varvec{V}}}_{i}^{-1}\{{{\varvec{Y}}}_{i} -h{{\varvec{K}}}_{i,h}{{\varvec{\Theta }}}_{i}{{\varvec{\vartheta }}}-(I_{m_{i}}-h{{\varvec{K}}}_{i,h}) {{\varvec{\alpha }}}^{T}_{[s-1]}({{\varvec{t}}}_{i}) \mathbf {x}_{i}\}=0, \end{aligned}$$(10)
where \({{\varvec{\alpha }}}^{T}_{[s-1]}({{\varvec{t}}}_{i})\) is the \((s-1)\)th iterative estimator of \({{\varvec{\alpha }}}^{T}({{\varvec{t}}}_{i})\). Equation (10) yields the estimator \(\widehat{{\varvec{\vartheta }}}(t)\), given by
$$\begin{aligned} \widehat{{\varvec{\vartheta }}}(t) = \begin{pmatrix} \widehat{{\varvec{\zeta }}}(t) \\ {\widehat{\dot{{\varvec{\zeta }}}}}(t) \end{pmatrix} =&\left\{ \sum _{i=1}^n {{\varvec{\Theta }}}^{T}_{i}{{\varvec{K}}}_{i,h}\widehat{{\varvec{V}}}_{i}^{-1}h{{\varvec{K}}}_{i,h} {{\varvec{\Theta }}}_{i}\right\} ^{-1} \nonumber \\&\times \sum _{i=1}^n {{\varvec{\Theta }}}^{T}_{i}{{\varvec{K}}}_{i,h}\widehat{{\varvec{V}}}_{i}^{-1} \{{{\varvec{Y}}}_{i}-(I_{m_{i}}-h{{\varvec{K}}}_{i,h}){{\varvec{\alpha }}}^{T}_{[s-1]}({{\varvec{t}}}_{i}) \mathbf {x}_{i}\}. \end{aligned}$$(11)
Hence, \({{\varvec{\alpha }}}_{[s]}(t)=\widehat{{\varvec{\zeta }}}(t)\).
-
Step 2 Repeat Step 1 until convergence.
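The closed-form update (11) in Step 1 can be sketched as follows (a simplified Python illustration under our own data layout, with names of our choosing; the inverses \(\widehat{{\varvec{V}}}_{i}^{-1}\) are assumed precomputed):

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def refined_update(t, data, V_inv, alpha_prev, h):
    """One evaluation of the closed-form update (11) at a target point t.

    data       : list of per-subject triples (t_i, Y_i, x_i)
    V_inv      : list of inverses of the estimated matrices V_i
    alpha_prev : callable s -> previous estimate alpha_[s-1](s), shape (p,)
    Returns alpha_[s](t), i.e. the first p entries (zeta_hat) of (11).
    """
    p = data[0][2].size
    lhs = np.zeros((2 * p, 2 * p))
    rhs = np.zeros(2 * p)
    for (ti, Yi, xi), Vi_inv in zip(data, V_inv):
        k = epanechnikov((ti - t) / h) / h                    # diagonal of K_{i,h}
        Theta = np.hstack([np.tile(xi, (ti.size, 1)),
                           (ti - t)[:, None] * xi])           # m_i x 2p design
        mu_prev = np.array([alpha_prev(s) @ xi for s in ti])  # alpha_[s-1]^T(t_i) x_i
        KT = Theta.T * k                                      # Theta_i^T K_{i,h}
        lhs += KT @ Vi_inv @ (h * k[:, None] * Theta)
        rhs += KT @ Vi_inv @ (Yi - (1.0 - h * k) * mu_prev)
    sol = np.linalg.solve(lhs, rhs)
    return sol[:p]
```

If the previous iterate equals the true coefficient functions, the coefficient functions are linear and the data are noise-free, the update reproduces the truth exactly, which is a convenient fixed-point check of the implementation.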
The bandwidth h for the estimation procedure presented above is selected by leave-one-curve-out cross-validation (Rice and Silverman 1991). Specifically, let \(\widehat{{\varvec{\alpha }}}^{(-i)}(\cdot )\) be the estimate of \({{\varvec{\alpha }}}(\cdot )\) obtained after removing the ith subject from the data. Then the optimal bandwidth h is chosen by minimizing the cross-validation score based on the squared prediction error, given by
$$\begin{aligned} \mathrm{CV}(h)=\sum _{i=1}^n\sum _{j=1}^{m_{i}}\{Y_{ij}-\mathbf {x}_{i}^{T}\widehat{{\varvec{\alpha }}}^{(-i)}(t_{ij})\}^{2}. \end{aligned}$$
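Leave-one-curve-out selection can be sketched as below (the refitting routine `fit_alpha` is a hypothetical stand-in for the full estimation procedure of Sect. 2):

```python
import numpy as np

def cv_score(h, data, fit_alpha):
    """Leave-one-curve-out CV score for a candidate bandwidth h.

    data      : list of per-subject triples (t_i, Y_i, x_i)
    fit_alpha : callable (data, h) -> estimated coefficient function
                t -> alpha_hat(t); hypothetical interface standing in
                for the full estimation procedure
    """
    score = 0.0
    for i, (ti, Yi, xi) in enumerate(data):
        # refit without subject i, then predict subject i's own curve
        alpha_minus_i = fit_alpha(data[:i] + data[i + 1:], h)
        pred = np.array([alpha_minus_i(s) @ xi for s in ti])
        score += float(np.sum((Yi - pred) ** 2))
    return score
```

In practice one evaluates `cv_score` over a grid of candidate bandwidths and keeps the minimizer.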
Remark 1
The proposed iterative estimation procedure can be readily implemented, as there is a closed-form solution at each step. Compared to the local linear method used in Zhu et al. (2012), the newly presented estimation approach may be more efficient since it takes into consideration both the within-subject correlation structure and the observations on the whole grid of each subject, both of which are important in longitudinal and functional data analysis. It is not hard to see that the estimation in (3) is, in essence, a local marginal estimation that assumes the within-subject covariance matrices are identities. This leads to a working-independence estimator of \({{\varvec{\alpha }}(t)}\) which ignores the correlation among the observations on the grid of each subject and thus causes a loss of efficiency.
Remark 2
Replacing \({{\varvec{\alpha }}(t)}\) in model (1) with its estimate, we can also estimate each random effect function \(\eta _{i}(t)\). Many existing nonparametric methods, such as local linear regression, have been developed to directly estimate all individual functions \(\eta _{i}(t)\), \(i=1,\ldots ,n\) (see, e.g., Zhu et al. 2012; Li et al. 2017). However, these nonparametric methods may not be suitable for our sparse data situations, where the number of observation points for each subject is small. Functional principal component analysis (FPCA), a key technique for characterizing the features of unknown functions in functional data analysis, is another feasible method for estimating \(\eta _{i}(t)\). We recommend the version of FPCA developed by Yao et al. (2005b), referred to as principal components analysis through conditional expectation (PACE), for longitudinal data to estimate each random effect function \(\eta _{i}(t)\). The PACE approach is appealing since it is implemented mainly based on the estimated population covariance function of \(\eta _{i}(t)\), which utilizes the whole-sample information and can easily be obtained by the method presented in Sect. 2.1. Comprehensive treatments of FPCA for longitudinal or sparse functional data can be found in Yao et al. (2005b) and Hall et al. (2006).
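As a simplified illustration of the covariance-based idea behind such conditional-expectation estimates, the following BLUP-type sketch (our own reduction, not the full PACE algorithm, which works through estimated eigenfunctions) predicts \(\eta _{i}\) from a subject's residuals and an estimated covariance function:

```python
import numpy as np

def predict_eta(t_grid, ti, resid_i, Sigma, sigma2):
    """BLUP-type prediction of the random-effect curve eta_i on t_grid.

    ti      : (m_i,) observation times of subject i
    resid_i : (m_i,) residuals Y_i - x_i^T alpha_hat(t_i)
    Sigma   : callable (s, u) -> estimated covariance Sigma_hat(s, u)
    sigma2  : estimated measurement-error variance
    Under Gaussian assumptions this is E[eta_i(t) | Y_i] with the
    estimated quantities plugged in.
    """
    S_obs = np.array([[Sigma(s, u) for u in ti] for s in ti])    # Sigma_hat(t_i, t_i)
    V = S_obs + sigma2 * np.eye(ti.size)                         # V_hat_i
    S_new = np.array([[Sigma(s, u) for u in ti] for s in t_grid])
    return S_new @ np.linalg.solve(V, resid_i)
```

With \(\sigma ^{2}=0\) the prediction interpolates the residuals at the observed times, which provides a quick correctness check.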
3 Asymptotic properties
In this section, we study the asymptotic properties of our estimators. Some regularity conditions needed for the theoretical developments are stated in “Appendix A.1”. All these conditions are very mild and most of these are also adopted by Chen et al. (2010) and Zhou et al. (2018). The detailed proofs for the following theorems are given in “Appendix A.2–A.3”.
To start with, we introduce some notation used in the theorems. Let \(\nu _{r}=\int u^{r}K(u)du\) and \(\upsilon _{r}=\int u^{r}K^{2}(u)du\) for notational simplicity. For a vector of functions \({{\varvec{\alpha }}}(t)\) of t, denote \(\dot{{{\varvec{\alpha }}}}(t)=d{{\varvec{\alpha }}}(t)/dt\) and \(\ddot{{{\varvec{\alpha }}}}(t)=d^{2}{{\varvec{\alpha }}}(t)/dt^{2}\), which are the componentwise derivatives. Let \(\mathcal {V}_{i,jk}\) be the (j, k)th element of \({{\varvec{V}}}_{i}^{-1}=({{\varvec{\Sigma }}}_{i}+\sigma ^{2} I_{m_{i}})^{-1}\). Denote \(\gamma _{1}(s)=\lim \limits _{n\rightarrow \infty }n^{-1}\sum _{i=1}^n\sum _{j=1}^{m_{i}}E[\mathcal {V}_{i,jj}|t_{ij}=s]\), \(\gamma _{2}(s_{1},s_{2})=\lim \limits _{n\rightarrow \infty }n^{-1}\sum _{i=1}^n\sum _{j\ne k}^{m_{i}}E[\mathcal {V}_{i,jk}|t_{ij}=s_{1},t_{ik}=s_{2}]\) and \(\Omega _{X}=\text {E}[\mathbf{x }\mathbf{x }^{T}]\). Let \(\mathcal {I}\) be the identity operator, and \(\mathcal {A}\) be the linear operator satisfying \(\mathcal {A}(H)(t)=\int _{z}\{\gamma _{2}(t,z)/\gamma _{1}(t)\}H(z)f(z)dz\) for any function H.
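For intuition, the kernel constants \(\nu _{r}\) and \(\upsilon _{r}\) are easy to compute; for instance, for the Epanechnikov kernel used later in the simulations, \(\nu _{0}=1\), \(\nu _{2}=1/5\) and \(\upsilon _{0}=3/5\). A numerical sketch (helper names are ours):

```python
import numpy as np

def kernel_moment(K, r, squared=False, n=400001):
    """nu_r = int u^r K(u) du, or upsilon_r = int u^r K(u)^2 du when
    squared=True, for a kernel supported on [-1, 1], via a Riemann sum."""
    u = np.linspace(-1.0, 1.0, n)
    k = K(u) ** 2 if squared else K(u)
    return float(np.sum(u ** r * k) * (u[1] - u[0]))

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
```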
Theorem 1
Under the regularity Conditions (C1)–(C8) in “Appendix A.1”, when \(n\rightarrow \infty \), we have
in probability.
Theorem 2
Suppose that conditions of Theorem 1 hold. Then, for any \(t\in (0, 1)\), we have
where \({{\varvec{\Gamma }}}(t)=\upsilon _{0}[\Omega _{X}f(t)\gamma _{1}(t)]^{-1}\) and \({{\varvec{\varphi }}}\) is a random vector following the standard normal distribution.
Theorem 2 implies the following Corollary 3.
Corollary 3
Under the Conditions (C1)–(C8) in the “Appendix A.1”, we have
for any \(t\in (0, 1)\), where \({{\varvec{\Xi }}}(t)=\{(\mathcal {I}+\mathcal {A})^{-1}({{\varvec{\Gamma }}}^{1/2})(t)\} \{(\mathcal {I}+\mathcal {A})^{-1}({{\varvec{\Gamma }}}^{1/2})(t)\}^{T}\).
Remark 3
Theorem 1 shows that the proposed estimator \(\widehat{\alpha }_{j}(\cdot )\), \(j=1,\ldots ,p\), is uniformly consistent. Theorem 2 shows that the asymptotic bias of \((\mathcal {I}+\mathcal {A})(\widehat{{\varvec{\alpha }}}-{{\varvec{\alpha }}})(t)\) is of order \(O(h^2)\) and the asymptotic variance is of order \(O((nh)^{-1})\). As a consequence, the theoretically optimal bandwidth is of order \(n^{-1/5}\). These results are similar to those of Zhou et al. (2018); indeed, the asymptotic properties presented above can be viewed as a generalization of the results in Zhou et al. (2018) from mean function estimation to varying-coefficient function estimation, and are therefore applicable to a wider range of practical problems. In particular, they provide theoretical guidance for analyzing the dynamic relationship between a set of covariates and a functional response variable, rather than only exploring the features of functional variables.
4 Simulation studies
In this section, we perform simulation studies to illustrate the finite sample properties of the proposed estimation method. Since the proposed procedure is aimed at estimating the varying-coefficient functions, we use the accuracy of the varying-coefficient function estimators to demonstrate how well the procedure works. The performances are evaluated via the bias, standard deviation (SD), and root mean squared error (RMSE) at given grid points, which are also used by Zhou et al. (2018). Specifically, the bias, SD and RMSE of an estimator \(\widehat{\alpha }(\cdot )\) are respectively defined as
and \(\mathrm{RMSE=(bias^2+SD^2)^{1/2}}\), where \(\widehat{\alpha }(\cdot )\) is an estimator of \(\alpha (\cdot )\), \(E\widehat{\alpha }(t_{j})\) is estimated by the sample mean over the N simulated data sets, and \(t_{j}\) \((j=1,\ldots ,n_{g})\) are the grid points at which the function \(\widehat{\alpha }(\cdot )\) is evaluated. In addition, comparisons are also made between our method and the local linear method employed in Zhu et al. (2012) (Local hereinafter). In the following simulated examples, the Epanechnikov kernel is used for nonparametric regression. We set the number of grid points to \(n_{g}=100\) and compute the results based on \(N=500\) replications.
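These Monte Carlo summaries can be computed pointwise across replications; a small sketch (the array layout is our assumption, with the RMSE identity taken from the text):

```python
import numpy as np

def bias_sd_rmse(est, truth):
    """Pointwise Monte Carlo summaries over N replications.

    est   : (N, n_g) array, est[r, j] = alpha_hat_r(t_j) in replication r
    truth : (n_g,) array of true values alpha(t_j)
    Returns (bias, sd, rmse), each of shape (n_g,), with
    rmse = sqrt(bias**2 + sd**2) as stated in the text.
    """
    bias = est.mean(axis=0) - truth          # E alpha_hat(t_j) - alpha(t_j)
    sd = est.std(axis=0, ddof=0)             # Monte Carlo standard deviation
    return bias, sd, np.sqrt(bias ** 2 + sd ** 2)
```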
Example 4
Generate data from the following model
where \(t_{ij}\sim U[0,1]\), \(\varepsilon _{ij}\sim N(0,\sigma ^2)\) and \(\mathbf {x}_{i}=(1,x_{i1},x_{i2})^{T}\) for \(i=1,\ldots ,n\), \(j=1,\ldots ,m_{i}\). Moreover, \((x_{i1},x_{i2})^{T}\sim N((0,0)^{T},\mathrm{diag}(0.9,0.9)+0.1(1,1)^{\otimes 2})\), and \(\eta _{i}(t_{ij})=\xi _{i1}\psi _{1}(t_{ij})+\xi _{i2}\psi _{2}(t_{ij})\), where \(\xi _{ij}\sim N(0,\lambda _{j}^2)\) for \(j=1,2\). Furthermore, \((x_{i1}, x_{i2})\), \(t_{ij}\), \(\varepsilon _{ij}\), \(\xi _{i1}\) and \(\xi _{i2}\) are independent random variables. We set \((\sigma , \lambda _{1}, \lambda _{2})= (0.3, 1, 0.5)\); the varying-coefficient functions and eigenfunctions are specified as follows:
We first conducted simulation studies under the following two regular time points settings.
-
Setting 1: \(m_{i}=5\) with various sample size: \(n=200\) and \(n=400\).
-
Setting 2: \(n=150\) with various time points: \(m_{i}=7\) and \(m_{i}=13\).
In Setting 1, two different sample sizes, \(n=200\) and \(n=400\), are considered with a fixed number of time points, \(m_{i}=5\). In Setting 2, the sample size \(n=150\) is fixed while the number of time points varies, \(m_{i}=7\) and \(m_{i}=13\), which keeps the total number of observations comparable to that in Setting 1 by decreasing the number of subjects and increasing the number of observations per individual. Tables 1 and 2 report the numerical results for the proposed method and the Local method under Setting 1 and Setting 2, respectively.
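A data-generating sketch for one subject under this design is given below; the coefficient functions and eigenfunctions are supplied by the caller, and the concrete choices in the check are hypothetical placeholders (e.g. \(\alpha _{1}(t)=\sin (2\pi t)\)), used only to exercise the mechanism:

```python
import numpy as np

def generate_subject(rng, m_i, alphas, psis, sigma=0.3, lambdas=(1.0, 0.5)):
    """One subject from the Example 4 design.

    alphas : list of three coefficient functions (alpha_1, alpha_2, alpha_3)
    psis   : list of two eigenfunctions (psi_1, psi_2)
    t_ij ~ U[0,1]; x_i = (1, x_i1, x_i2) with covariance
    diag(0.9, 0.9) + 0.1 * ones; xi_ij ~ N(0, lambda_j^2).
    """
    t = rng.uniform(0.0, 1.0, m_i)
    cov = np.diag([0.9, 0.9]) + 0.1 * np.ones((2, 2))
    x = np.concatenate(([1.0], rng.multivariate_normal([0.0, 0.0], cov)))
    xi = rng.normal(0.0, np.asarray(lambdas))         # FPC scores, sd lambda_j
    eta = xi[0] * psis[0](t) + xi[1] * psis[1](t)     # random-effect curve
    mean = sum(x[k] * alphas[k](t) for k in range(3))
    y = mean + eta + rng.normal(0.0, sigma, m_i)
    return t, x, y
```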
Example 5
To illustrate that the proposed procedure is applicable even when the observation times for each trajectory are randomly missing, we replicated Example 4 with an irregular time points design used in Fan, Huang and Li (2007). Specifically, we generate the observation times randomly, assuming that each individual has a set of “scheduled” observation points, \(\{0,1,2,\ldots ,11\}\), and each scheduled time, except time 0, has a \(20\%\) probability of being skipped; a uniform [0, 1] random variable is then added to each non-skipped scheduled time, so that the actual observation time is a random perturbation of a scheduled time. Table 3 presents the numerical results for this case with sample sizes \(n=200\) and 400.
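The scheduled-times mechanism just described can be coded directly (function name is ours):

```python
import numpy as np

def irregular_times(rng, skip_prob=0.2):
    """Irregular design: scheduled times {0, 1, ..., 11}; every scheduled
    time except 0 is skipped with probability skip_prob, and a U[0, 1]
    perturbation is added to each non-skipped scheduled time."""
    sched = np.arange(12)
    keep = np.concatenate(([True], rng.random(11) >= skip_prob))
    return sched[keep] + rng.random(int(keep.sum()))
```

Because consecutive scheduled times differ by at least 1 and each perturbation lies in [0, 1), the generated times remain strictly increasing.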
Tables 1, 2 and 3 summarize the numerical performance of the proposed method and the Local method in terms of Bias, SD and RMSE. They show that the RMSE of the proposed method is consistently smaller than that of the Local method for all varying-coefficient function estimators \(\widehat{\alpha }_{1}(t)\), \(\widehat{\alpha }_{2}(t)\) and \(\widehat{\alpha }_{3}(t)\), which indicates that the proposed method performs better than the Local method. Moreover, the proposed method has smaller bias and variance than the Local method, mainly because of the incorporation of the correlation information and the modified local linear technique. Tables 1, 2 and 3 also show that the Bias, SD and RMSE values decrease as the sample size increases, which corroborates the consistency and asymptotic normality established in Sect. 3. From Tables 1 and 2, we find that the estimators obtained in Example 4 under Setting 1 have smaller RMSE than those obtained under Setting 2 when the total numbers of observations in the two settings are close. This finding implies that sparsity is beneficial to the performance of the proposed method and that the estimators of the varying-coefficient functions can be improved by increasing the number of subjects. For all simulation cases, we chose \(h_{0}=0.12\) as the bandwidth for the initial estimates of the varying-coefficient functions and the estimates of the covariance functions; we found that the proposed method is not sensitive to these initial values. Furthermore, the optimal bandwidth for the proposed method is slightly larger than that for the Local method. This is likely because the correlation matrix incorporated into the estimation process of the proposed method acts as a weight, in addition to the kernel matrix, and further lowers the effective data usage; therefore, a larger bandwidth is needed for the proposed method to enlarge the range of included data.
We also apply the PACE method in Example 5 to estimate \(\eta _{i}(t)\), and compare it with the local linear approach. The PACE method is implemented with the FPCA function available in the package fdapace (Dai et al. 2019). We choose the first four functional principal components to construct the estimators of \(\eta _{i}(t)\) \((i=1,\ldots ,n)\) for the PACE method. The bandwidth for the local linear method is selected by generalized cross-validation (GCV). We evaluate the performance through the root average squared error (RASE), given by
where \(t^{*}_{ij}\) \((j=1,\ldots ,n^{*}_{g})\) are the grid points at which the function \(\widehat{\eta }_{i}(\cdot )\) is evaluated. Since the observation points for each subject are sparse, we set \(n^{*}_{g}=50\) to calculate the RASE. Table 4 presents the mean and standard deviation (SD) of the RASE, both computed from 200 simulation replications. We can see from Table 4 that both the mean and the SD of the RASE for the PACE method are significantly smaller than those of the local linear method. These results imply that the PACE method is superior to the local linear method for the estimation of \(\eta _{i}(t)\). Moreover, we observe that the mean RASE for the PACE method decreases as the sample size increases, while the value for the local linear method hardly changes. This is likely because the PACE estimators \(\widehat{\eta }_{i}(t)\) \((i=1,\ldots ,n)\) are constructed mainly from the estimated population covariance function of \(\eta _{i}(t)\), which applies the whole-sample information in estimation. In contrast, the local linear method directly estimates each \(\eta _{i}(t)\) using only the observations on the grid of its corresponding curve; therefore, the resulting estimators cannot be improved by increasing the sample size.
To visualize how well the proposed estimation procedure works, we take Example 5 with sample size \(n=200\) as an example. Figure 1 exhibits the estimated varying-coefficient functions averaged over 50 replications and the boxplots of the RASE of the varying-coefficient estimators over 500 simulations. Figure 1a–c shows that the estimated varying-coefficient curves (broken lines) obtained by the proposed method closely resemble the corresponding true curves (solid lines), reflecting the good performance of the proposed procedure. Figure 1d shows that the proposed method outperforms the Local method in terms of RASE.
5 Real data analysis
We apply model (2) and the newly proposed procedure to analyze a real diffusion tensor imaging (DTI) dataset with \(n=213\) subjects, collected from the NIH Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. One goal of the NIH ADNI is to test whether genetic, structural and functional neuroimaging, and clinical data can be integrated to evaluate the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). For more details on this dataset, see the publicly available ADNI database (http://adni.loni.usc.edu/). In diffusion data analysis, fractional anisotropy (FA), which quantifies the directional strength of the white matter tract structure at a given location, is one of the most widely used measures and has been applied in statistical analyses across many imaging studies. During data processing, we calculate the FA curve for each subject using a Tract-Based Spatial Statistics (TBSS) pipeline analysis (Smith et al. 2006) and the FMRIB software library tool. For more details on this procedure, one can refer to Li et al. (2017).
In this study, 213 subjects aged 48.4–73.2 years are sampled to assess the impact on the FA of several indicators commonly considered in diffusion data analysis: gender (123 males and 90 females), handedness (193 right-handed and 20 left-handed), education level (in years, ranging from 9 to 20, mean 15.91), Alzheimer’s disease (AD) status (\(19.6\%\)), mild cognitive impairment (MCI) status (\(55.1\%\)) and the Mini-Mental State Exam (MMSE). We aim to evaluate the effects of these indicators on the FA using model (2). Specifically, the response \(Y_{i}(t)\) represents the FA curve, and the covariate vector \(\mathbf{x}_{i}\) includes an intercept term, a dummy variable indicating male gender, the age of the subject, a dummy variable indicating left-handedness, the education level, an indicator for AD status, an indicator for MCI status and the MMSE score. We randomly select 15 observations from the 83 grid points of each FA curve, so that we can compare the results obtained from the sparse data with those from the complete data and further validate the new approach. We standardize all variables to have mean zero and variance one, and apply the procedure established in Sect. 2 to estimate the corresponding coefficients.
Figure 2 shows the estimated varying-coefficient functions corresponding to the intercept, gender, age, education, AD status, MCI status and MMSE, together with the associated 95% pointwise confidence intervals, for both the sparse and the complete data. The pointwise confidence intervals were obtained from the standard errors based on 200 bootstrap resamples. The dashed lines are the estimates produced by the proposed method for the sparse data, and the solid lines are those of the local linear method for the complete data. Figure 2 indicates that the proposed estimation method works well even though the data are sparse; it also reveals that age, handedness, education, AD and MCI have stronger effects on FA than gender and MMSE. The intercept function characterizes the nonlinear overall trend of FA. It can be seen from Fig. 2 that, at most of the grid points, the coefficient curve for gender is positive, that for MMSE is close to the zero line, and the coefficients for the other five covariates of interest are negative. These findings may indicate that males tend to have higher FA values than females, that MMSE has no remarkable effect on FA, that AD and MCI patients tend to have lower FA values, and that being older, left-handed, or more educated may lead to smaller FA values. These discoveries are consistent with the previous analysis in Li et al. (2017). Finally, we calculated the squared prediction error (i.e., \(SPE=\sum _{i=1}^n\sum _{j=1}^{m_{i}}\{Y_{i}(t_{ij})-\widehat{Y}_{i}^{*}(t_{ij})\}^2\), where \(\widehat{Y}_{i}^{*}\) is the trajectory for the ith subject predicted from the data with the ith subject left out) for both the proposed method and the local linear method. The corresponding SPE values are 18.34 and 18.42, respectively, which implies that the proposed method has a smaller prediction error than the local linear method on this dataset.
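The leave-one-out squared prediction error used above has a simple generic form. In this sketch, `fit_and_predict` is a placeholder of our own for whichever fitter (the proposed method or the local linear method) produces the held-out subject's predicted trajectory.

```python
import numpy as np

def loo_spe(subjects, fit_and_predict):
    """SPE = sum_i sum_j {Y_i(t_ij) - Yhat*_i(t_ij)}^2, where the ith
    predicted trajectory is obtained after removing subject i from the fit.

    `subjects` is a list of (t_i, x_i, y_i) triples; `fit_and_predict`
    trains on the remaining subjects and predicts at subject i's times.
    """
    total = 0.0
    for i, (t_i, x_i, y_i) in enumerate(subjects):
        rest = subjects[:i] + subjects[i + 1:]          # leave subject i out
        y_hat = fit_and_predict(rest, t_i, x_i)
        total += float(np.sum((np.asarray(y_i) - np.asarray(y_hat)) ** 2))
    return total
```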
Figure 3 presents the estimated eigenvalues and eigenfunctions for the covariance function \(\Sigma (s,t)\) with the sparse data. The top four nonzero eigenvalues are 0.0028, 0.00047, 0.0003 and 0.00016, respectively, and together explain about \(94\%\) of the total variability; Fig. 3 shows that the remaining eigenvalues drop rapidly to zero. The first eigenfunction, whose dominant eigenvalue accounts for \(70\%\) of the total variation, has a simple structure. The other eigenfunctions also have fairly simple structures, and most of them are roughly sinusoidal.
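The fraction-of-variance computation behind these percentages is a one-liner once the covariance surface is discretised; the helper below is our own sketch, assuming a uniform grid spacing dt.

```python
import numpy as np

def fve(cov, dt):
    """Eigenvalues of a discretised covariance surface and the cumulative
    fraction of variance explained, largest components first."""
    vals = np.linalg.eigvalsh(cov)[::-1] * dt
    vals = np.clip(vals, 0.0, None)   # discard numerically negative values
    return vals, np.cumsum(vals) / np.sum(vals)
```

Applying such a helper to the estimated \(\Sigma (s,t)\) on the 83 FA grid points would reproduce the eigenvalue decay plotted in Fig. 3.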
6 Conclusions
In this paper, we used a functional varying-coefficient mixed effects model to investigate the dynamic association between a set of covariates of interest and a functional response. An iterative procedure was developed to estimate the varying-coefficient functions in this model for longitudinal or sparse functional data. The proposed procedure not only uses all observations, rather than only the local observations near the target point, but also incorporates the estimated covariance structure into the estimation to improve the resulting estimators. The asymptotic properties of the proposed estimators, including uniform consistency and asymptotic normality, were also established. Finally, we compared the proposed procedure with the local linear method through several simulation studies; the results show that our method performs better in terms of the root mean squared error. A real data example is also provided to illustrate our approach.
Our future research will focus on several important issues. First, it would be interesting to obtain a more efficient estimator of the covariance function using the newly proposed estimation procedure. Second, applying the proposed procedure to other, more complex regression models, such as functional index models (Jiang and Wang 2011), single-index varying coefficient (SIVC) models (Luo et al. 2016) and functional additive models (Zhang, Park and Wang 2013), would also be meaningful. Third, it is worthwhile to develop statistical inference procedures, such as hypothesis tests, for longitudinal and functional data.
References
Cai TT, Hall P (2006) Prediction in functional linear regression. Ann Stat 34(5):2159–2179
Chen D, Hall P, Müller HG (2011) Single and multiple index functional regression models with nonparametric link. Ann Stat 39(3):1720–1747
Chen H, Wang Y (2011) A penalized spline approach to functional mixed effects model analysis. Biometrics 67(3):861–870
Chen K, Guo S, Sun L, Wang JL (2010) Global partial likelihood for nonparametric proportional hazards models. J Am Stat Assoc 105(490):750–760
Chiang CT, Rice JA, Wu CO (2001) Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables. J Am Stat Assoc 96(454):605–619
Dai X, Hadjipantelis PZ, Han K, Ji H (2019) fdapace: functional data analysis and empirical dynamics. R package. https://CRAN.R-project.org/package=fdapace
Fan J, Gijbels I (1996) Local polynomial modelling and its applications. Chapman & Hall, London
Fan J, Huang T, Li R (2007) Analysis of longitudinal data with semiparametric estimation of covariance function. J Am Stat Assoc 102(478):632–641
Fan J, Zhang W (2008) Statistical methods with varying coefficient models. Stat Interface 1(1):179–195
Ferraty F (2011) Recent advances in functional data analysis and related topics. Springer, New York
Friston KJ (2009) Modalities, modes, and models in functional neuroimaging. Science 326(5951):399–403
Hall P, Müller HG, Wang JL (2006) Properties of principal component methods for functional and longitudinal data analysis. Ann Stat 34(3):1493–1517
Hand D, Crowder M (1996) Practical longitudinal data analysis. Chapman & Hall, London
He G, Müller HG, Wang JL, Yang W (2010) Functional linear regression via canonical analysis. Bernoulli 16(3):705–729
Jiang CR, Wang JL (2011) Functional single index models for longitudinal data. Ann Stat 39(1):362–388
Li J, Huang C, Zhu H (2017) A functional varying-coefficient single-index model for functional response data. J Am Stat Assoc 112(519):1169–1181
Li Y, Hsing T (2010) Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data. Ann Stat 38(6):3321–3351
Luo X, Zhu L, Zhu H (2016) Single-index varying coefficient model for functional responses. Biometrics 72(4):1275–1284
Morris JS (2015) Functional regression. Ann Rev Stat Appl 2:321–359
Pollard D (1984) Convergence of stochastic processes. Springer, New York
Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer, New York
Reiss PT, Goldsmith J, Shang HL, Ogden RT (2017) Methods for scalar-on-function regression. Int Stat Rev 85(2):228–249
Rice JA, Silverman BW (1991) Estimating the mean and covariance structure nonparametrically when the data are curves. J R Stat Soc B 53(1):233–243
Sang P, Lockhart RA, Cao J (2018) Sparse estimation for functional semiparametric additive models. J Multivar Anal 168:105–118
Smith SM, Jenkinson M, Johansen-Berg H, Rueckert D, Nichols TE, Mackay CE, Watkins KE, Ciccarelli O, Cader MZ, Matthews PM, Behrens TEJ (2006) Tract-based spatial statistics: voxelwise analysis of multi-subject diffusion data. NeuroImage 31(4):1487–1505
Wang H, Zhong PS, Cui Y, Li Y (2018) Unified empirical likelihood ratio tests for functional concurrent linear models and the phase transition from sparse to dense functional data. J R Stat Soc B 80(2):343–364
Wang JL, Chiou JM, Müller HG (2016) Functional data analysis. Ann Rev Stat Appl 3:257–295
Wu CO, Chiang CT (2000) Kernel smoothing on varying coefficient models with longitudinal dependent variable. Stat Sin 10(2):433–456
Wu S, Müller HG, Zhang Z (2013) Functional data analysis for point processes with rare events. Stat Sin 23(1):1–23
Yao F, Müller HG, Wang JL (2005a) Functional linear regression analysis for longitudinal data. Ann Stat 33(6):2873–2903
Yao F, Müller HG, Wang JL (2005b) Functional data analysis for sparse longitudinal data. J Am Stat Assoc 100(470):577–590
Zhang JT, Chen J (2007) Statistical inferences for functional data. Ann Stat 35(3):1052–1079
Zhang X, Park BU, Wang JL (2013) Time-varying additive models for longitudinal data. J Am Stat Assoc 108(503):983–998
Zhang X, Wang JL (2015) Varying-coefficient additive models for functional data. Biometrika 102(1):15–32
Zhou L, Lin H, Liang H (2018) Efficient estimation of the nonparametric mean and covariance functions for longitudinal and sparse functional data. J Am Stat Assoc 113(524):1550–1564
Zhu H, Fan J, Kong L (2014) Spatially varying coefficient model for neuroimaging data with jump discontinuities. J Am Stat Assoc 109(507):1084–1098
Zhu H, Li R, Kong L (2012) Multivariate varying coefficient model for functional responses. Ann Stat 40(5):2634–2666
Zhu H, Zhang H, Ibrahim JG, Peterson BS (2007) Statistical analysis of diffusion tensors in diffusion-weighted magnetic resonance imaging data. J Am Stat Assoc 102(480):1085–1102
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grants 11971001, 11771145 and 11801210) and the Beijing Natural Science Foundation (1182002). Data used in this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu).
Appendix
1.1 A.1 Regularity Conditions and Notations
Our proofs follow a strategy similar to that of Zhou et al. (2018). The following regularity conditions, which are used to establish the asymptotic properties of the proposed estimators, are imposed to facilitate the proofs and are similar to those in Zhou et al. (2018). They are not the weakest possible conditions but are adopted for mathematical simplicity, and may be modified if necessary.
(C1) The kernel \(K(\cdot )\) is a bounded, symmetric probability density function with compact support \([-1,1]\) satisfying \(\nu _{r}=\int u^{r}K(u)du<\infty \) and \(\upsilon _{r}=\int u^{r}K^{2}(u)du<\infty \).
(C2) \(\max _{1\le i\le n}m_{i}\le M\), where M is a constant, and the dimension of the covariate vector is bounded.
(C3) The \(t_{ij}, i=1,\ldots ,n, j=1,\ldots , m_{i}\), are independent and identically distributed copies of a random variable T defined on [0, 1], whose density function \(f(\cdot )\) is positive and has continuous second derivatives.
(C4) The covariate vectors \(\mathbf{x }_{i}\) are independent and identically distributed with \(E(\mathbf{x }_{i})=\mu _{\mathbf{x }}\) and \(||\mathbf{x }_{i}||_{\infty }<\infty \). Assume that \(\text {E}[\mathbf{x }_{i}\mathbf{x }_{i}^{T}]=\Omega _{X}\) is positive definite.
(C5) \(\{\alpha _{l}(t),l=1,\ldots ,p\}\) have continuous third derivatives on their compact support [0, 1].
(C6) There exist estimators \(\tilde{{\varvec{\alpha }}}(t)\), \(\widehat{\Sigma }(s,t)\) and \(\hat{\sigma }\) such that \(\sup \limits _{t\in [0,1]}|\tilde{\alpha }_{i}(t)-\alpha _{i}(t)|=o_{p}(1)\) for \(i=1,\ldots ,p\), \(\sup \limits _{s,t\in [0,1]}|\widehat{\Sigma }(s,t)-\Sigma (s,t)|=o_{p}(1)\) and \(|\hat{\sigma }^{2}-\sigma ^{2}|=o_{p}(1)\).
(C7) \(nh^{5}\rightarrow c\) and \(nh/(\log n)^2\rightarrow \infty \) as \(n\rightarrow \infty \), where c is a constant.
(C8) If \({{\varvec{h}}}(\cdot )\in \mathcal {F}_{p}\) satisfies \(E[\{\gamma _{1}(t)+\int _{0}^{1}\gamma _{2}(t,z)f(z)dz\}\Omega _{X}{{\varvec{h}}}(t)]=0\), then \({{\varvec{h}}}(t)\equiv 0\), where \(\mathcal {F}_{p}\), \(\gamma _{1}(t)\) and \(\gamma _{2}(t,z)\) are defined below.
Conditions (C1)-(C5) are mild assumptions on the kernel function, the covariates and the functions of interest, and are commonly used in the longitudinal data literature. Condition (C6) guarantees the convergence of the initial estimators, which is easily achieved by existing kernel and spline estimation methods. Condition (C7) is a standard condition on the bandwidth, also used in Chen et al. (2010) and Zhou et al. (2018). Condition (C8) ensures identifiability.
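The moment constants \(\nu _{r}\) and \(\upsilon _{r}\) in Condition (C1) are easy to check numerically for a concrete kernel. The helper below is our own midpoint-rule sketch; for the Epanechnikov kernel the exact values are \(\nu _{0}=1\), \(\nu _{2}=1/5\) and \(\upsilon _{0}=3/5\).

```python
import numpy as np

def kernel_moment(K, r, squared=False, m=200000):
    """Midpoint-rule approximation of nu_r = int u^r K(u) du over the
    compact support [-1, 1] (or upsilon_r = int u^r K(u)^2 du)."""
    du = 2.0 / m
    u = -1.0 + (np.arange(m) + 0.5) * du   # midpoints of the subintervals
    g = K(u) ** 2 if squared else K(u)
    return float(np.sum(u ** r * g) * du)

epanechnikov = lambda u: 0.75 * np.clip(1.0 - u ** 2, 0.0, None)
```

By symmetry of the kernel, all odd moments vanish, which is what makes the local linear bias of order \(h^{2}\).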
For notational simplicity, denote \(\mathcal {F}_{p}=\{{{\varvec{\zeta }}}(\cdot ): {{\varvec{\zeta }}}(\cdot )\) is continuous on \([0,1]^{p}\}\) and \(\mathcal {F}_{1}=\{\zeta (\cdot ): \zeta (\cdot )\) is continuous on \([0,1]\}\). Let \(\mathcal {V}_{i,jk}\) be the (j, k)th element of \({{\varvec{V}}}_{i}^{-1}=({{\varvec{\Sigma }}}_{i}+\sigma ^{2} I_{m_{i}})^{-1}\) and \(\widehat{\mathcal {V}}_{i,jk}\) be its estimate. Denote
and
Define \(\mathcal {B}_{n}^{p}=\{{{\varvec{\zeta }}}:\Vert {{\varvec{\zeta }}}\Vert _{\infty }\le C, \Vert {{\varvec{\zeta }}}(x)-{{\varvec{\zeta }}}(y)\Vert \le c[|x-y|+b_{n}],x,y\in [0,1], {{\varvec{\zeta }}}\in \mathcal {F}_{p}\}\) for some constants \(C>0\) and \(c>0\), where \(b_{n}=h+(nh)^{-1/2}(\log n)^{1/2}\). Before proving the main results, we first present a lemma that will be needed in the proofs of the theorems.
Lemma 1
Suppose that Conditions (C1)-(C7) hold and that \(\zeta _{l}(\cdot )\in \mathcal {F}_{1}\) is a bounded function having continuous third derivatives on its compact support, \(l=1,\ldots ,p\). Then, for \(\jmath =1,\ldots ,p\),
where \(a_{n\jmath }(t)=\dfrac{1}{n}\sum \limits _{i=1}^n\sum \limits _{j=1}^{m_{i}}\sum \limits _{k=1}^{m_{i}}\sum \limits _{l=1}^{p}K_{h}(t_{ij}-t)\mathcal {V}_{i,jk} \{\alpha _{l}(t_{ik})-\zeta _{l}(t_{ik})\}x_{il}x_{i\jmath }\), and
where \(c_{n\jmath }(t)=\dfrac{1}{n}\sum \limits _{i=1}^n\sum \limits _{j=1}^{m_{i}} \sum \limits _{k=1}^{m_{i}}\sum \limits _{l=1}^{p}hK_{h}(t_{ij}-t) \mathcal {V}_{i,jk}K_{h}(t_{ik}-t)\{\zeta _{l}(t)+\dot{\zeta }_{l}(t)(t_{ik}-t) -\zeta _{l}(t_{ik})\}x_{il}x_{i\jmath }\).
The proof of this lemma is similar to that of Theorem 1 in Zhou et al. (2018) and follows from Theorem 37 and Example 38 in Chapter 2 of Pollard (1984).
1.2 A.2 Proof of Theorem 1
Let \({{\varvec{\zeta }}}(t)=(\zeta _{1}(t),\ldots ,\zeta _{p}(t))^{T}\), \(\dot{{{\varvec{\zeta }}}}(t)=(\dot{\zeta }_{1}(t),\ldots ,\dot{\zeta }_{p}(t))^{T}\) and \({{\varvec{\vartheta }}}(t)=({{\varvec{\zeta }}}^{T}(t),\dot{{{\varvec{\zeta }}}}^{T}(t))^{T}\). Denote the score function
For the sake of convenience, we define for \(\jmath =1,\ldots ,p\),
and
where \(K_{ij}(t)=K_{h}(t_{ij}-t)\), \(\tau _{\jmath l}=E(x_{i\jmath }x_{il})\), and \(\gamma _{1}(s)\) and \(\gamma _{2}(s_{1},s_{2})\) are defined in “Appendix A.1”. Let \(S_{\jmath }^{(1)}({{\varvec{\vartheta }}};t)\) and \(S_{\jmath }^{(2)}({{\varvec{\vartheta }}};t)\) be the first and second terms on the right-hand side of (A.2), respectively. Under Conditions (C1)-(C4), it is easy to get that
Similarly, we can check that \(S_{\jmath }^{(2)}({{\varvec{\vartheta }}};t)=O_{p}\{(nh)^{-1/2}\}\) and \(S^{*}_{\jmath }({{\varvec{\vartheta }}};t)=O_{p}\{(nh)^{-1/2}\}\).
Let \({{\varvec{D}}}_{1}({{\varvec{\vartheta }}};t)=(S_{1}({{\varvec{\vartheta }}};t),\ldots ,S_{p} ({{\varvec{\vartheta }}};t))^{T}\), \({{\varvec{D}}}_{2}({{\varvec{\vartheta }}};t)=(S_{1}^{*}({{\varvec{\vartheta }}};t), \ldots ,S_{p}^{*}({{\varvec{\vartheta }}};t))^{T}\) and \({{\varvec{D}}}({{\varvec{\vartheta }}};t)=({{\varvec{D}}}_{1}({{\varvec{\vartheta }}};t)^{T}, {{\varvec{D}}}_{2}({{\varvec{\vartheta }}};t)^{T})^{T}\). Note that \({{\varvec{S}}}_{m}({{\varvec{\vartheta }}};t)={{\varvec{D}}}({{\varvec{\vartheta }}};t)+o_{p}(1)\), then by Conditions (C1)-(C5), we have
where \(\otimes \) denotes the Kronecker product. The proposed estimation procedure ensures that \({{\varvec{S}}}_{m}(\widehat{{{\varvec{\alpha }}}},\widehat{\dot{{{\varvec{\alpha }}}}};t)=0\) uniformly over \(t\in [0,1]\), and it is easy to see that \(s_{\jmath }({{\varvec{\alpha }}};t)\equiv 0\) for \(\jmath =1,\ldots ,p\). It then follows from Condition (C8) that \({{\varvec{\alpha }}}(\cdot )\) is the unique root of the equation \({{\varvec{s}}}_{m}({{\varvec{\zeta }}};t)\equiv 0\) over \({{\varvec{\zeta }}}\in \mathcal {F}_{p}\). Therefore, to show the uniform consistency of \(\widehat{{\varvec{\alpha }}}(t)\), it suffices to prove the following three conclusions.
(i) For any continuous function vector \({{\varvec{\zeta }}}\) and bounded function vector \(\dot{{\varvec{\zeta }}}\), \(\sup \limits _{t\in [0,1]}\Vert {{\varvec{S}}}_{m}({{\varvec{\vartheta }}};t)-{{\varvec{s}}}_{m}({{\varvec{\zeta }}};t)\otimes (1,0)^{T}\Vert =o_{p}(1)\).
(ii) \(\sup \limits _{t\in [0,1]}\Vert {{\varvec{S}}}_{m}({{\varvec{\vartheta }}};t)-{{\varvec{s}}}_{m}({{\varvec{\zeta }}};t)\otimes (1,0)^{T}\Vert =o_{p}(1)\) holds uniformly over \({{\varvec{\zeta }}}\in \mathcal {B}_{n}^{p}\) and bounded function vectors \(\dot{{\varvec{\zeta }}}\).
(iii) \(P(\widehat{{\varvec{\alpha }}}(t)\in \mathcal {B}_{n}^{p})\rightarrow 1\).
Once (i)-(iii) are established, applying the Arzelà-Ascoli theorem together with (iii), it is easy to show that any subsequence of \(\widehat{{\varvec{\alpha }}}(t)\) contains a further subsequence \(\widehat{{\varvec{\alpha }}}_{n_k}(t)\) that converges in probability, uniformly over \(t\in [0,1]\), to some \({{\varvec{\alpha }}}^{*}(t)\in \mathcal {F}_{p}\). Observing that
Note that \({{\varvec{S}}}_{m}(\widehat{{{\varvec{\alpha }}}}_{n_{k}},\widehat{\dot{{\varvec{\alpha }}}}_{n_{k}};t)\equiv 0\); the continuity of \({{\varvec{s}}}_{m}({{\varvec{\zeta }}};t)\) and (ii) then imply that \({{\varvec{s}}}_{m}({{\varvec{\alpha }}}^{*};t)\) converges to 0 uniformly over \(t\in [0,1]\). As a result, \({{\varvec{s}}}_{m}({{\varvec{\alpha }}}^{*};t)\equiv 0\). Since \({{\varvec{s}}}_{m}({{\varvec{\zeta }}};t)\equiv 0\) has the unique root \({{\varvec{\alpha }}}\) by Condition (C8), we have \({{\varvec{\alpha }}}^{*}(t)={{\varvec{\alpha }}}(t)\) for all \(t\in [0,1]\), which establishes the uniform consistency of \(\widehat{{{\varvec{\alpha }}}}\).
Proof of (i). Note that
For convenience, we only show that \(\sup \limits _{t\in [0,1]}\Vert S_{\jmath }({{\varvec{\vartheta }}};t)-s_{\jmath }({{\varvec{\zeta }}};t)\Vert =o_{p}(1)\); similar arguments lead to the conclusion for \(\sup \limits _{t\in [0,1]}\Vert {{\varvec{D}}}({{\varvec{\vartheta }}};t)-{{\varvec{s}}}_{m}({{\varvec{\zeta }}};t)\otimes (1,0)^{T}\Vert \). It follows from (A.2) and (A.3) that
where
and
By Conditions (C1)-(C7) and Lemma 1, it is easily seen that \(I\le O_{p}\{(nh)^{-1/2}(\log n)^{1/2}\}\) and \(II\le O_{p}\{h^2+(nh)^{-1/2}h(\log n)^{1/2}\}\), which implies \(\sup \limits _{t\in [0,1]}\Vert S_{\jmath }({{\varvec{\vartheta }}};t)-s_{\jmath }({{\varvec{\zeta }}};t)\Vert =o_{p}(1)\), and hence statement (i) follows.
Proof of (ii). By (A.1), we obtain that, for any continuous function vectors \({{\varvec{\phi }}}=(\phi _{1},\ldots ,\phi _{p})^{T}\) and any bounded function vector \(\dot{{\varvec{\zeta }}}=(\dot{\zeta }_{1},\ldots ,\dot{\zeta }_{p})^{T}\),
According to (A.2), (A.3) and Lemma 1, it can be shown that, uniformly over \(t\in [0,1]\),
Then, for any \({{\varvec{\zeta }}}\in \mathcal {B}_{n}^p\) and bounded function vector \(\dot{{\varvec{\zeta }}}\), we have
and
Using arguments similar to those in Zhou et al. (2018), (ii) follows immediately from (A.4) and (A.5).
Proof of (iii). Note that
where \(\mathbf{A}\), \(\mathbf{B}\) and \(\mathbf{C}\) are \(p\times p\) symmetric matrices whose (r, s)th elements are, respectively, given by
and \(\mathbf{U}_{0}\), \(\mathbf{U}_{1}\) are two p-dimensional vectors whose rth elements are, respectively,
\(r,s=1,\ldots ,p\). Denote
then by Condition (C6), it can be shown that
Similarly, we can get that
Therefore, we have
and \({{\varvec{H}}}_{n}({\varvec{\zeta }};t)=(H_{1}({\varvec{\zeta }};t),\ldots ,H_{p}({\varvec{\zeta }};t),{{\varvec{0}}}_{p})^{T}+o_{p}(1)\), where \({{\varvec{0}}}_{p}\) is a p-dimensional row vector with elements 0, and for \(r=1,\ldots ,p\),
Combining (A.7), (A.8) and the first equation of (A.6), we have
where
and
for \(r=1,\ldots ,p\). Following some straightforward derivations, we can obtain that
where \(r=1,\ldots ,p\). Let \({{\varvec{\tau }}}_{r}=(\tau _{r1},\ldots ,\tau _{rp})^{T}\) and \(\kappa (\widehat{{\varvec{\alpha }}};t)=(\kappa (\alpha _{1}, \widehat{\alpha }_{1};t),\ldots ,\kappa (\alpha _{p}, \widehat{\alpha }_{p};t))^{ T}\). It can be seen that \({{\varvec{\tau }}}_{r}^{T}\{\widehat{{\varvec{\alpha }}}(t)-{{\varvec{\alpha }}}(t)\}={{\varvec{\tau }}}_{r}^{T}\kappa (\widehat{{\varvec{\alpha }}};t)/\gamma _{1}(t)+o_{p}(1)\). By Conditions (C3)-(C5), (iii) follows immediately. \(\square \)
1.3 A.3 Proof of Theorem 2
For convenience of notation, denote
Note that \(S_{\jmath }(\widehat{{\varvec{\alpha }}},\widehat{\dot{{\varvec{\alpha }}}};t)\) can be rewritten as
Then, it follows that
It is easy to check that \(E\left\{ \partial S_{\jmath }(\widehat{{\varvec{\alpha }}},\widehat{\dot{{\varvec{\alpha }}}};t)/\partial {\widehat{{\varvec{\alpha }}}}^{T}(t)\right\} =O(1)\), \(E\left\{ \partial S_{\jmath }(\widehat{{\varvec{\alpha }}},\widehat{\dot{{\varvec{\alpha }}}};t)/\partial {\widehat{{\varvec{\alpha }}}}^{T}(t_{ik})\right\} =O(1)\) and \(E\left\{ \partial S_{\jmath }(\widehat{{\varvec{\alpha }}},\widehat{\dot{{\varvec{\alpha }}}};t)/\partial \{ {h\widehat{\dot{{\varvec{\alpha }}}}}^{T}(t)\}\right\} =O(h)\). Using a Taylor expansion and the consistency of \((\widehat{{\varvec{\alpha }}},h\widehat{\dot{{\varvec{\alpha }}}})\), we have, for \(\jmath =1,\ldots ,p\),
where
Using a technique similar to that in the proof of Theorem 1, it can be shown that
and
where \({{\varvec{\tau }}}_{\jmath }=(\tau _{1\jmath },\ldots ,\tau _{p\jmath })^{T}\) and \(\kappa (\widehat{{\varvec{\alpha }}};t)=(\kappa (\alpha _{1}, \widehat{\alpha }_{1};t),\ldots ,\kappa (\alpha _{p}, \widehat{\alpha }_{p};t))^{ T}\). Therefore, we can obtain that, for \(\jmath =1,\ldots ,p\),
Similarly,
where \(\kappa _{1}(\widehat{{\varvec{\alpha }}};t)=(\kappa _{1}(\alpha _{1}, \widehat{\alpha }_{1};t),\ldots ,\kappa _{1}(\alpha _{p}, \widehat{\alpha }_{p};t))^{ T}\) and \(\kappa _{1}(\alpha _{l}, {\widehat{\alpha }}_{l};t)=\int _{0}^{1}\partial \gamma _{2}(t,z)/\partial t\{\alpha _{l}(z)-\widehat{\alpha }_{l}(z)\}f(z)dz\) for \(l=1,\ldots ,p\). Thus, we can conclude that
where \(\mathbf{1}_{p}\) is a p-dimensional column vector with elements 1. Furthermore, by Conditions (C5) and (C6), we have
where \({{\varvec{\xi }}}_{ij}=(\mathbf{x}_{i}^{T}, (t_{ij}-t)\mathbf{x}^{T}_{i})^{T}\). Note that \({{\varvec{S}}}_{m}(\widehat{{\varvec{\alpha }}},\widehat{\dot{{\varvec{\alpha }}}};t)=0\); substituting (A.10) into (A.9) then gives
Denote
Following some straightforward calculations, we can easily obtain that \(E\{\Delta _{1}(t)\}=0\), \(\mathrm{Var}\{\Delta _{1}(t)\}=\frac{\upsilon _{0}\Omega _{X}^{-1}}{f(t)\gamma _{1}(t)nh}(1+o(1))\) and \(E\{\Delta _{2}(t)\}=\frac{1}{2}\upsilon _{2}h^{2}\ddot{{\varvec{\alpha }}}(t)+o(h^2)\). Hence, it follows that
and
where \({{\varvec{\xi }}}_{n}(t)\) converges to the multivariate normal distribution with mean 0 and covariance matrix \({{\varvec{\Gamma }}}(t)=\upsilon _{0}[\Omega _{X}f(t)\gamma _{1}(t)]^{-1}\). Using Lemma 1 and taking the supremum norm on both sides of (A.12) and (A.13), we have \(e_{n,1}=O_{p}\{h^2+(nh)^{-1/2}(\log n)^{1/2}\}\) and \(e_{n,2}=O_{p}\{(nh)^{-1/2}(\log n)^{1/2}\}+o_{p}(h)\). Therefore, it is seen from (A.12) that
where \({{\varvec{\varphi }}}\) is a random vector following the standard normal distribution. This completes the proof of Theorem 2. \(\square \)
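Collecting the bias term \(E\{\Delta _{2}(t)\}\) and the limiting distribution of \({{\varvec{\xi }}}_{n}(t)\) derived above, the conclusion of Theorem 2 can be summarised as follows (a reconstruction from the expressions in this proof; the published display may carry additional remainder terms):

```latex
\sqrt{nh}\Big\{\widehat{\boldsymbol{\alpha}}(t)-\boldsymbol{\alpha}(t)
  -\tfrac{1}{2}\upsilon_{2}h^{2}\ddot{\boldsymbol{\alpha}}(t)\Big\}
  \xrightarrow{\;d\;} N\big(\mathbf{0},\,\boldsymbol{\Gamma}(t)\big),
\qquad
\boldsymbol{\Gamma}(t)=\upsilon_{0}\,\big[\Omega_{X}f(t)\gamma_{1}(t)\big]^{-1} .
```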
Cite this article
Cai, X., Xue, L., Pu, X. et al. Efficient Estimation for Varying-Coefficient Mixed Effects Models with Functional Response Data. Metrika 84, 467–495 (2021). https://doi.org/10.1007/s00184-020-00776-0