Abstract
Statistical inference for highly multivariate point pattern data is challenging due to complex models with large numbers of parameters. In this paper, we develop numerically stable and efficient parameter estimation and model selection algorithms for a class of multivariate log Gaussian Cox processes. The methodology is applied to a highly multivariate point pattern data set from tropical rain forest ecology.
References
Baddeley, A., Jammalamadaka, A., Nair, G.: Multitype point process analysis of spines on the dendrite network of a neuron. J. R. Stat. Soc. Ser. C (Appl. Stat.) 63(5), 673–694 (2014)
Chilès, J.-P., Delfiner, P.: Geostatistics: Modeling Spatial Uncertainty. Probability and Statistics. Wiley, New York (1999)
Choi, J., Oehlert, G., Zou, H.: A penalized maximum likelihood approach to sparse factor analysis. Stat. Interface 3(4), 429–436 (2010)
Choiruddin, A., Coeurjolly, J.-F., Letué, F.: Convex and non-convex regularization methods for spatial point processes intensity estimation. Electron. J. Stat. 12(1), 1210–1255 (2018)
Coeurjolly, J.-F., Møller, J., Waagepetersen, R.: A tutorial on Palm distributions for spatial point processes. Int. Stat. Rev. 85(3), 404–420 (2017)
Condit, R.: Tropical Forest Census Plots. Springer-Verlag and R. G. Landes Company, Berlin, Germany and Georgetown, Texas (1998)
Condit, R., Hubbell, S.P., Foster, R.B.: Changes in tree species abundance in a neotropical forest: impact of climate change. J. Trop. Ecol. 12(2), 231–256 (1996)
Diggle, P., Zheng, P., Durr, P.: Nonparametric estimation of spatial segregation in a multivariate point process: bovine tuberculosis in Cornwall, UK. J. R. Stat. Soc. Ser. C (Appl. Stat.) 54(3), 645–658 (2005)
Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010)
Guan, Y.: A least-squares cross-validation bandwidth selection approach in pair correlation function estimations. Stat. Probab. Lett. 77(18), 1722–1729 (2007)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer Series in Statistics. Springer New York Inc., New York (2013)
Hastie, T., Tibshirani, R., Wainwright, M.: Statistical Learning with Sparsity: The Lasso and Generalizations. Monographs on Statistics and Applied Probability. Chapman & Hall/CRC Press, Boca Raton (2015)
Hoerl, A.E., Kennard, R.W.: Ridge regression. Encyclopedia Stat. Sci. 8 (1988)
Hubbell, S.P., Foster, R.B.: Diversity of canopy trees in a neotropical forest and implications for conservation. In: Sutton, S.L., Whitmore, T.C., Chadwick, A.C. (eds.) Tropical Rain Forest: Ecology and Management, pp. 25–41. Blackwell Scientific Publications, Oxford (1983)
Jalilian, A., Waagepetersen, R.: Fast bandwidth selection for estimation of the pair correlation function. J. Stat. Comput. Simul. 88(10), 2001–2011 (2018)
Jalilian, A., Guan, Y., Mateu, J., Waagepetersen, R.: Multivariate product-shot-noise Cox models. Biometrics 71(4), 1022–1033 (2015)
Lan, G., Getzin, S., Wiegand, T., Hu, Y., Xie, G., Zhu, H., Cao, M.: Spatial distribution and interspecific associations of tree species in a tropical seasonal rain forest of China. PLoS ONE 7(9), e46074 (2012)
Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24(3), 1420–1443 (2014)
Møller, J., Waagepetersen, R.: Statistical Inference and Simulation for Spatial Point Processes. Chapman and Hall/CRC, Boca Raton (2003)
Møller, J., Waagepetersen, R.: Modern statistics for spatial point processes. Scand. J. Stat. 34(4), 643–684 (2007)
Møller, J., Syversveen, A.R., Waagepetersen, R.: Log Gaussian Cox processes. Scand. J. Stat. 25(3), 451–482 (1998)
Rajala, T., Murrell, D.J., Olhede, S.C.: Detecting multivariate interactions in spatial point patterns with Gibbs models and variable selection. J. R. Stat. Soc. Ser. C (Appl. Stat.) 67(5), 1237–1273 (2018)
Simon, N., Friedman, J., Hastie, T., Tibshirani, R.: A sparse-group lasso. J. Comput. Graph. Stat. 22(2), 231–245 (2013)
Thurman, A.L., Fu, R., Guan, Y., Zhu, J.: Regularized estimating equations for model selection of clustered spatial point processes. Stat. Sin. 25(1), 173–188 (2015)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 58(1), 267–288 (1996)
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 67(1), 91–108 (2005)
Waagepetersen, R.: An estimating function approach to inference for inhomogeneous Neyman–Scott processes. Biometrics 63(1), 252–258 (2007)
Waagepetersen, R., Guan, Y., Jalilian, A., Mateu, J.: Analysis of multi-species point patterns using multivariate log Gaussian Cox processes. J. R. Stat. Soc. Ser. C (Appl. Stat.) 65(1), 77–96 (2016)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 67(2), 301–320 (2005)
Acknowledgements
We thank the editor and the two reviewers for their constructive and helpful comments. The research by A. Choiruddin, F. Cuevas-Pacheco, and R. Waagepetersen is supported by The Danish Council for Independent Research | Natural Sciences, Grant DFF 7014-00074 “Statistics for point processes in space and beyond,” and by the “Centre for Stochastic Geometry and Advanced Bioimaging,” funded by Grant 8721 from the Villum Foundation. The BCI forest dynamics research project was made possible by National Science Foundation Grants to Stephen P. Hubbell: DEB-0640386, DEB-0425651, DEB-0346488, DEB-0129874, DEB-00753102, DEB-9909347, DEB-9615226, DEB-9405933, DEB-9221033, DEB-9100058, DEB-8906869, DEB-8605042, DEB-8206992, DEB-7922197, support from the Center for Tropical Forest Science, the Smithsonian Tropical Research Institute, the John D. and Catherine T. MacArthur Foundation, the Mellon Foundation, the Celera Foundation, and numerous private individuals, and through the hard work of over 100 people from 10 countries over the past two decades. The plot project is part of the Center for Tropical Forest Science, a global network of large-scale demographic tree plots. The BCI soils data set was collected and analyzed by J. Dalling, R. John, K. Harms, R. Stallard and J. Yavitt with support from NSF DEB021104, 021115, 0212284, 0212818 and OISE 0314581, STRI and CTFS. Paolo Segre and Juan Di Trani provided assistance in the field. The covariates dem, grad, mrvbf, solar and twi were computed in SAGA GIS by Tomislav Hengl (http://spatial-analyst.net/).
We thank Dr. Joseph Wright for sharing data on dispersal modes and life forms for the BCI tree species.
Electronic supplementary material
Appendices
Proximal Newton method
Suppose we want to find the solution of
\[ \min _{\varvec{\theta }} f(\varvec{\theta }), \]
where the function \(f(\cdot )\) can be separated into two parts: the function \(a(\cdot )\), which is a convex and twice continuously differentiable loss function, and the function \(c(\cdot )\), which is a convex but not necessarily differentiable penalty function. The proximal Newton method is an iterative optimization algorithm that uses a quadratic approximation of the differentiable part \(a(\cdot )\):
\[ {{\hat{f}}}(\varvec{\theta }) = a(\varvec{\theta }^{(k)}) + \nabla a(\varvec{\theta }^{(k)})^{\text{ T }} (\varvec{\theta } - \varvec{\theta }^{(k)}) + \frac{1}{2} (\varvec{\theta } - \varvec{\theta }^{(k)})^{\text{ T }} H(\varvec{\theta }^{(k)}) (\varvec{\theta } - \varvec{\theta }^{(k)}) + c(\varvec{\theta }), \]
where \(\varvec{\theta }^{(k)}\) is the current value of \(\varvec{\theta }\), \(\nabla a(\cdot )\) is the first derivative of \(a(\cdot )\) and \(H(\cdot )\) is an approximation to the Hessian matrix \(\nabla ^2 a(\cdot )\). Letting \({\tilde{\varvec{\theta }}} = \mathop {\hbox {arg min}}\limits _{\varvec{\theta }} {{\hat{f}}}(\varvec{\theta })\), the next value of \(\varvec{\theta }\) is obtained as
\[ \varvec{\theta }^{(k+1)} = \varvec{\theta }^{(k)} + t \, ({\tilde{\varvec{\theta }}} - \varvec{\theta }^{(k)}) \]
for some \(t>0\). That is, \({\tilde{\varvec{\theta }}}\) is used to construct a search direction for the \((k+1)\)th value of \(\varvec{\theta }\). Theoretical results in Lee et al. (2014) show that t can be chosen so that \(f(\varvec{\theta }^{(k+1)})<f(\varvec{\theta }^{(k)})\). The matrix \(H(\cdot )\) can be chosen in various ways; see Lee et al. (2014) and Hastie et al. (2015) for further details.
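As a generic illustration (not the paper's implementation), the scheme above can be sketched for a lasso-type problem in which \(a\) is a least squares loss and \(c\) the \(\ell _1\) norm; the inner subproblem is solved by coordinate descent, and all function names here are our own:

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def inner_cd(H, b, lam, theta_init, n_sweeps=100):
    """Coordinate descent for min_theta 0.5 theta'H theta + b'theta + lam ||theta||_1,
    i.e. the quadratic model of a(.) plus the penalty."""
    theta = theta_init.copy()
    for _ in range(n_sweeps):
        for j in range(len(theta)):
            # linear coefficient of theta_j with the other coordinates held fixed
            z = b[j] + H[j] @ theta - H[j, j] * theta[j]
            theta[j] = soft_threshold(-z, lam) / H[j, j]
    return theta

def proximal_newton(a_val, a_grad, a_hess, lam, theta0, n_iter=10):
    """Proximal Newton with a simple backtracking line search."""
    f = lambda th: a_val(th) + lam * np.sum(np.abs(th))
    theta = theta0.copy()
    for _ in range(n_iter):
        g, H = a_grad(theta), a_hess(theta)
        # quadratic model in theta has linear term g - H theta^(k)
        theta_tilde = inner_cd(H, g - H @ theta, lam, theta)
        direction = theta_tilde - theta
        t = 1.0
        while f(theta + t * direction) > f(theta) and t > 1e-10:
            t *= 0.5  # backtrack until the objective does not increase
        theta = theta + t * direction
    return theta

# Usage: a(theta) = 0.5 ||y - X theta||^2, so the quadratic model is exact
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = X @ np.array([2.0, 0.0, 0.0, -1.0, 0.0]) + 0.1 * rng.normal(size=50)
a_val = lambda th: 0.5 * np.sum((y - X @ th) ** 2)
a_grad = lambda th: X.T @ (X @ th - y)
a_hess = lambda th: X.T @ X
theta_hat = proximal_newton(a_val, a_grad, a_hess, lam=5.0, theta0=np.zeros(5))
```

The backtracking loop mirrors the choice of \(t\) discussed above: the step is halved until the composite objective no longer increases.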
In the following sections, we adapt the proximal Newton method to minimization of our objective function.
1.1 Quadratic approximation for updating \(\varvec{\alpha }_{i\cdot }\)
Let us first regard (8) as a function of \(\varvec{\alpha }_{i \cdot }\),
To minimize (8), we consider the proximal Newton method stated in (15). In particular, we approximate \(b(\varvec{\alpha }_{i \cdot })\) by a quadratic approximation around the current value \(\varvec{\alpha }_{i \cdot }^{(k)}\):
Here, the first derivative is
while \(H(\varvec{\alpha }_{i \cdot }^{(k)})\) is an approximation of the second derivative,
where \(D(\varvec{\alpha }_{i \cdot }^{(k)})=\text {Diag}(\alpha _{i1}^{(k)},\ldots ,\alpha _{iq}^{(k)})\), \(X_{ii,\cdot (1:q)}\) denotes the first q columns in \(X_{ii}\), and \(C(\varvec{\alpha }^{(k)}_{i \cdot })=4 \text {Diag}(X_{ii,\cdot (1:q)}^{\text{ T }}(Y_{ii}- X_{ii} \varvec{\beta }_{ii}(\varvec{\alpha }^{(k)},\varvec{\sigma }^2)))\). Specifically,
To ease the presentation and computation, we write \({{\hat{b}}}(\varvec{\alpha }_{i \cdot })\) from (17) in the form of a least squares problem
where
Replacing b in (16) with \({{\hat{b}}}\), we obtain the approximate objective function \({{\hat{Q}}}_{\lambda ,i}(\varvec{\alpha }_{i \cdot }|\varvec{\alpha }_{i \cdot }^{(k)})\) given in (9). Since (9) is a standard regularized least squares problem, we minimize (9) using a coordinate descent algorithm to obtain \({{\hat{\varvec{\alpha }}}}_{i \cdot }\) as detailed in Sect. B.2.
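The reduction of a quadratic model to a least squares problem can be checked in general: if \(H\) is positive definite with factorization \(H = R^{\text{ T }}R\), then \(\tfrac{1}{2}\varvec{\theta }^{\text{ T }}H\varvec{\theta } - \varvec{c}^{\text{ T }}\varvec{\theta }\) equals \(\tfrac{1}{2}\Vert R\varvec{\theta } - R^{-\text{T}}\varvec{c}\Vert ^2\) up to a constant. A minimal numerical check, in our own generic notation (not the paper's \(Y^*, X^*\)):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 4))
H = A.T @ A + 0.1 * np.eye(4)    # a positive definite "Hessian"
c = rng.normal(size=4)

R = np.linalg.cholesky(H).T      # upper-triangular factor: H = R'R
y_star = np.linalg.solve(R.T, c) # synthetic "response" y* = R^{-T} c

theta = rng.normal(size=4)
quad = 0.5 * theta @ H @ theta - c @ theta
lsq = 0.5 * np.sum((R @ theta - y_star) ** 2) - 0.5 * y_star @ y_star
# quad and lsq agree for every theta, so the quadratic model can be
# minimized as a (regularized) least squares problem
```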
1.2 Theoretical result regarding proximal Newton update
Let \(\varDelta (\varvec{\alpha }_{i \cdot }^{(k)})={{\hat{\varvec{\alpha }}}}_{i \cdot }-\varvec{\alpha }_{i \cdot }^{(k)}\), where \({{\hat{\varvec{\alpha }}}}_{i \cdot }\) is the minimizer of (9), and, according to a line search strategy, let
\[ \varvec{\alpha }_{i \cdot }^{(k+1)} = \varvec{\alpha }_{i \cdot }^{(k)} + t \, \varDelta (\varvec{\alpha }_{i \cdot }^{(k)}) \]
for some \(t>0\). Following the proof of Proposition 2.3 in Lee et al. (2014), we can verify the following theorem.
Theorem 1
Let \( H(\varvec{\alpha }_{i \cdot }^{(k)})=8 D(\varvec{\alpha }_{i \cdot }^{(k)}) X_{ii}^{\text{ T }}X_{ii} D(\varvec{\alpha }_{i \cdot }^{(k)})\). Then
\[ Q_{i,\lambda }\big (\varvec{\alpha }_{i \cdot }^{(k)} + t \varDelta (\varvec{\alpha }_{i \cdot }^{(k)}), \sigma ^2_i\big ) \le Q_{i,\lambda }\big (\varvec{\alpha }_{i \cdot }^{(k)}, \sigma ^2_i\big ) - t \, \varDelta (\varvec{\alpha }_{i \cdot }^{(k)})^{\text{ T }} H(\varvec{\alpha }_{i \cdot }^{(k)}) \varDelta (\varvec{\alpha }_{i \cdot }^{(k)}) + o(t). \]
Thus, by Theorem 1, if \(H(\varvec{\alpha }_{i \cdot }^{(k)})\) is positive definite, we can choose \(t>0\) so that \(Q_{i,\lambda }(\varvec{\alpha }_{i \cdot }^{(k+1)},\sigma ^2_i)< Q_{i,\lambda }(\varvec{\alpha }_{i \cdot }^{(k)},\sigma ^2_i)\). That is, the update of \(\varvec{\alpha }_{i \cdot }\) results in a decrease in the objective function (8).
Algorithm
In our block descent algorithm, we minimize (7) with respect to \( \varvec{\sigma }^2, \varvec{\alpha }, \varvec{\phi }\), and \(\varvec{\psi }\) in turn. For \(i=1,\ldots ,p\), we first update \(\sigma ^2_i\) by minimizing (8) using least squares estimation and then update \(\varvec{\alpha }_{i\cdot }\) by minimizing (9) using a coordinate descent method. We denote by \(X_{ij,\cdot k}\) the kth column of \(X_{ij}\) for \(k=1,\ldots ,q\) (\(i \ne j\)) or \(k=1,\ldots ,q+1\) (\(i=j\)). The update of \(\sigma ^2_i\) is detailed in Appendix B.1 and the coordinate descent updates of \(\alpha _{il}\), \(l=1,\ldots ,q\), in Appendix B.2. A summary of the complete algorithm is given in Appendix B.3.
1.1 Update of \(\sigma _i^2\)
The estimate \({{\hat{\sigma }}}_i^2\) is updated using least squares. More precisely, the gradient of (8) with respect to \(\sigma _i^2\) is
By solving \(\frac{\partial Q_{\lambda ,i}(\varvec{\alpha }_{i \cdot },\sigma _i^2)}{\partial \sigma _i^2} = 0\), we obtain the update
where \(\max \{a,0\}\) ensures that the updated value of \(\sigma _i^2\) is nonnegative.
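As a toy illustration of such a clamped least squares update (our own minimal example, not the paper's exact expression), consider fitting a single nonnegative coefficient:

```python
import numpy as np

def clamped_ls_update(x, y):
    """Minimize ||y - s * x||^2 over s >= 0: set the gradient to zero, then clamp at 0."""
    s = (x @ y) / (x @ x)  # unconstrained least squares solution
    return max(s, 0.0)     # the max{., 0} clamp

s_pos = clamped_ls_update(np.array([1.0, 2.0]), np.array([2.0, 4.0]))   # optimum is positive
s_neg = clamped_ls_update(np.array([1.0, 2.0]), np.array([-1.0, -2.0])) # clamped to zero
```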
1.2 Update of \(\alpha _{il}\)
Let \( r_{ij}= Y^*_{ij} - \sum _{\begin{array}{c} k=1\\ k \ne l \end{array}}^q X^*_{ij,\cdot k} \alpha _{ik}\), where \(Y^*_{ij}\) and \(X^*_{ij}\) are specified in (10). Then we rewrite (9) as
The gradient with respect to \(\alpha _{il}\) is
Following the main argument by Friedman et al. (2010), the coordinate-wise update for \(\alpha _{il}\) is of the form
where \( S(A,\lambda \xi ) =\text {sign}(A)(|A|-\lambda \xi )_+\).
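The coordinate-wise update has the familiar soft-thresholding form of Friedman et al. (2010). A generic sketch, with hypothetical names and a plain lasso least squares criterion standing in for the paper's \(Y^*, X^*\) construction:

```python
import numpy as np

def soft_threshold(a, t):
    """S(a, t) = sign(a)(|a| - t)_+."""
    return np.sign(a) * max(abs(a) - t, 0.0)

def cd_update(Xs, ys, alpha, l, lam):
    """One coordinate descent update of alpha[l] for 0.5||ys - Xs alpha||^2 + lam ||alpha||_1."""
    r = ys - Xs @ alpha + Xs[:, l] * alpha[l]  # partial residual with column l excluded
    return soft_threshold(Xs[:, l] @ r, lam) / (Xs[:, l] @ Xs[:, l])

Xs, ys = np.eye(3), np.array([3.0, -0.5, 1.0])
a0 = cd_update(Xs, ys, np.zeros(3), 0, 1.0)  # large signal: shrunk but nonzero
a1 = cd_update(Xs, ys, np.zeros(3), 1, 1.0)  # small signal: thresholded to zero
```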
1.3 Algorithm to update \(\varvec{\alpha }, \varvec{\sigma }^2, \varvec{\phi }, \varvec{\psi }\)
For a given q and a sequence of \(\lambda \) values \(0 \le \lambda _1,\ldots ,\lambda _M\), the overall procedure to estimate the parameters \(\varvec{\alpha }, \varvec{\sigma }^2, \varvec{\phi }, \varvec{\psi }\) is described in Algorithm 1. Note that the estimates obtained with \(\lambda _{s-1}\) are used as initial values for the estimation with \(\lambda _{s}\), \(s=2,\ldots ,M\).
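The warm-start strategy can be sketched with a plain lasso coordinate descent fit standing in for the full block descent (all names here are hypothetical; we traverse a decreasing penalty sequence, though the ordering is a design choice):

```python
import numpy as np

def soft_threshold(a, t):
    return np.sign(a) * max(abs(a) - t, 0.0)

def lasso_cd(X, y, lam, alpha_init, n_sweeps=200):
    """Full coordinate descent fit of 0.5||y - X alpha||^2 + lam ||alpha||_1."""
    alpha = alpha_init.copy()
    for _ in range(n_sweeps):
        for l in range(X.shape[1]):
            r = y - X @ alpha + X[:, l] * alpha[l]
            alpha[l] = soft_threshold(X[:, l] @ r, lam) / (X[:, l] @ X[:, l])
    return alpha

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 6))
y = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=40)

alpha = np.zeros(6)                     # cold start only for the first lambda
path = []
for lam in [50.0, 20.0, 5.0, 1.0]:      # penalty path
    alpha = lasso_cd(X, y, lam, alpha)  # warm start from the previous estimate
    path.append(alpha.copy())
```

Warm starting typically reduces the number of sweeps needed at each \(\lambda \), since consecutive solutions along the path are close.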
Cite this article
Choiruddin, A., Cuevas-Pacheco, F., Coeurjolly, JF. et al. Regularized estimation for highly multivariate log Gaussian Cox processes. Stat Comput 30, 649–662 (2020). https://doi.org/10.1007/s11222-019-09911-y