Fast estimation of multivariate spatiotemporal Hawkes processes and network reconstruction

Annals of the Institute of Statistical Mathematics

Abstract

We present a fast, accurate estimation method for multivariate Hawkes self-exciting point processes widely used in seismology, criminology, finance and other areas. There are two major ingredients. The first is an analytic derivation of exact maximum likelihood estimates of the nonparametric triggering density. We develop this for the multivariate case and add regularization to improve stability and robustness. The second is a moment-based method for the background rate and triggering matrix estimation, which is extended here for the spatiotemporal case. Our method combines them together in an efficient way, and we prove the consistency of this new approach. Extensive numerical experiments, with synthetic data and real-world social network data, show that our method improves the accuracy, scalability and computational efficiency of prevailing estimation approaches. Moreover, it greatly boosts the performance of Hawkes process-based models on social network reconstruction and helps to understand the spatiotemporal triggering dynamics over social media.

Notes

  1. We obtain latitude and longitude coordinates from https://www.flickr.com/places/info.

References

  • Achab, M., Bacry, E., Gaïffas, S., Mastromatteo, I., Muzy, J.-F. (2017). Uncovering causality from multivariate Hawkes integrated cumulants. The Journal of Machine Learning Research, 18(1), 6998–7025.

  • Bacry, E., Bompaire, M., Gaïffas, S., Poulsen, S. (2017). Tick: A python library for statistical learning, with a particular emphasis on time-dependent modelling. arXiv preprint arXiv:1707.03003.

  • Bacry, E., Mastromatteo, I., Muzy, J.-F. (2015). Hawkes processes in finance. Market Microstructure and Liquidity, 1(01), 1550005.

  • Bacry, E., Muzy, J.-F. (2016). First-and second-order statistics characterization of Hawkes processes and non-parametric estimation. IEEE Transactions on Information Theory, 62(4), 2184–2202.

  • Balderama, E., Schoenberg, F. P., Murray, E., Rundel, P. W. (2012). Application of branching models in the study of invasive species. Journal of the American Statistical Association, 107(498), 467–476.

  • Bao, J., Zheng, Y., Mokbel, M. F. (2012). Location-based and preference-aware recommendation using sparse geo-social networking data. In Proceedings of the 20th international conference on advances in geographic information systems (pp. 199–208).

  • Brantingham, P. J., Yuan, B., Herz, D. (2020a). Is gang violent crime more contagious than non-gang violent crime? Journal of Quantitative Criminology, https://doi.org/10.1007/s10940-020-09479-1.

  • Brantingham, P. J., Yuan, B., Sundback, N., Schoenberg, F. P., Bertozzi, A. L., Gordon, J., et al. (2020b). Does violence interruption work? UCLA preprint, www.stat.ucla.edu/~frederic/papers/brantingham2.pdf.

  • Brillinger, D. R., Guttorp, P. M., Schoenberg, F. P., El-Shaarawi, A. H., Piegorsch, W. W. (2002). Point processes, temporal. Encyclopedia of Environmetrics, 3, 1577–1581.

  • Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., Vandergheynst, P. (2017). Geometric deep learning: Going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4), 18–42.

  • Chen, S., Shojaie, A., Shea-Brown, E., Witten, D. (2017). The multivariate Hawkes process in high dimensions: Beyond mutual excitation. arXiv preprint arXiv:1707.04928.

  • Chiang, W.-H., Yuan, B., Li, H., Wang, B., Bertozzi, A., Carter, J., Ray, B., Mohler, G. (2019). Sos-EW: System for overdose spike early warning using drug mover’s distance-based Hawkes processes. In Joint European conference on machine learning and knowledge discovery in databases (pp. 538–554). Berlin: Springer.

  • Cho, E., Myers, S. A., Leskovec, J. (2011). Friendship and mobility: User movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 1082–1090). ACM.

  • Daley, D. J., Vere-Jones, D. (2003). An introduction to the theory of point processes: Volume I: Elementary theory and methods. New York: Springer.

  • Daley, D. J., Vere-Jones, D. (2007). An introduction to the theory of point processes: Volume II: General theory and structure. New York: Springer.

  • Du, N., Farajtabar, M., Ahmed, A., Smola, A. J., Song, L. (2015). Dirichlet–Hawkes processes with applications to clustering continuous-time document streams. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 219–228). ACM.

  • Duchi, J., Hazan, E., Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121–2159.

  • Eichler, M., Dahlhaus, R., Dueck, J. (2017). Graphical modeling for multivariate Hawkes processes with nonparametric link functions. Journal of Time Series Analysis, 38(2), 225–242.

  • Farajtabar, M., Wang, Y., Rodriguez, M. G., Li, S., Zha, H., Song, L. (2015). Coevolve: A joint point process model for information diffusion and network co-evolution. Advances in Neural Information Processing Systems, 1954–1962.

  • Fox, E. W., Short, M. B., Schoenberg, F. P., Coronges, K. D., Bertozzi, A. L. (2016). Modeling e-mail networks and inferring leadership using self-exciting point processes. Journal of the American Statistical Association, 111(514), 564–584.

  • Granger, C. W. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, 37, 424–438.

  • Hall, E. C., Willett, R. M. (2016). Tracking dynamic point processes on networks. IEEE Transactions on Information Theory, 62(7), 4327–4346.

  • Hawkes, A. G. (1971). Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58(1), 83–90.

  • Kaipio, J., Somersalo, E. (2006). Statistical and computational inverse problems, Vol. 160. New York: Springer.

  • Kingma, D. P., Ba, J. (2015). Adam: A method for stochastic optimization. In International conference on learning representations.

  • Lai, E. L., Moyer, D., Yuan, B., Fox, E., Hunter, B., Bertozzi, A. L., Brantingham, P. J. (2016). Topic time series analysis of microblogs. IMA Journal of Applied Mathematics, 81(3), 409–431.

  • Lee, D. D., Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791.

  • Lewis, E., Mohler, G. (2011). A nonparametric EM algorithm for multiscale Hawkes processes. Journal of Nonparametric Statistics, 1(1), 1–20.

  • Linderman, S., Adams, R. (2014). Discovering latent network structure in point process data. In International conference on machine learning (pp. 1413–1421). Beijing, China: JMLR: W&C.

  • Malinverno, A. (2002). Parsimonious Bayesian Markov chain Monte Carlo inversion in a nonlinear geophysical problem. Geophysical Journal International, 151(3), 675–688.

  • Mark, B., Raskutti, G., Willett, R. (2018). Network estimation from point process data. IEEE Transactions on Information Theory, 65, 2953–2975.

  • Marsan, D., Lengline, O. (2008). Extending earthquakes’ reach through cascading. Science, 319(5866), 1076–1079.

  • Mohler, G. O. (2014). Marked point process hotspot maps for homicide and gun crime prediction in Chicago. International Journal of Forecasting, 30(3), 491–497.

  • Mohler, G. O., Short, M. B., Brantingham, P. J., Schoenberg, F. P., Tita, G. E. (2011). Self-exciting point process modeling of crime. Journal of the American Statistical Association, 106(493), 100–108.

  • Neumaier, A. (1998). Solving ill-conditioned and singular linear systems: A tutorial on regularization. SIAM Review, 40(3), 636–666.

  • Ogata, Y. (1978). The asymptotic behaviour of maximum likelihood estimators for stationary point processes. Annals of the Institute of Statistical Mathematics, 30(1), 243–261.

  • Ogata, Y. (1998). Space-time point-process models for earthquake occurrences. Annals of the Institute of Statistical Mathematics, 50(2), 379–402.

  • Porter, M. D., White, G., et al. (2012). Self-exciting hurdle models for terrorist activity. The Annals of Applied Statistics, 6(1), 106–124.

  • Reinhart, A. (2018). A review of self-exciting spatio-temporal point processes and their applications. Statistical Science, 33(3), 299–318.

  • Schoenberg, F. P. (2006). On non-simple marked point processes. Annals of the Institute of Statistical Mathematics, 58(2), 223–233.

  • Schoenberg, F. P. (2013). Facilitated estimation of ETAS. Bulletin of the Seismological Society of America, 103(1), 601–605.

  • Schoenberg, F. P., Brillinger, D. R., Guttorp, P. (2013). Point processes, spatial-temporal. Encyclopedia of Environmetrics, 4, 1573–1578.

  • Schoenberg, F. P., et al. (2018a). Comment on “A review of self-exciting spatio-temporal point processes and their applications” by Alex Reinhart. Statistical Science, 33(3), 325–326.

  • Schoenberg, F. P., Gordon, J. S., Harrigan, R. J. (2018b). Analytic computation of nonparametric Marsan–Lengliné estimates for Hawkes point processes. Journal of Nonparametric Statistics, 30(3), 742–775.

  • Veen, A., Schoenberg, F. P. (2008). Estimation of space-time branching process models in seismology using an EM-type algorithm. Journal of the American Statistical Association, 103(482), 614–624.

  • Wang, B., Luo, X., Zhang, F., Yuan, B., Bertozzi, A. L., Brantingham, P. J. (2018). Graph-based deep modeling and real time forecasting of sparse spatio-temporal data. arXiv preprint arXiv:1804.00684.

  • Yuan, B., Li, H., Bertozzi, A. L., Brantingham, P. J., Porter, M. A. (2019). Multivariate spatiotemporal Hawkes processes and network reconstruction. SIAM Journal on Mathematics of Data Science, 1(2), 356–382.

  • Yuan, B., Wang, X., Ma, J., Zhou, C., Bertozzi, A. L., Yang, H. (2020). Variational autoencoders for highly multivariate spatial point processes intensities. In International conference on learning representations.

  • Zhu, S., Xie, Y. (2019). Spatial–temporal–textual point processes with applications in crime linkage detection. arXiv preprint arXiv:1902.00440.

  • Zhuang, J., Ogata, Y., Vere-Jones, D. (2002). Stochastic declustering of space-time earthquake occurrences. Journal of the American Statistical Association, 97(458), 369–380.

Acknowledgements

This work was supported by the City of Los Angeles Gang Reduction Youth Development Project, by NSF grant DMS-2027277 and by NSF grant DMS-1737770. Baichuan Yuan gratefully acknowledges the fellowship support of the National Institute of Justice (NIJ) under Award Number 2018-R2-CX-0013.

Author information

Corresponding author

Correspondence to Frederic P. Schoenberg.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Simulation data

1.1 \(U=1\) data

We simulate a univariate ST-Hawkes process with \(K=1/6\), \(\mu =0.01\), \(T=2.1\times 10^5\), \(X,Y \in (0,10)\), \(f(r)=\frac{1}{2\pi \sigma ^2}\exp (-r^2/2\sigma ^2)\) (\(\sigma ^2=0.2\)) and \(h(t)=\omega \exp (-\omega t)\) (\(\omega =10\)). The regularization parameter is \(\alpha =0.5\).
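
As a rough illustration of how such a data set could be generated, the following Python sketch simulates a univariate ST-Hawkes process through its branching (cluster) representation, using the parameters above. It is not the authors' simulation code: the function name `simulate_st_hawkes`, the reading of \(\mu \) as the background rate per unit time over the whole spatial window, and the choice to discard offspring that fall outside the space-time window are assumptions made for this example. Ignoring boundary losses, a stationary process with these parameters has about \(\mu T/(1-K)=0.01\times 2.1\times 10^5/(5/6)=2520\) expected events.

```python
import numpy as np

def simulate_st_hawkes(mu, K, omega, sigma2, T, box=10.0, seed=0):
    """Simulate a univariate spatiotemporal Hawkes process by branching.

    Returns an array with columns (t, x, y), sorted by time.
    Assumptions (not from the paper): mu is the background rate per unit time
    over the whole window, and offspring outside the window are discarded.
    """
    rng = np.random.default_rng(seed)

    # Generation 0: homogeneous Poisson background events on (0, T) x (0, box)^2.
    n_bg = rng.poisson(mu * T)
    parents = np.column_stack([
        rng.uniform(0.0, T, n_bg),
        rng.uniform(0.0, box, (n_bg, 2)),
    ])
    generations = [parents]

    # Each event independently triggers Poisson(K) offspring.
    while len(parents) > 0:
        n_off = rng.poisson(K, len(parents))
        rep = np.repeat(parents, n_off, axis=0)
        # Exponential delay with rate omega; isotropic Gaussian offset with variance sigma2.
        dt = rng.exponential(1.0 / omega, len(rep))
        dxy = rng.normal(0.0, np.sqrt(sigma2), (len(rep), 2))
        children = rep + np.column_stack([dt, dxy])
        inside = (
            (children[:, 0] < T)
            & np.all(children[:, 1:] > 0.0, axis=1)
            & np.all(children[:, 1:] < box, axis=1)
        )
        parents = children[inside]
        generations.append(parents)

    events = np.vstack(generations)
    return events[np.argsort(events[:, 0])]

events = simulate_st_hawkes(mu=0.01, K=1/6, omega=10.0, sigma2=0.2, T=2.1e5)
```

The loop terminates because the mean number of offspring per event, \(K=1/6\), is below 1, so each generation is smaller on average than the previous one.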

1.2 \(U=100\) data

Using the same triggering densities, this data set has the following parameters: \(U=100\), background rate \(\varvec{\mu }=(0.01,\ldots ,0.01)\), \(T=10^5\), \(X,Y \in (0,10)\), \(\sigma ^2=0.2\) and \(\omega =10\), with 172,943 events. For the triggering matrix in Fig. 2, each yellow pixel is 1/20, each cyan pixel is 1/40 and each dark pixel is 0.

1.3 \(U=10\) data

With the same densities, the parameters are \(U=10\), \(\varvec{\mu }=(0.01,\ldots ,0.01)\), \(T=10^6\), \(X,Y \in (0,10)\), \(\sigma ^2=0.2\), \(\omega =10\) and \(\varvec{K}\) is shown in Fig. 3. Here, each yellow pixel is 1/6 and each dark pixel is 0. The regularization parameter is \(\alpha =0.55\).

1.4 \(U=10\) data with a Pareto triggering density in time

We keep the same parameters as for the \(U=10\) data above, but change the temporal triggering density to \(h(t)=(p-1)c^{p-1}/(t+c)^p\) with \(c=2\) and \(p=2.5\), and use the same form of spatial triggering density with \(\sigma ^2=0.1\). The regularization parameter is \(\alpha =0.38\).
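
The Pareto-type temporal density above has the closed-form CDF \(1-(c/(t+c))^{p-1}\), so triggering delays can be drawn by inverse-transform sampling. The Python sketch below illustrates this; the function names are hypothetical and the code is an illustration, not the authors' implementation.

```python
import numpy as np

def pareto_density(t, c=2.0, p=2.5):
    """h(t) = (p - 1) c^(p - 1) / (t + c)^p for t >= 0."""
    return (p - 1) * c ** (p - 1) / (t + c) ** p

def pareto_cdf(t, c=2.0, p=2.5):
    """Integral of h from 0 to t: 1 - (c / (t + c))^(p - 1)."""
    return 1.0 - (c / (t + c)) ** (p - 1)

def sample_pareto_delay(n, c=2.0, p=2.5, seed=0):
    """Draw n delays from h by inverting the survival function."""
    rng = np.random.default_rng(seed)
    u = 1.0 - rng.uniform(size=n)  # survival probability, in (0, 1]
    return c * (u ** (-1.0 / (p - 1)) - 1.0)

delays = sample_pareto_delay(200_000)
```

With \(p=2.5\) the tail decays like \(t^{-2.5}\), far more slowly than the exponential density with \(\omega =10\) used in the earlier data sets.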

1.5 \(U=10\) data with a uniform triggering density in time

Similar to the section above, here we change the temporal density to the uniform density \(h(t)=0.1\) and use the spatial triggering density with \(\sigma ^2=0.1\). The regularization parameter is \(\alpha =0.4\). We threshold the estimated \(\varvec{{\tilde{K}}}\) with \(\epsilon = 0.01\) to remove noise.
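
The thresholding step can be sketched in a few lines of Python. This is a minimal sketch, assuming that thresholding means setting entries of the estimated matrix below \(\epsilon \) to zero; the helper name `threshold_matrix` and the toy matrix are hypothetical.

```python
import numpy as np

def threshold_matrix(K_est, eps=0.01):
    """Return a copy of K_est with entries smaller than eps set to zero."""
    K = np.array(K_est, dtype=float)  # np.array copies, so the input is untouched
    K[K < eps] = 0.0
    return K

# Toy 3x3 estimate (made-up numbers) with a few small spurious entries.
K_est = np.array([[0.160, 0.004, 0.000],
                  [0.008, 0.170, 0.150],
                  [0.002, 0.009, 0.165]])
K_thr = threshold_matrix(K_est, eps=0.01)
edges = np.argwhere(K_thr > 0)  # (row, column) index pairs of surviving entries
```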

1.6 \(U=10\) data with a power-law triggering density in space

Similarly, we use the power-law density \(f(r)=\frac{1}{(r^2+1)^2}\) in space and the exponential triggering density in time with \(\omega =10\). The regularization parameter is \(\alpha =0.28\). We threshold the estimated \(\varvec{{\tilde{K}}}\) with \(\epsilon = 0.02\) to remove noise.
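
For readers who want to simulate from this spatial kernel, note that \(1/(r^2+1)^2\) integrates to \(\pi \) over the plane, so the sketch below assumes the normalized form \(\frac{1}{\pi (r^2+1)^2}\). That normalization and the function name are assumptions of this example rather than something stated above. Under it, the offset distance has CDF \(r^2/(r^2+1)\), which inverts in closed form.

```python
import numpy as np

def sample_power_law_offsets(n, seed=0):
    """Draw n planar offsets (dx, dy) with density 1 / (pi (r^2 + 1)^2).

    The 1/pi normalization is an assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)                         # u in [0, 1)
    r = np.sqrt(u / (1.0 - u))                      # inverse of the CDF r^2 / (r^2 + 1)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)   # isotropic direction
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

offsets = sample_power_law_offsets(200_000)
```

Under this normalization half of the offspring fall within distance 1 of their parent, since the CDF equals 1/2 at \(r=1\).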

1.7 \(U=10\) data with a uniform triggering density in space

Given the same parameters as above, we change the spatial density to the uniform density \(f(r)=0.25\) and keep the exponential triggering density in time with \(\omega =10\). The regularization parameter is \(\alpha =0.36\). We threshold the estimated \(\varvec{{\tilde{K}}}\) with \(\epsilon = 0.01\) to remove noise (Fig. 9).

Fig. 9 The estimation results of STHC on \(U=10\) data with a Pareto triggering density in time, a uniform triggering density in time, a power-law triggering density in space and a uniform triggering density in space (from left to right). (Top) Ground-truth spatial triggering density f(r) as red triangles and estimated triggering density as blue circles. (Bottom) Temporal triggering density h(t) as red triangles and estimated triggering density as blue circles.

Appendix 2: Gowalla and Brightkite data sets

In this section, we describe the preprocessing procedure for the Gowalla and Brightkite data sets. We focus on local friendship subnetworks within different US cities: San Diego (SD), Chicago (CHI), Los Angeles (LA) and San Francisco (SF). They have diverse network sizes and ST patterns within the same time period.

1.1 Brightkite-SD

We study check-ins in SD in the Brightkite data set. We use a bounding box (with a north latitude of 33.1142, a south latitude of 32.5348, an east longitude of \(-\,116.9058\), and a west longitude of \(-\,117.2824\); see Note 1) to locate check-ins in SD. We consider “active” users, who have more than 300 check-ins during the period. This gives us a small subnetwork with 25 “active” users and a total of 13,760 check-ins in SD.
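
The two preprocessing steps (bounding-box filter, then active-user filter) could be sketched as follows. This is an illustrative sketch, not the authors' code: the tuple layout `(user_id, timestamp, lat, lon)`, the helper names, and the choice to count a user's check-ins inside the box (rather than over the whole data set) are assumptions.

```python
from collections import Counter

# Bounding box for San Diego, copied from the text above.
SD_BOX = {"north": 33.1142, "south": 32.5348, "east": -116.9058, "west": -117.2824}

def in_box(lat, lon, box):
    """True if (lat, lon) lies inside the latitude/longitude box."""
    return box["south"] <= lat <= box["north"] and box["west"] <= lon <= box["east"]

def select_active_checkins(checkins, box, min_checkins):
    """Keep check-ins inside the box from users with more than min_checkins of them.

    checkins: iterable of (user_id, timestamp, lat, lon) tuples (assumed layout).
    Returns the kept check-ins and the set of active user ids.
    """
    local = [c for c in checkins if in_box(c[2], c[3], box)]
    counts = Counter(c[0] for c in local)
    active = {user for user, n in counts.items() if n > min_checkins}
    return [c for c in local if c[0] in active], active

# Toy data: user 1 has three check-ins in San Diego; user 2 has one there and one in LA.
toy = [
    (1, 0.0, 32.72, -117.16),
    (1, 1.0, 32.75, -117.10),
    (1, 2.0, 32.80, -117.20),
    (2, 3.0, 32.70, -117.15),
    (2, 4.0, 34.05, -118.24),  # Los Angeles, outside the box
]
kept, active = select_active_checkins(toy, SD_BOX, min_checkins=2)
```

For the Brightkite-SD subnetwork the threshold in the text corresponds to `min_checkins=300`; the other cities below use their own boxes and thresholds.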

1.2 Gowalla-CHI

We apply the same procedure as for Brightkite-SD to the Gowalla check-in data for CHI. The bounding box for CHI has a north latitude of 42.0229, a south latitude of 41.6446, an east longitude of \(-\,87.5245\) and a west longitude of \(-\,87.9395\). After selecting only active users (those with more than 100 check-ins), we have a medium-sized subnetwork with 96 users and 27,326 check-ins.

1.3 Brightkite-LA

We apply the same procedure as for Brightkite-SD to the Brightkite check-in data in LA. The bounding box for LA has a north latitude of 34.34, a south latitude of 33.70, an east longitude of \(-\,118.16\) and a west longitude of \(-\,118.67\). After selecting only active users (those with more than 150 check-ins), we have a medium-sized subnetwork with 168 users and 89,127 check-ins.

1.4 Gowalla-SF

We apply the same procedure as for Brightkite-SD to the Gowalla check-in data in SF. The bounding box for SF has a north latitude of 37.93, a south latitude of 37.64, an east longitude of \(-\,122.28\) and a west longitude of \(-\,123.17\). After selecting only active users (those with more than 65 check-ins), we have a large subnetwork with 515 users and 102,673 check-ins.

Appendix 3: Assumptions for Theorem 1

There are two separate sets of general assumptions for the consistency of GMM and MLE in Hawkes processes. We only list assumptions that are relevant to our proof.

The first set of assumptions is from Ogata (1978) about the point process and intensity functions.

Assumption 1

(Consistency of MLE estimation)

  • The multivariate Hawkes process \((\varvec{N}_{t,x,y})\) is stationary, ergodic and absolutely continuous with respect to the standard Poisson process.

  • The conditional intensity function \(\lambda _{\Theta }\) with parameters \(\Theta \) is predictable for all compact metric spaces and continuous in \(\Theta \).

  • When \(t=0\), \(\lambda _{\Theta }\) is positive almost surely and \(\lambda _{\Theta _1}= \lambda _{\Theta _2}\) almost surely if and only if \(\Theta _1=\Theta _2\); for any \(\Theta \) from a compact metric space, there exists a neighborhood \(U(\Theta )\) of \(\Theta \) such that for all \(\Theta ' \in U(\Theta )\), \(|\lambda _{\Theta '}|\) and \(|\log \lambda _{\Theta '}|\) are bounded by random variables with finite second moments.

  • For any \(\Theta \) from a compact metric space, there is a neighborhood \(U(\Theta )\) of \(\Theta \) such that \(\sup _{\Theta ' \in U(\Theta )}|\lambda (\Theta ')-{\mathbb {E}}(\lambda (\Theta '))| \rightarrow 0\) in probability as \(t \rightarrow \infty \) and (for some \(\alpha >0\)) \(\sup _{\Theta ' \in U(\Theta )}|\log {\mathbb {E}}(\lambda (\Theta '))|\) has a finite \((2+\alpha ){\text{th}}\) moment that is uniformly bounded with respect to t.

On top of Assumption 1, we also need GMM-related assumptions from Achab et al. (2017).

Assumption 2

(Consistency of GMM estimation)

  • For (25), the GMM approximation error \(L(\varvec{R})=0\) if and only if \(\varvec{R} = (\varvec{I-K^{\rm T}})^{-1}\).

  • For (22)–(24), the supports \({\tilde{X}}\), \({\tilde{Y}}\), \({\tilde{H}}\) of the triggering density satisfy \({\tilde{X}}^2/X\), \({\tilde{Y}}^2/Y\), \({\tilde{H}}^2/T \rightarrow 0\) separately as \(X,Y,H \rightarrow \infty \).
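
To make the first condition of Assumption 2 concrete, the following Python sketch computes \(\varvec{R}=(\varvec{I}-\varvec{K}^{\rm T})^{-1}\) for a small triggering matrix and recovers \(\varvec{K}\) from it. The toy matrix and function names are made up, and the spectral-radius check is a standard stationarity condition for Hawkes processes added here for illustration; neither is taken from the text above.

```python
import numpy as np

def r_from_k(K):
    """Compute R = (I - K^T)^{-1} for a triggering matrix K."""
    K = np.asarray(K, dtype=float)
    # Standard stationarity condition (added for illustration): spectral radius below 1.
    if np.max(np.abs(np.linalg.eigvals(K))) >= 1.0:
        raise ValueError("spectral radius of K must be below 1")
    return np.linalg.inv(np.eye(K.shape[0]) - K.T)

def k_from_r(R):
    """Invert the relation: K = (I - R^{-1})^T."""
    R = np.asarray(R, dtype=float)
    return (np.eye(R.shape[0]) - np.linalg.inv(R)).T

# Toy 2x2 triggering matrix (made-up values).
K = np.array([[1/6, 0.0],
              [1/6, 1/6]])
R = r_from_k(K)
K_back = k_from_r(R)
```

Because the map between \(\varvec{K}\) and \(\varvec{R}\) is one-to-one whenever \(\varvec{I}-\varvec{K}^{\rm T}\) is invertible, an estimate of \(\varvec{R}\) determines an estimate of the triggering matrix.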

About this article

Cite this article

Yuan, B., Schoenberg, F.P. & Bertozzi, A.L. Fast estimation of multivariate spatiotemporal Hawkes processes and network reconstruction. Ann Inst Stat Math 73, 1127–1152 (2021). https://doi.org/10.1007/s10463-020-00780-1

  • DOI: https://doi.org/10.1007/s10463-020-00780-1
