Estimation of regional transition probabilities for spatial dynamic microsimulations from survey data lacking in regional detail
Introduction
Microsimulations are powerful tools for the multivariate analysis of complex systems, such as economic markets or medical care infrastructures. They differ from the more established macrosimulations in terms of the objects that are considered in the simulation process. While macrosimulations model the behavior of aggregated system-intrinsic entities, microsimulations target the smallest entities of the system (units) directly. This allows for the investigation of multidimensional interactions and nonlinear dependencies within the system that cannot be studied by macrosimulations. Examples of microsimulation models can be found in Klevmarken (2010), Lawson (2011), O’Donoghue et al. (2011), and Markham et al. (2017). Microsimulations are often conducted according to a basic procedure. First, a base population is constructed as a synthetic replica of the system of interest. In practice, this may be either artificially generated data or real-world observations from administrative records and surveys (Li and O’Donoghue, 2014). Next, multiple parameters that characterize the system in its initial state are altered in scenarios. The alterations are designed to target important properties of the system in light of the research objectives. The effects of the alterations are projected into future periods, such that every scenario initializes an individual branch in the system’s evolution. After a given number of periods (simulation horizon), the branches are compared, giving insights into important dynamics and dependencies within the system (Burgard et al., 2019c).
There are different types of microsimulations. They mainly differ with respect to the manner in which the mentioned alterations are projected. An important distinction is between static and dynamic microsimulations (Li and O’Donoghue, 2013). Static microsimulations are characterized by the constancy of unit characteristics over time. When constructing the synthetic replica, every unit is provided with a set of characteristics that determines its behavior and interaction with other units. In static microsimulations, these characteristics do not change over the simulation horizon. Only specific simulation inputs are altered, depending on the research objectives. Examples of static microsimulation models can be found in Peichl et al. (2010) as well as Sutherland and Figari (2013). Dynamic microsimulations, on the other hand, are characterized by stochastic changes of unit characteristics (state transitions) over time. Since a unit’s behavior within the synthetic replica is driven by its respective characteristics, the interactions between units are also subject to temporal variation. Examples of dynamic microsimulation models can be found in O’Donoghue et al. (2009) and Fialka et al. (2011). If the dynamic microsimulation is time-discrete, state transitions can only appear periodically at distinct points in time (Schmid et al., 2018). If the simulation is time-continuous, they can appear at any given time and thus are modeled via survival functions (Willekens, 2009).
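The time-discrete case described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the state names, the transition probabilities, and the population sizes are hypothetical values chosen only to show how state transitions are drawn once per period.

```python
import random

# Hypothetical labor-market states and per-period transition probabilities
# (illustrative values only, not taken from the paper).
STATES = ["employed", "unemployed", "inactive"]
TRANSITIONS = {
    "employed":   [0.92, 0.05, 0.03],
    "unemployed": [0.30, 0.55, 0.15],
    "inactive":   [0.08, 0.07, 0.85],
}

def simulate(units, periods, rng):
    """Advance every unit through `periods` discrete time steps."""
    history = [list(units)]
    for _ in range(periods):
        # Each unit draws its next state given its current state.
        units = [rng.choices(STATES, weights=TRANSITIONS[s])[0] for s in units]
        history.append(units)
    return history

rng = random.Random(42)
base_population = ["employed"] * 60 + ["unemployed"] * 10 + ["inactive"] * 30
history = simulate(base_population, periods=5, rng=rng)
```

Here `history[t]` holds the state of every unit after `t` periods, so comparing the state frequencies across two scenarios amounts to comparing two such histories.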
Hereafter, we focus on dynamic microsimulations with discrete time. In particular, we look at microsimulations in socio-economic research, where polytomous variables are of interest. This conceptual delimitation differentiates the topic from other fields where corresponding simulations are also used, such as particle physics (Elvira, 2017) or cancer research (Jayasekera et al., 2018). The simulation setup requires the definition of transition probability sets for every unit of the synthetic replica. These sets provide the conditional likelihood of a state transition for some unit characteristic given its current state and other characteristics. The probabilities constitute stochastic processes within the synthetic replica over the simulation horizon, which need to represent real-world dynamics as accurately as possible to obtain valid simulation results. In practice, transition probabilities are usually unknown and must be estimated from survey data. This is often done via statistical models, such as the multinomial logit model (McCullagh and Nelder, 1989, Greene, 2002, Forcina, 2017).
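To make the role of the multinomial logit model concrete, the following sketch computes a transition probability set for a single unit. It assumes the usual reference-category parameterization; the covariates and coefficient values are hypothetical and are not estimates from the paper.

```python
import math

def multinomial_logit_probs(x, betas):
    """Return P(Y = j | x) for j = 0..J-1, with category 0 as the reference."""
    # Linear predictors; the reference category is fixed at 0 for identifiability.
    etas = [0.0] + [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    m = max(etas)  # subtract the maximum for numerical stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical covariates: intercept, age in decades, indicator "currently employed".
x = [1.0, 3.5, 1.0]
# Hypothetical coefficient vectors for the two non-reference states.
betas = [[-1.0, 0.1, -1.5], [-2.0, 0.3, -1.0]]
probs = multinomial_logit_probs(x, betas)
```

Including the current state among the covariates is one simple way to make the probabilities conditional on it; other specifications, such as separate models per origin state, are equally possible.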
Microsimulations that account for regional data structures are often referred to as small area or spatial microsimulations (Rahman et al., 2010, Rahman and Harding, 2016, Tanton et al., 2018). Transition probability estimation can be challenging in these settings, as there may be heterogeneous transition dynamics across regions. The estimation method must explicitly account for these differences to reproduce the system’s evolution accurately. In practice, however, the survey data used for transition probability estimation often lacks regional detail. Due to confidentiality restrictions, regional identifiers that would allow for a localization of the sample elements may be censored. In that case, regional heterogeneity cannot be observed, as spatial aggregates are indistinguishable. Further, even if regional identifiers are available, most survey samples contain only a few observations per region due to limited resources. Observed regional transition frequencies may then be inaccurate or even biased as a result of coverage problems. Ignoring these issues may cause only small deviations in the initial phase of the simulation. However, due to the complex interactions between units, the inaccuracies accumulate and reinforce themselves over the simulation horizon. Hence, local transition dynamics are misrepresented over time and the simulation fails to reproduce the evolution of the real system. The simulation outcomes are no longer reliable (Chin and Harding, 2006, Tanton, 2014). Thus, if the survey data lacks regional detail, methodological adjustments are required.
In this paper, we discuss so-called alignment methods (Bækgaard, 2002, Kelly and Percival, 2009, Li and O’Donoghue, 2014, Stephensen, 2016) to address these issues. Alignment is commonly used in microsimulations to calibrate transition dynamics to external benchmarks to prevent unrealistic projections over time. However, we show that these methods can also be used to recover local heterogeneity in transition dynamics when the primary database lacks in regional detail. For this, we consider a situation in which external benchmarks on regional transition dynamics are available (e.g. from census data). The general idea is to incorporate this regional information in the multinomial logit model such that resulting micro-level probability estimates reproduce the benchmarks when aggregated. This allows us to recover the unobserved regional heterogeneity in transition dynamics on the micro-level and calibrate the data generating process such that the simulated evolution is genuine.
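The requirement that micro-level estimates reproduce the benchmarks when aggregated can be written as a simple condition. The notation here is an assumption made for illustration and is not taken from the paper: $U_r$ is the set of units in region $r$, $N_r$ its size, $\hat{p}_{ij}$ the estimated probability that unit $i$ moves to state $j$, and $b_{rj}$ the external benchmark for region $r$ and state $j$.

```latex
\frac{1}{N_r} \sum_{i \in U_r} \hat{p}_{ij} = b_{rj}
\qquad \text{for all regions } r \text{ and states } j .
```

When the alignment is imposed through inequality constraints rather than exactly, the equality would be relaxed to an interval around $b_{rj}$.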
Two alignment methods are studied for this purpose. The first method is called logit scaling and was originally proposed by Stephensen (2016). It is an ex-post approach based on iterative proportional fitting (Bishop et al., 1970). After the initial model parameter estimation has been performed, the transition probability estimates under the model are adjusted sequentially until they reproduce the external benchmarks. Stephensen (2016) showed that the method minimizes the Kullback–Leibler divergence between original and adjusted probability estimates. The second method is new and draws from constrained maximum likelihood theory (e.g. Dong and Wets, 2000, Chatterjee et al., 2016). The regional benchmarks are used to modify the fitting process in the multinomial logit model directly. This is done by imposing a set of box constraints on the transition probabilities resulting from the parameter estimates. The underlying optimization problem is solved via sequential quadratic programming (Kraft, 1994). We present a parametric bootstrap (Reynolds and Templin, 2004, Zoubir and Iskander, 2004) to estimate the variance of the model parameter estimates and the mean squared error (MSE) of the transition probability estimates. Further, we prove that this alignment method is consistent in model parameter estimation.
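The ex-post adjustment behind logit scaling can be illustrated with a small iterative proportional fitting routine. This is a sketch under stated assumptions rather than a reproduction of the algorithm in Stephensen (2016): the function name, tolerance, and example values are hypothetical, all input probabilities are assumed to be strictly positive, and the benchmarks are assumed to sum to one.

```python
def logit_scaling(probs, targets, tol=1e-10, max_iter=1000):
    """Adjust row-stochastic `probs` until the column means match `targets`."""
    n, k = len(probs), len(targets)
    p = [row[:] for row in probs]

    def column_means(matrix):
        return [sum(row[j] for row in matrix) / n for j in range(k)]

    for _ in range(max_iter):
        # Column step: rescale each state so that its mean equals the benchmark.
        means = column_means(p)
        p = [[row[j] * targets[j] / means[j] for j in range(k)] for row in p]
        # Row step: renormalize so each unit's probabilities sum to one again.
        p = [[v / s for v in row] for row, s in ((r, sum(r)) for r in p)]
        # Stop once the aggregated probabilities reproduce the benchmarks.
        means = column_means(p)
        if max(abs(means[j] - targets[j]) for j in range(k)) < tol:
            break
    return p

# Hypothetical model-based probabilities for four units of one region.
model_probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.3, 0.2],
    [0.8, 0.1, 0.1],
]
benchmarks = [0.5, 0.3, 0.2]  # hypothetical regional benchmark shares
aligned = logit_scaling(model_probs, benchmarks)
```

Because every unit's probabilities are multiplied by the same state-specific factors before renormalization, the adjustment shifts all units of the region in the same direction while keeping their relative differences under the model.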
Both methods are described and discussed in theory. Next, they are tested in simulation experiments to evaluate their performance in a controlled environment. Finally, both methods are applied to labor force probability estimation based on the German Microcensus 2012 (Statistisches Bundesamt, 2017). We find that the inclusion of aggregated regional benchmarks allows for the recovery of local micro-level transition dynamics despite a lack of regional detail. The remainder of the paper is organized as follows. In Section 2, the basic methodology is described. This includes the presentation of a suitable statistical framework, the multinomial logit model, as well as logit scaling as a standard method for alignment. Section 3 introduces constrained maximum likelihood as an alternative alignment approach. Section 4 contains the simulation experiments, while Section 5 presents the application. Section 6 closes with some concluding remarks. Note that a preprint of this paper has been published as a working paper by Burgard et al. (2019a).
Section snippets
Statistical framework
This section introduces a statistical framework to describe transition dynamics in microsimulations. Based on these descriptions, we can present transition probability estimation methods in later sections. For illustrative purposes, we assume that the system of interest is a population of individuals and the objective is to study its labor market in regional detail. In what follows, three representations of the population are considered to conduct the simulation study.
First, let denote the
Method
Hereafter, we introduce a new method for aligning transition probability estimates to regional benchmarks in the sense of Section 2.3. It can be viewed as a special case of constrained maximum likelihood estimation (Dong and Wets, 2000, Chatterjee et al., 2016). Recall the maximum likelihood problem in (9). We modify it by adding a set of regional inequality constraints with respect to the benchmarks. The negative log-likelihood is minimized while the set of feasible solutions is limited to a
Simulation experiments
Hereafter, we present several simulation experiments in order to demonstrate the effectiveness of the proposed methods within a controlled environment. We focus on three performance aspects: (i) model parameter estimation, (ii) predictive inference, as well as (iii) uncertainty estimation.
Setup
In what follows, we apply the three methods (Mod, LS, CML) in both the binary and the polytomous case to regional transition probability estimation on real data. For this, we consider observations obtained from all individuals aged 15–85 from the German Microcensus 2012. The overall setup is similar to the (model-based) simulation study from the last section. However, note that the data is used directly rather than as a basis for an artificial population as in Section 4. The advantage of this
Conclusion
The estimation of regional transition probabilities from survey data lacking in regional detail is a major challenge when conducting dynamic spatial microsimulations. Missing regional observations can either lead to coverage problems in local samples or prevent a spatial localization of the sample observations. We showed that common estimation methods yield inefficient or even biased results in these cases. We discussed two methods that are able to account for regional heterogeneity by
Acknowledgments
We would like to thank two anonymous reviewers for their very helpful and constructive comments. Their guidance significantly improved the overall quality and readability of the paper.
Funding
This work was supported by the research project REMIKIS — Regionale Mikrosimulationen und Indikatorsysteme, funded by the Nikolaus Koch Foundation; the research unit MikroSim — Sektorenübergreifendes kleinräumiges Mikrosimulationsmodell (FOR 2559), funded by the German Research Foundation; and the
References (49)
A Fisher-scoring algorithm for fitting latent class models with individual covariates. Econom. Stat. (2017)
Dynamic microsimulation for policy analysis: Problems and solutions
Evaluating binary alignment methods in microsimulation models. J. Artif. Soc. Soc. Simul. (2014)
Improving spatial microsimulation estimates of health outcomes by including geographic indicators of health behaviour: The example of problem gambling. Health Place (2017)
Discrimination measures for discrete time-to-event predictions. Econom. Stat. (2018)
An application of bootstrapping in logistic regression model. Open Access Libr. J. (2016)
Micro-macro linkage and the alignment of transition processes. Technical Report 25 (2002)
Incorporating survey weights into binary and multinomial logistic regression models. Sci. J. Appl. Math. Stat. (2015)
Pattern Recognition and Machine Learning (2006)
Discrete Multivariate Analysis: Theory and Practice (1970)
Multinomial logistic regression algorithm. Ann. Inst. Statist. Math.
Regularized area-level modelling for robust small area estimation in the presence of unknown covariate measurement errors. Res. Papers Econ.
Conducting a dynamic microsimulation for care research: Data generation, transition probabilities and sensitivity analysis
Estimation of regional transition probabilities for spatial dynamic microsimulations from survey data lacking in regional detail. Res. Papers Econ.
Constrained maximum likelihood estimation for model calibration using summary-level information from external big data sources. J. Amer. Statist. Assoc.
Regional Dimensions: Creating Synthetic Small-Area Micro Data and Spatial Microsimulation Models. Technical Report
Generalized iterative scaling for log-linear models. Ann. Math. Stat.
Estimating density functions: A constrained maximum likelihood approach. J. Nonparametr. Stat.
Impact of detector simulation in particle physics collider experiments. Phys. Rep.
Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann. Statist.
Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc.
Summary Based on the Final Project Report of the Dynamic Microsimulation Model of the Czech Republic
Non-concave penalization in linear mixed-effect models and regularized selection of fixed effects. AStA Adv. Stat. Anal.
Econometric Analysis