Mortality forecasting using factor models: Time-varying or time-invariant factor loadings?

https://doi.org/10.1016/j.insmatheco.2021.01.006

Abstract

Many existing mortality models follow the framework of classical factor models, such as the Lee–Carter model and its variants. Latent common factors in factor models are defined as time-related mortality indices (such as $\kappa_t$ in the Lee–Carter model). Factor loadings, which capture the linear relationship between age variables and latent common factors (such as $\beta_x$ in the Lee–Carter model), are assumed to be time-invariant in the classical framework. This assumption is usually too restrictive in reality, as mortality datasets typically span a long period of time. Driving forces such as medical improvement of certain diseases, environmental changes and technological progress may significantly influence the relationship of different variables. In this paper, we first develop a factor model with time-varying factor loadings (time-varying factor model) as an extension of the classical factor model for mortality modelling. Two forecasting methods to extrapolate the factor loadings, the local regression method and the naive method, are proposed for the time-varying factor model. From the empirical data analysis, we find that the new model can capture the empirical feature of time-varying factor loadings and improve mortality forecasting over different horizons and countries. Further, we propose a novel approach based on change point analysis to estimate the optimal ‘boundary’ between short-term and long-term forecasting, which are favoured by the local linear regression and the naive method, respectively. Additionally, simulation studies are provided to show the performance of the time-varying factor model under various scenarios.

Introduction

Mortality forecasting is an important topic in various areas, such as demography, actuarial science and government policymaking. Most age-specific mortality data are high-dimensional time series. The factor model approach is one of the most popular methods to model high-dimensional time series, representing the data matrix by a few latent common factors. Common factors describe common information shared by cross-sections, while factor loadings reflect the linear relationship between the original variables and the common factors. There is a large literature discussing factor models, including but not limited to Anderson (1963), Pena and Box (1987), Stock and Watson (2002), Bai and Ng (2002), Bai (2009), Lam and Yao (2012) and Chang et al. (2018).

Many existing stochastic mortality models use the factor model approach. As an application of the classical factor model (with time-invariant factor loadings), the model of Lee and Carter (1992) (the Lee–Carter model) is one of the most prominent methods for mortality forecasting and is employed by the US Bureau of the Census as the benchmark model to predict long-run life expectancy (Hollmann et al., 1999). The common factor extracted by the Lee–Carter model is called the mortality index, and the factor loadings capture the relationship between the age variables and the mortality index. Since there is only one factor in the Lee–Carter model, Booth et al. (2002), Renshaw and Haberman (2003) and Yang et al. (2010) extended the Lee–Carter framework to incorporate more common latent factors for mortality modelling in different countries. Li and Chan (2005) proposed an outlier-adjusted model to deal with possible outliers in the mortality index by combining the Lee–Carter model with time series outlier analysis. Additionally, Booth et al. (2006) compared the Lee–Carter model with four other variants by applying them to mortality data of multiple populations. Tuljapurkar et al. (2000) examined mortality rates over five decades for the G7 countries using the Lee–Carter model. Lundström and Qvist (2004) and Booth et al. (2004) applied the Lee–Carter model to mortality data of Sweden and Australia, respectively. A summary of the variants of the Lee–Carter model is given in Booth and Tickle (2008).

In the existing literature on mortality factor models, factor loadings, which capture the relationship between age variables and latent common factors, are usually assumed to be invariant over time (we call factor models with time-invariant factor loadings ‘classical factor models’). For example, in the Lee–Carter model, there is only one factor, and the time-invariant factor loading represents the age-related sensitivity to mortality improvement (we call the classical factor model with only one factor the ‘Lee–Carter model’ throughout this paper). However, since mortality datasets typically span a long period of time, it is restrictive to assume that the factor loadings are time-invariant. Driving forces such as medical improvement of certain diseases, environmental changes, and technological progress may influence the relationship of different variables significantly. Booth et al. (2002) studied the violation of the invariance assumption in the mortality data of Australia and suggested finding an optimal fitting period during which the factor loadings are invariant, to improve the fit of the Lee–Carter model. Their approach, however, requires manual selection of the fitting period and hence discards the information in the early years. In recent years, a rich literature has developed on time-varying factor models that capture the dynamics and structural changes in factor loadings for macroeconomic variables; see, for example, Breitung and Eickmeier (2011) and Chen et al. (2014). However, to the best of our knowledge, there has been no literature on mortality modelling that allows factor loadings to change smoothly over time. Li and O’Hare (2017) and Li et al. (2015) used semi-parametric approaches to extend the CBD models (Cairns et al., 2009) by allowing for time-varying coefficients, which relaxes model assumptions and shows superior short-term forecasting performance. However, CBD models are only suitable for old-age mortality modelling, and the factors (regressors) are observable. For the Lee–Carter model and many of its variants, by contrast, the factors are unobserved, which makes modelling and estimation more difficult. To fill these gaps, we introduce a factor model with time-varying factor loadings as an extension of the classical factor model, based on Su and Wang (2017). With corresponding estimation and forecasting methods, this new model can be used for mortality modelling and forecasting.
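To make the classical one-factor setup concrete, the following is a minimal sketch, assuming the standard SVD-based fit that is commonly used for the Lee–Carter model; the estimation procedure actually used in this paper is not spelled out in this section. The function name `fit_lee_carter`, the array `log_m` and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fit_lee_carter(log_m):
    """Fit ln(m[x, t]) = a[x] + b[x] * k[t] + error by a rank-one SVD.

    log_m: array of shape (N ages, T years) holding log central death rates.
    Returns (a, b, k) with the usual normalisation sum(b) = 1.
    """
    a = log_m.mean(axis=1)                 # age-specific constant: average over time
    centered = log_m - a[:, None]          # remove the age-specific level
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    b = u[:, 0]                            # time-invariant factor loading
    k = s[0] * vt[0]                       # common factor (mortality index)
    scale = b.sum()                        # assumes the loadings do not sum to zero
    return a, b / scale, k * scale

# Synthetic, noise-free data (hypothetical values, for illustration only).
a_true = np.linspace(-6.0, -2.0, 5)
b_true = np.array([0.30, 0.25, 0.20, 0.15, 0.10])   # sums to one
k_true = np.linspace(10.0, -10.0, 30)               # sums to zero
log_m = a_true[:, None] + np.outer(b_true, k_true)

a_hat, b_hat, k_hat = fit_lee_carter(log_m)
```

In this sketch the loading `b_hat` is a single vector shared by all years, which is exactly the time-invariance assumption that the time-varying factor model relaxes.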

As the time-varying factor model allows for time-varying factor loadings, it provides more flexibility in model fitting, which, however, also poses challenges in model forecasting. Besides forecasting the common factors, the factor loadings also need to be extrapolated into the future. In this paper, we provide two methods for forecasting the factor loadings. The first uses local linear regression to roll the time-varying factor loadings forward into the future; the second takes the value of the factor loading from the last time period and holds it fixed in the future. These two forecasting methods are called the local linear regression method and the naive method, respectively. Their details are described in Section 2. Empirical results using the mortality data from different populations show that the time-varying factor model provides more accurate out-of-sample forecasting results than the Lee–Carter model.
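The two extrapolation ideas can be sketched as follows. This is only one possible reading, assuming a kernel-weighted linear fit in time anchored at the last observed year with an Epanechnikov kernel; the kernel, the bandwidth and the function names `naive_forecast` and `local_linear_forecast` are assumptions made for illustration, and the paper's own procedure is the one described in its Section 2.

```python
import numpy as np

def naive_forecast(loadings, horizon):
    """Naive method: hold the last estimated loading fixed for every future year."""
    return np.repeat(loadings[-1:], horizon, axis=0)

def local_linear_forecast(loadings, horizon, bandwidth):
    """Extrapolate each loading with a kernel-weighted linear fit at the last year.

    loadings: array of shape (T years, N ages) of estimated loadings.
    bandwidth: kernel half-width in years (an assumed tuning parameter).
    """
    T = loadings.shape[0]
    t = np.arange(T, dtype=float)
    u = (t - (T - 1)) / bandwidth
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)   # Epanechnikov weights
    X = np.column_stack([np.ones(T), t - (T - 1)])          # local level and slope
    WX = X * w[:, None]
    coef = np.linalg.solve(X.T @ WX, WX.T @ loadings)       # weighted least squares
    steps = np.arange(1, horizon + 1, dtype=float)
    return coef[0] + steps[:, None] * coef[1]               # shape (horizon, N)

# Hypothetical loadings that drift linearly over 40 years for two ages.
years = np.arange(40, dtype=float)
loadings = np.column_stack([0.10 + 0.002 * years, 0.30 - 0.001 * years])

naive = naive_forecast(loadings, horizon=5)
local = local_linear_forecast(loadings, horizon=5, bandwidth=10.0)
```

On a loading that drifts steadily, the local linear forecast continues the recent trend while the naive forecast stays flat, which gives some intuition for why the two methods can differ in accuracy across forecasting horizons.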

The existing literature suggests that different forecasting horizons may favour different models. For example, Bell (1997) found that a simple random walk with drift model for age-specific mortality rates yields the most accurate 1-step-ahead forecast compared with six other methods on the US data. Hyndman and Ullah (2007) introduced a method which outperformed the method proposed by Lee and Miller (2001) in long-term forecasting. Specifically, we have found in the literature that semi-parametric or non-parametric methods can be more suitable for short-term forecasting. For example, the semi-parametric model developed in Li et al. (2015) can produce superior 5-year-ahead forecasting results. CMI (2009) employed the P-splines model (Currie et al., 2004) for short-term forecasting to generate the initial rates of mortality improvement. Our empirical applications in Section 5.3 also suggest that the time-varying model based on local regression (non-parametric forecasting) is better for short-term forecasting, while the time-varying model based on the naive method (parametric forecasting) is better for long-term forecasting. Where, then, is the optimal ‘boundary’ between short-term (based on the local regression method) and long-term (based on the naive method) forecasting? We propose a novel approach based on change point analysis (Bai, 2010) to estimate the optimal ‘boundary’ and apply it to mortality data of multiple countries. Additionally, we conduct simulation studies to show the performance of the time-varying factor model under different scenarios and investigate under which conditions it performs better than the classical factor model.
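As a rough illustration of the change point idea, the sketch below applies a generic least-squares estimator of a single break in the mean to a one-dimensional series. It is not the paper's procedure, which is described in its Section 3. The function name `least_squares_break` and the input series (a made-up difference in forecast error between the two methods, indexed by forecasting horizon) are assumptions for illustration only.

```python
import numpy as np

def least_squares_break(y):
    """Return k minimising the squared error of y[:k] and y[k:] about their own means."""
    y = np.asarray(y, dtype=float)
    best_k, best_ssr = None, np.inf
    for k in range(1, y.size):             # each segment keeps at least one point
        left, right = y[:k], y[k:]
        ssr = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if ssr < best_ssr:
            best_k, best_ssr = k, ssr
    return best_k

# Synthetic series over horizons 1..15 (hypothetical values): error of the local
# regression method minus error of the naive method, negative at short horizons
# and positive at long horizons, with a small deterministic wiggle.
horizons = np.arange(1, 16)
diff = np.concatenate([np.full(6, -0.02), np.full(9, 0.03)])
diff = diff + 0.001 * np.sin(np.arange(15))

boundary = least_squares_break(diff)       # number of horizons before the break
```

Under these assumptions, horizons up to `boundary` would be assigned to the local regression method and longer horizons to the naive method.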

The rest of the paper is organized as follows. Section 2 introduces the time-varying factor model and its estimation approach. The forecasting methods based on the time-varying factor model are also discussed in detail. Section 3 discusses the relative advantages of the local regression method and the naive method in the short-term and long-term forecasting, respectively. We then propose an approach based on change point analysis to estimate the ‘boundary’ between short-term and long-term forecasting, which is favoured by the local regression method and the naive method, respectively. Section 4 introduces the datasets and empirical evidence of time-varying factor loadings. Section 5 applies the proposed methods to age-specific mortality data of multiple countries and shows the advantages of the proposed methods. Section 6 conducts simulation studies to investigate the performance of the time-varying factor model under different scenarios. Section 7 concludes the paper. Appendix A provides the gender-specific empirical results using the time-varying factor model. Appendix B presents the estimations of the optimal boundaries for multiple countries with a variety of forecasting horizons. Appendix C displays estimation results of the time-varying model with multiple factors.

Section snippets

Time-varying factor model

Let $m_{x,t}$ denote the central death rate for age $x$ in year $t$, where $x = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$. Thus, $\{m_{x,t}\}_{x=1,\ldots,N;\ t=1,\ldots,T}$ is an $N$-dimensional time series with $T$ observations. Since mortality rates are always positive, we use the log transformation to map the central death rates from $\mathbb{R}^{+}$ to $\mathbb{R}$ for modelling purposes. Assume $a_x$ is the age-specific constant, which is the average over time of $\ln(m_{x,t})$. Then $\ln(m_{x,t}) - a_x$ can be modelled using the classical factor model as

Optimal ‘boundary’ estimation

Under the framework of the time-varying factor model, we assume the factor loading $b_{x,t}$ is a function of time $t$. In Section 2.3, we introduced two different methods to extrapolate the factor loading. One is a naive method, which is more suitable for long-term forecasting; and the other is based on local linear regression, which is more suitable for short-term forecasting. Then can we estimate the ‘boundary’ between short-term and long-term forecasting that divides the forecasting horizon according

Data

The mortality data used in this paper are extracted from the Human Mortality Database (HMD) (University of California, 2018). Six countries are selected for the empirical analysis in Sections 4 and 5. For each country, age-sex-specific death rates are available annually for the entire population. The selected countries are shown in Table 1 along with the corresponding available time horizons, which will be used for empirical analysis.

The mortality data are

Empirical results and analysis

In the first two subsections, we present the application results of the time-varying factor model using age-specific mortality data of the US. We compare the time-varying factor models based on both the naive and local regression forecasting methods with the Lee–Carter model (the classical factor model with one factor) via out-of-sample forecasting performance. Empirical results by gender are provided in Appendix A. Section 5.3 further compares the forecasting performance across multiple countries

Monte Carlo simulations

In this section, we further investigate the prediction performance of the time-varying factor model and the classical factor model through Monte Carlo simulations. We use examples with different structures of the factor loadings to illustrate that the time-varying factor model can improve the forecasting accuracy when the ‘true’ factor loadings change over time. In addition, we explain under which conditions the naive method performs better than the local regression method even in the

Conclusion

This paper develops a time-varying factor model for mortality modelling. Two forecasting methods, the local linear regression and naive method, are used to extrapolate the factor loadings into the future. To understand the optimal forecasting horizon of the two forecasting methods, we propose an approach to estimate the optimal ‘boundary’ between short-term and long-term forecasting, which is favoured by the local linear regression and the naive method, respectively. Empirical analysis on

References (37)

  • Anderson, T.W. (1963). The use of factor analysis in the statistical analysis of multiple time series. Psychometrika.
  • Bai, J. (2009). Panel data models with interactive fixed effects. Econometrica.
  • Bai, J., et al. (2002). Determining the number of factors in approximate factor models. Econometrica.
  • Bell, W.R. (1997). Comparing and assessing time series methods for forecasting age-specific fertility and mortality rates. J. Off. Statist.
  • Booth, H., et al. (2006). Lee–Carter mortality forecasting: a multi-country comparison of variants and extensions. Demogr. Res.
  • Booth, H., et al. (2002). Applying Lee–Carter under conditions of variable mortality decline. Popul. Stud.
  • Booth, H., et al. (2008). Mortality modelling and forecasting: a review of methods. Ann. Actuar. Sci.
  • Booth, H., et al. (2004). Beyond three score years and ten: prospects for longevity in Australia. People Place.