Abstract
Beta regressions are widely used to model random variables that assume values in the standard unit interval, (0, 1), such as rates, proportions, and income concentration indices. Parameter estimation is typically performed by maximum likelihood, and hypothesis testing on the model parameters is commonly carried out using the likelihood ratio test. That test, however, may deliver inaccurate inferences when the sample size is small. It is thus important to develop alternative testing procedures that are more accurate when the sample contains only a few observations. In this paper, we consider the beta regression model with a parametric mean link function and derive two modified likelihood ratio test statistics for that class of models. We provide simulation evidence showing that the new tests usually outperform the standard likelihood ratio test in samples of small to moderate sizes. We also present and discuss two empirical applications.
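As a concrete illustration of the estimation step described above, the following sketch fits a beta regression by maximum likelihood with a fixed logit link and constant precision. This is a minimal sketch, not the paper's implementation: the simulated design, variable names, and starting values are our own assumptions.

```python
import numpy as np
from scipy import optimize, special, stats

rng = np.random.default_rng(123)

# Simulated data: logit-link beta regression with constant precision phi.
n = 2000
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
beta_true, phi_true = np.array([-1.0, 2.0]), 30.0
mu = special.expit(X @ beta_true)
y = rng.beta(mu * phi_true, (1.0 - mu) * phi_true)

def negloglik(theta):
    # Last coordinate is log(phi), which keeps the precision positive.
    b, phi = theta[:-1], np.exp(theta[-1])
    m = special.expit(X @ b)
    return -np.sum(stats.beta.logpdf(y, m * phi, (1.0 - m) * phi))

fit = optimize.minimize(negloglik, x0=np.array([0.0, 0.0, np.log(10.0)]),
                        method="BFGS")
beta_hat, phi_hat = fit.x[:-1], np.exp(fit.x[-1])
```

Parameterizing the precision on the log scale is a common design choice: it turns the constrained problem (phi > 0) into an unconstrained one, which suits quasi-Newton optimizers.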
References
Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19(6), 716–723 (1974). https://doi.org/10.1109/TAC.1974.1100705
Aranda-Ordaz, F.J.: On two families of transformations to additivity for binary response data. Biometrika 68(2), 357–363 (1981). https://doi.org/10.1093/biomet/68.2.357
Barndorff-Nielsen, O.E.: Inference on full or partial parameters based on the standardized signed log likelihood ratio. Biometrika 73(2), 307–322 (1986)
Barndorff-Nielsen, O.E.: Modified signed log likelihood ratio. Biometrika 78(3), 557–563 (1991)
Bayer, F.M., Cribari-Neto, F.: Bartlett corrections in beta regression models. J. Stat. Plan. Inference 143(3), 531–547 (2013). https://doi.org/10.1016/j.jspi.2012.08.018
Bayer, F.M., Cribari-Neto, F.: Bootstrap-based model selection criteria for beta regressions. Test 24(4), 776–795 (2015). https://doi.org/10.1007/s11749-015-0434-6
Bayer, F.M., Cribari-Neto, F.: Model selection criteria in beta regression with varying dispersion. Commun. Stat. Simul. Comput. 46(4), 729–746 (2017). https://doi.org/10.1080/03610918.2014.977918
Bayer, F.M., Cintra, R.J., Cribari-Neto, F.: Beta seasonal autoregressive moving average models. J. Stat. Comput. Simul. 88(15), 2961–2981 (2018). https://doi.org/10.1080/00949655.2018.1491974
Burnham, K.P., Anderson, D.R.: Multimodel inference: understanding AIC and BIC in model selection. Sociol. Methods Res. 33(2), 261–304 (2004)
Canterle, D.R., Bayer, F.M.: Variable dispersion beta regressions with parametric link functions. Stat. Papers 60(5), 1541–1567 (2019). https://doi.org/10.1007/s00362-017-0885-9
Colosimo, E.A., Chalita, L.V.A.S., Demétrio, C.G.B.: Tests of proportional hazards and proportional odds models for grouped survival data. Biometrics 56(4), 1233–1240 (2000)
Cribari-Neto, F., Zeileis, A.: Beta regression in R. J. Stat. Softw. 34, 1–24 (2010)
Cribari-Neto, F., Souza, T.C.: Religious belief and intelligence: worldwide evidence. Intelligence 41(5), 482–489 (2013). https://doi.org/10.1016/j.intell.2013.06.011
Cribari-Neto, F., Lucena, S.E.F.: Non-nested hypothesis testing in the class of varying dispersion beta regressions. J. Appl. Stat. 42(5), 967–985 (2015). https://doi.org/10.1080/02664763.2014.993368
Czado, C.: On link selection in generalized linear models. Advances in GLIM and Statistical Modelling, pp. 60–65. Springer, New York (1992)
Czado, C.: On selecting parametric link transformation families in generalized linear models. J. Stat. Plan. Inference 61(1), 125–140 (1997). https://doi.org/10.1016/S0378-3758(96)00150-4
Dehbi, H.M., Cortina-Borja, M., Geraci, M.: Aranda-Ordaz quantile regression for student performance assessment. J. Appl. Stat. 43(1), 58–71 (2016)
Doornik, J.A.: An Object-Oriented Matrix Language Ox 6. Timberlake Consultants Press, London (2009)
Espinheira, P.L., Ferrari, S.L.P., Cribari-Neto, F.: On beta regression residuals. J. Appl. Stat. 35(4), 407–419 (2008)
Espinheira, P.L., Ferrari, S.L.P., Cribari-Neto, F.: Bootstrap prediction intervals in beta regressions. Comput. Stat. 29(5), 1263–1277 (2014) (Erratum: 32, 2017, 1777)
Espinheira, P.L., Santos, E.G., Cribari-Neto, F.: On nonlinear beta regression residuals. Biom. J. 59(3), 445–461 (2017)
Ferrari, S.L.P., Cribari-Neto, F.: Beta regression for modelling rates and proportions. J. Appl. Stat. 31(7), 799–815 (2004). https://doi.org/10.1080/0266476042000214501
Ferrari, S.L.P., Pinheiro, E.C.: Improved likelihood inference in beta regression. J. Stat. Comput. Simul. 81(4), 431–443 (2011)
Grün, B., Kosmidis, I., Zeileis, A.: Extended beta regression in R: Shaken, stirred, mixed, and partitioned. J. Stat. Softw. 48(11), 1–25 (2012). https://doi.org/10.18637/jss.v048.i11
Lawley, D.N.: A general method for approximating to the distribution of likelihood ratio criteria. Biometrika 43(3–4), 295–303 (1956)
Lima, F.P., Cribari-Neto, F.: Bootstrap-based testing inference in beta regression. Braz. J. Probab. Stat. 34(1), 18–34 (2020). https://doi.org/10.1214/18-BJPS412
Lynn, R., Harvey, J., Nyborg, H.: Average intelligence predicts atheism rates across 137 nations. Intelligence 37(1), 11–15 (2009)
McCullagh, P., Nelder, J.A.: Generalized Linear Models, 2nd edn. Chapman and Hall, London (1989)
Nagelkerke, N.J.D.: A note on a general definition of the coefficient of determination. Biometrika 78(3), 691–692 (1991)
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)
Ospina, R., Ferrari, S.L.P.: A general class of zero-or-one inflated beta regression models. Comput. Stat. Data Anal. 56(6), 1609–1623 (2012)
Ospina, R., Cribari-Neto, F., Vasconcellos, K.L.P.: Improved point and interval estimation for a beta regression model. Comput. Stat. Data Anal. 51(2), 960–981 (2006). https://doi.org/10.1016/j.csda.2005.10.002 (Erratum: 55, 2011, 2445)
Paolino, P.: Maximum likelihood estimation of models with beta-distributed dependent variables. Polit. Anal. 9(4), 325–346 (2001)
Pereira, T.L., Cribari-Neto, F.: Detecting model misspecification in inflated beta regressions. Commun. Stat. Simul. Comput. 43(3), 631–656 (2014a). https://doi.org/10.1080/03610918.2012.712183
Pereira, T.L., Cribari-Neto, F.: Modified likelihood ratio statistics for inflated beta regressions. J. Stat. Comput. Simul. 84(5), 982–998 (2014b). https://doi.org/10.1080/00949655.2012.736514
Prater, N.H.: Estimate gasoline yields from crudes. Pet. Refin. 35(5), 236–238 (1956)
Pregibon, D.: Goodness of link tests for generalized linear models. J. R. Stat. Soc. Ser. C (Appl. Stat.) 29(1), 15–24 (1980)
Rao, C.R.: Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Math. Proc. Cambridge Philos. Soc. 44(1), 50–57 (1948)
Rao, C.R.: Linear Statistical Inference and its Applications, 2nd edn. Wiley, New York (1973)
Rocha, A.V., Cribari-Neto, F.: Beta autoregressive moving average models. Test 18, 529–545 (2009). https://doi.org/10.1007/s11749-008-0112-z
Scher, V.T., Cribari-Neto, F., Pumi, G., Bayer, F.M.: Goodness-of-fit tests for \(\beta \)ARMA hydrological time series modeling. Environmetrics 31(3), e2607 (2020). https://doi.org/10.1002/env.2607
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)
Severini, T.A.: Likelihood Methods in Statistics, 1st edn. Oxford University Press, Oxford (2000)
Simas, A.B., Barreto-Souza, W., Rocha, A.V.: Improved estimators for a general class of beta regression models. Comput. Stat. Data Anal. 54(2), 348–366 (2010)
Skovgaard, I.M.: Likelihood asymptotics. Scand. J. Stat. 28(1), 3–32 (2001)
Smithson, M., Verkuilen, J.: A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychol. Methods 11(1), 54–71 (2006)
Zuckerman, M., Silberman, J., Hall, J.A.: The relation between intelligence and religiosity: a meta-analysis and some proposed explanations. Personal. Soc. Psychol. Rev. 17(4), 325–354 (2013)
Acknowledgements
The authors thank two anonymous referees for comments, suggestions, and constructive criticism. FCN and FMB gratefully acknowledge partial financial support from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). We also acknowledge partial financial support from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
Appendix A
We provide below the observed and expected information matrices, along with the inverse of Fisher's information matrix. The observed information matrix is given by
\[ J = J(\varvec{\beta }, \varvec{\gamma }, \lambda ) = \begin{pmatrix} J_{(\varvec{\beta },\varvec{\beta })} & J_{(\varvec{\beta },\varvec{\gamma })} & J_{(\varvec{\beta },\lambda )} \\ J_{(\varvec{\gamma },\varvec{\beta })} & J_{(\varvec{\gamma },\varvec{\gamma })} & J_{(\varvec{\gamma },\lambda )} \\ J_{(\lambda ,\varvec{\beta })} & J_{(\lambda ,\varvec{\gamma })} & J_{(\lambda ,\lambda )} \end{pmatrix}, \]
where \(J_{(\varvec{\beta },\varvec{\beta })} = X^{\top } [\varPhi T V^{*} + S T^2 (Y^{*} - M^{*})] T \varPhi X\), \(J_{(\varvec{\beta },\varvec{\gamma })} = J_{(\varvec{\gamma },\varvec{\beta })}^{\top } = -X^{\top } [(Y^{*} - M^{*}) - \varPhi (M V^{*} + C)] T H Z\), \(J_{(\varvec{\beta },\lambda )} = J_{(\lambda ,\varvec{\beta })}^{\top } = X^{\top } [\varPhi ^2 V^{*} T \varvec{\rho } - \varPhi (Y^{*} - M^{*}) \varvec{w}]\), \(J_{(\varvec{\gamma },\varvec{\gamma })} = Z^{\top } \{H (M^2 V^{*} + 2M C + V^{\dagger }) + [ M (Y^{*} - M^{*}) + (Y^{\dagger } - M^{\dagger }) ] H^2 Q\} H Z\), \(J_{(\varvec{\gamma },\lambda )} = J_{(\lambda ,\varvec{\gamma })}^{\top } = -Z^{\top } [(Y^{*} -M^{*}) - \varPhi (M V^{*} + C)] H \varvec{\rho }\), and \(J_{(\lambda ,\lambda )} = [\varPhi ^2 V^{*} \varvec{\rho }^2 - \varPhi (Y^{*} - M^{*}) \varvec{\zeta } ]^{\top } \varvec{\imath }\). Here, \(Y^{*} = \mathrm{diag}(y^{*}_1, \ldots , y_n^{*})\), \(Y^{\dagger } = \mathrm{diag}(y_1^{\dagger }, \ldots , y_n^{\dagger })\), \(M^{*} = \mathrm{diag}(\mu ^{*}_1, \ldots , \mu _n^{*})\), \(M^{\dagger } = \mathrm{diag}(\mu _1^{\dagger }, \ldots , \mu _n^{\dagger })\), \(V^{*} = \mathrm{diag}(\upsilon _1^{*}, \ldots , \upsilon _n^{*})\), \(V^{\dagger } = \mathrm{diag}(\upsilon _1^{\dagger }, \ldots , \upsilon _n^{\dagger })\), \(C = \mathrm{diag}(c^{*\dagger }_1, \ldots , c^{*\dagger }_n)\), \(S = \mathrm{diag}(g''(\mu _1,\lambda ), \ldots , g''(\mu _n,\lambda ))\), \(Q = \mathrm{diag}(h''(\phi _1), \ldots , h''(\phi _n))\), \(\varvec{w} = (w_1, \ldots , w_n)^{\top }\), and \(\varvec{\zeta } = (\zeta _1, \ldots , \zeta _n)^{\top }\), where \(w_t = \partial (\partial \mu _t/\partial \eta _{1t})/\partial \lambda \), \(\zeta _t = \partial ^2\mu _t/\partial \lambda ^2\), and double primes denote second derivatives; see the end of this appendix for details.
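Block derivations such as \(J_{(\varvec{\beta },\varvec{\beta })}\) above can be sanity-checked against a finite-difference Hessian of the log-likelihood. The sketch below does this for the simplest special case — constant precision and logit link, so that \(S\) reduces to \(g''(\mu ) = (2\mu - 1)/[\mu (1-\mu )]^2\); the toy design and helper names are our own assumptions, not the paper's.

```python
import numpy as np
from scipy import special, stats

rng = np.random.default_rng(7)

# Toy design: fixed-precision beta regression with logit link.
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta, phi = np.array([0.2, -0.5]), 12.0
mu = special.expit(X @ beta)
y = rng.beta(mu * phi, (1 - mu) * phi)

def loglik(b):
    m = special.expit(X @ b)
    return np.sum(stats.beta.logpdf(y, m * phi, (1 - m) * phi))

def num_hessian(f, x, h=1e-4):
    """Central-difference Hessian of a scalar function."""
    k = x.size
    H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            ei, ej = np.eye(k)[i] * h, np.eye(k)[j] * h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

J_obs = -num_hessian(loglik, beta)  # observed information in beta, numerically

# Analytic block X' [Phi T V* + S T^2 (Y* - M*)] T Phi X for the logit link:
# T = diag(mu(1-mu)), y* = logit(y), mu* = psi(mu phi) - psi((1-mu) phi),
# v* = psi'(mu phi) + psi'((1-mu) phi), S = diag(g''(mu)).
t = mu * (1 - mu)                        # dmu/deta for the logit link
ystar = special.logit(y)
mustar = special.digamma(mu * phi) - special.digamma((1 - mu) * phi)
vstar = special.polygamma(1, mu * phi) + special.polygamma(1, (1 - mu) * phi)
s = (2 * mu - 1) / t**2                  # g''(mu) for the logit link
w = phi * t * (phi * t * vstar + s * t**2 * (ystar - mustar))
J_ana = X.T @ (w[:, None] * X)
```

The two matrices agree to finite-difference accuracy, which is a quick way to catch sign or index errors in hand-derived information blocks.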
Since \({\mathbb {E}}\left( \partial \ell _t(\mu _t, \phi _t)/\partial \mu _t\right) = {\mathbb {E}}\left( \partial \ell _t(\mu _t, \phi _t)/\partial \phi _t \right) = 0\), Fisher's information matrix is given by
\[ K = K(\varvec{\beta }, \varvec{\gamma }, \lambda ) = \begin{pmatrix} K_{(\varvec{\beta },\varvec{\beta })} & K_{(\varvec{\beta },\varvec{\gamma })} & K_{(\varvec{\beta },\lambda )} \\ K_{(\varvec{\gamma },\varvec{\beta })} & K_{(\varvec{\gamma },\varvec{\gamma })} & K_{(\varvec{\gamma },\lambda )} \\ K_{(\lambda ,\varvec{\beta })} & K_{(\lambda ,\varvec{\gamma })} & K_{(\lambda ,\lambda )} \end{pmatrix}, \]
where \(K_{(\varvec{\beta },\varvec{\beta })} = X^{\top } \varPhi ^2 V^{*} T^2 X\), \(K_{(\varvec{\beta },\varvec{\gamma })} = K_{(\varvec{\gamma },\varvec{\beta })}^{\top } = X^{\top } \varPhi (M V^{*} + C) T H Z\), \(K_{(\varvec{\beta }, \lambda )} = K_{(\lambda ,\varvec{\beta })}^{\top } = X^{\top } \varPhi ^2 V^{*} T \varvec{\rho }\), \(K_{(\varvec{\gamma },\varvec{\gamma })} = Z^{\top } H (M^2 V^{*} + 2M C + V^{\dagger }) H Z\), \(K_{(\varvec{\gamma }, \lambda )} = K_{(\lambda , \varvec{\gamma })}^{\top } = Z^{\top } \varPhi (M V^{*} + C) H \varvec{\rho }\), and \(K_{(\lambda , \lambda )} = \varvec{\rho }^{\top } \varPhi ^2 V^{*} \varvec{\rho }\).
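In the standard beta regression notation, the entries of \(M^{*}\) and \(V^{*}\) are the mean and variance of \(y_t^{*} = \log \{y_t/(1-y_t)\}\), namely \(\psi (\mu _t\phi _t) - \psi ((1-\mu _t)\phi _t)\) and \(\psi '(\mu _t\phi _t) + \psi '((1-\mu _t)\phi _t)\). The sketch below (our own toy values, not the paper's) checks these identities by Monte Carlo and assembles the \(K_{(\varvec{\beta },\varvec{\beta })}\) block for a logit link with constant precision.

```python
import numpy as np
from scipy import special

rng = np.random.default_rng(11)

# For y ~ Beta(mu*phi, (1-mu)*phi), y* = log(y/(1-y)) has
#   mean mu* = psi(mu*phi) - psi((1-mu)*phi)    (entries of M*)
#   var  v*  = psi'(mu*phi) + psi'((1-mu)*phi)  (entries of V*).
mu0, phi0 = 0.3, 15.0
p, q = mu0 * phi0, (1 - mu0) * phi0
y = rng.beta(p, q, size=200_000)
ystar = special.logit(y)
mustar = special.digamma(p) - special.digamma(q)
vstar = special.polygamma(1, p) + special.polygamma(1, q)

# Assemble K_{(beta,beta)} = X' Phi^2 V* T^2 X for a logit link with
# constant precision; unlike the observed information, it is free of y.
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
m = special.expit(X @ np.array([0.1, 0.4]))
t = m * (1 - m)                                  # diagonal of T
v = special.polygamma(1, m * phi0) + special.polygamma(1, (1 - m) * phi0)
K_bb = X.T @ ((phi0**2 * v * t**2)[:, None] * X)
```

Being free of \(y\) is exactly what distinguishes \(K\) from \(J\): the data-dependent terms of the observed information have zero expectation and drop out.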
In large samples and under the usual regularity conditions for maximum likelihood estimation, we have
\[ (\hat{\varvec{\beta }}^{\top }, \hat{\varvec{\gamma }}^{\top }, {\hat{\lambda }})^{\top } \sim {\mathcal {N}}\bigl ( (\varvec{\beta }^{\top }, \varvec{\gamma }^{\top }, \lambda )^{\top }, K^{-1}\bigr ) \]
approximately, where \(\hat{\varvec{\beta }}\), \(\hat{\varvec{\gamma }}\), and \({\hat{\lambda }}\) are the MLEs of \(\varvec{\beta }\), \(\varvec{\gamma }\), and \(\lambda \), respectively. In what follows, we shall use a result on inverses of partitioned matrices given in Rao (1973, p. 33) to obtain a closed-form expression for \(K^{-1}\).
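The partitioned-inverse result expresses the inverse of a symmetric 2×2 block matrix through a Schur complement. A short numerical check (generic random matrices, not the model's \(K\)):

```python
import numpy as np

rng = np.random.default_rng(3)

def block_inverse(A, B, D):
    """Inverse of the symmetric block matrix [[A, B], [B', D]] via the
    Schur complement W = D - B' A^{-1} B (Rao 1973, p. 33)."""
    Ainv = np.linalg.inv(A)
    W = D - B.T @ Ainv @ B
    Winv = np.linalg.inv(W)
    top_left = Ainv + Ainv @ B @ Winv @ B.T @ Ainv
    top_right = -Ainv @ B @ Winv
    return np.block([[top_left, top_right], [top_right.T, Winv]])

# Random symmetric positive definite test matrix, partitioned as 3 + 2.
p, q = 3, 2
G = rng.normal(size=(p + q, p + q))
M = G @ G.T + (p + q) * np.eye(p + q)
Minv = block_inverse(M[:p, :p], M[:p, p:], M[p:, p:])
```

The closed-form \(K^{-1}\) in the text is this identity applied with the information blocks in place of A, B, and D.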
Consider the symmetric matrix given by
\[ I = \begin{pmatrix} I_{(\varvec{\beta },\varvec{\beta })} & I_{(\varvec{\beta },\varvec{\gamma })} \\ I_{(\varvec{\gamma },\varvec{\beta })} & I_{(\varvec{\gamma },\varvec{\gamma })} \end{pmatrix}, \]
where \(I_{(\varvec{\beta },\varvec{\beta })}= K_{(\varvec{\beta },\varvec{\beta })}\), \(I_{(\varvec{\beta },\varvec{\gamma })}=K_{(\varvec{\beta },\varvec{\gamma })}\), \(I_{(\varvec{\gamma },\varvec{\beta })}=K_{(\varvec{\gamma },\varvec{\beta })}\), and \(I_{(\varvec{\gamma },\varvec{\gamma })}=K_{(\varvec{\gamma },\varvec{\gamma })}\). We denote its inverse as
\[ I^{-1} = \begin{pmatrix} I^{(\varvec{\beta },\varvec{\beta })} & I^{(\varvec{\beta },\varvec{\gamma })} \\ I^{(\varvec{\gamma },\varvec{\beta })} & I^{(\varvec{\gamma },\varvec{\gamma })} \end{pmatrix}. \]
Let \(\varDelta = X^{\top } \varPhi (M V^{*} + C) T H Z Z^{\top } H^{\top } T^{\top } (M V^{*} + C)^{\top } \varPhi ^{\top } X (X^{\top } \varPhi ^2 V^{*} T^2 X)^{-1}\). It can be shown that
with
\(I_p\) denoting the \(p \times p\) identity matrix. Additionally,
and \(I^{(\varvec{\gamma },\varvec{\gamma })} = \omega ^{-1}\). We obtain
where
with
Finally, we note the following results:
Appendix B
In this appendix, we provide details on the derivation of the corrected likelihood test statistics for the varying precision beta regression model with parametric mean link function. Note that \(\bar{\varvec{q}}\) is obtained from
For all \(t \not = u\), \(y_t\) and \(y_u\) are independent, and \({\mathbb {E}}_{\theta _1}(y_t^{*} - \mu _t^{{*}(1)}) = 0\), where evaluation at \(\theta _1\) is indicated by the superscript '(1)'. Therefore, \({\mathbb {E}}_{\theta _1}[(y_t^{*} - \mu _t^{*(1)}) (y_u^{*} - \mu _u^{*})] = 0\). Moreover, since \(\mu _t^{*(1)} - \mu _t^{*}\) is nonrandom, the cross term \({\mathbb {E}}_{\theta _1}[(y_t^{*} - \mu _t^{*(1)}) (\mu _t^{*(1)} - \mu _t^{*})]\) vanishes, and hence \({\mathbb {E}}_{\theta _1}[(y_t^{*} - \mu _t^{*(1)}) (y_t^{*} - \mu _t^{*})] = {\mathbb {E}}_{\theta _1}[(y_t^{*} - \mu _t^{*(1)})^2] + {\mathbb {E}}_{\theta _1}[(y_t^{*} - \mu _t^{*(1)}) (\mu _t^{*(1)} - \mu _t^{*})] = {\mathbb {E}}_{\theta _1}[(y_t^{*} - \mu _t^{*(1)})^2] = \upsilon _t^{*(1)}\).
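This expectation identity is easy to check by simulation: because \(\mu _t^{*(1)} - \mu _t^{*}\) is a constant, the result does not depend on which \(\mu _t^{*}\) is used. The numerical values below are our own illustration.

```python
import numpy as np
from scipy import special

rng = np.random.default_rng(2020)

# Under theta_1: y ~ Beta(mu1*phi1, (1-mu1)*phi1).
mu1, phi1 = 0.6, 20.0
p1, q1 = mu1 * phi1, (1 - mu1) * phi1
mustar1 = special.digamma(p1) - special.digamma(q1)            # mu*^(1)
vstar1 = special.polygamma(1, p1) + special.polygamma(1, q1)   # v*^(1)

# mu* evaluated at some other parameter value (any constant shift works).
mustar_other = mustar1 + 0.8

y = rng.beta(p1, q1, size=500_000)
ystar = special.logit(y)

# E_{theta_1}[(y* - mu*^(1))(y* - mu*)] should equal v*^(1) regardless of mu*.
cross = np.mean((ystar - mustar1) * (ystar - mustar_other))
```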
After some algebra, we arrive at
Thus,
Using Equations (4) and (6), we obtain
Hence,
Similarly, from (4) and (7) it follows that
Thus,
We shall now move to the derivation of \({\bar{\varUpsilon }}\), which is obtained from
From Eqs. (5), (6), and (7), we obtain
Appendix C
In this appendix, we provide details on the derivation of the score test statistic for testing \({\mathcal {H}}_0: \lambda = 1\) (logit link).
The general form of the score test statistic is \( S_R=U(\tilde{\varvec{\theta }})^{\top } K^{-1}(\tilde{\varvec{\theta }})U(\tilde{\varvec{\theta }}), \) where \(U(\tilde{\varvec{\theta }})\) is the score vector and \(K^{-1}(\tilde{\varvec{\theta }})\) is the inverse of Fisher's information matrix, both evaluated at the restricted maximum likelihood estimator \(\tilde{\varvec{\theta }}\). When interest lies in testing \({\mathcal {H}}_0: \lambda =1\) in the varying precision beta regression model with parametric link function, the score test statistic can be expressed as
where
and
Under the null hypothesis and when \(n\) is large, \(S_R\) is approximately \(\chi ^2_1\) distributed.
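The recipe \(S_R = U^{\top } K^{-1} U\) evaluated at the restricted estimate can be sketched numerically. The toy below is our own assumption-laden illustration, not the paper's \(\lambda \) test: it tests a regression coefficient under a fixed logit link and substitutes the observed information (finite-difference Hessian) for \(K\).

```python
import numpy as np
from scipy import optimize, special, stats

rng = np.random.default_rng(99)

# Data generated under H0: the slope beta_2 equals 0 (logit link, fixed phi).
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
mu = special.expit(X @ np.array([0.5, 0.0]))
y = rng.beta(mu * 10.0, (1 - mu) * 10.0)

def loglik(theta):
    b, phi = theta[:2], np.exp(theta[2])
    m = special.expit(X @ b)
    return np.sum(stats.beta.logpdf(y, m * phi, (1 - m) * phi))

# Restricted MLE: maximize over (beta_1, log phi) with beta_2 pinned at 0.
res = optimize.minimize(lambda t: -loglik(np.array([t[0], 0.0, t[1]])),
                        x0=np.array([0.0, np.log(5.0)]), method="BFGS")
theta_tilde = np.array([res.x[0], 0.0, res.x[1]])

# Score vector and information (observed, by finite differences) at theta_tilde.
U = optimize.approx_fprime(theta_tilde, loglik, 1e-6)

def num_hessian(f, x, h=1e-4):
    k = x.size
    H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            ei, ej = np.eye(k)[i] * h, np.eye(k)[j] * h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

K = -num_hessian(loglik, theta_tilde)
S_R = float(U @ np.linalg.inv(K) @ U)
p_value = float(stats.chi2.sf(S_R, df=1))
```

A practical appeal of \(S_R\), as here, is that only the restricted model must be fitted; the unrestricted MLE is never computed.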
Cite this article
Rauber, C., Cribari-Neto, F. & Bayer, F.M. Improved testing inferences for beta regressions with parametric mean link function. AStA Adv Stat Anal 104, 687–717 (2020). https://doi.org/10.1007/s10182-020-00376-3