
Characterizations and generalizations of the negative binomial distribution

Original paper, Computational Statistics

Abstract

In this paper, we give a detailed description of the Zero-Modified Negative Binomial (ZMNB) distribution for analyzing count data. In particular, we study the characterizations and properties of this distribution, whose main advantage is its flexibility: it can model a wide range of overdispersed and underdispersed count data, whether or not the dispersion is caused by zero-modification (i.e., the inflation or deflation of zeros), without requiring prior knowledge of any of these data characteristics. We derive maximum likelihood estimators of the model parameters based on the positive observations and evaluate the loss of efficiency incurred by this procedure. We illustrate the suitability of this distribution on real data sets with different types of zero-modification.
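The abstract does not state the probability mass function, so the sketch below assumes one common parameterization, which may not match the paper's: a negative binomial with mean \(\mu\) and dispersion \(\phi\), so that \(\mathrm{Var}(Y) = \mu + \mu^2/\phi\) and \(f_0 = P(Y=0) = (\phi/(\phi+\mu))^{\phi}\), zero-modified through a parameter \(p\) with \(0 \le p \le 1/(1-f_0)\). Under this form \(P(Y>0) = p(1-f_0)\), so the likelihood splits into a binomial term for "zero versus positive" and a zero-truncated NB term that involves only \(\mu\) and \(\phi\). This is why \(\mu\) and \(\phi\) can be estimated from the positive observations alone, after which \(\hat p = (n-n_0)/[n(1-f_0(\hat\mu,\hat\phi))]\), with \(n_0\) the number of zeros. The function names are hypothetical.

```python
import math


def nb_pmf(y, mu, phi):
    """Assumed NB pmf with mean mu and dispersion phi (variance mu + mu**2 / phi)."""
    return math.exp(math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
                    + phi * math.log(phi / (phi + mu))
                    + y * math.log(mu / (phi + mu)))


def zmnb_pmf(y, mu, phi, p):
    """Zero-modified NB: p < 1 inflates zeros, 1 < p <= 1/(1 - f0) deflates them,
    p = 1 gives the NB itself and p = 1/(1 - f0) gives the zero-truncated NB."""
    f0 = nb_pmf(0, mu, phi)
    if not 0.0 <= p <= 1.0 / (1.0 - f0):
        raise ValueError("p must lie in [0, 1/(1 - f0)]")
    if y == 0:
        return 1.0 - p + p * f0
    return p * nb_pmf(y, mu, phi)


def zmnb_p_hat(counts, mu_hat, phi_hat):
    """MLE of p once (mu_hat, phi_hat) have been fitted to the positive counts."""
    n = len(counts)
    n_pos = sum(1 for y in counts if y > 0)
    return n_pos / (n * (1.0 - nb_pmf(0, mu_hat, phi_hat)))
```

Because \(p\) only rescales the positive part, one family covers zero inflation, zero deflation, the ordinary NB, and the zero-truncated NB as boundary cases.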


Figs. 1-8 (images not available in this preview)




Acknowledgements

We are indebted to the Editorial Board and the referees for their valuable comments, criticisms, and suggestions, which have substantially improved the manuscript.

Author information


Corresponding author

Correspondence to Katiane S. Conceição.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Katiane S. Conceição is supported by the Brazilian organization Fundação de Amparo à Pesquisa do Estado de São Paulo - FAPESP (2019/22412-5); Marinho G. Andrade is supported by the Brazilian organization Fundação de Amparo à Pesquisa do Estado de São Paulo - FAPESP (2019/21766-8); Francisco Louzada is supported by the Brazilian organizations CNPq (301976/2017-1) and FAPESP (2013/07375-0).

Fisher score method

We use the Fisher scoring method to compute the maximum likelihood estimates of the parameters \(\mu \) and \(\phi \) of the ZMNB (or ZTNB) distribution, through the iterative equations:

$$\begin{aligned} \left( \begin{array}{c} \mu ^{(j+1)}\\ \phi ^{(j+1)} \end{array} \right) =\left( \begin{array}{c} \mu ^{(j)}\\ \phi ^{(j)} \end{array} \right) +\left[ \begin{array}{cc} \mathcal {J}^{+(j)}_{\mu \mu } & \mathcal {J}^{+(j)}_{\mu \phi }\\ \mathcal {J}^{+(j)}_{\phi \mu } & \mathcal {J}^{+(j)}_{\phi \phi } \end{array}\right] ^{-1}\times \left( \begin{array}{c} \mathcal {U}_{\mu }^{+(j)}\\ \mathcal {U}_{\phi }^{+(j)} \end{array} \right) . \end{aligned}$$

Here \(\mathcal {U}^{+(j)}\) and \(\mathcal {J}^{+(j)}\) denote the score vector and the Fisher information matrix evaluated at the \(j\)-th iterate. The iterations stop, and the current values are taken as the maximum likelihood estimates, when \((\mathcal {U}_{\mu }^{+(j)})^2+(\mathcal {U}_{\phi }^{+(j)})^2< \varepsilon \), where \(\varepsilon \) is a pre-established tolerance; that is, when the score vector is numerically close to zero. A detailed description of the Fisher scoring algorithm is presented as follows:

Algorithm (Fisher scoring procedure): figure not available in this preview.
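Since the algorithm figure is not reproduced here, the following pure-Python sketch illustrates the iteration above for a zero-truncated NB sample; it is not the authors' implementation. It assumes the same parameterization as a negative binomial with mean \(\mu\) and dispersion \(\phi\) (variance \(\mu + \mu^2/\phi\)), which may differ from the paper's. The expected information is computed numerically as \(\sum_{y\ge 1} g(y)\,s(y)s(y)^\top\) over a truncated series rather than from closed-form expressions, the stopping rule is the one stated in the text, and a step-halving safeguard (an addition, not part of the described method) keeps \(\mu, \phi > 0\) and prevents the log-likelihood from decreasing. All function names are hypothetical.

```python
import math
import random


def _trunc_terms(mu, phi):
    """f0 = P(Y=0) under NB(mu, phi) and the score terms of -log(1 - f0)."""
    f0 = (phi / (phi + mu)) ** phi
    w = f0 / (1.0 - f0)
    a_mu = -w * phi / (phi + mu)
    a_phi = w * (math.log(phi / (phi + mu)) + mu / (phi + mu))
    return f0, a_mu, a_phi


def _obs_score(y, mu, phi, psi_diff, a_mu, a_phi):
    """Score of one positive count; psi_diff = sum_{k<y} 1/(phi + k)."""
    s_mu = y / mu - (y + phi) / (phi + mu) + a_mu
    s_phi = psi_diff + math.log(phi / (phi + mu)) + (mu - y) / (phi + mu) + a_phi
    return s_mu, s_phi


def ztnb_loglik(ys, mu, phi):
    f0 = (phi / (phi + mu)) ** phi
    ll = sum(math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
             + phi * math.log(phi / (phi + mu)) + y * math.log(mu / (phi + mu))
             for y in ys)
    return ll - len(ys) * math.log(1.0 - f0)


def ztnb_score(ys, mu, phi):
    """Total score vector (U_mu, U_phi) of the positive sample ys."""
    _, a_mu, a_phi = _trunc_terms(mu, phi)
    u_mu = u_phi = 0.0
    for y in ys:
        psi = sum(1.0 / (phi + k) for k in range(y))  # digamma(y+phi) - digamma(phi)
        s_mu, s_phi = _obs_score(y, mu, phi, psi, a_mu, a_phi)
        u_mu, u_phi = u_mu + s_mu, u_phi + s_phi
    return u_mu, u_phi


def ztnb_expected_info(mu, phi, tol=1e-10, y_cap=100_000):
    """Per-observation expected information sum_{y>=1} g(y) s(y) s(y)^T."""
    f0, a_mu, a_phi = _trunc_terms(mu, phi)
    q = mu / (phi + mu)
    f, psi, mass = f0, 0.0, 0.0
    j_mm = j_mp = j_pp = 0.0
    for y in range(1, y_cap + 1):
        f *= (y - 1 + phi) / y * q        # NB pmf recursion: f(y) from f(y-1)
        psi += 1.0 / (phi + y - 1)
        g = f / (1.0 - f0)                # zero-truncated pmf g(y)
        s_mu, s_phi = _obs_score(y, mu, phi, psi, a_mu, a_phi)
        j_mm += g * s_mu * s_mu
        j_mp += g * s_mu * s_phi
        j_pp += g * s_phi * s_phi
        mass += g
        if mass > 1.0 - tol:              # remaining tail mass is negligible
            break
    return j_mm, j_mp, j_pp


def fisher_scoring_ztnb(ys, mu, phi, eps=1e-6, max_iter=200):
    n = len(ys)
    for it in range(max_iter):
        u_mu, u_phi = ztnb_score(ys, mu, phi)
        if u_mu ** 2 + u_phi ** 2 < eps:  # stopping rule from the text
            return mu, phi, it
        j_mm, j_mp, j_pp = ztnb_expected_info(mu, phi)
        det = n * (j_mm * j_pp - j_mp ** 2)   # (n J)^{-1} U = adj(J) U / det
        d_mu = (j_pp * u_mu - j_mp * u_phi) / det
        d_phi = (j_mm * u_phi - j_mp * u_mu) / det
        # Safeguard (not in the text): halve the step until mu, phi > 0
        # and the log-likelihood does not decrease.
        ll_old, step = ztnb_loglik(ys, mu, phi), 1.0
        while True:
            new_mu, new_phi = mu + step * d_mu, phi + step * d_phi
            if (new_mu > 0 and new_phi > 0
                    and ztnb_loglik(ys, new_mu, new_phi) >= ll_old - 1e-9):
                break
            step /= 2.0
            if step < 1e-12:
                raise RuntimeError("step halving failed")
        mu, phi = new_mu, new_phi
    raise RuntimeError("Fisher scoring did not converge")


def simulate_ztnb(n, mu, phi, seed=0):
    """Illustrative helper: n positive draws from NB(mu, phi) via gamma-Poisson."""
    rng = random.Random(seed)
    ys = []
    while len(ys) < n:
        lam = rng.gammavariate(phi, mu / phi)   # shape phi, scale mu / phi
        k, prod, limit = 0, rng.random(), math.exp(-lam)
        while prod > limit:                     # Knuth's Poisson sampler
            k, prod = k + 1, prod * rng.random()
        if k > 0:
            ys.append(k)
    return ys
```

For example, with `ys = simulate_ztnb(5000, 3.0, 2.0, seed=2022)`, calling `fisher_scoring_ztnb(ys, sum(ys) / len(ys), 1.0)` should return estimates close to the simulating values \(\mu = 3\), \(\phi = 2\). Computing the expected information by summation keeps the sketch short at the cost of a series evaluation per iteration.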


About this article


Cite this article

Conceição, K.S., Andrade, M.G., Louzada, F. et al. Characterizations and generalizations of the negative binomial distribution. Comput Stat 37, 1255–1286 (2022). https://doi.org/10.1007/s00180-021-01150-y


  • DOI: https://doi.org/10.1007/s00180-021-01150-y
