Abstract
In this paper, we propose new semiparametric procedures for inference on linear functionals in the context of two semicontinuous populations. The distribution of each semicontinuous population is characterized by a mixture of a discrete point mass at zero and a continuous skewed positive component. To utilize the information from both populations, we model the positive components of the two mixture distributions via a semiparametric density ratio model. Under this model setup, we construct the maximum empirical likelihood estimators of the linear functionals. The asymptotic normality of the proposed estimators is established and is used to construct confidence regions and perform hypothesis tests for these functionals. We show that the proposed estimators are more efficient than the fully nonparametric ones. Simulation studies demonstrate the advantages of our method over existing methods. Two real-data examples are provided for illustration.
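As a sketch of the setup described above, the two mixture distributions and the density ratio link can be written as follows. The notation is assumed here for illustration and is not taken from this abstract: the symbols \(\nu_i\), \(G_i\), \(\alpha\), \(\beta\), \(q\), and \(u\) are all hypothetical names.
\[ F_i(x) = \nu_i \, I(x \ge 0) + (1 - \nu_i) \, I(x > 0) \, G_i(x), \quad i = 0, 1, \]
\[ dG_1(x) = \exp\{\alpha + \beta^\top q(x)\} \, dG_0(x). \]
Here \(\nu_i\) denotes the proportion of zeros in population \(i\), \(G_i\) is the distribution of the positive component, and \(q(x)\) is a prespecified vector of basis functions. Under this notation, a linear functional would take a form such as \(\int u(x) \, dG_i(x)\) for a given function \(u\); for example, \(u(x) = x\) gives the mean of the positive component.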
Acknowledgements
The authors thank the Chief Editor, the Associate Editor, and two reviewers for their very careful reading and a number of helpful comments. The authors are grateful to Dr. Changbao Wu for his constructive and helpful comments. Dr. Wang’s work is supported in part by National Natural Science Foundation of China Grants 12001454, 11971404, Humanities and Social Sciences Foundation of the Ministry of Education of China Grant 19YJC910005, and Natural Science Foundation of Fujian Province Grant 2020J01031. Dr. Li’s work is supported in part by the Natural Sciences and Engineering Research Council of Canada Grant RGPIN-2020-04964.
About this article
Cite this article
Yuan, M., Wang, C., Lin, B. et al. Semiparametric inference on general functionals of two semicontinuous populations. Ann Inst Stat Math 74, 451–472 (2022). https://doi.org/10.1007/s10463-021-00804-4
DOI: https://doi.org/10.1007/s10463-021-00804-4