A split questionnaire survey design in the context of statistical matching

  • Original Paper
  • Statistical Methods & Applications

Abstract

In this paper, we tackle the problem of splitting a long (potentially time-consuming) questionnaire into two parts, so that each participant responds to only a fraction of the questions while all respondents receive a common portion of questions. We propose a method that combines regression models fitted to the two independent samples (questionnaires) in the survey. Each sample includes the common response variable Y and the common covariate x, while two vectors of specific covariates, z and w, are recorded such that no single sampling unit has answered both z and w. This corresponds to the problem of statistical matching, which we tackle under the assumption of conditional independence. In the statistical matching context, we use a macro approach to estimate the parameters of a regression model; that is, we estimate the joint distribution of all variables of interest from the available data by exploiting the conditional independence assumption. Concretely, we fit three regression models, each with the same response variable, and combine them to obtain a prediction model that includes all covariates. We evaluate the performance of the proposed method in simulation studies and in a real data example, where it gives better results than commonly used alternative methods. The proposed routine is easy to apply in practice, and it requires neither a model for the covariates themselves nor an imputation model for the missing covariate vectors z and w.
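
To make the idea of combining three regressions concrete, the following is a minimal sketch and not necessarily the estimator used in the paper. It assumes a linear model for Y in (x, z, w), scalar z and w, conditional independence of z and w given x, and conditional means of z and w that are linear in x. Under these assumptions, the coefficients of z and w can be read off the two sample-specific fits, and the intercept and the coefficient of x follow from "model A plus model B minus model C". The function names (`ols`, `combine_split_fits`, `simulate_and_combine`) and all simulated parameter values are hypothetical and chosen for illustration only.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients, with an intercept column prepended."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def combine_split_fits(x_a, z_a, y_a, x_b, w_b, y_b):
    """Combine three fits with the same response Y (illustrative sketch).

    Model A (sample S_a only): Y ~ x + z
    Model B (sample S_b only): Y ~ x + w
    Model C (both samples):    Y ~ x
    """
    a = ols(np.column_stack([x_a, z_a]), y_a)
    b = ols(np.column_stack([x_b, w_b]), y_b)
    c = ols(np.concatenate([x_a, x_b]), np.concatenate([y_a, y_b]))
    # Assumption: z and w are conditionally independent given x and have
    # conditional means linear in x, so the omitted-covariate terms in
    # A and B are exactly the ones that C contains twice.
    return {
        "intercept": a[0] + b[0] - c[0],
        "x": a[1] + b[1] - c[1],
        "z": a[2],
        "w": b[2],
    }


def simulate_and_combine(n=20000, seed=0):
    """Simulate two split samples and recover the full-model coefficients."""
    rng = np.random.default_rng(seed)

    def draw(size):
        x = rng.normal(size=size)
        z = 0.5 + 0.8 * x + rng.normal(size=size)   # depends on x only
        w = -0.3 + 0.4 * x + rng.normal(size=size)  # depends on x only
        y = 1.0 + 2.0 * x + 1.5 * z - 1.0 * w + rng.normal(size=size)
        return x, z, w, y

    x_a, z_a, _, y_a = draw(n)  # S_a: w is missing by design
    x_b, _, w_b, y_b = draw(n)  # S_b: z is missing by design
    return combine_split_fits(x_a, z_a, y_a, x_b, w_b, y_b)


est = simulate_and_combine()
print(est)
```

In this simulation the true coefficients are 1.0 (intercept), 2.0 (x), 1.5 (z) and -1.0 (w), and the combined estimates should land close to them. No model for z or w given x is ever fitted, and no missing values are imputed. If z and w remain dependent after conditioning on x, the combination is biased, which is why the conditional independence assumption matters here.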

Author information

Corresponding author

Correspondence to Mehboob Ali.

Appendix: Variables list for rent data example

Common variables Y, x (observed in both samples \(\hbox{S}_{\mathrm{a}}\) and \(\hbox{S}_{\mathrm{b}}\)):

  • \(\hbox{Y} =\) rent per square meter (in Euros)
  • \(\hbox{x} =\) the floor space

Component 1 z (observed in sample \(\hbox{S}_{\mathrm{a}}\), missing in sample \(\hbox{S}_{\mathrm{b}}\)):

  • \(\hbox{z1} = 1\) if the apartment does not have an upmarket kitchen
  • \(\hbox{z2} = 1\) if the apartment has an open kitchen
  • \(\hbox{z3} = 1\) if the apartment lies in an apartment-type building
  • \(\hbox{z4} = 1\) if the apartment lies in an old building
  • \(\hbox{z5} = 1\) if the apartment is located in back premises
  • \(\hbox{z6} = 1\) if the apartment has standard central heating
  • \(\hbox{z7} = 1\) if the apartment has underfloor heating

Component 2 w (observed in sample \(\hbox{S}_{\mathrm{b}}\), missing in sample \(\hbox{S}_{\mathrm{a}}\)):

  • \(\hbox{w1} = 1\) if the apartment has good bathroom equipment
  • \(\hbox{w2} = 1\) if the apartment lies in an average residential location
  • \(\hbox{w3} = 1\) if the apartment has a second rest room
  • \(\hbox{w4} = 1\) if the apartment has a new floor
  • \(\hbox{w5} = 1\) if the apartment has a bad floor
  • \(\hbox{w6} = 1\) if the apartment has a good floor
  • \(\hbox{w7} = 1\) if the apartment is on the ground floor

Cite this article

Ali, M., Kauermann, G. A split questionnaire survey design in the context of statistical matching. Stat Methods Appl 30, 1219–1236 (2021). https://doi.org/10.1007/s10260-020-00554-2

  • DOI: https://doi.org/10.1007/s10260-020-00554-2
