Abstract
Despite the high complexity of the real world, linear regression still plays an important role in estimating the parameters that model a physical relationship between two or more variables. The precision of the estimated parameters, commonly regarded as an indicator of solution quality, is conventionally obtained from the inverse of the normal-equations matrix, which requires intensive computation when the number of observations is large. In addition, the impact of the distribution of the observations on parameter precision is rarely reported in the literature. In this paper, we propose a new methodology that models the distribution of the observations for linear regression in order to predict the parameter precision before data are actually collected and the regression is performed. The precision analysis can be readily performed given a hypothesized data distribution. The methodology has been verified with several simulated and real datasets. The results show that the empirical and model-predicted precisions agree closely, with discrepancies of up to 6% and 3.4% for the simulated and real datasets, respectively. Simulations demonstrate that these differences are due simply to finite sample size. Simulation also demonstrates that the method is relatively insensitive to noise in the independent regression variables, which causes deviations from the data distribution function. The proposed methodology allows straightforward prediction of the parameter precision from the distribution of the observations, as determined by their numerical limits and geometry, which greatly simplifies the design of various experimental setups commonly used in geodetic surveying, such as LiDAR data collection.
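The central idea of the abstract, predicting parameter precision from a hypothesized distribution of x before any data are collected, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes x is uniformly distributed on [a1, a2] and error-free, and that the noise in y is independent, homoscedastic and Gaussian. The function names `predicted_std_uniform` and `empirical_std` are hypothetical. The predicted standard deviations come from sigma^2 N^-1, with N built from n times the moments of p(x) instead of sums over actual observations; a Monte Carlo run of ordinary least-squares fits serves as the empirical reference.

```python
import numpy as np


def predicted_std_uniform(a1, a2, n, sigma):
    """Predicted standard deviations of (m_hat, b_hat) for y = m*x + b + e.

    Assumes x ~ Uniform[a1, a2], error-free x and i.i.d. noise e ~ N(0, sigma^2).
    The sums in the normal matrix [[sum x^2, sum x], [sum x, n]] are replaced
    by n times the corresponding moments of the density p(x) = 1 / (a2 - a1).
    """
    mu1 = (a1 + a2) / 2.0                        # integral of x p(x) dx
    mu2 = (a1**2 + a1 * a2 + a2**2) / 3.0        # integral of x^2 p(x) dx
    N = n * np.array([[mu2, mu1], [mu1, 1.0]])   # modelled normal-equations matrix
    cov = sigma**2 * np.linalg.inv(N)            # covariance of (m_hat, b_hat)
    return np.sqrt(np.diag(cov))


def empirical_std(a1, a2, n, sigma, m=0.5, b=2.0, trials=4000, seed=1):
    """Monte Carlo reference: fit many simulated datasets by ordinary least squares."""
    rng = np.random.default_rng(seed)
    est = np.empty((trials, 2))
    for t in range(trials):
        x = rng.uniform(a1, a2, n)
        y = m * x + b + rng.normal(0.0, sigma, n)
        A = np.column_stack([x, np.ones(n)])     # design matrix for (m, b)
        est[t] = np.linalg.lstsq(A, y, rcond=None)[0]
    return est.std(axis=0, ddof=1), est.mean(axis=0)


pred = predicted_std_uniform(-1.0, 1.0, n=200, sigma=0.1)
emp, mean = empirical_std(-1.0, 1.0, n=200, sigma=0.1)
print("predicted std (m, b):", pred)
print("empirical std (m, b):", emp)
```

With a1 = -1, a2 = 1, n = 200 and sigma = 0.1, the modelled values are about 0.0122 for m and 0.0071 for b; the Monte Carlo standard deviations should agree to within a few percent, the remainder coming from the finite numbers of observations and trials. Only the moments mu1 and mu2 are specific to the uniform case; another hypothesized p(x) would change those two lines and nothing else.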
Acknowledgements
This work is supported by the Natural Sciences and Engineering Research Council of Canada (RGPIN-2018-03775). Data for example 2 have been made available by the City of Calgary. The authors wish to thank the anonymous reviewers who provided very constructive comments to help improve our paper.
Funding
This work is supported by the Natural Sciences and Engineering Research Council of Canada (RGPIN-2018-03775).
Author information
Contributions
DDL designed the research; DDL, TOC and DL performed the research; DDL, TOC and DB analysed data; DDL and TOC wrote the paper.
Ethics declarations
Conflicts of interest
None.
Appendix A
In this appendix, it is proven that the estimates of the slope parameter, m, and the y-axis intercept parameter, b, obtained from the linear least-squares model incorporating the x-observation distribution function, p(x), are unbiased. To keep the proof general, the limits of integration [a1, a2] (a1 < a2) are not required to be symmetric about the origin. The distribution function is assumed to be centred at x = 0.
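The unbiasedness claim, including the case of limits that are not symmetric about the origin, can be checked numerically alongside the proof. The sketch below is illustrative only. It assumes a triangular distribution on [-2, 1] with mode 1, chosen solely because its mean is exactly zero despite the unequal limits, and it assumes estimators in the moment form m_hat = (1/n) sum(x_i y_i) / integral(x^2 p(x) dx) and b_hat = (1/n) sum(y_i). That form is our reading of the centred, decoupled normal equations, not a transcription of the paper's Eqs. 30 and 31, and `mean_estimates` is a hypothetical helper.

```python
import numpy as np


def mean_estimates(m=0.5, b=2.0, sigma=0.1, n=200, trials=4000, seed=7):
    """Average moment-form estimates of (m, b) over many simulated datasets.

    Hypothetical setup: x ~ triangular on [a1, a2] = [-2, 1] with mode 1, which
    has zero mean (centred at x = 0) although the limits are not symmetric.
    """
    a1, c, a2 = -2.0, 1.0, 1.0                     # left limit, mode, right limit
    mu1 = (a1 + c + a2) / 3.0                      # integral of x p(x) dx (0 here)
    var = (a1**2 + c**2 + a2**2 - a1 * c - a1 * a2 - c * a2) / 18.0
    mu2 = var + mu1**2                             # integral of x^2 p(x) dx (0.5 here)

    rng = np.random.default_rng(seed)
    x = rng.triangular(a1, c, a2, size=(trials, n))
    y = m * x + b + rng.normal(0.0, sigma, size=(trials, n))

    # Moment-form estimators for a centred distribution (mu1 = 0): the normal
    # equations decouple, so m_hat = mean(x*y) / mu2 and b_hat = mean(y).
    m_hat = (x * y).mean(axis=1) / mu2
    b_hat = y.mean(axis=1)
    return mu1, mu2, m_hat.mean(), b_hat.mean()


mu1, mu2, m_bar, b_bar = mean_estimates()
print(f"mu1 = {mu1:.3f}, mu2 = {mu2:.3f}, mean m_hat = {m_bar:.3f}, mean b_hat = {b_bar:.3f}")
```

If the centring condition were dropped while keeping these estimators, the expected slope would become m + b * mu1 / mu2, which is why the proof requires p(x) to be centred at x = 0.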
The expected value of the estimated slope follows from the solution of the least-squares normal equations (Eq. 9); substituting the specific forms given by Eqs. 30 and 31 results in
Since x has been assumed to be error-free, this expression reduces to
The numerator is analysed by substituting the model of Eq. 1
Under the stated assumptions, the numerator reduces to
Dividing Eq. A5 by the denominator of Eq. A3 gives
Therefore, the estimated slope parameter is unbiased.
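The slope argument can be summarised in continuous form as a short sketch. It assumes the model y = mx + b + e with E(e) = 0, error-free x, a density normalised to unit area over [a1, a2], the centring condition that the integral of x p(x) over [a1, a2] vanishes, and a slope estimate equal to the ratio in the first line. That ratio is our assumption for the decoupled, centred normal equations, not a reproduction of the appendix equations. Because x is error-free, the expectation passes inside the integral and acts only on y.

```latex
\begin{align*}
E(\hat{m}) &= \frac{\int_{a_1}^{a_2} x\,E(y)\,p(x)\,dx}{\int_{a_1}^{a_2} x^2\,p(x)\,dx}
            = \frac{\int_{a_1}^{a_2} x\,(m x + b)\,p(x)\,dx}{\int_{a_1}^{a_2} x^2\,p(x)\,dx} \\
           &= \frac{m\int_{a_1}^{a_2} x^2\,p(x)\,dx + b\int_{a_1}^{a_2} x\,p(x)\,dx}{\int_{a_1}^{a_2} x^2\,p(x)\,dx}
            = \frac{m\int_{a_1}^{a_2} x^2\,p(x)\,dx + b \cdot 0}{\int_{a_1}^{a_2} x^2\,p(x)\,dx}
            = m
\end{align*}
```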
The expected value of the intercept parameter is given by
Since the denominator is unity by definition (Eq. 20), it is sufficient to analyse only the numerator
Therefore, the estimated y-axis intercept parameter is also unbiased.
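The intercept step admits a similar continuous-form sketch. It assumes y = mx + b + e with E(e) = 0, error-free x, a unit-area density (the unit denominator noted in the text) and a centred density whose first moment over [a1, a2] is zero. Writing the intercept estimate as a p(x)-weighted average of y is our assumption, not a transcription of the paper's equations.

```latex
\begin{align*}
E(\hat{b}) &= \frac{\int_{a_1}^{a_2} E(y)\,p(x)\,dx}{\int_{a_1}^{a_2} p(x)\,dx}
            = \int_{a_1}^{a_2} (m x + b)\,p(x)\,dx \\
           &= m\int_{a_1}^{a_2} x\,p(x)\,dx + b\int_{a_1}^{a_2} p(x)\,dx
            = m \cdot 0 + b \cdot 1
            = b
\end{align*}
```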
About this article
Cite this article
Lichti, D.D., Chan, T.O. & Belton, D. Linear regression with an observation distribution model. J Geod 95, 23 (2021). https://doi.org/10.1007/s00190-021-01484-x
DOI: https://doi.org/10.1007/s00190-021-01484-x