
Linear regression with an observation distribution model

  • Original Article
Journal of Geodesy

Abstract

Despite the high complexity of the real world, linear regression still plays an important role in estimating the parameters of a physical relationship between two or more variables. The precision of the estimated parameters is commonly taken as an indicator of solution quality. It is conventionally obtained from the inverse of the normal-equations matrix, which requires intensive computation when the number of observations is large. Moreover, the impact of the distribution of the observations on parameter precision is rarely reported in the literature. In this paper, we propose a new methodology that models the distribution of the observations in linear regression. The aim is to predict the parameter precision before the data are collected and the regression is performed. Given a hypothesized data distribution, the precision analysis can be performed readily. The methodology has been verified with several simulated and real datasets. The empirical and model-predicted precisions agree closely, with discrepancies of up to 6% and 3.4% for the simulated and real datasets, respectively. Simulations demonstrate that these differences are due simply to the finite sample size. Simulations also demonstrate that the method is relatively insensitive to noise in the independent regression variables, which causes deviations from the data distribution function. The proposed methodology allows the parameter precision to be predicted directly from the distribution of the observations, as determined by their numerical limits and geometry. This greatly simplifies design procedures for the various experimental setups commonly involved in geodetic surveying, such as LiDAR data collection.
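The core idea of the abstract can be sketched in Python: sums over individual observations are replaced by integrals over a hypothesized distribution. This is a minimal illustration under stated assumptions and is not the authors' implementation. It assumes that the covariance of the parameters of the line y = m·x + b is approximated by σ²/n times the inverse of a 2×2 normal matrix. That matrix's entries are taken to be the moments ∫x²p(x)dx, ∫x p(x)dx and ∫p(x)dx, where σ is the standard deviation of the y-observations and n is their number. The function names (`simpson`, `predicted_covariance`, `empirical_covariance`), the uniform example distribution and all numerical settings are hypothetical choices for illustration.

```python
import math
import random


def simpson(f, a1, a2, n=2000):
    """Composite Simpson's rule for the integral of f over [a1, a2]; n must be even."""
    h = (a2 - a1) / n
    total = f(a1) + f(a2)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a1 + i * h)
    return total * h / 3


def predicted_covariance(pdf, a1, a2, sigma, n_obs):
    """Predict the covariance of (m, b) from the distribution p(x) alone.

    Assumption: C ~ sigma^2 / n_obs * inv([[M2, M1], [M1, M0]]), where Mk is the
    k-th moment of p(x) over [a1, a2]. No observations are needed.
    """
    m0 = simpson(pdf, a1, a2)                      # equals 1 for a proper pdf
    m1 = simpson(lambda x: x * pdf(x), a1, a2)
    m2 = simpson(lambda x: x * x * pdf(x), a1, a2)
    det = m2 * m0 - m1 * m1
    s = sigma ** 2 / n_obs
    return [[s * m0 / det, -s * m1 / det],
            [-s * m1 / det, s * m2 / det]]


def empirical_covariance(xs, sigma):
    """Conventional covariance of (m, b): sigma^2 * inv([[sum x^2, sum x], [sum x, n]])."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    det = sxx * n - sx * sx
    s = sigma ** 2
    return [[s * n / det, -s * sx / det],
            [-s * sx / det, s * sxx / det]]


# Hypothetical design: x uniformly distributed on [-a, a].
a, sigma, n_obs = 10.0, 0.02, 5000


def uniform_pdf(x):
    return 1.0 / (2.0 * a)


pred = predicted_covariance(uniform_pdf, -a, a, sigma, n_obs)

# One simulated set of x-observations for comparison with the prediction.
random.seed(1)
xs = [random.uniform(-a, a) for _ in range(n_obs)]
emp = empirical_covariance(xs, sigma)

for name, i in (("slope m", 0), ("intercept b", 1)):
    print(f"{name}: predicted std {math.sqrt(pred[i][i]):.3e}, "
          f"empirical std {math.sqrt(emp[i][i]):.3e}")
```

In this uniform case the sketch reduces to σ_m² = 3σ²/(n·a²) and σ_b² = σ²/n. The predicted precision therefore follows from the limits of the distribution alone. The empirical values differ from it only because the simulated sample is finite.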


Figs. 1–8



Acknowledgements

This work is supported by the Natural Sciences and Engineering Research Council of Canada (RGPIN-2018-03775). Data for example 2 have been made available by the City of Calgary. The authors wish to thank the anonymous reviewers who provided very constructive comments to help improve our paper.

Funding

This work is supported by the Natural Sciences and Engineering Research Council of Canada (RGPIN-2018-03775).

Author information


Contributions

DDL designed the research; DDL, TOC and DB performed the research; DDL, TOC and DB analysed the data; DDL and TOC wrote the paper.

Corresponding author

Correspondence to D. D. Lichti.

Ethics declarations

Conflicts of interest

None.

Appendix A


This appendix proves that the estimates of the slope parameter, m, and the y-axis intercept parameter, b, are unbiased. These estimates are obtained from the linear least-squares model that incorporates the x-observation distribution function, p(x). To keep the proof general, the limits of integration [a1, a2] are taken to be unequal (a1 < a2). The distribution function is assumed to be centred at x = 0, i.e. the integral of x·p(x) over [a1, a2] is zero. The errors e of the y-observations are assumed to have zero expectation.

The expectation of the estimated slope is obtained by solving the least-squares normal equations (Eq. 9) with the specific forms of Eqs. 30 and 31 substituted; the solution

$$ \begin{gathered} {\mathbf{x}} = {\mathbf{N}}^{ - 1} {\mathbf{U}} \\ = \left( {\begin{array}{*{20}c} {\int\limits_{{a_{1} }}^{{a_{2} }} {x^{2} p\left( x \right)dx} } & 0 \\ 0 & {\int\limits_{{a_{1} }}^{{a_{2} }} {p\left( x \right)dx} } \\ \end{array} } \right)^{ - 1} \left( {\begin{array}{*{20}c} {\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot y \cdot p\left( x \right)dx} } \\ {\int\limits_{{a_{1} }}^{{a_{2} }} {y \cdot p\left( x \right)dx} } \\ \end{array} } \right) \\ \end{gathered} $$
(A1)

results in

$$ E\left\{ {\hat{m}} \right\} = E\left\{ {\frac{{\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot y \cdot p\left( x \right)dx} }}{{\int\limits_{{a_{1} }}^{{a_{2} }} {x^{2} p\left( x \right)dx} }}} \right\} $$
(A2)

Since x is assumed to be error-free, the denominator of Eq. A2 is not a random quantity and can be taken outside the expectation:

$$ E\left\{ {\hat{m}} \right\} = \frac{{E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot y \cdot p\left( x \right)dx} } \right\}}}{{\int\limits_{{a_{1} }}^{{a_{2} }} {x^{2} p\left( x \right)dx} }} $$
(A3)

The numerator is analysed by substituting the model of Eq. 1

$$ \begin{aligned} & E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot y \cdot p\left( x \right)dx} } \right\} = E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {x\left( {m \cdot x + b - e} \right)p\left( x \right)dx} } \right\} \\ &\quad = E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {m \cdot x^{2} p\left( x \right)dx} } \right\} + E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {b \cdot x \cdot p\left( x \right)dx} } \right\} - E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot e \cdot p\left( x \right)dx} } \right\} \\ & \quad = mE\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {x^{2} p\left( x \right)dx} } \right\} + bE\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot p\left( x \right)dx} } \right\} - E\left\{ {e\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot p\left( x \right)dx} } \right\} \\ \end{aligned} $$
(A4)

Because x is error-free and p(x) is centred at x = 0, the integral of x·p(x) over [a1, a2] vanishes. The numerator therefore reduces to

$$ \begin{gathered} E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot y \cdot p\left( x \right)dx} } \right\} = m\int\limits_{{a_{1} }}^{{a_{2} }} {x^{2} p\left( x \right)dx} + b\left( 0 \right) - E\left\{ {e\left( 0 \right)} \right\} \\ = m\int\limits_{{a_{1} }}^{{a_{2} }} {x^{2} p\left( x \right)dx} \\ \end{gathered} $$
(A5)

Dividing Eq. A5 by the denominator of Eq. A3 gives

$$ E\left\{ {\hat{m}} \right\} = \frac{{m\int\limits_{{a_{1} }}^{{a_{2} }} {x^{2} p\left( x \right)dx} }}{{\int\limits_{{a_{1} }}^{{a_{2} }} {x^{2} p\left( x \right)dx} }} = m $$
(A6)

Therefore, the estimated slope parameter is unbiased.
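The role of the centring assumption in Eqs. A1–A6 can be checked numerically with the Python sketch below. Integrals are evaluated with composite Simpson's rule, and the observations are noise-free, y = m·x + b. The first density, p(x) = (4 - 2x)/9 on [-1, 2], is a hypothetical example: it is chosen only because it integrates to one and has zero mean on limits a1 < a2. For this density, the diagonal normal matrix of Eq. A1 recovers m and b exactly. The second density, uniform on [1, 3], has mean 2, so it is not centred. Eq. A1 is not meant to be used with such a density. Here it serves only to expose the terms that the centring assumption removes. All function names and numerical values are illustrative assumptions.

```python
def simpson(f, a1, a2, n=2000):
    """Composite Simpson's rule for the integral of f over [a1, a2]; n must be even."""
    h = (a2 - a1) / n
    total = f(a1) + f(a2)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a1 + i * h)
    return total * h / 3


def solve_eq_a1(pdf, y, a1, a2):
    """Solve x = N^-1 U with the diagonal N of Eq. A1; returns (m_hat, b_hat).

    The diagonal form presumes p(x) is centred, i.e. the integral of x*p(x) is zero.
    """
    n11 = simpson(lambda x: x * x * pdf(x), a1, a2)
    n22 = simpson(pdf, a1, a2)
    u1 = simpson(lambda x: x * y(x) * pdf(x), a1, a2)
    u2 = simpson(lambda x: y(x) * pdf(x), a1, a2)
    return u1 / n11, u2 / n22


m_true, b_true = 0.5, 3.0


def y(x):
    # Noise-free observations y = m*x + b, i.e. e = 0.
    return m_true * x + b_true


def centred_pdf(x):
    # Hypothetical density on [-1, 2]: integrates to 1 and has zero mean.
    return (4.0 - 2.0 * x) / 9.0


def shifted_pdf(x):
    # Hypothetical uniform density on [1, 3]: mean 2, so NOT centred at x = 0.
    return 0.5


m_c, b_c = solve_eq_a1(centred_pdf, y, -1.0, 2.0)
m_s, b_s = solve_eq_a1(shifted_pdf, y, 1.0, 3.0)
print("centred p(x): m_hat =", m_c, " b_hat =", b_c)
print("shifted p(x): m_hat =", m_s, " b_hat =", b_s)
```

For the shifted density, let M1 and M2 be its first and second moments. The slope returned by the diagonal formula is then m + b·M1/M2 = 0.5 + 3·2/(13/3) ≈ 1.885, and the intercept is b + m·M1 = 4. These are exactly the terms that vanish in Eqs. A5 and A8 when p(x) is centred.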

The expected value of the intercept parameter is given by

$$ E\left\{ {\hat{b}} \right\} = E\left\{ {\frac{{\int\limits_{{a_{1} }}^{{a_{2} }} {y \cdot p\left( x \right)dx} }}{{\int\limits_{{a_{1} }}^{{a_{2} }} {p\left( x \right)dx} }}} \right\} = \frac{{E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {y \cdot p\left( x \right)dx} } \right\}}}{{\int\limits_{{a_{1} }}^{{a_{2} }} {p\left( x \right)dx} }} $$
(A7)

Since the denominator equals unity by definition (Eq. 20), only the numerator needs to be analysed. Substituting the model of Eq. 1 and using the centring of p(x) and the zero expectation of e gives

$$ \begin{aligned} & E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {y \cdot p\left( x \right)dx} } \right\} = E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {\left( {m \cdot x + b - e} \right)p\left( x \right)dx} } \right\} \\ & \quad = m\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot p\left( x \right)dx} + b\int\limits_{{a_{1} }}^{{a_{2} }} {p\left( x \right)dx} - E\left\{ {e\int\limits_{{a_{1} }}^{{a_{2} }} {p\left( x \right)dx} } \right\} \\ & \quad = m\left( 0 \right) + b\left( 1 \right) - E\left\{ {e\left( 1 \right)} \right\} = b \\ \end{aligned} $$
(A8)

Therefore, the estimated y-axis intercept parameter is also unbiased.
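The Python sketch below is an empirical companion to Eqs. A6 and A8. It repeatedly simulates a discrete regression with x drawn from a centred distribution and y = m·x + b - e, the sign convention substituted in Eqs. A4 and A8. Each simulated dataset is fitted by ordinary least squares, and the estimates are averaged. The density p(x) = (4 - 2x)/9 on [-1, 2], the Gaussian zero-mean errors and all numerical settings are hypothetical choices. x is sampled by inverting its cumulative distribution F(x) = (5 + 4x - x²)/9, which gives x = 2 - 3·sqrt(1 - u) for u uniform on [0, 1). This checks the discrete estimator rather than the integral form used in the proof above.

```python
import math
import random


def sample_x(rng):
    """Draw x from the hypothetical density p(x) = (4 - 2x)/9 on [-1, 2] by inverse CDF."""
    u = rng.random()                       # uniform on [0, 1)
    return 2.0 - 3.0 * math.sqrt(1.0 - u)


def ols_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m_hat = sxy / sxx
    return m_hat, y_bar - m_hat * x_bar


def monte_carlo(m, b, sigma, n_obs, trials, seed=0):
    """Average the OLS estimates over repeated simulations of y = m*x + b - e."""
    rng = random.Random(seed)
    m_sum = b_sum = 0.0
    for _ in range(trials):
        xs = [sample_x(rng) for _ in range(n_obs)]
        # Zero-mean Gaussian errors, following the sign convention y = m*x + b - e.
        ys = [m * x + b - rng.gauss(0.0, sigma) for x in xs]
        m_hat, b_hat = ols_line(xs, ys)
        m_sum += m_hat
        b_sum += b_hat
    return m_sum / trials, b_sum / trials


m_true, b_true = 0.5, 3.0
m_mean, b_mean = monte_carlo(m_true, b_true, sigma=0.1, n_obs=50, trials=2000)
print(f"mean m_hat = {m_mean:.4f} (true {m_true}), "
      f"mean b_hat = {b_mean:.4f} (true {b_true})")
```

The averages approach the true values as the number of trials grows. Individual estimates still scatter around the true values by an amount set by σ and the spread of x.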


About this article


Cite this article

Lichti, D.D., Chan, T.O. & Belton, D. Linear regression with an observation distribution model. J Geod 95, 23 (2021). https://doi.org/10.1007/s00190-021-01484-x


  • DOI: https://doi.org/10.1007/s00190-021-01484-x
