Linear regression with an observation distribution model

Lichti, D. D.; Chan, T. O.; Belton, D.

doi:10.1007/s00190-021-01484-x


Original Article
Published: 05 February 2021

Journal of Geodesy, Volume 95, article number 23 (2021)

D. D. Lichti¹,
T. O. Chan² &
D. Belton³


Abstract

Despite the high complexity of the real world, linear regression still plays an important role in estimating the parameters of a model of the physical relationship between two or more variables. The precision of the estimated parameters, which is usually regarded as an indicator of solution quality, is conventionally obtained from the inverse of the normal-equations matrix, which requires intensive computation when the number of observations is large. In addition, the impact of the distribution of the observations on parameter precision is rarely reported in the literature. In this paper, we propose a new methodology that models the distribution of the observations in linear regression in order to predict the parameter precision before the data are collected and the regression is performed. The precision analysis can be readily performed given a hypothesized data distribution. The methodology has been verified with several simulated and real datasets. The results show that the empirical and model-predicted precisions agree closely, with discrepancies of up to 6% for the simulated datasets and 3.4% for the real datasets. Simulations demonstrate that these differences are due simply to the finite sample size. Simulation also demonstrates that the method is relatively insensitive to noise in the independent regression variables, which causes deviations from the data distribution function. The proposed methodology allows straightforward prediction of the parameter precision from the distribution of the observations, as defined by their numerical limits and geometry, which greatly simplifies the design of experimental setups commonly encountered in geodetic surveying, such as LiDAR data collection.
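The idea of predicting precision from the data distribution rather than from the data themselves can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a straight line with independent, equal-variance errors in y, error-free x drawn from a uniform density on [−a, a], and a predicted covariance of σ²N⁻¹/n, where N is the 2×2 matrix of moments of p(x) (diag(a²/3, 1) for the uniform case). That scaling, the helper names `predicted_std` and `empirical_std`, and all numerical values are hypothetical choices made for illustration.

```python
import random
import statistics


def predicted_std(sigma, n, second_moment, mass=1.0):
    """Predicted std of slope and intercept from the moments of p(x).

    Assumption (illustrative, not taken from the paper): for a centred
    density, cov(m, b) is approximated by sigma**2 * inv(N) / n with
    N = diag(second_moment, mass), second_moment = integral of x^2 p(x) dx
    and mass = integral of p(x) dx.
    """
    return sigma / (n * second_moment) ** 0.5, sigma / (n * mass) ** 0.5


def empirical_std(sigma, n, a, m_true=0.5, b_true=2.0, trials=2000, seed=1):
    """Monte Carlo std of ordinary least-squares slope and intercept
    for x ~ Uniform(-a, a) and y = m*x + b + Gaussian noise."""
    rng = random.Random(seed)
    slopes, intercepts = [], []
    for _ in range(trials):
        xs = [rng.uniform(-a, a) for _ in range(n)]
        ys = [m_true * x + b_true + rng.gauss(0.0, sigma) for x in xs]
        x_bar, y_bar = sum(xs) / n, sum(ys) / n
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        m_hat = sxy / sxx
        slopes.append(m_hat)
        intercepts.append(y_bar - m_hat * x_bar)
    return statistics.stdev(slopes), statistics.stdev(intercepts)


# Made-up design values: noise sigma, number of points, half-width a.
sigma, n, a = 0.01, 100, 5.0
pred_m, pred_b = predicted_std(sigma, n, second_moment=a ** 2 / 3)  # uniform p(x)
emp_m, emp_b = empirical_std(sigma, n, a)
print(f"slope:     predicted {pred_m:.3e}, empirical {emp_m:.3e}")
print(f"intercept: predicted {pred_b:.3e}, empirical {emp_b:.3e}")
```

The prediction uses only the moments of p(x), not the individual x-values, so it can be evaluated before any data exist; the predicted and simulated values should agree to within a few per cent, the remainder reflecting finite-sample and Monte Carlo effects.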




Acknowledgements

This work is supported by the Natural Sciences and Engineering Research Council of Canada (RGPIN-2018-03775). Data for example 2 have been made available by the City of Calgary. The authors wish to thank the anonymous reviewers who provided very constructive comments to help improve our paper.

Funding

This work is supported by the Natural Sciences and Engineering Research Council of Canada (RGPIN-2018-03775).

Author information

Authors and Affiliations

Department of Geomatics Engineering, The University of Calgary, Calgary, AB, Canada
D. D. Lichti
Guangdong Provincial Key Laboratory of Urbanization and Geo-Simulation, School of Geography and Planning, Sun Yat-Sen University, Guangzhou, China
T. O. Chan
School of Earth and Planetary Sciences, Curtin University, Perth, WA, Australia
D. Belton


Contributions

DDL designed the research; DDL, TOC and DB performed the research; DDL, TOC and DB analysed the data; DDL and TOC wrote the paper.

Corresponding author

Correspondence to D. D. Lichti.

Ethics declarations

Conflicts of interest

None.

Appendix A

In this appendix, it is proven that the estimates of the slope parameter, m, and the y-axis intercept parameter, b, obtained from the linear least-squares model that incorporates the x-observation distribution function, p(x), are unbiased. To keep the proof general, the limits of integration, a₁ < a₂, need not be equal in magnitude. The distribution function is, however, assumed to be centred at x = 0, i.e. ∫ x p(x) dx = 0 over [a₁, a₂], and by definition it integrates to one over that interval (Eq. 20).
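As a concrete, hypothetical illustration (not an example from the paper), a triangular density on [a₁, a₂] = [−1, 2] with its mode at x = −1 has limits of unequal magnitude yet satisfies both assumptions; its second moment, which enters N in Eq. A1, is also easily obtained:

$$ p(x) = \frac{2(2 - x)}{9}, \qquad \int_{-1}^{2} p(x)\,dx = \frac{2}{9}\left[ 2x - \frac{x^{2}}{2} \right]_{-1}^{2} = 1, \qquad \int_{-1}^{2} x\,p(x)\,dx = \frac{2}{9}\left[ x^{2} - \frac{x^{3}}{3} \right]_{-1}^{2} = 0, \qquad \int_{-1}^{2} x^{2} p(x)\,dx = \frac{2}{9}\left[ \frac{2x^{3}}{3} - \frac{x^{4}}{4} \right]_{-1}^{2} = \frac{1}{2} $$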

The expectation of the estimated slope is derived from the solution of the least-squares normal equations (Eq. 9), into which the specific forms given by Eqs. 30 and 31 are substituted:

$$ \begin{gathered} {\mathbf{x}} = {\mathbf{N}}^{ - 1} {\mathbf{U}} \\ = \left( {\begin{array}{*{20}c} {\int\limits_{{a_{1} }}^{{a_{2} }} {x^{2} p\left( x \right)dx} } & 0 \\ 0 & {\int\limits_{{a_{1} }}^{{a_{2} }} {p\left( x \right)dx} } \\ \end{array} } \right)^{ - 1} \left( {\begin{array}{*{20}c} {\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot y \cdot p\left( x \right)dx} } \\ {\int\limits_{{a_{1} }}^{{a_{2} }} {y \cdot p\left( x \right)dx} } \\ \end{array} } \right) \\ \end{gathered} $$

(A1)

Taking the expectation of the first element of this solution gives

$$ E\left\{ {\hat{m}} \right\} = E\left\{ {\frac{{\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot y \cdot p\left( x \right)dx} }}{{\int\limits_{{a_{1} }}^{{a_{2} }} {x^{2} p\left( x \right)dx} }}} \right\} $$

(A2)

Since x is assumed to be error-free, the denominator is non-random and can be taken outside the expectation:

$$ E\left\{ {\hat{m}} \right\} = \frac{{E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot y \cdot p\left( x \right)dx} } \right\}}}{{\int\limits_{{a_{1} }}^{{a_{2} }} {x^{2} p\left( x \right)dx} }} $$

(A3)

The numerator is analysed by substituting the functional model of Eq. 1, y = m·x + b − e:

$$ \begin{aligned} & E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot y \cdot p\left( x \right)dx} } \right\} = E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {x\left( {m \cdot x + b - e} \right)p\left( x \right)dx} } \right\} \\ &\quad = E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {m \cdot x^{2} p\left( x \right)dx} } \right\} + E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {b \cdot x \cdot p\left( x \right)dx} } \right\} - E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot e \cdot p\left( x \right)dx} } \right\} \\ & \quad = mE\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {x^{2} p\left( x \right)dx} } \right\} + bE\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot p\left( x \right)dx} } \right\} - E\left\{ {e\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot p\left( x \right)dx} } \right\} \\ \end{aligned} $$

(A4)

Because x is error-free and p(x) is centred at x = 0, so that ∫ x p(x) dx = 0 over [a₁, a₂], the numerator reduces to

$$ \begin{gathered} E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot y \cdot p\left( x \right)dx} } \right\} = m\int\limits_{{a_{1} }}^{{a_{2} }} {x^{2} p\left( x \right)dx} + b\left( 0 \right) - E\left\{ {e\left( 0 \right)} \right\} \\ = m\int\limits_{{a_{1} }}^{{a_{2} }} {x^{2} p\left( x \right)dx} \\ \end{gathered} $$

(A5)

Dividing Eq. A5 by the denominator of Eq. A3 gives

$$ E\left\{ {\hat{m}} \right\} = \frac{{m\int\limits_{{a_{1} }}^{{a_{2} }} {x^{2} p\left( x \right)dx} }}{{\int\limits_{{a_{1} }}^{{a_{2} }} {x^{2} p\left( x \right)dx} }} = m $$

(A6)

Therefore, the estimated slope parameter is unbiased.
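The deterministic part of this argument can be checked numerically. The sketch below is illustrative only: it evaluates N and U of Eq. A1 by composite Simpson quadrature (our choice of rule) for a hypothetical triangular density p(x) = 2(2 − x)/9 on [−1, 2], which has unit mass and zero mean despite the unequal limits, and a noise-free line with made-up values m = 0.5 and b = 2. Every integrand here is a polynomial of degree three or less, which Simpson's rule integrates exactly, so the b·∫ x p(x) dx term of Eq. A4 vanishes to rounding error and both parameters are recovered. The names `simpson`, `solve_normal_equations`, `LOWER` and `UPPER` are hypothetical.

```python
def simpson(f, a, b, n=200):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


LOWER, UPPER = -1.0, 2.0  # hypothetical limits a1, a2 (unequal in magnitude)


def p(x):
    """Hypothetical triangular density on [-1, 2] with mode at -1 and zero mean."""
    return 2.0 * (UPPER - x) / 9.0


def solve_normal_equations(y, p, a1, a2):
    """Evaluate N^-1 U of Eq. A1 by quadrature and return (m_hat, b_hat).

    N is taken as diagonal, which is valid only because p(x) is centred,
    i.e. the integral of x * p(x) over [a1, a2] is zero.
    """
    n11 = simpson(lambda x: x * x * p(x), a1, a2)
    n22 = simpson(p, a1, a2)
    u1 = simpson(lambda x: x * y(x) * p(x), a1, a2)
    u2 = simpson(lambda x: y(x) * p(x), a1, a2)
    return u1 / n11, u2 / n22


m_true, b_true = 0.5, 2.0  # made-up parameter values
print("mass  :", simpson(p, LOWER, UPPER))                   # close to 1
print("mean  :", simpson(lambda x: x * p(x), LOWER, UPPER))  # close to 0
print("(m, b):", solve_normal_equations(lambda x: m_true * x + b_true,
                                        p, LOWER, UPPER))    # close to (0.5, 2.0)
```

Replacing p with any other centred density (and adjusting the limits) reuses the same function; for non-polynomial densities the quadrature is no longer exact, so n may need to be increased.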

The expected value of the intercept parameter is given by

$$ E\left\{ {\hat{b}} \right\} = E\left\{ {\frac{{\int\limits_{{a_{1} }}^{{a_{2} }} {y \cdot p\left( x \right)dx} }}{{\int\limits_{{a_{1} }}^{{a_{2} }} {p\left( x \right)dx} }}} \right\} = \frac{{E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {y \cdot p\left( x \right)dx} } \right\}}}{{\int\limits_{{a_{1} }}^{{a_{2} }} {p\left( x \right)dx} }} $$

(A7)

Since the denominator is unity by definition (Eq. 20), it suffices to analyse the numerator, which, using the centring assumption and the zero expectation of the errors, becomes

$$ \begin{aligned} & E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {y \cdot p\left( x \right)dx} } \right\} = E\left\{ {\int\limits_{{a_{1} }}^{{a_{2} }} {\left( {m \cdot x + b - e} \right)p\left( x \right)dx} } \right\} \\ & \quad = m\int\limits_{{a_{1} }}^{{a_{2} }} {x \cdot p\left( x \right)dx} + b\int\limits_{{a_{1} }}^{{a_{2} }} {p\left( x \right)dx} - E\left\{ {e\int\limits_{{a_{1} }}^{{a_{2} }} {p\left( x \right)dx} } \right\} \\ & \quad = m\left( 0 \right) + b\left( 1 \right) - E\left\{ {e\left( 1 \right)} \right\} = b \\ \end{aligned} $$

(A8)

Therefore, the estimated y-axis intercept parameter is also unbiased.
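Because both estimators are linear in y, their unbiasedness can also be checked by simulation. The following is a hypothetical discretisation, not the authors' procedure: the integrals of Eq. A1 are replaced by Simpson-rule weighted sums over nodes of a hypothetical triangular density p(x) = 2(2 − x)/9 on [−1, 2] (unit mass, zero mean), and the y-value at each node receives an independent zero-mean Gaussian error e, following the sign convention y = m·x + b − e of Eq. A4. The averages of the slope and intercept estimates over many simulated data sets should approach the true values, differing only by Monte Carlo sampling error. The values of m, b, σ, the node count and the number of trials are arbitrary.

```python
import random
import statistics

LOWER, UPPER, PANELS = -1.0, 2.0, 40  # hypothetical limits and (even) panel count


def p(x):
    """Hypothetical triangular density on [-1, 2] with zero mean."""
    return 2.0 * (UPPER - x) / 9.0


# Simpson-rule nodes and weights that stand in for the integrals of Eq. A1.
h = (UPPER - LOWER) / PANELS
nodes = [LOWER + i * h for i in range(PANELS + 1)]
weights = [h / 3 * (1 if i in (0, PANELS) else 4 if i % 2 else 2)
           for i in range(PANELS + 1)]


def estimate(ys):
    """Discretised Eq. A1: slope and intercept from y-values at the nodes."""
    n11 = sum(w * x * x * p(x) for w, x in zip(weights, nodes))
    n22 = sum(w * p(x) for w, x in zip(weights, nodes))
    u1 = sum(w * x * y * p(x) for w, x, y in zip(weights, nodes, ys))
    u2 = sum(w * y * p(x) for w, x, y in zip(weights, nodes, ys))
    return u1 / n11, u2 / n22


rng = random.Random(42)
m_true, b_true, sigma = 0.5, 2.0, 0.1
# Each trial: y = m*x + b - e with independent zero-mean errors e at the nodes.
draws = [estimate([m_true * x + b_true - rng.gauss(0.0, sigma) for x in nodes])
         for _ in range(5000)]
mean_m = statistics.fmean(d[0] for d in draws)
mean_b = statistics.fmean(d[1] for d in draws)
print(f"mean slope {mean_m:.4f} (true {m_true}), "
      f"mean intercept {mean_b:.4f} (true {b_true})")
```

Increasing the number of trials shrinks the gap between the averages and the true values, as expected for unbiased estimators.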


About this article

Cite this article

Lichti, D.D., Chan, T.O. & Belton, D. Linear regression with an observation distribution model. J Geod 95, 23 (2021). https://doi.org/10.1007/s00190-021-01484-x


Received: 21 May 2020
Accepted: 19 January 2021
Published: 05 February 2021
DOI: https://doi.org/10.1007/s00190-021-01484-x
