Usability of the Benford’s law for the results of least square estimation
Acta Geodaetica et Geophysica (IF 1.4), Pub Date: 2019-05-28, DOI: 10.1007/s40328-019-00259-3
Nursu Tunalioglu , Bahattin Erdogan

Benford’s law (BL), also known as the first-digit or significant-digit law, describes an intriguing pattern in data sets: the first digits are not uniformly distributed, as might be expected, but instead follow a specific theoretical distribution. According to BL, the occurrence of the first non-zero digit in numerical data that are generated or found in nature follows a logarithmic distribution. The least squares estimation (LSE) method is the most commonly preferred method for estimating unknown parameters from different types of geodetic data. The residuals and normalized residuals of the LSE method, which follow a normal distribution with an expected value of zero, are used in outlier detection. In this study, BL is investigated for the residuals and normalized residuals estimated by the LSE method. Three types of geodetic data are used: (1) simulated regression models, (2) global positioning system (GPS) data, and (3) a leveling network. The first group of data sets is simulated from linear regression and univariate models, and each simulated group is generated with 100, 1000, and 10,000 samples. To generate the second group, data from an International Global Navigation Satellite System (GNSS) Service (IGS) station (ISTA) are processed with the kinematic precise point positioning (PPP) approach using GIPSY OASIS II v6.4 software; the observation duration of the GPS data is 4 days. For the last data set, a leveling network with 55 points and 110 observed height differences is simulated. BL is applied to the residuals (v) and normalized residuals (w) estimated by the LSE method. A goodness-of-fit test is used to determine whether a population follows the specified BL distribution. The test is based on how well the observed frequencies of the first digits of the residuals and normalized residuals in a sample agree with the expected frequencies obtained from the hypothesized distribution. The results of the statistical test show that each data set (residuals and normalized residuals) used in this study follows BL.
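
The procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the logarithmic distribution is the usual first-digit form P(d) = log10(1 + 1/d) for d = 1, ..., 9, and it assumes a Pearson chi-square statistic at the 5% level, since the abstract does not name the goodness-of-fit statistic or the significance level. The function names and the simulated line-fit data are hypothetical, and only plain residuals (v) are covered, not normalized residuals (w).

```python
import math
import random

# Benford's law: expected probability of first digit d, for d = 1..9
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Chi-square critical value for 9 - 1 = 8 degrees of freedom, alpha = 0.05
# (the significance level is an assumption, not taken from the paper)
CHI2_CRIT = 15.507


def first_digit(x):
    """First non-zero digit of x, or None when x is zero or not finite."""
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation places the leading digit first in the string
    return int(f"{abs(x):.17e}"[0])


def benford_chi_square(values):
    """Return (Pearson chi-square statistic, observed first-digit counts)."""
    counts = {d: 0 for d in range(1, 10)}
    for x in values:
        d = first_digit(x)
        if d is not None:
            counts[d] += 1
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no non-zero finite values to test")
    stat = sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())
    return stat, counts


def line_fit_residuals(x, y):
    """Residuals v = y_hat - y of an ordinary least-squares line fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * a - b for a, b in zip(x, y)]


if __name__ == "__main__":
    # Hypothetical simulated data: a straight line plus Gaussian noise
    random.seed(0)
    x = [i / 10 for i in range(1000)]
    y = [2.0 + 0.5 * a + random.gauss(0.0, 1.0) for a in x]
    v = line_fit_residuals(x, y)
    stat, counts = benford_chi_square(v)
    print(counts)
    print(f"chi-square = {stat:.2f}, critical value = {CHI2_CRIT}")
    print("consistent with BL" if stat <= CHI2_CRIT else "BL rejected at 5%")
```

The null hypothesis that the first digits follow BL is retained when the statistic does not exceed the critical value. Whether a given simulated sample passes depends on the noise model and sample size, so the sketch makes no claim about reproducing the paper's results.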

Updated: 2019-05-28