An end-to-end statistical process with mobile network data for official statistics
EPJ Data Science ( IF 3.0 ) Pub Date : 2021-04-29 , DOI: 10.1140/epjds/s13688-021-00275-w
David Salgado , Luis Sanguiao , Bogdan Oancea , Sandra Barragán , Marian Necula

Mobile network data has been proven to provide a rich source of information in multiple statistical domains such as demography, tourism, and urban planning. However, incorporating this data source into the routine production of official statistics requires considerable effort, since a variety of highly entangled issues (access, methodology, IT tools, quality, skills) must be solved beforehand. One-off studies with concrete data sets are not enough for this; a standard statistical production process must be put in place. We propose a concrete modular process structured into evolvable modules that detach the strongly technological layer underlying this data source from the statistical analysis needed to produce outputs of interest. This architecture follows the principles of the so-called ESS Reference Methodological Framework for Mobile Network Data. Each module deals with a different aspect of this data source. We apply hidden Markov models for the geolocation of mobile devices, use a Bayesian approach on this model to disambiguate devices belonging to the same individual, compute aggregate numbers of individuals detected by a telecommunication network using probability theory, and hierarchically model the integration of auxiliary information from the telco market and official data to produce final estimates of the number of individuals across different territorial regions in the target population. A first simple illustrative proposal has been applied to synthetic data, providing preliminary software tools and accuracy indicators that monitor the performance of the process. So far, this exercise has been applied to the estimation of present population and origin-destination matrices. We present an illustrative example of the execution of these production modules, comparing results with the simulated ground truth and thus assessing the performance of each production module.




Updated: 2021-04-30