Correlates of Representation Errors in Internet Data Sources for Real Estate Market
Journal of Official Statistics (IF 0.5), Pub Date: 2019-09-01, DOI: 10.2478/jos-2019-0022
Maciej Beręsewicz

Abstract: New data sources, namely big data and the Internet, have become an important topic in statistics, and in official statistics in particular. However, before these sources can be used for statistics, it is necessary to conduct a thorough analysis of sources of nonrepresentativeness. In this article, we focus on detecting correlates of the selection mechanism that underlies Internet data sources for the secondary real estate market in Poland and results in representation errors (frame and selection errors). To identify characteristics of properties offered online, we link data collected from the two largest advertisement services in Poland with the Register of Real Estate Prices and Values, which covers all transactions made in Poland. Quarterly data for 2016 were linked at a domain level defined by local administrative units (LAU1), the urban/rural distinction, and usable floor area (UFA) categorized into four groups. To identify correlates of representation error, we used a generalized additive mixed model based on almost 5,500 domains including quarters. Results indicate that properties not advertised online differ significantly from those shown on the Internet in terms of UFA and location. A non-linear relationship with the average price per m² can be observed, which diminishes after accounting for LAU1 units.

Updated: 2019-09-01