Controlling for Selection Bias in Social Media Indicators through Official Statistics: a Proposal
Journal of Official Statistics ( IF 1.1 ) Pub Date : 2020-06-01 , DOI: 10.2478/jos-2020-0017
Stefano M. Iacus 1 , Giuseppe Porro 2 , Silvia Salini 1 , Elena Siletti 1

Abstract: With the increase in social media usage, a huge new source of data has become available. Despite the enthusiasm linked to this revolution, one of the main outstanding criticisms of using these data is selection bias. Indeed, the reference population is unknown. Nevertheless, many studies show evidence that these data constitute a valuable source because they are more timely and possess higher spatial granularity. We propose to adjust statistics based on Twitter data by anchoring them to reliable official statistics through a weighted, space-time, small area estimation model. As a by-product, the proposed method also stabilizes the social media indicators, which is a welcome property required for official statistics. The method can be adapted whenever official statistics exist at the proper level of granularity and social media usage within the population is known. As an example, we adjust a subjective well-being indicator of “working conditions” in Italy, and combine it with relevant official statistics. The weights depend on broadband coverage and the Twitter rate at province level, while the analysis is performed at regional level. The resulting statistics are then compared with survey statistics on the “quality of job” at macro-economic regional level, showing evidence of similar paths.

Updated: 2020-06-01