Models and Practices in Urban Data Science at Scale
Big Data Research (IF 3.3), Pub Date: 2018-08-14, DOI: 10.1016/j.bdr.2018.04.003
Marco Balduini, Marco Brambilla, Emanuele Della Valle, Christian Marazzi, Tahereh Arabghalizi, Behnam Rahdari, Michele Vescovi

Cities can be observed through a broad set of sensing technologies. These span physical sensors in the streets, socio-economic reports, and other sources that capture the behaviour of citizens and visitors, such as mobile phone records, social media posts, and other digital traces.

In this paper, we propose a conceptual framework for putting this variety of Big Data sources to use, with a unified approach that applies spatial and temporal analysis over heterogeneous streams of data. We define spatial analysis based on conceptual grids (made of cells) over the city space. We then study:

- the time series of signals at both grid and cell level;
- the correlation across signals and across cells;
- the prediction of city dynamics based on multiple signals;
- the identification of anomalies based on the difference between the observed dynamics and their prediction.
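The analysis steps listed above can be illustrated with a minimal, standard-library-only Python sketch. Everything concrete in it is an assumption made for illustration and does not come from the paper:

- the grid origin and cell size (`LAT0`, `LON0`, `CELL_DEG`);
- the event layout `(lat, lon, timestamp, signal)` and the hourly bins;
- the seasonal-naive forecast, which stands in for the paper's prediction models;
- the k-standard-deviation rule on forecast residuals, used to flag anomalies.

```python
import math
from collections import defaultdict

# Hypothetical grid: regular lat/lon cells of CELL_DEG degrees anchored at
# (LAT0, LON0). These values are illustrative, not taken from the paper.
LAT0, LON0, CELL_DEG = 45.40, 9.05, 0.01


def cell_of(lat, lon):
    """Map a coordinate to the (row, col) index of its grid cell."""
    return (math.floor((lat - LAT0) / CELL_DEG), math.floor((lon - LON0) / CELL_DEG))


def cell_time_series(events, n_bins, bin_seconds=3600, t0=0):
    """Count (lat, lon, ts, signal) events per (signal, cell) and time bin."""
    series = defaultdict(lambda: [0] * n_bins)
    for lat, lon, ts, signal in events:
        b = int((ts - t0) // bin_seconds)
        if 0 <= b < n_bins:  # drop events outside the observation window
            series[(signal, cell_of(lat, lon))][b] += 1
    return dict(series)


def grid_series(series, signal):
    """Grid-level series for one signal: element-wise sum over all its cells."""
    rows = [s for (sig, _), s in series.items() if sig == signal]
    return [sum(col) for col in zip(*rows)]


def pearson(x, y):
    """Pearson correlation of two equal-length series (NaN if either is constant).
    Works across signals (same cell) or across cells (same signal)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def seasonal_naive(series, period):
    """Stand-in predictor (period >= 1): each value is the one `period` bins earlier."""
    return [None] * period + series[:-period]


def anomalies(observed, predicted, k=3.0):
    """Indices whose residual (observed - predicted) deviates from the mean
    residual by more than k population standard deviations."""
    pairs = [(i, o - p) for i, (o, p) in enumerate(zip(observed, predicted)) if p is not None]
    if not pairs:
        return []
    mean = sum(r for _, r in pairs) / len(pairs)
    std = math.sqrt(sum((r - mean) ** 2 for _, r in pairs) / len(pairs))
    return [i for i, r in pairs if std > 0 and abs(r - mean) > k * std]
```

A seasonal-naive forecast is the simplest baseline that respects periodicity (for example, `period=24` for hourly bins with a daily cycle). Any multi-signal prediction model could replace it, because `anomalies` only needs an observed series and a predicted one.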

To implement this model, we propose a general architectural framework that uses Big Data technologies (such as HDFS, YARN, HIVE, PIG, Cascalog, Spark, Spark SQL, Spark Streaming and SparkR). The framework can be deployed in different configurations depending on different needs. By taking an inherently data-science approach to the problem, we are able to address two kinds of problems at scale:

- technical problems, such as the heterogeneous temporal and spatial granularity of the data;
- the appropriate interpretation of the results, through tools that enable intuitive and immediate visual perception of emerging patterns and dynamics.
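For additive count signals, heterogeneous granularity can be handled by re-binning every source onto a common, coarser resolution before analysis. The sketch below illustrates this in plain Python. It assumes count data whose source bins and cells nest exactly inside the target ones, and the function names and data layouts are hypothetical. At scale, the same group-by-and-sum logic could be expressed, for example, in Spark SQL; this version only shows the idea.

```python
from collections import defaultdict


def resample_counts(samples, target_seconds):
    """Re-bin (start_ts, count) samples into target-width bins by summing.

    Assumes each source bin is no wider than, and aligned to, a target bin,
    so every sample falls entirely inside one target bin (hypothetical constraint).
    """
    out = defaultdict(int)
    for start, count in samples:
        out[start - start % target_seconds] += count  # floor start to the target bin
    return dict(sorted(out.items()))


def coarsen_grid(cell_counts, factor):
    """Aggregate counts on a fine (row, col) grid into cells `factor` times larger."""
    out = defaultdict(int)
    for (r, c), count in cell_counts.items():
        out[(r // factor, c // factor)] += count
    return dict(out)
```

Summation is only appropriate for additive quantities such as event counts. Averaged signals (for example, sensor readings) would need a weighted mean over the merged bins or cells instead.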

We demonstrate the feasibility, generality, and effectiveness of our Urban Data Science at Scale approach through multiple use cases and examples. These are drawn from real-world requirements collected in various cities and cover diverse business and city needs.
