Prediction of rare feature combinations in population synthesis: Application of deep generative modelling
Transportation Research Part C: Emerging Technologies (IF 8.3), Pub Date: 2020-10-01, DOI: 10.1016/j.trc.2020.102787
Sergio Garrido, Stanislav S. Borysov, Francisco C. Pereira, Jeppe Rich

Population synthesis is concerned with the generation of agents for agent-based modelling in many fields, such as economics, transportation, ecology and epidemiology. When the number of attributes describing the agents and/or their level of detail becomes large, survey data cannot densely support the joint distribution of the attributes in the population due to the curse of dimensionality. This leads to a situation where many attribute combinations are missing from the sample data even though such combinations exist in the real population. In this case, it becomes essential to consider methods that are able to impute such missing information effectively. In this paper, we propose to use deep generative latent models. These models are able to learn a compressed representation of the data space which, when projected back to the original space, provides an effective way of imputing information in the observed data space. Specifically, we employ the Wasserstein Generative Adversarial Network (WGAN) and the Variational Autoencoder (VAE) for a large-scale population synthesis application. The models are applied to a Danish travel survey with a feature space of more than 60 variables and are trained and tested using cross-validation. A new metric for evaluating generative models in an unsupervised setting is proposed. It is based on the ability to generate diverse yet valid synthetic attribute combinations, assessing whether the models can recover missing combinations (sampling zeros) while keeping truly impossible combinations (structural zeros) at a minimum. For a low-dimensional experiment, the VAE, the marginal sampler and the fully random sampler generate 5%, 21% and 26% more structural zeros per sampling zero, respectively, when compared to the WGAN. For a high-dimensional case, these figures increase to 44%, 2217% and 170440%, respectively. This research directly supports the development of agent-based systems, in particular in cases where detailed socio-economic or geographical representations are required.
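
The sampling-zero versus structural-zero idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that a held-out sample stands in for the real population, so a generated combination absent from the training data counts as a recovered sampling zero if it appears in the held-out data, and as a (presumed) structural zero otherwise. The function names `zero_counts`, `structural_per_sampling` and `relative_increase_percent` are hypothetical, introduced only for illustration.

```python
from typing import Hashable, Iterable, Tuple

# One agent is a tuple of categorical attribute values (an attribute combination).
Combo = Tuple[Hashable, ...]


def zero_counts(
    train: Iterable[Combo],
    holdout: Iterable[Combo],
    generated: Iterable[Combo],
) -> Tuple[int, int]:
    """Count distinct generated combinations that are not in the training data.

    Returns (sampling_zeros, structural_zeros).
    Assumption: the held-out sample approximates the real population, so a
    novel combination not found there is treated as a structural zero.
    """
    train_set = set(train)
    holdout_set = set(holdout)
    novel = set(generated) - train_set      # combinations the model "imputed"
    sampling = novel & holdout_set          # missing from train, but real
    structural = novel - holdout_set        # not seen anywhere: presumed invalid
    return len(sampling), len(structural)


def structural_per_sampling(
    train: Iterable[Combo],
    holdout: Iterable[Combo],
    generated: Iterable[Combo],
) -> float:
    """Structural zeros generated per recovered sampling zero (lower is better)."""
    sampling, structural = zero_counts(train, holdout, generated)
    return structural / sampling if sampling else float("inf")


def relative_increase_percent(ratio_model: float, ratio_baseline: float) -> float:
    """Percentage by which a model's ratio exceeds a baseline model's ratio."""
    return (ratio_model / ratio_baseline - 1.0) * 100.0
```

Under this reading, a statement such as "5% more structural zeros per sampling zero when compared to the WGAN" would correspond to `relative_increase_percent(ratio_vae, ratio_wgan)` evaluating to about 5, where both ratios come from `structural_per_sampling` on the same training and held-out data.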




Updated: 2020-10-02