International Journal of Geographical Information Science (IF 3.733) Pub Date: 2019-08-29, DOI: 10.1080/13658816.2019.1658876 Henry Crosby, Theodoros Damoulas, Stephen A. Jarvis
The physical and social processes in urban systems are inherently spatial, and hence the data describing them contain spatial autocorrelation (SAC), a proximity-based interdependency in a variable, that needs to be accounted for. Standard k-fold cross-validation (KCV) techniques, which attempt to measure the generalisation performance of machine learning and statistical algorithms, are inappropriate in this setting because of their inherent i.i.d. assumption, which spatial dependency violates. More appropriate validation methods have therefore been considered, notably blocking and spatial k-fold cross-validation (SKCV). However, the physical barriers and complex network structures that make up a city’s landscape mean that these methods are also inappropriate, largely because travel patterns (and hence SAC) in most urban spaces are rarely Euclidean in nature. To overcome this problem, we propose a new road distance and travel time k-fold cross-validation method, RT-KCV. We show that it outperforms the prior art by providing better estimates of the true generalisation performance on unseen data.
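The abstract does not spell out the RT-KCV procedure, so the following is only a minimal sketch of the general idea behind spatially aware k-fold cross-validation, not the authors' implementation. It assumes a precomputed pairwise distance matrix and drops every training point that lies within a chosen radius of the test fold. The function name `buffered_kfold`, the `radius` parameter, and the random coordinates are all hypothetical. Euclidean distance is used here purely as a stand-in; in the spirit of the abstract, a road-distance or travel-time matrix would be passed instead.

```python
import numpy as np


def buffered_kfold(dist, k, radius, seed=0):
    """Yield (train_idx, test_idx) pairs for a buffered k-fold split.

    dist   : (n, n) array of pairwise distances (any metric: Euclidean,
             road distance, travel time, ...). Hypothetical input.
    k      : number of folds.
    radius : training points within this distance of any test point
             are excluded from the training set.
    """
    n = dist.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for test_idx in folds:
        # Distance from every point to its nearest test point.
        nearest_test = dist[:, test_idx].min(axis=1)
        train_mask = nearest_test > radius
        # Test points must never appear in the training set.
        train_mask[test_idx] = False
        yield np.flatnonzero(train_mask), np.sort(test_idx)


# Illustrative data: random points in the unit square.
rng = np.random.default_rng(42)
xy = rng.random((50, 2))
# Euclidean stand-in; swap in a road-distance or travel-time matrix here.
dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)

for train_idx, test_idx in buffered_kfold(dist, k=5, radius=0.1):
    print(len(train_idx), "training points,", len(test_idx), "test points")
```

Because the split depends only on the distance matrix, changing the notion of proximity (for example from Euclidean to network-based) requires no change to the splitting logic itself.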