Zonation and scaling of tropical cyclone hazards based on spatial clustering for coastal China

Original Paper · Natural Hazards

Abstract

Zonation refers to the spatially constrained clustering of objects of interest with location information based on the similarity of their attributes. The results of zonation by clustering are usually relatively homogeneous spatial units in raster or vector formats. Tropical cyclone (TC) hazards, such as TC wind and rainfall, can exhibit significant spatial heterogeneity from coastal to inland areas, and proper spatial zonation can greatly improve the understanding and management of TC risks. Although zonation methods based on expert knowledge, simple statistics or GIS tools have been developed in past studies, challenges remain in selecting representative attribute indicators, choosing clustering algorithms, and fusing multiple indicators into an integrated scaling indicator. In this study, TC hazards are chosen to explore methods for zonation and scaling. First, wind data of 1,256 TCs from 1949 to 2017 and rainfall data of 895 TCs from 1951 to 2014 were collected at a 1-km resolution. The mean, standard deviation, and 200-year return period intensity of wind and rainfall were estimated and used as representative hazard intensity indicators (HIIs) for spatial clustering. Second, the K-means, iterative self-organizing data analysis technique algorithm (ISODATA), mean shift and Gaussian mixture model algorithms were tested for their suitability for natural hazard zonation based on raster data. All four algorithms were found to perform well, with K-means performing best. Third, a hierarchical clustering algorithm was used to cluster the HIIs into polygons at the provincial, city and county levels in China. Finally, the six HIIs were weighted into a single indicator for integrated hazard intensity scaling. The zonation and scaling maps developed in the present study reflect the spatial pattern of TC hazard intensity satisfactorily. In general, the TC hazard scale decreases from the southeast coast to the northwest inland of China. The methods and steps proposed in this study can also be applied to the zonation and scaling of other types of disasters.


Acknowledgements

This work was mainly supported by the National Key Research and Development Program of China (Nos. 2018YFC1508803 and 2017YFA0604903) and the Key Special Project for Introduced Talents Team of Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou) (No. GML2019ZD0601).

Author information

Corresponding author

Correspondence to Weihua Fang.

Ethics declarations

Conflict of interest

The authors declare that they have no competing financial interests or personal relationships that could have influenced the work reported in this paper.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 A. PCA Whitening

The specific steps are as follows (a short code sketch is given after the list):

(1) The three-dimensional matrix \({Y}_{d\times m\times n}\) composed of the six HIIs was reshaped into the two-dimensional matrix \({X}_{s\times d}\), in which each row is the observation vector of one grid cell and each column is one indicator across all grid cells. Here, \(d\) is the number of indicators, \(m\) and \(n\) are the numbers of rows and columns of the geospatial data, and \(s=m\times n\) is the total number of grid cells.

(2) The covariance matrix \({\Sigma }_{d\times d}\) of the two-dimensional matrix \({X}_{s\times d}\) was calculated, and the eigenvectors and eigenvalues of \({\Sigma }_{d\times d}\) were obtained by eigenvalue decomposition (EVD).

(3) The product of the data matrix \({X}_{s\times d}\) and the eigenvector matrix \({U}_{d\times d}\) was calculated to remove the correlation between features: \({X}_{s\times d}^{\prime}={X}_{s\times d}{U}_{d\times d}\).

(4) Each dimension of \({X}_{s\times d}^{\prime}\) was scaled by Eq. 3 to obtain the matrix \(X_{whiten}^{\prime\prime}\), in which each dimension has unit variance.

$$ X_{whiten,\,i}^{\prime\prime} = \frac{X_{i}^{\prime}}{std\left(X_{i}^{\prime}\right)}\quad \left( i = 1,2, \ldots ,6 \right)$$
(3)

where \(X_{i}^{\prime}\) is column \(i\) of matrix \(X_{s\times d}^{\prime}\), \(std\left(X_{i}^{\prime}\right)\) is the standard deviation of all samples in column \(i\) of \(X_{s\times d}^{\prime}\), and \(X_{whiten,\,i}^{\prime\prime}\) is column \(i\) after PCA whitening.
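
For concreteness, the four steps map directly onto a few lines of NumPy. This is a minimal sketch rather than the authors' code; the function name and the synthetic 100 × 80 raster are illustrative assumptions.

```python
import numpy as np

def pca_whiten(Y):
    """PCA-whiten a (d, m, n) stack of indicator grids, as in steps (1)-(4)."""
    d, m, n = Y.shape
    X = Y.reshape(d, m * n).T            # step (1): s x d, one row per grid cell
    X = X - X.mean(axis=0)               # center the columns before the covariance
    Sigma = np.cov(X, rowvar=False)      # step (2): d x d covariance matrix
    eigvals, U = np.linalg.eigh(Sigma)   # step (2): eigenvalue decomposition (EVD)
    X_dec = X @ U                        # step (3): decorrelate the features
    return X_dec / X_dec.std(axis=0)     # step (4): unit variance per column (Eq. 3)

# Illustrative use with six synthetic indicator grids on a 100 x 80 raster
Y = np.random.default_rng(0).random((6, 100, 80))
X_whiten = pca_whiten(Y)
print(X_whiten.shape)                    # (8000, 6)
```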

1.2 B. K-means

The specific algorithm steps are as follows (a short code sketch is given after the list):

(1) According to prior experience, set the cluster number \(K\) and the maximum iteration number \(T\).

(2) Randomly select \(K\) points from the data set \(F=\{F(x_{1}),F(x_{2}),\ldots,F(x_{n})\}\) as the initial cluster centers \(\{\mu^{1},\mu^{2},\ldots,\mu^{K}\}\), where \(n\) is the number of samples, \(F(x)=(F_{1}(x),F_{2}(x),\ldots,F_{m}(x))\) is the vector of \(m\) feature statistics at point \(x\), and \(m\) is the feature dimension.

(3) For \(t=1,2,\ldots,T\), divide the clusters according to the following steps:

    (a) Initialize the clusters to \(C_{k}=\emptyset\;(k=1,2,\ldots,K)\);

    (b) Using the Euclidean distance formula, calculate the distance \(d_{ik}\) between each sample \(F(x_{i})\;(i=1,2,\ldots,n)\) and each centroid \(\mu^{k}\;(k=1,2,\ldots,K)\), mark \(x_{i}\) with the cluster \(\varphi_{i}\) whose \(d_{ik}\) is smallest, and update \(C_{\varphi_{i}}=C_{\varphi_{i}}\cup\{x_{i}\}\);

    (c) For \(k=1,2,\ldots,K\), recompute the centroids \(\mu^{k}=\frac{1}{|C_{k}|}\sum_{x\in C_{k}}F(x)\) over all samples in \(C_{k}\);

    (d) If the distance between each new center and its previous value is less than the threshold, go to Step (4); otherwise, repeat Step (3).

(4) Output the clusters \(C=\{C_{1},C_{2},\ldots,C_{K}\}\).
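
A compact NumPy sketch of this loop follows; the synthetic data, \(K\), \(T\), and the convergence threshold `tol` are illustrative assumptions, not values from the study.

```python
import numpy as np

def kmeans(F, K, T=100, tol=1e-6, seed=0):
    """Lloyd-style K-means loop mirroring steps (1)-(4) above."""
    rng = np.random.default_rng(seed)
    mu = F[rng.choice(len(F), size=K, replace=False)]   # step (2): initial centers
    for _ in range(T):                                  # step (3)
        d = np.linalg.norm(F[:, None, :] - mu[None, :, :], axis=2)  # (b) distances d_ik
        labels = d.argmin(axis=1)                       # (b) nearest-centroid assignment
        new_mu = np.array([F[labels == k].mean(axis=0) if np.any(labels == k)
                           else mu[k] for k in range(K)])  # (c) recompute centroids
        if np.linalg.norm(new_mu - mu) < tol:           # (d) centers stopped moving
            mu = new_mu
            break
        mu = new_mu
    return labels, mu                                   # step (4)

labels, centers = kmeans(np.random.default_rng(1).random((500, 6)), K=5)
```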

1.3 C. ISODATA

The specific algorithm steps are as follows (a short code sketch is given after the list):

(1) Randomly select \({K}_{0}\) points from the data set \(F=\{F(x_{1}),F(x_{2}),\ldots,F(x_{n})\}\) as the initial cluster centers \(\{\mu^{1},\mu^{2},\ldots,\mu^{{K}_{0}}\}\), and set \(K={K}_{0}\), where \(n\) is the number of samples, \(F(x)=(F_{1}(x),F_{2}(x),\ldots,F_{m}(x))\) is the vector of \(m\) feature statistics at point \(x\), and \(m\) is the feature dimension.

(2) Calculate the distance \({d}_{ik}\) between each sample \(F(x_{i})\;(i=1,2,\ldots,n)\) and each centroid \(\mu^{k}\;(k=1,2,\ldots,K)\), and mark \({x}_{i}\) with the cluster whose \({d}_{ik}\) is smallest.

(3) Check whether the number of samples in each cluster is less than \({N}_{min}\). If so, discard that cluster, set \(K=K-1\), and redistribute its samples to the nearest of the remaining clusters.

(4) For \(k=1,2,\ldots,K\), recompute the centroids \(\mu^{k}=\frac{1}{|{C}_{k}|}\sum_{x\in {C}_{k}}F(x)\) over all samples in \({C}_{k}\).

(5) If \(K\ge 2{K}_{0}\), there are too many clusters; go to the merge operation in Step (8).

(6) If \(K\le \frac{{K}_{0}}{2}\), there are too few clusters; go to the splitting operation in Step (9).

(7) If the maximum number of iterations is reached, terminate the algorithm; otherwise, return to Step (2).

(8) Merge operation:

    (a) Calculate the distances between the centers of all pairs of clusters and store them in the matrix \(D\), where \(D(k,k)=0\);

    (b) Merge any two clusters with \(D(k,{k}^{\prime})<{d}_{min}\;(k\ne {k}^{\prime})\) into a new cluster with center \(\mu^{new}=\frac{1}{{n}_{k}+{n}_{{k}^{\prime}}}({n}_{k}\mu^{k}+{n}_{{k}^{\prime}}\mu^{{k}^{\prime}})\), where \({n}_{k}\) and \({n}_{{k}^{\prime}}\) are the numbers of samples in the two clusters, respectively.

(9) Splitting operation:

    (a) Calculate the standard deviation of each dimension over all samples in each cluster, and take the maximum value \({\sigma}_{max}\) for each cluster;

    (b) If \({\sigma}_{max}\) of a cluster exceeds the splitting threshold and the cluster size satisfies \({n}_{k}\ge 2{N}_{min}\), go to Step (c); otherwise, exit the splitting operation;

    (c) Split this cluster into two clusters with centers \({(\mu^{k})}^{(+)}=\mu^{k}+{\sigma}_{max}\) and \({(\mu^{k})}^{(-)}=\mu^{k}-{\sigma}_{max}\), and set \(K=K+1\).
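
ISODATA is not shipped by common libraries such as scikit-learn, so a condensed NumPy sketch is given below. It merges or splits at most one cluster per iteration and simplifies the bookkeeping of full ISODATA implementations; all thresholds (`N_min`, `d_min`, `sigma_s`) are illustrative assumptions.

```python
import numpy as np

def _assign(F, mu):
    """Nearest-center labels for all samples (step (2))."""
    d = np.linalg.norm(F[:, None, :] - mu[None, :, :], axis=2)
    return d.argmin(axis=1)

def isodata(F, K0=5, N_min=20, d_min=0.1, sigma_s=0.5, max_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    mu = F[rng.choice(len(F), size=K0, replace=False)]        # step (1)
    for _ in range(max_iter):                                 # step (7): iteration cap
        labels = _assign(F, mu)
        # step (3): drop clusters smaller than N_min, then reassign their samples
        keep = [k for k in range(len(mu)) if (labels == k).sum() >= N_min]
        mu = mu[keep]
        labels = _assign(F, mu)
        # step (4): recompute the centroids
        mu = np.array([F[labels == k].mean(axis=0) for k in range(len(mu))])
        K = len(mu)
        if K >= 2 * K0:                                       # steps (5), (8): merge
            D = np.linalg.norm(mu[:, None, :] - mu[None, :, :], axis=2)
            np.fill_diagonal(D, np.inf)
            i, j = np.unravel_index(D.argmin(), D.shape)
            if D[i, j] < d_min:                               # weighted new center
                ni, nj = (labels == i).sum(), (labels == j).sum()
                merged = (ni * mu[i] + nj * mu[j]) / (ni + nj)
                mu = np.vstack([np.delete(mu, [i, j], axis=0), merged])
        elif K <= K0 / 2:                                     # steps (6), (9): split
            sig = np.array([F[labels == k].std(axis=0).max() for k in range(K)])
            k = int(sig.argmax())
            if sig[k] > sigma_s and (labels == k).sum() >= 2 * N_min:
                mu = np.vstack([np.delete(mu, k, axis=0),
                                mu[k] + sig[k], mu[k] - sig[k]])
    return _assign(F, mu), mu

labels, centers = isodata(np.random.default_rng(2).random((800, 6)))
```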

1.4 D. Mean shift

The specific algorithm steps are as follows (a short code sketch is given after the list):

(1) Randomly select an unmarked sample as the core sample.

(2) Find all samples within the bandwidth radius centered at the core sample, record them as the set \(M\), and consider the core sample to belong to cluster \(C\). Add 1 to the visit count of each sample in this circle for cluster \(C\); these counts are used in Step (8).

(3) Taking the core sample as the center, calculate the vector from the core sample to each sample in the set \(M\), and sum these vectors to obtain the shift vector.

(4) Move the core sample along the direction of the shift vector by the distance \(\Vert shift\Vert\).

(5) Repeat Steps (2) through (4) until the shift vector is very small, that is, until convergence. Assign the samples encountered during the iterations to cluster \(C\), and record the core point at this time.

(6) At convergence, if the distance between the core point of the current cluster \(C\) and that of an existing cluster \({C}_{i}\) is less than the threshold, merge cluster \(C\) into cluster \({C}_{i}\); otherwise, take \(C\) as a new cluster and increase the cluster count by 1.

(7) Repeat Steps (1) through (6) until all points have been visited.

(8) Assign each point to the cluster with the highest visit count for that point.
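
Rather than re-implementing the shift-merge-assign traversal, the same procedure is available as scikit-learn's MeanShift. A minimal sketch follows; the bandwidth (the radius of Step (2)) is estimated from the data here, and the synthetic feature matrix is an illustrative assumption.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

F = np.random.default_rng(3).random((300, 6))   # stand-in for the HII feature matrix
bw = estimate_bandwidth(F, quantile=0.2)        # the radius used in steps (2)-(4)
ms = MeanShift(bandwidth=bw, bin_seeding=True)  # bin_seeding: faster, coarser seeding
labels = ms.fit_predict(F)                      # runs the full shift-merge-assign loop
print(len(np.unique(labels)), "clusters found")
```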

1.5 E. Gaussian mixture model

The specific algorithm steps are as follows (a short code sketch is given after the list):

(1) Input the data set \(F=\{F(x_{1}),F(x_{2}),\ldots,F(x_{n})\}\) and the number of Gaussian mixture components \(K\), where \(n\) is the number of samples, \(F(x)=(F_{1}(x),F_{2}(x),\ldots,F_{m}(x))\) is the vector of \(m\) feature statistics at point \(x\), and \(m\) is the feature dimension.

(2) Assume that the samples \(F(x_{i})\;(i=1,2,\ldots,n)\) are independent and identically distributed, with the probability density function of the Gaussian mixture model given by \({p}_{GMM}(F(x_{i});\theta)=\sum_{k=1}^{K}{\alpha}^{k}p(F(x_{i});\mu^{k},{\Sigma}^{k})\), where \(\theta=\{{\alpha}^{1},\ldots,{\alpha}^{K},\mu^{1},\ldots,\mu^{K},{\Sigma}^{1},\ldots,{\Sigma}^{K}\}\) is the set of all parameters, \({\alpha}^{k}\) is the weight of the \(k\)th Gaussian component and satisfies \(\sum_{k=1}^{K}{\alpha}^{k}=1\), \(p\) is the probability density function of the normal distribution, \(\mu^{k}=(\mu_{1}^{k},\ldots,\mu_{d}^{k})^{\prime}\) is the mean vector of the \(k\)th Gaussian component, and \({\Sigma}^{k}\) is its \(d\times d\) covariance matrix.

(3) For \(i=1,2,\ldots,n\), calculate the posterior probability \(\gamma_{ik}={p}_{M}(z_{i}=k\mid x_{i})\;(1\le k\le K)\) that \(x_{i}\) was generated by each mixture component according to the following formula:

$$ p_{M}\left(z_{i}=k\mid x_{i}\right)=\frac{P\left(z_{i}=k\right)\cdot p_{M}\left(x_{i}\mid z_{i}=k\right)}{p_{M}\left(x_{i}\right)}=\frac{\alpha^{k}\cdot p\left(x_{i}\mid\mu^{k},\Sigma^{k}\right)}{\sum_{k^{\prime}=1}^{K}\alpha^{k^{\prime}}\cdot p\left(x_{i}\mid\mu^{k^{\prime}},\Sigma^{k^{\prime}}\right)} $$
(4)

(4) For \(1\le k\le K\):

    (a) Calculate the new mean vector \(\mu^{k\prime}=\frac{\sum_{i=1}^{n}\gamma_{ik}x_{i}}{\sum_{i=1}^{n}\gamma_{ik}}\);

    (b) Calculate the new covariance matrix \({\Sigma}^{k\prime}=\frac{\sum_{i=1}^{n}\gamma_{ik}\left(x_{i}-\mu^{k\prime}\right)\left(x_{i}-\mu^{k\prime}\right)^{T}}{\sum_{i=1}^{n}\gamma_{ik}}\);

    (c) Calculate the new mixture weight \({\alpha}^{k\prime}=\frac{\sum_{i=1}^{n}\gamma_{ik}}{n}\).

(5) Update the model parameters \(\{({\alpha}^{k},\mu^{k},{\Sigma}^{k})\mid 1\le k\le K\}\) to \(\{({\alpha}^{k\prime},\mu^{k\prime},{\Sigma}^{k\prime})\mid 1\le k\le K\}\).

(6) Repeat Steps (3) through (5) until the stopping condition is met.

(7) Initialize \({C}_{k}=\emptyset\;(1\le k\le K)\). For \(i=1,2,\ldots,n\), determine the cluster of \(x_{i}\) as \(\varphi_{i}=\mathop{\arg\max}\limits_{k\in\{1,2,\ldots,K\}}\gamma_{ik}\), and assign \(x_{i}\) to the corresponding cluster: \({C}_{\varphi_{i}}={C}_{\varphi_{i}}\cup\{x_{i}\}\).

(8) Output the clusters \(C=\{{C}_{1},{C}_{2},\ldots,{C}_{K}\}\).
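
The EM loop above is what scikit-learn's GaussianMixture runs internally. A minimal sketch follows, with \(K\) and the synthetic data as illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

F = np.random.default_rng(4).random((500, 6))    # stand-in for the whitened HIIs
gmm = GaussianMixture(n_components=5,            # K mixture components
                      covariance_type="full",    # one full d x d Sigma^k per component
                      max_iter=100,              # EM iterations: steps (3)-(6)
                      random_state=0)
labels = gmm.fit_predict(F)                      # step (7): argmax_k of gamma_ik
gamma = gmm.predict_proba(F)                     # posterior responsibilities (Eq. 4)
print(gmm.weights_.round(3))                     # alpha^k, summing to 1
```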

1.6 F. Hierarchical clustering

The specific algorithm steps are as follows (a short code sketch is given after the list):

(1) Input the data set \(F=\{F(x_{1}),F(x_{2}),\ldots,F(x_{n})\}\), the cluster linkage algorithm, the distance measurement function, and the number of clusters \(K\), where \(n\) is the number of samples, \(F(x)=(F_{1}(x),F_{2}(x),\ldots,F_{m}(x))\) is the vector of \(m\) feature statistics at point \(x\), and \(m\) is the feature dimension.

(2) Initialize each sample as its own cluster: \({C}_{i}=\{F(x_{i})\}\;(i=1,2,\ldots,n)\).

(3) Calculate the distance between each pair of clusters using the average link and the Euclidean distance. The average link is the average distance between the samples in one cluster and the samples in the other:

$$ d\left(u,v\right)=\sum_{ij}\frac{dist\left(u_{i},v_{j}\right)}{\left|u\right|\cdot\left|v\right|} $$
(5)

where \(u\) and \(v\) are two clusters, \(\left|u\right|\) and \(\left|v\right|\) are the numbers of samples in the two clusters, \(u_{i}\) is any sample in cluster \(u\), and \(v_{j}\) is any sample in cluster \(v\).

(4) Find the two closest clusters and merge them.

(5) Repeat Steps (3) and (4) until only \(K\) clusters remain (equivalently, build the full dendrogram and cut it at \(K\) clusters).

(6) Output the clusters \(C=\{{C}_{1},{C}_{2},\ldots,{C}_{K}\}\).
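
The average-link agglomeration of Eq. 5 is implemented in SciPy. A minimal sketch follows; the cut at \(K=5\) and the toy data are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

F = np.random.default_rng(5).random((200, 6))         # stand-in feature matrix
Z = linkage(F, method="average", metric="euclidean")  # average link of Eq. 5
labels = fcluster(Z, t=5, criterion="maxclust")       # cut the tree into K = 5 clusters
print(np.unique(labels))                              # cluster ids 1..5
```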

Cite this article

Fang, W., Zhang, H. Zonation and scaling of tropical cyclone hazards based on spatial clustering for coastal China. Nat Hazards 109, 1271–1295 (2021). https://doi.org/10.1007/s11069-021-04878-4
