On the selection of time-varying scenarios of wind and ocean waves: Methodologies and applications in the North Tyrrhenian Sea
Introduction
Several environmental processes taking place in the nearshore zone are of foremost interest from both scientific and technical points of view, such as storm surges, coastal circulation, beach evolution and pollutant dispersion, among many others (a few recent examples can be found, e.g., in Wu et al., 2018, Briganti et al., 2018, Enrile et al., 2019, de Ruggiero et al., 2020). A detailed characterization of the local climatology of the sea is therefore crucial to plan human activities in coastal areas (Zheng et al., 2020, Losada et al., 2019, Venancio et al., 2020, Khelil et al., 2019) and to handle hazards and emergency situations (Cheung et al., 2003, Samaras et al., 2016, Kerguillec et al., 2019, Mahendra et al., 2011, Silva et al., 2017).
In this respect, the use of numerical models for the simulation and forecasting of marine and coastal hydrodynamics has increased dramatically over the last decades, especially thanks to the exponential growth in the power of modern supercomputers. A wide range of numerical models has been developed for the analysis, simulation and resolution of geophysical fluid dynamics problems, making it possible to describe efficiently the processes driven by upper-sea physics, for example beach morphodynamics (Larson et al., 1987, Reeve, 2006, Castelle et al., 2015, Vousdoukas et al., 2012, Coco et al., 2014, Roelvink et al., 2009), littoral currents (Watanabe, 1982, Kaliraj et al., 2014, Lee et al., 2014), and ocean wave growth and propagation (Tolman et al., 2016, Booij et al., 1997, López et al., 2015, Browne et al., 2007, Wornom et al., 2001).
The characterization of these phenomena is generally based on a huge amount of information, the processing of which requires considerable computational power and time, especially if the data at hand come from a climate reanalysis service, characterized by fine resolutions in time and space. Nevertheless, top computational performance is not always available, so it may be advisable to reduce the number of environmental conditions to be retained for further numerical simulations and analysis. The detection of the most significant modes of variability of a time series can help to pursue this goal. Indeed, the use of techniques for the reduction of data dimensionality, and in particular clustering algorithms (Hastie et al., 2009, Wilks, 2011, Anderberg, 2014), makes it possible to focus on a limited number of subsets that capture most of the variability of a time series, reducing in turn the computational load required to run a modelling chain. Cluster analysis partitions input data into modes or clusters based on the selected distance measure (e.g. Euclidean, Minkowski, etc.), each cluster representing a group of elements that are similar to each other and dissimilar from the elements of another cluster. More specifically, this methodology aims to simultaneously minimize the distance between members of a given cluster and maximize the distance between the centres or centroids of the clusters (Michelangeli et al., 1995, Jain et al., 1999, Coggins et al., 2014, Fučkar et al., 2016). The result of a clustering analysis is a subset of elements that summarizes the initial dataset while maintaining its main properties. Note that in the previous literature the cluster centroids have also been referred to as “model scenarios”; hereinafter, they will be referred to simply as “scenarios” for the sake of brevity.
Scenarios can highlight the main characteristics of the dataset under investigation, which may be difficult to appreciate by looking only at its overall distribution, especially in the case of huge multivariate datasets.
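The statement that clustering simultaneously tightens the clusters and separates their centroids can be made concrete for the squared Euclidean distance. As a sketch, with notation that is assumed here rather than taken from the paper ($x_i$ the $N$ observations, $\bar{x}$ their overall mean, $C_k$ the $k$-th of $K$ clusters, with $n_k$ members and a centroid $\mu_k$ equal to the mean of its members), the total scatter splits into a within-cluster part and a between-cluster part:

```latex
\underbrace{\sum_{i=1}^{N} \lVert x_i - \bar{x} \rVert^2}_{S_T}
= \underbrace{\sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2}_{S_W}
+ \underbrace{\sum_{k=1}^{K} n_k \lVert \mu_k - \bar{x} \rVert^2}_{S_B}
```

Since $S_T$ is fixed by the data, lowering $S_W$ raises $S_B$ by the same amount, so for this particular distance the two goals coincide.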
There exist several clustering methods, and the previous literature presents different applications for the identification of environmental conditions, with different specific objectives that concern not only the analysis of geophysical processes but also their numerical simulation. A thorough review of the available techniques can be found in Barbakh et al. (2009), which outlines how, in general, the clustering method should be selected according to the analysis that is meant to be carried out.
As far as met-ocean variables are concerned, Hadzimejlic et al. (2012) introduced a general workflow for clustering atmospheric states through the use of hierarchical methods; such methods were also used by Euán and Sun (2019) to develop a method for the clustering of time series of directional spectra. Camus et al. (2011a) used clustering techniques for the propagation of sea waves in the coastal zone, and Enríquez et al. (2020) applied clustering techniques to characterize the spatial patterns of storm surges off the global coastline. McLaughlin et al. (2003) and Bárcena et al. (2015) performed clustering analysis for the simulation of three-dimensional hydrodynamics of estuaries, and Núñez et al. (2019) employed clustering analysis to assess the probability of marine litter accumulation in estuaries.
The latter studies are based on the K-means algorithm, one of the most popular unsupervised clustering techniques, which groups the initial database into K classes (i.e. clusters) with as many centroids (scenarios), each representing the respective class; the term “unsupervised” means that the data do not need to be labelled prior to performing the clustering (MacQueen et al., 1967, Friedman et al., 2001). With this particular method, the selected scenarios capture, on average, the properties of the data belonging to the respective clusters, as K-means aims precisely at minimizing the intra-cluster variance. However, different methods could be employed if a different data selection were intended. For instance, the Maximum Dissimilarity Algorithm may be more appropriate than K-means to describe extreme conditions of the input variables (Camus et al., 2011b).
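As an illustration of how K-means turns a dataset into K scenarios, the following is a minimal NumPy sketch of Lloyd's iteration. It is not the implementation used in the paper: the function name `kmeans`, the random initialisation, the stopping rule and the synthetic two-variable data are all assumptions made for the example.

```python
import numpy as np


def kmeans(data, k, n_iter=100, seed=0):
    """Lloyd's algorithm (illustrative sketch): returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct observations drawn at random.
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance to every centroid.
        dist2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist2.argmin(axis=1)
        # Update step: each centroid becomes the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = centroids.copy()
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment, so that the labels match the returned centroids.
    dist2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dist2.argmin(axis=1)


# Hypothetical standardised (wave height, wind speed) pairs forming two
# well-separated groups; real hindcast data would replace this array.
rng = np.random.default_rng(42)
calm = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(50, 2))
storm = rng.normal(loc=[5.0, 5.0], scale=0.1, size=(50, 2))
data = np.vstack([calm, storm])

scenarios, labels = kmeans(data, k=2)
print(scenarios)  # one row per scenario (cluster centroid)
```

Each row of `scenarios` is the mean of the observations assigned to it, which is why the centroids represent average rather than extreme conditions.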
K-means has the advantage that it is relatively easy to implement and is guaranteed to converge, although only to a local optimum; on the other hand, one of its drawbacks is that the number of clusters must be chosen manually. The aforementioned works defined the number of clusters according to indexes that assess the goodness of the clustering with respect to the total data. Herein, the number of clusters best suited for the application at hand will be referred to as “optimal”. However, the use of particular indexes to establish the optimal number of scenarios may sometimes be misleading, and the indexes employed are not always consistent with each other. In this respect, the present work resumes a methodology for the selection of time-varying scenarios of met-ocean variables (in this work meant as wind and wave data). K-means analysis is here applied to select groups of climatological scenarios from hindcast data off Genoa (Ligurian coastline, NW Italy), comparing the optimal number of scenarios according to two different indexes: the model efficiency (CE) and the total variance (W), proposed by Nash and Sutcliffe (1970) and Wilks (2011), respectively. In particular, the analysis is conducted for varying initial conditions of the problem, such as the time resolution of the data and the number of variables employed. Then, some of the clusters selected according to a specific initial set of the hindcast time series are analysed and evaluated in the context of the local climatology of the area.
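To give a feel for how an efficiency index can guide the choice of K, the sketch below computes a Nash-Sutcliffe-type coefficient, 1 - SSE/SST, for several values of K. Only the general form of the coefficient is standard; how the paper applies it is not reproduced here. The helper names (`fit_scenarios`, `nearest`, `efficiency`), the pooling of all variables into a single sum, the reconstruction of each validation record by its nearest scenario, the alternating train/validation split and the synthetic data are all assumptions.

```python
import numpy as np


def nearest(data, centroids):
    """Index of the closest centroid (squared Euclidean) for each row."""
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def fit_scenarios(train, k, n_iter=50, seed=0):
    """Compact K-means on the training set; returns the k centroids."""
    rng = np.random.default_rng(seed)
    centroids = train[rng.choice(len(train), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest(train, centroids)
        for j in range(k):
            members = train[labels == j]
            if len(members):  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return centroids


def efficiency(validation, centroids):
    """Nash-Sutcliffe-type coefficient, pooled over all variables."""
    # Each validation record is "simulated" by its nearest scenario.
    reconstructed = centroids[nearest(validation, centroids)]
    sse = ((validation - reconstructed) ** 2).sum()
    sst = ((validation - validation.mean(axis=0)) ** 2).sum()
    return 1.0 - sse / sst


# Hypothetical two-variable dataset with three regimes.
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
data = np.vstack([rng.normal(c, 0.3, size=(60, 2)) for c in centres])
train, validation = data[::2], data[1::2]

ce = {k: efficiency(validation, fit_scenarios(train, k)) for k in range(1, 6)}
for k, value in ce.items():
    print(f"K = {k}: CE = {value:.3f}")
```

A coefficient of 1 would mean that the scenarios reproduce the validation data exactly, while a value at or below 0 means they do no better than the validation mean. In this kind of sketch, one would look for the K beyond which the coefficient stops improving appreciably.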
The paper is structured as follows: Section 2 presents the hindcast data and the methodology used in the study. Section 3 presents and discusses the results, while the conclusions are drawn in Section 4.
Section snippets
Hindcast data
This work took advantage of wave and wind data hindcast off the Genoa coastline, in the North Tyrrhenian Sea (north-west of Italy). An overview of the area is shown in panel A) of Fig. 1, while panel B) shows a close-up of the hindcast node used. In this study, node 230 (i.e. Point_000230) was considered among all the grid nodes in the Mediterranean Sea provided by the hindcast service of the Department of Civil, Chemical and Environmental Engineering of the University of Genoa
Optimal number of clusters due to the selection of the initial dataset
The examples presented in this section make it possible to evaluate how the selection of the starting dataset may affect the number of required scenarios, defined according to CE. As explained in Section 2.4, this index quantifies to what extent the scenarios selected from a training dataset are able to reproduce a dataset used to validate the clustering model. An analysis of a simplified case was first carried out to test the reliability of the algorithms developed. To this end, one year of data were
Final remarks
The selection of significant scenarios of geophysical variables can be helpful for a plethora of applications. As far as met-ocean forcing is concerned, the hydrodynamic modelling of sea states might not be feasible if long-term series of data are to be investigated, mainly due to the computational cost. To overcome such difficulties, it might be convenient to identify a reduced number of representative scenarios through clustering techniques. In this respect, this paper
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This research was developed in the framework of the Interreg Italia-Francia Marittimo projects SPlasH! (Stop alle Plastiche in H2O!, grant number D31I18000620007), GEREMIA (GEstione dei REflui per il MIglioramento delle Acque portuali, grant number D41I18000600005) and SINAPSI (asSIstenza alla Navigazione per l’Accesso ai Porti in Sicurezza, grant number D64I18000160007).
References (59)
- et al. (2015). Selecting model scenarios of real hydrodynamic forcings on mesotidal and macrotidal estuaries influenced by river discharges using K-means clustering. Environ. Model. Softw.
- et al. (2007). Near-shore swell estimation from a global wind-wave model: Spectral process, linear, and artificial neural network models. Coast. Eng.
- et al. (2011). A hybrid efficient method to downscale wave climate to coastal areas. Coast. Eng.
- et al. (2011). Analysis of clustering and selection algorithms for the study of multivariate wave climate. Coast. Eng.
- et al. (2015). Impact of the winter 2013–2014 series of severe Western Europe storms on a double-barred sandy coast: Beach and dune erosion and megacusp embayments. Geomorphology
- et al. (2003). Modeling of storm-induced coastal flooding for emergency management. Ocean Eng.
- et al. (2014). Beach response to a sequence of extreme storms. Geomorphology
- et al. (2007). A multivariate model of sea storms using copulas. Coast. Eng.
- et al. (2020). Modelling the marine circulation of the Campania coastal system (Tyrrhenian Sea) for the year 2016: Analysis of the dynamics. J. Mar. Syst.
- et al. (2019). Monitoring and management of coastal hazards: Creation of a regional observatory of coastal erosion and storm surges in the Pays de la Loire region (Atlantic coast, France). Ocean Coast. Manage.
- Challenges and opportunities in promoting integrated coastal zone management in Algeria: Demonstration from the Algiers coast. Ocean Coast. Manage.
- Towards operational guidelines for over-threshold modeling. J. Hydrol.
- Artificial neural networks applied to port operability assessment. Ocean Eng.
- A planning strategy for the adaptation of coastal areas to climate change: The Spanish case. Ocean Coast. Manage.
- Assessment and management of coastal multi-hazard vulnerability along the Cuddalore–Villupuram, east coast of India using geospatial techniques. Ocean Coast. Manage.
- Rivers, runoff, and reefs. Glob. Planet. Change
- Performance evaluation of WAVEWATCH III in the Mediterranean Sea. Ocean Model.
- River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol.
- A methodology to assess the probability of marine litter accumulation in estuaries. Mar. Pollut. Bull.
- Modelling storm impacts on beaches, dunes and barrier islands. Coast. Eng.
- An index-based method for coastal-flood risk assessment in low-lying areas (Costa de Caparica, Portugal). Ocean Coast. Manage.
- Hydrodynamic modeling with scenario approach in the evaluation of dredging impacts on coastal erosion in Santos (Brazil). Ocean Coast. Manage.
- Modeling wave effects on storm surge and coastal inundation. Coast. Eng.
- Beach management strategy for small islands: Case studies of China. Ocean Coast. Manage.
- Cluster Analysis for Applications: Probability and Mathematical Statistics: A Series of Monographs and Textbooks, Vol. 19
- Review of clustering algorithms
- The “SWAN” wave model for shallow water
- Large scale tests on foreshore evolution during storm sequences and the performance of a nearly vertical structure. Coastal Eng. Proc.
- Sediment transport and deposition during extreme sea storm events at the Salerno Bay (Tyrrhenian Sea): comparison of field data with numerical model results. Nat. Hazards Earth Syst. Sci.