Abstract

Conventional tourism demand prediction models currently face several challenges owing to the excessive number of search intensity indices used as indicators of tourism demand. This work presents a deep learning-based framework for the monthly prediction of Macau tourist arrival volumes. The main objective of this study is to predict tourism demand growth using a deep learning algorithm that extracts new features. The results show that the adopted deep learning framework outperforms artificial neural network and support vector regression models. Practitioners can rely on the relevant features identified by the developed framework to understand the nature of the relationships between the predictive factors of tourism demand and the actual volume of tourist arrivals.

1. Introduction

Most countries depend on the tourism sector for economic growth, as it creates jobs and contributed about 10.4% to the global gross domestic product (GDP) as of 2019 [1]. Prediction of tourism demand (TD) is one of the critical activities in the tourism sector because precise forecasts are needed for operational decisions such as pricing, resource allocation, staffing, revenue, and capacity management [2, 3]. Governments also require precise TD prediction to facilitate the planning of destination infrastructure, operational flexibility, and environmental quality control [4]. Both quantitative and qualitative techniques are used in TD forecasting. Qualitative techniques depend on intuition, experience, and understanding of a specific destination market and exhibit poor adaptability [5]. Quantitative techniques, in contrast, predict tourist arrivals mainly from historical records and other determinants of tourism volume [6].

Most existing studies in TD prediction follow the quantitative approach; they typically build a model using training data drawn from historical tourist arrival volumes (TAVs) and other TD predictors [7, 8]. Advances in Web technology have made search engines an essential tool for tourists when planning their trips, especially for gathering relevant information on their areas of interest. Search intensity indices (SIIs) have been recognized as a potential TD indicator in the destination market [9, 10] and have been examined by many researchers for TD prediction [11]. SII data are important for accurate TD prediction, even though some practitioners have reported challenges in using them with conventional prediction models. Two common practical barriers are as follows: (i) the first issue relates to feature selection. Rosselló-Nadal and He [12] stated that many factors are considered potential TD predictors, such as the exchange rate, travel cost, tourism prices, and numerous SII series. As the number of potentially influential predictors grows, the training data become sparse in the enlarged feature space, meaning there are not enough observations to build accurate models.

According to Wang et al. [13], training most prediction models on data with numerous explanatory factors is a tedious task; hence, feature engineering is an essential step in building a prediction model, as it provides the set of relevant features that improves the performance of the developed model [14]. Although some factors have established meanings, numerous potential keywords may be relevant to end users in terms of tourism marketing. Feature selection for TD prediction is currently based mainly on existing knowledge of the destination tourism market, and selecting effective features demands significant human effort [15]. (ii) The second issue relates to lag order selection. Even though numerous TD prediction methods have adopted SII data, few studies focus on determining the lag relationships in the time-series data. Some current studies investigate the causality hypothesis using either the Granger causality test or Pearson correlation coefficients [16]; in these studies, the hypothesis is examined by measuring the extent of the relationship between the lagged values of a factor and the tourist arrival volume, as sketched below.
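As an illustration of this conventional lag screening, the short sketch below computes the Pearson correlation between monthly arrivals and lagged copies of a single SII series; the file name and column names ("arrivals", "sii_macau_hotel") are hypothetical placeholders, not data from this study.

```python
import pandas as pd

df = pd.read_csv("macau_monthly.csv", parse_dates=["month"], index_col="month")

def lag_correlations(target: pd.Series, factor: pd.Series, max_lag: int = 12) -> pd.Series:
    """Pearson correlation between the target and each lagged copy of a candidate factor."""
    corrs = {lag: target.corr(factor.shift(lag)) for lag in range(1, max_lag + 1)}
    return pd.Series(corrs, name=factor.name)

corrs = lag_correlations(df["arrivals"], df["sii_macau_hotel"])
print(corrs.idxmax(), corrs.max())  # lag with the strongest linear association
```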

However, the reliability of both the Granger causality test and Pearson correlation coefficients remains doubtful when the underlying relationships are nonlinear [17]. Effective identification of the relevant lag relationships in the data would therefore support the development of highly accurate prediction models. Time-series and AI models perform well in prediction tasks partly because they rely less on feature selection driven by prior knowledge of the destination market.

The performance of existing prediction methods is not satisfactory in every destination market because of numerous real-world complications, especially when many SII series are adopted as TD indicators, which may demand significant domain expertise to resolve ambiguity. In addition, each SII variable has a different effective lag order, and the complexity increases further in the presence of problems such as platform and language bias [18].

The emergence of deep learning techniques offers a solution to most of the above barriers, as they provide the means for achieving accurate TD prediction [19]. Deep learning extends ANN techniques with multiple processing layers that model nonlinear relationships; it has influenced many methods owing to its built-in feature selection capability. Deep networks are also well suited to time-series analysis because of their flexibility and their ability to capture nonlinear relationships. Long-term dependencies can be handled and learnt by the recurrent neural network (RNN), the attention mechanism, and long short-term memory (LSTM) models. Hence, deep learning is considered an alternative approach to TD prediction. This work proposes a deep learning technique for TD prediction that simultaneously addresses the practical issues mentioned above.

The remainder of this manuscript is organized as follows. Section 2 reviews related work on TD prediction and deep learning and contrasts deep learning with conventional neural networks. Section 3 describes the proposed method in detail. Section 4 presents and analyses the results, and Section 5 concludes the paper.

2. Related Work

The need for accurate TD prediction is substantial, as it provides researchers and practitioners with tourism-related information that supports decisions on activities such as resource allocation, risk assessment, and opportunity identification. This section reviews the existing literature on TD prediction and on deep learning, the technique selected as the basis for this work.

2.1. Tourism Demand Prediction

Existing TD prediction studies can be classified into two approaches: qualitative and quantitative. The qualitative approach depends mainly on experience and knowledge of a specific domain; hence, it is sometimes considered “artistic” and exhibits low generalization potential [20]. The quantitative approach has been the method of choice for estimating the quantitative relationships among the many observed variables in tourism; such models are built from historical data to predict future tourist arrival volumes. The performance of the quantitative approach can be improved in two main ways: the first is to introduce more related factors that motivate tourism-related travel, and the second is to adopt more sophisticated models that generalize future trends more accurately.

The construction of TD prediction models depends mainly on input factors that are highly related to TD and have no missing values. The available TD prediction factors can be grouped in different ways using different criteria; based on the nature of their influence on TD, they can be grouped into determinants and indicators. Determinants are the primary prediction factors. Traditional economic theories, such as utility theory and the theory of consumer behaviour, hold that TD is influenced by both qualitative and quantitative economic factors, but most TD prediction models do not consider qualitative economic factors because they are difficult to quantify. Such models rely mainly on quantitative economic factors because these can be measured and used as input features for prediction methods. Given the nature of TD, considering only economic factors is not enough. Some works have previously focused on the impact of noneconomic determinants on travel motivations, as well as the impact of travel motivation on destination choices. Kumar and Kumar [21], for instance, introduced qualitative noneconomic determinants such as a climate index, special events, and a leisure time index. These determinants can be classified into pull, push, and resistance factors according to their relationship with the destination: pull factors describe attributes of the destination tourism market [22], whereas push factors act mainly on the source market [23]. The resistance factors, on the other hand, comprise factors that restrict travel from the source to the destination market, such as relative prices and perceived corruption [24].

Economic determinants mainly influence TD, but TD prediction accuracy can be improved by introducing leading indicators, regarded as secondary factors, into the prediction model [25]. Advances in Web technology have made it possible for most tourists to search for important information before embarking on their trips; such information may relate to destination selection, hotel reservations, flight booking, and activity planning. Tourist attention is reflected in SII data; hence, these indices are effective TD indicators that should be incorporated into TD prediction models. The study in [26] analysed the relationship between tourists' search terms for US cities and the attractiveness of those cities, and found SII data to be an important indicator of tourism scale in the destination market. Furthermore, Hilal et al. [27] predicted TD for Hong Kong from nine source countries using the Google Trends Index; the study attested to the importance of SII data for such tasks.

Yang et al. [28] stated that SII data reflect tourists' choices and are available rapidly, capturing timely changes in people's preferences. Li et al. [29] also demonstrated the relevance of SII data for predicting hotel occupancy. These attributes make SII data preferable to single-variable time series, as they help to address the problems caused by abrupt changes in econometric patterns [30].

Search engine selection is based mainly on popularity in the targeted tourism market. Google and Baidu provide the SII data most commonly used in the existing literature [31, 32]. However, the Baidu Index provides daily search volumes, whereas Google Trends provides a normalized index at weekly or monthly granularity. Several prediction models are available in the field of TD prediction. In [8], the authors classified these methods into time-series, AI, and econometric models, although TD prediction is usually done using time-series and econometric models [33].
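As an illustration of aligning the two sources, the sketch below aggregates a hypothetical daily Baidu Index series to monthly values with pandas; the file name and column names are placeholders, and the choice of the mean as the aggregation rule is an assumption.

```python
import pandas as pd

daily = pd.read_csv("baidu_index_daily.csv", parse_dates=["date"], index_col="date")

# Aggregate daily search volumes to calendar months (mean and sum are both common choices).
monthly = daily["baidu_index"].resample("MS").mean()   # "MS" = month-start frequency
print(monthly.head())
```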

Most of the common prediction methods are extensions of the ARMA model [12], while more sophisticated ones, such as the Bayesian model, the Markov-switching model, and the generalized dynamic factor model, have been developed for better performance [22, 24]. The techniques in this group rely on historical time-series patterns to determine the relationships between TD predictors and tourist arrival volumes. When building a predictive model, the main task is to identify the set of features that minimizes the prediction error measured by metrics such as the mean absolute error (MAE), root mean squared error (RMSE), or mean absolute percentage error (MAPE).

The AI models comprise the soft computing and machine learning methods used in TD prediction. The study by Law and Au [34] used neural networks alongside multivariate regression analyses to identify nonlinear relationships. Furthermore, Zhang et al. [16] improved the comprehensibility of TD prediction models using the rough set approach. According to Chakrabarty [35], evolutionary computing methods can readily be used to predict monthly tourist arrivals at the Balearic Islands of Spain. Machine learning has also proven effective and efficient in estimating the distribution of tourists [34]. Hybrid models that integrate different models have recently been shown to provide better results [20, 36]. However, the “No Free Lunch” theorem states that no single technique performs well in all scenarios, as every method has limitations in certain settings [37]. Econometric and time-series models normally depend on a stable economic structure and historical patterns, while AI models depend on the size and quality of the available training data.

2.2. Deep Learning (DL)

Successful prediction of TD using AI models, such as SVR and neural networks, has been reported. The study by Chen et al. [38] succeeded in training deep network models through greedy layer-wise pretraining for a wide range of practical applications. Since its development, deep learning has found numerous practical applications, ranging from pattern recognition to natural language processing and image recognition [39]. It has also been used to predict sequential data [40]. This section discusses two common deep network techniques that have shown great efficiency in time-series prediction.

The reviewed deep learning techniques are the RNN with an attention mechanism and the LSTM. The RNN, a popular deep network [41], processes data elements by selectively passing information across time steps. This attribute makes it suitable for TD prediction because the temporal structure embedded in time-series data provides important contextual information. As presented in Figure 1, either the input x or the output y of an RNN can be a single data point, but both are typically time series. The memory of the RNN is preserved in the hidden-layer neurons, which capture all previously processed information. The neuron output is generated from the state of the hidden-layer neurons at the previous step and the current input via a feedback loop. The RNN can establish relationships across a range of sequence elements and has also been found effective for non-time-series sequential data [42].
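A minimal numpy sketch of this recurrence is shown below; the shapes and random parameters are purely illustrative and are not part of the reviewed models.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of a vanilla RNN: the new hidden state mixes the current input
    with the previous hidden state through a nonlinearity (the feedback loop)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Illustrative shapes: 8 input features, 16 hidden units.
rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(16, 8)), rng.normal(size=(16, 16)), np.zeros(16)
h = np.zeros(16)
for x_t in rng.normal(size=(5, 8)):   # a toy sequence of 5 time steps
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```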

The LSTM was developed as an extension of the RNN; it is built not just with a recurrent learning unit but with several gates that capture longer- and shorter-term states from earlier and more recent units, respectively. This feature has enabled the use of the LSTM for time-series prediction problems. The attention mechanism, in turn, is a feature-weighting technique that works alongside other deep learning (DL) models: it allows the model to assign different weights to different inputs and thereby learn the relevance of each input without this having to be determined before model fitting.

Incorporating an attention mechanism into the LSTM makes it well suited to TD prediction because it addresses both the prediction and the feature selection problems. Figure 2 shows an example LSTM structure.

2.3. Rationale of This Work

Much progress has been made in TD prediction, but these developments are yet to be reflected in the feature selection processes used for TD prediction. Although the performance of TD prediction models depends on the selected features used as training data, TD prediction still faces two practical limitations. The first is feature selection. For secondary TD prediction factors, feature selection is mainly a matter of query selection, that is, collecting search engine keywords related to tourism. However, the long-tail nature of SII data means there are numerous low-volume search queries that reflect unique travel experiences [22]. Some irrelevant and redundant features can be removed using common sense to arrive at an ideal set of related subqueries, but this demands much human effort. The second limitation concerns lag order selection. The lag order determines the nature of the temporal relationship between time series; hence, lag order selection is an important preliminary step in TD prediction, usually performed with methods such as the Granger causality test [27].

However, such tests often fail to capture nonlinear relationships or to account for underlying confounding effects [29]. An incorrect lag order selection invalidates the subsequent steps of building a prediction model.

Deep learning is an aspect of artificial intelligence that is considered a potential alternative to existing TD prediction models because of two unique properties: (i) the ability to learn highly nonlinear correlations naturally and (ii) the ability to select appropriate features automatically at different network layers through its built-in feature selection mechanism. Moreover, the burden of lag selection can be reduced by exploiting the temporally local correlation between TD and its predictors, allowing the most informative features to be selected directly from the raw input data.

These properties make deep learning (DL) a potential solution to the over-reliance on domain expertise. Hence, the aim of this work is to develop a deep learning model for TD prediction using a deep network framework that automatically extracts the relevant features, with suitable lag orders, from a large set of potential features.

In the proposed method, the extracted features are fed into a deep neural network with a variable number of nodes and layers; the selected features determine the number of layers and nodes and thus configure the network accordingly.

2.4. Deep Learning vs. Neural Network

A conventional neural network passes the information to be processed from the input layer into a limited number of hidden layers, and the number of nodes is fixed. In deep learning, by contrast, the number of hidden layers responsible for the processing is variable; what distinguishes deep learning is the ability to change both the number of hidden layers and the number of nodes in each layer. In the proposed method, the deep network also contains feedback connections between the hidden layers and the output layer.
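The sketch below illustrates this difference under our own assumptions: a hypothetical PyTorch helper builds a feed-forward network whose depth and width follow a list of hidden-layer sizes, so a fixed shallow ANN and a deeper network are simply two configurations of the same constructor (the feedback connections of the proposed method are not shown here).

```python
import torch.nn as nn

def make_mlp(n_inputs: int, hidden_sizes: list, n_outputs: int = 1) -> nn.Sequential:
    """Build a feed-forward network whose depth and width follow `hidden_sizes`."""
    layers, prev = [], n_inputs
    for h in hidden_sizes:
        layers += [nn.Linear(prev, h), nn.ReLU()]
        prev = h
    layers.append(nn.Linear(prev, n_outputs))
    return nn.Sequential(*layers)

shallow = make_mlp(12, [8])            # a conventional single-hidden-layer ANN
deep    = make_mlp(12, [64, 32, 16])   # a deeper network: more layers, varying widths
```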

3. Methodology

This study proposes a deep learning-based conceptual framework for TD prediction; the following subsections describe how the adopted deep learning framework addresses the problems discussed above.

The proposed deep learning-based model for TD prediction was developed according to the following steps:

(1) The first step is the identification of the search engine platforms. While planning their travels, tourists normally use several search engines to obtain information about their potential destination, and different source markets favour different search engines; this is why the platforms must be identified first. Google, for instance, is the most common search engine in most English-speaking regions of the world, while Baidu and Yandex are common in China and Russia, respectively [29].

(2) The second step is data collection. The monthly tourist arrival volume must be obtained from reliable sources, and most TD prediction variables can be sourced from various repositories depending on data availability. The collection of the other factors, such as the SII data, involves the following substeps:
(i) identification of initial search terms for the potential destination market and expansion of the keyword set, using Google Trends related queries, so that it reflects the points of interest of tourists in the source market;
(ii) translation of the tourism-related keywords, using Google Translate, into the languages supported by the other search engine platforms;
(iii) extraction of the monthly SII data corresponding to the final set of keywords from each search engine. Monthly data can be obtained from Google Trends, whereas the Baidu Index and other search engines may only provide daily data that must be converted into monthly series, and some platforms may lack SII data for some relevant keywords.

Although a separate feature selection step is not required by the deep learning framework applied in the subsequent step, it is still useful to remove the factors that relate only loosely to the tourist arrival volume before training; the remaining selection is done automatically by the deep learning framework. Because the linear Pearson correlation coefficient (PCC) cannot capture nonlinear associations, factors with weak associations are prefiltered using the maximal information coefficient (MIC). The MIC is based on the idea that, if two variables are related, a grid can be imposed on their scatter plot to partition the data and capture that relationship. Comparing the MIC with the PCC yields a natural nonlinearity measure:

$$\mathrm{NL}(x, y) = \mathrm{MIC}(x, y) - \rho^{2}(x, y), \qquad (1)$$

where $\rho$ denotes the PCC. For a high MIC, a large value of equation (1) implies a predominantly nonlinear relationship, while a small value denotes a nearly linear relationship [21]. A sketch of this prefiltering step is given after this list.

(3) The training of the deep learning (DL) model is the third step. Because deep learning techniques have a built-in feature selection mechanism, this study proposes a deep network architecture that automatically selects the relevant factors and determines the lag orders of the time series.

(4) Model interpretation is the last step. The trained model captures the relationship between the TD predictors and the tourist arrival volume, and the most influential factors and lag orders can be determined from the weights of the neural connections and the attention scores. The proposed framework therefore requires no manual feature selection, as the deep learning model performs this process automatically.
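A minimal sketch of the MIC prefiltering step is shown below; it assumes the third-party minepy package, a hypothetical data frame of candidate factors, and an illustrative cut-off of 0.3 that is not taken from this study.

```python
import numpy as np
import pandas as pd
from minepy import MINE   # third-party package assumed for MIC computation

def mic(x: np.ndarray, y: np.ndarray) -> float:
    m = MINE(alpha=0.6, c=15)      # default parameters from Reshef et al.
    m.compute_score(x, y)
    return m.mic()

def prefilter(df: pd.DataFrame, target: str, threshold: float = 0.3) -> list:
    """Keep only candidate SII factors whose MIC with the target exceeds the threshold."""
    y = df[target].to_numpy()
    return [c for c in df.columns
            if c != target and mic(df[c].to_numpy(), y) >= threshold]

# kept = prefilter(sii_df, target="arrivals")   # sii_df is a hypothetical monthly data frame
```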

3.1. Deep Network Structure

This section presents the proposed deep network architecture and how it is articulated with historical time-series TD data. The integration of the attention mechanism into the LSTM network is also detailed; the attention mechanism provides the degrees of interest assigned to the various factors. TD prediction is mainly aimed at predicting the tourist arrival volume from historical multivariate factors.

The input is formally represented as the fully observed set of multivariate feature vectors

$$X = \{x_1, x_2, \ldots, x_T\}, \qquad x_t \in \mathbb{R}^{m},$$

and the corresponding tourist arrival volumes are represented as

$$Y = \{y_1, y_2, \ldots, y_T\},$$

where $T$ is the total number of time steps (e.g., the number of weeks or months in the gathered database), $y_t$ is the tourist arrival volume (TAV) at time step $t$, and $x_t$ is the vector of multivariate factors at time step $t$.

TD prediction uses the time series of multivariate factors and the observed TAVs as model inputs to construct a model $F$ that predicts $y$ at future time steps:

$$\hat{y}_{T+1} = F(y_1, \ldots, y_T;\; x_1, \ldots, x_T).$$

This formulation differs from that of autoregressive models, where the availability of $x_{T+1}$ is normally assumed when predicting $y_{T+1}$, as both are designed to model the relationship between conditions and their consequences. Long-term dependencies can in principle be handled by the RNN, but RNN training is sensitive to vanishing and exploding gradients. The LSTM addresses this issue by introducing memory blocks into the recurrent connections. Each block contains a memory cell that stores the temporal state of the network, together with three gates that control the flow of information: the input, forget, and output gates. These gates ensure that weak signals are blocked from propagating through the network. Figure 3 depicts the LSTM framework.

Given the time series as input, the LSTM encodes it into a set of hidden states $\{h_1, \ldots, h_T\}$. The LSTM is based on the idea that a few gates applied at each time step regulate the flow of information through the sequence, which enables long-range dependencies to be captured accurately. At any time step $t$, the hidden state $h_t$ is updated from the current input $x_t$, the previous hidden state $h_{t-1}$, the input gate $i_t$, the forget gate $f_t$, the output gate $o_t$, and the memory cell $c_t$ using the following equations [17]:

$$i_t = \sigma(W_i[h_{t-1}, x_t] + b_i),$$
$$f_t = \sigma(W_f[h_{t-1}, x_t] + b_f),$$
$$o_t = \sigma(W_o[h_{t-1}, x_t] + b_o),$$
$$\tilde{c}_t = \tanh(W_c[h_{t-1}, x_t] + b_c),$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,$$
$$h_t = o_t \odot \tanh(c_t),$$

where $\sigma$ and $\tanh$ are the activation functions, $\odot$ denotes element-wise multiplication, and the weight matrices $W$ and bias vectors $b$ are the LSTM parameters learned during training. The hidden state $h_t$ is then fed to a linear regression layer to produce the estimate

$$\hat{y}_t = W_y h_t + b_y,$$

where $W_y$ and $b_y$ are the weights of the linear regression layer.
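For concreteness, the numpy sketch below implements a single LSTM step following these gate equations; the parameter shapes and toy values are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step: W and b hold the stacked parameters (W_i, W_f, W_o, W_c)
    and (b_i, b_f, b_o, b_c) keyed by gate name."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    i = sigmoid(W["i"] @ z + b["i"])           # input gate
    f = sigmoid(W["f"] @ z + b["f"])           # forget gate
    o = sigmoid(W["o"] @ z + b["o"])           # output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])     # candidate memory
    c_t = f * c_prev + i * c_tilde             # update the memory cell
    h_t = o * np.tanh(c_t)                     # new hidden state
    return h_t, c_t

# Toy usage with 8 input features and 16 hidden units.
rng = np.random.default_rng(0)
n_in, n_hidden = 8, 16
W = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in)) for k in "ifoc"}
b = {k: np.zeros(n_hidden) for k in "ifoc"}
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)
```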

The deep network (DN) structure of the proposed TD prediction method is shown in Figure 4. The framework combines the LSTM with the attention mechanism. Identifying the lag or lead relationships among the time series is important in TD prediction because the effect of a factor varies with the delay period. The LSTM captures the long-term dependencies in the time-series data, while the attention mechanism indicates which parts of the input sequence are influential. This structure allows the framework to capture two critical pieces of information for TD prediction, namely:
(i) the temporal relationships between demand and the different factors;
(ii) the importance of the factors, expressed as weights, with respect to TD.

Therefore, the long-term temporal dependencies between the tourist arrival volumes and the factors can be detected automatically by the attention-equipped LSTM. The input window has length $n$, the maximum lag order specified by the user, and covers the selected training data. Fully connected (dense) layers build the attention module, and the most relevant information in the lead series is selected as

$$\alpha_t = \mathrm{softmax}\bigl(\tanh(W_a x_t + b_a)\bigr),$$

where $W_a$ and $b_a$ are weights learned by the framework. The vector $\alpha_t$ contains the weights that measure the importance of each feature of the lead series at time $t$ and is normalized so that its entries sum to one. The lead series is then multiplied by the attention weights: $\tilde{x}_t = \alpha_t \odot x_t$.

The LSTM takes the weighted input $\tilde{x}_t$ and updates the hidden state $h_t$ at time $t$. A context vector is then obtained by summing the weighted hidden states:

$$c = \sum_{t=1}^{T} \alpha_t h_t.$$

The final prediction is generated by the linear layer:

$$\hat{y}_{T+1} = W_y c + b_y.$$

The LSTM and the attention module can be trained one after the other.
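The sketch below shows, in PyTorch, one possible attention-augmented LSTM forecaster of the kind described in this section; the softmax-over-factors attention, the layer sizes, and the use of the last hidden state before the linear output layer are our assumptions rather than the authors' exact architecture.

```python
import torch
import torch.nn as nn

class AttentionLSTMForecaster(nn.Module):
    """Input attention weighs each factor before an LSTM summarizes the lag window."""
    def __init__(self, n_factors: int, hidden_size: int = 32):
        super().__init__()
        self.attn = nn.Linear(n_factors, n_factors)       # one score per factor
        self.lstm = nn.LSTM(n_factors, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, 1)               # linear regression layer

    def forward(self, x):                                   # x: (batch, n_lags, n_factors)
        alpha = torch.softmax(torch.tanh(self.attn(x)), dim=-1)  # attention weights per factor
        weighted = alpha * x                                # x~_t = alpha_t ⊙ x_t
        h_all, _ = self.lstm(weighted)
        y_hat = self.out(h_all[:, -1, :]).squeeze(-1)       # predict the next-month TAV
        return y_hat, alpha                                 # weights are kept for interpretation

# Toy usage: 16 candidate SII factors observed over a 12-month lag window.
model = AttentionLSTMForecaster(n_factors=16)
y_hat, alpha = model(torch.randn(4, 12, 16))
```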

4. Results and Discussion

4.1. Empirical Study

The proposed conceptual framework was empirically evaluated by predicting the monthly TAV of Macau, an autonomous region of China located across the Pearl River Delta from Hong Kong. The economy of this region depends mainly on its gaming and tourism sectors, so accurate and timely prediction of the TAV is important for sustaining the regional economy. In this study, the TAV of Macau was predicted using secondary indicators such as SII data because of the lack of reliable data sources and expertise.

4.2. Performance Evaluation

The TD prediction performance of the proposed model was evaluated by comparison with conventional prediction models, namely the SVR, ANN, ARIMA, and ARIMAX models [18, 20]. The conventional models used the TAVs of the preceding 12 months to estimate the next value, that is, $\hat{y}_{t+1} = f(y_t, y_{t-1}, \ldots, y_{t-11})$. For the ANN and SVR models, the input data were the observations of the past 12 months used to predict $y_{t+1}$. The ANN model was built with a sigmoid activation function and one hidden layer and was trained with the backpropagation algorithm. The ARIMA and ARIMAX models achieve stationarity by differencing the series $d$ times and are specified by an AR order $p$ and an MA order $q$; they are trained on the tourist data of the past 12 months to predict the TAV series of the next 12 months, with the training window gradually extended during model validation. ARIMA differs from ARIMAX in using only the TAV for the prediction task, whereas ARIMAX also requires the external factors $x_t$. Features that contribute little to the TAV were eliminated using the MIC, because the SVR and ANN cannot handle such a large number of features. Walk-forward model validation was employed to mimic the real-world scenario in which new TAVs become available monthly for predicting the TAV of the following month; the prediction model is retrained at each step and used to predict the next month's TAV, as sketched below.
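A minimal sketch of this walk-forward scheme is shown below; the fit_and_predict callable stands in for any of the compared models, and the 12-month test horizon is illustrative.

```python
import numpy as np

def walk_forward(y: np.ndarray, X: np.ndarray, fit_and_predict, n_test: int = 12):
    """Refit at every step on all data observed so far, then predict the next month."""
    preds = []
    for t in range(len(y) - n_test, len(y)):
        y_train, X_train = y[:t], X[:t]          # everything observed up to month t-1
        preds.append(fit_and_predict(y_train, X_train, X[t]))
    return np.array(preds), y[-n_test:]           # predictions and the held-out actuals
```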

Three measures of forecasting accuracy are used in this work to assess the predicted values: the RMSE, MAE, and MAPE. They are defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^{2}},$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|,$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|,$$

where $y_t$ and $\hat{y}_t$ are the actual and predicted TAVs and $n$ is the number of predicted months.
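For reference, the same three measures computed with numpy, as a straightforward transcription of the formulas above; y_true and y_pred are arrays of actual and predicted monthly arrivals.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)
```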

The best prediction model is the one achieving the lowest values of these measures. To ensure robustness, the walk-forward validation of the benchmark models was repeated five times when predicting the 12-month TAVs from 2016 to 2019. Note that the comparative SVR and ANN models used no expert-crafted features, as their input features were obtained automatically from the MIC filtering process; both models were also implemented using the features selected by the deep learning model, in which case they are denoted SVR + F and ANN + F. Tables 1–3 summarize the global MAPE, MAE, and RMSE of the comparative models over the study period. The tables show that the deep network performs better, achieving lower measurement errors in each of the consecutive years considered.

Table 4 shows the results of the one-tailed t-test on the MAPE, MAE, and RMSE of the proposed and comparative models at α = 0.05. The null hypothesis, which states that the mean values of the proposed DLM and the comparative model are equal, is rejected; this is supported by the lower MAPE, MAE, and RMSE values of the DLM compared with the comparative models. The SVR and ANN without feature selection performed comparably with the conventional methods, whereas the proposed DLM, which requires no separate feature selection process, achieved better predictions than the other models. Note that the performance of the SVR and ANN improved when the DLM-selected features were incorporated into their frameworks: the MAPE of the SVR decreased from 6.482% to 5.086%, and that of the ANN from 14.319% to 5.922%. Hence, the feature selection capability of the proposed approach is validated.

Although the DLM, ARIMA, and ARIMAX models all require no preselected features, the ability of the DLM to automatically select relevant features from the raw SII data gave it better performance than the ARIMA model. This comparison used only the Baidu keywords and Macau tourist arrivals from Mainland China. Tables 5–7 show the MAPE, MAE, and RMSE of the comparative models over the studied period and reveal similar performance improvements, for instance, decreases in the MAPE. The results of the t-test comparing the proposed DLM with the comparative models are shown in Table 8; they further support the better performance of the proposed DLM over the benchmark models.

5. Conclusion

The tourism sector requires accurate and timely demand predictions for making informed sector-related decisions. Previous studies have focused on time-series, econometric, and AI models for this task; however, the performance of these conventional models depends on the quality of the selected features, and feature selection and lag order determination are domain specific and demand significant human effort. This study proposed a deep learning-based approach that selects the relevant features for better predictive performance. The evaluations showed that the proposed deep network framework performs better than the conventional methods, possibly for two reasons. First, the deep network model mimics a natural biological system: successive network layers extract low-level features from the initial input layer and then abstract high-level features that capture the semantic relationships between features in the succeeding layers. Second, the LSTM has an attention mechanism that automatically selects the relevant features at each time step.

This study makes two important contributions to TD prediction. The first is the development of a systematic conceptual framework for TD prediction and the validation of its predictive capability; the proposed model considers all the available TD prediction factors and requires no human intervention for feature selection. The second contribution is the use of attention scores to interpret the trained deep network architecture, which gives practitioners a new way to update their TD predictions based on a set of relevant indicators at different time steps. The results also suggest that the proposed DLM can select a set of relevant features and determine their suitable lag orders. Building on these contributions, this work can be extended in two directions: first, by incorporating other types of indicators, such as blogs and tweets, into the TD prediction task, which can address the issue of scarce training data because the DLM allows direct usage of such media data; second, by using the feature sets with suitable lag orders as inputs to other TD prediction models. Combining the DLM with conventional prediction models can enable further theory development.

Future work can expand the set of extracted features and make the deep neural network more flexible in terms of the number of nodes and layers. Prediction with a larger number of variables is also required, and training should be carried out on larger datasets.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors extend their appreciation to the National Natural Science Foundation of China, grant number 71701141, and Soft Science Research Project of Shanxi of China, grant number 2017041007-3, for funding and supporting this work.