An ensemble-based hotel recommender system using sentiment analysis and aspect categorization of hotel reviews
Introduction
The advent of the second generation of the World Wide Web, that is, Web 2.0, and the exponential growth of social networks, enterprises, and individuals online have led to a sharp increase in the use of the content available on these web resources, which, in turn, helps us make well-informed judgments. Information processing from web resources also opens up many new research domains. For example, tourists check the past experiences and opinions of other travelers, available on different web platforms, when planning their own vacations. This rich and diverse publicly available data can be used by large tourism organizations as part of their field market research, which may range from carrying out polls to running focus groups of prospective customers in future endeavors. However, the diversity of opinions in the textual data provided by users adds unwanted complexity, because processing such a large volume of data is a nearly impossible task for humans. To this end, computer scientists provide data-mining tools and algorithms that help the user extract relevant information from vast amounts of data. Considering the similar problem of creating a recommendation framework from opinionated texts available on the internet, this work focuses on developing a Hotel Recommendation System that can help both the tourism industry and individuals looking for hotels. In doing so, we use advanced and comparatively new algorithms from the domain of textual data mining. Hotel recommendation based on Sentiment Analysis of reviews is a relatively new research topic that has attracted researchers because of its wide applicability in the hotel industry and tourism. Reviews have multiple aspects; for instance, a single review may discuss different categories such as the location, rooms, and staff of a hotel. Several factors add complexity to studying such data and retrieving useful information from it.
For example, the requirements of one user can differ vastly from those of another. The writing style of each reviewer also differs: for the same hotel, different customers may give feedback in completely different ways. The priority given to individual aspects also varies a lot at a personal level. Some of us prefer food over hotel location, while others are willing to pay extra for a window view. Preferences may also vary with gender and age. A millennial may prefer the availability of entertainment and spacious rooms, whereas an older person may prefer better room service and cleaner rooms. Reviews are written accordingly. Another important factor is the size of the review set. Customers are keen on category-wise, personalized information found in the reviews and often use it as a basis for decision-making. We form an ensemble of different transfer-learning models using BERT and a Random Forest classifier on different textual features of the reviews to classify the sentiments of the hotel reviews.
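To make the idea of an ensemble concrete, the following is a minimal sketch of soft voting, in which the class probabilities produced by several classifiers are averaged and the highest-scoring label is chosen. This excerpt does not state how the BERT-based and Random Forest outputs are combined, so the soft-voting rule, the three-class label set, the function name `soft_vote`, and the optional weights are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Assumed label set; the excerpt does not specify the sentiment classes.
LABELS = ["negative", "neutral", "positive"]


def soft_vote(
    model_probs: List[Dict[str, float]],
    weights: Optional[List[float]] = None,
) -> str:
    """Combine per-model class probabilities by (weighted) averaging.

    model_probs holds one dict per model, mapping each label to the
    probability that model assigns to it for a single review.
    """
    if not model_probs:
        raise ValueError("at least one model output is required")
    if weights is None:
        weights = [1.0] * len(model_probs)
    if len(weights) != len(model_probs):
        raise ValueError("one weight per model is required")

    total = sum(weights)
    averaged = {
        label: sum(w * p.get(label, 0.0) for w, p in zip(weights, model_probs)) / total
        for label in LABELS
    }
    # Return the label with the highest averaged probability.
    return max(averaged, key=averaged.get)


# Hypothetical outputs of two models for the same review.
bert_like = {"negative": 0.2, "neutral": 0.1, "positive": 0.7}
forest_like = {"negative": 0.5, "neutral": 0.2, "positive": 0.3}
print(soft_vote([bert_like, forest_like]))
```

Averaging probabilities rather than counting hard votes lets a confident model outweigh an uncertain one; the weights could, for example, be set from validation accuracy.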
In this work, we have focused on the Sentiment Analysis of reviews written by online consumers and crawled from the Tripadvisor website. We have then grouped the data into predefined categories. These categories, such as ‘Location’, ‘Cleanliness’, and ‘Service’, are the aspects that frequently recur in the review data, because topics often overlap with each other in real-world reviews. The remainder of the paper is organized as follows. Section 2 provides a literature survey of the work already done on this topic along with a brief description of its performance. This is followed by Section 3, where we discuss our motivation for the work and briefly describe our contributions. Section 4 describes the datasets on which the proposed framework has been evaluated. The methodology followed in designing our architecture is described in Section 5. This is followed by Section 6, where a detailed analysis of performance is shown. Finally, the concluding remarks are reported in Section 7.
Section snippets
Literature survey
Sentiment analysis is an area of natural language processing (NLP), computational linguistics, and text mining that aims to determine the emotions, personality, etc. of a writer with respect to specific topics. In recent years, many researchers have proposed models for sentiment analysis in domains such as tourism, finance, and social media. Much research has analyzed sentiment in the financial domain [1], [2], [3], [4]. In 2020 Zhao et al. [5] proposed a
Motivation and contributions
One of the major problems researchers face in designing a hotel recommender system is the lack of a properly labeled dataset. There are very few datasets containing hotel reviews, and even those cannot be used for sentiment analysis or categorization tasks: they contain no sentiment labels, which are required to train a model for the sentiment analysis task. This brings in the necessity for preparing a suitably labeled dataset containing the
Data crawling
High-quality datasets for hotel recommender systems are not publicly available. In particular, suitability for Sentiment Analysis and the availability of properly balanced data are major challenges. That is why, in this work, we have developed our own dataset based on our requirements. A new dataset is also a valuable resource for the research community. The data was crawled using the Tripadvisor API. The website Tripadvisor has a huge database of
Pre-processing of review text data
Data pre-processing is carried out after the data has been collected. The review texts that reviewers write consist of various types of words in different forms. Not all of these words and forms are significant for classification purposes. So, to generalize the data and avoid unnecessary use of computational resources, these texts need to be pre-processed. The text pre-processing tasks carried out for the Sentiment Analysis classification include:
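The individual pre-processing steps are not listed in this excerpt. As an illustration only, the sketch below applies a few steps that are common in review pre-processing (lowercasing, removing markup and non-letter characters, tokenizing on whitespace, and dropping stopwords). The function name `preprocess` and the small stopword set are hypothetical and are not taken from the paper.

```python
import re
from typing import List

# Tiny illustrative stopword set (an assumption, not the paper's list).
# Negations such as "not" are deliberately kept, since they carry sentiment.
STOPWORDS = {
    "a", "an", "the", "is", "was", "were", "and", "or",
    "of", "to", "in", "it", "this", "that", "very",
}


def preprocess(review: str) -> List[str]:
    """Normalize a raw review into a list of lowercase content tokens."""
    text = review.lower()
    text = re.sub(r"<[^>]+>", " ", text)   # drop HTML-like tags
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters and whitespace only
    tokens = text.split()                   # whitespace tokenization
    return [tok for tok in tokens if tok not in STOPWORDS]


print(preprocess("The room was VERY clean, and the staff is friendly!"))
```

A production pipeline would typically use a fuller stopword list and add stemming or lemmatization; those are omitted here to keep the sketch dependency-free.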
Results and analysis
In this work, we have proposed a hotel recommendation system based on Sentiment Analysis and categorization of hotel reviews. We have prepared our own dataset by crawling data using the Tripadvisor API. The crawled dataset consists of reviews. The results for both Sentiment Analysis and review categorization are discussed in detail. The libraries used for data crawling are urllib, socket, and contextlib. The gensim [34] library’s efficient Word2Vec
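The three crawling libraries named above could be combined roughly as in the following sketch, which fetches a single resource with a timeout and closes the connection reliably. This is not the authors' crawler: the function name `fetch`, the timeout value, and the error-handling policy are assumptions, and no real Tripadvisor endpoint is shown. The usage line relies on a `data:` URL so that the example runs without network access.

```python
import socket
import urllib.error
import urllib.request
from contextlib import closing
from typing import Optional


def fetch(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the decoded body at url, or None if the request fails."""
    try:
        # closing() guarantees the response is closed even on errors.
        with closing(urllib.request.urlopen(url, timeout=timeout)) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, socket.timeout) as exc:
        # A crawler usually logs the failure and moves on to the next page.
        print(f"skipping {url}: {exc}")
        return None


# Offline demonstration using a data: URL instead of a real endpoint.
print(fetch("data:text/plain,hello"))
```

Returning `None` on failure keeps a long crawl running when individual pages time out; a real crawler would also need rate limiting and retry logic.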
Conclusion
Helping a user choose a suitable hotel, based on his/her requirements and budget, from online hotel reviews written by customers gives rise to an interesting research field called the hotel recommendation system. This ensures that customers can make optimal travel decisions based on the input query. In this work, we have presented a novel approach for a user-query-based recommendation system which outputs hotels and, if required, their corresponding reviews as per the user
CRediT authorship contribution statement
Biswarup Ray: Conceptualization, Data curation, Formal analysis, Resources, Software, Methodology, Writing - original draft. Avishek Garain: Conceptualization, Data curation, Formal analysis, Resources, Software, Methodology, Writing - original draft. Ram Sarkar: Methodology, Supervision, Writing - original draft.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (44)
- et al. Aspect based sentiment oriented summarization of hotel reviews. Procedia Comput. Sci. (2017)
- et al. Improving text summarization of online hotel reviews with review helpfulness and sentiment. Tour. Manag. (2020)
- et al. A novel deterministic approach for aspect-based opinion mining in tourism products reviews. Expert Syst. Appl. (2014)
- et al. Opinion mining from online hotel reviews – a text summarization approach. Inf. Process. Manage. (2017)
- et al. A classification-based review recommender. Knowl.-Based Syst. (2010)
- et al. Predicting hotel review helpfulness: The impact of review visibility, and interaction between hotel stars and review ratings. Int. J. Inf. Manage. (2016)
- et al. Manipulation of online reviews: An analysis of ratings, readability, and sentiments. Decis. Support Syst. (2012)
- et al. Gather customer concerns from online product reviews - a text summarization approach. Expert Syst. Appl. (2009)
- et al. K-RMS algorithm. Procedia Comput. Sci. (2020)
- et al. Deep learning for financial sentiment analysis on finance news providers.
- Sentiment polarity identification in financial news: A cohesion-based approach.
- News impact on stock price return via sentiment analysis. Knowl.-Based Syst.
- Bert-based financial sentiment index and LSTM-based stock return predictability.
- A bert based sentiment analysis and key entity detection approach for online financial texts.
- Sentiment analysis of social media response on the covid19 outbreak. Brain Behav. Immun.
- Twitter sentiment analysis on worldwide COVID-19 outbreaks. Kurdistan J. Appl. Res.
- Sentiment analysis of Twitter data during critical events through Bayesian networks classifiers. Future Gener. Comput. Syst.
- An adaptable fine-grained sentiment analysis for summarization of multiple short online reviews. Data Knowl. Eng.
- Machine learning-based multi-documents sentiment-oriented summarization using linguistic treatment. Expert Syst. Appl.
- A sentiment-based hotel review summarization.
- Machine learning-based sentiment analysis for analyzing the travelers reviews on Egyptian hotels.
- Sentiment analysis for hotel reviews.