1. Introduction
Technology now reaches every industry, and its impact on businesses is immense. Online consumer feedback is one aspect of this impact that has become critical to a company’s success, and the hospitality and tourism industry is a well-known area where it matters. Customer feedback is information offered by customers who share their opinions about a particular item or service of an organization. Today it is largely recorded as user-generated content in online customer reviews posted on review websites. Because customer opinion is so crucial in competitive markets, customer feedback may have a substantial influence on a company’s performance [1].
1.1. Accommodations And Review Statements
After staying at an accommodation, a guest can rate it and leave a comment about their experience. Customers can provide both positive and negative feedback, and some reviews contain conflicting opinions. Users write numerous reviews, many of which mention flaws that have to be addressed by management. Reviews suggest that guests would be delighted on their next stay at the accommodation or guest house if management listens to them and ensures that any concerns are addressed. Doing so reflects the concern of the accommodation’s administration for its visitors.
In the hospitality and tourism industries, online portals are increasingly used for reservations and bookings, and many customers routinely use these websites. More than 140 million reservations are made online each year, with the majority of customers researching their travel alternatives on existing websites [2].
For concerns raised in reviews to be addressed, the relevant management must first be made aware of them. Customer reviews can be analyzed to identify such issues. Maintenance problems are unavoidable, which is why it is vital to monitor reviews and complaints at all times. Keeping track of guest comments and improvement recommendations can be challenging, especially for major accommodation chains with a high volume of daily guests [3].
1.2. NLP And Sentiment Analysis Of Review Statements
Sentiment analysis refers to the methods, strategies, and tools for analyzing, detecting, and extracting subjective information, such as views and attitudes, from text [4]. It is reported to be the most popular research topic in Natural Language Processing (NLP) [5]. Following this description, sentiment analysis, or opinion mining, is the study of extracting the sentiment value from a customer’s comment, notion, or information about a certain area of interest. Nonetheless, the vast majority of existing sentiment analysis algorithms are primarily concerned with classifying sentiments as positive, neutral, or negative. This scheme can be broadened by classifying maintenance issues into relevant categories such as food, electricity, washroom, and room. The categories may be refined further to cover specific maintenance concerns such as pool, kitchen, entertainment, service, bar, parking, and miscellaneous, among others [6].
2. Domain Background
2.1. Customer Feedback
A customer review, also known as customer feedback, is information, an issue, or input provided by a consumer regarding their experience with a certain product or service. Customer ratings are significant in everyday life because human decisions are constantly influenced by the thoughts and perspectives of others [7]. As a result, customer reviews have a substantial influence on the performance of a company or service, and they can help in running a successful business. In the hospitality sector, consumer feedback is a significant component of the decision-making process for accommodation management, as well as a decisive factor in the company’s future.
2.2. What is Sentiment Analysis and Its Benefits
Sentiment analysis [8] is the computational evaluation of a person’s feelings, emotions, or attitudes toward a product, service, or other item. The sentiment may be positive, neutral, or negative. Although there was substantial research interest in this field in the early twentieth century, with studies on public opinion analysis, computer-based sentiment analysis did not emerge until subjective texts became available on the Web [9]. Because sentiment analysis is a popular area of study, classifying outcomes as positive, negative, or neutral is already in widespread use.
Many organizations struggle with information overload: they routinely hold massive amounts of customer input that are difficult to sift and evaluate manually. For example, a lodging that receives more than 5000 customer reviews each month receives more than 150 reviews per day. It would be a nightmare for employees to read and categorize these responses for analysis, and by the time they finished, the information might be out of date. By assessing feedback rapidly and properly, the customer experience can be improved through appropriate measures to fix current concerns, while also helping to drive further sales for the company.
3. Machine Learning-based classification and extraction techniques
A classifier determines the polarity of a text. Support Vector Machine (SVM), Naive Bayes, and maximum entropy classifiers are commonly used models [10]. Other models, including logistic regression and random forest, can also be applied to classification problems. Supervised learning uses a labeled dataset to learn the mapping to the desired output [11]. Unsupervised learning requires no labeled data; if the data is unlabeled, a clustering approach can be applied first to group it before supervised learning is used. Because the user reviews here are labeled, supervised classification is the better fit, leading to higher accuracy and performance.
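As an illustration of supervised classification on labeled reviews, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, written with the Python standard library only. The training sentences, labels, and function names are hypothetical and are not taken from the datasets or implementations in the cited works.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text, replace punctuation with spaces, split on whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return cleaned.split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled reviews, for illustration only.
train_texts = [
    "the room was clean and the staff were friendly",
    "great breakfast and a comfortable bed",
    "the shower was broken and the room was dirty",
    "air conditioner did not work and the staff were rude",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("the bed was comfortable and the breakfast was great")
print(prediction)
```

Naive Bayes is used here only because it is short enough to write out in full; an SVM or logistic regression model would follow the same fit-then-predict pattern on the same labeled data.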
4. Deep Learning-based classification and extraction techniques
Deep learning is a subset of machine learning that can perform better than classical machine learning on more complex data and can extract features without human involvement [12]. Deep learning allows computational models composed of several processing layers to learn data representations with varying degrees of abstraction [13]. Deep learning is a complex field, but its range of applications is wide.
Deep learning models demand a lot of resources, so a system with multiple cores and a capable GPU helps the model train and run efficiently. Deep learning is used in this research, together with NLP and text preparation techniques, to develop a model for the problem that this project seeks to solve.
5. Existing Work
This section reviews the most relevant existing solutions for reviewing, assessing, and classifying feedback in the hospitality sector. The author evaluates their benefits and drawbacks in order to determine how far they go in resolving challenges in this and similar domains.
5.1. Existing Work on Rule-based approaches
This technique takes polarity scores from sentiment lexicons, aggregates them, and then estimates the overall sentiment of a phrase. The approach is frequently used when sentiment-tagged data is not available or when the algorithm needs to perform well across domains. Its effectiveness can be increased with POS tagging, phrase co-occurrence analysis, and other NLP techniques [14].
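The lexicon-based procedure described above (look up word polarities, aggregate them, and derive an overall label) can be sketched as follows. This is a minimal illustration under stated assumptions: the mini-lexicon, its scores, and the simple negation rule are hypothetical, and real systems rely on much larger resources such as SentiWordNet.

```python
# Hypothetical mini-lexicon with made-up polarity scores, for illustration only.
LEXICON = {
    "clean": 1.0, "friendly": 1.0, "great": 2.0,
    "dirty": -1.0, "broken": -2.0, "rude": -1.5,
}
NEGATIONS = {"not", "no", "never"}


def lexicon_score(text):
    """Sum word polarities; a negation flips the sign of the next sentiment word."""
    tokens = "".join(ch if ch.isalnum() else " " for ch in text.lower()).split()
    score = 0.0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        polarity = LEXICON.get(token, 0.0)
        if polarity != 0.0:
            score += -polarity if negate else polarity
            negate = False  # the negation is consumed by this sentiment word
    return score


def lexicon_label(text):
    """Map the aggregated score to positive, negative, or neutral."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_label("The room was not clean"))
```

No labeled training data is needed, which is the property that makes this family of methods attractive when sentiment-tagged data is unavailable.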
The authors of [15] developed a contextual technique to increase the effectiveness of lexicon-based sentiment analysis. They also considered the degradation in sentiment analysis performance that both the local and global context can cause. The technique improved precision, recall, and accuracy compared with the baseline method. The key drawback noted is that more focus should be placed on establishing the sentiment value of reviews at the aspect level.
The authors of [16] implemented an enhanced sentiment classifier based on the lexicon-based approach. It combines four classifiers, namely a modifier and negation classifier, an emoticon classifier, a domain-specific classifier, and a SentiWordNet (SWN)-based classifier, to perform sentiment analysis effectively and accurately. As a consequence, the authors were able to improve the accuracy and efficiency of sentiment classification in a variety of domains while also reducing the number of reviews classified as neutral. The system’s largest flaw was the time required to manually assign scores to domain-specific terms.
5.2. Existing Work on Machine Learning approaches
The authors of [17] describe how they created a model for sentiment analysis of reviews using Naive Bayes, Support Vector Machine, and Decision Tree classifiers. They used Amazon.com reviews and recommended supervised learning for implementation. Cross-validation of the classifier results was also performed to identify the best classifier for the purpose. Based on cross-validation and a performance comparison of the three classifiers, Support Vector Machine was the best, with 81.75% accuracy. As noted earlier, such a project could be extended to categorize reviews beyond positive and negative, as well as to produce product ratings from the reviews.
In this work [18], the Random Forest approach is used for sentiment analysis in the Indonesian language. The authors employed several term weighting methods, including Binary TF, Raw TF, Logarithmic TF, and TF-IDF, with bag of words (BOW) features, and reached an accuracy of 82%, which leaves room for improvement.
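To make the term weighting methods concrete, the sketch below computes binary, raw, logarithmic, and TF-IDF weights over a bag of words. The exact formulas used in the cited work are not given here, so the common textbook variants are assumed (logarithmic TF as 1 + ln(tf), IDF as ln(N / df)); the function name and the sample documents are hypothetical.

```python
import math
from collections import Counter


def term_weights(docs, scheme="tfidf"):
    """Return one {term: weight} dict per document under the chosen scheme."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    weighted = []
    for tokens in tokenized:
        weights = {}
        for term, tf in Counter(tokens).items():
            if scheme == "binary":
                weights[term] = 1.0                      # presence only
            elif scheme == "raw":
                weights[term] = float(tf)                # plain count
            elif scheme == "log":
                weights[term] = 1.0 + math.log(tf)       # dampened count
            elif scheme == "tfidf":
                weights[term] = tf * math.log(n_docs / doc_freq[term])
            else:
                raise ValueError(f"unknown scheme: {scheme}")
        weighted.append(weights)
    return weighted


# Hypothetical two-document corpus.
docs = ["room dirty room small", "room clean"]
for scheme in ("binary", "raw", "log", "tfidf"):
    print(scheme, term_weights(docs, scheme)[0])
```

In this toy corpus the word "room" occurs in every document, so its TF-IDF weight is zero, while "dirty" keeps a positive weight because it appears in only one document.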
Figure 1.
A prototype feature diagram of the proposed system (self-composed).
5.3. Existing Work on Hybrid approaches
This research [19] uses a range of machine learning and deep learning approaches, including CNN, RNN, Decision Trees, Naive Bayes, Random Forest, and LSTM, to analyze and predict positive and negative views in a Twitter dataset. The techniques are then compared to determine which algorithm performs best. To develop a hybrid model, a CNN classifier, a CNN (with Word2Vec), and an RNN classifier (with LSTM and Word2Vec) are then used.
This research [20] creates a model for sentiment analysis of tweets using a hybrid technique. The authors used SentiWordNet-based feature vectors as input and a Support Vector Machine (SVM) as the classification model. Tokenization and part-of-speech tagging were also used in data preparation. The authors chose SVM because it outperformed other classifiers in terms of accuracy and precision. Their main purpose in implementing a hybrid method was to improve classification performance by using SWN to handle modifiers and negation in the lexicon during score calculation.
6. Proposed Approach and Architecture
Figure 1 depicts a prototype feature diagram of the proposed system and its functionality in relation to the proposed research.
As seen in Figure 2, the author chose a tiered architecture as the high-level architecture for the system. This is more scalable than a layered design, since each tier may be scaled separately without affecting the others.
Generally, a web application is used to show the outcome of the Classification and Extraction System once the trained model is deployed. In such a scenario, it makes sense to show how the components of the system relate to one another and how data flows between them.
Presentation Tier - The web interface acts as the presentation tier. Guests use the online interface to write and search for reviews, while management may search, assess, and categorize comments based on a variety of maintenance categories.
Logic Tier - This tier is in charge of searching and interpreting guest reviews, recognizing maintenance concerns, and monitoring their status. It communicates with the data tier to retrieve and store information, and with the presentation tier to transmit data to the guest or management.
Data Tier - The data tier records both guest comments and information about maintenance issues. It communicates with the logic tier, supplying data for processing and storing any information generated by processing.
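The division of responsibilities between the three tiers can be sketched as follows. This is a minimal illustration, not the author’s implementation: the class names, the in-memory storage, and the keyword lookup that stands in for the trained model are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Review:
    text: str
    category: str = "unclassified"


class DataTier:
    """In-memory stand-in for the data tier (a real system would use a database)."""

    def __init__(self):
        self._reviews = []

    def save(self, review):
        self._reviews.append(review)

    def find_by_category(self, category):
        return [r for r in self._reviews if r.category == category]


class LogicTier:
    """Classifies reviews and mediates between presentation and data tiers."""

    # Hypothetical keyword sets; a placeholder for the trained classifier.
    KEYWORDS = {
        "electricity": {"power", "socket", "light"},
        "washroom": {"shower", "toilet", "tap"},
        "food": {"breakfast", "meal", "dinner"},
    }

    def __init__(self, data_tier):
        self.data = data_tier

    def classify(self, text):
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        for category, words in self.KEYWORDS.items():
            if tokens & words:
                return category
        return "miscellaneous"

    def submit_review(self, text):
        review = Review(text, self.classify(text))
        self.data.save(review)
        return review

    def reviews_for(self, category):
        return self.data.find_by_category(category)


class PresentationTier:
    """Stand-in for the web interface; it talks only to the logic tier."""

    def __init__(self, logic_tier):
        self.logic = logic_tier

    def post_review(self, text):
        return f"Filed under: {self.logic.submit_review(text).category}"

    def list_issues(self, category):
        return [r.text for r in self.logic.reviews_for(category)]


app = PresentationTier(LogicTier(DataTier()))
print(app.post_review("The shower was leaking"))
print(app.list_issues("washroom"))
```

Because the presentation tier never touches storage directly, the data tier or the classifier could be replaced or scaled independently, which is the property the tiered design is chosen for.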
During a requirement survey conducted before developing the prototype, the author found a clear need among both the public and accommodation management to classify and extract reviews based on several categories.
The author’s first objective was to use Machine Learning techniques in conjunction with Natural Language Processing (NLP) to construct the model. However, because the Machine Learning algorithms produced lower accuracy and poor results, the author investigated several Deep Learning approaches and finally used an MLP (Multilayer Perceptron) to construct a better model, with accuracy above 95%. Furthermore, combining deep learning and NLP in this way, which is not available in any existing platform, makes it possible to detect maintenance-related problems with high accuracy [21].
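For readers unfamiliar with the model type, the sketch below shows the forward pass of a small multilayer perceptron over bag-of-words features: a dense hidden layer with ReLU followed by a softmax output over maintenance categories. It is an illustration only. The vocabulary, categories, layer sizes, and hand-set weights are hypothetical; the author’s actual architecture and trained parameters are not described in this section.

```python
import math

VOCAB = ["shower", "socket", "breakfast"]          # hypothetical vocabulary
CATEGORIES = ["washroom", "electricity", "food"]   # hypothetical labels


def bag_of_words(text):
    """Count how often each vocabulary word occurs in the text."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in VOCAB]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_predict(text, params):
    hidden = relu(dense(bag_of_words(text), params["W1"], params["b1"]))
    probs = softmax(dense(hidden, params["W2"], params["b2"]))
    best = max(range(len(probs)), key=probs.__getitem__)
    return CATEGORIES[best], probs


# Hand-set weights for illustration; a real model learns these from data.
PARAMS = {
    "W1": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    "b1": [0.0, 0.0, 0.0],
    "W2": [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]],
    "b2": [0.0, 0.0, 0.0],
}

label, probs = mlp_predict("the socket near the bed is dead", PARAMS)
print(label)
```

In practice the weights are learned by backpropagation on labeled reviews, typically with a deep learning framework rather than hand-written layers.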
7. Testing and Evaluation
7.1. Testing
This section discusses the model’s key testing objectives. The prototype is thoroughly tested using these processes in order to eliminate potential defects and improve the overall system.
7.1.1. Confusion Matrix
The confusion matrix is a table that summarizes classification model performance. To comprehend a confusion matrix, the following terminology must be understood.
True Positives - Values that are actually positive and are predicted as positive by the system.
True Negatives - Values that are actually negative and are predicted as negative by the system.
False Positives - Values that are actually negative but are predicted as positive by the system.
False Negatives - Values that are actually positive but are predicted as negative by the system.
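The four cells of a binary confusion matrix can be counted directly from paired actual and predicted labels. The sketch below follows the standard convention, in which a false positive is an actual negative predicted as positive and a false negative is an actual positive predicted as negative. The function name and the sample labels are hypothetical.

```python
def confusion_counts(actual, predicted, positive="positive"):
    """Count TP, TN, FP, and FN for a binary classification task."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1   # actual positive, predicted positive
        elif a != positive and p != positive:
            tn += 1   # actual negative, predicted negative
        elif a != positive and p == positive:
            fp += 1   # actual negative, predicted positive
        else:
            fn += 1   # actual positive, predicted negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical labels for six review statements.
actual = ["positive", "positive", "negative", "negative", "positive", "negative"]
predicted = ["positive", "negative", "negative", "positive", "positive", "negative"]
counts = confusion_counts(actual, predicted)
print(counts)
```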
7.1.2. Accuracy
The average accuracy of the model obtained is 95%.
Accuracy of the Model = (number of statements with correct positive outputs + number of statements with correct negative outputs) / (number of statements tested) * 100%.
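The accuracy formula translates directly into code. In the sketch below, the counts passed to the function are hypothetical values chosen only to show the arithmetic; they are not the counts from the author’s test run.

```python
def accuracy_percent(tp, tn, fp, fn):
    """Accuracy = (correct positives + correct negatives) / total tested * 100."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no statements were tested")
    return 100.0 * (tp + tn) / total


# Hypothetical counts: 95 of 100 statements classified correctly.
print(accuracy_percent(tp=50, tn=45, fp=3, fn=2))
```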
7.1.3. F1 score
The F1 score is the harmonic mean of precision and recall. It is useful for comparing two models, for example one with low precision and high recall and another with the reverse, because it captures both precision and recall in a single value.
7.1.4. Precision
Precision is the proportion of the system’s positive predictions that are actually positive. The prototype has a precision of 0.94.
7.1.5. Recall
Recall is the proportion of actual positives that the system correctly predicts as positive. The prototype has a recall of 0.94.
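Precision, recall, and the F1 score can all be computed from the confusion matrix counts, as the sketch below shows. The counts in the example are hypothetical and were chosen only so that precision and recall both come out at 0.94; they are not the prototype’s actual test counts.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and their harmonic mean (F1)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for illustration.
precision, recall, f1 = precision_recall_f1(tp=47, fp=3, fn=3)
print(round(precision, 2), round(recall, 2), round(f1, 2))
```

When precision and recall are equal, the F1 score equals both; it falls below their arithmetic mean as soon as the two diverge, which is why it penalizes models that trade one for the other.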
Testing done for some of the review statements based on the model is presented in Table 1.
7.2. Evaluation of the System
Evaluation is among the most important stages in the software development cycle, since feedback and professional evaluations are crucial in deciding the system’s success or failure. It also allows the system’s limitations to be clearly identified and makes the assessment of the project’s performance and success more meaningful.
The project’s outcome will improve the manual process that the majority of lodgings now use for examining text-based feedback from customers. The suggested system can assess hundreds of reviews in a matter of seconds and identify maintenance-related issues from them, saving a lot of time and removing the need for a person to examine the reviews, which is a labor-intensive and error-prone operation. The author was able to achieve the research goal by employing recognized text mining techniques in Natural Language Processing, sentiment analysis approaches from the literature review, and Deep Learning to create the model.
The development of models and data pre-processing took considerable effort in order to get the best results with the data currently available. Enterprise-level tools, technologies, and processes were used to construct the recommended solution. The system was developed taking into account both the requirements already identified through qualitative and quantitative fact-gathering techniques and any potential future requirements. The F1 score was the primary measure used to assess classification performance, computed from the confusion matrix. Recall and precision were also taken into account. These measures were chosen after reviewing the existing literature.
By removing the manual process of evaluating text-based customer feedback, the suggested approach aims to provide more accurate and relevant data. Because the management of the accommodation can take note of maintenance-related issues and notify guests once they have been rectified, this would also increase guest satisfaction [22].
8. Future Enhancements
Although the method was developed for the hospitality and tourism industry, which was chosen as the focus of this project, it may be applied to other industries. The system has been designed and built so that it can be used in another domain, such as the restaurant business, to analyze other kinds of issues. A new domain will require new datasets.
To become widely used on a worldwide scale, the system will need multi-language support, which was not taken into consideration in this work. This might be accomplished by training models on distinct datasets for each language, but it will require libraries designed for those languages. Some languages will be difficult to support because certain libraries do not cover them.
The categorization accuracy of the model can be further enhanced, which will provide more accurate information.
Slang, irony, and sarcasm are three examples of language use that can drastically alter the meaning of a review; the system does not take them into account, which is another limitation.
9. Conclusion
The first aim of the research was to categorize the variety of maintenance issues found in lodgings; these were determined to be electrical, bathroom, and room-related. Recurrent problems in reviews were also found to relate to food or other categories.
Secondly, the research examined the viability of employing Deep Learning and Natural Language Processing approaches to detect maintenance concerns in the hospitality and tourism industry.
Hence, this research offers a method for accommodations to automatically categorize and analyze customer reviews using Deep Learning and Natural Language Processing to find maintenance issues. It describes how a workable strategy was found by evaluating the solutions, technologies, approaches, and tools already in use. Extensive testing was conducted to ensure that the model’s performance is acceptable.
Acknowledgments
The author of this paper acknowledges the guidance and evaluation insights received for the project from Mr. Pumudu Fernando and Mrs. Indula Kulawardana.
References
1. Chu, R.K.S. Stated-importance versus derived-importance customer satisfaction measurement. Journal of Services Marketing 2002, 16, 285–301.
2. Responding to TripAdvisor hotel reviews – the good, the bad, and the ugly. - Google Search. Available online: https://www.google.com/search?q=Responding+to+TripAdvisor+hotel+reviews+%E2%80%93+the+good%2C+the+bad%2C+and+the+ugly.&oq=Responding+to+TripAdvisor+hotel+reviews+%E2%80%93+the+good%2C+the+bad%2C+and+the+ugly.&aqs=chrome..69i57.350j0j4&sourceid=chrome&ie=UTF-8 (accessed on 10 April 2023).
3. Agrawal, P.K.; Alvi, A.S. Textual Feedback Analysis: Review. 2015 International Conference on Computing Communication Control and Automation 2015, pp. 457–460.
4. Dahiya, S.; Mohta, A.; Jain, A. Text Classification based Behavioural Analysis of WhatsApp Chats. 2020 5th International Conference on Communication and Electronics Systems (ICCES) 2020, pp. 717–724.
5. Baldania, R. Sentiment analysis approaches for movie reviews forecasting: A survey. 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS) 2017, pp. 1–6.
6. Park, S.; Bae, B.C.; Cheong, Y.G. Emotion Recognition from Text Stories Using an Emotion Embedding Model. 2020 IEEE International Conference on Big Data and Smart Computing (BigComp) 2020, pp. 579–583.
7. Bandana, R. Sentiment Analysis of Movie Reviews Using Heterogeneous Features. 2018 2nd International Conference on Electronics, Materials Engineering & Nano-Technology (IEMENTech) 2018, pp. 1–4.
8. Zhang, L.; Wang, S.; Liu, B. Deep learning for sentiment analysis: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2018, 8.
9. Mäntylä, M.; Graziotin, D.; Kuutila, M. The evolution of sentiment analysis - A review of research topics, venues, and top cited papers. Comput. Sci. Rev. 2016, 27, 16–32.
10. Sun, S.; Luo, C.; Chen, J. A review of natural language processing techniques for opinion mining systems. Inf. Fusion 2017, 36, 10–25.
11. Gautam, G.; Yadav, D. Sentiment analysis of twitter data using machine learning approaches and semantic analysis. 2014 Seventh International Conference on Contemporary Computing (IC3) 2014, pp. 437–442.
12. Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. Journal of Big Data 2019, 6, 1–48.
13. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444.
14. Katz, G.; Ofek, N.; Shapira, B. ConSent: Context-based sentiment analysis. Knowl. Based Syst. 2015, 84, 162–178.
15. Rintyarna, B.S.; Sarno, R.; Fatichah, C. Enhancing the performance of sentiment analysis task on product reviews by handling both local and global context. Int. J. Inf. Decis. Sci. 2020, 12, 75–101.
16. Asghar, M.Z.; Khan, A.; Ahmad, S.; Qasim, M.; Khan, I.A. Lexicon-enhanced sentiment analysis framework using rule-based classification scheme. PLoS ONE 2017, 12.
17. Singla, Z.; Randhawa, S.; Jain, S. Sentiment analysis of customer product reviews using machine learning. 2017 International Conference on Intelligent Computing and Control (I2C2) 2017, pp. 1–5.
18. Fauzi, M.A. Random Forest Approach for Sentiment Analysis in Indonesian Language. Indonesian Journal of Electrical Engineering and Computer Science 2018.
19. El-Jawad, M.H.A.; Hodhod, R.A.; Omar, Y.M.K. Sentiment Analysis of Social Media Networks Using Machine Learning. 2018 14th International Computer Engineering Conference (ICENCO) 2018, pp. 174–176.
20. Gupta, I.; Joshi, N. Enhanced Twitter Sentiment Analysis Using Hybrid Approach and by Accounting Local Contextual Semantic. Journal of Intelligent Systems 2019, 29, 1611–1625.
21. Goularas, D.; Kamis, S. Evaluation of Deep Learning Techniques in Sentiment Analysis from Twitter Data. 2019 International Conference on Deep Learning and Machine Learning in Emerging Applications (Deep-ML) 2019, pp. 12–17.
22. How to Respond Properly to Online Hotel Reviews - HermesThemes.com. - Google Search. Available online: https://www.google.com/search?q=How+to+Respond+Properly+to+Online+Hotel+Reviews+-+HermesThemes.com.&oq=How+to+Respond+Properly+to+Online+Hotel+Reviews+-+HermesThemes.com.&aqs=chrome..69i57.322j0j7&sourceid=chrome&ie=UTF-8 (accessed on 12 April 2023).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).