1. Introduction
Sentiment analysis has gained considerable attention in recent years due to its potential for extracting valuable insights from textual data. One area where sentiment analysis has significant relevance is the domain of online food delivery. With the proliferation of online food platforms and the increasing use of social media for sharing experiences, understanding customer sentiments and feedback is crucial for the success and growth of these services [1].
Sentiment analysis in online food delivery has proven to be a useful tool for gaining insight into customers through their opinions, pinpointing areas that require improvement, and monitoring trends within this fast-paced sector. Through the analysis of customer reviews and feedback, sentiment analysis techniques contribute to enhancing customer satisfaction, optimizing service quality, and informing decision-making processes for online food delivery platforms [2].
Machine learning-based sentiment analysis leverages algorithms to learn patterns and classify sentiment, while lexicon-based sentiment analysis uses pre-defined sentiment lexicons. Both approaches have been widely applied in sentiment analysis tasks, including those in the food sector [3].
Most of the available textual datasets for sentiment analysis are in English, while the analysis of low-resource languages poses many difficulties, chiefly limited linguistic resources and complexities in grammar and vocabulary. The scarcity of available datasets in these languages hampers the automatic extraction of aspects and sentiment classification. Due to this data deficiency, researchers working with low-resource languages either have to utilize the limited existing datasets or create their own [4,5].
The objective of this study is to evaluate how well a specific tool performs sentiment analysis on comments posted on online food delivery platforms, and to draw conclusions about consumer concerns regarding online food delivery. To this end, the accuracy of a tool that first translates and then analyzes comments was measured, and conclusions about consumer trends were drawn from the analysis. These results will help determine the effectiveness of sentiment analysis tools for low-resource languages.
4. Results
The principal purpose is to gauge the tool's accuracy in analyzing Greek-language text, which it does not process directly but only after translating it. By comparing the tool's results with those of the expert, we can calculate four metrics, ‘Accuracy’, ‘Precision’, ‘Recall’ and ‘F-score’, based on the confusion matrix (Table 1) [29]. The calculation of the metrics relies on the terms ‘True Positive’ and ‘True Negative’. A comment is labeled ‘True Positive’ when it is actually positive and the tool also predicts it as positive; the same procedure applies to the labeling of negative comments. If the prediction made by the tool matches the actual label of the comment, the comment is correctly labeled and termed ‘True’. However, if the prediction and the actual label differ, the comment is incorrectly labeled and termed ‘False’. To facilitate comparison between the entities identified by the expert and those identified by the tool, both sets were recorded in an Excel file. Based on the entities that appear most frequently, certain inferences about consumer behavior can be drawn.
Using the confusion matrix (Table 1), the aforementioned metrics are calculated as shown in Table 2.
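The four metrics can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard definitions and that the F-score is the balanced F1 (the harmonic mean of precision and recall); the class name `ConfusionMatrix` and the toy counts are hypothetical and are not the values of Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts obtained by comparing the tool's labels with the expert's labels."""
    tp: int  # expert: positive, tool: positive
    tn: int  # expert: negative, tool: negative
    fp: int  # expert: negative, tool: positive
    fn: int  # expert: positive, tool: negative

    def accuracy(self) -> float:
        total = self.tp + self.tn + self.fp + self.fn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        predicted_positive = self.tp + self.fp
        return self.tp / predicted_positive if predicted_positive else 0.0

    def recall(self) -> float:
        actual_positive = self.tp + self.fn
        return self.tp / actual_positive if actual_positive else 0.0

    def f_score(self) -> float:
        # Assumption: balanced F1, the harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical toy counts for illustration only.
toy = ConfusionMatrix(tp=8, tn=6, fp=2, fn=4)
print(f"accuracy={toy.accuracy():.2f} precision={toy.precision():.2f} "
      f"recall={toy.recall():.2f} f_score={toy.f_score():.2f}")
```

Each method returns 0.0 when its denominator is zero, a convention chosen here so that an empty category does not raise an error.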
Three hundred (300) comments were collected from three (3) different types of food-related businesses, one hundred (100) comments from each: a fast-food restaurant, an Italian restaurant, and a coffee roaster shop. These three categories were chosen because they differ substantially from one another, which makes it possible to observe how the tool reacts to the different vocabulary of each category. The total number of comments analyzed was 293, since duplicates and comments written in any language other than Greek were eliminated from the analysis. The comments analyzed from the fast-food restaurant, the Italian restaurant and the coffee roaster shop numbered 98, 98 and 97, respectively.
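A cleaning step of this kind could be automated roughly as follows. This is only a sketch: the text does not say how duplicates and non-Greek comments were identified, so the rule used here (a comment counts as Greek if it contains at least one character from the Greek Unicode blocks) and the function names `contains_greek` and `clean_comments` are assumptions.

```python
def contains_greek(text: str) -> bool:
    """Return True if any character lies in the Greek or Greek Extended Unicode blocks."""
    return any(
        "\u0370" <= ch <= "\u03ff" or "\u1f00" <= ch <= "\u1fff"
        for ch in text
    )


def clean_comments(comments: list[str]) -> list[str]:
    """Drop empty comments, duplicates, and comments without Greek characters.

    Duplicates are detected after collapsing whitespace and ignoring case,
    which is an assumed rule; the first occurrence is kept.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for comment in comments:
        key = " ".join(comment.split()).casefold()
        if not key or key in seen or not contains_greek(comment):
            continue
        seen.add(key)
        kept.append(comment)
    return kept


# Hypothetical sample comments, not taken from the dataset.
sample = [
    "Πολύ καλό φαγητό",
    "Πολύ  καλό  φαγητό",   # duplicate up to whitespace
    "Great pizza",          # not Greek
    "Γρήγορη παράδοση",
]
print(clean_comments(sample))
```

Note that a rule based on script alone would also discard Greek written in Latin characters, so a manual check of the rejected comments would still be advisable.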
Overall, the analysis shows high classification performance on the dataset, with an average accuracy of 90.67% (Table 4). It should be underlined that 34% (100 comments) of the comments were not classified by the model: 25% (75) were not evaluated due to sarcasm and poor syntax, and 8% (25) were classified as neutral. In effect, the model classified only about 66% of the comments, namely 193 out of 293. Moreover, there is a positive trend towards online food ordering, as the true positive comments numbered one hundred thirty-five (135), almost three (3) times the negative ones, as shown in the confusion matrix (Table 3). In Figure 6, the confusion matrix is visualized as a grouped bar chart, which clearly shows very good performance in correctly detecting positive comments.
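The counts quoted in the text can be cross-checked with a short calculation. The sketch below assumes that accuracy is computed over the 193 classified comments only and that the number of true negatives is 40, the count of correctly detected negative comments given in the conclusions. The last quantity, which treats every unclassified comment as a miss, is a hypothetical lower bound and not a figure reported in the tables.

```python
TOTAL_ANALYZED = 293   # comments remaining after cleaning
CLASSIFIED = 193       # comments the tool labeled as positive or negative
TRUE_POSITIVE = 135    # positive comments the tool labeled correctly
TRUE_NEGATIVE = 40     # negative comments the tool labeled correctly

correct = TRUE_POSITIVE + TRUE_NEGATIVE

# Accuracy over the classified comments only.
accuracy = correct / CLASSIFIED

# Share of the analyzed comments that received a positive or negative label.
coverage = CLASSIFIED / TOTAL_ANALYZED

# Hypothetical lower bound: every unclassified comment counted as an error.
lower_bound = correct / TOTAL_ANALYZED

print(f"accuracy over classified comments: {accuracy:.2%}")
print(f"coverage: {coverage:.1%}")
print(f"lower bound over all comments: {lower_bound:.1%}")
```

Under these assumptions the first value reproduces the reported 90.67%, while the gap between it and the lower bound shows how strongly the headline accuracy depends on excluding the unclassified comments.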
Table 5 examines statistically whether the expert and the tool agree. Specifically, it presents the interrater agreement results between the trained expert and the Meaning Cloud text analytics platform, as well as the Intraclass Correlation Coefficient (ICC), for the combined data and for each individual company. Cohen’s unweighted kappa values range between .25 and .28 for each of the three companies and the combined data, which indicates fair agreement between the expert and the Meaning Cloud tool [30]. Similarly, Fleiss’ kappa shows fair agreement for all data analyzed. Finally, Krippendorff’s alpha, a more conservative test, shows tentative agreement [30] only for the fast-food company (.67), but not for the other companies or the total dataset. The ICCs are all above the acceptable benchmark values [31,32]. Overall, our data show marginal agreement between the ratings of the expert and the tool.
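To make the agreement statistic concrete, Cohen’s unweighted kappa for two raters can be computed as below. This is a minimal sketch of the standard formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance; the function name `cohens_kappa` and the toy labels are hypothetical and do not come from the dataset or from Table 5.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's unweighted kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label proportions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels for ten comments.
expert = ["pos", "pos", "pos", "neg", "neg", "neu", "pos", "neg", "pos", "neu"]
tool = ["pos", "pos", "neu", "neg", "pos", "neu", "pos", "neu", "neu", "neu"]
print(f"kappa = {cohens_kappa(expert, tool):.2f}")
```

Because kappa discounts chance agreement, it can be low even when raw agreement looks acceptable, which is one reason agreement statistics and accuracy can tell different stories about the same data.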
Furthermore, in order to draw conclusions about online food ordering from the consumer’s perspective, an entity analysis was also carried out. As shown in Table 6, which presents the results of the expert’s entity analysis, 8.2% of the comments regarded price, 44.7% addressed delivery speed, and 51.8% pertained to the quality of orders. In addition, 11.6% concerned the delivery personnel’s behavior, 7.5% regarded hygiene, and 15.3% discussed the restaurant's overall impression. Finally, 6.8% concerned portion size, while 23.5% focused on customer service.
Table 7 presents the entity analysis results of the tool. The tool failed to detect any comments regarding price, hygiene, or portion size. For the remaining entities, the tool identified 0.7% of the comments as concerning speed, 31.7% quality, 4% the delivery personnel’s behavior, 13.6% the restaurant’s overall impression, and 10.9% the service. Lastly, in 39.9% of all comments the tool failed to identify any entities, while 8.8% contained isolated labels covering various food categories, which were grouped into the ‘other’ entity.
The analysis highlights gaps in the detection capabilities of the tool, since it reveals an inconsistency between the expert's observations and the findings of the tool. The expert identified quality, speed, and customer service as the most prominent entities, and all the entities appeared in the expert's observations. The tool primarily detected comments on quality, overall impression, and customer service, while price, hygiene, and portion size were missing from its observations.
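The comparison between the two entity analyses can be reproduced from the percentages quoted in the text. In the sketch below, entities that the tool did not detect are recorded as 0.0, and the tool's "service" entity is treated as the counterpart of the expert's "customer service"; both are encoding assumptions, and the helper `top_entities` is a hypothetical name.

```python
# Share of comments (%) mentioning each entity, as quoted in the text.
expert_share = {
    "price": 8.2, "speed": 44.7, "quality": 51.8, "delivery personnel": 11.6,
    "hygiene": 7.5, "overall impression": 15.3, "portion size": 6.8,
    "customer service": 23.5,
}
# Assumption: entities the tool missed are encoded as 0.0.
tool_share = {
    "price": 0.0, "speed": 0.7, "quality": 31.7, "delivery personnel": 4.0,
    "hygiene": 0.0, "overall impression": 13.6, "portion size": 0.0,
    "customer service": 10.9,
}


def top_entities(shares: dict[str, float], k: int = 3) -> list[str]:
    """Return the k entities with the largest share of comments."""
    return sorted(shares, key=shares.get, reverse=True)[:k]


# Percentage-point gap between expert and tool for every entity.
gaps = {entity: round(expert_share[entity] - tool_share[entity], 1)
        for entity in expert_share}
missed = [entity for entity, share in tool_share.items() if share == 0.0]

print("expert top 3:", top_entities(expert_share))
print("tool top 3:  ", top_entities(tool_share))
print("largest gap: ", max(gaps, key=gaps.get), gaps[max(gaps, key=gaps.get)])
print("missed:      ", missed)
```

Viewed this way, the largest discrepancy is in delivery speed, which the expert found in 44.7% of the comments and the tool in only 0.7%.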
Conclusions
Sentiment analysis plays a crucial role in understanding the consumers’ pulse towards products, services, and purchasing experiences. By analyzing sentiment, businesses can understand customers' emotional requirements and make decisions that nurture deeper connections with their customer base.
This study contributes to the growing body of literature on sentiment analysis in online food delivery services, providing evidence from Greece that underscores the importance of understanding customer sentiments for the success and growth of these platforms. It investigated the effectiveness of off-the-shelf sentiment analysis APIs in providing meaningful and accurate results for identifying sentiments in Greek. This was achieved by collecting 300 customer reviews of online orders, of which 293 were analyzed using the Meaning Cloud tool. This tool can classify text into five levels and identify entities in comments.
According to the results, the analysis achieved a high accuracy level of 90.67%. Specifically, within the dataset, the researcher identified 61% (179) of the comments as positive and 32% (94) as negative. The tool correctly detected 76% (135) of the positive comments and 42% (40) of the negative comments. Positive comments are about three times as numerous as negative ones. It must be noted that, although the classification had a high accuracy, only 66% of the total comments were classified. Therefore, if the unclassified comments were included in the evaluation metrics, the percentages would probably decrease.
Also, the findings of this research underscore that, when ordering food online, customers make comments mainly on the quality of the delivered meal, the speed of delivery, and the restaurant’s customer service. By leveraging sentiment analysis techniques, we have identified key points of customer interest. This insight can assist companies operating Greek food delivery platforms and the collaborating food catering businesses in improving customer satisfaction and optimizing service delivery.
It must be noted that, apart from the term ‘store’, which was incorporated into the generalized default model used in the analysis, the percentages of entities discovered by the tool were quite low. The low percentages primarily stemmed from the research constraint requiring comments to be translated before analysis, a process that often distorts the original meaning of the comments.
Moreover, this study has revealed the need to develop specialized tools tailored to the linguistic nuances of specific languages. As shown, the model used is not specialized in a particular domain and does not incorporate the relevant vocabulary; instead, it relies on a limited set of terms in a generalized manner. Hence the need to develop a lexicon dedicated not only to a specific language but also to a particular field, in this case a particular type of restaurant or shop. With such a lexicon, the percentages would likely increase, yet achieving 100% accuracy is improbable because customers often use incorrect or unstructured syntax. Such variations alter the meaning of comments and affect the findings of the analysis. Moving forward, continued research and innovation in sentiment analysis tools and techniques will be essential for unlocking their full potential in diverse linguistic contexts and industry domains.
Author Contributions
Conceptualization, C.C., A.L., M.N., F.N., and N.F.; methodology, C.C., M.N., F.N., and A.L.; validation, N.F., A.L., and F.N.; formal analysis, N.F. and F.N.; investigation, N.F. and F.N.; resources, N.F.; data curation, N.F. and F.N.; writing—original draft preparation, N.F, M.N., N.F., C.C. and A.L.; writing—review and editing, N.F, M.N., N.F., C.C. and A.L.; visualization, N.F. and F.N.; supervision, M.N., N.F., C.C. and A.L. All authors have read and agreed to the published version of the manuscript.