1. Introduction
Electronic commerce is the practice of exchanging products and services online. It enables consumers to buy products and services at their discretion, regardless of time and place, without leaving the comfort of their homes or workplaces. With the advent of e-commerce, companies can serve their clientele around the clock, which often results in more business and happier clients. Review sites, online shopping sites, and weblogs have allowed almost anybody to voice their opinion on products and services. Horrigan [
1] surveyed online shoppers in 2008 and found that 81% of respondents had researched goods using internet resources before purchasing, with 79% expressing confidence that they had made the best possible choice. People communicate their feelings and experiences by posting opinions and images online. Emotions span a vast spectrum of human experience, from liking to disliking, happiness to sadness, and low spirits to elation [
2]. These reviews also help companies improve their product quality.
Opinions expressed in online reviews play a significant part in the buying decisions of online shoppers. Reviews give consumers an idea of what it is like to purchase from a company by providing feedback on the quality of the items and services others have experienced. Online reviews are crucial since they may significantly affect a company's image and, by extension, its revenue. Positive evaluations can increase customer confidence in a company and the number of consumers who patronize it, whereas unfavorable reviews can erode credibility and drive customers away [
3]. Online reviews are more crucial than ever in today's market. With the abundance of options available, customers put a premium on hearing about the perspectives and experiences of others before making a purchase. Hence, in today's competitive e-commerce environment, survival requires keeping an eye on customer feedback and responding appropriately.
Figure 1 illustrates that someone considering the purchase of headphones can look online and read reviews from other customers about the product. After going through these reviews, the person's decision will be influenced by the opinions of other customers [
1].
Online reviews are assessments or viewpoints expressed by customers or users of a good, service, or company online. They assist other prospective consumers in making selections by sharing information and feedback. Online reviews can be accessed on many different platforms, including social media sites, specialized review websites, and mobile apps. They often include a rating or score along with a written summary of the reviewer's experience, level of satisfaction, and any advantages or disadvantages they encountered. Online reviews are something that businesses frequently pay attention to because they have a big impact on their reputation and how customers perceive them. Reviews may be broken down into two broad categories: quantitative (like star ratings) and qualitative (like lists of pros and cons) [
4]. Reviews can also include annotations in the form of words or pictures that classify how one feels. One such quantitative expression of emotion is the star rating seen on many online retailers' sites, as shown in
Figure 2. This rating system is used to assess people's opinions. Classification refers to separating reviews or criticism into distinct categories. Reviews may be emotionally categorized in several ways, including positive, negative, and neutral, as well as finer-grained categories such as anger, sadness, happiness, and annoyance.
As a result, customers are now more dependent on product reviews to obtain information and make informed purchasing decisions. Due to the sheer volume of information available, about 32% of consumers have reported confusion, while 30% have expressed frustration. Reading all the reviews can prove to be time-consuming, yet many people prefer to go through a few reviews before opting for a product or making a choice [
5,
6]. Sentiment analysis (SA) offers useful insights for people and firms in the modern digital age, where enormous amounts of data are produced every day through online reviews, social media posts, and client feedback. The sentiment or opinion represented in text data, such as online reviews, is analyzed and determined using sentiment analysis. SA has become a critical tool to extract, evaluate, and report the sentiments of people using any social media platform [
7].
SA is a sub-field of Natural Language Processing (NLP) that automatically identifies opinions in text data. Its primary objective is to categorize user-created reviews as negative or positive depending upon the author's viewpoint or sentiment on a particular subject. According to Liu, the terms opinion mining and sentiment analysis can be used interchangeably in the study of behaviors, attitudes, feelings, emotions, evaluations, opinions, and sentiments toward various aspects of a service, product, or person [
8].
Many different opinion mining approaches may be used to support product recommendations. These approaches rely mainly on the text content that consumers provide. Consumers can also communicate their feedback using Multus-Medium content, including text and graphics [
9]. Online reviews that include both text and photos are referred to as Multus-Medium reviews. Reviews that include text and images are very important in many fields. They provide people with a more thorough and interesting platform to share their thoughts and experiences. Images offer context and visual proof that writing alone often struggles to convey. Reviewers may provide a more thorough and accurate account of their experiences by incorporating photographs with their comments, empowering future consumers to make more educated selections. Reviews with both images and text are frequently seen as more sincere and reliable. Since real-world photographs are more difficult to manipulate than text alone, pairing them with written commentary strengthens a review's trustworthiness [
10]. Images have the power to stir feelings and forge closer ties with readers. Reviews carry more weight when consumers use images to describe their experiences because they connect with prospective customers on a deeper level. While typical online evaluations are mostly composed of written descriptions, Multus-Medium reviews improve the reviewing process by including both textual and visual aspects that complement the material. As can be seen in
Figure 3, users express their thoughts, sentiments, and emotions by posting text and photographs on social media platforms regarding their delayed flights.
To recommend products to new customers, many analysts have evaluated customer reviews using traditional machine learning and opinion mining approaches such as deep multimodal attentive fusion, VGGNet-16, Attention-Based Modality-Gated Networks (AMGN), and others [
11]. Multus-Medium reviews include ample information that must be evaluated before processing. As a result, a sophisticated technique is employed to analyze client feedback from Multus-Medium image data. This research focuses on the Multus-Medium approach, which extracts features from images and text and helps recommend products to customers. Multus-Medium reviews have the potential to make a big impact on e-commerce by improving user engagement, trust, product understanding, and customer service. Multus-Medium evaluations give companies a competitive edge, higher conversion rates, and insightful market data. A more knowledgeable and engaging shopping experience benefits customers, increasing their contentment and confidence in their purchase choices. It will significantly help individuals make better purchasing decisions and substantially help organizations make better decisions regarding their products and policies.
Textual ratings and reviews are the foundation of current work on recommender systems. However, existing approaches ignore the Multus-Medium photos that also carry the rich information consumers provide. Reviews that include both text and images are extremely important in influencing customers' purchasing choices, yet existing approaches neglect these visual reviews. Images give products and services a real depiction, enabling clients to perceive their true appearance, quality, and characteristics. This visual context helps customers better grasp what to anticipate and make more educated choices. As social proof, image and text reviews demonstrate that others have had favorable experiences with a good or service. Customers' confidence increases and their perception of risk decreases when they can see visual proof from many happy consumers, which increases the likelihood that they will make a purchase. Evaluations presented in the Multus-Medium style include a substantial amount of data, which must also be considered in the analysis. MMOM (Multus-Medium Opinion Mining) is the technique proposed in this study for analyzing product reviews. It offers an improved deep learning model that combines BiLSTM with an embedded CNN built on GoogLeNet and VGGNet.
1.1. Research Contribution
The main contributions of this research are as follows:
Investigation of Multus-Medium reviews, utilizing discriminative and unique features in the form of text and images, to produce better product recommendations.
A BiLSTM-embedded CNN and feature fusion with a sentiment analytic model; the proposed MMOM produces better recommendations.
Experimental results demonstrate that the accuracy, F1 score, and ROC over Flickr8k are 90.38%, 88.75%, and 93.08%, whereas for the Twitter dataset they are 88.54%, 86.34%, and 92.26%, respectively; the accuracy of the proposed model is 7.34% and 9.54% higher than the other two mentioned techniques.
The remainder of this paper is organized as follows. Section 2 reviews the literature on techniques and practices for analyzing sentiments expressed as text or images on the internet. Section 3 presents the proposed methodology, including the method used to collect and analyze precise data; the proposed approach helps gather accurate information and provides insights into the research subject. Section 4 presents the analysis and results of the data analysis. Section 5 discusses the findings, and Section 6 closes with conclusions, recommendations, and future research directions.
4. Experimental Results
This subsection examines the experimental results and assesses the success, efficiency, and effectiveness of the proposed method. The proposed system's precision and efficacy have been examined through experiments on benchmark datasets selected from previously published work. An extensive experimental evaluation confirmed that the proposed strategy outperformed state-of-the-art alternatives. The specifics of each experiment are described below.
4.1. Performance Evaluation Measure
Precision, recall, F1 score, and overall accuracy are utilized as standard benchmarks to assess the effectiveness of the Multus-Medium sentiment analysis. The numerical formulas for these measures are given in Equations (8), (9), (10), and (11), where TP, TN, FP, and FN stand for True Positive, True Negative, False Positive, and False Negative, respectively.
It is also common practice to use a Receiver Operating Characteristic (ROC) curve to assess a classifier's efficiency; Equation (11) shows the formula for this performance indicator.
Conditional probabilities, where l is a class label, are calculated using the formula CP(i|l). The results of a classification are plotted on a ROC curve [44], ordered from most positive to most negative.
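As a concrete illustration of these measures, the following minimal Python sketch derives precision, recall, F1 score, and accuracy from TP, TN, FP, and FN counts. The counts are hypothetical, chosen only to illustrate the formulas, and are not taken from the paper's experiments:

```python
# Standard classification metrics computed from confusion-matrix counts.
# The counts passed in below are hypothetical, for illustration only.

def metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)                          # correctness of positive predictions
    recall = tp / (tp + fn)                             # coverage of actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # overall correct prediction rate
    return precision, recall, f1, accuracy

p, r, f1, acc = metrics(tp=90, tn=85, fp=10, fn=15)
print(round(p, 3), round(r, 3), round(f1, 3), round(acc, 3))
# 0.9 0.857 0.878 0.875
```

Note that F1 penalizes an imbalance between precision and recall, which is why it is reported alongside plain accuracy throughout this section.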
4.2. Outcome of the Experiment
The effectiveness of the proposed method is demonstrated by contrasting it with pre-existing models. Rather than merely presenting the results, we also delve into the underlying reasons for and implications of the findings.
4.2.1. Baselines
To determine how well the proposed model functions, we used the reference models and descriptive data presented in the table below as benchmarks and compare the proposed model against them in detail.
AHRM [
24]: This model performs multimodal sentiment analysis (positive, negative) of social images with the help of an attention-based heterogeneous relational model (AHRM).
DMAF [
23]: This work proposed a deep multimodal attentive fusion (DMAF) method for sentiment analysis. The technique collects complementary and non-redundant information for more accurate sentiment categorization by automatically attending to image regions and words associated with affect.
AMGN [
25]: This work proposed Attention-Based Modality-Gated Networks (AMGN) to leverage the correlation between the image and text modalities, intending to extract the discriminative features required for multimodal sentiment analysis.
In
Table 3, the MMOM model consistently achieved higher F1 scores and accuracies than the other models on both the Flickr8k and Twitter datasets. This indicates that the proposed MMOM model demonstrated superior performance in sentiment analysis.
For the Flickr8k dataset:
AHRM model achieved an F1 score of 87.1% and an accuracy of 87.5%.
DMAF model achieved an F1 score of 85% and an accuracy of 85.9%.
MMOM model achieved the highest F1 score of 88.75% and the highest accuracy of 90.38%.
For the Twitter (T4SA) dataset:
DMAF model achieved an F1 score of 76.9% and an accuracy of 76.3%.
AMGN model achieved an F1 score of 79.1% and an accuracy of 79%.
MMOM model achieved an F1 score of 86.34% and an accuracy of 88.54%.
Several factors may explain the superior performance of the MMOM model:
Leveraging embedded GoogLeNet and VGGNet with BiLSTM: The MMOM model utilized embedded GoogLeNet and VGGNet with BiLSTM (Bidirectional Long Short-Term Memory) as part of its architecture. This combination of models and techniques might have helped capture more comprehensive and relevant features from the input data, leading to improved sentiment analysis performance.
Attention-based mechanism: The MMOM model might have employed an attention-based mechanism that focuses on relevant aspects or regions of the input data, allowing it to better extract discriminative features for sentiment analysis. This attention mechanism could have contributed to the model's superior performance compared to the other attention-based models like AHRM and DMAF.
Aspect term position prediction and semantic similarity: The MMOM model may have incorporated the prediction of aspect terms' position and considered semantic similarity between aspects in sentiment analysis. By taking into account these additional factors, the MMOM model could have gained a better understanding of the sentiment expressed towards specific aspects, resulting in more accurate sentiment analysis.
Integration of external knowledge: The MMOM model integrates external knowledge from the pre-trained vision models GoogLeNet and VGGNet. This incorporation of external knowledge provides supplementary information that merges with semantic features, leading to enhanced performance in sentiment analysis tasks.
Overall, the combination of embedded GoogLeNet and VGGNet, the attention-based mechanism, aspect term position prediction, semantic similarity consideration, and the integration of external knowledge could have contributed to the superior performance of the proposed MMOM model compared to the other models mentioned in the table. Further detailed analysis and experiments can help provide a more comprehensive understanding of the specific advantages and strengths of the MMOM model.
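The late-fusion idea behind the first factor can be sketched in miniature: a text encoder and an image encoder each produce a feature vector, the fused vector is their concatenation, and a classifier scores the result. The Python sketch below stubs both encoders with fixed vectors and uses a toy linear classifier; every value and weight is illustrative, not the paper's implementation:

```python
# Minimal late-fusion sketch: concatenate a text feature vector (stand-in for
# a BiLSTM sentence embedding) with an image feature vector (stand-in for a
# GoogLeNet/VGGNet embedding), then score the joint vector with a toy linear
# classifier. All values are hypothetical.

def fuse(text_feats, image_feats):
    """Concatenate the two modality-specific feature vectors."""
    return text_feats + image_feats

def classify(joint, weights, bias=0.0):
    """Toy linear classifier over the fused features."""
    score = sum(w * x for w, x in zip(weights, joint)) + bias
    return "positive" if score > 0 else "negative"

text_vec = [0.2, 0.7, 0.1]           # hypothetical text features
image_vec = [0.9, 0.4]               # hypothetical image features
joint = fuse(text_vec, image_vec)    # 5-dimensional fused vector
label = classify(joint, weights=[0.5, 0.5, -0.2, 0.3, 0.1])
print(len(joint), label)
# 5 positive
```

The design point is that the classifier sees evidence from both modalities at once, so an image cue can compensate for ambiguous text and vice versa; in the full model, learned attention weights replace the fixed weights used here.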
The Multus-Medium approach was utilized to perform opinion mining of product reviews, and the results are depicted in Table 3. The proposed model exhibited strong performance compared to other state-of-the-art methods in terms of accuracy and F1 score, while the other models also demonstrated competitive performance. The F1 score and accuracy of our proposed model are considerably better than those of the above-mentioned baseline models. The reason for this better result is the GoogLeNet and VGGNet embedding, through which a merged feature-embedded CNN integrates supplementary information.
The main results indicate that the performance of the GoogLeNet embedding can be improved by incorporating external knowledge, which leads to positive outcomes. However, syntactic knowledge alone is not sufficient to enhance aspect-based Multus-Medium sentiment analysis. Instead, combining external knowledge from pre-trained vision models such as GoogLeNet and VGGNet with additional knowledge signals creates auxiliary information that merges with semantic features and enhances performance.
We conducted another experiment to compare our proposed MMOM model with two baseline methods using the Flickr8k and Twitter datasets. Our model, which utilizes embedded GoogLeNet and VGGNet with BiLSTM, outperformed other attention-based models in sentiment analysis. Furthermore, our proposed approach improved the F1 score and accuracy over non-transformer-based baselines. These results indicate that predicting aspect terms' positions and the semantic similarity between aspects significantly impact sentiment analysis.
Figure 11 and
Figure 12 visualize the comparison of accuracy and F1 score on the Twitter and Flickr8k datasets, respectively.
4.2.2. Performance of the Proposed Research
Figure 13 displays the True Positive Rate (TPR) and False Positive Rate (FPR) for the study. Each dataset's TPR and FPR values are shown on a ROC curve. The area under the ROC curve is 0.92 for the dataset in Figure 13a and 0.93 for the dataset in Figure 13b.
To further assess the effectiveness of the proposed technique, a ROC curve was used to analyze True Positive Rate (TPR) and False Positive Rate (FPR) for each dataset (
Figure 13). The proposed classification approach demonstrated an overall correct prediction rate of 88.54% and 90.38% on the Twitter and Flickr8k datasets, respectively.
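The reported areas are areas under the ROC curve, which can be approximated numerically by the trapezoidal rule over the (FPR, TPR) operating points. A small Python sketch with an invented curve (the points are illustrative, not the paper's measurements):

```python
# Trapezoidal approximation of the area under a ROC curve, given a list of
# (FPR, TPR) operating points. The sample curve below is illustrative only.

def auc_trapezoid(points):
    pts = sorted(points)                       # order by increasing FPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0    # trapezoid over each segment
    return area

roc_points = [(0.0, 0.0), (0.1, 0.8), (1.0, 1.0)]
print(round(auc_trapezoid(roc_points), 2))
# 0.85
```

A curve hugging the top-left corner yields an area near 1.0, which is why the observed 0.92 and 0.93 values indicate strong class separability.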
A confusion matrix, a performance-based matrix, has been created to demonstrate the effectiveness of the proposed technique using the TP, TN, FP, and FN counts, as shown in Figure 14 and Figure 15. These figures display two different combinations of predicted and actual values. The correct prediction rate for the proposed system is 88.54% in
Figure 14 and 90.38% in
Figure 15. These results indicate that the proposed classification approach performs well and produces better outcomes. Additionally, precision, recall, F1 measure, and accuracy have been employed to determine the proposed model's effectiveness. The Flickr8k dataset's macro results demonstrate 88.88% precision, 89.04% recall, an 88.75% F value, and 90.38% accuracy, supporting the proposed work's superior efficiency. The proposed work's results on the Twitter dataset show 86.80% precision, 85.97% recall, an 86.34% F value, and 88.54% accuracy, as shown in
Table 4.
Table 4 summarizes the performance of the proposed model in terms of recall, precision, and F value for each sentiment class within the Flickr8k and Twitter datasets. These results provide insight into how accurately the model classifies text into different sentiment categories, indicating its effectiveness in sentiment analysis tasks on these specific datasets.
For the Flickr8k dataset:
Positive class: The proposed model achieved a recall of 95.08%, indicating that it correctly identified a high percentage of positive instances. The precision for the positive class was 93.79%, indicating a high accuracy in predicting positive instances. The F value for the positive class was 94.43%, reflecting a good balance between precision and recall.
Neutral class: The model achieved a recall of 88.15% and a precision of 77.93% for the neutral class. The F value for the neutral class was 82.72%.
Negative class: The proposed model achieved a recall of 83.89% and a precision of 94.92% for the negative class. The F value for the negative class was 89.11%.
Average: The average recall across all classes was 89.04%, the average precision was 88.88%, and the average F value was 88.75%.
For the Twitter dataset:
Positive class: The proposed model achieved a recall of 89.79% and a precision of 94.02% for the positive class. The F value for the positive class was 91.86%.
Neutral class: The model achieved a recall of 91.82% and a precision of 87.5% for the neutral class. The F value for the neutral class was 89.61%.
Negative class: The proposed model achieved a recall of 76.3% and a precision of 78.9% for the negative class. The F value for the negative class was 77.57%.
Average: The average recall across all classes was 85.97%, the average precision was 86.80%, and the average F value was 86.34%.
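The per-class and average figures above are linked by simple macro-averaging, i.e., the unweighted mean over the three classes. A short Python check using the Flickr8k per-class values quoted above reproduces the reported averages:

```python
# Macro-averaging: the "Average" row is the unweighted mean of the per-class
# scores. Values below are the Flickr8k per-class results quoted in the text.
per_class = {
    "positive": {"recall": 95.08, "precision": 93.79, "f": 94.43},
    "neutral":  {"recall": 88.15, "precision": 77.93, "f": 82.72},
    "negative": {"recall": 83.89, "precision": 94.92, "f": 89.11},
}

def macro(metric):
    vals = [scores[metric] for scores in per_class.values()]
    return round(sum(vals) / len(vals), 2)

print(macro("recall"), macro("precision"), macro("f"))
# 89.04 88.88 88.75 -- matching the reported Flickr8k averages
```

Because macro-averaging weights each class equally, the weaker negative-class recall and neutral-class precision pull the averages below the strong positive-class scores.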
These results demonstrate the performance of the proposed model for different sentiment classes within each dataset. The model achieved high recall and precision values for positive instances in both datasets. It also showed varying levels of performance for the neutral and negative classes. The average performance across all classes indicates the overall effectiveness of the proposed model in sentiment analysis for both the Flicker8k and Twitter datasets.
Based on the results, it can be inferred that the proposed system achieves greater accuracy and lower loss during both the training and validation phases, indicating that the proposed approach is capable of accurate classification with minimal error.
Figure 15 provides a detailed analysis of the performance measures, including precision, recall, F1 score, and accuracy, across the datasets.
4.3. Discussion
The experimental analysis aims to improve the comprehension of semantic vectors and refine the BiLSTM for sentiment classification. The study utilizes GoogLeNet and VGGNet for deep feature extraction and integrates external knowledge into the MMOM model to generate auxiliary information, ultimately improving sentiment analysis performance. The research indicates that additional knowledge can yield better results. However, integrating knowledge into the model is complicated by the heterogeneous embedding space, which includes image features, text words, and distance-vector entities. A consistent vector space is essential for the model to learn the embedding of GoogLeNet and VGGNet knowledge together with the BiLSTM.
Moreover, sentiment analysis performance depends on contextual knowledge, and inappropriate knowledge incorporation may negatively affect performance. The proposed Multus-Medium-based opinion mining injects knowledge into the BiLSTM semantic vectors, which may introduce noise and alter the original context of the input vector. Therefore, an efficient solution is required to manage excessive noise without changing the actual context of the input sentence.
5. Conclusions and Future Work
Product reviews and comments posted online have a significant impact on purchasing decisions and product sales. They also play a role in quality improvement and in recommendation systems for new users. In this study, a deep learning approach is applied to conduct sentiment analysis on product reviews. The proposed method utilizes BiLSTM and an embedded CNN, incorporating GoogLeNet and VGGNet, to develop an enhanced deep learning model. The model includes several critical steps, such as data preprocessing, feature set extraction, the semantic attention vector, BiLSTM for text analysis, CNN with VGGNet and GoogLeNet for image analysis, deep feature extraction for images, fusion, and reinforcement learning.
A Multus-Medium Opinion Mining (MMOM) method is proposed, which utilizes both textual and image features to provide better product recommendations based on unique and discriminative features. The MMOM model combines a BiLSTM-embedded CNN with feature fusion for sentiment analysis, outperforming other models in terms of recommendation accuracy. Experimental results demonstrate the high accuracy, F1 score, and ROC values achieved by the MMOM model. On the Flickr8k dataset, the model achieves an accuracy of 90.38%, an F1 score of 88.75%, and a ROC of 93.08%. On the Twitter dataset, the model achieves an accuracy of 88.54%, an F1 score of 86.34%, and a ROC of 92.26%. These results indicate a significant improvement over the other mentioned techniques, with accuracy advantages of 7.34% and 9.54%.
The study highlights the significant impact of online product reviews and comments on purchasing decisions and product sales. By conducting sentiment analysis on these reviews, businesses can gain valuable insights into customer sentiments and preferences, enabling them to make informed decisions to improve product quality, marketing strategies, and customer satisfaction. The proposed Multus-Medium approach, particularly the MMOM model, offers a comprehensive solution for sentiment analysis and recommendation systems by leveraging both textual and image features. The experimental results validate its superiority over existing techniques, providing businesses with valuable insights for informed decision-making and assisting customers in making better purchasing decisions. Future work could focus on adopting more effective decision-making techniques to further improve accuracy. Furthermore, the proposed scheme can be extended to other sentiment-related tasks, such as hospital recommendation systems, crop farming recommendations, and medical diagnostic systems.