3. Methodology
This section describes the step-by-step methodology followed in this research work. First, a relevant dataset had to be selected. The dataset proposed in [111], which contains about 120,000 Tweet IDs of Tweets about exoskeletons posted from May 21, 2017, to May 21, 2022, was used. The dataset in [111] was developed by using the Search Twitter operator within RapidMiner [112] and the Twitter API's Advanced Search functionality. RapidMiner is a data science platform that allows the design, development, and implementation of different algorithms in Big Data, Data Mining, Data Science, Artificial Intelligence, Machine Learning, and related disciplines. The Search Twitter operator in RapidMiner works by establishing a connection with the Twitter API while adhering to the rate limits for accessing Twitter data under Twitter's Standard Search regulations. The Advanced Search feature of the Twitter API is available to users who are logged into twitter.com. It allows users to search for Tweets based on timestamps, keywords, and a set of data filters, such as Tweets containing an exact phrase, Tweets containing any of the specified keywords, Tweets that exclude specific keywords, Tweets featuring a specific hashtag, and Tweets in a particular language. After the Tweets were collected using both of these keyword-driven methods, the duplicate Tweets were removed in [111]. The dataset complies with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for scientific data management [114]. The standard procedure for working with such Twitter datasets is to hydrate the Tweet IDs. However, this dataset was developed by the first author of this paper, so all the Tweets were already available for analysis. In addition, to include more recent Tweets in the analysis, the same data collection methodology as discussed in [111] was used to collect Tweets about exoskeletons from May 22, 2022, to May 13, 2023, which was the most recent date at the time of data collection. The newly collected data was then merged with the existing Tweets to develop a merged dataset for analysis. This dataset comprised 153,045 Tweets about exoskeletons, posted on Twitter between May 21, 2017, and May 13, 2023, by a total of 84,716 distinct users. In addition to the text of the Tweets, the data also contained characteristics of these Tweets, stored as different attributes. Table 1 summarizes these attributes.
Thereafter, the tweeting behavior per hour was analyzed using the data in the “created_at” attribute, which contains both date and time information. Using binning, 24 bins representing the 24 hours of a day were created, and each Tweet was assigned to a bin by extracting only the time information from this attribute. The pseudocode of the program, which was written in Python 3.11.5, is shown in Algorithm 1. Using an approach similar to that of Algorithm 1, the number of Tweets posted per month per year between May 2017 and May 2023 was also extracted to analyze the tweeting patterns about exoskeletons over the last six years.
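The hourly binning described above can be sketched as follows. This is a minimal illustration and not a reproduction of Algorithm 1: the function name `tweets_per_hour` and the timestamp format are assumptions, and the sketch labels each bin by the clock hour (0 to 23) of the timestamp, which may differ from the timeslot labeling used in the figures of this paper.

```python
from collections import Counter
from datetime import datetime


def tweets_per_hour(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return a list of 24 counts, one per clock hour (0-23).

    `timestamps` holds "created_at" strings; `fmt` is an assumed format
    and would need to match the actual attribute in the dataset.
    """
    # Keep only the hour component of each timestamp and count occurrences.
    counts = Counter(datetime.strptime(ts, fmt).hour for ts in timestamps)
    # Hours with no Tweets get an explicit zero so all 24 bins are present.
    return [counts.get(hour, 0) for hour in range(24)]


# Hypothetical sample values, for illustration only.
sample = [
    "2022-05-21 16:30:00",
    "2022-05-21 16:59:59",
    "2023-01-02 05:10:00",
]
bins = tweets_per_hour(sample)
```

The same idea extends to the per-month analysis by counting on `(year, month)` pairs instead of the hour.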
Thereafter, different characteristics of these Tweets were computed: the mean character count of the Tweets posted per hour, the median character count of the Tweets posted per hour, the number of hashtags used per hour, and the number of user mentions included in the Tweets per hour. These features were then assigned to the 24 hourly bins, so that for each hour, across all the Tweets posted, the mean character count, the median character count, the number of hashtags, and the number of user mentions were compiled. Algorithm 2 shows the pseudocode of the program, written in Python 3.11.5, that computes the mean and median character counts of the Tweets per hour. The pseudocode of the program that counts hashtags and user mentions per hour is shown in Algorithm 3.
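A minimal sketch of these per-hour features is given below. It assumes that each Tweet is available as an `(hour, text)` pair and that hashtags and user mentions can be recognized with the simple patterns `#\w+` and `@\w+`; both the function name `hourly_features` and these patterns are illustrative assumptions rather than the exact logic of Algorithms 2 and 3.

```python
import re
from collections import defaultdict
from statistics import mean, median

HASHTAG = re.compile(r"#\w+")   # assumed hashtag pattern
MENTION = re.compile(r"@\w+")   # assumed user-mention pattern


def hourly_features(tweets):
    """Aggregate Tweet characteristics per hour.

    `tweets` is an iterable of (hour, text) pairs, with hour in 0-23.
    """
    by_hour = defaultdict(list)
    for hour, text in tweets:
        by_hour[hour].append(text)

    features = {}
    for hour, texts in by_hour.items():
        lengths = [len(text) for text in texts]
        features[hour] = {
            "tweets": len(texts),
            "mean_chars": mean(lengths),
            "median_chars": median(lengths),
            "hashtags": sum(len(HASHTAG.findall(t)) for t in texts),
            "mentions": sum(len(MENTION.findall(t)) for t in texts),
        }
    return features


# Hypothetical sample values, for illustration only.
sample = [
    (16, "New #exoskeleton demo by @acme"),
    (16, "#robotics #ai"),
    (5, "hello"),
]
features = hourly_features(sample)
```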
The flowchart shown in Figure 1 summarizes the working of the above-mentioned algorithms. After obtaining this master dataset for analysis, as shown in Figure 1, the correlations between these characteristics and the number of Tweets per hour were evaluated to determine whether they were statistically significant (p < 0.05). The analysis, which used Pearson’s correlation coefficient, showed that all of these characteristics, i.e., the mean number of characters in the Tweets per hour, the median number of characters in the Tweets per hour, the number of hashtags used per hour, and the number of user mentions used per hour, had statistically significant relationships with the number of Tweets per hour (results are discussed in detail in Section 4). Thereafter, a multiple linear regression model was developed in which the mean number of characters in the Tweets per hour, the median number of characters in the Tweets per hour, the number of hashtags used per hour, and the number of user mentions used per hour were the independent variables and the number of Tweets per hour was the dependent variable. Algorithm 4 shows the pseudocode of the program, written in Python 3.11.5, that determines these correlations and develops the multiple linear regression model.
Thereafter, the focus of the investigation shifted towards hashtag-specific sentiment analysis. This was considered relevant primarily because prior works on sentiment analysis of Tweets (Section 2) did not determine a list of popular hashtags and their associated sentiments. However, determining a list of popular hashtags and their associated sentiments has been popular in the area of Natural Language Processing, as can be seen from multiple recent works that focused on hashtag-specific sentiment analysis of Tweets about COVID-19 [115], politics [116], and movies [117], to name a few. To perform this analysis, the list of the top 10 hashtags (in terms of the number of Tweets posted) was first determined. Then, the number of Tweets per hashtag (for these top 10 hashtags) per month between May 2017 and May 2023 was computed to understand the associated trends. Algorithm 5 shows the pseudocode of the program, written in Python 3.11.5, that determines the top 10 hashtags.
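The ranking of hashtags by the number of Tweets in which they appear can be sketched as below. This is an illustrative sketch, not Algorithm 5 itself: the function name `top_hashtags`, the `#\w+` pattern, and the choice to treat hashtags case-insensitively and to count each hashtag at most once per Tweet are assumptions.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")  # assumed hashtag pattern


def top_hashtags(texts, n=10):
    """Return the n hashtags that occur in the largest number of Tweets."""
    counts = Counter()
    for text in texts:
        # A set ensures a hashtag repeated within one Tweet is counted once.
        counts.update({tag.lower() for tag in HASHTAG.findall(text)})
    return counts.most_common(n)


# Hypothetical sample values, for illustration only.
sample = [
    "#Exoskeleton and #robotics",
    "#exoskeleton #exoskeleton",
    "#AI",
]
top = top_hashtags(sample, n=3)
```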
After obtaining the results from Algorithm 5, the VADER sentiment analysis approach was applied to the Tweets. Sentiment analysis can be performed using various methods, including manual categorization, Linguistic Inquiry and Word Count (LIWC), Affective Norms for English Words (ANEW), the General Inquirer (GI), SentiWordNet, and machine learning-based approaches such as Naïve Bayes, Maximum Entropy, and Support Vector Machine (SVM). The approach chosen for this study was VADER, which stands for Valence Aware Dictionary and sEntiment Reasoner [118]. VADER was chosen for multiple reasons. First, VADER has been reported to match or outperform individual human raters in classifying the sentiment of social media text. Furthermore, VADER has been shown to overcome limitations of other similar sentiment analysis techniques. The following points compare VADER with these techniques.
VADER sets itself apart from LIWC by displaying heightened sensitivity to sentiment expressions that commonly appear in the analysis of social media posts.
The General Inquirer lacks the inclusion of sentiment-relevant lexical elements frequently encountered in social communication. However, VADER effectively addresses this issue.
The ANEW lexicon exhibits reduced responsiveness to lexical elements typically associated with sentiment in social media content. This is not a limitation of VADER.
The SentiWordNet lexicon contains a significant amount of noise since a notable proportion of its synsets lack either positive or negative polarity. However, this does not represent a constraint or drawback of VADER.
The Naïve Bayes classifier relies on the assumption of feature independence, which is a simplistic assumption. Nonetheless, VADER's more nuanced approach overcomes this weakness.
The Maximum Entropy technique incorporates information entropy by assigning feature weightings without assuming conditional independence between features.
Both machine learning classifiers and verified sentiment lexicons face the challenge of requiring a substantial amount of training data.
Additionally, machine learning models depend on the training set to accurately represent a wide range of characteristics. The VADER approach distinguishes itself through its concise rule-based framework, enabling the creation of a specialized sentiment analysis engine tailored specifically for language found on social media platforms. The system demonstrates remarkable adaptability, capable of adjusting to different domains without the need for specific training data. Instead, it relies on a flexible, valence-based sentiment dictionary that has been validated by humans to serve as a reliable standard. The VADER system is renowned for its high efficiency since it can immediately analyze streaming data. The VADER approach was applied to every Tweet to classify it as positive, negative, or neutral. Thereafter, the hashtag-specific sentiment analysis was performed for each of the top 10 hashtags, and the trends in the tweets (positive, negative, and neutral) were analyzed. Algorithm 6 shows the pseudocode of the program that was written in Python 3.11.5 to compute the number of tweets per sentiment per hashtag (top 10 hashtags).
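The labeling and per-hashtag counting steps can be sketched as follows. The sketch assumes the conventional VADER cut-offs on the compound score (positive if at least 0.05, negative if at most -0.05, neutral otherwise); this paper does not state the thresholds used, so they are an assumption here, as are the helper names. The `analyzer` argument is any object that exposes a `polarity_scores(text)` method returning a dictionary with a `"compound"` key, which is the interface offered by `SentimentIntensityAnalyzer` in the `vaderSentiment` package.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")  # assumed hashtag pattern


def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a sentiment label (assumed cut-offs)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def classify(text, analyzer):
    """Label one Tweet using the analyzer's compound score."""
    return label_from_compound(analyzer.polarity_scores(text)["compound"])


def sentiment_counts_by_hashtag(texts, analyzer, hashtags):
    """Count Tweets per (hashtag, sentiment) pair for the given hashtags."""
    wanted = {tag.lower() for tag in hashtags}
    counts = Counter()
    for text in texts:
        label = classify(text, analyzer)
        present = {tag.lower() for tag in HASHTAG.findall(text)}
        for tag in present & wanted:
            counts[(tag, label)] += 1
    return counts
```

With `vaderSentiment` installed, `analyzer` would typically be created with `SentimentIntensityAnalyzer()` imported from `vaderSentiment.vaderSentiment`.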
As can be seen in Algorithm 5, it calls the Data Preprocessing algorithm (Algorithm 7). This algorithm represents a program that was written to perform the necessary preprocessing of the Tweets prior to assigning a sentiment label (positive, negative, or neutral) to each Tweet. As Algorithm 7 is called by multiple algorithms presented in this paper, its step-by-step working is shown in Figure 2.
Thereafter, text processing and text analysis-based approaches were used to detect potentially sarcastic Tweets as well as Tweets that could contain news in the context of conversations about exoskeletons. To develop the methodology for sarcasm detection, prior works based on text processing were reviewed. Several works (for example, [119], [120], [121]) detected potentially sarcastic Tweets by searching for “sarcasm” or “sarcastic” as words or hashtags in Tweets, since sarcasm appears to be a commonly recognized concept among Twitter users, many of whom explicitly mark their sarcastic messages with hashtags [122]. In addition, another study in this field [123] was reviewed to understand lexical approaches for detecting sarcasm in Tweets. So, in this work, a combination of keyword-based, hashtag-based, and lexical analysis-based methodologies was used to detect potentially sarcastic Tweets. The criteria included hashtags or keywords such as “sarcasm”, “sarcastic”, “irony”, and “cynicism”; interjections such as “gee” and “gosh”; lexical expressions such as “not sure if you know this”; formulaic expressions such as “thanks a lot” and “good job”; foreign terms such as “au contraire”; rhetorical statements such as “tell us what you really think”; and specific combinations of keywords such as “perfect just perfect”. The approach also accounted for different character case (upper case or lower case) combinations of these criteria. The pseudocode of the program, written in Python 3.11.5, that detects potentially sarcastic Tweets in the dataset is shown in Algorithm 8.
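A minimal sketch of such a case-insensitive cue search is shown below. It uses only the example cues quoted above (the full list used in Algorithm 8 is not reproduced here), and it matches whole words or phrases, optionally prefixed by `#`; whether the original program matched whole words or substrings is not stated, so that choice, like the names `SARCASM_CUES` and `is_potentially_sarcastic`, is an assumption.

```python
import re

# Example cues quoted in the text; the complete list is not given here.
SARCASM_CUES = [
    "sarcasm", "sarcastic", "irony", "cynicism",
    "gee", "gosh",
    "not sure if you know this",
    "thanks a lot", "good job",
    "au contraire",
    "tell us what you really think",
    "perfect just perfect",
]

# Optional leading '#', whole-word match, any mix of upper and lower case.
SARCASM_PATTERN = re.compile(
    r"(?<!\w)#?(?:" + "|".join(re.escape(cue) for cue in SARCASM_CUES) + r")(?!\w)",
    re.IGNORECASE,
)


def is_potentially_sarcastic(text):
    """Return True if the Tweet contains at least one sarcasm cue."""
    return SARCASM_PATTERN.search(text) is not None
```

The same pattern-building approach applies to the detection of Tweets that contain news, with “news” as the only cue.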
As can be seen from Algorithm 8, this program outputs a .CSV file containing the potentially sarcastic Tweets. Thereafter, a similar approach was used to detect Tweets that contained news. A review of prior works [124,125] on the detection of news in Tweets showed that researchers in this field have tracked the presence of “news”, in hashtag form or in keyword form, to detect news communicated in Tweets. So, the methodology in this work involved searching for “news” in hashtag form or in keyword form in the Tweets present in this dataset. The methodology also accounted for different character case (upper case or lower case) combinations in the keyword as well as in the hashtag. Algorithm 9 represents the pseudocode of the program, written in Python 3.11.5, that detects Tweets that contained news.
As can be seen from Algorithm 9, this program outputs a .CSV file containing the Tweets that contained news about exoskeletons. Thereafter, Algorithms 2 and 3 were run on the master dataset (shown in Figure 1), the .CSV file containing sentiment labels for each Tweet (one of the outputs of Algorithm 6), the .CSV file representing potentially sarcastic Tweets (output of Algorithm 8), and the .CSV file representing Tweets that contained news (output of Algorithm 9). The objective was to compare the positive Tweets, negative Tweets, neutral Tweets, possibly sarcastic Tweets, and Tweets that contained news in terms of the mean length of the Tweets per month, the median length of the Tweets per month, the average number of hashtags used per month, and the average number of user mentions used per month, and to interpret the underlying trends.
After performing this analysis, a fine-grained analysis of the sentiments associated with these Tweets about exoskeletons was performed using the DistilRoBERTa-base model [127] in Python. This model categorizes a given text into one of seven distinct classes: anger, disgust, fear, joy, neutral, sadness, and surprise. As shown in Algorithm 10, a program was written in Python 3.11.5 that computed a score for each Tweet for each of these classes. The class that received the highest score was then used as the label for that Tweet. The results of running all these algorithms on the dataset are presented in Section 4.
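The label selection step of the fine-grained analysis can be sketched as below. The sketch assumes that, for each Tweet, the classifier returns a list of `{"label": ..., "score": ...}` entries with one entry per class, which is the shape produced by a Hugging Face `transformers` text-classification pipeline when it is configured to return every class score; the function names and the example scores are hypothetical.

```python
from collections import Counter


def top_emotion(scores):
    """Return the label of the highest-scoring class for one Tweet.

    `scores` is a list of {"label": str, "score": float} dictionaries.
    """
    return max(scores, key=lambda entry: entry["score"])["label"]


def emotion_distribution(all_scores):
    """Count how many Tweets receive each top label."""
    return Counter(top_emotion(scores) for scores in all_scores)


# Hypothetical classifier output for two Tweets, for illustration only.
example_scores = [
    [
        {"label": "anger", "score": 0.01}, {"label": "disgust", "score": 0.02},
        {"label": "fear", "score": 0.02}, {"label": "joy", "score": 0.10},
        {"label": "neutral", "score": 0.15}, {"label": "sadness", "score": 0.05},
        {"label": "surprise", "score": 0.65},
    ],
    [
        {"label": "anger", "score": 0.02}, {"label": "disgust", "score": 0.03},
        {"label": "fear", "score": 0.05}, {"label": "joy", "score": 0.70},
        {"label": "neutral", "score": 0.10}, {"label": "sadness", "score": 0.05},
        {"label": "surprise", "score": 0.05},
    ],
]
distribution = emotion_distribution(example_scores)
```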
4. Results and Discussions
It is worth noting that the histograms presented in Figure 3 and Figure 9 do not represent all the Tweets that were posted in the month of May during those respective years. This is because the dataset contains Tweets starting from May 21, 2017, and, for May 2023, Tweets only up to May 13, 2023. As can be seen from Figures 3 to 9, the tweeting patterns about exoskeletons between May 2017 and May 2023 were diverse, and the general public posted a considerable number of Tweets about exoskeletons in almost every month of this period. However, certain months stand out in some of these figures with a markedly higher volume of Tweets. For instance, in 2019, the highest number of Tweets about exoskeletons was posted in October, and the same pattern was observed again in 2022. In 2022, the lowest number of Tweets was posted in September; in none of the prior years (2017 to 2021) was September the month with the lowest number of Tweets. Other insights about tweeting behavior regarding exoskeletons can similarly be observed in Figures 3 to 9. As the number of Tweets posted about a topic represents the degree of public interest in that topic [126], these results indicate the specific and varied levels of public interest in exoskeletons from May 2017 to May 2023.
Figure 10 presents the analysis of the output from Algorithm 1. Specifically, a histogram-based approach was used to determine the 1-hour timeslots (in a 24-hour format) in which the highest and lowest numbers of Tweets about exoskeletons were posted. The varying trends of posting Tweets in the other timeslots are also presented in Figure 10.
As can be seen from Figure 10, timeslots 17 (the time window 16:01 to 17:00) and 16 (the time window 15:01 to 16:00) are the time windows in which the highest numbers of Tweets about exoskeletons were posted. Furthermore, this figure also shows that timeslot 6 (the time window 05:01 to 06:00) is the time window in which the lowest number of Tweets about exoskeletons was posted. It is worth noting that these timeslots are based on timestamps in Eastern Standard Time (EST). The outputs from Algorithm 2 are presented in Figure 11 and Figure 12, which show the mean character count and the median character count of the Tweets posted per hour (in a 24-hour format), respectively.
The results shown in Figure 11 and Figure 12 also help to reveal patterns of public discourse about exoskeletons at different times of the day. For instance, from Figure 11, it can be concluded that timeslot 1 (the time window 00:01 to 01:00 in a 24-hour format) is when the general public posted the shortest Tweets about exoskeletons.
The results shown in Figures 13 and 14 represent the output obtained from Algorithm 3. Specifically, these figures show the varying patterns of the usage of hashtags and user mentions in Tweets about exoskeletons posted per hour (in a 24-hour format). These figures also help to reveal patterns of public discourse about exoskeletons at different times of the day. For instance, from Figure 13, it can be concluded that timeslot 17 (the time window 16:01 to 17:00 in a 24-hour format) is when the general public used the highest number of hashtags in their Tweets about exoskeletons. Similarly, from Figure 14, it can be concluded that timeslot 16 (the time window 15:01 to 16:00 in a 24-hour format) is when the general public mentioned the highest number of users in their Tweets about exoskeletons.
The findings from Algorithm 4 are discussed next. This algorithm computed the correlation (using Pearson’s correlation) between the following:
a) number of Tweets per hour and number of characters (mean value) in the Tweets per hour
b) number of Tweets per hour and number of characters (median value) in the Tweets per hour
c) number of Tweets per hour and number of hashtags in the Tweets per hour
d) number of Tweets per hour and number of user mentions in the Tweets per hour
The coefficient of correlation between these parameters (Pearson’s r value) is shown in Figure 15, and Table 2 presents the p-values of these correlations.
Figure 14. A tabular representation of the correlation between the number of Tweets posted per hour and specific characteristics of these Tweets.
As can be seen from Figure 14 and Table 2, all these correlations were statistically significant. Therefore, the multiple linear regression model (as shown in Algorithm 4) was developed using the number of Tweets per month as the response variable and the other characteristics of these Tweets as predictor variables. The prediction equation is shown in Equation (1), and the characteristic features of this multiple linear regression model are presented in Table 3.
where,
TM = total number of Tweets per month
Cmean = mean value of the number of characters used in the Tweets per month
Cmed = median value of the number of characters used in the Tweets per month
Hc = number of hashtags used in the Tweets per month
UMc = number of user mentions used in the Tweets per month
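With the symbols defined above, and assuming that the regression coefficients reported in Table 3 are listed in the same order as these predictor variables (the ordering is not stated, so this is an assumption), the fitted model would take the following form, with values rounded to four decimal places:

```latex
T_M = 2784.1710 + 11.7837\,C_{mean} - 31.1334\,C_{med} + 0.3054\,H_c + 0.9697\,UM_c
```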
Table 3. Characteristic Features of the Multiple Linear Regression Model.
| Description | Value |
| --- | --- |
| Multiple Linear Regression Intercept | 2784.170988721279 |
| Multiple Linear Regression Coefficients | [11.78367763 -31.13336391 0.30537686 0.96967955] |
| R² score | 0.9540953548345376 |
| Mean Squared Error (before cross-validation) | 54577.94142377716 |
| Root Mean Squared Error (before cross-validation) | 233.61922314693447 |
| Value of k for k-fold cross-validation | 10 |
| Mean Squared Error (after cross-validation) | 65260.27219328486 |
| Root Mean Squared Error (after cross-validation) | 255.46090149626588 |
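The quantities reported in Table 3 (intercept, coefficients, R² score, mean squared error, root mean squared error, and their cross-validated counterparts) can be illustrated with the following sketch. It fits an ordinary least squares model with NumPy on synthetic, noise-free data; the helper names, the shuffling seed, and the data are hypothetical, and the program behind Algorithm 4 may have used a different library or a different assignment of observations to folds.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares fit; returns (intercept, coefficients)."""
    A = np.column_stack([np.ones(len(X)), X])  # column of ones for the intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]


def predict(X, intercept, coefs):
    return intercept + X @ coefs


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def kfold_mse(X, y, k=10, seed=0):
    """Mean of the held-out MSE over k shuffled folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    fold_scores = []
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)
        b0, b = fit_mlr(X[train_idx], y[train_idx])
        fold_scores.append(mse(y[test_idx], predict(X[test_idx], b0, b)))
    return float(np.mean(fold_scores))


# Synthetic data: four predictors standing in for Cmean, Cmed, Hc, and UMc.
rng = np.random.default_rng(42)
X = rng.uniform(0, 100, size=(60, 4))
true_coefs = np.array([2.0, -1.0, 0.5, 3.0])
y = 10.0 + X @ true_coefs  # noise-free, so the fit should recover these values

intercept, coefs = fit_mlr(X, y)
y_hat = predict(X, intercept, coefs)
train_rmse = mse(y, y_hat) ** 0.5
r2 = r2_score(y, y_hat)
cv_rmse = kfold_mse(X, y, k=10) ** 0.5
```

As in Table 3, the root mean squared error is the square root of the mean squared error, and the cross-validated error is typically somewhat higher than the error measured on the data used for fitting.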
The top 10 hashtags used in Tweets about exoskeletons from May 2017 to May 2023 were computed by Algorithm 5. This algorithm also computed the number of Tweets posted with each of these hashtags per month in this time range. The results are shown in Figure 15. As can be seen from Figure 15, the top 10 hashtags were #exoskeleton, #robotics, #iot, #technology, #tech, #innovation, #ai, #sci, #construction, and #news. Of these, #exoskeleton was by far the most used hashtag per month in this time range. Thereafter, Algorithm 6 was used to perform sentiment analysis of the Tweets. The output of Algorithm 6 showed that the numbers of positive, negative, and neutral Tweets were 71,596, 30,773, and 50,676, respectively. This distribution is shown in the form of a pie chart in Figure 15. As can be seen from Figure 15, positive Tweets formed the largest group (46.8%). Furthermore, Algorithm 6 also computed the number of positive, negative, and neutral Tweets for each of the top 10 hashtags for every month in this time range. These results are presented in Figures 16 to 25, respectively.
The varying patterns of public sentiment towards exoskeletons can be inferred from these results. For instance, Figure 16 shows that most of the general public has expressed a positive sentiment in their Tweets about exoskeletons. The patterns of sentiment associated with the top 10 hashtags also reveal novel insights associated with the paradigms of conversations regarding exoskeletons on Twitter. For instance, from Figure 22, it can be inferred that for almost all the months in 2022, the usage of #ai in Tweets about exoskeletons was mainly associated with a positive sentiment. A similar pattern can be seen regarding the usage of #exoskeleton in the Tweets from Figure 16. As can be seen from this figure, during 2022, the majority of the Tweets that were posted using #exoskeleton had a positive sentiment. In a similar manner, the sentiment associated with the top 10 hashtags and its trends on a monthly as well as on a yearly basis can be deduced from Figures 16 to 25.
Figure 14. A graphical representation of the number of Tweets per month per hashtag for the top 10 hashtags.
Figure 15. A pie chart-based representation of the percentage of positive, negative, and neutral Tweets about exoskeletons.
Figure 16. A graphical representation of the number of Tweets per sentiment per month for #exoskeleton.
Figure 17. A graphical representation of the number of Tweets per sentiment per month for #robotics.
Figure 18. A graphical representation of the number of Tweets per sentiment per month for #iot.
Figure 19. A graphical representation of the number of Tweets per sentiment per month for #technology.
Figure 20. A graphical representation of the number of Tweets per sentiment per month for #tech.
Figure 21. A graphical representation of the number of Tweets per sentiment per month for #innovation.
Figure 22. A graphical representation of the number of Tweets per sentiment per month for #ai.
Figure 23. A graphical representation of the number of Tweets per sentiment per month for #sci.
Figure 24. A graphical representation of the number of Tweets per sentiment per month for #construction.
Figure 25. A graphical representation of the number of Tweets per sentiment per month for #news.
Next, Algorithms 2 and 3 were run on the master dataset (shown in Figure 1), the .CSV file containing sentiment labels for each Tweet (one of the outputs of Algorithm 6), the .CSV file representing potentially sarcastic Tweets (output of Algorithm 8), and the .CSV file representing Tweets that contained news (output of Algorithm 9). The objective was to compare the positive Tweets, negative Tweets, neutral Tweets, possibly sarcastic Tweets, and Tweets that contained news in terms of the mean length of the Tweets per month, the median length of the Tweets per month, the average number of hashtags used per month, and the average number of user mentions used per month, and to interpret the underlying trends. The results of this analysis are shown in Figures 26 to 29, respectively. These results also reveal several novel insights related to the tweeting patterns of the general public in the context of Tweets about exoskeletons. For instance, from Figure 26 and Figure 27, it can be concluded that the average number of characters used in neutral Tweets has been considerably lower than in positive Tweets, negative Tweets, possibly sarcastic Tweets, and Tweets that contained news. Figure 28 shows that the average number of hashtags used in Tweets that contained news has increased considerably since the beginning of January 2022. Figure 29 shows that, for possibly sarcastic Tweets, the number of user mentions has been considerably lower (even zero on multiple occasions) than for positive Tweets, negative Tweets, neutral Tweets, and Tweets that contained news.
As discussed in Section 3, a fine-grained analysis of the sentiments was also performed to detect different sentiment classes such as anger, disgust, fear, joy, neutral, sadness, and surprise (pseudocode presented in Algorithm 10). Since Figure 15 reports that 33.1% of the Tweets were neutral, the neutral Tweets were removed prior to this analysis to understand the distribution of the remaining sentiment classes (anger, disgust, fear, joy, sadness, and surprise) in the rest of the Tweets. The results of this analysis are shown in Figure 30. As can be seen from this figure, surprise was the most common emotion, followed by joy, disgust, sadness, fear, and anger.
Next, a comparison of the work of this paper with prior works in this field in terms of focus areas is presented in Table 4. As can be seen from Table 4, this paper is the first in this area of research that combines multimodal forms of content analysis, text analysis, sentiment analysis, fine-grained sentiment analysis, and hashtag-specific sentiment analysis of Tweets about exoskeletons. Furthermore, this work also presents a multiple linear regression model to predict the number of Tweets posted about exoskeletons on a monthly basis in terms of specific characteristics of the Tweets.
Table 4. Comparison of the focus areas of this research paper with the focus areas of prior works in this field.
| Work | CA of Tweets about Robots or Robotic Solutions | CA of Tweets about Wearables (including Wearable Robotics) | SA of Tweets about Robots or Robotic Solutions | SA of Tweets about Robots (including Wearable Robotics) | Fine Grain SA of Tweets about Wearable Robotics | MLR Model to Predict Tweets about Wearable Robotics |
| --- | --- | --- | --- | --- | --- | --- |
| Cramer et al. [18] | √ | | | | | |
| Salzmann-Erikson et al. [19] | √ | | | | | |
| Fraser et al. [20] | √ | | | | | |
| Mubin et al. [21] | √ | | | | | |
| Barakeh et al. [22] | √ | | | | | |
| Mahmud et al. [23] | | √ | | | | |
| Yamanoue et al. [24] | | √ | | | | |
| Tussyadiah et al. [25] | | √ | | | | |
| Saxena et al. [26] | | √ | | | | |
| Adidharma et al. [27] | | √ | | | | |
| Pillarisetti et al. [28] | | √ | | | | |
| Keane et al. [29] | | √ | | | | |
| Sinha et al. [30] | | | √ | | | |
| El-Gayar et al. [31] | | | | √ | | |
| Jeong et al. [32] | | | | √ | | |
| Niininen et al. [33] | | | | √ | | |
| Thakur et al. [this work] | √ | √ | √ | √ | √ | √ |
5. Conclusions
The popularity of social media platforms has risen rapidly over the last decade and a half, as these platforms provide a seamless means for users to connect, communicate, and collaborate with each other. Among the different social media platforms, the analysis of conversations on Twitter has been of significant interest to researchers from different disciplines. This can be inferred from the fact that, in the last few years, several works have focused on the analysis of Tweets about emerging technologies, matters of global interest, and topics of global concern such as ChatGPT, the Russia–Ukraine war, cryptocurrency markets, virtual assistants, abortions, loneliness, housing needs, fake news, religion, early detection of health-related problems, elections, education, pregnancy, food insufficiency, and virus outbreaks such as MPox, flu, H1N1, and COVID-19, to name a few. Even though a wide range of topics and several emerging technologies have been investigated in recent works, no prior work in this field has focused on the analysis of Tweets about exoskeletons. The rapid advancement of exoskeleton technology is being propelled by its extensive range of applications. These include assisting elderly individuals and those with disabilities in their daily tasks, increasing productivity and alleviating fatigue in military personnel, enhancing the quality of life for amputees and individuals with paralysis in different body parts, aiding firefighters in climbing and lifting heavy equipment, bolstering labor efficiency, and facilitating the transportation of bulky machinery in different industrial settings. As a result of these expanding use cases, the general public has shared their views, opinions, and perspectives about exoskeletons on social media platforms, such as Twitter, in the last few years.
The work presented in this paper aims to address this research gap and to contribute towards advancing research in the area of exoskeleton technology by presenting several novel findings from a comprehensive analysis of about 150,000 Tweets about exoskeletons posted between May 2017 and May 2023. First, findings from a comprehensive content analysis and temporal analysis of these Tweets reveal the specific months when a markedly higher volume of Tweets was posted and the time windows when the highest number of Tweets, the lowest number of Tweets, Tweets with the highest number of hashtags, and Tweets with the highest number of user mentions were posted. Second, the paper shows that there are statistically significant correlations between the number of Tweets posted per hour and different characteristics of tweeting behavior, such as the number of characters (mean value) in the Tweets per hour, the number of characters (median value) in the Tweets per hour, the number of hashtags used in the Tweets per hour, and the number of user mentions used in the Tweets per hour. Third, the paper presents a multiple linear regression model to predict the number of Tweets posted per hour in terms of these characteristics of tweeting behavior. The R² score of this model was observed to be 0.9541. Fourth, the paper reports that the 10 most popular hashtags were #exoskeleton, #robotics, #iot, #technology, #tech, #innovation, #ai, #sci, #construction, and #news. Fifth, an exploratory sentiment analysis of these Tweets was performed using VADER and the DistilRoBERTa-base model in Python. The findings show that 46.8% of the Tweets were positive, 33.1% of the Tweets were neutral, and 20.1% of the Tweets were negative. The findings also show that, among the Tweets that did not express a neutral sentiment, surprise was the most common emotion, followed by joy, disgust, sadness, fear, and anger.
Furthermore, analysis of hashtag-specific sentiments revealed several novel insights associated with the tweeting behavior of the general public in this regard. For instance, for almost all the months in 2022, the usage of #ai in tweets about exoskeletons was mainly associated with a positive sentiment. Sixth, text processing-based approaches were used to detect possibly sarcastic tweets and tweets that contained news. Thereafter, a comparison of positive tweets, negative tweets, neutral tweets, possibly sarcastic tweets, and tweets that contained news, in terms of different characteristic properties of these tweets, is presented. The findings of this analysis reveal multiple insights related to the tweeting behavior of the general public about exoskeletons. For instance, the average number of characters used in neutral tweets has been considerably lower than in positive tweets, negative tweets, possibly sarcastic tweets, and tweets that contained news, and the average number of hashtags used in tweets that contained news has considerably increased since the beginning of January 2022. To the best of the authors' knowledge, no similar work has been done in this field thus far. Future work in this area would involve performing topic modeling of these tweets to interpret the specific topics represented in the tweets about exoskeletons.
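To make the regression step summarized above concrete, the following is a minimal sketch of a multiple linear regression fitted by ordinary least squares, together with the R² score. It is not the implementation used in this work: the function names are hypothetical, the hourly feature values are invented for illustration, and the target is constructed to be exactly linear in the four predictors so that the result is easy to check.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_regression(features, target):
    """Ordinary least squares via the normal equations.

    Returns [intercept, coefficient_1, ..., coefficient_k].
    """
    design = [[1.0] + list(row) for row in features]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, row):
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


def r_squared(coefficients, features, target):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(target) / len(target)
    ss_res = sum((t - predict(coefficients, row)) ** 2
                 for row, t in zip(features, target))
    ss_tot = sum((t - mean_y) ** 2 for t in target)
    return 1 - ss_res / ss_tot


# Hypothetical hourly values (NOT from the dataset):
# [mean characters, median characters, hashtags, user mentions]
hourly_features = [
    [100, 96, 20, 5], [110, 101, 35, 9], [95, 99, 15, 12], [120, 108, 50, 7],
    [105, 94, 28, 15], [98, 103, 40, 3], [115, 97, 22, 11], [102, 110, 45, 8],
]
# Synthetic target that is exactly linear in the predictors.
tweets_per_hour = [10 + 0.5 * a + 0.25 * b + 2 * c + 1 * d
                   for a, b, c, d in hourly_features]

coefficients = fit_linear_regression(hourly_features, tweets_per_hour)
score = r_squared(coefficients, hourly_features, tweets_per_hour)
```

With real hourly aggregates the fit would not be exact, and the R² score would fall below 1, as in the value reported above.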
Author Contributions
Conceptualization, N.T.; methodology, N.T., K.A.P., A.P., R.S., N.A., and C.H.; software, N.T., K.A.P., A.P., and R.S.; validation, N.T., K.A.P., A.P., R.S., N.A., and C.H.; formal analysis, N.T., K.A.P., A.P., R.S., N.A., and C.H.; investigation, N.T., K.A.P., A.P., R.S., N.A., and C.H.; resources, N.T., K.A.P., A.P., and R.S.; data curation, N.T.; writing—original draft preparation, N.T., K.A.P., A.P., R.S., N.A., and C.H.; writing—review and editing, N.T.; visualization, N.T., K.A.P., A.P., and R.S.; supervision, N.T.; project administration, N.T.; funding acquisition, Not Applicable. All authors have read and agreed to the published version of the manuscript.
Figure 1.
A flowchart that represents the working of different algorithms to obtain the master dataset for analysis.
Figure 2.
A workflow diagram representing the working of Algorithm 7.
Figure 3.
A histogram-based representation of the number of Tweets about exoskeletons per month in 2017.
Figure 4.
A histogram-based representation of the number of Tweets about exoskeletons per month in 2018.
Figure 5.
A histogram-based representation of the number of Tweets about exoskeletons per month in 2019.
Figure 6.
A histogram-based representation of the number of Tweets about exoskeletons per month in 2020.
Figure 7.
A histogram-based representation of the number of Tweets about exoskeletons per month in 2021.
Figure 8.
A histogram-based representation of the number of Tweets about exoskeletons per month in 2022.
Figure 9.
A histogram-based representation of the number of Tweets about exoskeletons per month in 2023.
Figure 10.
A histogram-based representation of the number of Tweets in different time slots (of 1-hour duration) of a day (24-hour format).
Figure 11.
A histogram-based representation of the number of characters (mean value) used in Tweets in different time slots (of 1-hour duration) of a day (24-hour format).
Figure 12.
A histogram-based representation of the number of characters (median value) used in Tweets in different time slots (of 1-hour duration) of a day (24-hour format).
Figure 13.
A histogram-based representation of the number of hashtags used in Tweets in different time slots (of 1-hour duration) of a day (24-hour format).
Figure 14.
A histogram-based representation of the number of user mentions used in Tweets in different time slots (of 1-hour duration) of a day (24-hour format).
Figure 26.
A graphical representation to compare the positive tweets, negative tweets, possibly sarcastic tweets, and tweets that contained news, in terms of the mean value of the characters used, on a monthly basis.
Figure 27.
A graphical representation to compare the positive tweets, negative tweets, possibly sarcastic tweets, and tweets that contained news, in terms of the median value of the characters used, on a monthly basis.
Figure 28.
A graphical representation to compare the positive tweets, negative tweets, possibly sarcastic tweets, and tweets that contained news, in terms of the average number of hashtags present in the tweets, on a monthly basis.
Figure 29.
A graphical representation to compare the positive tweets, negative tweets, possibly sarcastic tweets, and tweets that contained news, in terms of the average number of user mentions present in the tweets, on a monthly basis.
Figure 30.
A representation of the number of Tweets for each of the fine-grained sentiment classes: anger, disgust, fear, joy, sadness, and surprise.
Table 1.
Description of the attributes in the dataset.
| Attribute Name | Description |
| --- | --- |
| Row no. | Row number of the data |
| Id | ID of the tweet |
| Created-At | Date and time when the tweet was posted |
| From-User | Twitter username of the user who posted the tweet |
| From-User-Id | Twitter User ID of the user who posted the tweet |
| To-User | Twitter username of the user whose tweet was replied to (if the tweet was a reply) in the current tweet |
| To-User-Id | Twitter user ID of the user whose tweet was replied to (if the tweet was a reply) in the current tweet |
| Language | Language of the tweet |
| Source | Source of the tweet to determine if the tweet was posted from an Android source, Twitter website, etc. |
| Text | Complete text of the tweet, including embedded URLs |
| Geo-Location-Latitude | Geo-Location (Latitude) of the user posting the tweet |
| Geo-Location-Longitude | Geo-Location (Longitude) of the user posting the tweet |
| Retweet Count | Retweet count of the tweet |
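As an illustration of how the Created-At and Text attributes listed above could be turned into per-hour characteristics (number of Tweets, mean and median characters, hashtags, and user mentions) and a hashtag ranking, a minimal sketch follows. It is an assumption-laden example rather than the algorithms of this work: the function names, the timestamp format, the regular expressions, and the three sample rows are all hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean, median

# Simplified patterns (assumption): a hashtag or mention is '#' or '@'
# followed by word characters. The mention pattern would also match
# the domain part of an e-mail address.
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def hourly_features(rows, time_format="%Y-%m-%d %H:%M:%S"):
    """Group tweets by hour of day (0-23) and summarise each group.

    The timestamp format is assumed; the real dataset may differ.
    """
    buckets = defaultdict(list)
    for row in rows:
        hour = datetime.strptime(row["Created-At"], time_format).hour
        buckets[hour].append(row["Text"])

    features = {}
    for hour, texts in sorted(buckets.items()):
        lengths = [len(text) for text in texts]
        features[hour] = {
            "tweets": len(texts),
            "mean_chars": mean(lengths),
            "median_chars": median(lengths),
            "hashtags": sum(len(HASHTAG_RE.findall(t)) for t in texts),
            "mentions": sum(len(MENTION_RE.findall(t)) for t in texts),
        }
    return features


def top_hashtags(rows, n=10):
    """Return the n most frequent hashtags, compared case-insensitively."""
    counts = Counter(
        tag.lower() for row in rows for tag in HASHTAG_RE.findall(row["Text"])
    )
    return counts.most_common(n)


# Hypothetical rows using the attribute names from Table 1.
sample_rows = [
    {"Created-At": "2022-01-05 14:03:00",
     "Text": "New #exoskeleton demo by @lab #robotics"},
    {"Created-At": "2022-01-05 14:40:00", "Text": "#Exoskeleton news"},
    {"Created-At": "2022-01-06 09:15:00", "Text": "Thoughts on wearable robots?"},
]

features = hourly_features(sample_rows)
ranking = top_hashtags(sample_rows, n=1)
```

Grouping by hour of day rather than by calendar hour mirrors the 24 one-hour time slots used in the figures; a different grouping key would be needed for a per-month analysis.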
Table 2.
Representation of the p-values of the correlations that were investigated.
| Description | p-value |
| --- | --- |
| Number of Tweets per hour and the number of characters (mean) used in Tweets per hour | 0.0138 |
| Number of Tweets per hour and the number of characters (median) used in Tweets per hour | 0.0098 |
| Number of Tweets per hour and the number of hashtags used in Tweets per hour | 0.0006 |
| Number of Tweets per hour and the number of user mentions used in Tweets per hour | 2.44e-13 |
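For readers who wish to see how a p-value for one of these correlations might be obtained, the sketch below computes the Pearson correlation coefficient between two hourly series and estimates a two-sided p-value with a permutation test. The permutation test is a substitute chosen here because it needs only the Python standard library; it is not claimed to be the test used to produce Table 2. The function names and the 24 hourly values are hypothetical.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the correlation of x and y.

    y is shuffled repeatedly; the p-value is the share of shuffles whose
    |r| is at least as large as the observed |r| (with +1 smoothing so
    that the estimate is never exactly zero).
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical values for the 24 one-hour slots (NOT from the dataset).
tweets_per_hour = [120 + 15 * hour for hour in range(24)]
hashtags_per_hour = [2 * count + (hour % 3) * 10
                     for hour, count in enumerate(tweets_per_hour)]

r = pearson_r(tweets_per_hour, hashtags_per_hour)
p_value = permutation_p_value(tweets_per_hour, hashtags_per_hour)
```

Because the smallest p-value this estimator can return is 1 / (n_permutations + 1), very small values such as the last row of Table 2 would require an analytical test or far more permutations.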