For both Google Translate and Marian MT, we compared the true sentiment values with the rated values and looked for patterns in the outliers that remained after we removed the bad translations. We review the outliers for Google Translate by category (positive, neutral, and negative) and then do the same for Marian MT. In this subsection, we review outliers for the Google Translate output. There were no Google Translate positive outliers, which implies that no statement with true positive sentiment was falsely classified as neutral or negative after being translated into English.
Using Marian MT, we also had no positive outliers. Listings 3 and 4 show the neutral and negative outliers.
All the sentences in the original Italian have no connotation of positive or negative sentiment; hence, they are considered neutral. Indeed, they do not express an opinion but simply state facts. So, why are they perceived differently in translation? In the case of the second and third sentences, we can assume that the sentiment was perceived as negative due to the presence of negative signifiers, such as "no" (in "I have no preference...") and the verb "leave" (in "Tomorrow we leave..."). "Leave" implies a separation, which is probably why it was perceived as negative. As for the first sentence, which was neutral but perceived as positive, we can infer that the presence of verbs indicating ‘company,’ such as "came" and "visit," triggered the positive interpretation. As mentioned above, these misinterpretations are also due to the absence of context and ‘voice inflection.’
4.1. Why was polarized Italian made neutral?
The neutral score assigned to these sentences was a bit of a surprise. Let us analyze the original sentiments and consider a possible explanation for each. "Sei un essere abominevole" (translated correctly as "You are an abominable being") clearly has a negative sentiment. Even in the absence of overtly negative signifiers (such as "no," "not," "never," etc.), the word "abominable" sets the tone for a negative interpretation. However, since Natural Language Processing (NLP) models are primarily trained to identify the presence of positive and negative words to determine the sentiment of a sentence, in very short sentences where the remaining signifiers are neither positive nor negative, the NLP model might make a ‘decision’ to assign a neutral sentiment. That might be why this sentence was perceived as neutral.
As for the second sentence, "Disapprovo la tua scelta" (translated as "I disapprove of your choice"), "disapprove" is the only overtly negative word, making the sentiment clearly negative.
Regarding the last sentence, "Buono a nulla" (translated as "Good for nothing"), it is possible that the NLP model was confused by the equal presence of positive and negative signifiers: "buono" (good) is positive, while "nulla" (nothing) is negative. Consequently, it might have perceived the conflicting sentiments as canceling each other out and opted for a neutral score.
In summary, the different interpretations of the original sentiments in translation could be attributed to the NLP model’s reliance on identifying positive and negative words to determine sentiment and the specific words present in each sentence that contribute to the overall sentiment.
Sentiment Analysis General observations:
The sentiment analysis presented some interesting ‘challenges’, more so in dealing with neutral sentences.
Let us look at some examples.
1.) “L’esperienza studio in Italia è stata unica. Mi ha cambiato letteralmente la vita e mi ha aperto gli occhi su una nuova realtà.” (Original positive) [The study experience in Italy was unique. It literally changed my life and opened my eyes on a new reality.]
All three engines, Google, Opus, and NLTK, assigned a score of 0, neutral, to this sentence. The original sentence bears a clear positive message: the study abroad experience was mind-blowing, and it changed the student’s life forever (it is implicit that it changed it in a positive way). Yet this positivity did not carry over into the English version, even though the sentence’s translation was correct for both Google and Opus.
I believe the problem here was the word ‘unique,’ which can have positive, negative, and neutral connotations according to the context. If ‘unique’ is intended as ‘peculiar,’ it has a negative meaning; if it is used to point out that something or someone is just ‘different,’ the word carries a neutral connotation. If it is used to indicate that something or someone has ‘no equal,’ then it is positive. It is possible that the neutral scores are justified by the fact that the engines perceived the study abroad experience as being simply ‘different.’ Furthermore, reflecting on the second part of the original sentence, “mi ha cambiato letteralmente la vita” (it literally changed my life), one could argue that a change in life is not always a positive event. It depends on the context and the person’s perception of the events. From a linguistic point of view, the expression “L’esperienza studio in Italia è stata unica” in Italian is undoubtedly positive. In Italian, something that is unique to you is ‘positive.’ You would never use this expression to talk about something that was indifferent to you or negative. If an event is perceived as neutral, one would say, for example, “è stata un’esperienza normale” or “è stata un’esperienza come un’altra” (“it was a normal/uneventful experience” or “it was an experience like any other”). If it is negative, then one would say “È stata una brutta esperienza” or “Un’esperienza negativa” (“It was a bad experience” or “It was a negative experience”).
2.) “Sei raggiante!” (Original positive) [You are glowing! (Opus); You are radiant! (Google)] Google sentiment score: 0.5255 (positive); Opus sentiment score: 0 (neutral); NLTK sentiment score: 1 (neutral)
This is another interesting case. The original sentence is clearly positive. To tell someone they are glowing in Italian is to compliment them. Yet Opus and also NLTK assigned it a neutral score. Opus’s score is even more interesting because the translation it provided is more accurate than the one provided by Google. The most sensible explanation for this mistake in sentiment analysis would be that the word “glowing” in English is used in a variety of expressions that also carry negative feelings.
I believe that this different use of the word ‘glowing’ contributed to the final calculation of the sentiment score and justified the neutral rating assigned by Opus. The NLTK score averages out the sentiment scores provided by Google and Opus and leans towards the neutral sentiment. However, it also provides a 0.629 score for positive sentiment, thus recognizing the intent of the original.
3.) “È stata una vacanza da sogno!” (Original positive) [It was a dream vacation! (Opus); It was a dream holiday! (Google)]
Google sentiment score: 0.6114 (positive); Opus sentiment score: 0.3164; NLTK positive score: 0.433; NLTK neutral score: 0.567
This one is worth discussing for the disparity among the different scores. Although all of the ‘datasets’ assigned a positive score, there was a significant difference between the score provided by Google and the scores provided by Opus and NLTK. Google’s score appeared to be much more confident; Opus and NLTK provided a positive score with a lower level of confidence (in fact, NLTK also assigned a higher neutral score to this sentence). How do we explain this? The translations are both good (even though the one provided by Google is more at home in British English). A plausible explanation could be that “a dream vacation” is something different for everyone; thus, subjectivity plays a role in determining the positivity or neutrality of the sentiment.
4.) “Mi hai delusa!” (Original negative) [You let me down! (Opus); You disappointed me! (Google)] Google sentiment score: -0.4767; Opus sentiment score: 0 (neutral); NLTK neutral score: 1; NLTK negative score: 0; TextBlob sentiment polarity: -0.1555556
The translations provided are both good, with a slight preference for Opus, which is more exact from an idiomatic point of view. Google’s score is on target, as it identifies the original sentiment. This is interesting because, as mentioned above, Google does not provide the better translation but is still on point with the sentiment score. However, despite providing a better translation, Opus read the sentence as neutral, and NLTK also assigned the sentence a neutral score. The TextBlob sentiment polarity was weakly negative.
There is no doubt that the original sentence has a negative connotation: “mi hai delusa” expresses sadness, anger, and disillusionment. I would imagine that “You let me down!” works the same way. That is why the neutral score was a surprise and needs further investigation. In fact, as of now, there is no plausible explanation for the mistake.
5.) “Sei un inetto!” (Original negative) [You’re inept! (Opus); You’re an inept! (Google)] Opus & Google score: 0 (neutral); NLTK neutral score: 1; NLTK negative score: 0
Again, as in the case above, the original leaves no room for misunderstanding. In Italian culture, calling someone ‘inetto’ is certainly an offense; hence the expression carries a negative sentiment. It is possible, however, that the sentence was read as a ‘personal opinion’, which is obviously not universal and open to personal interpretation. This would justify the neutral score assigned.
6.) “Buono a nulla!” (Original negative) [Good for nothing! (Opus & Google)] Google score: 0.4926; Opus score: 0.4926; NLTK positive score: 0.615; NLTK negative score: 0; NLTK neutral score: 0.385
In Italian, “Buono a nulla!” is another way to say “inept” and just like the sentence above expresses a negative sentiment. The error in sentiment analysis can be justified by the presence of the word “good,” which is usually positive. The three datasets picked up the sentiment score carried by the word ‘good’ and consequently read the sentence as positive.
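The effect of a single sentiment-bearing word can be sketched numerically. The example below assumes a VADER-style lexicon scorer, in which word valences are summed, a small emphasis term is added for each exclamation mark, and the sum x is normalized as x / sqrt(x² + α). The word valences, the emphasis term of 0.292, and α = 15 are assumptions chosen because they are consistent with the scores reported in this section; the source does not state them, and the mini-lexicon `VALENCE` and the function `compound` are hypothetical names used only for illustration.

```python
import math

# Hypothetical mini-lexicon. These valences are assumptions that happen to be
# consistent with the scores reported in this section; they are not taken from
# the source.
VALENCE = {"good": 1.9, "attracts": 1.7, "no": -1.2, "leave": -0.2}

EXCLAMATION_BOOST = 0.292  # assumed emphasis added per "!" (capped at four)
ALPHA = 15.0               # assumed normalization constant


def compound(sentence: str) -> float:
    """Sum word valences, add punctuation emphasis, and squash into (-1, 1)."""
    words = [w.strip(".,!?;:'\"").lower() for w in sentence.split()]
    total = sum(VALENCE.get(w, 0.0) for w in words)

    # Emphasis pushes the score further in the direction it already has.
    boost = EXCLAMATION_BOOST * min(sentence.count("!"), 4)
    if total > 0:
        total += boost
    elif total < 0:
        total -= boost

    return total / math.sqrt(total * total + ALPHA)


if __name__ == "__main__":
    # "good" is the only word with a valence, so the idiom reads as positive.
    print(round(compound("Good for nothing!"), 4))
```

Under these assumptions, “Good for nothing!” scores about 0.4926, which matches the reported value: “good” is the only word that carries a valence, and the idiomatic meaning of the whole phrase never enters the calculation.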
7.) “In questo momento mi sento piuttosto calma non provo emozioni forti.” (Original neutral) [Right now, I feel rather calm; I don’t feel strong emotions. (Opus); At this moment, I feel quite calm I do not feel strong emotions. (Google)] Google score: -0.0281; Opus score: -0.1032; NLTK positive score: 0.192; NLTK negative score: 0.225; NLTK neutral score: 0.583
“Alla fine dei conti puoi fare quello che desideri, a me non interessa molto.” (Neutral) [At the end of the accounts, you can do what you want. I don’t care much. (Opus & Google)] Google score: -0.3244; Opus score: 0.3705; NLTK positive score: 0.076; NLTK negative score: 0.166; NLTK neutral score: 0.758
“Non ho preferenze su cosa fare stasera.” [I have no preference on what to do tonight. (Opus & Google)] Google score: -0.296; Opus score: -0.296; NLTK positive score: 0; NLTK negative score: 0.239; NLTK neutral score: 0.787
“Non ho mai favorito nessuno studente, per me sono tutti uguali.” (Neutral) [I have never favored any student, for me, they are all the same. (Opus & Google)] Google score: -0.3252; Opus score: -0.3252; NLTK positive score: 0; NLTK negative score: 0.189; NLTK neutral score: 0.811
These sentences were presented as neutral in the original because they do not express positive or negative feelings or attitudes. Indeed, the ‘subjects’ of the sentences are neither upset nor happy, neither in favor of nor against a particular situation; they are simply ‘emotion- and opinion-free,’ and thus the sentences are neutral. However, they were rated as negative by both Google and Opus, while the NLTK scores were more on point.
Here are some possible explanations:
The presence of negative words such as “Non ho/I do not” and “Senza/Without” might have led the ‘analysis’ in the wrong direction.
Also, the absence of context might have had a role in leading to the wrong score.
Indeed, some of these sentences could sound negative if pronounced with an upset tone. This is true, especially for these two sentences:
“Alla fine dei conti puoi fare quello che desideri, a me non interessa molto.” “A me non interessa” (I don’t care much) can be negative if pronounced with an altered/upset tone. It can communicate a lack of ‘interest’ and ‘feelings.’ However, if the same sentence (at least in Italian) is pronounced with a flat tone, then it just communicates ‘neutrality.’ “Non ho preferenze su cosa fare stasera.” In this case, if the sentence is pronounced within the context of an argument, hence with an altered tone of voice, then it can have a negative feeling. But if a person says it just to express that s/he would ‘go with the flow,’ it is entirely neutral.
Last but not least, two more sentences are worthy of attention.
8.) “La musica americana attrae sempre molti giovani italiani.” (Original neutral) [“American music always attracts many young Italians.” (Opus & Google)] Google score: 0.4019; Opus score: 0.4019; NLTK positive score: 0.31; NLTK negative score: 0; NLTK neutral score: 0.69
“Domani partiamo per andare in Italia.” (Original neutral) [“Tomorrow we leave to go Italy.” (Opus & Google)] Google score: -0.0516; Opus score: -0.0516; NLTK positive score: 0; NLTK negative score: 0.167; NLTK neutral score: 0.833
These two sentences in Italian are plain statements. They express simple facts: American music is popular, and ‘tomorrow’ we are going to Italy. Yet the first was rated with a positive score (only NLTK proposed a neutral score). The presence of the word ‘attracts,’ which intrinsically has a positive meaning, possibly led the datasets to identify this sentence as positive.
As for the second one, it is plausible that the word ‘leave,’ which indicates ‘separation,’ might have led to the negative score.
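The comparison of original and rated sentiment that underlies these observations can be sketched as follows. This is a minimal illustration, assuming that a score is mapped to a label with a symmetric cut-off of ±0.05 (a common convention for compound scores, not a threshold stated in the source). The helpers `label` and `find_outliers` are hypothetical names, and the rows reuse Google scores reported in this section.

```python
def label(score: float, threshold: float = 0.05) -> str:
    """Map a compound score to a sentiment label (threshold is an assumption)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def find_outliers(rows):
    """Return (text, true_label, rated_label) for rows where the labels differ."""
    outliers = []
    for text, true_label, score in rows:
        rated = label(score)
        if rated != true_label:
            outliers.append((text, true_label, rated))
    return outliers


# (English translation, original sentiment, Google score reported above)
ROWS = [
    ("You are radiant!", "positive", 0.5255),
    ("You disappointed me!", "negative", -0.4767),
    ("Good for nothing!", "negative", 0.4926),
    ("American music always attracts many young Italians.", "neutral", 0.4019),
    ("Tomorrow we leave to go Italy.", "neutral", -0.0516),
]

if __name__ == "__main__":
    for text, true_label, rated in find_outliers(ROWS):
        print(f"{text!r}: original {true_label}, rated {rated}")
```

With this cut-off, the first two rows agree with the original sentiment, while the last three are flagged as outliers. The score of -0.0516 for “Tomorrow we leave to go Italy.” falls only just outside the neutral band, so its classification as negative depends heavily on the chosen threshold.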