Preprint Article, Version 1. This version is not peer-reviewed.

Feature Engineering and Semantic Enrichment for Enhanced Text Classification: A Case Study on Figurative Language in Tweets

Version 1: Received: 15 July 2024 / Approved: 15 July 2024 / Online: 16 July 2024 (03:13:03 CEST)

How to cite: Sammartino, V.; Baccheschi, C.; Gneri, J.; Picchianti, V.; Frasson, A. Feature Engineering and Semantic Enrichment for Enhanced Text Classification: A Case Study on Figurative Language in Tweets. Preprints 2024, 2024071179. https://doi.org/10.20944/preprints202407.1179.v1

Abstract

This study explores advanced feature engineering and semantic enrichment methods to enhance text classification, focusing on the detection of figurative language in tweets. We introduce two novel features, Syno_Lower_Mean and Syn_Mean, which measure the use of uncommon synonyms and the mean frequency of synonyms, respectively, capturing the semantic richness that is crucial for detecting figurative expressions. Using resources such as SenticNet and Framester, we enrich our feature set with sentiment and frame-semantic information. Our approach comprises extensive data preprocessing, sophisticated feature selection, and the implementation of various classification models: SVM, KNN, Logistic Regression, Decision Trees, Random Forest, BERT, and LSTM networks. We rigorously evaluate each model's performance to assess the effectiveness of our features and enrichment methods. With an emphasis on model explainability, we use decision tree analysis, feature importance analysis, and the TREPAN algorithm to approximate the SVM's decisions. Although we focus on figurative language detection, our methods have broader implications for other NLP text classification tasks. Our findings demonstrate significant improvements in classification accuracy and interpretability through innovative feature design and dataset enrichment.
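The abstract describes Syno_Lower_Mean and Syn_Mean only informally. The following is a minimal sketch of one plausible way to compute features of this kind, not the authors' implementation. The exact definitions are assumptions: for each word, "syno lower" is taken as the number of its synonyms that are rarer than the word itself, and "syn freq" as the mean corpus frequency of its synonyms, each then averaged over the tweet. The toy `SYNONYMS` and `FREQ` tables and the function name `synonym_features` are hypothetical stand-ins for a real thesaurus and corpus frequency counts.

```python
from statistics import mean

# Hypothetical toy lexicons for illustration only; a real pipeline would use a
# thesaurus (e.g., WordNet) and word frequencies counted on a large corpus.
SYNONYMS = {
    "elated": ["happy", "glad", "joyful"],
    "big": ["large", "huge", "gargantuan"],
}
FREQ = {
    "happy": 900, "glad": 400, "joyful": 120, "elated": 30,
    "big": 1000, "large": 800, "huge": 300, "gargantuan": 5,
}


def synonym_features(tokens, synonyms=SYNONYMS, freq=FREQ):
    """Return (syno_lower_mean, syn_mean) for one tokenized tweet.

    Assumed definitions (not taken from the paper):
      syno_lower = number of synonyms rarer than the word actually used
      syn_freq   = mean corpus frequency of the word's synonyms
    Both are averaged over the words that have a known frequency and synonyms.
    """
    lower_counts, syn_freqs = [], []
    for tok in tokens:
        word = tok.lower()
        syns = [s for s in synonyms.get(word, []) if s in freq]
        if word not in freq or not syns:
            continue  # skip words with no frequency or no usable synonyms
        # A low count means the writer chose an uncommon word over commoner synonyms.
        lower_counts.append(sum(freq[s] < freq[word] for s in syns))
        syn_freqs.append(mean(freq[s] for s in syns))
    if not lower_counts:
        return 0.0, 0.0
    return mean(lower_counts), mean(syn_freqs)


if __name__ == "__main__":
    # "elated" is rarer than all of its synonyms (0 lower);
    # "big" is more frequent than all of its synonyms (3 lower).
    print(synonym_features(["So", "elated", "about", "this", "big", "news"]))
```

Under these assumed definitions, a tweet that deliberately picks rare words (such as "elated") pulls Syno_Lower_Mean down, which is the kind of lexical signal the abstract associates with figurative expressions.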

Keywords

Feature Engineering; Text Classification; Figurative Language; Semantic Enrichment; Machine Learning; Natural Language Processing

Subject

Computer Science and Mathematics, Computer Science
