Version 1
Received: 15 July 2024 / Approved: 15 July 2024 / Online: 16 July 2024 (03:13:03 CEST)
How to cite:
Sammartino, V.; Baccheschi, C.; Gneri, J.; Picchianti, V.; Frasson, A. Feature Engineering and Semantic Enrichment for Enhanced Text Classification: A Case Study on Figurative Language in Tweets. Preprints 2024, 2024071179. https://doi.org/10.20944/preprints202407.1179.v1
APA Style
Sammartino, V., Baccheschi, C., Gneri, J., Picchianti, V., & Frasson, A. (2024). Feature Engineering and Semantic Enrichment for Enhanced Text Classification: A Case Study on Figurative Language in Tweets. Preprints. https://doi.org/10.20944/preprints202407.1179.v1
Chicago/Turabian Style
Sammartino, V., C. Baccheschi, J. Gneri, Valeria Picchianti, and Andrea Frasson. 2024. "Feature Engineering and Semantic Enrichment for Enhanced Text Classification: A Case Study on Figurative Language in Tweets." Preprints. https://doi.org/10.20944/preprints202407.1179.v1
Abstract
This study explores feature engineering and semantic enrichment methods for improving text classification, focusing on the detection of figurative language in tweets. We introduce two novel features, Syno_Lower_Mean and Syn_Mean, which measure the use of uncommon synonyms and the mean frequency of synonyms, capturing the semantic richness that is crucial for detecting figurative expressions. Using resources such as SenticNet and Framester, we enrich the feature set with sentiment and frame-semantic information. Our approach comprises extensive data preprocessing, feature selection, and the implementation of several classification models: SVM, KNN, Logistic Regression, Decision Trees, Random Forest, BERT, and LSTM networks. We evaluate each model's performance to assess the effectiveness of the features and enrichment methods. To support model explainability, we use decision tree analysis, feature importance analysis, and the TREPAN algorithm to approximate SVM decisions. Although we focus on figurative language detection, the methods have broader implications for other NLP text classification tasks. Our findings demonstrate significant improvements in classification accuracy and interpretability through feature design and dataset enrichment.
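The abstract does not give formal definitions for the two synonym features, so the following Python sketch shows only one plausible reading of them. It assumes that Syn_Mean is the average corpus frequency of a token's synonyms and that Syno_Lower_Mean is the share of tokens that are rarer than that average. The synonym groups, the frequency counts, and the function names are hypothetical toy values chosen for illustration; they are not taken from the paper.

```python
from statistics import mean

# Hypothetical toy resources (assumption): a real pipeline would use a
# lexical database for synonyms and corpus counts for frequencies.
SYNSETS = [
    {"happy", "glad", "joyful", "elated"},
    {"big", "large", "huge", "colossal"},
]
WORD_FREQ = {
    "happy": 900, "glad": 400, "joyful": 120, "elated": 30,
    "big": 1500, "large": 1200, "huge": 600, "colossal": 20,
}


def synonyms_of(word):
    """Return the other members of the first toy synonym group containing word."""
    for group in SYNSETS:
        if word in group:
            return sorted(group - {word})
    return []


def synonym_features(tokens):
    """Compute two illustrative synonym-frequency features for a token list.

    syn_mean: average, over tokens that have synonyms, of the mean
        frequency of those synonyms (one reading of Syn_Mean).
    syno_lower_mean: share of those tokens whose own frequency is below
        the mean frequency of their synonyms (one reading of
        Syno_Lower_Mean, i.e. the author chose an uncommon synonym).
    """
    per_token_means = []
    rarer_than_synonyms = 0
    for tok in tokens:
        syns = synonyms_of(tok)
        if not syns:
            continue  # tokens without known synonyms carry no signal here
        syn_avg = mean(WORD_FREQ[s] for s in syns)
        per_token_means.append(syn_avg)
        if WORD_FREQ[tok] < syn_avg:
            rarer_than_synonyms += 1
    if not per_token_means:
        return {"syn_mean": 0.0, "syno_lower_mean": 0.0}
    return {
        "syn_mean": mean(per_token_means),
        "syno_lower_mean": rarer_than_synonyms / len(per_token_means),
    }


if __name__ == "__main__":
    print(synonym_features(["i", "feel", "elated"]))
    print(synonym_features(["a", "big", "happy", "day"]))
```

Under these assumptions, a tweet that prefers a rare word such as "elated" over more frequent alternatives receives a high Syno_Lower_Mean, which is the kind of lexical signal the abstract associates with figurative expressions.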
Keywords
Feature Engineering; Text Classification; Figurative Language; Semantic Enrichment; Machine Learning; Natural Language Processing
Subject
Computer Science and Mathematics, Computer Science
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.