Preprint Article, Version 1 (not peer-reviewed)

Fusion of Visual and Textual Data for Enhanced Semantic Representations

Version 1: Received: 24 September 2024 / Approved: 25 September 2024 / Online: 26 September 2024 (16:55:15 CEST)

How to cite: Sterling, L.; Vale, K.; Martinez, A. Fusion of Visual and Textual Data for Enhanced Semantic Representations. Preprints 2024, 2024092066. https://doi.org/10.20944/preprints202409.2066.v1

Abstract

Generic text embeddings have demonstrated considerable success across a wide range of applications. However, these embeddings are typically derived by modeling co-occurrence patterns within text-only corpora, which limits their ability to generalize across diverse contexts. In this study, we investigate methodologies that incorporate visual information into textual representations to overcome these limitations. Informed by extensive ablation studies, we introduce a simple architecture, the VisualText Fusion Network (VTFN). VTFN not only surpasses existing multimodal approaches on a range of well-established benchmark datasets but also achieves state-of-the-art performance on image-related textual datasets while using significantly less training data. Our findings underscore the potential of integrating visual modalities to substantially enhance the robustness and applicability of text embeddings, paving the way for more nuanced and contextually rich semantic representations.
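
The abstract does not disclose VTFN's internals, but to make the general idea of fusing visual and textual features into a single embedding concrete, the following is a minimal sketch of one possible fusion module. Every name and design choice here (the GatedFusion class, the projection dimensions, the sigmoid gate) is a hypothetical illustration under assumed encoder dimensions, not the paper's actual architecture.

# Hypothetical sketch of a visual-textual fusion module.
# Nothing here is taken from the paper; the gated design and all
# dimensions are assumptions chosen for illustration only.
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Fuses a text embedding with a visual embedding via a learned gate."""

    def __init__(self, text_dim: int, image_dim: int, fused_dim: int):
        super().__init__()
        # Project both modalities into a shared fused space.
        self.text_proj = nn.Linear(text_dim, fused_dim)
        self.image_proj = nn.Linear(image_dim, fused_dim)
        # The gate decides, per dimension, how much visual signal to mix in.
        self.gate = nn.Sequential(
            nn.Linear(2 * fused_dim, fused_dim),
            nn.Sigmoid(),
        )

    def forward(self, text_emb: torch.Tensor, image_emb: torch.Tensor) -> torch.Tensor:
        t = self.text_proj(text_emb)
        v = self.image_proj(image_emb)
        g = self.gate(torch.cat([t, v], dim=-1))
        # Per-dimension convex combination of textual and visual features.
        return g * t + (1.0 - g) * v


if __name__ == "__main__":
    # Assumed dimensions: 768-d text encoder output, 512-d image encoder output.
    fusion = GatedFusion(text_dim=768, image_dim=512, fused_dim=256)
    text = torch.randn(4, 768)   # e.g. pooled sentence embeddings
    image = torch.randn(4, 512)  # e.g. pooled image features
    fused = fusion(text, image)
    print(fused.shape)  # torch.Size([4, 256])

A gated combination is only one of several plausible fusion strategies (simple concatenation and cross-attention are common alternatives); it is shown here because it keeps the fused representation the same size as either projected input while letting the model learn how much visual evidence to trust per dimension.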

Keywords

Multimodal Integration; Semantic Embeddings; Representation Learning; Transfer Learning

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
