Preprint Essay, Version 1. This version is not peer-reviewed.

MM-Transformer: a Transformer-based Knowledge Graph Link Prediction Model by Fusing Multimodal Features

Version 1: Received: 4 July 2024 / Approved: 5 July 2024 / Online: 5 July 2024 (07:28:25 CEST)

How to cite: Wang, D.; Tang, K.; Zeng, J.; Pan, Y.; Dai, Y.; Li, H.; Han, B. MM-Transformer: a Transformer-based Knowledge Graph Link Prediction Model by Fusing Multimodal Features. Preprints 2024, 2024070495. https://doi.org/10.20944/preprints202407.0495.v1

Abstract

Multimodal knowledge graph completion requires integrating information from multiple modalities (such as images and text) into the structural representations of entities to improve link prediction. However, most existing studies overlook the interactions between modalities. To address this issue, this paper proposes MM-Transformer, a Transformer-based knowledge graph link prediction model that fuses multimodal features. Modality-specific encoders extract structural, visual and textual features, and hybrid key-value computations are performed across the features of different modalities within the Transformer architecture. The similarities of textual tokens to structural tokens and to visual tokens are computed and aggregated separately, and the resulting multimodal entity representations are modeled and optimized to reduce their heterogeneity. Experimental results demonstrate that the proposed method achieves significant performance improvements on knowledge graph link prediction tasks compared with current state-of-the-art multimodal methods, indicating that it effectively addresses the problem of multimodal feature fusion in knowledge graph link prediction.
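The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes that textual tokens act as queries, that structural and visual tokens each serve as both keys and values, that learned query/key/value projections are omitted (identity projections), and that the two per-modality aggregates are summed with a residual connection to the textual tokens. The function names `attend` and `hybrid_fusion` are hypothetical and do not come from the paper; the exact computation in MM-Transformer may differ.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query row attends over the key rows
    and returns the softmax-weighted sum of the corresponding value rows."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        # Similarity of this query token to every key token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Aggregate the value rows using the attention weights.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def hybrid_fusion(text: Matrix, struct: Matrix, visual: Matrix) -> Matrix:
    """Hypothetical fusion (assumption, not the paper's exact method):
    textual tokens query structural and visual tokens separately, and the two
    aggregated results are added to the textual tokens as a residual."""
    from_struct = attend(text, struct, struct)
    from_visual = attend(text, visual, visual)
    return [[t + s + v for t, s, v in zip(tr, sr, vr)]
            for tr, sr, vr in zip(text, from_struct, from_visual)]


# Toy example: two textual tokens, one structural token, one visual token.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
struct_tokens = [[1.0, 0.0]]
visual_tokens = [[0.0, 2.0]]
fused = hybrid_fusion(text_tokens, struct_tokens, visual_tokens)
print(fused)  # prints [[2.0, 2.0], [1.0, 3.0]]
```

With a single structural and a single visual token, every attention weight is 1, so each textual token simply receives both modality vectors. Omitting the projections keeps the sketch short; in a trained model each modality would have its own learned projections, so the attention weights would not reduce to raw dot products of the encoder outputs.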

Keywords

knowledge graph; multimodal features; link prediction

Subject

Computer Science and Mathematics, Computer Science
