Wang, D.; Tang, K.; Zeng, J.; Pan, Y.; Dai, Y.; Li, H.; Han, B. MM-Transformer: a Transformer-based Knowledge Graph Link Prediction Model by Fusing Multimodal Features. Preprints 2024, 2024070495. https://doi.org/10.20944/preprints202407.0495.v1
APA Style
Wang, D., Tang, K., Zeng, J., Pan, Y., Dai, Y., Li, H., & Han, B. (2024). MM-Transformer: a Transformer-based Knowledge Graph Link Prediction Model by Fusing Multimodal Features. Preprints. https://doi.org/10.20944/preprints202407.0495.v1
Chicago/Turabian Style
Wang, D., Huige Li and Bin Han. 2024. "MM-Transformer: a Transformer-based Knowledge Graph Link Prediction Model by Fusing Multimodal Features." Preprints. https://doi.org/10.20944/preprints202407.0495.v1
Abstract
Multimodal knowledge graph completion requires integrating information from multiple modalities (such as images and text) into the structural representation of entities to improve link prediction. However, most existing studies overlook the interaction between modalities. To address this issue, this paper proposes MM-Transformer, a Transformer-based knowledge graph link prediction model that fuses multimodal features. Modality-specific encoders extract structural, visual, and textual features, and hybrid key-value calculations are performed across features from different modalities based on the Transformer architecture. The similarities of textual tags to structural tags and to visual tags are calculated and aggregated separately, and multimodal entity representations are modeled and optimized to reduce the heterogeneity of the representations. Experimental results show that, compared with current state-of-the-art multimodal methods, the proposed method achieves significant performance improvements on knowledge graph link prediction tasks. These results indicate that the proposed method effectively addresses the problem of multimodal feature fusion in knowledge graph link prediction.
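The cross-modal step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, treats each "tag" as a small feature vector, and uses hypothetical names (`cross_attend`, `fuse`) and a simple sum-then-mean aggregation chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys_values):
    """Each query attends over vectors of another modality.

    Assumption: the same vectors serve as keys and values (no learned
    projection matrices), which keeps the sketch dependency-free.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        # Similarity of this query to every key, scaled as in the Transformer.
        weights = softmax([dot(q, k) / math.sqrt(dim) for k in keys_values])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, keys_values)) for i in range(dim)]
        )
    return attended


def fuse(text_tags, struct_tags, visual_tags):
    """Aggregate text-to-structure and text-to-image attention into one vector.

    Assumption: the two attended views are summed per textual tag and then
    averaged over all textual tags; the paper may aggregate differently.
    """
    from_struct = cross_attend(text_tags, struct_tags)
    from_visual = cross_attend(text_tags, visual_tags)
    dim = len(text_tags[0])
    count = len(text_tags)
    return [
        sum(s[i] + v[i] for s, v in zip(from_struct, from_visual)) / count
        for i in range(dim)
    ]


if __name__ == "__main__":
    text = [[0.5, 0.2], [0.1, 0.9]]      # hypothetical textual features
    struct = [[1.0, 0.0], [0.8, 0.3]]    # hypothetical structural features
    visual = [[0.0, 1.0], [0.4, 0.6]]    # hypothetical visual features
    print(fuse(text, struct, visual))
```

In a trained model the queries, keys, and values would come from learned projections of the encoder outputs, and the fused vector would feed a scoring function for candidate links; both are omitted here.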
Keywords
knowledge graph; multimodal features; link prediction
Subject
Computer Science and Mathematics, Computer Science
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.