Preprint
Article

MC-Net: A Multi-Path Contextual Reasoning Framework for Multimodal Conversations

This version is not peer-reviewed

Submitted: 20 November 2024

Posted: 21 November 2024

Abstract
Multimodal conversation is a vision-language task in which an AI agent must hold a meaningful dialog grounded in visual content. This requires understanding not only the current question but also the dialog history and the associated image context. However, existing methods focus primarily on single-hop or single-path reasoning, which often falls short of capturing the multimodal relationships needed to generate accurate and contextually relevant responses. In this paper, we propose the Multi-path Contextual Reasoning Model (MC-Net), which combines multi-path reasoning with multi-hop mechanisms to process complex multimodal information. MC-Net processes dialog history and image context in parallel, iteratively enriching the semantic representation of the input question along both paths. Specifically, MC-Net adopts a multi-path framework that simultaneously derives question-aware image features and question-enhanced dialog history features, applying iterative reasoning within each path. Furthermore, we design an enhanced multimodal attention mechanism for the decoder to improve the precision of the generated responses. Experimental results on the VisDial v0.9 and v1.0 datasets show that MC-Net significantly outperforms existing methods, demonstrating its effectiveness for multimodal conversational AI.
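As a rough illustration of the multi-path, multi-hop idea described in the abstract, the following minimal sketch refines a question vector along two parallel paths, one attending over image regions and one over dialog-history turns. It is not the authors' implementation: the function names, the dot-product attention, the residual update, and the concatenation used to fuse the two paths are all assumptions chosen for brevity, and the abstract does not specify any of these details.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, items):
    """Dot-product attention (assumed): weighted sum of items given a query."""
    weights = softmax([sum(q * x for q, x in zip(query, item)) for item in items])
    return [
        sum(w * item[d] for w, item in zip(weights, items))
        for d in range(len(query))
    ]


def multi_hop(question, items, hops):
    """Refine the question over several hops along a single path."""
    query = list(question)
    for _ in range(hops):
        context = attend(query, items)
        # Residual update (an assumption, not taken from the paper).
        query = [q + c for q, c in zip(query, context)]
    return query


def multi_path_reason(question, image_regions, history_turns, hops=2):
    """Run the image path and the history path in parallel, then fuse them."""
    image_path = multi_hop(question, image_regions, hops)
    history_path = multi_hop(question, history_turns, hops)
    # Concatenation is a hypothetical fusion step for illustration only.
    return image_path + history_path


if __name__ == "__main__":
    question = [1.0, 0.0]
    regions = [[1.0, 0.0], [0.0, 1.0]]   # toy image-region features
    turns = [[0.5, 0.5]]                 # toy dialog-history features
    print(multi_path_reason(question, regions, turns, hops=2))
```

In this toy setup, each path starts from the same question vector and is updated independently, so the fused output carries one question-aware view of the image and one of the dialog history. A learned model would replace the fixed dot-product scores with trained projections.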
Subject: Computer Science and Mathematics  -   Artificial Intelligence and Machine Learning
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2024 MDPI (Basel, Switzerland) unless otherwise stated