This study examines Personal Narratives (PNs), oral or written recollections of individual experiences that include facts, events, people, and thoughts. Traditional emotion recognition and sentiment analysis operate mainly on coarser units such as utterances or documents. Our research instead centers on identifying Emotion Carriers (ECs), the specific segments of speech or text that explain the narrator's emotional state (e.g., "losing a parent", "decision-making moments"). Extracting ECs yields a richer representation of a user's emotional state, thereby improving natural language understanding and dialogue modeling. While previous studies have used lexical features to identify ECs, we argue that the spoken narrative provides a more nuanced view of the context and the narrator's emotional state. This paper explores fusing speech and textual embeddings at the word level, using both early and late fusion techniques, for improved EC detection in spoken narratives. We employ Residual Neural Networks (ResNet) pre-trained on diverse speech emotion datasets and subsequently fine-tuned for EC detection. Our experiments show that late fusion in particular significantly improves EC detection.
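The distinction between early and late fusion at the word level can be illustrated with a minimal sketch. It is not the model used in this work: the logistic scorer stands in for a trained network, and the names `linear_score`, `early_fusion`, `late_fusion`, and the mixing weight `alpha` are hypothetical, introduced only for illustration.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def linear_score(features: Sequence[float], weights: Sequence[float],
                 bias: float = 0.0) -> float:
    """Hypothetical stand-in for a trained per-word classifier.

    Returns a probability that the word belongs to an Emotion Carrier.
    """
    if len(features) != len(weights):
        raise ValueError("feature and weight dimensions must match")
    return sigmoid(sum(f * w for f, w in zip(features, weights)) + bias)


def early_fusion(speech: Sequence[Sequence[float]],
                 text: Sequence[Sequence[float]],
                 weights: Sequence[float],
                 bias: float = 0.0) -> List[float]:
    """Concatenate each word's speech and text embeddings, then score once."""
    return [linear_score(list(s) + list(t), weights, bias)
            for s, t in zip(speech, text)]


def late_fusion(speech: Sequence[Sequence[float]],
                text: Sequence[Sequence[float]],
                speech_weights: Sequence[float],
                text_weights: Sequence[float],
                alpha: float = 0.5) -> List[float]:
    """Score each modality separately, then mix the two probabilities.

    alpha is an assumed mixing weight; averaging is only one possible
    way to combine the per-modality decisions.
    """
    return [alpha * linear_score(s, speech_weights)
            + (1.0 - alpha) * linear_score(t, text_weights)
            for s, t in zip(speech, text)]
```

In this sketch, early fusion lets a single classifier see both modalities jointly for every word, whereas late fusion keeps the modality-specific classifiers independent and combines only their outputs.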