Submitted: 05 January 2026
Posted: 13 January 2026
Abstract
Keywords:
1. Introduction
2. Scope and Nature of the Contribution
3. Background and Related Work
4. AI Approaches for Story Generation
5. Literature Review
6. Dimensions of AI-Assisted Storytelling
6.1. Text-Based Storytelling
- S represents semantic consistency across consecutive sentences,
- P measures plot progression and event ordering,
- C captures character continuity and behavior patterns,
- the remaining coefficients are tunable weights determining the influence of each factor.
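As a minimal sketch of how the factors listed above might be combined, the following Python function assumes a normalised weighted sum of S, P, and C. The function name `coherence_score`, the weight names `w_s`, `w_p`, `w_c`, the [0, 1] score range, and the linear form itself are illustrative assumptions, not the formulation stated in the paper.

```python
def coherence_score(s: float, p: float, c: float,
                    w_s: float = 1.0, w_p: float = 1.0, w_c: float = 1.0) -> float:
    """Combine semantic consistency (s), plot progression (p), and
    character continuity (c) into one score.

    Assumption: each input score lies in [0, 1] and the combination is linear.
    """
    for name, value in (("s", s), ("p", p), ("c", c)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    total = w_s + w_p + w_c
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the weight total keeps the result in [0, 1]
    # regardless of the scale chosen for the weights.
    return (w_s * s + w_p * p + w_c * c) / total
```

Raising one weight relative to the others shifts the score toward that factor; for example, doubling `w_s` makes semantic consistency count as much as the other two factors combined.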
6.2. Multimodal Storytelling
- the text term denotes text alignment based on semantic features,
- the visual term represents visual alignment quality (e.g., CLIP similarity),
- the remaining coefficients are modality weights.
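The alignment terms above can be illustrated with a small Python sketch. It assumes that each alignment value is a cosine similarity between embedding vectors (the usual way CLIP-style similarity is computed) and that the two modalities are combined by a normalised weighted sum. The names `cosine_similarity`, `alignment_score`, `w_text`, and `w_visual` are hypothetical, and no particular embedding model is implied.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention chosen here: a zero vector aligns with nothing.
    return dot / (norm_a * norm_b)


def alignment_score(text_align: float, visual_align: float,
                    w_text: float = 0.5, w_visual: float = 0.5) -> float:
    """Weighted combination of text and visual alignment (assumed linear)."""
    total = w_text + w_visual
    if total <= 0:
        raise ValueError("modality weights must sum to a positive value")
    return (w_text * text_align + w_visual * visual_align) / total
```

In practice the embeddings would come from a text encoder and an image encoder that share a joint space; the sketch only shows the arithmetic applied to their outputs.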
6.3. Interactive Storytelling
- a per-point branch count gives the number of branches available at decision point i,
- a selection probability gives the probability of selecting a particular branch,
- n represents the total decision points in the narrative.
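One way to turn the quantities above into a single number is to sum the Shannon entropy of the branch-selection distribution over all n decision points. This is an assumption made for illustration, not necessarily the measure used in the paper; the function names `decision_entropy` and `narrative_branching_entropy` are hypothetical.

```python
import math
from typing import Sequence


def decision_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of the branch probabilities at one decision point."""
    if not probs:
        raise ValueError("a decision point needs at least one branch")
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("branch probabilities must be non-negative and sum to 1")
    # Branches with zero probability contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in probs if p > 0)


def narrative_branching_entropy(decision_points: Sequence[Sequence[float]]) -> float:
    """Sum of per-decision entropies across all decision points in a narrative."""
    return sum(decision_entropy(probs) for probs in decision_points)
```

Under this assumption, a decision point with b equally likely branches contributes log2(b) bits, so a story with one two-way choice and one four-way choice scores 1 + 2 = 3 bits.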
6.4. Digital Media Storytelling
- scene arrangement and timeline sequencing,
- visual transitions and motion effects,
- adaptive soundscapes and background audio,
- captioning and script alignment,
- generation of stylized artwork or animations based on narrative cues.
6.5. Nature of the Proposed Formulations
7. Proposed Framework: The AI-Assisted AASM
7.1. Comparison with Existing Approaches (Table 1)
| Feature | Existing Approaches | Proposed AASM Framework |
|---|---|---|
| Narrative Coherence Modeling | Primarily text-level coherence with limited long-range consistency | Explicit narrative engine with long-range dependency and contextual memory tracking |
| Multimodal Alignment | Partial integration of textual and visual modalities | Integrated text, visual, and audio alignment through a unified synthesis module |
| Interactive Logic | Limited or implicit branching mechanisms | Explicit interactive logic layer enabling structured decision modeling |
| Evaluation Metrics | Isolated automatic metrics, such as BLEU and ROUGE | Unified automatic, semantic, and human-centered evaluation metrics |
| Ethical and Bias Considerations | Often discussed qualitatively or omitted | Explicit bias estimation, fairness awareness, and interpretability modeling |
| Scalability and Modularity | Task-specific and tightly coupled architectures | Modular, extensible, and application-agnostic framework design |
7.2. Narrative Engine
- the output term represents the generated narrative at time step t,
- the context term denotes the narrative context at time t,
- K is the external knowledge base supporting factual and semantic grounding,
- S defines stylistic and genre-related constraints.
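The dependency described above (a narrative produced from the current context, the knowledge base K, and the style constraints S) can be sketched as a single step function in Python. Everything here is hypothetical: the class names, the `narrative_step` helper, and the `template_generator` stand-in, which replaces a real language model so that the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class NarrativeContext:
    """Narrative context at step t: the passages generated so far."""
    history: List[str] = field(default_factory=list)


@dataclass
class StyleConstraints:
    """Stylistic and genre-related constraints (S)."""
    genre: str = "fantasy"
    tone: str = "neutral"


# Generator signature: (context, knowledge base K, style S) -> next passage.
Generator = Callable[[NarrativeContext, Dict[str, str], StyleConstraints], str]


def narrative_step(context: NarrativeContext, knowledge: Dict[str, str],
                   style: StyleConstraints,
                   generate: Generator) -> Tuple[str, NarrativeContext]:
    """Produce the passage for step t and the context for step t + 1."""
    passage = generate(context, knowledge, style)
    # Build a new context rather than mutating the old one.
    return passage, NarrativeContext(history=context.history + [passage])


def template_generator(context: NarrativeContext, knowledge: Dict[str, str],
                       style: StyleConstraints) -> str:
    """Stand-in for a language model; fills a fixed template."""
    setting = knowledge.get("setting", "an unnamed place")
    scene = len(context.history) + 1
    return f"[{style.genre}/{style.tone}] Scene {scene} takes place in {setting}."
```

Returning a new context from each step keeps earlier states intact, which is convenient if a branching layer later needs to revisit them.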
7.3. Visual Synthesis Module
- T is the textual prompt derived from the narrative engine,
- z is a latent visual vector sampled from a learned distribution,
- V represents the generated visual output.
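The relationship between T, z, and V can be sketched as a conditional generation call. The Python below is illustrative only: it assumes a standard normal prior in place of the learned distribution, and `placeholder_generator` is a hypothetical stand-in that returns metadata rather than an image, so no real image model or library is implied.

```python
import random
from typing import Callable, Dict, List

Latent = List[float]
VisualGenerator = Callable[[str, Latent], Dict[str, object]]


def sample_latent(dim: int, rng: random.Random) -> Latent:
    """Draw z from a standard normal prior (stand-in for the learned distribution)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def placeholder_generator(prompt: str, z: Latent) -> Dict[str, object]:
    """Hypothetical generator: describes the output instead of rendering pixels."""
    return {
        "prompt": prompt,
        "latent_dim": len(z),
        "latent_norm": sum(x * x for x in z) ** 0.5,
    }


def synthesize(prompt: str, z: Latent, generator: VisualGenerator) -> Dict[str, object]:
    """Produce V from the textual prompt T and the latent vector z."""
    if not prompt.strip():
        raise ValueError("the textual prompt must not be empty")
    return generator(prompt, z)
```

Passing a seeded `random.Random` instance makes the latent sample reproducible, which helps when the same scene must be regenerated consistently.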
7.4. Interactive Logic Layer
- the state term denotes the current decision state,
- the input term represents user input at time t,
- the output term is the updated narrative decision state.
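The update described above, in which the current decision state and the user input at time t yield a new decision state, behaves like a state-transition function. A minimal Python sketch follows, assuming a finite set of named states and a lookup table of transitions; the names `next_state`, `run_story`, and the example states are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Maps (current state, user input) to the next state.
Transitions = Dict[Tuple[str, str], str]


def next_state(state: str, user_input: str, transitions: Transitions) -> str:
    """Return the updated decision state for one user input.

    Assumption: an input with no matching transition leaves the state unchanged.
    """
    return transitions.get((state, user_input), state)


def run_story(start: str, inputs: Sequence[str], transitions: Transitions) -> List[str]:
    """Apply a sequence of user inputs and return every state visited."""
    path = [start]
    for user_input in inputs:
        path.append(next_state(path[-1], user_input, transitions))
    return path


EXAMPLE_TRANSITIONS: Transitions = {
    ("gate", "enter"): "courtyard",
    ("gate", "leave"): "forest",
    ("courtyard", "climb"): "tower",
}
```

With the example table, `run_story("gate", ["enter", "climb"], EXAMPLE_TRANSITIONS)` visits `gate`, `courtyard`, and `tower` in turn, while an unrecognised input keeps the story at its current state.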
7.5. Rendering and Media Output Module
- N represents the narrative text,
- V denotes visual content,
- A corresponds to the audio stream,
- R
7.6. Distinction from Existing Frameworks
8. Case Study and Exploratory Validation
8.1. Illustrative Comparison
| Criterion | Baseline Story | AASM-Guided Story |
|---|---|---|
| Narrative Coherence | Low | High |
| Character Consistency | Inconsistent | Consistent |
| Structural Clarity | Fragmented | Well-structured |
| Reader Engagement | Medium | High |
9. Evaluation Metrics
9.1. Automatic Metrics
9.1.1. BLEU Score
9.1.2. ROUGE-L
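For reference, ROUGE-L scores a candidate against a reference using the length of their longest common subsequence (LCS): recall is LCS divided by the reference length, precision is LCS divided by the candidate length, and the two are combined into an F-measure. The sketch below is a minimal Python illustration that assumes whitespace tokenisation and lowercasing and defaults to a balanced F1 (`beta = 1.0`); established toolkits may tokenise, stem, or weight recall differently.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """ROUGE-L F-measure between a candidate and a single reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    beta_sq = beta * beta
    # F = (1 + beta^2) * P * R / (R + beta^2 * P)
    return (1 + beta_sq) * precision * recall / (recall + beta_sq * precision)
```

Because the LCS respects word order without requiring adjacency, the measure rewards candidates that preserve the sequence of the reference even when words are dropped in between.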
9.1.3. METEOR
9.2. Semantic and Structural Metrics
9.2.1. Narrative Coherence Score
9.2.2. Event Continuity
9.2.3. Stylistic Consistency
9.3. Human Evaluation Metrics
9.3.1. Engagement Score
9.3.2. Emotional Depth
9.3.3. Cultural Relevance
10. Applications
- N denotes the narrative output generated by the system,
- V represents visual or multimodal content,
- I corresponds to interaction mechanisms,
- U defines the user profile or contextual preferences.
10.1. Digital Media and Entertainment
10.2. Education and Learning Platforms
10.3. Virtual Reality and Simulation
11. Challenges & Limitations
11.1. Methodological Limitations
11.2. Consistency Loss
11.3. Creativity–Factuality Balance
11.4. Bias and Fairness
11.5. Model Interpretability
11.6. Computational and Resource Constraints
11.7. Ethical & Legal
12. Future Directions
- Real-time human-AI collaborative storytelling environments.
- Culturally sensitive and context-aware narrative models.
- Generation and modeling of emotion-aware narratives.
- Cross-lingual and multilingual storytelling systems.
- Transparent and explainable AI tools for storywriting decisions.
13. Conclusion
About the Authors
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.