Preprint Article, Version 1. This version is not peer-reviewed.

Exploring Controllable Ability through Text-to-Image Diffusion Model for Painting-Style Design

Version 1 : Received: 13 August 2024 / Approved: 14 August 2024 / Online: 15 August 2024 (02:57:14 CEST)

How to cite: Zhao, Y.; Liang, Z.; Qiu, Y.; Wang, X.; Yang, W. Exploring Controllable Ability through Text-to-Image Diffusion Model for Painting-Style Design. Preprints 2024, 2024081035. https://doi.org/10.20944/preprints202408.1035.v1

Abstract

Painting style and creativity are fundamental to art, defining a work’s uniqueness and expression. They embody an artist’s personality, convey emotions, establish identity, and shape audience perception, adding depth and value to the work. Recently, diffusion models have gained popularity in art design, animation, and gaming, especially for original painting creation, poster design, and visual identity development. Traditional creative processes, while demanding unique imagination and creativity, often face challenges such as slow innovation, heavy reliance on manual effort, high costs, and limited scalability. Consequently, exploring painting-style creative design through deep learning has emerged as a promising direction. In this paper, we introduce a novel network architecture, the Painting-Style Design Assistant Network (PDANet), for style transformation. To support it, we curated the Painting-42 dataset, comprising 4,055 works by 42 renowned Chinese painters. PDANet leverages this dataset to capture the aesthetic intricacies of Chinese painting, offering rich design references. Furthermore, we propose a lightweight Identity-Net for large-scale text-to-image (T2I) models, which aligns internal knowledge with external control signals, thereby enhancing the capabilities of existing T2I models. The trainable Identity-Net feeds image prompts into the U-Net encoder to generate diverse and stable images. Both quantitative and qualitative analyses demonstrate that our approach outperforms current methods, delivering high-quality generated content with wide-ranging applicability.
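The abstract describes Identity-Net as a lightweight, trainable module that feeds image prompts into the U-Net encoder of an existing T2I diffusion model. The preprint page does not include code, so the following is only a minimal PyTorch sketch of how such an adapter could be wired: it encodes a style-image prompt into per-stage residuals that would be added to the frozen U-Net encoder activations. The class name (IdentityNetSketch), channel widths (320/640/1280/1280), and zero-initialized output convolutions are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn


class IdentityNetSketch(nn.Module):
    """Hypothetical lightweight adapter: encodes an image prompt into
    one control residual per U-Net encoder stage. Channel widths are
    assumed, not taken from the paper."""

    def __init__(self, in_channels=3, unet_channels=(320, 640, 1280, 1280)):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.SiLU(),
        )
        blocks, zero_convs = [], []
        prev = 64
        for ch in unet_channels:
            # One downsampling branch per encoder resolution.
            blocks.append(nn.Sequential(
                nn.Conv2d(prev, ch, kernel_size=3, stride=2, padding=1),
                nn.SiLU(),
                nn.Conv2d(ch, ch, kernel_size=3, padding=1),
                nn.SiLU(),
            ))
            # Zero-initialized 1x1 conv so training starts from the
            # unmodified base model (a common trick in adapter methods).
            zc = nn.Conv2d(ch, ch, kernel_size=1)
            nn.init.zeros_(zc.weight)
            nn.init.zeros_(zc.bias)
            zero_convs.append(zc)
            prev = ch
        self.blocks = nn.ModuleList(blocks)
        self.zero_convs = nn.ModuleList(zero_convs)

    def forward(self, image_prompt):
        """Return a list of control residuals, one per encoder stage."""
        x = self.stem(image_prompt)
        residuals = []
        for block, zero_conv in zip(self.blocks, self.zero_convs):
            x = block(x)
            residuals.append(zero_conv(x))
        return residuals


if __name__ == "__main__":
    adapter = IdentityNetSketch()
    style_image = torch.randn(1, 3, 512, 512)  # placeholder image prompt
    controls = adapter(style_image)
    for i, c in enumerate(controls):
        # Residuals that would be added to the corresponding encoder features.
        print(f"stage {i}: {tuple(c.shape)}")
```

Because the output convolutions start at zero, the adapted model initially reproduces the base T2I model's behavior, and the image-prompt influence is learned gradually during training; this is the standard rationale behind zero-initialized adapters, and it is offered here only as a plausible reading of the abstract's "aligns internal knowledge with external control signals."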

Keywords

diffusion model; painting-style design; text-to-image (T2I) models; computer-aided design

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
