The field of 3D model generation has become essential across various industries, including gaming, virtual and augmented reality (VR/AR), architecture, and medical imaging. Traditionally reliant on manual effort, 3D content creation is now being transformed by deep generative models, enabling more efficient, scalable, and dynamic generation of complex shapes and environments. This survey provides a comprehensive review of the key backbone architectures used for 3D generation, including autoencoders, variational autoencoders (VAEs), generative adversarial networks (GANs), autoregressive models, diffusion models, normalizing flows, attention-based models, CLIP-guided models, and procedural generation techniques. We examine the role of each architecture in 3D generation, highlighting strengths such as the precision of VAEs, the realism of GANs, the stability of diffusion models, and the scalability of procedural methods, alongside limitations such as training instability, high computational cost, and difficulty in handling multi-modal data. Additionally, we discuss the growing relevance of attention-enhanced models and the integration of text-based CLIP supervision for improved semantic alignment of 3D outputs. The survey concludes with an analysis of open challenges, including balancing efficiency with expressiveness, managing training complexity, and addressing dataset limitations. It also identifies future research directions, such as few-shot learning, hybrid architectures, and neural-symbolic approaches, which promise to advance the field by improving the generalization and versatility of 3D generative models. This paper aims to guide researchers and practitioners in navigating the evolving landscape of 3D generative methods and to inspire new innovations in the creation of realistic, high-quality 3D content.