Training artificial intelligence (AI) to reason is difficult largely because clear objectives and reward functions are hard to specify. In structured environments such as chess and Go, the objective is well defined, and optimal moves can be found by search algorithms that maximize the reward function. This clarity allows AI to improve autonomously through trial and error and recursive evaluation of rewards, much as humans learn from prior experience and state-dependent actions. In most real-world applications, however, AI lacks direct interaction with its environment and the feedback such interaction provides, making it difficult to define an appropriate reward function. This limitation is especially apparent in the training of large language models (LLMs), whose output quality is bounded by the training data and the absence of a dynamic reward signal. Simulated environments offer some utility but are inherently limited by their design and scope. Achieving artificial general intelligence (AGI) will require AI systems that can act and receive feedback in real-world settings, mirroring how humans develop cognitive and strategic abilities through interactive experience. This paper examines the critical role of clear objectives and reward functions in AI training, the limitations imposed by the current lack of real-world interaction, and the implications for future advances in AI reasoning capabilities.
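To make the contrast concrete, the well-defined setting can be sketched in code. The following is a minimal, illustrative example (not from the paper) of reward-driven game-tree search: minimax on tic-tac-toe, where the reward function is fully specified (+1 for an X win, -1 for an O win, 0 for a draw), so optimal play can be determined by recursive evaluation of rewards alone, with no external feedback.

```python
from functools import lru_cache

# Tic-tac-toe board as a 9-character string; 'X' moves first, '.' is empty.
# The reward function is exact: +1 (X wins), -1 (O wins), 0 (draw), which is
# what lets pure search find optimal moves without any outside feedback.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has completed a line, else None."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def minimax(board, player):
    """Game value with `player` to move, assuming both sides play optimally."""
    w = winner(board)
    if w == 'X':
        return 1
    if w == 'O':
        return -1
    if '.' not in board:
        return 0  # draw: board full, no winner
    values = []
    for i, cell in enumerate(board):
        if cell == '.':
            nxt = board[:i] + player + board[i + 1:]
            values.append(minimax(nxt, 'O' if player == 'X' else 'X'))
    # X maximizes the reward, O minimizes it.
    return max(values) if player == 'X' else min(values)

print(minimax('.' * 9, 'X'))  # optimal tic-tac-toe play is a draw: prints 0
```

The essential point is that the loop over moves and the recursive call implement exactly the "random trials and recursive calculation of rewards" available in structured games; in open-ended real-world tasks, no comparable `winner` function exists, which is the gap the paper addresses.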