As large language models have demonstrated strong performance in general domains, it is natural to apply them to specific tasks to improve specialization and responsiveness. In this paper, we develop a lightweight software system based on Retrieval-Augmented Generation (RAG) to test whether this approach can provide effective advice and protection for patients after hospital discharge. All patient data in this study come from a neurology department. We select three diseases (acute cerebral infarction, atherosclerosis, and hypertension) to test our system, and the generated results are evaluated by professional physicians. We additionally introduce a memory mechanism to enhance the system's performance and responsiveness, and we conclude with suggestions for improving and extending the system. The experimental results indicate that both the foundational capabilities of the large language model (e.g., Llama 3 and GPT-4) and the system architecture (e.g., the integration of RAG and memory mechanisms) significantly influence overall performance. Comparing different models and architectures, the GPT-4 system combined with RAG and memory mechanisms outperforms the baseline in total score and in per-criterion improvements, demonstrating the effectiveness of architectural optimization in enhancing a model's ability to handle complex tasks. Because the data remain sensitive to patients and the hospital even after de-identification, we cannot release the full test set; with the hospital's permission, however, we release a de-identified subset. You can click here to obtain the prototype we developed and some of the test data.