Version 1: Received: 15 June 2024 / Approved: 17 June 2024 / Online: 17 June 2024 (07:57:08 CEST)
How to cite:
Nirmal Joshua, K.; Sreejith, M. Enhancing Natural Language to Code Generation in the SantaCoder Model through In-Context Learning. Preprints 2024, 2024061105. https://doi.org/10.20944/preprints202406.1105.v1
APA Style
Nirmal Joshua, K., & Sreejith, M. (2024). Enhancing Natural Language to Code Generation in the SantaCoder Model through In-Context Learning. Preprints. https://doi.org/10.20944/preprints202406.1105.v1
Chicago/Turabian Style
Nirmal Joshua, K., and Mihit Sreejith. 2024. "Enhancing Natural Language to Code Generation in the SantaCoder Model through In-Context Learning." Preprints. https://doi.org/10.20944/preprints202406.1105.v1
Abstract
Generating executable code from natural language instructions using Large Language Models (LLMs) presents challenges such as semantic understanding and the handling of ambiguous input. This study focuses on the SantaCoder model and explores the impact of in-context learning on code generation, using the MBPP and HumanEval datasets for evaluation. Our results demonstrate significant improvements in three key metrics (defined in the paper): correctness@k, similarity@k, and pass@k. To address the problem of selecting optimal demonstrations to maximize correctness and pass rates, we investigate two methods in this paper: latent concept selection and random selection. These findings highlight the effectiveness of in-context learning and the critical role of demonstration selection in enhancing the accuracy, efficiency, and versatility of the SantaCoder model in code generation.
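Of the three metrics named in the abstract, correctness@k and similarity@k are defined in the paper itself; pass@k, however, has a standard unbiased estimator in the code-generation literature (Chen et al., 2021), which the following sketch illustrates. This is background for the metric, not the authors' implementation.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one
    of k samples, drawn without replacement from n generated solutions
    of which c pass the unit tests, is correct."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # must include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 generations per problem, 3 pass the tests.
print(pass_at_k(10, 3, 1))  # pass@1 = 0.3
print(pass_at_k(10, 3, 5))  # pass@5
```

In practice the estimator is averaged over all problems in a benchmark such as MBPP or HumanEval to produce the reported score.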
Keywords
In Context Learning; Large Language Models; NL2Code; Machine Learning; Latent Concept Learning
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.