Preprint Article · Version 2 · This version is not peer-reviewed

Integrated Optimization of Large Language Models: Synergizing Data Utilization and Compression Techniques

Version 1 : Received: 6 September 2024 / Approved: 9 September 2024 / Online: 9 September 2024 (14:01:46 CEST)
Version 2 : Received: 10 September 2024 / Approved: 10 September 2024 / Online: 10 September 2024 (14:16:55 CEST)

How to cite: Li, X.; Ma, Y.; Huang, Y.; Wang, X.; Lin, Y.; Zhang, C. Integrated Optimization of Large Language Models: Synergizing Data Utilization and Compression Techniques. Preprints 2024, 2024090662. https://doi.org/10.20944/preprints202409.0662.v2

Abstract

In this paper, we propose "Synergized Efficiency Optimization for Large Language Models" (SEO-LLM), an approach that integrates advanced data utilization and model compression techniques to significantly enhance the performance, efficiency, and scalability of large language models (LLMs). Our method synergistically combines Adaptive Data Augmentation (ADA), Transfer-Active Learning (TAL), Adaptive Iterative Pruning (AIP), and Synergistic Quantization and Distillation (SQD). Together, these components reduce the training data requirement by 30%, compress model size by 67.6%, and improve inference speed by up to 50%, while preserving or even improving model accuracy across a range of NLP tasks. ADA dynamically adjusts augmentation strategies to improve model generalization; TAL leverages pre-trained models to focus learning on the most informative data samples; AIP prunes less significant weights; and SQD harmonizes quantization with knowledge distillation to achieve high compression rates without significant performance loss. The synergy among these techniques makes SEO-LLM a robust solution for deploying LLMs in resource-constrained environments, maintaining state-of-the-art performance with a fraction of the computational and data resources.
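The full method is specified in the paper body; as a minimal illustration of the two compression components named above, the PyTorch sketch below shows one plausible form of AIP-style iterative magnitude pruning and an SQD-style distillation loss for a quantized student. The function names (prune_step, distill_loss), the per-layer sparsity argument, and the hyperparameters T and alpha are assumptions made for this sketch, not the authors' implementation.

# Illustrative sketch only; names, schedule, and hyperparameters are
# assumptions, not the SEO-LLM reference implementation.
import torch
import torch.nn.functional as F

def prune_step(model: torch.nn.Module, sparsity: float) -> None:
    """AIP-style step (assumed form): zero the lowest-magnitude weights
    in each linear layer up to the target sparsity fraction."""
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            w = module.weight.data
            threshold = torch.quantile(w.abs().flatten(), sparsity)
            w.mul_((w.abs() >= threshold).float())  # keep only large weights

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """SQD-style loss (assumed form): the (quantized) student matches the
    teacher's softened outputs, blended with the hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # temperature-squared scaling, as in standard distillation
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: prune 30% of each linear layer's weights in a small model.
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU(), torch.nn.Linear(8, 4))
prune_step(model, sparsity=0.3)

In an adaptive iterative scheme, prune_step would be called repeatedly with a gradually increasing sparsity target between fine-tuning phases, and distill_loss would drive the quantized student's recovery training against the full-precision teacher.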

Keywords

Natural Language Processing (NLP); Large Language Models (LLMs); Data Utilization; Model Compression; Knowledge Distillation

Subject

Computer Science and Mathematics, Computer Science
