Preprint Article, Version 1 (this version is not peer-reviewed)

Sparsity Limit to Prune Large Language Models for on-Device AI Assistants: Llama-2 as an Example

Version 1 : Received: 18 July 2024 / Approved: 19 July 2024 / Online: 19 July 2024 (10:28:40 CEST)

How to cite: Liu, B.; Xu, Y. Sparsity Limit to Prune Large Language Models for on-Device AI Assistants: Llama-2 as an Example. Preprints 2024, 2024071568. https://doi.org/10.20944/preprints202407.1568.v1

Abstract

Large language models (LLMs) have shown impressive performance and versatility. However, their billions of parameters and high computational costs hinder the development of personalized, privacy-preserving AI assistants that run locally on user devices. In this work, we explored the potential of pruning LLMs to create lightweight models suitable for user devices, using the moderate-sized Llama-2 7B model as an example. Adopting a simple yet effective pruning method, we found that up to 60% of the weights in the Llama-2 7B model could be pruned without significantly impairing its language modeling capabilities. Furthermore, despite occasional factual inaccuracies, the pruned model at this sparsity limit generated fluent and helpful answers to everyday queries, demonstrating the feasibility of on-device AI assistants. These inaccuracies might originate from either forgetting or hallucination induced by pruning. We proposed a simple protocol to distinguish between the two mechanisms, and outlined future directions for improving pruned models as local AI assistants.
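The abstract refers to "a simple yet effective pruning method" without specifying it here. As an illustration only, the sketch below shows unstructured magnitude pruning at 60% sparsity — zeroing the smallest-magnitude weights of a layer — which is one common simple baseline; the method actually used in the paper may differ (e.g., activation-aware approaches such as Wanda or SparseGPT). The function name and the NumPy toy weights are hypothetical.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value; weights at or below it are pruned.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Toy stand-in for one weight matrix of a transformer layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(128, 128))
pruned = magnitude_prune(w, 0.6)
achieved = 1.0 - np.count_nonzero(pruned) / pruned.size
print(f"achieved sparsity: {achieved:.3f}")
```

In practice, pruning a real LLM would apply such a mask per layer (or globally) to the model's weight tensors, typically followed by evaluating perplexity to check that language modeling ability survives.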

Keywords

large language models; sparse neural networks; pruning; on-device AI assistants

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning

