Preprint Article, Version 1 (this version is not peer-reviewed)

Efficient Hybrid Inference for LLMs: Reward-Based Token Modelling with Selective Cloud Assistance

Version 1 : Received: 15 September 2024 / Approved: 16 September 2024 / Online: 17 September 2024 (05:12:30 CEST)

How to cite: MS, A.; VG, J.; PS, D. Efficient Hybrid Inference for LLMs: Reward-Based Token Modelling with Selective Cloud Assistance. Preprints 2024, 2024091208. https://doi.org/10.20944/preprints202409.1208.v1

Abstract

Large language models (LLMs) are known for their exceptional performance across a range of natural language processing tasks, but their deployment comes at a high computational and financial cost. On the other hand, smaller language models (SLMs), which can be deployed on lower-cost edge devices, struggle to match the performance of their larger counterparts. This paper presents a novel hybrid inference approach that leverages the strengths of both model types while minimizing reliance on costly cloud-based LLMs. Unlike existing methods that route entire queries to either an SLM or a cloud LLM, our approach introduces a reward-based mechanism to dynamically determine the involvement of the cloud LLM during token generation. Specifically, each token predicted by the SLM is evaluated against a reward score, and only when this score falls below a certain threshold is the cloud LLM consulted for assistance in the next token prediction. This method not only reduces the traffic to the cloud LLM, thereby lowering costs, but also allows for flexible control over response quality depending on the reward score threshold. Experimental results demonstrate that our approach significantly reduces cloud LLM usage with minimal impact on overall response quality, offering a cost-effective solution for deploying high-performance language models.
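The decoding loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the `slm_next_token`, `cloud_llm_next_token`, and toy reward values are hypothetical stand-ins, not the authors' implementation or any real model API.

```python
def slm_next_token(context):
    """Stand-in small language model (SLM): returns a proposed token
    together with a reward score for that token. Both are toy values
    chosen to exercise the two routing paths."""
    token = f"slm_tok_{len(context)}"
    reward = 0.9 if len(context) % 2 == 0 else 0.3  # alternating confidence
    return token, reward

def cloud_llm_next_token(context):
    """Stand-in cloud LLM, consulted only when the SLM's reward is low."""
    return f"llm_tok_{len(context)}"

def hybrid_generate(prompt_len, max_new_tokens, threshold=0.5):
    """Token-level routing: keep the SLM's token when its reward score
    meets the threshold; otherwise ask the cloud LLM for the next token.
    Returns the generated tokens and the number of cloud calls made."""
    context = list(range(prompt_len))  # placeholder prompt tokens
    cloud_calls = 0
    output = []
    for _ in range(max_new_tokens):
        token, reward = slm_next_token(context)
        if reward < threshold:          # reward below threshold: escalate
            token = cloud_llm_next_token(context)
            cloud_calls += 1
        context.append(token)
        output.append(token)
    return output, cloud_calls
```

Raising `threshold` routes more tokens to the cloud LLM (higher quality, higher cost); lowering it keeps more generation on the edge device, which is the cost/quality knob the abstract describes.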

Keywords

llm; hybrid-llm; reward-token

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
