Version 1
: Received: 17 July 2024 / Approved: 18 July 2024 / Online: 19 July 2024 (06:29:02 CEST)
Version 2
: Received: 7 August 2024 / Approved: 8 August 2024 / Online: 11 August 2024 (04:42:06 CEST)
Version 3
: Received: 12 August 2024 / Approved: 13 August 2024 / Online: 14 August 2024 (07:08:01 CEST)
How to cite:
Guan, B.; Cao, J.; Wang, X.; Wang, Z.; Sui, M.; Wang, Z. Integrated Method of Deep learning and Large Language Model in Speech Recognition. Preprints 2024, 2024071520. https://doi.org/10.20944/preprints202407.1520.v3
APA Style
Guan, B., Cao, J., Wang, X., Wang, Z., Sui, M., & Wang, Z. (2024). Integrated Method of Deep learning and Large Language Model in Speech Recognition. Preprints. https://doi.org/10.20944/preprints202407.1520.v3
Chicago/Turabian Style
Guan, B., Mingxiu Sui and Zixiang Wang. 2024. "Integrated Method of Deep learning and Large Language Model in Speech Recognition." Preprints. https://doi.org/10.20944/preprints202407.1520.v3
Abstract
This research explores methods for integrating deep learning and large language models in speech recognition, with the goal of improving recognition accuracy and the system's ability to handle complex contexts. Deep neural networks (DNNs), convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and a Transformer-based large language model are combined to build an integrated framework of acoustic and language models. Experiments on the TIMIT, LibriSpeech, and Common Voice datasets show that the ensemble model significantly improves both word error rate (WER) and real-time factor (RTF) compared with traditional models. The model performs particularly well in adapting to multiple languages and varying accents. These results indicate that integrating these techniques can effectively improve the performance of speech recognition systems in complex environments, offering a new direction for the future development of speech recognition technology.
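The two metrics cited in the abstract have standard definitions: WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the reference transcript into the hypothesis and N is the number of reference words; RTF = decoding time / audio duration. The sketch below illustrates both. It is a minimal example assuming whitespace-tokenized transcripts, the function names are hypothetical, and it is not the evaluation code used in the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumption: transcripts are already normalized and split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev is the previous DP row: edit distances from ref[:i-1] to each hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    # Total edits S + D + I divided by the reference length N
    return prev[len(hyp)] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = decoding time / audio duration; below 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # One deleted word out of six reference words gives a WER of 1/6.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Decoding 10 s of audio in 3 s gives an RTF of 0.3.
    print(real_time_factor(3.0, 10.0))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference; lower is better for both metrics.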
Keywords
deep learning; large language model; speech recognition; ensemble method
Subject
Computer Science and Mathematics, Computer Science
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.