Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks

Marvin Coto-Jiménez

doi:10.20944/preprints201905.0228.v1

Submitted:

16 May 2019

Posted:

17 May 2019

You are already at the latest version

Abstract

Several researchers have contemplated deep learning-based post-filters to increase the quality of statistical parametric speech synthesis, which perform a mapping of the synthetic speech to the natural speech, considering the different parameters separately and trying to reduce the gap between them. The Long Short-term Memory (LSTM) Neural Networks have been applied successfully in this purpose, but there are still many aspects to improve in the results and in the process itself. In this paper, we introduce a new pre-training approach for the LSTM, with the objective of enhancing the quality of the synthesized speech, particularly in the spectrum, in a more efficient manner. Our approach begins with an auto-associative training of one LSTM network, which is used as an initialization for the post-filters. We show the advantages of this initialization for the enhancing of the Mel-Frequency Cepstral parameters of synthetic speech. Results show that the initialization succeeds in achieving better results in enhancing the statistical parametric speech spectrum in most cases when compared to the common random initialization approach of the networks.

Keywords:

Deep learning

;

LSTM

;

Machine learning

;

Post-filtering

;

Signal processing

;

Speech Synthesis

Subject:

Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks

Abstract

Keywords:

Subject:

MDPI Initiatives

Important Links

Subscribe