This repository has been archived by the owner on Aug 3, 2021. It is now read-only.
I am using Tacotron-GST for speech generation (mag output) and getting choppy generated audio, as someone else noted here. My inference output files are here.
I'm running inference in an NVIDIA TensorFlow Docker container. My inference logs are here.
The text I am trying to generate is from the M-AILABS dataset itself. My inference file contains the one line below:
en_US/by_book/female/judy_bieber/the_master_key/wavs/the_master_key_10_f000002|UNUSED|How Rob Served a Mighty King.
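For reference, the inference line above is pipe-delimited with three fields. Here is a minimal sketch of how it splits; the field names (`wav_path`, `unused`, `transcript`) are my own guesses at what each column means, not names taken from the OpenSeq2Seq code:

```python
# Parse one pipe-delimited inference line into its three fields.
# Field names are assumptions for illustration, not OpenSeq2Seq's actual column names.
def parse_inference_line(line: str) -> dict:
    wav_path, unused, transcript = line.rstrip("\n").split("|")
    return {"wav_path": wav_path, "unused": unused, "transcript": transcript}

line = ("en_US/by_book/female/judy_bieber/the_master_key/wavs/"
        "the_master_key_10_f000002|UNUSED|How Rob Served a Mighty King.")
record = parse_inference_line(line)
print(record["transcript"])  # How Rob Served a Mighty King.
```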
If I understand correctly, the provided checkpoint has been trained on the M-AILABS dataset, which means it has seen this particular sentence/audio pair.
Is sample_step0_0_infer_mag.wav the quality to be expected?
Can I swap out Griffin-Lim for WaveNet to improve the audio quality?
Can you please share some Tacotron-GST audio samples you have generated (I found only the non-GST Tacotron samples in the docs), so that we know what to expect? My expectations are set by the Google Tacotron team's audio samples on their webpage.
In short: is there any way to tell (perhaps from the output spectrogram image) what is causing the low-quality generation, and what to change to improve it? The model, the vocoder, or both?