Use with different languages #4

Misiu · 2023-12-22T13:58:19Z

Hi there,
I want to use the steps in https://www.home-assistant.io/voice_control/create_wake_word/ to create my custom wake word.
Is there a chance to use/add different languages?
I'm interested in Polish, but I think that support for any additional language would be awesome.

alesms · 2024-02-05T16:10:25Z

Me too, I'm interested in Italian

synesthesiam · 2024-02-05T20:16:04Z

Models for French, German, and Dutch have just been added. It will take more time to add additional languages, but fortunately the data is available: http://openslr.org/94/

fherreror · 2024-02-06T13:53:02Z

Spanish would be a great addition :D

alesms · 2024-02-29T12:46:19Z

Hi @synesthesiam do you have any news about other languages?

synesthesiam · 2024-02-29T16:30:38Z

Not yet, but Spanish, Portuguese, Polish, and Italian should be possible with the MLS dataset.

alesms · 2024-02-29T17:20:01Z

Great, but I don't know how to train the model :( Do you have any instructions or something?

mmalyska · 2024-04-05T07:43:49Z

@synesthesiam do you have any guidelines on how to export a LibriTTS-R generator from a checkpoint?
I wanted to use this https://huggingface.co/datasets/rhasspy/piper-checkpoints/blob/main/pl/pl_PL/gosia/medium/epoch%3D5001-step%3D1457672.ckpt to generate samples, and also learn how to do it myself :)

alesms · 2024-04-10T22:17:10Z

Hi @synesthesiam I've tried several times over the past few days to create a new model (.pt) to use with this notebook: https://colab.research.google.com/drive/1q1oe2zOyZp7UsB3jJiQ1IFn8z5YfjwEb#scrollTo=1cbqBebHXjFD to create my custom Italian wake word. I've attempted to follow various guides, including this one: https://github.com/rhasspy/piper/blob/master/TRAINING.md. I also tried starting with the 15 GB dataset you mentioned here: https://openslr.org/94/, but I haven't been successful. Could you please tell me how to do it? It would be a great help.
Thank you

mario872 · 2024-06-25T03:35:56Z

I was able to export a .pt file, but couldn't get it working with piper-sample-generator. (I'm using English (US), but I can't use a model trained with Piper in Piper Sample Generator.)
Also, I'm using WSL Ubuntu on Windows.

Here is what I've done so far:
To get a .pt file I temporarily modified line 91 in https://github.com/rhasspy/piper/blob/master/src/python/piper_train/__main__.py
to

torch.save(model, '/path/to/save.pt')
exit()

Then ran

python3 -m piper_train \
    --dataset-dir /path/to/training_dir/ \
    --accelerator 'cpu' \
    --devices 1 \
    --batch-size 10 \
    --validation-split 0.0 \
    --num-test-examples 0 \
    --max_epochs 10000 \
    --resume_from_checkpoint /path/to/your/last.ckpt \
    --checkpoint-epochs 1 \
    --precision 32

(To be clear, this doesn't train the model; it just exports it from a checkpoint (.ckpt file).)
I then had a .pt file, but to get the .pt.json I ran:

cp /path/to/training_dir/config.json \
   /path/to/save.pt.json
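
The same copy step can be scripted. This is a minimal sketch, assuming (as the post does) that the sample generator wants the training config next to the model under the name `<model>.pt.json`. The helper name `copy_config_beside_model` is hypothetical and not part of Piper.

```python
import shutil
from pathlib import Path


def copy_config_beside_model(config_path: str, model_path: str) -> Path:
    """Copy a training config.json to '<model_path>.json' (e.g. save.pt.json)."""
    model = Path(model_path)
    # Append ".json" to the full file name rather than replacing the ".pt" suffix.
    target = model.with_name(model.name + ".json")
    shutil.copyfile(config_path, target)
    return target
```

Calling `copy_config_beside_model("/path/to/training_dir/config.json", "/path/to/save.pt")` would write `/path/to/save.pt.json`.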

I installed PyTorch 2.0.0 and its dependencies, piper-phonemize, and webrtcvad. This is my list of installed modules, from running pip freeze.
I then attempted to run Piper Sample Generator using this modified script:

import os
import sys

if "piper-sample-generator/" not in sys.path:
    sys.path.append("piper-sample-generator/")

from generate_samples import generate_samples

target_word = 'edward'

def text_to_speech(text):
    generate_samples(text = text, max_samples=1, length_scales=[1.1], noise_scales=[0.7], noise_scale_ws = [0.7], output_dir = './', batch_size=1, auto_reduce_batch_size=True, file_names=["test_generation.wav"], model='James5.pt')

text_to_speech(target_word)

This was the output:

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "/home/james/new_open_wake_word_training/step-1.py", line 7, in <module>
    from generate_samples import generate_samples
  File "/home/james/new_open_wake_word_training/piper-sample-generator/generate_samples.py", line 14, in <module>
    import torchaudio
  File "/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torchaudio/__init__.py", line 1, in <module>
    from torchaudio import (  # noqa: F401
  File "/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torchaudio/compliance/__init__.py", line 1, in <module>
    from . import kaldi
  File "/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torchaudio/compliance/kaldi.py", line 22, in <module>
    EPSILON = torch.tensor(torch.finfo(torch.float).eps)
/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torchaudio/compliance/kaldi.py:22: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
  EPSILON = torch.tensor(torch.finfo(torch.float).eps)
DEBUG:generate_samples:Loading James5.pt
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3mu_bb_l
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3mu_bb_l/_remote_module_non_scriptable.py
INFO:generate_samples:Successfully loaded the model
Traceback (most recent call last):
  File "/home/james/new_open_wake_word_training/step-1.py", line 14, in <module>
    text_to_speech(target_word)
  File "/home/james/new_open_wake_word_training/step-1.py", line 12, in text_to_speech
    generate_samples(text = text, max_samples=1, length_scales=[1.1], noise_scales=[0.7], noise_scale_ws = [0.7], output_dir = './', batch_size=1, auto_reduce_batch_size=True, file_names=["test_generation.wav"], model='James5.pt')
  File "/home/james/new_open_wake_word_training/piper-sample-generator/generate_samples.py", line 178, in generate_samples
    audio, phoneme_samples = generate_audio(
  File "/home/james/new_open_wake_word_training/piper-sample-generator/generate_samples.py", line 302, in generate_audio
    x, m_p_orig, logs_p_orig, x_mask = model.enc_p(x, x_lengths)
  File "/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'VitsModel' object has no attribute 'enc_p'

So, I'm not really sure what to do from here. I don't really understand how AI in Python works, but I've gotten this far, and help would be greatly appreciated.

Edit: If it's any help, here's the output when I run print(model) at line 301 in generate_samples.py (right before it fails)

Second edit: I just realised that I'm likely getting this error because I'm using a single speaker model. Would this be likely @synesthesiam?
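
One possible reading of the traceback: `torch.save(model, ...)` stored the whole Lightning `VitsModel` wrapper, while `generate_samples.py` calls `model.enc_p` directly, and that attribute would live on the inner generator (`model_g`). A minimal sketch of a pre-flight check follows. It uses plain stand-in classes instead of real checkpoints, and `find_generator` is a hypothetical helper, not part of Piper or piper-sample-generator.

```python
from typing import Any, Optional

# Attributes this thread reports generate_samples.py using on the loaded model.
REQUIRED_ATTRS = ("enc_p", "emb_g")


def find_generator(obj: Any) -> Optional[Any]:
    """Return obj or obj.model_g, whichever exposes all required attributes."""
    for candidate in (obj, getattr(obj, "model_g", None)):
        if candidate is not None and all(
            hasattr(candidate, name) for name in REQUIRED_ATTRS
        ):
            return candidate
    return None  # e.g. a single-speaker model without emb_g


# Stand-ins for illustration only (assumptions, not real Piper classes).
class FakeGenerator:
    enc_p = object()
    emb_g = object()


class FakeWrapper:
    def __init__(self) -> None:
        self.model_g = FakeGenerator()


class FakeSingleSpeaker:
    enc_p = object()  # no emb_g attribute
```

With a real file, the object returned by `torch.load` could be passed to `find_generator`; a `None` result would suggest the file is not usable by the sample generator as it stands.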

tolnai · 2024-07-09T21:45:47Z

Hi @synesthesiam I've tried several times over the past few days to create a new model (.pt) to use with this notebook: https://colab.research.google.com/drive/1q1oe2zOyZp7UsB3jJiQ1IFn8z5YfjwEb#scrollTo=1cbqBebHXjFD to create my custom Italian wake word. I've attempted to follow various guides, including this one: https://github.com/rhasspy/piper/blob/master/TRAINING.md. I also tried starting with the 15 GB dataset you mentioned here: https://openslr.org/94/, but I haven't been successful. Could you please tell me how to do it? It would be a great help. Thank you

I was also trying to use the same notebook, referencing the German model (included in the release) instead of the English one and using the config from this repo, but I can't get any proper human speech out of the sample generator; it just sounds like random phonemes...

erkamkavak · 2024-10-22T10:25:55Z

After some trial and error, my findings are as follows:

You can't use a single-speaker model. Single-speaker models don't have the emb_g module, but generate_samples requires it. You could modify generate_samples to run without emb_g, but I am not sure whether that will work.
If your model has multiple speakers:
- first, download it from the checkpoints
- then add the following code to piper_train/__main__.py

    parser.add_argument(
        "--ckpt-to-pt",
        help="Convert .ckpt file to .pt file and exit",
        metavar="CHECKPOINT_PATH",
    )
    ...

    if args.ckpt_to_pt:
        convert_ckpt_to_pt(args.ckpt_to_pt)
        return

    ...

def convert_ckpt_to_pt(ckpt_path):
    _LOGGER.debug(f"Converting .ckpt file: {ckpt_path} to .pt file")

    # Load the Lightning checkpoint, then save only the generator (model_g)
    model = VitsModel.load_from_checkpoint(ckpt_path)
    pt_path = ckpt_path.replace(".ckpt", ".pt")
    torch.save(model.model_g, pt_path)

    _LOGGER.debug(f"Model saved as .pt file: {pt_path}")

Run this code with the path to your model's checkpoint as the argument; it will create a .pt file.
You can then use this .pt file with generate_samples as follows:

python3 generate_samples.py '<your-word>' --model '<created-pt-model-path>' --max-samples 10 --output-dir <output-dir>
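
The multi-speaker precondition and the invocation above can be sketched together. This assumes the model's JSON config carries a `num_speakers` key, and it only uses the command-line flags shown in this comment. The helper names `is_multi_speaker` and `build_generate_command` are hypothetical.

```python
import json
import sys
from pathlib import Path
from typing import List


def is_multi_speaker(config_path: str) -> bool:
    """True if the config's "num_speakers" value (assumed key) is above 1."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return int(config.get("num_speakers", 1)) > 1


def build_generate_command(
    word: str, model_path: str, output_dir: str, max_samples: int = 10
) -> List[str]:
    """Build the generate_samples.py argument list without running it."""
    return [
        sys.executable,
        "generate_samples.py",
        word,
        "--model", model_path,
        "--max-samples", str(max_samples),
        "--output-dir", output_dir,
    ]
```

From the piper-sample-generator directory, the resulting list could be passed to `subprocess.run(cmd, check=True)` once `is_multi_speaker` returns `True` for the model's config.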
