Use with different languages #4

Open
Misiu opened this issue Dec 22, 2023 · 11 comments

Comments

@Misiu

Misiu commented Dec 22, 2023

Hi there,
I want to use the steps in https://www.home-assistant.io/voice_control/create_wake_word/ to create my custom wake word.
Is there a chance to use/add different languages?
I'm interested in Polish, but I think that support for any additional language would be awesome.

@alesms

alesms commented Feb 5, 2024

Me too, I'm interested in Italian

@synesthesiam
Contributor

Models for French, German, and Dutch have just been added. It will take more time to add additional languages, but fortunately the data is available: http://openslr.org/94/

@fherreror

Spanish would be a great addition :D

@alesms

alesms commented Feb 29, 2024

Hi @synesthesiam, do you have any news about other languages?

@synesthesiam
Contributor

Not yet, but Spanish, Portuguese, Polish, and Italian should be possible with the MLS dataset.

@alesms

alesms commented Feb 29, 2024

Great, but I don't know how to train the model :( Do you have any instructions or something similar?

@mmalyska

mmalyska commented Apr 5, 2024

@synesthesiam, do you have any guidelines on how to export the LibriTTS-R generator from a checkpoint?
I wanted to use this https://huggingface.co/datasets/rhasspy/piper-checkpoints/blob/main/pl/pl_PL/gosia/medium/epoch%3D5001-step%3D1457672.ckpt to generate samples, and also learn how to do it myself :)

@alesms

alesms commented Apr 10, 2024

Hi @synesthesiam I've tried several times over the past few days to create a new model (.pt) to use with this notebook: https://colab.research.google.com/drive/1q1oe2zOyZp7UsB3jJiQ1IFn8z5YfjwEb#scrollTo=1cbqBebHXjFD to create my custom Italian wake word. I've attempted to follow various guides, including this one: https://github.com/rhasspy/piper/blob/master/TRAINING.md. I also tried starting with the 15 GB dataset you mentioned here: https://openslr.org/94/, but I haven't been successful. Could you please tell me how to do it? It would be a great help.
Thank you

@mario872

mario872 commented Jun 25, 2024

I was able to export a .pt file, but couldn't get it working with piper-sample-generator. (I'm using US English, but I cannot use a model trained with Piper in Piper Sample Generator.)
Also, I'm using Ubuntu under WSL on Windows.

Here is what I've done so far:
To get a .pt file I temporarily modified line 91 in https://github.com/rhasspy/piper/blob/master/src/python/piper_train/__main__.py
to

torch.save(model, '/path/to/save.pt')
exit()

Then ran

python3 -m piper_train \
    --dataset-dir /path/to/training_dir/ \
    --accelerator 'cpu' \
    --devices 1 \
    --batch-size 10 \
    --validation-split 0.0 \
    --num-test-examples 0 \
    --max_epochs 10000 \
    --resume_from_checkpoint /path/to/your/last.ckpt \
    --checkpoint-epochs 1 \
    --precision 32

(To be clear, this doesn't train the model; it just exports it from a checkpoint (.ckpt file).)
I then had a .pt file, but to get .pt.json I ran:

cp /path/to/training_dir/config.json \
   /path/to/save.pt.json
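
The copy step above can also be scripted. This is only a minimal sketch, assuming (as the cp command implies) that the config is expected next to the exported model under the name `<model>.pt.json`. The helper name `copy_config_for_model` is hypothetical and not part of Piper or piper-sample-generator.

```python
import shutil
from pathlib import Path


def copy_config_for_model(config_path: str, model_path: str) -> Path:
    """Copy a training config.json next to an exported model as <model>.json."""
    src = Path(config_path)
    if not src.is_file():
        raise FileNotFoundError(f"Config not found: {src}")
    # /path/to/save.pt -> /path/to/save.pt.json
    # The ".json" suffix is appended to the full name, not swapped for ".pt".
    dest = Path(str(model_path) + ".json")
    shutil.copyfile(src, dest)
    return dest
```

For example, `copy_config_for_model("/path/to/training_dir/config.json", "/path/to/save.pt")` would write `/path/to/save.pt.json`.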

I installed PyTorch 2.0.0 and its dependencies, piper-phonemize, and webrtcvad. By running pip freeze, this is my list of installed modules.
I then attempted to run Piper Sample Generator using this modified script:

import os
import sys

if "piper-sample-generator/" not in sys.path:
    sys.path.append("piper-sample-generator/")

from generate_samples import generate_samples

target_word = 'edward'

def text_to_speech(text):
    generate_samples(text = text, max_samples=1, length_scales=[1.1], noise_scales=[0.7], noise_scale_ws = [0.7], output_dir = './', batch_size=1, auto_reduce_batch_size=True, file_names=["test_generation.wav"], model='James5.pt')

text_to_speech(target_word)

This was the output:

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "/home/james/new_open_wake_word_training/step-1.py", line 7, in <module>
    from generate_samples import generate_samples
  File "/home/james/new_open_wake_word_training/piper-sample-generator/generate_samples.py", line 14, in <module>
    import torchaudio
  File "/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torchaudio/__init__.py", line 1, in <module>
    from torchaudio import (  # noqa: F401
  File "/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torchaudio/compliance/__init__.py", line 1, in <module>
    from . import kaldi
  File "/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torchaudio/compliance/kaldi.py", line 22, in <module>
    EPSILON = torch.tensor(torch.finfo(torch.float).eps)
/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torchaudio/compliance/kaldi.py:22: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
  EPSILON = torch.tensor(torch.finfo(torch.float).eps)
DEBUG:generate_samples:Loading James5.pt
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3mu_bb_l
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3mu_bb_l/_remote_module_non_scriptable.py
INFO:generate_samples:Successfully loaded the model
Traceback (most recent call last):
  File "/home/james/new_open_wake_word_training/step-1.py", line 14, in <module>
    text_to_speech(target_word)
  File "/home/james/new_open_wake_word_training/step-1.py", line 12, in text_to_speech
    generate_samples(text = text, max_samples=1, length_scales=[1.1], noise_scales=[0.7], noise_scale_ws = [0.7], output_dir = './', batch_size=1, auto_reduce_batch_size=True, file_names=["test_generation.wav"], model='James5.pt')
  File "/home/james/new_open_wake_word_training/piper-sample-generator/generate_samples.py", line 178, in generate_samples
    audio, phoneme_samples = generate_audio(
  File "/home/james/new_open_wake_word_training/piper-sample-generator/generate_samples.py", line 302, in generate_audio
    x, m_p_orig, logs_p_orig, x_mask = model.enc_p(x, x_lengths)
  File "/home/james/new_open_wake_word_training/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'VitsModel' object has no attribute 'enc_p'

So I'm not really sure what to do from here. I don't really understand how AI in Python works, but I've gotten this far, so any help would be greatly appreciated.

Edit: If it's any help, here's what is outputted when I run print(model) at line 301 in generate_samples.py (right before it fails)

Second edit: I just realised that I'm likely getting this error because I'm using a single speaker model. Would this be likely @synesthesiam?
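
The first part of the output above is a separate problem from the AttributeError: the environment has NumPy 2.0.0 installed, while a module in it (the traceback points at torch) was compiled against NumPy 1.x. The warning itself suggests downgrading to 'numpy<2'. Below is a small sketch of a pre-flight check for that condition. The helper name `numpy_major_is_supported` is hypothetical, and the function only parses a version string, so it makes no assumption about what is actually installed.

```python
def numpy_major_is_supported(version: str, max_major: int = 1) -> bool:
    """Return True if a NumPy version string has a major version <= max_major."""
    major = int(version.split(".")[0])
    return major <= max_major


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("NumPy is not installed")
    else:
        if not numpy_major_is_supported(numpy.__version__):
            # Same remedy the warning text proposes.
            print(f"NumPy {numpy.__version__} found; consider: pip install 'numpy<2'")
```

Running this before importing torchaudio would flag the mismatch early, without waiting for the import to emit the warning.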

@tolnai

tolnai commented Jul 9, 2024

Hi @synesthesiam I've tried several times over the past few days to create a new model (.pt) to use with this notebook: https://colab.research.google.com/drive/1q1oe2zOyZp7UsB3jJiQ1IFn8z5YfjwEb#scrollTo=1cbqBebHXjFD to create my custom Italian wake word. I've attempted to follow various guides, including this one: https://github.com/rhasspy/piper/blob/master/TRAINING.md. I also tried starting with the 15 GB dataset you mentioned here: https://openslr.org/94/, but I haven't been successful. Could you please tell me how to do it? It would be a great help. Thank you

I was also trying to use the same notebook, referencing the German model (which is included in the release) instead of the English one and using the config from this repo, but I can't get any intelligible human speech out of the sample generator; it just sounds like random phonemes...

@erkamkavak

After some trial and error, these are my findings:

  • You can't use a single-speaker model. Single-speaker models do not have an emb_g module, but generate_samples requires one. You could try modifying generate_samples to run without emb_g, but I am not sure that will work.
  • If your model has multiple speakers:
    • First, download it from the checkpoints.
    • Then add the following code to piper_train/__main__.py:
      parser.add_argument(
          "--ckpt-to-pt",
          help="Convert .ckpt file to .pt file and exit",
          metavar="CHECKPOINT_PATH",
      )
      ...

      if args.ckpt_to_pt:
          convert_ckpt_to_pt(args.ckpt_to_pt)
          return

      ...

      def convert_ckpt_to_pt(ckpt_path):
          _LOGGER.debug(f"Converting .ckpt file: {ckpt_path} to .pt file")

          # Save only the generator (model_g), not the whole VitsModel
          model = VitsModel.load_from_checkpoint(ckpt_path)
          pt_path = ckpt_path.replace(".ckpt", ".pt")
          torch.save(model.model_g, pt_path)

          _LOGGER.debug(f"Model saved as .pt file: {pt_path}")
    • Run this code with the path of your model's checkpoint as the argument. It will create a .pt file.
    • You can then use this .pt file with generate_samples as follows:
python3 generate_samples.py '<your-word>' --model '<created-pt-model-path>' --max-samples 10 --output-dir <output-dir>
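
The failure modes discussed in this thread can be told apart before running generate_samples.py. The earlier traceback comes from saving the whole `VitsModel`, which has no `enc_p`; the conversion code above saves `model.model_g` instead; and single-speaker models reportedly lack `emb_g`. Here is a minimal sketch of such a check, using only attribute lookups so that it runs without PyTorch. `describe_export` is a hypothetical helper, and the attribute names are taken from the messages in this thread rather than verified against Piper's source.

```python
from types import SimpleNamespace


def describe_export(obj) -> str:
    """Classify an object loaded from a .pt file by the attributes it exposes."""
    if not hasattr(obj, "enc_p"):
        if hasattr(obj, "model_g"):
            # The whole training wrapper was saved; the generator is inside it.
            return "wrapper: save obj.model_g instead"
        return "unknown: no enc_p or model_g attribute"
    if not hasattr(obj, "emb_g"):
        return "generator without emb_g (likely single-speaker)"
    return "generator with emb_g (multi-speaker)"


# Stand-ins for real models, just to show the three outcomes.
multi = SimpleNamespace(enc_p=object(), emb_g=object())
single = SimpleNamespace(enc_p=object())
wrapper = SimpleNamespace(model_g=multi)

for name, obj in [("multi", multi), ("single", single), ("wrapper", wrapper)]:
    print(name, "->", describe_export(obj))
```

With a real file, the object would come from `torch.load(path)` in an environment where piper_train is importable. On recent PyTorch versions, loading a fully pickled model may also need `weights_only=False`.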
