MuseTalk is a real-time, high-quality audio-driven lip-syncing model trained in the latent space of ft-mse-vae, which
- modifies an unseen face according to the input audio, with a face region size of 256 x 256.
- supports audio in various languages, such as Chinese, English, and Japanese.
- supports real-time inference at 30fps+ on an NVIDIA Tesla V100.
- supports modification of the proposed center point of the face region, which SIGNIFICANTLY affects generation results.
- provides a checkpoint trained on the HDTF dataset.
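To make the "trained in the latent space of ft-mse-vae" point concrete, here is a minimal sketch of encoding a 256 x 256 face crop into that latent space with the diffusers AutoencoderKL. This is illustrative only: MuseTalk wraps the VAE in its own helper, and the model id stabilityai/sd-vae-ft-mse is assumed from the weights layout shown later.

```python
# Minimal sketch: encode a 256x256 face crop into the sd-vae-ft-mse latent space.
# Illustrative only -- MuseTalk ships its own VAE wrapper; the HF model id is assumed.
import numpy as np
import torch
from PIL import Image
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

img = Image.open("face_crop.png").convert("RGB").resize((256, 256))
x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0  # scale to [-1, 1]
x = x.permute(2, 0, 1).unsqueeze(0)                         # (1, 3, 256, 256)

with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample() * vae.config.scaling_factor
print(latents.shape)  # (1, 4, 32, 32) -- the space the lip-sync model operates in
```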
We provide a detailed tutorial covering the installation and basic usage of MuseTalk for new users. The pipeline was implemented step-by-step in Colab, covering installation, configuration, downloading weights and models, standard inference, and real-time inference; everything needed for the process is included.
You can copy the directory ("/content/drive/MyDrive/Lip-Sync/MuseTalk/musetalk/models") from my Drive to the same path in your Drive; a sketch of the copy step is shown below.
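A minimal sketch of that copy step in a Colab cell, assuming the shared "Lip-Sync" folder is already visible in your Drive at that path; the destination (the models folder of your local MuseTalk checkout) is an assumption:

```python
# Sketch: mount Google Drive in Colab and copy the prepared models folder.
# Assumes the shared folder is reachable at the path below in your Drive;
# the destination inside the cloned MuseTalk repo is an assumption.
import shutil
from google.colab import drive

drive.mount('/content/drive')
shutil.copytree(
    '/content/drive/MyDrive/Lip-Sync/MuseTalk/musetalk/models',
    '/content/MuseTalk/models',  # adjust to wherever you cloned MuseTalk
    dirs_exist_ok=True,
)
```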
OR
You can download weights manually as follows:
- Download our trained weights.
- Download the weights of other components:
Finally, these weights should be organized in models as follows:
./models/
├── musetalk
│   └── musetalk.json
│   └── pytorch_model.bin
├── dwpose
│   └── dw-ll_ucoco_384.pth
├── face-parse-bisent
│   ├── 79999_iter.pth
│   └── resnet18-5c106cde.pth
├── sd-vae-ft-mse
│   ├── config.json
│   └── diffusion_pytorch_model.bin
└── whisper
    └── tiny.pt
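If you prefer scripting the manual download, below is a minimal sketch using huggingface_hub. The repo ids (TMElyralab/MuseTalk for the MuseTalk weights, stabilityai/sd-vae-ft-mse for the VAE) and the in-repo file paths are assumptions based on the layout above; the dwpose, face-parse-bisent, and whisper files are hosted elsewhere, so fetch them per the links in the upstream README.

```python
# Sketch: fetch the MuseTalk and sd-vae-ft-mse weights with huggingface_hub
# and place them under ./models/ as in the tree above. Repo ids and file
# paths are assumptions; dwpose, face-parse-bisent, and whisper tiny.pt
# come from other sources and still need to be downloaded separately.
import os
from huggingface_hub import hf_hub_download

os.makedirs("models", exist_ok=True)

# MuseTalk UNet config and weights (assumed HF repo id and file paths).
for f in ["musetalk/musetalk.json", "musetalk/pytorch_model.bin"]:
    hf_hub_download("TMElyralab/MuseTalk", f, local_dir="models")

# sd-vae-ft-mse VAE that defines the latent space.
for f in ["config.json", "diffusion_pytorch_model.bin"]:
    hf_hub_download("stabilityai/sd-vae-ft-mse", f, local_dir="models/sd-vae-ft-mse")
```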
- Reference video: 13_K.mp4
- Reference audio: audio_folder
- Selected an open-source lip-syncing model, specifically MuseTalk.
- Implemented the pipeline step-by-step in Colab, covering:
  - Environment setup
  - Configuration
  - Downloading weights and models
  - Inference and real-time inference
  - Adjusting parameters for better results
- Generated a high-quality video with excellent synchronization of lip movements with the audio.
Evaluation: Compared the generated video with the reference video [13_K.mp4] and verified that the model provided good synchronization and overall video quality.
Fine-Tuning: Instead of fine-tuning the model, adjusted parameters like bbox_shift to control mouth openness, resulting in better synchronization.
!python -m scripts.inference --inference_config configs/inference/test.yaml --bbox_shift -7
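The test.yaml referenced by this command is a small task list. Below is a sketch of what it might look like for the reference assets above; the field names (task_0, video_path, audio_path, bbox_shift) follow the sample config shipped with MuseTalk, so verify them against your checkout, and the audio filename is a placeholder.

```yaml
# configs/inference/test.yaml -- sketch; verify field names against the repo
task_0:
  video_path: "13_K.mp4"                 # reference video
  audio_path: "audio_folder/sample.wav"  # placeholder: pick a file from audio_folder
  bbox_shift: -7                         # mouth-openness control (can also be passed on the CLI)
```

For the real-time path, the upstream repo provides a separate entry point (scripts.realtime_inference with configs/inference/realtime.yaml at the time of writing); check the repo for the current script name and options.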
- MuseTalk: https://github.com/TMElyralab/MuseTalk/tree/main?tab=readme-ov-file
- MuseV: https://github.com/TMElyralab/MuseV/tree/main
For more details, refer to the documentation provided in the repository. If you encounter any issues or have questions, feel free to open an issue or contact the maintainer.