Mirror of https://github.com/jasonppy/VoiceCraft.git (synced 2025-05-24 06:44:24 +02:00)
Commit 8d1177149b (parent 4ff9930b8e): Replicate
Binary file not shown.
@@ -1,8 +0,0 @@
---extra-index-url https://download.pytorch.org/whl/cu118
-torch==2.0.1
-torchaudio==2.0.2
-xformers==0.0.22
-tensorboard==2.16.2
-phonemizer==3.2.1
-datasets==2.16.0
-torchmetrics==0.11.1
@@ -1,5 +1,5 @@
 # VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
-[](https://jasonppy.github.io/assets/pdfs/VoiceCraft.pdf) [](https://jasonppy.github.io/VoiceCraft_web/) [](https://huggingface.co/spaces/pyp1/VoiceCraft_gradio) [](https://colab.research.google.com/drive/1IOjpglQyMTO2C3Y94LD9FY0Ocn-RJRg6?usp=sharing) [](https://replicate.com/cjwbw/voicecraft)
+[](https://arxiv.org/pdf/2403.16973.pdf) [](https://huggingface.co/spaces/pyp1/VoiceCraft_gradio) [](https://colab.research.google.com/drive/1IOjpglQyMTO2C3Y94LD9FY0Ocn-RJRg6?usp=sharing) [](https://replicate.com/cjwbw/voicecraft) [](https://youtu.be/eikybOi8iwU) [](https://jasonppy.github.io/VoiceCraft_web/)
 
 
 ### TL;DR
@@ -19,6 +19,8 @@ When you are inside the docker image or you have installed all dependencies, Che
 If you want to do model development such as training/finetuning, I recommend following [envrionment setup](#environment-setup) and [training](#training).
 
 ## News
+:star: 04/22/2024: 330M/830M TTS Enhanced Models are up [here](https://huggingface.co/pyp1), load them through [`gradio_app.py`](./gradio_app.py) or [`inference_tts.ipynb`](./inference_tts.ipynb)! Replicate demo is up, major thanks to [@chenxwh](https://github.com/chenxwh)!
+
 :star: 04/11/2024: VoiceCraft Gradio is now available on HuggingFace Spaces [here](https://huggingface.co/spaces/pyp1/VoiceCraft_gradio)! Major thanks to [@zuev-stepan](https://github.com/zuev-stepan), [@Sewlell](https://github.com/Sewlell), [@pgsoar](https://github.com/pgosar) [@Ph0rk0z](https://github.com/Ph0rk0z).
 
 :star: 04/05/2024: I finetuned giga330M with the TTS objective on gigaspeech and 1/5 of librilight. Weights are [here](https://huggingface.co/pyp1/VoiceCraft/tree/main). Make sure maximal prompt + generation length <= 16 seconds (due to our limited compute, we had to drop utterances longer than 16s in training data). Even stronger models forthcomming, stay tuned!
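The 04/05/2024 news entry asks that prompt plus generation length stay at or below 16 seconds. A minimal sketch of such a pre-flight check follows; the function name `fits_length_limit` and the constant `MAX_TOTAL_SECONDS` are hypothetical and not part of the VoiceCraft code base.

```python
# Hypothetical helper, not part of VoiceCraft: checks the 16-second limit
# stated in the 04/05/2024 news entry before running TTS inference.
MAX_TOTAL_SECONDS = 16.0


def fits_length_limit(prompt_seconds: float,
                      generation_seconds: float,
                      limit: float = MAX_TOTAL_SECONDS) -> bool:
    """Return True when prompt + generated audio stays within the limit."""
    if prompt_seconds < 0 or generation_seconds < 0:
        raise ValueError("durations must be non-negative")
    return prompt_seconds + generation_seconds <= limit


if __name__ == "__main__":
    print(fits_length_limit(3.0, 10.0))   # 13 s total, within the limit
    print(fits_length_limit(6.0, 12.5))   # 18.5 s total, over the limit
```

The limit is a parameter because the news entry attaches it to one specific checkpoint; other checkpoints may tolerate different lengths.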
@@ -31,7 +33,7 @@ If you want to do model development such as training/finetuning, I recommend fol
 - [x] Inference demo for speech editing and TTS
 - [x] Training guidance
 - [x] RealEdit dataset and training manifest
-- [x] Model weights (giga330M.pth, giga830M.pth, and gigaHalfLibri330M_TTSEnhanced_max16s.pth)
+- [x] Model weights
 - [x] Better guidance on training/finetuning
 - [x] Colab notebooks
 - [x] HuggingFace Spaces demo
@@ -211,7 +213,7 @@ We thank Feiteng for his [VALL-E reproduction](https://github.com/lifeiteng/vall
 ## Citation
 ```
 @article{peng2024voicecraft,
-author = {Peng, Puyuan and Huang, Po-Yao and Li, Daniel and Mohamed, Abdelrahman and Harwath, David},
+author = {Peng, Puyuan and Huang, Po-Yao and Mohamed, Abdelrahman and Harwath, David},
 title = {VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild},
 journal = {arXiv},
 year = {2024},
@@ -85,7 +85,7 @@ def load_models(whisper_backend_name, whisper_model_name, alignment_model_name,
     elif voicecraft_model_name == "830M":
         voicecraft_model_name = "giga830M"
     elif voicecraft_model_name == "330M_TTSEnhanced":
-        voicecraft_model_name = "gigaHalfLibri330M_TTSEnhanced_max16s"
+        voicecraft_model_name = "330M_TTSEnhanced"
     elif voicecraft_model_name == "830M_TTSEnhanced":
         voicecraft_model_name = "830M_TTSEnhanced"
 
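The `elif` chain in the hunk above maps a short UI name to a checkpoint name. As a rough sketch only, the same mapping could be written as a dictionary lookup. `MODEL_ALIASES` and `resolve_voicecraft_name` are hypothetical names, and only the branches visible in the new side of the hunk are included; names handled outside the hunk are passed through unchanged, which is an assumption.

```python
# Hypothetical table-driven version of the elif chain shown in the hunk.
# Only the aliases visible in the new side of the diff are listed here.
MODEL_ALIASES = {
    "830M": "giga830M",
    "330M_TTSEnhanced": "330M_TTSEnhanced",
    "830M_TTSEnhanced": "830M_TTSEnhanced",
}


def resolve_voicecraft_name(name: str) -> str:
    """Map a short model name to its checkpoint name.

    Unknown names are returned unchanged (an assumption; the real
    load_models may treat them differently).
    """
    return MODEL_ALIASES.get(name, name)


if __name__ == "__main__":
    for short_name in ("830M", "330M_TTSEnhanced", "830M_TTSEnhanced"):
        print(short_name, "->", resolve_voicecraft_name(short_name))
```

With a lookup table, a rename like the one in this commit becomes a one-line data change instead of an edit to control flow.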
@@ -71,7 +71,7 @@
 "# load model, encodec, and phn2num\n",
 "# # load model, tokenizer, and other necessary files\n",
 "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
-"voicecraft_name=\"830M_TTSEnhanced.pth\" # or giga330M.pth, gigaHalfLibri330M_TTSEnhanced_max16s.pth, giga830M.pth\n",
+"voicecraft_name=\"830M_TTSEnhanced.pth\" # or giga330M.pth, 330M_TTSEnhanced.pth, giga830M.pth\n",
 "\n",
 "# the new way of loading the model, with huggingface, recommended\n",
 "from models import voicecraft\n",