Mirror of https://github.com/jasonppy/VoiceCraft.git, synced 2025-01-07 22:21:19 +01:00

add Dockerfile

parent 991b1fe3bb
commit 1a1390e587

Dockerfile (normal file, 24 lines added)
@@ -0,0 +1,24 @@
+FROM jupyter/base-notebook:python-3.9.13
+
+USER root
+
+# Install OS dependencies
+RUN apt-get update && apt-get install -y git-core ffmpeg espeak-ng && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
+# Update Conda, create the voicecraft environment, and install dependencies
+RUN conda update -y -n base -c conda-forge conda && \
+    conda create -y -n voicecraft python=3.9.16 && \
+    conda run -n voicecraft conda install -y -c conda-forge montreal-forced-aligner=2.2.17 openfst=1.8.2 kaldi=5.5.1068 && \
+    conda run -n voicecraft pip install torch==2.0.1 \
+        tensorboard==2.16.2 \
+        phonemizer==3.2.1 \
+        torchaudio==2.0.2 \
+        datasets==2.16.0 \
+        torchmetrics==0.11.1 && \
+    conda run -n voicecraft pip install -e git+https://github.com/facebookresearch/audiocraft.git@c5157b5bf14bf83449c17ea1eeb66c19fb4bc7f0#egg=audiocraft
+
+# Install the Jupyter kernel
+RUN conda install -n voicecraft ipykernel --update-deps --force-reinstall -y && \
+    conda run -n voicecraft python -m ipykernel install --name=voicecraft
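
Note (not part of the commit): after building the image, it can be worth confirming that every pinned pip package actually landed in the `voicecraft` environment. The sketch below is one way to do that with the standard library's `importlib.metadata`. The `PINNED` table is copied from the Dockerfile's pip install step; `find_mismatches` and the file name `check_pins.py` are hypothetical names introduced here for illustration.

```python
from importlib import metadata

# Pins copied from the Dockerfile's pip install step.
PINNED = {
    "torch": "2.0.1",
    "tensorboard": "2.16.2",
    "phonemizer": "3.2.1",
    "torchaudio": "2.0.2",
    "datasets": "2.16.0",
    "torchmetrics": "0.11.1",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # the package is not installed at all
        # Compare only the public version so local builds such as "2.0.1+cu117" still match.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

Saved as `check_pins.py` inside the container, it could be run with `conda run -n voicecraft python check_pins.py`. No output would mean every pin is satisfied.
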

README.md (27 lines changed)
@@ -21,8 +21,8 @@ To clone or edit an unseen voice, VoiceCraft needs only a few seconds of referen
 - [ ] HuggingFace Spaces demo
 - [ ] Better guidance on training/finetuning

 ## How to run TTS inference
 There are two ways:
 1. with Docker. See [quickstart](#quickstart)
 2. without Docker. See [environment setup](#environment-setup)

@@ -31,7 +31,7 @@ When you are inside the docker image or you have installed all dependencies, Che
 If you want to do model development such as training/finetuning, I recommend following [environment setup](#environment-setup) and [training](#training).

 ## QuickStart
 :star: The easiest way to try out TTS inference with VoiceCraft is with Docker. Thanks to [@ubergarm](https://github.com/ubergarm) and [@jayc88](https://github.com/jay-c88) for making this happen.

 Tested on Linux and Windows; it should work on any host with Docker installed.
 ```bash
@@ -43,23 +43,26 @@ cd VoiceCraft
 # https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.13.5/install-guide.html
 # sudo apt-get install -y nvidia-container-toolkit-base || yay -Syu nvidia-container-toolkit || echo etc...

-# 3. Try to start an existing container otherwise create a new one passing in all GPUs
+# 3. First build the docker image
+docker build --tag "voicecraft" .
+
+# 4. Try to start an existing container otherwise create a new one passing in all GPUs
 ./start-jupyter.sh # linux
 start-jupyter.bat # windows

-# 4. now open a webpage on the host box to the URL shown at the bottom of:
+# 5. now open a webpage on the host box to the URL shown at the bottom of:
 docker logs jupyter

-# 5. optionally look inside from another terminal
+# 6. optionally look inside from another terminal
 docker exec -it jupyter /bin/bash
 export USER=(your_linux_username_used_above)
 export HOME=/home/$USER
 sudo apt-get update

-# 6. confirm video card(s) are visible inside container
+# 7. confirm video card(s) are visible inside container
 nvidia-smi

-# 7. Now in browser, open inference_tts.ipynb and work through one cell at a time
+# 8. Now in browser, open inference_tts.ipynb and work through one cell at a time
 echo GOOD LUCK
 ```

@@ -91,13 +94,13 @@ If you have encountered version issues when running things, checkout [environmen
 Check out [`inference_speech_editing.ipynb`](./inference_speech_editing.ipynb) and [`inference_tts.ipynb`](./inference_tts.ipynb)

 ## Training
 To train a VoiceCraft model, you need to prepare the following parts:
 1. utterances and their transcripts
 2. the utterances encoded into codes using e.g. Encodec
 3. the transcripts converted into phoneme sequences, plus a phoneme set (we named it vocab.txt)
 4. a manifest (i.e. metadata)

 Steps 1, 2, and 3 are handled in [./data/phonemize_encodec_encode_hf.py](./data/phonemize_encodec_encode_hf.py), where
 1. Gigaspeech is downloaded through HuggingFace. Note that you need to sign an agreement in order to download the dataset (it needs your auth token)
 2. phoneme sequences and Encodec codes are also extracted by the script.

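
Note (not part of the commit): step 3 above amounts to collecting the set of phonemes that occur in the transcripts and giving each one a fixed integer id. A minimal sketch of that idea follows. The real logic lives in `phonemize_encodec_encode_hf.py`; the function names, the toy ARPAbet-style input, and the one-pair-per-line text format used here are assumptions for illustration, not the script's confirmed behavior.

```python
def build_vocab(phoneme_sequences):
    """Map every distinct phoneme to an integer id (sorted so ids are reproducible)."""
    symbols = sorted({phn for seq in phoneme_sequences for phn in seq})
    return {phn: idx for idx, phn in enumerate(symbols)}


def format_vocab(vocab):
    """Render the vocabulary as text, one "<id> <phoneme>" pair per line (assumed format)."""
    return "".join(f"{idx} {phn}\n" for phn, idx in vocab.items())


def encode(sequence, vocab):
    """Convert one phoneme sequence into token ids; unknown phonemes raise KeyError."""
    return [vocab[phn] for phn in sequence]


# Toy input; real sequences would come from the phonemizer.
sequences = [["HH", "AH", "L", "OW"], ["L", "OW"]]
vocab = build_vocab(sequences)
print(format_vocab(vocab), end="")
print(encode(["L", "OW"], vocab))
```

Because the ids depend on which phonemes were seen, a vocabulary built this way from different data will generally not match one built elsewhere, which is why reusing an existing vocab.txt matters when starting from a pretrained model.
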
@@ -119,7 +122,7 @@ python phonemize_encodec_encode_hf.py \
 where encodec_model_path is available [here](https://huggingface.co/pyp1/VoiceCraft). This model is trained on Gigaspeech XL; it has 56M parameters and 4 codebooks, each with 2048 codes. Details are described in our [paper](https://jasonppy.github.io/assets/pdfs/VoiceCraft.pdf). If you encounter OOM during extraction, try decreasing the batch_size and/or max_len.
 The extracted codes, phonemes, and vocab.txt will be stored at `path/to/store_extracted_codes_and_phonemes/${dataset_size}/{encodec_16khz_4codebooks,phonemes,vocab.txt}`.

 For the manifest, please download train.txt and validation.txt from [here](https://huggingface.co/datasets/pyp1/VoiceCraft_RealEdit/tree/main), and put them under `path/to/store_extracted_codes_and_phonemes/manifest/`. Please also download vocab.txt from [here](https://huggingface.co/datasets/pyp1/VoiceCraft_RealEdit/tree/main) if you want to use our pretrained VoiceCraft model (so that the phoneme-to-token matching is the same).

 Now you are ready to start training!

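
Note (not part of the commit): the directory layout described in this hunk can be checked before launching training. The sketch below builds the expected paths exactly as the README text states them and reports which ones are missing. `expected_layout`, `missing_paths`, and the example `dataset_size` value `"xs"` are hypothetical and only for illustration.

```python
from pathlib import Path


def expected_layout(root, dataset_size):
    """Return the paths the README says should exist after extraction and manifest download."""
    root = Path(root)
    base = root / dataset_size
    return {
        "codes": base / "encodec_16khz_4codebooks",
        "phonemes": base / "phonemes",
        "vocab": base / "vocab.txt",
        "manifest_train": root / "manifest" / "train.txt",
        "manifest_validation": root / "manifest" / "validation.txt",
    }


def missing_paths(root, dataset_size):
    """Return the sorted names of expected entries that do not exist on disk."""
    layout = expected_layout(root, dataset_size)
    return sorted(name for name, path in layout.items() if not path.exists())


if __name__ == "__main__":
    # "xs" is an example value for dataset_size, not one taken from the README.
    for name in missing_paths("path/to/store_extracted_codes_and_phonemes", "xs"):
        print(f"missing: {name}")
```
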
@@ -138,7 +141,7 @@ first install it with `pip install g2p`
 ```python
 from g2p import make_g2p
 transducer = make_g2p('eng', 'eng-ipa')
 transducer("hello").output_string
 # it will output: 'hʌloʊ'
 ``` -->
