finetune 830M

jason-on-salt-a40 2024-04-08 15:12:51 -07:00
parent a31be7023f
commit 778db3443d
2 changed files with 4 additions and 4 deletions


@@ -100,7 +100,7 @@ conda install -c conda-forge montreal-forced-aligner=2.2.17 openfst=1.8.2 kaldi=
# install MFA english dictionary and model
mfa model download dictionary english_us_arpa
mfa model download acoustic english_us_arpa
-pip install huggingface_hub
+# pip install huggingface_hub
# conda install pocl # the above gives a warning about installing pocl; not sure if this is really needed
# to run ipynb
@@ -154,7 +154,7 @@ bash e830M.sh
It's the same procedure to prepare your own custom dataset. Make sure that if
## Finetuning
-You also need to do steps 1-4 as in Training, and I recommend using AdamW for better stability when finetuning a pretrained model. Check out the script `/home/pyp/VoiceCraft/z_scripts/e830M_ft.sh`.
+You also need to do steps 1-4 as in Training, and I recommend using AdamW for better stability when finetuning a pretrained model. Check out the script `./z_scripts/e830M_ft.sh`.
If your dataset introduces new phonemes (which is very likely) that don't exist in the giga checkpoint, make sure you combine the original phonemes with the phonemes from your data when constructing the vocab. You also need to adjust `--text_vocab_size` and `--text_pad_token` so that the former is greater than or equal to your vocab size, and the latter has the same value as `--text_vocab_size` (i.e. `--text_pad_token` is always the last token). Also, since the text embedding is now of a different size, make sure you modify the weight-loading code so that it won't crash (you could skip loading `text_embedding`, or load only the existing part and randomly initialize the new entries).
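
For the weight-loading change described above, here is a minimal sketch of a shape-tolerant partial load. The nesting of weights under a `model` key and the row-wise copy for the embedding are assumptions about the checkpoint layout, not taken from the repo; adjust to the actual model.

```python
import torch
import torch.nn as nn

def load_pretrained_partial(model: nn.Module, ckpt_path: str) -> None:
    """Copy pretrained weights into `model`, tolerating a resized text embedding."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    pretrained = ckpt.get("model", ckpt)  # assumption: weights may be nested under "model"
    state = model.state_dict()
    for name, tensor in pretrained.items():
        if name not in state:
            continue  # parameter renamed or removed in the new model
        if state[name].shape == tensor.shape:
            state[name] = tensor
        else:
            # e.g. the text embedding grew to fit the new phonemes: copy the
            # rows for the original vocab, keep random init for the new entries
            rows = min(state[name].shape[0], tensor.shape[0])
            state[name][:rows] = tensor[:rows]
    model.load_state_dict(state)
```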


@@ -11,7 +11,7 @@ exp_root="path/to/store/exp_results"
exp_name=e830M_ft
dataset_dir="path/to/stored_extracted_codes_and_phonemes/xl" # use xs if you only extracted xs in the previous step
encodec_codes_folder_name="encodec_16khz_4codebooks"
load_model_from="/home/pyp/VoiceCraft/pretrained_models/giga830M.pth"
load_model_from="./pretrained_models/giga830M.pth"
# export CUDA_LAUNCH_BLOCKING=1 # for debugging
@@ -34,7 +34,7 @@ torchrun --nnodes=1 --rdzv-backend=c10d --rdzv-endpoint=localhost:41977 --nproc_
--nhead 16 \
--num_decoder_layers 16 \
--max_num_tokens 20000 \
---gradient_accumulation_steps 20 \
+--gradient_accumulation_steps 12 \
--val_max_num_tokens 6000 \
--num_buckets 6 \
--audio_max_length 20 \
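
As a sanity check on the `--gradient_accumulation_steps` change (20 → 12): with token-based batching, the effective number of tokens per optimizer update is roughly the per-GPU token cap times the accumulation steps times the GPU count. A back-of-the-envelope sketch follows; the 4-GPU figure is an assumption and depends on `--nproc_per_node`.

```python
# Rough effective-batch arithmetic for the flags above.
max_num_tokens = 20_000  # per-GPU token cap per forward pass (--max_num_tokens)
grad_accum_steps = 12    # reduced from 20 in this commit
num_gpus = 4             # assumption; set by --nproc_per_node

tokens_per_update = max_num_tokens * grad_accum_steps * num_gpus
print(f"~{tokens_per_update:,} tokens per optimizer update")  # ~960,000
```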