weights, notebook working
parent a129883910
commit ac73066eb7

@@ -15,6 +15,8 @@ thumbs.db
 *.png
 *.wav
 *.mp3
+*.pth
+*.th
 
 *durip*
 *rtx*

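As a rough illustration of what the ignore patterns in this hunk cover, the sketch below matches bare file names against them with Python's `fnmatch`. This only approximates Git's own matching for simple, slash-free patterns; the helper name `is_ignored` and the example file names are hypothetical and do not come from the repository.

```python
from fnmatch import fnmatch

# Patterns visible in the hunk above (the surrounding .gitignore lines are not shown).
PATTERNS = ["*.png", "*.wav", "*.mp3", "*.pth", "*.th", "*durip*", "*rtx*"]


def is_ignored(filename: str) -> bool:
    """Return True if the bare file name matches any pattern.

    Approximation only: directories, negation, and anchoring rules
    of real .gitignore files are not handled here.
    """
    return any(fnmatch(filename, pattern) for pattern in PATTERNS)


if __name__ == "__main__":
    # Hypothetical file names, chosen only to exercise the patterns.
    for name in ["model.pth", "codec.th", "README.md", "train.py"]:
        print(name, is_ignored(name))
```

Under this approximation, checkpoint files ending in `.pth` or `.th` stay out of version control, which fits the commit's note that weights are hosted elsewhere.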
@@ -1,11 +1,15 @@
 # VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
 [Demo](https://jasonppy.github.io/VoiceCraft_web) [Paper](https://jasonppy.github.io/assets/pdfs/VoiceCraft.pdf)
 
+
 ### TL;DR
 VoiceCraft is a token infilling neural codec language model, that achieves state-of-the-art performance on both **speech editing** and **zero-shot text-to-speech (TTS)** on in-the-wild data including audiobooks, internet videos, and podcasts.
 
 To clone or edit an unseen voice, VoiceCraft needs only a few seconds of reference.
 
+## News
+:star: 03/28/2024: Model weights are up on HuggingFace🤗 [here](https://huggingface.co/pyp1/VoiceCraft/tree/main)!
+
 
 ## TODO
 The TODOs left will be completed by the end of March 2024.
@@ -13,8 +17,9 @@ The TODOs left will be completed by the end of March 2024.
 - [x] Environment setup
 - [x] Inference demo for speech editing and TTS
 - [x] Training guidance
-- [x] Upload the RealEdit dataset and training manifest
-- [ ] Upload model weights (encodec weights are up)
+- [x] RealEdit dataset and training manifest
+- [x] Model weights (both 330M and 830M, the former seems to be just as good but way faster)
+- [ ] More
 
 
 ## Environment setup

@@ -1,12 +1,12 @@
 Begin,End,Label,Type,Speaker
 0.03,0.18,but,words,temp
 0.18,0.32,when,words,temp
-0.32,0.49,i,words,temp
-0.49,0.64,had,words,temp
+0.32,0.48,i,words,temp
+0.48,0.64,had,words,temp
 0.64,1.19,approached,words,temp
 1.22,1.58,so,words,temp
-1.58,1.9,near,words,temp
-1.9,2.07,to,words,temp
+1.58,1.91,near,words,temp
+1.91,2.07,to,words,temp
 2.07,2.42,them,words,temp
 2.53,2.61,the,words,temp
 2.61,3.01,common,words,temp
@@ -19,8 +19,8 @@ Begin,End,Label,Type,Speaker
 5.54,6.0,not,words,temp
 6.0,6.14,by,words,temp
 6.14,6.67,distance,words,temp
-6.79,7.06,any,words,temp
-7.06,7.18,of,words,temp
+6.79,7.05,any,words,temp
+7.05,7.18,of,words,temp
 7.18,7.34,its,words,temp
 7.34,7.87,marks,words,temp
 0.03,0.06,B,phones,temp
@@ -29,22 +29,22 @@ Begin,End,Label,Type,Speaker
 0.18,0.23,W,phones,temp
 0.23,0.27,EH1,phones,temp
 0.27,0.32,N,phones,temp
-0.32,0.49,AY1,phones,temp
-0.49,0.5,HH,phones,temp
-0.5,0.6,AE1,phones,temp
+0.32,0.48,AY1,phones,temp
+0.48,0.49,HH,phones,temp
+0.49,0.6,AE1,phones,temp
 0.6,0.64,D,phones,temp
 0.64,0.7,AH0,phones,temp
 0.7,0.83,P,phones,temp
-0.83,0.87,R,phones,temp
-0.87,0.99,OW1,phones,temp
+0.83,0.88,R,phones,temp
+0.88,0.99,OW1,phones,temp
 0.99,1.12,CH,phones,temp
 1.12,1.19,T,phones,temp
 1.22,1.4,S,phones,temp
 1.4,1.58,OW1,phones,temp
 1.58,1.7,N,phones,temp
 1.7,1.84,IH1,phones,temp
-1.84,1.9,R,phones,temp
-1.9,2.01,T,phones,temp
+1.84,1.91,R,phones,temp
+1.91,2.01,T,phones,temp
 2.01,2.07,AH0,phones,temp
 2.07,2.13,DH,phones,temp
 2.13,2.3,EH1,phones,temp
@@ -75,8 +75,8 @@ Begin,End,Label,Type,Speaker
 4.34,4.42,D,phones,temp
 4.42,4.45,IH0,phones,temp
 4.45,4.59,S,phones,temp
-4.59,4.8,IY1,phones,temp
-4.8,4.87,V,phones,temp
+4.59,4.79,IY1,phones,temp
+4.79,4.87,V,phones,temp
 4.87,4.97,Z,phones,temp
 5.04,5.12,L,phones,temp
 5.12,5.33,AO1,phones,temp
@@ -96,14 +96,14 @@ Begin,End,Label,Type,Speaker
 6.57,6.67,S,phones,temp
 6.79,6.89,EH1,phones,temp
 6.89,6.95,N,phones,temp
-6.95,7.06,IY0,phones,temp
-7.06,7.13,AH0,phones,temp
+6.95,7.05,IY0,phones,temp
+7.05,7.13,AH0,phones,temp
 7.13,7.18,V,phones,temp
 7.18,7.22,IH0,phones,temp
 7.22,7.29,T,phones,temp
 7.29,7.34,S,phones,temp
 7.34,7.39,M,phones,temp
-7.39,7.49,AA1,phones,temp
-7.49,7.58,R,phones,temp
-7.58,7.69,K,phones,temp
-7.69,7.87,S,phones,temp
+7.39,7.5,AA1,phones,temp
+7.5,7.58,R,phones,temp
+7.58,7.7,K,phones,temp
+7.7,7.87,S,phones,temp

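The alignment file in this diff keeps word rows and phone rows in one table, told apart by the `Type` column. As a minimal sketch of how such a file might be read, the snippet below groups rows by tier and lists the phones that fall inside each word's time span. It assumes `Begin` and `End` are times in seconds on a shared clock; the helper names `load_alignment` and `phones_in_word` are hypothetical, and the sample is a short excerpt of the newer timings shown in the diff, not the full file.

```python
import csv
import io
from collections import defaultdict

# Excerpt of the newer rows from the diff (three words and their phones).
SAMPLE = """Begin,End,Label,Type,Speaker
0.18,0.32,when,words,temp
0.32,0.48,i,words,temp
0.48,0.64,had,words,temp
0.18,0.23,W,phones,temp
0.23,0.27,EH1,phones,temp
0.27,0.32,N,phones,temp
0.32,0.48,AY1,phones,temp
0.48,0.49,HH,phones,temp
0.49,0.6,AE1,phones,temp
0.6,0.64,D,phones,temp
"""


def load_alignment(text):
    """Group rows by the Type column, as (begin, end, label) tuples."""
    tiers = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        tiers[row["Type"]].append(
            (float(row["Begin"]), float(row["End"]), row["Label"])
        )
    return tiers


def phones_in_word(word, phones):
    """Return labels of phones whose interval lies inside the word's interval."""
    begin, end, _ = word
    return [label for b, e, label in phones if b >= begin and e <= end]


tiers = load_alignment(SAMPLE)
by_word = {w[2]: phones_in_word(w, tiers["phones"]) for w in tiers["words"]}

if __name__ == "__main__":
    for word, phones in by_word.items():
        print(word, phones)
```

A check like this also shows why the commit changes word and phone rows together: moving the boundary between "i" and "had" from 0.49 to 0.48 in only one tier would leave the `HH` phone straddling two words.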
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long