clearer tts instruction

2025-02-18 20:50:37 +01:00 · 2024-03-30 12:45:26 -07:00 · 2024-03-30 12:45:26 -07:00 · a6a67899a8
commit a6a67899a8
parent 741a6559e9
3 changed files with 45 additions and 12 deletions
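This commit adds a cell to `inference_speech_editing.ipynb` that downloads the Montreal Forced Aligner (MFA) `english_us_arpa` dictionary and acoustic model. The cell uses a `!` shell command. Each `!` line runs in a fresh subshell, so whether `source ~/.bashrc && conda activate voicecraft` works there depends on your shell and conda setup. A minimal sketch of the same step driven from Python follows. It is not part of this commit. It assumes the `mfa` executable is already on `PATH` (for example, because the `voicecraft` conda environment is active). The helper names `mfa_download_commands` and `download_mfa_models` are hypothetical.

```python
import shutil
import subprocess

# Model types and names taken from the notebook cell added in this commit.
MFA_MODELS = (
    ("dictionary", "english_us_arpa"),
    ("acoustic", "english_us_arpa"),
)


def mfa_download_commands(models=MFA_MODELS):
    """Build one `mfa model download <type> <name>` argument list per model (hypothetical helper)."""
    return [["mfa", "model", "download", kind, name] for kind, name in models]


def download_mfa_models(models=MFA_MODELS):
    """Run each download command in turn (hypothetical helper).

    Assumes `mfa` is on PATH, e.g. because the voicecraft conda env is active.
    Raises CalledProcessError if any download command exits non-zero.
    """
    if shutil.which("mfa") is None:
        raise RuntimeError("`mfa` not found on PATH; activate the voicecraft conda env first")
    for cmd in mfa_download_commands(models):
        subprocess.run(cmd, check=True)
```

Passing argument lists to `subprocess.run` avoids shell quoting entirely. `check=True` stops at the first failed download instead of silently continuing.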
--- a/README.md
+++ b/README.md
@@ -10,6 +10,24 @@ To clone or edit an unseen voice, VoiceCraft needs only a few seconds of referen
 ## News
 :star: 03/28/2024: Model weights are up on HuggingFace🤗 [here](https://huggingface.co/pyp1/VoiceCraft/tree/main)!

+## TODO
+- [x] Codebase upload
+- [x] Environment setup
+- [x] Inference demo for speech editing and TTS
+- [x] Training guidance
+- [x] RealEdit dataset and training manifest
+- [x] Model weights (both 330M and 830M, the former seems to be just as good)
+- [ ] Write colab notebooks for better hands-on experience
+- [ ] HuggingFace Spaces demo
+- [ ] Better guidance on training
+
+## How to run TTS inference
+There are two ways:
+1. With Docker: see [QuickStart](#quickstart).
+2. Without Docker: see [Environment setup](#environment-setup).
+
+Once you are inside the Docker container, or have installed all dependencies, check out [`inference_tts.ipynb`](./inference_tts.ipynb).
+
 ## QuickStart
 :star: To try out TTS inference with VoiceCraft, the best way is using Docker. Thanks to [@ubergarm](https://github.com/ubergarm) and [@jayc88](https://github.com/jay-c88) for making this happen. 

@@ -43,18 +61,6 @@ nvidia-smi
 echo GOOD LUCK
 ```

-## TODO
-- [x] Codebase upload
-- [x] Environment setup
-- [x] Inference demo for speech editing and TTS
-- [x] Training guidance
-- [x] RealEdit dataset and training manifest
-- [x] Model weights (both 330M and 830M, the former seems to be just as good)
-- [ ] Write colab notebooks for better hands-on experience
-- [ ] HuggingFace Spaces demo
-- [ ] Better guidance on training
-
-
 ## Environment setup
 ```bash
 conda create -n voicecraft python=3.9.16
--- a/inference_speech_editing.ipynb
+++ b/inference_speech_editing.ipynb
@@ -39,6 +39,19 @@
    "from models import voicecraft\n"
   ]
  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# install MFA models and dictionaries if you haven't done so already\n",
+    "!source ~/.bashrc && \\\n",
+    "    conda activate voicecraft && \\\n",
+    "    mfa model download dictionary english_us_arpa && \\\n",
+    "    mfa model download acoustic english_us_arpa"
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": 3,
--- a/inference_tts.ipynb
+++ b/inference_tts.ipynb
@@ -11,6 +11,13 @@
    "Run the next cells one at a time up until the *STOP* and follow those instructions before continuing. You only have to do this the first time to set up the container."
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Only run the cells below if you are using Docker"
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": null,
@@ -93,6 +100,13 @@
    "Now you can run the rest of the notebook and get an audio sample output. It will automatically download more models and such. The next time you use this container, you can just start below here as the dependencies will remain available until you delete the docker container."
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Only run the cells above if you are using Docker"
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": null,