{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "gpuType": "T4",
      "authorship_tag": "ABX9TyPEhMt0mIcJv2BbaCwogF07",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/Sewlell/VoiceCraft-gradio-colab/blob/master/voicecraft.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Y87ixxsUVIhM"
      },
      "outputs": [],
      "source": [
        "!git clone https://github.com/Sewlell/VoiceCraft-gradio-colab"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# Python dependencies used by VoiceCraft\n",
        "!pip install tensorboard\n",
        "!pip install phonemizer\n",
        "!pip install datasets\n",
        "!pip install torchmetrics\n",
        "\n",
        "# System packages: espeak/festival backends for phonemizer plus audio and build tooling\n",
        "!apt-get install -y espeak espeak-data libespeak1 libespeak-dev\n",
        "!apt-get install -y festival*\n",
        "!apt-get install -y build-essential\n",
        "!apt-get install -y flac libasound2-dev libsndfile1-dev vorbis-tools\n",
        "!apt-get install -y libxml2-dev libxslt-dev zlib1g-dev\n",
        "\n",
        "# audiocraft pinned to the specific commit VoiceCraft expects\n",
        "!pip install -e git+https://github.com/facebookresearch/audiocraft.git@c5157b5bf14bf83449c17ea1eeb66c19fb4bc7f0#egg=audiocraft\n",
        "\n",
        "# Remaining requirements for the Gradio app\n",
        "!pip install -r \"/content/VoiceCraft-gradio-colab/gradio_requirements.txt\""
      ],
      "metadata": {
        "id": "-w3USR91XdxY"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Let the runtime restart; restarting will not wipe the installation you just did."
      ],
      "metadata": {
        "id": "jNuzjrtmv2n1"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Notes before launching `gradio_app.py`\n",
        "\n",
        "***You will get a JSON warning if you change anything besides `sample_batch_size`, `stop_repetition` and `seed`.*** This means most advanced settings, such as `kvcache` and `temperature`, cannot be set to different values.\n",
        "\n",
        "You will get an fp16 compatibility issue if you set `whisper backend` to `whisperX`; for whatever reason, setting `forced alignment model` to `whisperX` doesn't do anything.\n",
        "\n",
        "For some reason, downloading the Output Audio gives you a `.file` file. You will need to **convert the file from .snd to .wav/.mp3 manually**, or, if your system shows file extensions in the name (Windows or wherever you are), simply rename it to \"xxx.wav\" or \"xxx.mp3\"; a rename snippet is provided in the cell below. (Know a proper fix? Open a pull request on my repository.)\n",
        "\n",
        "The frequent VRAM spikes are also gone as of the April 5 update.\n",
        "\n",
        "# **To those who want to do voice cloning**\n",
        "\n",
        "\n",
        "Don't make your input audio too long like in the screenshot; 20s-30s is fine. Anything longer will raise a JSON issue. This is due to how VoiceCraft works, so it is probably unfixable: the text you want to synthesize is appended to the end of the input audio's transcript, and together with the original transcript that becomes far too many words for the app to handle. So please keep it short (a duration-check snippet is included further down)."
      ],
      "metadata": {
        "id": "qQ-And_w2vJV"
      }
    },
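    {
      "cell_type": "markdown",
      "source": [
        "The cell below is an optional helper for the download issue described above: a minimal sketch that copies a downloaded Output Audio file under a `.wav` name, since only the extension is wrong, not the audio data. The path `output.file` is just a placeholder; point it at wherever the file was actually saved (run this locally if the download lives on your own machine)."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import shutil\n",
        "from pathlib import Path\n",
        "\n",
        "# Placeholder path: wherever the Output Audio download ended up.\n",
        "downloaded = Path(\"output.file\")\n",
        "\n",
        "# The data is already ordinary audio; only the extension is wrong,\n",
        "# so a copy under a .wav name is enough for players to recognize it.\n",
        "renamed = downloaded.with_suffix(\".wav\")\n",
        "shutil.copy(downloaded, renamed)\n",
        "print(f\"Saved a playable copy at {renamed}\")"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },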
    {
      "cell_type": "markdown",
      "source": [
        "My tip on voice cloning is simply \"get a good dataset\", one with a reasonable variety of tones. That's about it; you can always experiment with other voices that are harder to clone.\n",
        "\n",
        "The inference speed is much more stable. With the sample text, a T4 (free-tier Colab GPU) takes about 6-10s at a 6s-8s `prompt end time`.\n",
        "\n",
        "I haven't tested the Edit mode yet, as it isn't my focus, but you can try it."
      ],
      "metadata": {
        "id": "nnu2cY4t8P6X"
      }
    },
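    {
      "cell_type": "markdown",
      "source": [
        "As a quick sanity check for the 20s-30s input-audio limit mentioned above, the cell below is a minimal sketch that reports a clip's duration and writes a trimmed 30-second copy if the clip is longer. It assumes `torchaudio` is available (it ships with Colab and is pulled in by audiocraft); `my_voice.wav` and `my_voice_trimmed.wav` are placeholder file names."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import torchaudio\n",
        "\n",
        "# Placeholder path: the reference clip you plan to upload to the app.\n",
        "clip_path = \"my_voice.wav\"\n",
        "\n",
        "waveform, sample_rate = torchaudio.load(clip_path)\n",
        "duration = waveform.shape[1] / sample_rate\n",
        "print(f\"{clip_path}: {duration:.1f}s\")\n",
        "\n",
        "# Keep the prompt within the 20s-30s range recommended above;\n",
        "# write a 30s copy if the clip is longer than that.\n",
        "max_seconds = 30\n",
        "if duration > max_seconds:\n",
        "    trimmed = waveform[:, : max_seconds * sample_rate]\n",
        "    torchaudio.save(\"my_voice_trimmed.wav\", trimmed, sample_rate)\n",
        "    print(f\"Wrote my_voice_trimmed.wav ({max_seconds}s)\")"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },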
    {
      "cell_type": "code",
      "source": [
        "!python \"/content/VoiceCraft-gradio-colab/gradio_app.py\""
      ],
      "metadata": {
        "id": "NDt4r4DiXAwG"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}