Mirror of https://github.com/KoboldAI/KoboldAI-Client.git, synced 2025-06-05 21:59:24 +02:00
Update readme, clean up gitmodules file
.gitmodules (vendored, 8 lines changed)
@@ -4,11 +4,3 @@
 [submodule "KoboldAI-Horde-Bridge"]
 	path = KoboldAI-Horde-Bridge
 	url = https://github.com/db0/KoboldAI-Horde-Bridge
-[submodule "repos/gptq"]
-	path = repos/gptq
-	url = https://github.com/0cc4m/GPTQ-for-LLaMa
-	branch = a8303654c200c25577130466e5f9bc1e70fc8a50
-[submodule "repos/hf_bleeding_edge"]
-	path = repos/hf_bleeding_edge
-	url = https://github.com/0cc4m/hf_bleeding_edge
-	branch = b5d0b80c6947605b9ccf080fc17b68a516ea5857
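Note that removing these entries from .gitmodules does not delete the submodules from clones that already initialized them; git leaves the checked-out trees and cached repository data in place. A minimal cleanup sketch for such a clone (hypothetical, not part of this commit), driving git from Python:

```python
import shutil
import subprocess

# Submodules dropped from .gitmodules by this commit; an older clone
# may still have them registered and checked out.
STALE = ["repos/gptq", "repos/hf_bleeding_edge"]

for path in STALE:
    # Unregister the submodule from .git/config and empty its worktree.
    subprocess.run(["git", "submodule", "deinit", "-f", path], check=False)
    # Drop the cached repository data git keeps under .git/modules.
    shutil.rmtree(f".git/modules/{path}", ignore_errors=True)
```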
README.md

@@ -42,16 +42,14 @@ Put your 4bit quantized .pt or .safetensors in that folder with all associated .
 
 Then move your model folder to KoboldAI/models, and rename the .pt or .safetensors file in your model folder to `4bit.pt` or `4bit.safetensors` for non-groupsize models or `4bit-<groupsize>g.pt` or `4bit-<groupsize>.safetensors` for a groupsize mode (Example: `4bit-128g.safetensors`)
 
-So - your .pt's model folder should look like this: "4bit.pt, config.json, generation_config.json, pytorch_model.bin.index.json, special_tokens_map.json, tokenizer.model, tokenizer_config.json" Note: the 4bit.pt file can be in the same folder as the regular HF .bin files it was quantized from, so long as the 4-bit toggle switch is on, it'll load the quantized model (4-bit switch explained below).
+So - your .pt's model folder should look like this: "4bit.pt, config.json, generation_config.json, pytorch_model.bin.index.json, special_tokens_map.json, tokenizer.model, tokenizer_config.json" Note: the 4bit.pt file can be in the same folder as the regular HF .bin files it was quantized from, it'll load the quantized model.
 
 #### Running KoboldAI and loading 4bit models
 
 If you haven't done so already, exit the command prompt/leave KAI's conda env. (Close the commandline window on Windows, run `exit` on Linux)
 
 Run `play.bat` [windows], `play.sh` [linux Nvidia], or `play-rocm.sh` [linux AMD]
 
-Switch to UI2, then load your model and be sure 4-bit toggle is on.
+Switch to UI2, then load your model.
 
-The 4bit toggle shows when a model to load is selected.
-
 ## KoboldAI - Your gateway to GPT writing
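The rename rule in the hunk above is mechanical: `4bit<ext>` for non-groupsize models, `4bit-<groupsize>g<ext>` otherwise. A short sketch of that convention (a hypothetical helper, not code from the repository):

```python
import os

def quantized_name(ext, groupsize=None):
    """Return the filename the 4-bit loader looks for.

    ext is ".pt" or ".safetensors"; groupsize=None means a
    non-groupsize model (plain 4bit.pt / 4bit.safetensors).
    """
    return f"4bit{ext}" if groupsize is None else f"4bit-{groupsize}g{ext}"

# Example: rename a quantized checkpoint inside its model folder.
src = "models/my-model/my-model-4bit-128g.safetensors"  # hypothetical path
dst = os.path.join(os.path.dirname(src), quantized_name(".safetensors", 128))
os.rename(src, dst)  # -> models/my-model/4bit-128g.safetensors
```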