Commit Graph

4401 Commits

Llama
070cfd339a Strip the eos token from exllama generations.
The end-of-sequence (</s>) token indicates the end of a generation.
When a token sequence containing </s> is decoded, an extra (wrong)
space is inserted at the beginning of the generation. To avoid this,
strip the eos token out of the result before returning it.
The eos token was getting stripped later, so this doesn't change
the output except to avoid the spurious leading space.
2023-08-19 17:40:23 -07:00
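
The fix described above amounts to dropping the eos token before decoding. A minimal sketch, assuming a LLaMA-style tokenizer where </s> is token id 2 (the helper name is illustrative, not the repo's actual code):

    def strip_eos(token_ids, eos_token_id=2):
        # Decoding a sequence that still contains </s> makes the tokenizer
        # insert a spurious leading space, so drop the eos id first.
        return [t for t in token_ids if t != eos_token_id]

    # text = tokenizer.decode(strip_eos(output_ids))
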
0cc4m
973aea12ea Only import big python modules for GPTQ once they get used 2023-07-23 22:07:34 +02:00
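
Deferring a heavy import until first use is a standard pattern; a minimal sketch of the idea (the module and helper names are illustrative):

    _gptq_module = None

    def get_gptq():
        # Pay the import cost only when GPTQ is actually used.
        global _gptq_module
        if _gptq_module is None:
            import gptq as _m  # assumption: placeholder for the real module
            _gptq_module = _m
        return _gptq_module
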
0cc4m
49740aa5ab Fix ntk alpha 2023-07-23 21:56:48 +02:00
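
For context, exllama's alpha parameter follows the usual NTK-aware RoPE scaling, which stretches the rotary base; a sketch under that assumption (default values illustrative):

    def ntk_rope_base(base=10000.0, alpha=1.0, head_dim=128):
        # Larger alpha raises the effective rotary base, extending the
        # usable context window without retraining.
        return base * alpha ** (head_dim / (head_dim - 2))
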
0cc4m
31a984aa3d Automatically install exllama module 2023-07-23 07:33:51 +02:00
0cc4m
a9aa04fd1b Merge remote-tracking branch 'upstream/united' into 4bit-plugin 2023-07-23 07:18:58 +02:00
0cc4m
09bb1021dd Fallback to transformers if hf_bleeding_edge not available 2023-07-23 07:16:52 +02:00
0cc4m
748e5ef318 Add sliders for exllama context size and related methods 2023-07-23 07:11:28 +02:00
Henk
7a5d813b92 Reimplement HF workaround only for llama 2023-07-22 16:59:49 +02:00
Henk
8dd7b93a6c HF's workaround breaks stuff 2023-07-22 16:29:55 +02:00
Henk
fa9d17b3d3 HF 4.31 2023-07-22 15:25:14 +02:00
Henk
7823da564e Link to Lite 2023-07-22 04:04:17 +02:00
henk717
83e5c29260 Merge pull request #413 from one-some/bug-hunt
Fix WI comment editing
2023-07-22 00:34:46 +02:00
somebody
e68972a270 Fix WI comments 2023-07-21 16:14:13 -05:00
Henk
a17d7aae60 Easier English 2023-07-21 19:42:49 +02:00
Henk
da9b54ec1c Don't show API link during load 2023-07-21 19:31:38 +02:00
Henk
432cdc9a08 Fix models with good pad tokens 2023-07-21 16:39:58 +02:00
Henk
ec745d8b80 Don't accidentally block pad tokens 2023-07-21 16:25:32 +02:00
henk717
dc4404f29c Merge pull request #409 from nkpz/bnb8bit
Configurable quantization level, fix for broken toggles in model settings
2023-07-19 14:22:44 +02:00
Nick Perez
9581e51476 feature(load model): select control for quantization level 2023-07-19 07:58:12 -04:00
0cc4m
58908ab846 Revert aiserver.py changes 2023-07-19 07:14:03 +02:00
0cc4m
19f511dc9f Load GPTQ module from GPTQ repo docs 2023-07-19 07:12:37 +02:00
0cc4m
1c5da2bbf3 Move pip docs from KoboldAI into GPTQ repo 2023-07-19 07:08:39 +02:00
0cc4m
7516ecf00d Merge upstream changes, fix conflict 2023-07-19 07:02:29 +02:00
0cc4m
c84d063be8 Revert settings changes 2023-07-19 07:01:11 +02:00
0cc4m
9aa6c5fbbf Merge upstream changes, fix conflict, adapt backends to changes 2023-07-19 06:56:09 +02:00
Nick Perez
0142913060 8 bit toggle, fix for broken toggle values 2023-07-18 23:29:38 -04:00
Henk
22e7baec52 Permit CPU layers on 4-bit (Worse than GGML) 2023-07-18 21:44:34 +02:00
henk717
5f2600d338 Merge pull request #406 from ebolam/Model_Plugins
Clarified message on what's required for model backend parameters
2023-07-18 02:42:23 +02:00
ebolam
66192efdb7 Clarified message on what's required for model backend parameters in the command line 2023-07-17 20:30:41 -04:00
Henk
5bbcdc47da 4-bit on Colab 2023-07-18 01:48:01 +02:00
henk717
da9226fba5 Merge pull request #401 from ebolam/Model_Plugins
Save the 4-bit flag to the model settings.
2023-07-18 01:19:43 +02:00
henk717
fee79928c8 Merge pull request #404 from one-some/united
Delete basic 4bit
2023-07-18 01:19:14 +02:00
somebody
1637760fa1 Delete basic 4bit
And add code to handle dangling __pycache__s
2023-07-17 18:16:03 -05:00
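
Cleaning up the dangling __pycache__ directories such a deletion leaves behind can be as simple as the sketch below (the search root is an assumption, not the repo's actual code):

    import shutil
    from pathlib import Path

    # Remove stale bytecode caches left after the module itself was deleted.
    for cache_dir in Path("modeling").rglob("__pycache__"):
        shutil.rmtree(cache_dir, ignore_errors=True)
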
henk717
5c3a8e295a Merge pull request #402 from one-some/united
Patches: Make lazyload work with quantization
2023-07-17 23:53:14 +02:00
somebody
23b95343bd Patches: Make lazyload work on quantized
I wanna watch YouTube while my model is loading without locking up my
system >:(
2023-07-17 16:47:31 -05:00
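
The general lazy-load pattern behind a change like this is to build the model skeleton without materializing its weights, then stream the real tensors in afterwards; a sketch using accelerate's meta-device helper (not the repo's actual patch, and the model path is illustrative):

    from accelerate import init_empty_weights
    from transformers import AutoConfig, AutoModelForCausalLM

    config = AutoConfig.from_pretrained("path/to/model")
    with init_empty_weights():
        # No RAM is allocated for weights here; they are loaded shard by
        # shard afterwards, keeping the rest of the system responsive.
        model = AutoModelForCausalLM.from_config(config)
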
ebolam
4acf9235db Merge branch 'Model_Plugins' of https://github.com/ebolam/KoboldAI into Model_Plugins 2023-07-17 09:52:10 -04:00
ebolam
b9ee6e336a Save the 4-bit flag to the model settings. 2023-07-17 09:50:03 -04:00
ebolam
66377fc09e Save the 4-bit flag to the model settings. 2023-07-17 09:48:01 -04:00
henk717
e8d84bb787 Merge pull request #400 from ebolam/Model_Plugins
missed the elif
2023-07-17 15:16:34 +02:00
ebolam
eafb699bbf missed the elif 2023-07-17 09:12:45 -04:00
henk717
a3b0c6dd60 Merge pull request #399 from ebolam/Model_Plugins
Update to the upload_file function
2023-07-17 15:11:40 +02:00
ebolam
bfb26ab55d Ban uploading to the modeling directory 2023-07-17 09:05:22 -04:00
ebolam
52e061d0f9 Fix for potential jailbreak 2023-07-17 08:55:23 -04:00
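
A fix like the two commits above usually means canonicalizing the destination path and rejecting anything that escapes the upload root; a hedged sketch (function and directory names are illustrative, not the repo's code):

    import os

    def safe_upload_path(upload_root, filename):
        root = os.path.realpath(upload_root)
        dest = os.path.realpath(os.path.join(root, filename))
        # Reject traversal like "../aiserver.py" that leaves the upload root.
        if os.path.commonpath([root, dest]) != root:
            raise ValueError("upload outside the allowed directory")
        return dest
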
henk717
f7561044c6 Merge pull request #398 from Alephrin/patch-1
Speeds up bnb 4bit with a custom BitsAndBytesConfig
2023-07-17 13:22:44 +02:00
Alephrin
145a43a000 Removed extra load_in_4bit. 2023-07-17 04:53:47 -06:00
Alephrin
e9913d657a Speeds up bnb 4bit with a custom BitsAndBytesConfig
With this BitsAndBytesConfig I get about double the speed compared to running without it. (Tested on llama 13B with a 3090)
2023-07-17 04:43:43 -06:00
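
A custom 4-bit BitsAndBytesConfig of the kind this commit describes typically pins the compute dtype to fp16, which is where most of the speedup comes from; the exact values used in the PR are an assumption:

    import torch
    from transformers import BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,  # fp16 compute: the main speed win
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
    )
    # model = AutoModelForCausalLM.from_pretrained(path, quantization_config=bnb_config)
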
Henk
6d7e9e6771 Post4 BnB for Linux 2023-07-16 02:13:42 +02:00
Henk
8bef2e5fef Fixes 16-bit if BnB is not installed 2023-07-16 02:02:58 +02:00
henk717
fac006125e Merge pull request #397 from ebolam/Model_Plugins
Fixes for model backend UI
2023-07-15 23:58:24 +02:00
0cc4m
e78361fc8f Pull upstream changes, fix conflicts 2023-07-15 23:01:52 +02:00