Llama
070cfd339a
Strip the eos token from exllama generations.
...
The end-of-sequence (</s>) token marks the end of a generation.
When a token sequence containing </s> is decoded, a spurious space
is inserted at the beginning of the decoded text. To avoid this,
strip the eos token out of the sequence before decoding the result.
The eos token was already being stripped later, so this doesn't change
the output except to remove the spurious leading space.
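A minimal sketch of the fix described above (the names `tokenizer`,
`sequence`, and `eos_token_id` are illustrative, not the actual
exllama backend code):

    # Drop the eos token id before decoding, so the tokenizer never
    # sees </s> and never inserts the spurious leading space.
    eos_id = tokenizer.eos_token_id
    trimmed = [t for t in sequence if t != eos_id]
    text = tokenizer.decode(trimmed)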
2023-08-19 17:40:23 -07:00
0cc4m
973aea12ea
Only import big Python modules for GPTQ once they get used
2023-07-23 22:07:34 +02:00
0cc4m
49740aa5ab
Fix ntk alpha
2023-07-23 21:56:48 +02:00
0cc4m
31a984aa3d
Automatically install exllama module
2023-07-23 07:33:51 +02:00
0cc4m
a9aa04fd1b
Merge remote-tracking branch 'upstream/united' into 4bit-plugin
2023-07-23 07:18:58 +02:00
0cc4m
09bb1021dd
Fall back to transformers if hf_bleeding_edge is not available
2023-07-23 07:16:52 +02:00
0cc4m
748e5ef318
Add sliders for exllama context size and related methods
2023-07-23 07:11:28 +02:00
Henk
7a5d813b92
Reimplement HF workaround only for llama
2023-07-22 16:59:49 +02:00
Henk
8dd7b93a6c
HF's workaround breaks stuff
2023-07-22 16:29:55 +02:00
Henk
fa9d17b3d3
HF 4.31
2023-07-22 15:25:14 +02:00
Henk
7823da564e
Link to Lite
2023-07-22 04:04:17 +02:00
henk717
83e5c29260
Merge pull request #413 from one-some/bug-hunt
...
Fix WI comment editing
2023-07-22 00:34:46 +02:00
somebody
e68972a270
Fix WI comments
2023-07-21 16:14:13 -05:00
Henk
a17d7aae60
Easier English
2023-07-21 19:42:49 +02:00
Henk
da9b54ec1c
Don't show API link during load
2023-07-21 19:31:38 +02:00
Henk
432cdc9a08
Fix models with good pad tokens
2023-07-21 16:39:58 +02:00
Henk
ec745d8b80
Don't accidentally block pad tokens
2023-07-21 16:25:32 +02:00
henk717
dc4404f29c
Merge pull request #409 from nkpz/bnb8bit
...
Configurable quantization level, fix for broken toggles in model settings
2023-07-19 14:22:44 +02:00
Nick Perez
9581e51476
feature(load model): select control for quantization level
2023-07-19 07:58:12 -04:00
0cc4m
58908ab846
Revert aiserver.py changes
2023-07-19 07:14:03 +02:00
0cc4m
19f511dc9f
Load GPTQ module from GPTQ repo docs
2023-07-19 07:12:37 +02:00
0cc4m
1c5da2bbf3
Move pip docs from KoboldAI into GPTQ repo
2023-07-19 07:08:39 +02:00
0cc4m
7516ecf00d
Merge upstream changes, fix conflict
2023-07-19 07:02:29 +02:00
0cc4m
c84d063be8
Revert settings changes
2023-07-19 07:01:11 +02:00
0cc4m
9aa6c5fbbf
Merge upstream changes, fix conflict, adapt backends to changes
2023-07-19 06:56:09 +02:00
Nick Perez
0142913060
8 bit toggle, fix for broken toggle values
2023-07-18 23:29:38 -04:00
Henk
22e7baec52
Permit CPU layers on 4-bit (Worse than GGML)
2023-07-18 21:44:34 +02:00
henk717
5f2600d338
Merge pull request #406 from ebolam/Model_Plugins
...
Clarified message on what's required for model backend parameters
2023-07-18 02:42:23 +02:00
ebolam
66192efdb7
Clarified message on what's required for model backend parameters in the command line
2023-07-17 20:30:41 -04:00
Henk
5bbcdc47da
4-bit on Colab
2023-07-18 01:48:01 +02:00
henk717
da9226fba5
Merge pull request #401 from ebolam/Model_Plugins
...
Save the 4-bit flag to the model settings.
2023-07-18 01:19:43 +02:00
henk717
fee79928c8
Merge pull request #404 from one-some/united
...
Delete basic 4bit
2023-07-18 01:19:14 +02:00
somebody
1637760fa1
Delete basic 4bit
...
And add code to handle dangling __pycache__ directories
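A hedged sketch of that kind of cleanup (the helper name and the
staleness heuristic are assumptions, not the commit's actual code):

    import os
    import shutil

    def remove_dangling_pycache(root):
        # Collect first, then delete, so the walk isn't mutated mid-iteration.
        stale = []
        for dirpath, dirnames, filenames in os.walk(root):
            if os.path.basename(dirpath) != "__pycache__":
                continue
            parent = os.path.dirname(dirpath)
            # A cache is dangling if its parent no longer holds .py sources.
            if not any(f.endswith(".py") for f in os.listdir(parent)):
                stale.append(dirpath)
        for path in stale:
            shutil.rmtree(path, ignore_errors=True)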
2023-07-17 18:16:03 -05:00
henk717
5c3a8e295a
Merge pull request #402 from one-some/united
...
Patches: Make lazyload work with quantization
2023-07-17 23:53:14 +02:00
somebody
23b95343bd
Patches: Make lazyload work on quantized models
...
I wanna watch YouTube while my model is loading without locking up my
system >:(
2023-07-17 16:47:31 -05:00
ebolam
4acf9235db
Merge branch 'Model_Plugins' of https://github.com/ebolam/KoboldAI into Model_Plugins
2023-07-17 09:52:10 -04:00
ebolam
b9ee6e336a
Save the 4-bit flag to the model settings.
2023-07-17 09:50:03 -04:00
ebolam
66377fc09e
Save the 4-bit flag to the model settings.
2023-07-17 09:48:01 -04:00
henk717
e8d84bb787
Merge pull request #400 from ebolam/Model_Plugins
...
missed the elif
2023-07-17 15:16:34 +02:00
ebolam
eafb699bbf
missed the elif
2023-07-17 09:12:45 -04:00
henk717
a3b0c6dd60
Merge pull request #399 from ebolam/Model_Plugins
...
Update to the upload_file function
2023-07-17 15:11:40 +02:00
ebolam
bfb26ab55d
Ban uploading to the modeling directory
2023-07-17 09:05:22 -04:00
ebolam
52e061d0f9
Fix for potential jailbreak
2023-07-17 08:55:23 -04:00
henk717
f7561044c6
Merge pull request #398 from Alephrin/patch-1
...
Speeds up bnb 4bit with a custom BitsAndBytesConfig
2023-07-17 13:22:44 +02:00
Alephrin
145a43a000
Removed extra load_in_4bit.
2023-07-17 04:53:47 -06:00
Alephrin
e9913d657a
Speeds up bnb 4bit with a custom BitsAndBytesConfig
...
With this BitsAndBytesConfig I get about double the speed compared to running without it. (Tested on llama 13B with a 3090)
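For reference, a config of the kind described (the exact fields used in
the commit are an assumption; the usual source of the speedup is setting
a half-precision compute dtype instead of the float32 default):

    import torch
    from transformers import BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,  # float32 default is much slower
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
    )
    # model = AutoModelForCausalLM.from_pretrained(path, quantization_config=quant_config)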
2023-07-17 04:43:43 -06:00
Henk
6d7e9e6771
Post4 BnB for Linux
2023-07-16 02:13:42 +02:00
Henk
8bef2e5fef
Fixes 16-bit if BnB is not installed
2023-07-16 02:02:58 +02:00
henk717
fac006125e
Merge pull request #397 from ebolam/Model_Plugins
...
Fixes for model backend UI
2023-07-15 23:58:24 +02:00
0cc4m
e78361fc8f
Pull upstream changes, fix conflicts
2023-07-15 23:01:52 +02:00