Commit Graph

210 Commits

Author SHA1 Message Date
0cc4m
09bb1021dd Fallback to transformers if hf_bleeding_edge not available 2023-07-23 07:16:52 +02:00
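The fallback in 09bb1021dd (prefer the hf_bleeding_edge fork, drop back to stock transformers when it is missing) is the standard try-the-preferred-import-first pattern. A minimal generic sketch; `import_first_available` is a hypothetical helper, not code from the repository:

```python
import importlib


def import_first_available(*names):
    """Return the first importable module from `names`, preferred first."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    raise ImportError(f"none of {names} could be imported")


# As in the commit: preferred fork first, stock library as the fallback.
# hf = import_first_available("hf_bleeding_edge", "transformers")
```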
0cc4m
748e5ef318 Add sliders for exllama context size and related methods 2023-07-23 07:11:28 +02:00
0cc4m
7516ecf00d Merge upstream changes, fix conflict 2023-07-19 07:02:29 +02:00
0cc4m
9aa6c5fbbf Merge upstream changes, fix conflict, adapt backends to changes 2023-07-19 06:56:09 +02:00
Henk
22e7baec52 Permit CPU layers on 4-bit (Worse than GGML) 2023-07-18 21:44:34 +02:00
Henk
5bbcdc47da 4-bit on Colab 2023-07-18 01:48:01 +02:00
henk717
da9226fba5 Merge pull request #401 from ebolam/Model_Plugins
Save the 4-bit flag to the model settings.
2023-07-18 01:19:43 +02:00
somebody
1637760fa1 Delete basic 4bit
And add code to handle dangling __pycache__s
2023-07-17 18:16:03 -05:00
somebody
23b95343bd Patches: Make lazyload work on quantized
i wanna watch youtube while my model is loading without locking up my
system >:(
2023-07-17 16:47:31 -05:00
ebolam
b9ee6e336a Save the 4-bit flag to the model settings. 2023-07-17 09:50:03 -04:00
Alephrin
145a43a000 Removed extra load_in_4bit. 2023-07-17 04:53:47 -06:00
Alephrin
e9913d657a Speeds up bnb 4bit with a custom BitsAndBytesConfig
With this BitsAndBytesConfig I get about double the speed compared to running without it. (Tested on llama 13B with a 3090)
2023-07-17 04:43:43 -06:00
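Commit e9913d657a credits the roughly 2x speedup to a custom BitsAndBytesConfig; the log does not show the exact settings, but a typical fast 4-bit configuration (nf4 quantization with bfloat16 compute and double quantization — assumptions here, not the commit's verified values) looks like this configuration fragment:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed settings; nf4 + bf16 compute is a common fast combination.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "some/model-id",  # placeholder model identifier
    quantization_config=quant_config,
)
```

Passing an explicit `bnb_4bit_compute_dtype` matters most here: without it, bitsandbytes computes in float32, which is typically much slower.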
Henk
8bef2e5fef Fixes 16-bit if BnB is not installed 2023-07-16 02:02:58 +02:00
0cc4m
e78361fc8f Pull upstream changes, fix conflicts 2023-07-15 23:01:52 +02:00
Henk
0622810bc4 Better way of doing the if statement 2023-07-15 20:00:29 +02:00
Henk
23a104a4fe Only show 4-bit toggle on valid model 2023-07-15 19:42:26 +02:00
Henk
71b6e8d6d4 Fix accidental parameters overwrite 2023-07-15 19:35:40 +02:00
Henk
c43d60772b BnB dependency check 2023-07-15 18:56:13 +02:00
Henk
160effb9ea Add 4-bit BnB toggle 2023-07-15 18:20:10 +02:00
Henk
2c50d5d092 Don't ruin breakmodel 2023-07-15 14:14:06 +02:00
Henk
1f045110a4 Basic 4-bit backend 2023-07-15 02:49:31 +02:00
onesome
afa8766ea6 Add is_valid 2023-07-14 18:01:18 -05:00
somebody
f67cb7fa05 Make basic hf independent from hf 2023-07-12 18:36:30 -05:00
somebody
d17ce8461d Use device_map="auto" 2023-07-12 17:27:48 -05:00
somebody
60473d4c23 Fix and add some documentation to basic hf backend 2023-07-12 17:16:05 -05:00
onesome
8077d6c3f9 Self-contained sampler patch (Don't merge)
Completely untested 3:00 AM code; beware! I will test and add more
documentation tomorrow.
2023-07-12 03:22:43 -05:00
somebody
20b4b4bcef Add basic hf backend 2023-07-08 17:12:16 -05:00
somebody
3928d86339 Fall back to unpatched HF 2023-07-08 14:36:45 -05:00
somebody
c2ee30af32 Add --panic to raise when loading fails 2023-07-08 14:04:46 -05:00
Henk
16240878bc Restore --peft support 2023-07-04 20:42:29 +02:00
somebody
bce1a907e5 Update aux device to depend on primary device 2023-07-03 19:36:31 -05:00
somebody
6f7e6422ef Actually get correct primary device 2023-07-03 19:04:48 -05:00
somebody
59c731f805 Fix static primary_device
and some small cleanup
2023-07-03 18:37:48 -05:00
Henk
81e72329af CPU fixes 2023-07-02 21:50:23 +02:00
0cc4m
0e4b6571d5 Fix non-tuple return from gptq function 2023-06-28 22:50:04 +02:00
0cc4m
c753671ac1 Add exllama superhot positional embeddings compression support 2023-06-27 07:39:37 +02:00
Henk
1da4580e8b Remove wrong usegpu behavior 2023-06-22 07:07:02 +02:00
somebody
5ee20bd7d6 Fix for CPU loading 2023-06-21 21:18:43 -05:00
somebody
b81f61b820 Clean debug 2023-06-21 18:35:56 -05:00
somebody
947bcc58e4 Experiments 2023-06-21 17:33:14 -05:00
somebody
0012158eac Remove old 2023-06-21 16:58:59 -05:00
somebody
6bdcf2645e Merge branch 'united' of https://github.com/henk717/KoboldAI into accelerate-offloading 2023-06-21 16:58:39 -05:00
somebody
c40649a74e Probably fix f32 2023-06-21 16:54:41 -05:00
somebody
aca2b532d7 Remove debug 2023-06-21 14:15:38 -05:00
somebody
5f224e1366 Restore choice of lazyload or not 2023-06-21 14:13:14 -05:00
somebody
0052ad401a Basic breakmodel ui support
Seems to work
2023-06-21 13:57:32 -05:00
Henk
bbecdaeedb Silently disable MTJ when Jax is not installed 2023-06-21 17:08:45 +02:00
0cc4m
e8741a1b57 Disable scaled_dot_product_attention if torch version < 2 2023-06-20 09:19:43 +02:00
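The guard in e8741a1b57 works because `torch.nn.functional.scaled_dot_product_attention` first shipped in PyTorch 2.0, so checking the major component of `torch.__version__` is sufficient. A minimal sketch of such a check (the helper name is illustrative):

```python
def supports_sdpa(torch_version: str) -> bool:
    """scaled_dot_product_attention was introduced in PyTorch 2.0."""
    major = torch_version.split(".")[0]
    return major.isdigit() and int(major) >= 2


# In practice: use_sdpa = supports_sdpa(torch.__version__)
```

Version strings like "1.13.1+cu117" carry a local build suffix, which is why only the leading major component is parsed.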
0cc4m
a191855b37 Track token generation progress 2023-06-19 19:14:26 +02:00
0cc4m
e874f0c1c2 Add token streaming support for exllama 2023-06-19 19:14:26 +02:00