0cc4m
3d4d5df76b
Remove rocm wheel because it didn't work correctly
2023-05-13 20:33:13 +02:00
0cc4m
7f7b350741
Catch further errors during multigpu 4bit setup
2023-05-13 20:31:01 +02:00
0cc4m
266c0574f6
Fix 4bit pt loading, add traceback output to GPT2 fallback
2023-05-13 20:15:11 +02:00
0cc4m
a2d01bb9e4
Update to GPTQ module 0.0.2, add support for upstream cuda quantizations, automatic detection
2023-05-09 22:20:35 +02:00
0cc4m
6121598142
Fix multigpu loading without lazy-loader
2023-05-08 22:57:09 +02:00
0cc4m
4f94247910
Fix chat mode empty generation error
2023-05-08 22:56:17 +02:00
0cc4m
e55a9d31c2
Update readme, clean up gitmodules file
2023-05-08 22:55:59 +02:00
0cc4m
6b4d3218d6
Fix OOM when loading large model split across GPUs
2023-05-07 06:55:51 +02:00
0cc4m
51e6dcdcd4
Revert accidental install_requirements change
2023-05-07 06:42:32 +02:00
0cc4m
9ec50c9972
Fix 4-bit MPT
2023-05-06 21:58:23 +02:00
0cc4m
a9fa199c49
Rename gptq module, pull fix
2023-05-06 21:30:33 +02:00
0cc4m
4a14c6a446
Merge pull request #10 from 0cc4m/model-structure-update
Model structure update
2023-05-06 20:55:16 +02:00
0cc4m
2f7856f0d1
Use GPTQ python module, add MPT quantized support
2023-05-06 20:52:42 +02:00
Henk
dedf2afeb3
More max_context_length flexibility
2023-05-05 20:09:51 +02:00
0cc4m
43b0afc7a8
Add safe MPT support
2023-05-05 20:07:10 +02:00
0cc4m
4180620999
Remove unnecessary changes, move gptq detection function to 4bit.py
2023-05-04 19:52:56 +02:00
0cc4m
d48fedcbfb
Fix llama 4-bit loading error
2023-05-04 18:31:37 +02:00
0cc4m
ef358fdf5a
Merge remote-tracking branch 'origin/united' into model-structure-update
2023-05-04 07:31:13 +02:00
0cc4m
1166c07bc3
Merge latestgptq, fix conflicts
2023-05-04 07:30:49 +02:00
Henk
a87d5d6f23
Remove HF's llama workaround
2023-05-03 20:18:40 +02:00
henk717
7f5242db17
Merge pull request #344 from pi6am/fix/llama-tokens
Fix/llama tokens
2023-05-03 19:07:47 +02:00
Llama
35d344b951
Remove torch dependency and make dim0 workaround more generic
Remove torch dependency from hf.py
Make workaround for dimension zero values of token_ids
more generic to handle every token, not just newlines.
2023-05-03 09:48:16 -07:00
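A minimal sketch of the torch-free dim0 normalization this commit describes (hypothetical helper name; the actual code in hf.py may differ):

```python
def normalize_token_ids(token_ids):
    """Coerce token_ids to a plain list of ints without importing torch.

    Zero-dimensional tensors/arrays (which can show up for any single
    token, not just newlines) expose .item() and report ndim == 0;
    unwrap those generically instead of special-casing torch types.
    """
    if isinstance(token_ids, int):
        return [token_ids]
    if getattr(token_ids, "ndim", None) == 0:
        # dim0 tensor or numpy scalar: unwrap via the generic .item()
        return [int(token_ids.item())]
    return [int(t) for t in token_ids]
```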
henk717
11a9f562a2
Merge pull request #346 from ebolam/united
More UI2 paste error fixes
2023-05-03 18:33:58 +02:00
0cc4m
58f0a336cb
Merge upstream changes, fix conflict
2023-05-03 18:33:11 +02:00
ebolam
0c9537e910
Potential fix for putting pasted text in the wrong action
2023-05-03 12:04:05 -04:00
ebolam
fa3611b994
Update to United
2023-05-03 10:54:17 -04:00
henk717
f958f086f1
Merge pull request #343 from LostRuins/united
Added v27 of Embedded Kobold Lite, which will now be usable locally.
2023-05-03 12:30:39 +02:00
Llama
3768848548
Fix tokenization and whitespace issues with llama-derived models
Work around the 'soft' prefix space behavior of sentencepiece.
Override encode to restore the deleted HF support for decode_with_prefix_space.
Override decode to skip the soft space and return true decoded tokens.
Allow submitting chat messages with embedded newlines.
Split sentences between punctuation and whitespace, rather than after whitespace.
Also include trailing quotes and brackets after sentence stoppers.
This avoids splitting ." and .) into two tokens, for instance.
Insert whitespace at the beginning of the author's note, since sentences are
split with leading whitespace.
Remove spurious newlines at the end of chat responses.
2023-05-03 01:27:11 -07:00
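One way the soft-prefix-space workaround described above can be sketched (illustrative only; `tokenizer` is any sentencepiece-backed HF tokenizer, and the sentinel trick is an assumption rather than the exact code from this commit):

```python
def decode_true(tokenizer, token_ids):
    """Decode token_ids while neutralizing sentencepiece's 'soft' prefix
    space, which otherwise swallows or invents leading whitespace.

    Decodes a sentinel token together with the real tokens, then strips
    the sentinel's own text, so the soft space is attributed to the
    sentinel rather than to the caller's tokens.
    """
    sentinel = tokenizer.bos_token_id  # assumed present on llama tokenizers
    decoded = tokenizer.decode([sentinel] + list(token_ids))
    prefix = tokenizer.decode([sentinel])
    return decoded[len(prefix):]
```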
Concedo
063131a2e6
Added v27 of Embedded Kobold Lite, which will now be usable locally.
Avoid modifying this file directly since it will be overwritten in future versions - submit changes to the Lite repo instead.
2023-05-03 14:53:06 +08:00
Llama
507da6fcf7
Merge pull request #30 from henk717/united
Merge large refactor from united.
2023-05-02 21:25:47 -07:00
Henk
5d1ee39250
Fix loadmodelsettings
2023-05-03 04:21:37 +02:00
henk717
724ba43dc1
Merge pull request #342 from one-some/model-structure-and-maybe-rwkv
Move overrides to better places
2023-05-03 03:34:17 +02:00
somebody
4b3b240bce
Move loadmodelsettings
2023-05-02 20:33:37 -05:00
somebody
a0f4ab5c6a
Move bad token grabber until after newlinemode has been deduced
2023-05-02 20:23:36 -05:00
somebody
efe268df60
Move overrides to better places
2023-05-02 20:18:33 -05:00
Henk
480919a2a7
Nicer way of serving lite
2023-05-03 01:16:02 +02:00
Henk
03e10bed82
/lite (Not functional yet)
2023-05-03 01:04:51 +02:00
Henk
de7b760048
Typo Fix
2023-05-03 01:02:50 +02:00
0cc4m
dd6644aaf0
Pytorch 2.0 (#18)
* Update huggingface.yml to Pytorch 2.0 and CUDA 11.8
* Update github docs pip wheel hub
* Update ROCm requirements
* Add rocm wheel
2023-05-02 22:11:28 +02:00
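A quick sanity check for the toolchain bump above (illustrative, not part of the repo):

```python
import torch

# Illustrative post-upgrade check: confirm the Pytorch 2.0 / CUDA 11.8
# combination described in the commit above is actually installed.
assert torch.__version__.startswith("2."), torch.__version__
print(torch.version.cuda)  # expected to print "11.8" on the CUDA build
```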
0cc4m
9c3d578d6c
Work on model download support
2023-05-02 21:32:20 +02:00
henk717
50c9ed3af1
Merge pull request #299 from one-some/model-structure-and-maybe-rwkv
Structure changes
2023-05-02 18:07:09 +02:00
somebody
111028642e
Fix tokenizer fallback for llama
2023-05-01 19:42:52 -05:00
somebody
f6b5548131
Support safetensors in get_sharded_checkpoint_num_tensors
2023-05-01 19:15:27 -05:00
somebody
97e84928ba
Download all shards correctly on aria2 and raise on bad load key
2023-05-01 18:53:36 -05:00
somebody
933dbd634a
HFInferenceModel: Make badwordsids not unique to torch
2023-05-01 17:13:33 -05:00
somebody
c95be636a4
Merge branch 'united' of https://github.com/henk717/KoboldAI into model-structure-and-maybe-rwkv
2023-05-01 17:08:20 -05:00
somebody
ce3d465972
Remove some debug
2023-05-01 17:03:34 -05:00
ebolam
5a32159e58
Remove debug prints
2023-05-01 10:53:02 -04:00
ebolam
137d056cb3
Fix for pasting text in the middle of an action
2023-05-01 10:48:45 -04:00
0cc4m
f83a0aa122
Merge latest changes, fix conflict
2023-05-01 08:01:54 +02:00