Commit Graph

88 Commits

Author SHA1 Message Date
0cc4m
43b0afc7a8 Add safe MPT support 2023-05-05 20:07:10 +02:00
0cc4m
4180620999 Remove unnecessary changes, move gptq detection function to 4bit.py 2023-05-04 19:52:56 +02:00
0cc4m
d48fedcbfb Fix llama 4-bit loading error 2023-05-04 18:31:37 +02:00
0cc4m
ef358fdf5a Merge remote-tracking branch 'origin/united' into model-structure-update 2023-05-04 07:31:13 +02:00
Henk
a87d5d6f23 Remove HF's llama workaround 2023-05-03 20:18:40 +02:00
Llama
35d344b951 Remove torch dependency and more generic dim0 workaround
Remove torch dependency from hf.py
Make workaround for dimension zero values of token_ids more generic to handle every token, not just newlines.
2023-05-03 09:48:16 -07:00
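
The dimension-zero handling this commit describes can be done in plain Python with no torch import. A minimal sketch under that assumption (the helper name and exact checks are hypothetical, not the actual hf.py code):

```python
def ensure_token_id_list(token_ids):
    """Coerce token_ids to a flat Python list, even when the input is a
    bare int or a zero-dimensional array-like value (hypothetical helper
    illustrating the torch-free workaround described above)."""
    # numpy/torch scalars report ndim == 0 and expose .item().
    if getattr(token_ids, "ndim", None) == 0:
        return [token_ids.item()]
    # A plain Python int has no ndim attribute at all.
    if isinstance(token_ids, int):
        return [token_ids]
    return list(token_ids)
```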
0cc4m
58f0a336cb Merge upstream changes, fix conflict 2023-05-03 18:33:11 +02:00
Llama
3768848548 Fix tokenization and whitespace issues with llama-derived models
Work around the 'soft' prefix space behavior of sentencepiece.
Override encode to restore the deleted HF support for decode_with_prefix_space.
Override decode to skip the soft space and return true decoded tokens.
Allow submitting chat messages with embedded newlines.
Split sentences between punctuation and whitespace, rather than after whitespace.
Also include trailing quotes and brackets after sentence stoppers.
This avoids splitting ." and .) into two tokens, for instance.
Insert whitespace at the beginning of the author's note, since sentences are split with leading whitespace.
Remove spurious newlines at the end of chat responses.
2023-05-03 01:27:11 -07:00
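
For context, the decode override this commit mentions works around the fact that sentencepiece-based tokenizers drop a token's leading "soft" space when it starts the decoded string, so decoding tokens in isolation loses genuine whitespace. A minimal sketch of the usual trick, assuming a Hugging Face llama-style tokenizer and token_ids as a list of ints (the helper name and dummy-prefix approach are illustrative, not KoboldAI's actual override):

```python
def true_decode(tokenizer, token_ids):
    """Decode token_ids without losing a genuine leading space
    (illustrative sketch, not KoboldAI's actual decode override)."""
    # Prepend a throwaway token so the soft-space behavior applies to it
    # instead of the real text, then slice the prefix back off.
    prefix_ids = tokenizer.encode("x", add_special_tokens=False)
    text = tokenizer.decode(prefix_ids + list(token_ids))
    # Everything after the decoded "x" is the faithful decoding,
    # including any genuine leading whitespace of the real tokens.
    return text[text.index("x") + 1:]
```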
henk717
724ba43dc1 Merge pull request #342 from one-some/model-structure-and-maybe-rwkv
Move overrides to better places
2023-05-03 03:34:17 +02:00
somebody
a0f4ab5c6a Move bad token grabber until after newlinemode has been deduced 2023-05-02 20:23:36 -05:00
somebody
efe268df60 Move overrides to better places 2023-05-02 20:18:33 -05:00
Henk
de7b760048 Typo Fix 2023-05-03 01:02:50 +02:00
0cc4m
9c3d578d6c Work on model download support 2023-05-02 21:32:20 +02:00
somebody
111028642e Fix tokenizer fallback for llama 2023-05-01 19:42:52 -05:00
somebody
f6b5548131 Support safetensors in get_sharded_checkpoint_num_tensors 2023-05-01 19:15:27 -05:00
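
Counting tensors in a sharded checkpoint works the same way for both weight formats, because pytorch_model.bin.index.json and model.safetensors.index.json share the same "weight_map" schema. A rough sketch of that idea (not the actual get_sharded_checkpoint_num_tensors body):

```python
import json

def sharded_checkpoint_tensor_names(index_path):
    """Return the tensor names listed in a shard index file; works for
    both .bin and .safetensors indices, which share the same layout."""
    with open(index_path) as f:
        index = json.load(f)
    # "weight_map" maps each tensor name to the shard file containing it.
    return list(index["weight_map"].keys())
```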
somebody
97e84928ba Download all shards correctly on aria2 and raise on bad load key 2023-05-01 18:53:36 -05:00
somebody
933dbd634a HFInferenceModel: Make badwordsids not unique to torch 2023-05-01 17:13:33 -05:00
somebody
ce3d465972 Remove some debug 2023-05-01 17:03:34 -05:00
0cc4m
f83a0aa122 Merge latest changes, fix conflict 2023-05-01 08:01:54 +02:00
0cc4m
aa67135d42 Implement new model format
Remove 4bit toggle
2023-04-30 21:59:22 +02:00
0cc4m
20a5587d66 Always use offloader script, because it speeds up multi gpu 2023-04-30 18:17:43 +02:00
one-some
455b8257a9 Implement softprompt hack 2023-04-28 10:26:59 -05:00
somebody
ace4364339 Two more times 2023-04-27 21:13:26 -05:00
somebody
446f38ee9d One more time 2023-04-27 21:07:34 -05:00
somebody
2eee535540 Actually fix decoding with soft prompts
it really wants a tensor
2023-04-27 21:01:12 -05:00
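
The "really wants a tensor" note points at a common pitfall: after soft-prompt placeholder ids are filtered out, the surviving ids need to reach the tokenizer as a proper tensor rather than a generator or bare scalar. A hypothetical illustration (the vocab-size filter is an assumption about how placeholder ids are distinguished, not the actual diff):

```python
import torch

def decode_skipping_softprompt(tokenizer, token_ids, vocab_size):
    # Soft-prompt placeholders sit outside the real vocabulary, so drop
    # any id >= vocab_size, then hand the tokenizer a proper tensor.
    real_ids = torch.tensor(
        [t for t in token_ids if t < vocab_size], dtype=torch.long
    )
    return tokenizer.decode(real_ids)
```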
somebody
ffa7b22734 Experiment 2023-04-27 20:28:04 -05:00
somebody
cd1eb97c2a Debuuuug 2023-04-27 20:12:29 -05:00
somebody
4559112551 Potential fix 2023-04-27 19:51:10 -05:00
somebody
b256a8fbc7 Debug 2023-04-27 19:33:03 -05:00
onesome
467f2f25eb More loading fixes 2023-04-26 16:58:33 -05:00
onesome
d4f7b60dc9 Fix for multiple paths 2023-04-26 16:49:12 -05:00
onesome
6776a71532 Add more info to custom model error 2023-04-26 16:36:52 -05:00
onesome
bbf4963d6e Fix custmodpth stuff for hf loading 2023-04-26 16:18:45 -05:00
onesome
c146ae9d84 Delete legacy gpt2 custom loader 2023-04-26 16:07:18 -05:00
onesome
9579298df7 Better fallback 2023-04-25 22:28:07 -05:00
onesome
6e3aebc1ea Zap debug 2023-04-25 21:13:17 -05:00
onesome
d496e861f4 Undo pretty code because I haven't cracked the jax enigma yet 2023-04-25 21:11:49 -05:00
onesome
1db9d9ba61 Lazyload: Whoops 2023-04-25 18:46:54 -05:00
onesome
e28e268a2d Use safetensors only when available 2023-04-25 18:32:37 -05:00
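
A guard like the following is the usual way to express "use safetensors only when available" (a sketch of the pattern, not this commit's exact code):

```python
try:
    import safetensors  # noqa: F401  (presence check only)
    HAS_SAFETENSORS = True
except ImportError:
    HAS_SAFETENSORS = False

# Callers can then prefer .safetensors files when HAS_SAFETENSORS is
# True and fall back to torch .bin checkpoints otherwise.
```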
onesome
0268305cfe Change fallback notifications to warnings 2023-04-25 18:26:49 -05:00
onesome
b8bef641ff Merge branch 'united' of https://github.com/henk717/KoboldAI into model-structure-and-maybe-rwkv 2023-04-25 16:54:53 -05:00
0cc4m
934571857b Fix offloading 2023-04-18 22:52:54 +02:00
0cc4m
1ef515f4c2 Fix lazy-loading on 4-bit 2023-04-17 07:21:18 +02:00
0cc4m
4d34f9b7de Move 4-bit loading code to separate inference_model file 2023-04-16 14:20:13 +02:00
somebody
f9fb5eba89 Remove debug 2023-04-15 18:56:49 -05:00
somebody
5dd67d027a Workaround for socketio context errors for loading 2023-04-15 18:54:21 -05:00
somebody
08b4e317ff Fix double slashing 2023-04-15 13:30:05 -05:00
somebody
d3a73aaeba Fix api 2023-04-15 13:17:20 -05:00
somebody
4dcf570407 Fix legacy model loading 2023-04-15 12:57:35 -05:00
one-some
1b500c7179 Merge pull request #5 from LostRuins/concedo_api
Added stop sequences functionality for API calls
2023-04-15 10:51:31 -05:00