0cc4m
43b0afc7a8
Add safe MPT support
2023-05-05 20:07:10 +02:00
0cc4m
4180620999
Remove unnecessary changes, move gptq detection function to 4bit.py
2023-05-04 19:52:56 +02:00
0cc4m
d48fedcbfb
Fix llama 4-bit loading error
2023-05-04 18:31:37 +02:00
0cc4m
ef358fdf5a
Merge remote-tracking branch 'origin/united' into model-structure-update
2023-05-04 07:31:13 +02:00
Henk
a87d5d6f23
Remove HF's llama workaround
2023-05-03 20:18:40 +02:00
Llama
35d344b951
Remove torch dependency and generalize dim0 workaround
...
Remove torch dependency from hf.py
Make the workaround for dimension-zero values of token_ids
more generic, so it handles every token, not just newlines.
2023-05-03 09:48:16 -07:00
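
A minimal sketch of the dim0 workaround described in the commit above (the function name and body are illustrative assumptions, not the actual hf.py code):

```python
def normalize_token_ids(token_ids):
    # token_ids can arrive as a bare scalar (a "dimension zero" value)
    # rather than a sequence; normalize every such token, not just
    # newlines, without depending on torch.
    try:
        return [int(t) for t in token_ids]  # already iterable
    except TypeError:
        return [int(token_ids)]  # 0-d scalar: wrap it in a list
```
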
0cc4m
58f0a336cb
Merge upstream changes, fix conflict
2023-05-03 18:33:11 +02:00
Llama
3768848548
Fix tokenization and whitespace issues with llama-derived models
...
Work around the 'soft' prefix space behavior of sentencepiece.
Override encode to restore the deleted HF support for decode_with_prefix_space.
Override decode to skip the soft space and return true decoded tokens.
Allow submitting chat messages with embedded newlines.
Split sentences between punctuation and whitespace, rather than after whitespace.
Also include trailing quotes and brackets after sentence stoppers.
This avoids splitting ." and .) into two tokens, for instance.
Insert whitespace at the beginning of the author's note, since sentences are
split with leading whitespace.
Remove spurious newlines at the end of chat responses.
2023-05-03 01:27:11 -07:00
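
The soft prefix space workaround above, sketched under the assumption of an HF-style sentencepiece tokenizer (illustrative only, not the commit's code): decoding a token span behind a throwaway anchor token preserves the span's true leading whitespace, because the tokenizer's "soft" space attaches to the anchor instead of the first real token.

```python
def decode_preserving_space(tokenizer, token_ids):
    # Hypothetical helper: any short token works as the anchor.
    anchor_id = tokenizer.encode("x", add_special_tokens=False)[-1]
    anchor_text = tokenizer.decode([anchor_id])
    # The soft prefix space now lands on the anchor, not token_ids[0],
    # so slicing the anchor text off yields the true decoded span.
    text = tokenizer.decode([anchor_id] + list(token_ids))
    return text[len(anchor_text):]
```
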
henk717
724ba43dc1
Merge pull request #342 from one-some/model-structure-and-maybe-rwkv
...
Move overrides to better places
2023-05-03 03:34:17 +02:00
somebody
a0f4ab5c6a
Move bad token grabber until after newlinemode has been deduced
2023-05-02 20:23:36 -05:00
somebody
efe268df60
Move overrides to better places
2023-05-02 20:18:33 -05:00
Henk
de7b760048
Typo Fix
2023-05-03 01:02:50 +02:00
0cc4m
9c3d578d6c
Work on model download support
2023-05-02 21:32:20 +02:00
somebody
111028642e
Fix tokenizer fallback for llama
2023-05-01 19:42:52 -05:00
somebody
f6b5548131
Support safetensors in get_sharded_checkpoint_num_tensors
2023-05-01 19:15:27 -05:00
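
For context, a hedged sketch of what safetensors support in a shard-counting helper can look like (the helper name below is hypothetical, not the commit's code; safe_open reads only the file header, so counting is cheap):

```python
import torch
from safetensors import safe_open

def num_tensors_in_shard(path):
    if path.endswith(".safetensors"):
        with safe_open(path, framework="pt") as f:
            return len(list(f.keys()))
    # Pickle shards hold a state dict; its length is the tensor count.
    return len(torch.load(path, map_location="cpu"))
```
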
somebody
97e84928ba
Download all shards correctly on aria2 and raise on bad load key
2023-05-01 18:53:36 -05:00
somebody
933dbd634a
HFInferenceModel: Make badwordsids not unique to torch
2023-05-01 17:13:33 -05:00
somebody
ce3d465972
Remove some debug
2023-05-01 17:03:34 -05:00
0cc4m
f83a0aa122
Merge latest changes, fix conflict
2023-05-01 08:01:54 +02:00
0cc4m
aa67135d42
Implement new model format
...
Remove 4bit toggle
2023-04-30 21:59:22 +02:00
0cc4m
20a5587d66
Always use offloader script, because it speeds up multi-GPU setups
2023-04-30 18:17:43 +02:00
one-some
455b8257a9
Implement softprompt hack
2023-04-28 10:26:59 -05:00
somebody
ace4364339
Two more times
2023-04-27 21:13:26 -05:00
somebody
446f38ee9d
One more time
2023-04-27 21:07:34 -05:00
somebody
2eee535540
Actually fix decoding with soft prompts
...
it really wants a tensor
2023-04-27 21:01:12 -05:00
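
The shape of that fix, as an illustrative sketch (the helper is an assumption, not the commit's code): coerce token ids to a tensor before the decode path sees them.

```python
import torch

def as_tensor(token_ids):
    # Decoding with soft prompts "really wants a tensor", so wrap
    # plain lists before handing them over.
    if not isinstance(token_ids, torch.Tensor):
        token_ids = torch.tensor(token_ids, dtype=torch.long)
    return token_ids
```
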
somebody
ffa7b22734
Experiment
2023-04-27 20:28:04 -05:00
somebody
cd1eb97c2a
Debuuuug
2023-04-27 20:12:29 -05:00
somebody
4559112551
Potential fix
2023-04-27 19:51:10 -05:00
somebody
b256a8fbc7
Debug
2023-04-27 19:33:03 -05:00
onesome
467f2f25eb
More loading fixes
2023-04-26 16:58:33 -05:00
onesome
d4f7b60dc9
Fix for multiple paths
2023-04-26 16:49:12 -05:00
onesome
6776a71532
Add more info to custom model error
2023-04-26 16:36:52 -05:00
onesome
bbf4963d6e
Fix custmodpth handling for HF loading
2023-04-26 16:18:45 -05:00
onesome
c146ae9d84
Delete legacy gpt2 custom loader
2023-04-26 16:07:18 -05:00
onesome
9579298df7
Better fallback
2023-04-25 22:28:07 -05:00
onesome
6e3aebc1ea
Zap debug
2023-04-25 21:13:17 -05:00
onesome
d496e861f4
Undo pretty code because I haven't cracked the jax enigma yet
2023-04-25 21:11:49 -05:00
onesome
1db9d9ba61
Lazyload: Whoops
2023-04-25 18:46:54 -05:00
onesome
e28e268a2d
Use safetensors only when available
2023-04-25 18:32:37 -05:00
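
The commit title suggests the usual optional-dependency pattern; a self-contained sketch, with the loader name being a hypothetical stand-in:

```python
try:
    import safetensors.torch
    HAS_SAFETENSORS = True
except ImportError:
    HAS_SAFETENSORS = False

def load_weights(path):
    # Prefer safetensors when the package is installed and the file
    # matches; fall back to torch's pickle loader otherwise.
    if HAS_SAFETENSORS and path.endswith(".safetensors"):
        return safetensors.torch.load_file(path)
    import torch
    return torch.load(path, map_location="cpu")
```
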
onesome
0268305cfe
Change fallback notifications to warnings
2023-04-25 18:26:49 -05:00
onesome
b8bef641ff
Merge branch 'united' of https://github.com/henk717/KoboldAI into model-structure-and-maybe-rwkv
2023-04-25 16:54:53 -05:00
0cc4m
934571857b
Fix offloading
2023-04-18 22:52:54 +02:00
0cc4m
1ef515f4c2
Fix lazy-loading on 4-bit
2023-04-17 07:21:18 +02:00
0cc4m
4d34f9b7de
Move 4-bit loading code to separate inference_model file
2023-04-16 14:20:13 +02:00
somebody
f9fb5eba89
Remove debug
2023-04-15 18:56:49 -05:00
somebody
5dd67d027a
Workaround for socketio context errors during loading
2023-04-15 18:54:21 -05:00
somebody
08b4e317ff
Fix double slashing
2023-04-15 13:30:05 -05:00
somebody
d3a73aaeba
Fix API
2023-04-15 13:17:20 -05:00
somebody
4dcf570407
Fix legacy model loading
2023-04-15 12:57:35 -05:00
one-some
1b500c7179
Merge pull request #5 from LostRuins/concedo_api
...
Added stop sequences functionality for API calls
2023-04-15 10:51:31 -05:00
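
Stop-sequence handling of the kind this PR describes typically truncates generated text at the earliest stop string; a minimal illustration (not the PR's actual implementation):

```python
def apply_stop_sequences(text, stop_sequences):
    # Cut at the first occurrence of any stop string, if present.
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```
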