The goal of this series of commits is to have an implementation-agnostic
interface for models, thus being less reliant on HF Transformers for model
support. A model object will have a method for generation, a list of callbacks
to be run on every token generation, a list of samplers that will modify
probabilities, etc. Basically anything HF can do should be easily
implementable with the new interface :^)
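As a rough illustration only (the class and attribute names below are hypothetical, not the actual interface), a model object along those lines could look something like this:

```python
# Hypothetical sketch of an implementation-agnostic model interface;
# names like InferenceModel, token_callbacks and samplers are illustrative.
from abc import ABC, abstractmethod
from typing import Callable, List

import torch


class InferenceModel(ABC):
    def __init__(self) -> None:
        # Callbacks run on every generated token.
        self.token_callbacks: List[Callable[[int], None]] = []
        # Samplers that modify the next-token probability distribution.
        self.samplers: List[Callable[[torch.Tensor], torch.Tensor]] = []

    @abstractmethod
    def generate(self, prompt_tokens: List[int], max_new_tokens: int) -> List[int]:
        """Each backend (HF Transformers or anything else) implements its own loop."""

    def apply_samplers(self, logits: torch.Tensor) -> torch.Tensor:
        # Apply each registered sampler to the logits in order.
        for sampler in self.samplers:
            logits = sampler(logits)
        return logits

    def notify_token(self, token_id: int) -> None:
        # Let every callback observe the newly generated token.
        for callback in self.token_callbacks:
            callback(token_id)
```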
So far I've tested loading pre-downloaded models split between GPUs
with breakmodel, and that works, though essentially no testing has
been done in the larger scheme of things. This is about the only
supported configuration right now, and generation isn't very functional.
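For context, a minimal sketch of the kind of layer splitting breakmodel does, assuming the model exposes its transformer blocks as a plain list; the real breakmodel code is more involved and the names here are illustrative:

```python
# Illustrative only: place the first N transformer blocks on cuda:0 and the
# rest on cuda:1, similar in spirit to breakmodel's GPU layer split.
import torch


def split_layers(layers, gpu_split=(16, 16)):
    device_index = 0
    remaining = gpu_split[device_index]
    for layer in layers:
        if remaining == 0 and device_index + 1 < len(gpu_split):
            device_index += 1
            remaining = gpu_split[device_index]
        layer.to(torch.device(f"cuda:{device_index}"))
        remaining -= 1
```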
The `saveas` method was modified to take a data dict, but one of the
else blocks still referred to the previous `name` parameter. Assign
to `name` to fix the `NameError: name 'name' is not defined` exception.
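The shape of the fix, sketched with placeholder keys and save logic (the real `saveas` code differs):

```python
# Hypothetical sketch of the fixed saveas; the dict key and file handling
# below are placeholders, not the actual implementation.
import json
import os


def saveas(data):
    # saveas used to receive a name parameter; the else branch below still
    # referenced `name`, so it must now be taken from the data dict to avoid
    # "NameError: name 'name' is not defined".
    name = data["name"]
    path = os.path.join("stories", f"{name}.json")
    if os.path.exists(path) and not data.get("overwrite", False):
        raise FileExistsError(path)
    else:
        with open(path, "w") as handle:
            json.dump(data, handle)
```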
This is a breaking change that allows 4.25.1 to work, since it also introduced breaking changes of its own. If you do not use our automatic updater, please update the dependencies when updating to this build.
This commit restores the chat models menu now that we finally have good chat models available again.
Unfortunately huggingface reports back pytorch_model.bin even if the model's weights file is actually model.safetensors. I don't have a good way to combat this at the moment, so for now we do a hack: if the model copy fails, it manually tries model.safetensors instead and hopes that it will work.
This fixes Pygmalion for now; if new issues arise from this with other models in the future, we will have to implement a cleaner method.
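A sketch of that fallback, with placeholder paths and helper logic (the actual copy code in the repo differs):

```python
# Illustrative fallback: if copying the reported pytorch_model.bin fails,
# retry with model.safetensors and hope the repo actually ships that file.
import os
import shutil


def copy_model_weights(source_dir, dest_dir):
    try:
        shutil.copy(os.path.join(source_dir, "pytorch_model.bin"), dest_dir)
    except FileNotFoundError:
        # Hugging Face metadata claimed pytorch_model.bin, but the model
        # (e.g. Pygmalion) only provides model.safetensors.
        shutil.copy(os.path.join(source_dir, "model.safetensors"), dest_dir)
```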
This commit decouples single line mode from chat mode; well-behaved models no longer need it since we stop at the "You:" prefix.
There are scenarios, however, where this potentially breaks chat mode completely or makes models more frustrating to use. Users who experience this can enable Single Line mode in the formatting menu to restore the old behavior.
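The stop condition amounts to something like the following, with illustrative names (the actual check lives in the generation code and uses the configured chat name):

```python
# Illustrative stop check: end generation once the output runs into the
# user's chat prefix ("You:"), which is why single line mode is no longer
# forced on in chat mode.
def should_stop(decoded_text: str, user_name: str = "You") -> bool:
    return decoded_text.rstrip().endswith(f"{user_name}:")


# Hypothetical usage inside a token-by-token generation loop:
# if should_stop(tokenizer.decode(generated_ids)):
#     break
```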
I have also allowed token streaming again, since the issues with it have already been resolved.