Futureproofing for future tokenizers, for now this is not needed since everything uses GPT2. But when that changes we want to be prepared. Not all models have a proper tokenizer config, so if we can't find one we fall back to GPT2.
First batch, will be more, we will also need to update the other VRAM display's with the changes that have happened. Will happen depending on how the 8-bit stuff goes.
When the prompt is deleted by the user, the topmost remaining chunk of
the story that has at most one non-whitespace character is now made the
new prompt chunk.
Also fixed issues in Chromium-based browsers (desktop and Android) where
selecting all text in the story, typing some new text to replace the
entire story and then defocusing causes the editor to break.
NeoCustom is now obsolete beyond the file selection and the CLI. So after the CLI we adapt the input to a generic model and then use the improved generic routine to handle it. This saves duplicate efforts of maintaining an almost identical routine now that models are handled by their type and not their name.
This fixes a few scenario's of my commit yesterday, models that have a / are now first loaded from the corrected directory if it exists before we fall back to its original name to make sure it loads the config from the correct location. Cache dir fixes and a improved routine for the path loaded models that mimics the NeoCustom option fixing models that have no model_type specified. Because GPT2 doesn't work well with this option and should exclusively be used with the GPT2Custom and GPT-J models should have a model_type we assume its a Neo model when not specified.
Automatically detect or assume the model type so we do not have to hardcode all the different models people might use. This almost makes the behavior of --model identical to the NeoCustom behavior as far as the CLI is concerned. But only if the model_type is defined in the models config file.
Rather than coding a vars.custmodpath or vars.model in all the other parts of the code I opted to just set vars.custmodpath instead to make the behavior more consistent now that it always loads from the same location.
So that if you change, e.g., `top_p`, from a Lua generation modifier or
from the settings menu during generation, the rest of the generation
will use the new setting value instead of retaining the settings it had
when generation began.
Attempting to use transformers 4.11.0's experimental `low_cpu_mem_usage`
feature with GPT-2 models usually results in the output repeating a
token over and over or otherwise containing an incoherent response.
* `print()` and `warn()` now work correctly with `nil` arguments
* Typo: `gpt-neo-1.3M` has been corrected to `gpt-neo-1.3B`
* Regeneration is no longer triggered when writing to `keysecondary` of
a non-selective key
* Handle `genamt` changes in generation modifier properly
* Writing to `kobold.settings.numseqs` from a generation modifier no
longer affects
* Formatting options in `kobold.settings` have been fixed
* Added aliases for setting names
* Fix behaviour of editing story chunks from a generation modifier
* Warnings are now yellow instead of red
* kobold.logits is now the raw logits prior to being filtered, like
the documentation says, rather than after being filtered
* Some erroneous comments and error messages have been corrected
* These parts of the API have now been implemented properly:
* `compute_context()` methods
* `kobold.authorsnote`
* `kobold.restart_generation()`
This makes it so that SocketIO uses long polling to set up the
connection before switching to websocket, instead of immediately using
websocket.
This seems to resolve issues where the browser sometimes can't connect
to the websocket server until the window has been open for a minute.
kobold.outputs must be a 1D list of strings, but sometimes its still blank. In those cases rather than throwing an error and crashing the scripting its better if it does nothing.
Allow people to enter a prompt without generating anything by the AI. Combined with the always add prompt this is a very useful feature that allows people to write world information first, and then do a specific action. This mimics the behavior previously seen in AI Dungeon forks where it prompts for world information and then asks an action and can be particularly useful for people who want the prompt to always be part of the generation.