* Removed `vars.model_orig`
* `requirex()` in bridge.lua now maintains a separate module cache for each
userscript instead of using the same cache for all userscripts
* `vars.lua_deleted` and `vars.lua_edited` are now erased right before running
the input modifiers instead of right before each time the generation modifiers
are run
The Initial commit for Chat Mode, the nickname part of the UI is missing other than that it should be fully functional. To use Chat Mode effectively you first input a small dialogue (Can be around 6 lines 3 of your own inputs and 3 of the character) formatted as Name : it will then automate the actions needed to chat properly. During this mode single line mode is forced on, and Trim Incomplete Sentences is forced off.
Futureproofing for future tokenizers, for now this is not needed since everything uses GPT2. But when that changes we want to be prepared. Not all models have a proper tokenizer config, so if we can't find one we fall back to GPT2.
First batch, will be more, we will also need to update the other VRAM display's with the changes that have happened. Will happen depending on how the 8-bit stuff goes.
NeoCustom is now obsolete beyond the file selection and the CLI. So after the CLI we adapt the input to a generic model and then use the improved generic routine to handle it. This saves duplicate efforts of maintaining an almost identical routine now that models are handled by their type and not their name.
This fixes a few scenario's of my commit yesterday, models that have a / are now first loaded from the corrected directory if it exists before we fall back to its original name to make sure it loads the config from the correct location. Cache dir fixes and a improved routine for the path loaded models that mimics the NeoCustom option fixing models that have no model_type specified. Because GPT2 doesn't work well with this option and should exclusively be used with the GPT2Custom and GPT-J models should have a model_type we assume its a Neo model when not specified.
Automatically detect or assume the model type so we do not have to hardcode all the different models people might use. This almost makes the behavior of --model identical to the NeoCustom behavior as far as the CLI is concerned. But only if the model_type is defined in the models config file.
Rather than coding a vars.custmodpath or vars.model in all the other parts of the code I opted to just set vars.custmodpath instead to make the behavior more consistent now that it always loads from the same location.
So that if you change, e.g., `top_p`, from a Lua generation modifier or
from the settings menu during generation, the rest of the generation
will use the new setting value instead of retaining the settings it had
when generation began.
Attempting to use transformers 4.11.0's experimental `low_cpu_mem_usage`
feature with GPT-2 models usually results in the output repeating a
token over and over or otherwise containing an incoherent response.
* `print()` and `warn()` now work correctly with `nil` arguments
* Typo: `gpt-neo-1.3M` has been corrected to `gpt-neo-1.3B`
* Regeneration is no longer triggered when writing to `keysecondary` of
a non-selective key
* Handle `genamt` changes in generation modifier properly
* Writing to `kobold.settings.numseqs` from a generation modifier no
longer affects
* Formatting options in `kobold.settings` have been fixed
* Added aliases for setting names
* Fix behaviour of editing story chunks from a generation modifier
* Warnings are now yellow instead of red
* kobold.logits is now the raw logits prior to being filtered, like
the documentation says, rather than after being filtered
* Some erroneous comments and error messages have been corrected
* These parts of the API have now been implemented properly:
* `compute_context()` methods
* `kobold.authorsnote`
* `kobold.restart_generation()`
Allow people to enter a prompt without generating anything by the AI. Combined with the always add prompt this is a very useful feature that allows people to write world information first, and then do a specific action. This mimics the behavior previously seen in AI Dungeon forks where it prompts for world information and then asks an action and can be particularly useful for people who want the prompt to always be part of the generation.
Automatically converts Huggingface cache models to full models on (down)load.
WARNING: Does wipe old cache/ dir inside the KoboldAI folder, make a backup before you run these models if you are bandwith constraint.
This seems to be related to the model config files, because only certain
models have this problem, and replacing ALL configuration files of a
"bad" model with those of a "good" model of the same type would fix the
problem.
Shouldn't be required anymore.
Recent optimizations caused the CPU version to load in an incompatible format, now we convert it back to the correct format after loading it efficiently first.
As requested by VE_FORBRYDERNE (Possibly implemented it on to many places, needs testing but since the other one is already broken I am committing it first so I can more easily test)
If the beginning of the comment is at the beginning of a line AND the
end of a comment is at the end of a line, an additional newline will now
be ignored so that the AI doesn't see a blank line where the comment
was.
For example, consider the following message:
```
Hello
<|This is
a comment|>
World
```
The AI will now see this:
```
Hello
World
```
instead of this:
```
Hello
World
```
Multiple things have changed, for now models default to half mode even on the official transformers to make sure its as efficient on the GPU as finetune's. GPU selection is streamlined and cache files are now stored inside the KoboldAI folder (for the most part). A new command line parameter to force the models to run at their full size still needs to be added for the few users that would want a quality bump at the cost of ram.
Changes the line-endings to the Unix format and sets KoboldAI to launch with Python3 if executed directly.
(cherry picked from commit 5b0977ceb6807c0f80ce6717891ef5e23c8eeb77)
The only changes are a small addition to the breakmodel section where GPU0 is automatically chosen if the CLI options are used without specifying breakmodel. Lineendings have been changed to Linux formatting for compatibility reasons.
Its made for Python3, so we assume python3 is installed in its usual location. If it isn't you can always run it yourself with whatever command you used prior to this change.
This prevents the "thinking" animation from appearing on top of the
submit button under certain circumstances:
* When someone connects to the KoboldAI server while the model is
generating (occurs after generation finishes)
* Occasionally, the browser may suddenly disconnect and reconnect from
Flask-SocketIO during generation, which causes the same problem
Apparently transformers maintains an internal reference to input_ids
(to use for repetition penalty) so we have to clamp the internal
version, too, because otherwise transformers will throw an out-of-bounds
error upon attempting to access token IDs that are not in the
vocabulary.