In 1.16 we had significantly faster loading speeds because we did not do as much memory conservation. It's time to give users the choice. If you want the original, faster behavior and have the memory for it, run KoboldAI as usual. Otherwise run play-lowmem.bat or aiserver.py with --lowmem. For Colab the low-memory behavior is still the default, to avoid breaking models that would otherwise load fine.
Makes Colab mode also automatically activate Quiet mode to improve privacy. We should no longer need this in the Colab console thanks to the redo feature. Need something different for testing? Use --remote instead.
Change the proposed --share to --unblock to make it more apparent what this feature does. The feature unblocks the port for external access, but does not add remote play support. For remote play support without a proxy service I have added --host.
Default values for the new repetition penalty settings (better suggestions are very much welcome, since broader community testing has not been done).
Updated the Readme with the link to the offline installer.
Ran into issues with other modes like chatmode and adventure, so I moved it further down the pipeline and now convert </s> back to \n before processing additional formatting.
Still has an issue with the HTML formatting not working, but at least the AI works now.
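The conversion of </s> back to \n can be sketched roughly as follows. This is a minimal illustration rather than the actual aiserver.py code; the helper name `eos_to_newlines` and the `EOS_TOKEN` constant are hypothetical.

```python
EOS_TOKEN = "</s>"  # end-of-sequence marker some models use in place of a newline


def eos_to_newlines(text: str) -> str:
    """Turn </s> markers in generated text back into newlines.

    Hypothetical helper: intended to run before any further output
    formatting, so the later steps only ever see ordinary newlines.
    """
    return text.replace(EOS_TOKEN, "\n")
```

Running the conversion before the other formatting steps means chatmode and adventure logic never has to know that the model uses </s> at all.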
On second thought, it is probably better to not save this. Advanced users can add this themselves and that way newer versions of the model can override it if redownloaded.
Allows model creators to customize the welcome message using Markdown and limited HTML.
Existing United users need to run install_requirements.bat again; you can leave the existing dependencies intact.
Adds a Nobreakmodel var that allows Breakmodel to be turned off. This can be done through the command line or a model config (in case Neo is used by the model's config without it being a true Neo model that is compatible with breakmodel).
In addition, I removed the args.colab check for breakmodel support and instead made args.colab activate nobreakmodel. I have also added a new check so that breakmodel is not even attempted if you launch a model from the command line without specifying the layers.
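The decision described above could look something like the sketch below. It is only an illustration of the logic; the function and parameter names are hypothetical and do not mirror the real variables in aiserver.py.

```python
def should_attempt_breakmodel(colab: bool,
                              nobreakmodel: bool,
                              model_from_cli: bool,
                              layers_specified: bool) -> bool:
    """Return True if breakmodel should be tried at all (hypothetical helper)."""
    # --colab no longer has its own breakmodel check; it just switches
    # nobreakmodel on.
    if colab:
        nobreakmodel = True
    # nobreakmodel may also come from the command line or the model config.
    if nobreakmodel:
        return False
    # A model launched from the command line without a layer count
    # skips breakmodel entirely.
    if model_from_cli and not layers_specified:
        return False
    return True
```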
Changed the model VRAM requirements to what you'd need to comfortably run the model rather than barely (Like with the manual). Will probably revise this in a later commit.
More importantly, it now supports models that use </s> which will be required to support XGLM and Fairseq models.
My last attempt at fixing this caused GPT2 to break. Since the other fix is for an edge case, we now assume that the GPT2 method should be used, and if that fails we try the other one to catch rare errors with bad model configs.
Turns out model_config does not work on models that have no model_type defined. In case this happens we now fall back to the old .json loading method. This will not work in --colab mode if it's not already a local model, but since almost all modern models define a model type, and to my knowledge all models on Hugging Face do, that should not be an issue. If it is, we can always ask the model creator to either update it, distribute the model differently, or load that model with --remote instead of --colab.
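A minimal sketch of this kind of fallback, assuming the model directory contains a config.json. The function name and the `config_loader` parameter are hypothetical; `config_loader` stands in for whatever model_config-based lookup is tried first.

```python
import json
import os


def load_config_with_fallback(model_path, config_loader):
    """Try the primary config lookup, then fall back to reading config.json.

    `config_loader` is a hypothetical callable that takes the model path
    and returns a dict, raising an exception when it cannot handle the model.
    """
    try:
        return config_loader(model_path)
    except Exception:
        # Old method: read the JSON file directly. This needs a local copy
        # of the model, which is why it cannot help for remote-only models.
        config_file = os.path.join(model_path, "config.json")
        with open(config_file, "r", encoding="utf-8") as f:
            return json.load(f)
```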
In TPU instances, `vars.sp.shape[0]` is not always the actual number of
tokens in the soft prompt. We have to use `vars.sp_length` to get an
accurate token count.
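To illustrate why the two numbers can differ, here is a toy sketch that assumes the soft prompt is padded with extra rows before it is stored. The padding scheme, the `SoftPrompt` class, and `pad_soft_prompt` are all assumptions made for the example, not the actual TPU backend code.

```python
from dataclasses import dataclass


@dataclass
class SoftPrompt:
    rows: list       # embedding rows, possibly including padding rows
    sp_length: int   # number of real soft prompt tokens


def pad_soft_prompt(rows, multiple):
    """Pad `rows` up to a multiple of `multiple`, remembering the real length."""
    real_length = len(rows)
    padded_length = -(-real_length // multiple) * multiple  # ceiling division
    pad_row = [0.0] * len(rows[0])
    padded = rows + [pad_row] * (padded_length - real_length)
    return SoftPrompt(rows=padded, sp_length=real_length)
```

With 5 real rows padded to a multiple of 8, `len(sp.rows)` would be 8 while `sp.sp_length` stays 5, so any token budget computed from the row count would be off by 3.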
Breakmodel is useless on Colab, so for the sake of efficiency we will always assume a model is incompatible if --colab is present. The same applies to the conversion: Colab instances are discarded, so converting the model to a .bin file only wastes time since the HDD isn't fast. Finally, we automatically set all the useful variables for Colab, so that in the future this can be removed from ckds and other scripts.
Lastly ckds has been adapted not to copy the examples folder and to add the new --colab parameter.
Local players are much better off running the old --remote command.
No longer update the chatname outside of the config. This will not affect the singleplayer tab at all, but it will allow people in multiplayer to chat with their own names.
To prevent confusion for users who have not used KoboldAI for a while, or who are following old tutorials, I have added a disclaimer that informs people that most Colab links should not be used with this feature and should instead be opened in the browser.
The problem was that when a soft prompt is being used, the dynamic
scanning criteria search a different set of tokens for world info
keys than the `_generate()` function, which results in generation loops
when a world info key appears in the former set of tokens but not the
latter.
The `dynamic_processor_wrap` makes it so that the repetition penalty is
read directly from `vars`, but this only works if the initial repetition
penalty sent to `generator` is not equal to 1. So we are now forcing the
initial repetition penalty to be something other than 1.
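The toy model below illustrates the failure mode. It assumes the generator only installs a repetition penalty processor when the initial penalty differs from 1, which is why an initial value of exactly 1 leaves nothing for the wrapper to hook into. `Settings` and `build_processors` are hypothetical stand-ins, not the real `vars` object or library code.

```python
class Settings:
    """Hypothetical stand-in for the global `vars` object."""

    def __init__(self, rep_pen):
        self.rep_pen = rep_pen


def build_processors(initial_penalty, settings):
    """Mimic a generator that skips the penalty step when the penalty is 1."""
    processors = []
    if initial_penalty != 1.0:
        def penalize(score):
            # Read the live value each time, as a wrapped processor would.
            penalty = settings.rep_pen
            return score / penalty if score > 0 else score * penalty
        processors.append(penalize)
    return processors
```

With an initial penalty of 1 the list is empty, so later changes to `settings.rep_pen` have no effect; any other initial value installs the processor, and the live setting then takes over.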
This commit exposes antemplates to the model config. This lets authors specify what kind of author's note template they would like to use for their model. Users can still change it if they desire.
Blank lines appear often in chatmode, so it is best played with blank line removal turned on; this is now forced. Chatmode is not compatible with Adventure mode, so they now turn each other off.
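One way to express these rules is sketched below. The class and field names are hypothetical and only illustrate the interaction between the settings.

```python
from dataclasses import dataclass


@dataclass
class ModeSettings:
    # All field names are hypothetical.
    chatmode: bool = False
    adventure: bool = False
    remove_blank_lines: bool = False


def set_chatmode(settings: ModeSettings, enabled: bool) -> None:
    settings.chatmode = enabled
    if enabled:
        settings.adventure = False          # the two modes exclude each other
        settings.remove_blank_lines = True  # forced on while chatting


def set_adventure(settings: ModeSettings, enabled: bool) -> None:
    settings.adventure = enabled
    if enabled:
        settings.chatmode = False           # turning one on turns the other off
```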
Added more models to the menu; all the popular community models are now easily accessible. I also re-ordered the menu from large to small so that it makes a bit more sense.
* Error messages are now shown when memory, author's note, etc. exceeds
budget by itself
* Formatting options no longer break if there are empty chunks in the
story (although there shouldn't be any in the first place)
* Number of generated tokens is now kept track of from Python
* Removed `vars.model_orig`
* `requirex()` in bridge.lua now maintains a separate module cache for each
userscript instead of using the same cache for all userscripts
* `vars.lua_deleted` and `vars.lua_edited` are now erased right before running
the input modifiers instead of right before each time the generation modifiers
are run
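The per-userscript module cache can be pictured with the following Python analogy. The real change lives in bridge.lua; this sketch only shows the idea, and every name in it is hypothetical.

```python
class PerScriptModuleCache:
    """Keep a separate module cache for each userscript (illustrative only)."""

    def __init__(self):
        self._caches = {}  # userscript name -> {module name -> loaded module}

    def require(self, script_name, module_name, loader):
        """Load `module_name` once per userscript using the `loader` callable."""
        cache = self._caches.setdefault(script_name, {})
        if module_name not in cache:
            cache[module_name] = loader(module_name)
        return cache[module_name]
```

Because each userscript gets its own dictionary, two scripts requiring the same module each receive their own instance and cannot interfere with each other's module state.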
The initial commit for Chat Mode. The nickname part of the UI is missing; other than that it should be fully functional. To use Chat Mode effectively, you first input a small dialogue (around 6 lines works: 3 of your own inputs and 3 of the character) formatted as Name : and it will then automate the actions needed to chat properly. During this mode, single line mode is forced on and Trim Incomplete Sentences is forced off.
Futureproofing for future tokenizers. For now this is not needed since everything uses GPT2, but when that changes we want to be prepared. Not all models have a proper tokenizer config, so if we can't find one we fall back to GPT2.
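A rough sketch of the fallback, with the actual loading call passed in as a parameter. The function name, the `loader` parameter, and the `fallback_name` default are assumptions for illustration; with the transformers library, `loader` could for example be `AutoTokenizer.from_pretrained`.

```python
def load_tokenizer(model_name, loader, fallback_name="gpt2"):
    """Load the model's own tokenizer, falling back to GPT2 if that fails.

    `loader` is any callable that takes a model name or path and returns
    a tokenizer, raising an exception when no usable config is found.
    """
    try:
        return loader(model_name)
    except Exception:
        # No proper tokenizer config for this model: use the GPT2 one.
        return loader(fallback_name)
```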
First batch, there will be more. We will also need to update the other VRAM displays with the changes that have happened. That will happen depending on how the 8-bit stuff goes.