Recent optimizations caused the CPU version to load in an incompatible format. We now load it efficiently first and then convert it back to the correct format.
As requested by VE_FORBRYDERNE. (Possibly implemented in too many places. It needs testing, but since the other one is already broken I am committing it first so I can test more easily.)
If the beginning of the comment is at the beginning of a line AND the
end of a comment is at the end of a line, an additional newline will now
be ignored so that the AI doesn't see a blank line where the comment
was.
For example, consider the following message:
```
Hello
<|This is
a comment|>
World
```
The AI will now see this:
```
Hello
World
```
instead of this:
```
Hello

World
```
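A minimal sketch of this rule in Python, assuming comments are delimited by `<|` and `|>` as in the example above. The pattern and the `strip_comments` name are illustrative only, not the code from this commit:

```python
import re

# Comment body: any characters (including newlines) up to the first "|>".
_BODY = r"(?:(?!\|>).)*"

# First alternative: a comment that starts at the beginning of a line and
# ends at the end of a line, plus the one newline that follows it.
# Second alternative: any other comment, removed without touching newlines.
_COMMENT_RE = re.compile(
    rf"^<\|{_BODY}\|>$\n?|<\|{_BODY}\|>",
    re.MULTILINE | re.DOTALL,
)


def strip_comments(text: str) -> str:
    """Remove <|...|> comments (hypothetical helper, for illustration)."""
    return _COMMENT_RE.sub("", text)
```

A comment that shares its line with other text is still removed, but the newline after it is kept, so only whole-line comments collapse the extra blank line.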
Multiple things have changed. For now, models default to half mode even on the official transformers, to make sure it is as efficient on the GPU as finetune's. GPU selection is streamlined, and cache files are now stored inside the KoboldAI folder (for the most part). A command line parameter to force models to run at their full size still needs to be added for the few users who want a quality bump at the cost of RAM.
Changes the line endings to the Unix format and sets KoboldAI to launch with Python3 if executed directly.
(cherry picked from commit 5b0977ceb6807c0f80ce6717891ef5e23c8eeb77)
The only changes are a small addition to the breakmodel section, where GPU0 is automatically chosen if the CLI options are used without specifying breakmodel, and a switch to Linux line endings for compatibility reasons.
It's made for Python3, so we assume python3 is installed in its usual location. If it isn't, you can always run it yourself with whatever command you used prior to this change.
This prevents the "thinking" animation from appearing on top of the
submit button under certain circumstances:
* When someone connects to the KoboldAI server while the model is
generating (occurs after generation finishes)
* Occasionally, the browser may suddenly disconnect from and reconnect
to Flask-SocketIO during generation, which causes the same problem
Apparently transformers maintains an internal reference to input_ids
(to use for repetition penalty) so we have to clamp the internal
version, too, because otherwise transformers will throw an out-of-bounds
error upon attempting to access token IDs that are not in the
vocabulary.
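The clamping step can be pictured with plain Python lists. This is only a sketch: the real code presumably operates on tensors, and `clamp_token_ids`, `penalty_ids` and the vocabulary size used here are made up for the illustration:

```python
def clamp_token_ids(ids, vocab_size):
    """Force every token ID into the valid range [0, vocab_size - 1]."""
    return [min(max(i, 0), vocab_size - 1) for i in ids]


# Both the IDs we pass along and the copy kept for repetition penalty
# (hypothetical variable names) need the same treatment, otherwise the
# unclamped copy can still index past the end of the vocabulary.
input_ids = [5, 50400, 12]
penalty_ids = list(input_ids)
input_ids = clamp_token_ids(input_ids, vocab_size=50257)
penalty_ids = clamp_token_ids(penalty_ids, vocab_size=50257)
```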
Adds Single Line mode, optimized for things like chatbot testing and other cases where you want to have control over what happens after a paragraph.
This can also be used as a foundation for a chatbot-optimized interface mode.
The names breakmodel_layers and layers are confusing, so the new method has been renamed to breakmodel_gpulayers. The old one should no longer be used, but since it works in reverse we leave it in so scripts don't break.
Feedback from users is that it's better not to always submit the prompt, which is also consistent with the randomly generated stories. You can always toggle it on if you need it for coherency. This change does not override existing user settings.
Finetune's fork has unofficial support, which we supported, but it is not compatible with models designed for the official version. In this update we let models decide which transformers backend to use, and fall back to Neo if they don't choose one. We also add the 6B to the menu and, for the time being, switch to the GitHub version of transformers so we don't have to wait for a release. (Hopefully we can switch back to the conda version before merging upstream.)
A new model was released that uses different formatting for its newlines, which causes too many newlines in the UI. This change fixes the issue so that the UI still displays the content as you would expect, removing the formatting burden from model developers.
Originally omitted when model settings were forced. Now that models can only define the defaults for KoboldAI, it's a good idea to give model authors control over what formatting they think works best for their models.
If you save a story under a different name than it was loaded with, and
then try to download it as JSON/plaintext, the downloaded file's name
will now match the new story name.
This prevents duplicate submissions when multiple people are connected
to the same server and one person submits changes to memory, author's
note or world info, by pressing Submit (for author's note or memory) or
Accept (for world info).
Improved the default settings and made a clearer distinction between client and server. The Python parts have been renamed to server and the browser parts to client, to conform to what you'd expect from a client and a server. The model name is now shown instead of NeoCustom.
Models can no longer override client settings. Instead, settings are now saved on a per-model basis, with the settings provided by the model being the default. Users can also specify the desired configuration name as a command line parameter to avoid conflicting file names (such as all Colabs having Colab.settings by default).
Many models have that one setting that just works best, like repetition penalty 2 or 1.2, while being incompatible with existing settings. The same applies to Adventure mode being on or off. With this change, models are allowed to override user preferences, but only for the categories where we deem this relevant (we don't want them to mess with things like tokens, length, etc.). Users who do not want this behavior can turn it off by changing msoverride to false in client.settings.
Model creators can specify these settings in their config.json with the allowed settings being identical to their client.settings counterparts.
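The override rule can be sketched roughly as follows. The key names in `ALLOWED_OVERRIDES`, the default of treating a missing `msoverride` as enabled, and the function name are all assumptions made for the illustration; the actual list of permitted settings is not given in this message:

```python
# Hypothetical whitelist: the categories a model may override.
ALLOWED_OVERRIDES = {"rep_pen", "adventure"}


def apply_model_overrides(client_settings: dict, model_config: dict) -> dict:
    """Return client settings with permitted model overrides applied."""
    merged = dict(client_settings)
    # Users can opt out by setting msoverride to false
    # (a missing key is treated as enabled here, which is an assumption).
    if not client_settings.get("msoverride", True):
        return merged
    for key, value in model_config.items():
        # Anything outside the whitelist stays under the user's control.
        if key in ALLOWED_OVERRIDES:
            merged[key] = value
    return merged
```

Keeping the whitelist in one place makes it easy to see which preferences a model's config.json is able to touch.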
This is a bit of a workaround for now, but the [ badwords search routine has been replaced with a hardcoded list used by the Colabs. This is far more effective at filtering out artifacts when running models locally. We can get away with this because all known models use the same vocab.json. In the future we will probably want to load this from badwords.json if present, so model creators can bundle it with the model.
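The possible future behavior could look something like the sketch below. It is not implemented by this commit; the JSON layout of badwords.json, the placeholder entries in `DEFAULT_BADWORDS`, and the `load_badwords` name are all assumptions:

```python
import json
import os

# Placeholder entries; the actual hardcoded list is not reproduced here.
DEFAULT_BADWORDS = [[1111], [2222]]


def load_badwords(model_dir: str) -> list:
    """Prefer a badwords.json bundled with the model, else use the default."""
    path = os.path.join(model_dir, "badwords.json")
    if os.path.isfile(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    return DEFAULT_BADWORDS
```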
Replaces the placeholder readme with a proper one. The menu is also updated and reorganized to encourage users to use custom models and to better reflect the real-world VRAM requirements.
Since some user interface buttons are disabled while in --remote mode,
they should also be disabled in aiserver.py so a malicious user can't
manually send those commands to the server.
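The idea of a server-side check can be sketched like this. The command names and the helper are hypothetical; the real set of commands disabled in --remote mode is not listed in this message:

```python
# Hypothetical command names, for illustration only.
REMOTE_BLOCKED_COMMANDS = {"savetofile", "loadfromfile", "importfile"}


def is_command_allowed(command: str, remote: bool) -> bool:
    """Reject commands whose buttons are disabled in remote mode."""
    return not (remote and command in REMOTE_BLOCKED_COMMANDS)
```

Checking on the server matters because hiding a button in the browser does nothing to stop a client from sending the same message by hand.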