There have been a lot of reports from newer users who experience AI breakdown because not all models properly handle 2048 max tokens. 1024 is the only value that all models support and was the original value KoboldAI used. This commit reverts the decision to increase this to 2048; any existing configurations are not affected. Users who wish to increase the max tokens can do so themselves. Most models handle up to 1900 well (the GPT2 models are excluded), and for many you can go all the way. (It is currently not yet known why some finetunes cause a decrease in max token support.)
In addition, this commit fulfills a request for more consistent slider behavior, allowing the sliders to be adjusted in 0.01 intervals instead of some sliders being capped to 0.05.
Specifically, we merge blank actions into the next action and we move
whitespace at the end of non-blank actions to the beginning of the next
action.
(cherry picked from commit 4b16600e49)
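A minimal sketch of that normalization, assuming actions are a plain list of strings (the helper name is hypothetical, not the actual KoboldAI implementation):
```
def merge_actions(actions):
    # Merge blank actions into the next action and move trailing
    # whitespace of non-blank actions to the start of the next action.
    merged = []
    carry = ""  # text waiting to be prepended to the next action
    for action in actions:
        action = carry + action
        carry = ""
        if not action.strip():
            # Blank action: fold its contents into the next one.
            carry = action
            continue
        stripped = action.rstrip()
        carry = action[len(stripped):]  # trailing whitespace moves forward
        merged.append(stripped)
    if carry:
        # No next action exists, so the last one keeps its whitespace.
        if merged:
            merged[-1] += carry
        else:
            merged.append(carry)
    return merged
```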
Since 1.18, Kobold has had a few smaller features added, specifically the ability to re-order sampling options and a new sampler. Since these are smaller additions with no breaking changes, a minor version bump was chosen.
Removing this menu option because these models are currently unavailable. People who still have them can load them through the load from file option. Once they have been retrained and reuploaded I will add the menu option back.
They released another version of transformers that still doesn't have
the OPT patch, so I decided it would be safer to just mark all 4.19
transformers versions as needing the OPT patch.
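For illustration, a hypothetical version gate along these lines (the helper name is an assumption; only the 4.19 range reflects the commit):
```
from packaging import version
import transformers

def needs_opt_patch(ver=transformers.__version__):
    # Treat every 4.19.x release as lacking the upstream OPT fix.
    v = version.parse(ver)
    return version.parse("4.19.0") <= v < version.parse("4.20.0")
```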
This adds support for loading settings from the defaults folder. Settings are loaded in the following order and are overridden where needed by the higher-numbered source:
1. The model config file.
2. The defaults folder.
3. The user-defined settings file.
With this support we can begin to ship better defaults for models we do not manage. Our community tuners have been most helpful at adding good defaults to their configuration files, but for other models, such as the base models, this gives us the flexibility to define better settings for each model without messing with a user's desired settings if they already exist.
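For illustration, a minimal sketch of this layered loading, assuming JSON settings files and hypothetical path arguments (not the actual KoboldAI file layout):
```
import json
import os

def load_layered_settings(model_config_path, defaults_path, user_settings_path):
    # Later layers override earlier ones:
    # model config < defaults folder < user settings.
    settings = {}
    for path in (model_config_path, defaults_path, user_settings_path):
        if path and os.path.exists(path):
            with open(path) as f:
                settings.update(json.load(f))
    return settings
```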
Almost everyone prefers 2048 max tokens because of the superior coherency. It should only be lower due to RAM limits, but the menu already shows the optimal RAM for 2048. Negatively affected users can turn it down themselves; for everyone else, especially on rented machines or Colab, 2048 is a better default.
Generated using:
```
import transformers

# Use the slow tokenizer and collect every token whose text contains
# angle brackets or square brackets, in the nested-list format used
# for banned token IDs.
tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/opt-350m", use_fast=False)
badwordsids_opt = [[v] for k, v in tokenizer.get_vocab().items() if any(c in k for c in "<>[]")]
```
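For illustration, a hedged usage example: the resulting list is in the nested format that `generate()` accepts through its standard `bad_words_ids` argument (the prompt and generation settings here are arbitrary):
```
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/opt-350m", use_fast=False)
model = transformers.AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
badwordsids_opt = [[v] for k, v in tokenizer.get_vocab().items() if any(c in k for c in "<>[]")]

# Tokens listed in bad_words_ids can never be sampled during generation.
inputs = tokenizer("Hello there", return_tensors="pt")
out = model.generate(**inputs, bad_words_ids=badwordsids_opt, max_new_tokens=20)
print(tokenizer.decode(out[0]))
```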
OPT supports newlines, but it also needs some of the behavior we use in S mode. NS mode is a more limited version of S mode that still handles the </s> token, but instead of replacing it with a newline we replace it with an empty string, and newlines are not converted.
In the future, if your Fairseq-style model has newline support use NS mode, while if it needs artificially inserted newlines use S mode. This also means that people finetuning Fairseq models to include newlines might benefit from testing their models in NS mode.
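A hypothetical sketch of the difference between the two modes (the function and mode names are assumptions, not the actual KoboldAI code):
```
def decode_output(text, mode):
    if mode == "s":
        # S mode: the model has no native newlines, so </s> stands in
        # for a line break and is converted back into one.
        return text.replace("</s>", "\n")
    if mode == "ns":
        # NS mode: the model emits real newlines, so </s> is simply
        # removed and newlines are left untouched.
        return text.replace("</s>", "")
    return text
```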