Compare commits

...

351 Commits

Author SHA1 Message Date
henk717 10d9ae88f3 Official as Default (TPU) 2022-06-04 13:41:09 +02:00
henk717 1c12d84554 Official as Default (GPU) 2022-06-04 13:11:54 +02:00
henk717 e15e42a1ce
Release of 1.18
2022-06-04 13:10:03 +02:00
henk717 dd84a22fa9 Model Descriptions (GPU) 2022-06-01 19:53:31 +02:00
henk717 18cf8e620e Model Descriptions (TPU) 2022-06-01 19:51:59 +02:00
Henk e5dcf91a08 Defaults Support
This adds support for loading settings from the defaults folder. Settings are loaded in the following order, and values from a higher-numbered source overwrite those from a lower-numbered one where needed:

1. The model config file.
2. The defaults folder.
3. The user-defined settings file.

With this support we can begin to ship better defaults for models we do not manage. Our community tuners have been most helpful in adding good defaults to their configuration files, but for other models, such as the base models, this gives us the flexibility to define better settings for each model without touching a user's desired settings if those already exist.
2022-06-01 10:34:16 +02:00
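A minimal sketch of the merge order described above, assuming JSON-style settings files (the paths and file names here are hypothetical, not KoboldAI's actual layout):

```
import json
import os

def load_settings(model_config_path, defaults_path, user_settings_path):
    # Later sources overwrite earlier ones: model config < defaults < user file.
    settings = {}
    for path in (model_config_path, defaults_path, user_settings_path):
        if path and os.path.exists(path):
            with open(path) as f:
                settings.update(json.load(f))  # higher-priority keys win
    return settings
```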
Henk 714fc1729b Updated model list 2022-06-01 10:20:24 +02:00
henk717 243543df13
Merge pull request #136 from VE-FORBRYDERNE/kaggle
Kaggle TPU support
2022-05-31 18:28:29 +02:00
Gnome Ann 707316de31 Kaggle TPU support 2022-05-31 12:20:16 -04:00
Henk 1a1f2f6428 30B ram requirements 2022-05-31 13:17:06 +02:00
Henk 004caa5ba7 Extra Icons
Green will probably be used by the offline installer, but let's also ship blue in case people want to make their own shortcuts.
2022-05-29 14:55:25 +02:00
henk717 a46ee07f3e
Merge pull request #135 from VE-FORBRYDERNE/opt
Update list of transformers versions that have broken OPT
2022-05-29 12:59:26 +02:00
Gnome Ann 69da5b7bc2 Update list of transformers versions that have broken OPT 2022-05-28 23:44:19 -04:00
Henk 52b977a3d3 Better Icon
Thanks to Spock for all the effort in refining it
2022-05-29 02:20:06 +02:00
henk717 3764149aef
Merge pull request #134 from VE-FORBRYDERNE/settings
Don't cap setting values when manually entered by user
2022-05-29 01:21:01 +02:00
Gnome Ann 1c4ae8877f Resolve merge conflict 2022-05-28 19:18:13 -04:00
Gnome Ann b9c6c0b3bd Out-of-bounds setting values are now shown in red 2022-05-28 19:14:26 -04:00
Henk 2798cf11da Hide drive error
Fixes a small issue where the user gets an error if the drive is not mounted.
2022-05-29 00:53:22 +02:00
Gnome Ann 69a28210e9 Don't cap setting values when manually entered by user 2022-05-28 18:33:57 -04:00
Henk 6ae12f29e4 (Un)Installer improvements
Inno Setup has proven limited in its ability to uninstall, because it's hard to keep up to date which folders it should ask about or keep, and not everyone who wishes to use an uninstaller has used the Offline Installer. This commit adds a new uninstall script that future versions of the uninstaller will look for when deleting files, eliminating the risk of the setup accidentally deleting your stories.

If the proper Inno Setup uninstaller is detected the script will terminate itself and launch that first, to ensure the entire uninstallation is handled. If not, it will warn you before removing anything.

It will also get rid of the virtual KoboldAI drives, and we can update it through normal git means.
2022-05-29 00:25:53 +02:00
henk717 8ae0c8311b GPU Colab improvements 2022-05-28 19:52:06 +02:00
Henk 0ac36fff37 Merge branch 'united' of https://github.com/henk717/koboldai into united 2022-05-28 19:46:09 +02:00
Henk 97401026dd Nerys Description in Readme 2022-05-28 19:46:05 +02:00
henk717 bab0cd6362 Nerys Description 2022-05-28 19:45:32 +02:00
Henk 4b65ce9c76 1.18 version bump 2022-05-28 19:39:05 +02:00
henk717 6e0510e1f5 Replaced Jax models with HF models where possible 2022-05-27 01:35:15 +02:00
Henk b30370bf4b 2048 maxtoken default
Almost everyone prefers 2048 max tokens because of the superior coherency. It should only be lower due to RAM limits, but the menu already shows the optimal RAM for 2048. Negatively affected users can turn it down themselves; for everyone else, especially on rented machines or Colab, 2048 is a better default.
2022-05-27 01:23:48 +02:00
henk717 f47db6d155 Nerys 13B 2022-05-27 00:12:35 +02:00
henk717 4482e6db9a
Merge pull request #132 from VE-FORBRYDERNE/gpt2
Fix an error that occurs when loading GPT-2 models
2022-05-20 22:24:24 +01:00
Gnome Ann c692987e40 Fix an error that occurs when loading GPT-2 models
I forgot that this new_rebuild_tensor function's first argument's type
is different when loading GPT-2 models.
2022-05-20 14:54:49 -04:00
henk717 266308b086
Merge pull request #131 from mrseeker/patch-8
Adding Nerys model 13B
2022-05-18 13:55:03 +02:00
Julius ter Pelkwijk 6ae7b48b69
Adding Nerys model 13B 2022-05-18 13:50:57 +02:00
henk717 348fd1c4e2
Merge pull request #130 from mrseeker/patch-8
Adding Nerys model 2.7B
2022-05-16 11:45:01 +02:00
Julius ter Pelkwijk f0df3de610
Adding Nerys model 2.7B 2022-05-16 09:50:45 +02:00
henk717 24a2eb8c0b
Merge pull request #129 from VE-FORBRYDERNE/tqdm
Better model saving and better progress bars
2022-05-14 18:02:41 +02:00
Gnome Ann d4e8f56789 Remove debugging code from tpu_mtj_backend.py 2022-05-14 12:00:44 -04:00
Gnome Ann d5ab3ef5b1 Fix `no attribute get_checkpoint_shard_files` 2022-05-14 11:49:04 -04:00
Gnome Ann 6e82f205b4 Aria2 bug fix for Windows users 2022-05-14 11:44:28 -04:00
henk717 9eaa76c72b Add OPT 13B to the models 2022-05-14 07:55:47 +02:00
Gnome Ann 1476e76cfc Copy fp16 model files instead of resaving them 2022-05-14 00:45:43 -04:00
Gnome Ann 0c5ca5261e Loading a sharded model will now display only one progress bar 2022-05-13 23:32:16 -04:00
Gnome Ann f9f1a5f3a9 Make sure tqdm progress bars display properly in Colab 2022-05-13 17:37:45 -04:00
Gnome Ann 91d3672446 Proper progress bar for aria2 downloads 2022-05-13 17:00:10 -04:00
henk717 7ea0c49c1a
Merge pull request #128 from VE-FORBRYDERNE/opt
OPT breakmodel and TPU support
2022-05-13 18:07:02 +02:00
Gnome Ann a051bf4397 OPT breakmodel bug fix 2022-05-13 10:45:57 -04:00
Gnome Ann 1200173386 Custom badwords for OPT
Generated using:
```
import transformers
tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/opt-350m", use_fast=False)  # use_fast=False selects the slow tokenizer, which exposes .vocab
badwordsids_opt = [[v] for k, v in tokenizer.vocab.items() if any(c in k for c in "<>[]")]
```
2022-05-13 10:45:28 -04:00
Henk d5fa782483 NS Mode (comment fix) 2022-05-13 10:53:19 +02:00
Henk 8376f12e21 Add NS mode
OPT supports newlines, but it also needs some of the behavior we use in S mode. NS mode is a more limited version of S mode that still handles the </s> token, but instead of replacing it with a newline we replace it with the empty string, and newlines are not converted.

In the future, if your fairseq-style model has newline support, use NS mode; if it needs artificially inserted newlines, use S mode. This also means that people finetuning fairseq models to include newlines might benefit from testing their models in NS mode.
2022-05-13 10:44:12 +02:00
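A rough sketch of the difference between the two modes (illustrative only, not the actual aiserver.py code):

```
def handle_eos(text: str, mode: str) -> str:
    if mode == "s":
        # S mode: the </s> token becomes a newline, for models that need
        # artificially inserted newlines.
        return text.replace("</s>", "\n")
    if mode == "ns":
        # NS mode: </s> is still stripped, but newlines pass through
        # unconverted because the model supports them natively.
        return text.replace("</s>", "")
    return text
```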
Gnome Ann 55079f672a Fix typo in soft prompt patching code 2022-05-13 01:51:55 -04:00
Gnome Ann 29bb3f569b Fix a bug in OPTForCausalLM where self.lm_head is the wrong size 2022-05-13 01:37:17 -04:00
Gnome Ann defbb53b68 OPT breakmodel 2022-05-13 01:03:38 -04:00
Gnome Ann b1d8797a54 Allow TPU Colab to load sharded HF models 2022-05-12 23:51:40 -04:00
Gnome Ann 4fa5f1cd6a Add TPU support for OPT-350M
The 350M model seems to have a different structure than the other ones ???
2022-05-12 22:21:15 -04:00
Gnome Ann dfa2aa7314 Merge branch 'united' into opt 2022-05-12 20:11:53 -04:00
Henk 5c4a087970 Disable S mode for OPT 2022-05-13 01:47:59 +02:00
Gnome Ann f5e689a725 Upload maps/opt.json and update requirements 2022-05-12 19:09:31 -04:00
Henk e98cc3cb16 OPT models 2022-05-12 23:55:21 +02:00
Henk 376e76f5da S mode for OPT 2022-05-12 02:18:14 +02:00
henk717 a1c7017ddc
Merge pull request #127 from VE-FORBRYDERNE/aria2
Handle aria2 properly when it exits with nonzero exit code
2022-05-11 22:57:45 +02:00
Gnome Ann 580dd0b2a3 Handle aria2 properly when it exits with nonzero exit code 2022-05-11 16:23:24 -04:00
henk717 05549de42d
Merge pull request #126 from VE-FORBRYDERNE/aria2
Aria2 downloader bug fixes
2022-05-11 21:58:31 +02:00
Gnome Ann 2ebba9488b Change `force_download` back to False
This is to prevent fully downloaded models from being re-downloaded in
Colab.
2022-05-11 15:51:48 -04:00
Gnome Ann 6d481ca57e Merge branch 'united' into aria2 2022-05-11 15:51:11 -04:00
Gnome Ann c65272052a aria2 now downloads to different filename and renames afterwards
This is to match the behaviour of the original transformers downloader
in order to deal with the rare case of someone downloading a model using
aria2, cancelling before it finishes, and then attempting to resume the
download with the normal transformers downloader.
2022-05-11 15:45:38 -04:00
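The pattern is the usual download-to-a-temporary-name-then-rename scheme; a hedged sketch of the idea (names and helper are illustrative, not the actual aria2 hook):

```
import os
import shutil
import tempfile
import urllib.request

def download_then_rename(url: str, dest: str):
    # Write to a temporary name first so a cancelled download never leaves
    # a partial file under the final name; rename only on success.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dest) or ".")
    os.close(fd)
    try:
        with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
            shutil.copyfileobj(response, out)
        os.replace(tmp, dest)  # atomic on the same filesystem
    except BaseException:
        os.remove(tmp)
        raise
```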
Henk 6d27084e8a Better Aria2 Defaults
Trunc prevents slow allocation on Windows, and force_download=True has proven a more reliable default. Since models are converted to local formats it does not impact local users, and because -c is used, the cost of checking that the model is correct is minimal and worthwhile.
2022-05-11 21:38:33 +02:00
Gnome Ann 7a3f865e3f Prevent aria2 from resuming cancelled downloads
Resumed downloads tend to be very slow.

The original transformers downloader didn't allow resuming downloads
either.
2022-05-11 15:14:37 -04:00
Gnome Ann c81f3bd084 Use `--file-allocation=trunc` instead of `--file-allocation=none` 2022-05-11 14:51:43 -04:00
Gnome Ann f96c878d83 Use aria2 even when all model files are already in cache
This allows aria2 to continue downloading a pytorch_model.bin after a
cancelled download.
2022-05-11 14:43:56 -04:00
Gnome Ann f60c7d8492 Fix the behaviour of `aria2_hook()` when using `force_download` 2022-05-11 14:41:34 -04:00
Gnome Ann 5732a8f15a Don't use `aria2_hook()` if `force_download=True` is used 2022-05-11 14:40:31 -04:00
henk717 903d593ce4
Merge pull request #125 from VE-FORBRYDERNE/aria2
Use aria2 to improve HF model download speeds in Colab
2022-05-11 07:55:53 +02:00
Gnome Ann 46cfa1367f Add `--no_aria2` command line flag 2022-05-11 00:44:56 -04:00
Gnome Ann f09959f9be Fix patching code of `PreTrainedModel.from_pretrained()` 2022-05-11 00:41:53 -04:00
Gnome Ann 22b4f3c9df Bug fixes for `aria2_hook()` when running Windows 2022-05-11 00:14:00 -04:00
Gnome Ann 82205722af Fix logic of `aria2_hook()` 2022-05-10 23:46:29 -04:00
Gnome Ann 4b49d1c464 Make sure `vars.revision` is defined 2022-05-10 22:51:36 -04:00
Gnome Ann 4b693b4858 Fix the logic of `force_download` in utils.py 2022-05-10 22:47:03 -04:00
Gnome Ann c1ef20bcff Also enable aria2 downloading for non-sharded checkpoints 2022-05-10 22:43:41 -04:00
Gnome Ann e115bb68e4 aria2 downloads in utils.py now use correct user agent 2022-05-10 22:22:46 -04:00
Gnome Ann b97b2a02d6 Add `--revision` command line flag 2022-05-10 22:14:56 -04:00
Gnome Ann 937d9ee06a Change default `model.save_pretrained` shard size to 500 MiB 2022-05-10 22:04:25 -04:00
Gnome Ann a388c63023 Use aria2 to download split checkpoints 2022-05-10 21:28:13 -04:00
Henk 01e15d03d6 Remove play.ipynb
Interactive Python doesn't work well on Jupyter; until it supports what Colab can do, this file is pointless.
2022-05-11 01:49:07 +02:00
Henk 7a9297adc3 Jupyter Git integration 2022-05-11 01:31:12 +02:00
Henk f917d3438f Updated models 2022-05-10 21:39:16 +02:00
henk717 7fcc1a9acb Fix C1 2022-05-10 18:38:50 +02:00
Henk c5462ec480 Better Jupyter 2022-05-09 02:41:00 +02:00
Henk e09b939f04 Force Bash 2022-05-08 16:02:16 +02:00
Henk 0ca4917056 Linux Runtime Info 2022-05-08 00:24:14 +02:00
Henk 030df1a09f Small installer fix 2022-05-07 20:22:33 +02:00
Henk a3dc188c8f Linux Installer Improvements 2022-05-01 15:58:37 +02:00
henk717 9f7c9c4b9e
Merge pull request #124 from Crafteko/united
Replaced Adventure 125M and added C1-1.3B to the menu
2022-04-29 17:38:55 +02:00
subtlewave 9c83ef7fa9
Replaced Adventure 125M and added C1-1.3B 2022-04-28 22:35:04 +00:00
Henk 810f6614af Cap GIT version for now 2022-04-27 18:20:43 +02:00
henk717 716951f059
Merge pull request #123 from VE-FORBRYDERNE/settings
Prevent the settings throttle from lagging the sliders
2022-04-27 03:17:53 +02:00
Gnome Ann b1faca3686 Prevent the settings throttle from lagging the sliders 2022-04-26 21:14:44 -04:00
Henk 9cff8268b5 Lower slider latency for now 2022-04-27 00:53:55 +02:00
henk717 b2277f242b
Merge pull request #122 from VE-FORBRYDERNE/settings 2022-04-27 00:49:01 +02:00
Gnome Ann 02e4e6be1e Fix author's note slider 2022-04-26 17:29:18 -04:00
Gnome Ann ee8ced2f5f Allow users to type in the values for the settings 2022-04-26 15:27:28 -04:00
Gnome Ann ea82867e4d Merge branch 'united' into settings 2022-04-26 13:58:01 -04:00
henk717 4ce9a5fe28 New text 2022-04-26 19:45:37 +02:00
henk717 cd1a02e705 New TPU Colab 2022-04-26 19:14:40 +02:00
henk717 e94b97790c
Merge pull request #121 from VE-FORBRYDERNE/code
Fix the vscode notebook
2022-04-21 20:01:52 +02:00
henk717 c873d36374
Merge pull request #120 from VE-FORBRYDERNE/lazy-loader
Fix some lazy loader edge cases
2022-04-21 19:59:56 +02:00
Gnome Ann e4cf19e707 Fix the vscode notebook 2022-04-21 13:13:46 -04:00
Gnome Ann 2d38e90509 Remove lm_head.weight from maps/xglm.json 2022-04-20 12:56:57 -04:00
Gnome Ann 6803531384 Force grad to be off by default when loading with lazy loader 2022-04-19 12:26:02 -04:00
Henk a82a165146 ColabKobold Fixes 2022-04-19 15:15:57 +02:00
Henk 11280a6e66 LocalTunnel Linux Fix 2022-04-19 14:41:21 +02:00
Henk b8e79afe5e LocalTunnel support 2022-04-19 13:47:44 +02:00
Gnome Ann c7b03398f6 Merge 'nolialsea/patch-1' into settings without Colab changes 2022-04-17 12:15:36 -04:00
henk717 33733bf962
Merge branch 'KoboldAI:main' into united 2022-04-16 00:26:58 +02:00
henk717 372eb4c981
Merge pull request #119 from VE-FORBRYDERNE/scripting-sp
Allow userscripts to change the soft prompt
2022-04-14 21:33:20 +02:00
henk717 78d6ee491d
Merge pull request #117 from mrseeker/patch-7
Shinen FSD 13B (NSFW)
2022-04-14 21:33:08 +02:00
henk717 e180db88aa
Merge pull request #118 from VE-FORBRYDERNE/lazy-loader
Fix lazy loader in aiserver.py
2022-04-14 21:33:00 +02:00
Gnome Ann dcdd0263fc Increment `API_VERSION` in bridge.lua 2022-04-14 15:00:41 -04:00
Gnome Ann efea584d84 Update API documentation 2022-04-14 14:58:11 -04:00
Gnome Ann bd6f7798b9 Fix lazy loader in aiserver.py 2022-04-14 14:33:10 -04:00
Julius ter Pelkwijk ad94f6c01c
Shinen FSD 13B (NSFW) 2022-04-14 08:23:50 +02:00
henk717 c08630b0eb
Merge pull request #116 from mrseeker/patch-6
Shinen FSD 6.7B (NSFW)
2022-04-13 14:48:56 +02:00
Julius ter Pelkwijk 945c34e822
Shinen FSD 6.7B (NSFW) 2022-04-13 14:47:22 +02:00
Henk eeff126df4 Memory Sizes 2022-04-13 12:41:21 +02:00
Gnome Ann a3a52dc9c3 Add support for changing soft prompt from userscripts 2022-04-12 15:59:05 -04:00
Henk 9a2d346d60 Merge branch 'main' into united 2022-04-12 10:41:51 +02:00
Henk 26909e6cf3 Model Categories 2022-04-10 20:53:15 +02:00
henk717 f029e0215e
Merge pull request #113 from mrseeker/patch-5 2022-04-10 15:08:00 +02:00
Julius ter Pelkwijk 6fcb0af488
Adding Janeway 13B 2022-04-10 15:03:39 +02:00
Henk 607600cbf4 Allow external commands 2022-04-10 14:24:43 +02:00
henk717 9ac47f6f54
Merge branch 'KoboldAI:main' into united 2022-04-10 09:05:31 +02:00
henk717 a060219ff7
Merge pull request #112 from VE-FORBRYDERNE/lazy-loader
Fix lazy loader
2022-04-09 03:16:30 +02:00
Gnome Ann 359a0a1c99 Copy Python 3.6 compatible lazy loader to aiserver.py 2022-04-08 19:40:12 -04:00
Gnome Ann c117bfd0ad Fix lazy loader 2022-04-08 19:38:15 -04:00
henk717 fd762110cb
Merge pull request #111 from mrseeker/patch-4
Releasing Janeway 6.7B
2022-04-08 12:28:14 +02:00
Henk 841ad97a1f Make .sh executable 2022-04-08 12:10:12 +02:00
Julius ter Pelkwijk 1974761f70
Releasing Janeway 6.7B 2022-04-08 08:13:36 +02:00
henk717 47e825c83c
Merge pull request #110 from gooseai/united.add-oai-numseqs-support
Add `numseqs` support to GooseAI/OpenAI client handler.
2022-04-07 20:57:10 +02:00
Wes Brown 09fee52abd Add `num_seqs` support to GooseAI/OpenAI client handler. 2022-04-07 14:50:23 -04:00
Henky!! 5feda462fb OAI - Fixes last commit 2022-04-07 02:39:37 +02:00
Henky!! 34b6c907f0 OAI Max Token Slider 2022-04-07 02:26:15 +02:00
Henky!! b568e31381 OAI Path Support 2022-04-06 05:15:25 +02:00
Henky!! 699b3fc10b OAI Redo Fixes 2 2022-04-06 04:54:27 +02:00
Henky!! b5a633e69b OAI Redo Fix 2022-04-06 04:45:01 +02:00
Henky!! 965b5b5b04 Install Improvements 2022-04-05 01:52:46 +02:00
henk717 ee682702ee
Merge branch 'KoboldAI:main' into united 2022-04-05 01:35:22 +02:00
henk717 04707abde6
Merge pull request #109 from VE-FORBRYDERNE/requirements
Remove Ray from requirements_mtj.txt
2022-04-04 19:17:56 +02:00
Gnome Ann 66bc7e10bf Remove Ray from requirements_mtj.txt
I made some changes recently to mesh transformer JAX so that we don't
need Ray anymore. This should make the installation a little faster.
2022-04-04 12:42:33 -04:00
henk717 0882ba165c
Merge pull request #108 from VE-FORBRYDERNE/lazy-loader
Lazy loader Python 3.6 compatibility
2022-04-03 01:15:48 +02:00
Gnome Ann fabbdf2bb1 Lazy loader Python 3.6 compatibility
The current lazy loader relies on a feature of the Python zipfile module
that was added in Python 3.7.0:

https://bugs.python.org/issue22908

This commit adds compatibility for Python 3.6.
2022-04-02 15:02:54 -04:00
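A sketch of what such a compatibility shim can look like (an assumed shape, not the actual torch_lazy_loader.py code): on 3.7+ a `ZipExtFile` is seekable and can be read lazily in place, while on 3.6 the entry has to be materialized first.

```
import io
import sys
import zipfile

def open_entry(zf: zipfile.ZipFile, name: str):
    f = zf.open(name)
    # Python 3.7.0 made ZipExtFile seekable (https://bugs.python.org/issue22908).
    if sys.version_info >= (3, 7) and f.seekable():
        return f  # the lazy loader can seek around inside the archive directly
    data = f.read()  # 3.6 fallback: materialize the whole entry in memory
    f.close()
    return io.BytesIO(data)
```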
Noli d9670f4e16 Revert "add colab cell to start ECILA"
This reverts commit 231418e15e.
2022-03-29 07:23:36 +02:00
Noli 231418e15e add colab cell to start ECILA 2022-03-29 07:21:23 +02:00
henk717 8368b20421
Merge pull request #107 from VE-FORBRYDERNE/typical
Typical sampling needs to use nansum instead of sum
2022-03-28 11:13:38 +02:00
Gnome Ann 67e28d2b5c Typical sampling needs to use nansum instead of sum
If `probs` is zero then `log_probs` will be negative infinity, and the
calculation of `neg_entropy` would then give NaN because zero times
infinity is a mathematically indeterminate value.

We need to use nansum so that those NaN values are treated as zeros to
ignore them in the entropy calculation.
2022-03-28 00:02:31 -04:00
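The effect is easy to reproduce (PyTorch shown here; the same reasoning applies to the JAX version):

```
import torch

probs = torch.tensor([0.5, 0.5, 0.0])
log_probs = torch.log(probs)     # last element is -inf
neg_entropy = probs * log_probs  # 0 * -inf produces nan
print(neg_entropy.sum())         # nan: poisons the whole entropy
print(neg_entropy.nansum())      # -0.6931..., the nan is treated as zero
```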
Henky!! e644963564 OpenAI Fixes 2022-03-28 02:02:37 +02:00
henk717 77ae893f4d
Merge pull request #106 from VE-FORBRYDERNE/typical
Typical sampling
2022-03-28 00:14:09 +02:00
Gnome Ann e2cd49d552 Typo fix in `TypicalLogitsWarper` 2022-03-27 17:08:57 -04:00
Gnome Ann bbd0a83fef Fix `TypicalLogitsWarper` argument typing 2022-03-27 16:59:23 -04:00
Gnome Ann d5989d4c62 Hide division by zero warning in JAX typical filter
This warning happens when `np.log` gets an input containing zeros.
In that case, NumPy will throw a warning and output negative infinity.

Negative infinity is the correct behaviour here, so we can safely ignore
the warning.
2022-03-27 16:57:12 -04:00
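For example, NumPy's warning can be suppressed locally while keeping the negative-infinity result (illustrative snippet):

```
import numpy as np

probs = np.array([0.5, 0.5, 0.0])
with np.errstate(divide="ignore"):  # hide "divide by zero encountered in log"
    log_probs = np.log(probs)       # [-0.693..., -0.693..., -inf]
```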
Gnome Ann 20e48b11d7 Typical sampling 2022-03-27 16:25:50 -04:00
Noli aa8de64aa4 fix default port 2022-03-25 23:26:27 +01:00
Noli 3e003d3b42 add port to the command options 2022-03-25 22:18:28 +01:00
Noli 076c6c8efa update slider value without waiting for socketio 2022-03-25 22:18:00 +01:00
Noli 72b74d9ab6 fix tabs 2022-03-25 20:46:08 +01:00
Noli 6aaa8d026f semicolon 2022-03-25 20:44:41 +01:00
Noli 8270d92073 fix wrong this scope 2022-03-25 20:44:25 +01:00
Noli 6ed50ee1e9 make the throttle timer a dict to keep track of which slider has been changed 2022-03-25 20:37:45 +01:00
nolialsea 1de4944d46
Add throttle closure for settings sliders
Adds a throttling closure that introduces a waiting time before a callback is invoked, and uses it to throttle the event fired by socketio on slider value changes.
2022-03-25 20:08:56 +01:00
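The actual change is JavaScript in the browser UI, but the idea translates to a few lines in any language. A Python sketch of a per-slider trailing-edge throttle (names are illustrative):

```
import threading

def make_throttle(wait_seconds):
    timers = {}  # one timer per slider, mirroring the per-slider dict fix above

    def throttle(slider_id, callback, *args):
        # Rapid changes keep resetting the timer; the callback fires only
        # once the slider has been idle for wait_seconds.
        if slider_id in timers:
            timers[slider_id].cancel()
        timers[slider_id] = threading.Timer(wait_seconds, callback, args)
        timers[slider_id].start()

    return throttle
```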
henk717 e4c72ca2e5
Merge pull request #104 from VE-FORBRYDERNE/retry-randomgame
Allow regenerating random story using Retry button
2022-03-24 12:57:04 +01:00
Gnome Ann 0348970b19 Make sure AI is not busy when using retry to regenerate random story 2022-03-23 22:09:35 -04:00
Gnome Ann 4832dd6f37 Allow regenerating random story using Retry button
Commit b55e5a8e0b removed this feature, so
this commit adds it back.
2022-03-23 13:39:46 -04:00
henk717 38d78d10db
Merge pull request #103 from VE-FORBRYDERNE/neox
Divide GPT-NeoX replicated bias layers by 4 again instead of by 8
2022-03-21 02:19:32 +01:00
henk717 cf99f02ca5 Merge branch 'main' into united 2022-03-20 19:22:53 +01:00
Gnome Ann 73aecc0510 Divide NeoX replicated bias layers by 4 again instead of by 8 2022-03-20 01:04:55 -04:00
henk717 f1487a4551 New Linux Runtime 2022-03-20 00:00:21 +01:00
henk717 a7f652f293
Merge pull request #101 from VE-FORBRYDERNE/neox
GPT-NeoX-20B support in Colab TPU instances
2022-03-19 09:56:15 +01:00
Gnome Ann 05fc46b253 Changing this again to divide by 8 2022-03-19 02:09:41 -04:00
Gnome Ann b1125a6705 Add EOS and padding token to default NeoX badwords 2022-03-19 01:30:02 -04:00
Gnome Ann 6c20d0d657 Nevermind, dividing by 4 is actually correct... 2022-03-19 00:55:04 -04:00
Gnome Ann f16b61ec77 Should divide NeoX replicated parameters by 8 (not by 4)
Also suppresses the PyTorch 1.11 warning about transposing tensors with ndim != 2 in the new code.
2022-03-19 00:48:33 -04:00
Gnome Ann c2c139e940 Change default PE type for NeoX to `neox_rotary` 2022-03-19 00:26:04 -04:00
Gnome Ann 85a4959efa Merge branch 'united' into neox 2022-03-18 11:19:03 -04:00
henk717 f581fe89cb Torch version changes 2022-03-17 21:11:36 +01:00
henk717 9e9c1c3fe0
Merge pull request #100 from VE-FORBRYDERNE/patch
Add PyTorch 1.11 support for lazy loader
2022-03-17 21:06:38 +01:00
Gnome Ann c444260eac Silence PyTorch warning about transposing tensors with dimension != 2 2022-03-17 15:16:56 -04:00
Gnome Ann ef21ab9c91 PyTorch 1.9 lazy loader compatibility bugfix 2022-03-17 14:10:51 -04:00
Gnome Ann eaf190469d Add PyTorch 1.11 support for lazy loader 2022-03-17 12:51:41 -04:00
henk717 9235754eb9 Dependency Fixes 2022-03-17 00:35:59 +01:00
henk717 a3e5e052b3 Newer umamba + slope tweak 2022-03-16 18:34:02 +01:00
Gnome Ann 95c4251db9 Print two newlines before loading HF models 2022-03-15 13:58:53 -04:00
Gnome Ann 9e2848e48f Show parameter count when loading GPT-NeoX in Colab TPU instance 2022-03-15 13:55:27 -04:00
Gnome Ann 9dc48b15f0 Add custom badwords and pad token ID for GPT-NeoX 2022-03-14 23:31:49 -04:00
Gnome Ann 88f247d535 GPT-NeoX-20B support in Colab TPU instances 2022-03-14 23:14:20 -04:00
henk717 4892556059 Model saving for colab mode 2022-03-13 11:22:44 +01:00
henk717 ccadeabbde
Merge pull request #99 from VE-FORBRYDERNE/model-patch
Model loading fixes
2022-03-13 11:10:15 +01:00
Gnome Ann 2b8c46338e Change current working directory to KoboldAI folder 2022-03-13 01:22:11 -05:00
Gnome Ann 48d07adb54 Also fallback to generic GPT2 tokenizer in Colab TPU instances 2022-03-12 23:19:35 -05:00
henk717 d29a629320
Merge pull request #98 from ebolam/united
Fix for retry
2022-03-12 16:52:07 +01:00
ebolam 45eed78d21 Merge branch 'united' of https://github.com/ebolam/KoboldAI into united 2022-03-12 10:33:01 -05:00
ebolam b55e5a8e0b Retry Bug Fix 2022-03-12 10:32:27 -05:00
henk717 2e1b3c82f9
Merge pull request #97 from ebolam/united
Fix for retry causing issues for future redo actions
2022-03-11 17:41:49 +01:00
ebolam ae854bab3d Fix for retry causing issues for future redo actions 2022-03-11 11:40:55 -05:00
henk717 2c66461c14
Merge pull request #96 from VE-FORBRYDERNE/dlpack
Use DLPack to convert PyTorch tensors to JAX arrays
2022-03-10 22:00:38 +01:00
Gnome Ann a99eb8724d Use DLPack to convert PyTorch tensors to JAX arrays 2022-03-10 15:12:42 -05:00
henk717 b02d5e8696 Allows missing model_config again 2022-03-10 19:59:10 +01:00
henk717 172a548fa1 Fallback to generic GPT2 Tokenizer 2022-03-10 19:52:15 +01:00
henk717 68281184bf Remove Lowmem from TPU 2022-03-09 19:21:15 +01:00
henk717 9dee9b5c6d Ignore incorrect problems 2022-03-09 12:03:37 +01:00
henk717 a28e553412 Remove unused gettokenids 2022-03-09 11:59:33 +01:00
henk717 7434c9221b Expand OAI Setting Compatibility 2022-03-07 08:56:47 +01:00
ebolam f6c95f18fa
Fix for Redo (#94)
* Corrected redo to skip blank steps (blank from "deleting" the chunk with the edit function)

* Removed debug code
2022-03-06 23:18:14 +01:00
henk717 f857696224 OAI ConfigName Bugfix 2022-03-06 20:18:42 +01:00
henk717 3ddc9647eb Basic GooseAI Support 2022-03-06 20:10:30 +01:00
henk717 f1b0ea711e
Merge branch 'KoboldAI:main' into united 2022-03-06 19:02:59 +01:00
henk717 4835192041 Load TK on demand 2022-03-06 14:12:01 +01:00
henk717 daea4b8d15 Fix Breakmodel RAM Regression 2022-03-06 08:26:50 +01:00
henk717 105d3831b5 Lazy Load Float32 for CPU 2022-03-06 07:56:04 +01:00
henk717 77cc2ee789
Merge pull request #93 from VE-FORBRYDERNE/lazy-loader
Lazy loader
2022-03-05 20:32:31 +01:00
Gnome Ann 373f7b9bd5 Don't convert tensors to float16 if using CPU-only mode 2022-03-05 14:30:26 -05:00
Gnome Ann 579e85820c Resolve merge conflict 2022-03-05 14:13:56 -05:00
Gnome Ann 2e19ea1bb6 Auto detect if we're in a Colab TPU instance 2022-03-05 14:07:23 -05:00
henk717 3a5793c815 No longer uses --colab_tpu 2022-03-05 19:58:24 +01:00
henk717 935c7e5786 Improved TPU support 2022-03-05 19:47:51 +01:00
henk717 6f2febb142
Merge pull request #92 from ebolam/united
Hopefully Last Redo Fix
2022-03-05 19:26:15 +01:00
ebolam 4a8d7f5e0b
Merge branch 'henk717:united' into united 2022-03-05 13:25:10 -05:00
henk717 c20435855b
Merge pull request #91 from VE-FORBRYDERNE/transformers-version-check
Put the XGLM embedding patch behind a version check
2022-03-05 19:03:00 +01:00
Gnome Ann 4625158d30 Fix typo in previous commit 2022-03-05 12:56:42 -05:00
Gnome Ann 0a258a6282 Support for loading HF models on TPU with `--colab_tpu` 2022-03-05 12:33:33 -05:00
Gnome Ann 86ac562b0c Lazy loader should convert model tensors to float16 before moving them 2022-03-05 11:31:34 -05:00
ebolam 4dd119c38d Redo no longer goes through formatting function (thereby getting changed) 2022-03-05 11:15:33 -05:00
ebolam 353817b4da Remove debug print statements 2022-03-05 10:35:06 -05:00
ebolam 221f264fa7 Redo fix. Fix for actions structure to not error out when asking for next_id when the actions list is empty. 2022-03-05 10:31:28 -05:00
Gnome Ann a00dede610 Put the XGLM embedding patch behind a version check 2022-03-04 19:10:15 -05:00
Gnome Ann 5674516f0c Merge branch 'united' into lazy-loader 2022-03-04 18:27:51 -05:00
henk717 8e12b7df61
Merge pull request #90 from ebolam/united
Redo Bug Fix
2022-03-04 22:10:49 +01:00
ebolam 5f92cbc231 Merge branch 'united' of https://github.com/ebolam/KoboldAI into united 2022-03-04 15:37:34 -05:00
ebolam 321f45ccad Fix debug to never crash (would on some initialization steps) 2022-03-04 15:36:13 -05:00
ebolam ee883fc4da
Merge branch 'henk717:united' into united 2022-03-04 14:15:16 -05:00
ebolam 26b9268391 Redo bug fix 2022-03-04 14:14:44 -05:00
henk717 eb247d69c3
Merge branch 'KoboldAI:main' into united 2022-03-04 18:24:56 +01:00
Gnome Ann 4474607f88 Merge branch 'united' into lazy-loader 2022-03-04 11:12:29 -05:00
Gnome Ann a1fedca2c8 Use lazy loading automatically if a config file exists for the model 2022-03-04 11:11:33 -05:00
henk717 addc7edd49
Merge branch 'KoboldAI:main' into united 2022-03-04 11:34:04 +01:00
henk717 2936778dbc
Merge branch 'KoboldAI:main' into united 2022-03-04 09:56:35 +01:00
Gnome Ann f0629958b1 Merge branch 'united' into lazy-loader 2022-03-04 00:37:25 -05:00
Gnome Ann 58a2c18821 Add lazy torch loading support to transformers backend 2022-03-04 00:33:10 -05:00
Gnome Ann 1515996fca Fix torch_lazy_loader seek offset calculation 2022-03-03 23:53:40 -05:00
Gnome Ann 24bc0f81ea Remove duplicate `torch_load` definition 2022-03-03 19:55:31 -05:00
Gnome Ann 8e6e04be5f (torch_lazy_loader.py) Add dematerialized modules setting 2022-03-03 11:17:59 -05:00
Gnome Ann 1ecc452dc8 (torch_lazy_loader.py) Add support for materializing from a ZipExtFile 2022-03-02 13:08:21 -05:00
henk717 e033b04f87 Restore United 2022-03-02 11:40:50 +01:00
Gnome Ann c338b52d68 (torch_lazy_loader.py) Handle checkpoints with merged storage blocks 2022-03-02 01:02:35 -05:00
Gnome Ann 4fa4dbac50 Clean up when error is thrown in `use_lazy_torch_load` 2022-03-01 19:30:22 -05:00
Gnome Ann a0344b429c Upload torch_lazy_loader.py 2022-03-01 15:40:44 -05:00
ebolam 3f73f84b69 bug fix 2022-02-28 19:04:12 -05:00
henk717 50ad6864c9
Merge pull request #87 from ebolam/united
Debug and load story fix for actions_metadata variable
2022-02-28 16:58:49 +01:00
ebolam 6003b2369b Debug and load story fix for actions_metadata variable 2022-02-28 10:39:36 -05:00
henk717 261981da45
Merge pull request #86 from ebolam/united
Fixed error in redo action
2022-02-28 14:43:53 +01:00
ebolam 47d102635e
Merge branch 'united' into united 2022-02-28 08:37:45 -05:00
ebolam 7803fbb137 Fixed error in redo action when editing previous entries and/or editing right after a redo 2022-02-28 08:31:26 -05:00
henk717 13fe472264 Menu Polish 2022-02-28 02:47:15 +01:00
henk717 f628929401
Merge pull request #85 from VE-FORBRYDERNE/sp
Fix a bug with soft prompts when using transformers XGLM
2022-02-28 02:33:18 +01:00
henk717 4849a30d88
Merge pull request #84 from mrseeker/patch-3
Added KoboldAI/fairseq-dense-2.7B-Janeway
2022-02-28 02:33:07 +01:00
henk717 a466e13c00 Model List Support 2022-02-26 12:34:07 +01:00
Gnome Ann a22d59e191 Fix a bug with soft prompts when using transformers XGLM 2022-02-25 12:35:23 -05:00
Julius ter Pelkwijk 0a7376a711
Added KoboldAI/fairseq-dense-2.7B-Janeway
With pleasure I am introducing KoboldAI/fairseq-dense-2.7B-Janeway.
2022-02-24 09:00:56 +01:00
henk717 1fc173890e
Merge pull request #83 from VE-FORBRYDERNE/loadsettings
Load settings earlier to avoid TPU badwords issues
2022-02-24 04:24:28 +01:00
Gnome Ann 072ca87977 Load soft prompt at the end instead of inside `loadsettings()` 2022-02-23 21:15:08 -05:00
Gnome Ann 8120e4dfa2 Need to set `vars.allowsp` to True before calling `loadsettings()` 2022-02-23 21:09:31 -05:00
Gnome Ann c45ba497c9 Load settings earlier to avoid TPU badwords issues 2022-02-23 20:39:11 -05:00
henk717 ac59e55d62 Smaller optimizations 2022-02-24 01:14:26 +01:00
henk717 8e9d9faa97
Merge pull request #82 from VE-FORBRYDERNE/tpu-config
Allow TPU models to specify settings/config in config.json
2022-02-24 00:53:40 +01:00
Gnome Ann ad10ac8871 Allow TPU models to specify settings/config in config.json 2022-02-23 18:22:18 -05:00
henk717 7de3311000 Fix sentencepiece model saving 2022-02-23 22:04:41 +01:00
henk717 6151d16df0
Merge pull request #81 from VE-FORBRYDERNE/dematerialized
Use dematerialized loading in TPU backend for lower device memory usage
2022-02-23 07:11:26 +01:00
Gnome Ann 7ec549c726 Use dematerialized loading in TPU backend for lower device memory usage 2022-02-22 19:43:13 -05:00
henk717 fd7ba9f70e Also check for Config in models/ 2022-02-22 19:22:08 +01:00
henk717 306d96a8eb Separate Drive Disconnect 2022-02-22 18:03:06 +01:00
henk717 a0518edc36 Temporary Transformers Git for XGLM 2022-02-22 02:42:04 +01:00
henk717 74012a24c9 Expose GDrive Models 2022-02-22 02:35:27 +01:00
henk717 9aeae94d0e Cleanup leakage (Didn't appear in my commit list) 2022-02-22 02:32:02 +01:00
henk717 cb6ccacd64 Dependencies required for newer models 2022-02-21 21:17:12 +01:00
henk717 4ace11f5b8
Merge pull request #80 from VE-FORBRYDERNE/xglm-position-ids
Temporary fix for XGLM positional embedding issues
2022-02-21 00:47:20 +01:00
henk717 300db651de Open models folder by default 2022-02-21 00:46:18 +01:00
Gnome Ann da10e2dc1d Don't crash if `XGLMSinusoidalPositionalEmbedding` doesn't exist 2022-02-20 17:41:00 -05:00
Gnome Ann 5dc4969173 Temporary fix for XGLM positional embedding issues 2022-02-20 14:17:24 -05:00
henk717 7c678820cd Exclude Models from our Git 2022-02-20 19:36:14 +01:00
henk717 27cf59bb94
Merge pull request #79 from VE-FORBRYDERNE/xglm-eos
Prevent transformers XGLM from stopping generation on `</s>` token
2022-02-20 19:03:51 +01:00
Gnome Ann a63fa3b067 Prevent transformers XGLM from stopping generation on `</s>` token 2022-02-19 23:15:16 -05:00
henk717 70e0295600
Merge branch 'KoboldAI:main' into united 2022-02-19 23:34:46 +01:00
henk717 a47e93cee7 Separate Low Memory Mode
In 1.16 we had significantly faster loading speeds because we did not do as much memory conservation; it's time to give users the choice. If you want the original, faster behavior and have the memory, run KoboldAI as usual. Otherwise run play-lowmem.bat or aiserver.py with --lowmem. For Colab this is still the default behavior, to avoid breaking models that would otherwise load fine.
2022-02-18 16:21:28 +01:00
henk717 4c84d731db
Merge branch 'KoboldAI:main' into united 2022-02-18 15:02:24 +01:00
henk717 8e03f1c612
Merge branch 'KoboldAI:main' into united 2022-02-18 14:21:34 +01:00
henk717 cba93e29d2 Update aiserver.py 2022-02-18 02:11:08 +01:00
henk717 76a6c124dd Quiet on Colab
Makes the Colab mode also automatically activate Quiet mode to improve privacy. We should no longer need this in the Colab console thanks to the redo feature. Need something different for testing? Use --remote instead.
2022-02-18 02:07:40 +01:00
henk717 02246dfc4d Remote play improvements
Change the proposed --share to --unblock to make it more apparent what this feature does. The feature unblocks the port for external access, but does not add remote play support. For remote play support without a proxy service I have added --host.
2022-02-18 01:08:12 +01:00
henk717 9b72583110
Merge branch 'KoboldAI:main' into united 2022-02-18 00:37:34 +01:00
henk717 a05aef552c
Merge branch 'KoboldAI:main' into united 2022-02-14 18:10:56 +01:00
henk717 ca5b9f968f
Merge pull request #76 from VE-FORBRYDERNE/newline
Fix fairseq newline handling issues
2022-02-14 18:10:25 +01:00
Gnome Ann ec54bc9d9b Fix typo in `send_debug()` 2022-02-12 20:11:35 -05:00
Gnome Ann f682c1229a Fix fairseq newline handling issues 2022-02-12 13:23:59 -05:00
henk717 c1af8f72c3
Merge pull request #75 from ebolam/united
Fixed retry bug due to redo/pin code
2022-02-11 03:27:51 +01:00
ebolam 633152ee84 Fixed Retry bug due to redo/pin code 2022-02-10 10:01:07 -05:00
ebolam cd00373cfb Deleted unused svg 2022-02-10 09:21:07 -05:00
henk717 e1ef4e4fa8
Merge pull request #74 from ebolam/united
Redo, Pinning, and docker enhancements
2022-02-07 01:06:36 +01:00
ebolam c0bbe9f810 Reverted docker-cuda to mainline version. 2022-02-06 19:04:13 -05:00
ebolam 586b989582 Redo bug fix 2022-02-06 18:53:24 -05:00
ebolam 98609a8abc Merge branch 'united' of https://github.com/ebolam/KoboldAI into united 2022-02-06 13:48:34 -05:00
ebolam 80ae054cb5
Merge branch 'henk717:united' into united 2022-02-06 13:42:59 -05:00
ebolam 9e17ea9636 Fixed model downloading problem where models were downloaded multiple times 2022-02-06 13:42:46 -05:00
ebolam 8195360fcc Merge branch 'united' of https://github.com/ebolam/KoboldAI into united 2022-02-06 12:31:45 -05:00
henk717 7695eeb31a
Merge branch 'KoboldAI:main' into united 2022-02-06 18:06:07 +01:00
henk717 c38108d818
Merge pull request #73 from VE-FORBRYDERNE/xglm-breakmodel
Breakmodel support for the fairseq models
2022-02-06 18:05:59 +01:00
ebolam 475995f8a5
Merge branch 'henk717:united' into united 2022-02-04 10:21:10 -05:00
ebolam 5534fc9800 Moved build script into the docker folder 2022-02-03 08:25:51 -05:00
ebolam 02c7ca3e84
Merge branch 'henk717:united' into united 2022-02-03 08:11:06 -05:00
ebolam 0684a221cd Changed pin icon for re-dos to be a circular arrow that is not clickable to make it clear it is a redo action and cannot be cleared. 2022-02-03 08:08:43 -05:00
Gnome Ann 4904af6adc Fix a mistake in the previous commit 2022-02-02 23:04:59 -05:00
Gnome Ann 78f52063c7 Fix XGLM soft prompts 2022-02-02 22:45:16 -05:00
Ben Fox 3a6d8f1030 Added script to build the 5 images for the docker containers 2022-02-02 15:18:18 -05:00
Ben Fox 004bb3bcc8 change tag name 2022-02-02 15:15:11 -05:00
Ben Fox 6f7578abca adding base environment file 2022-02-02 15:10:50 -05:00
Ben Fox 604246d12c Merge branch 'henk717-united' into united 2022-02-02 15:05:14 -05:00
Ben Fox e2d2ebcae6 upstream merge 2022-02-02 15:04:59 -05:00
Gnome Ann d847d04605 Fix some typos in XGLM breakmodel 2022-02-01 16:00:46 -05:00
Gnome Ann 8e1169ea61 Enable `vars.bmsupported` when using XGLM 2022-02-01 15:31:59 -05:00
Gnome Ann e7f65cee09 XGLM breakmodel 2022-02-01 13:04:35 -05:00
ebolam 1470b1666d Fixed single gen redo 2022-01-27 20:17:13 -05:00
ebolam ab5d3b4255 Docker file fix 2022-01-26 21:14:10 -05:00
ebolam 2278b7c103 Changed behavior of redo if there is only 1 option to just select it 2022-01-26 21:07:55 -05:00
ebolam 06bbe429d9 Bug fix for redo/pinning persisting over new game requests 2022-01-26 21:02:36 -05:00
ebolam a27c441cdf Updated base image again to only have transformers change between the images 2022-01-26 15:35:36 -05:00
ebolam d2b15e2a6e Updated dockerfiles to create images for docker hub that are per-compiled 2022-01-26 11:35:58 -05:00
ebolam b0f1bdf2fd
Merge branch 'henk717:united' into united 2022-01-26 11:27:12 -05:00
ebolam a0100ff3cc Fixed error with redo action when a list of options is on screen sometimes causing the list to disappear entirely. 2022-01-24 15:15:45 -05:00
ebolam bd0732fbd6 Fix for redo with options.
Added debug menu
2022-01-24 12:54:44 -05:00
ebolam 47ec22873d bug-fix if settings directory is a symlink. 2022-01-22 21:43:32 -05:00
ebolam d12a6a5620 added DockerFile for finetune 2022-01-22 20:38:43 -05:00
ebolam f54f46b068 bugfix for metadata saving 2022-01-22 20:30:14 -05:00
ebolam 9355ae420d Merge branch 'henk717-united' into united 2022-01-22 17:57:51 -05:00
ebolam bdd358f40f Merge branch 'united' of https://github.com/henk717/KoboldAI into henk717-united 2022-01-22 17:57:33 -05:00
ebolam 9df758c1f4 added quiet option to suppress any story text from showing in the console (reduce logs when running in a docker container) 2022-01-22 15:30:56 -05:00
ebolam 7d76ecbf83 added models folder 2022-01-22 15:01:01 -05:00
ebolam 54d99490a9 Set the dockerfile to save the code in the image
set the transformer version to huggingface
added the default run command
2022-01-22 14:48:32 -05:00
ebolam 12e7b6d10b Added --share command line parameter so we can set host=0.0.0.0 on local instances without editing code
moved save location of downloaded models to models/XXXXXX so we can more easily set this as a volume in docker
2022-01-22 14:47:28 -05:00
ebolam 8e2fab8eb0 whops. Missed a } 2022-01-22 08:48:32 -05:00
ebolam 2010e7b9bc Added saveas option for saving without metadata information
Fixed redo on an empty story erroring
Fixed redo when you're at the current end of a chain causing an error
2022-01-21 19:02:56 -05:00
ebolam d31fb278ce Working redo and pin options 2022-01-21 15:30:37 -05:00
ebolam c9a99adde8
Add files via upload 2022-01-21 07:41:04 -05:00
ebolam fcaacf636d
Merge branch 'henk717:united' into united 2022-01-21 07:40:25 -05:00
ebolam 72f5b147cc
Merge branch 'henk717:united' into united 2022-01-20 18:03:59 -05:00
Ben Fox 03d54364f4 Initial commit of the actions metadata variable population 2022-01-20 15:18:43 -05:00
52 changed files with 3588 additions and 902 deletions

.gitignore vendored (7 changes)

@@ -9,6 +9,8 @@ stories
/.project
*.bak
miniconda3
runtime
bin
*.settings
__pycache__
*.log
@@ -21,9 +23,14 @@ userscripts
softprompts
models
!models/models go here.txt
Uninstall
.ipynb_checkpoints
# Ignore PyCharm project files.
.idea
# Ignore compiled Python files.
*.pyc
# Don't ignore defaults
!defaults/*


@@ -7,23 +7,23 @@ IF %M%==2 GOTO subfolder
IF %M%==3 GOTO drivemap_B
:subfolder
umamba.exe install --no-shortcuts -r miniconda3 -n base -c conda-forge jupyter
umamba.exe install --no-shortcuts -r miniconda3 -n base -c conda-forge jupyterlab jupyterlab-git
call miniconda3\condabin\activate
jupyter notebook
jupyter-lab
cmd /k
:drivemap
subst K: miniconda3 >nul
umamba.exe install --no-shortcuts -r K:\python\ -n base -c conda-forge jupyter
umamba.exe install --no-shortcuts -r K:\python\ -n base -c conda-forge jupyterlab jupyterlab-git
call K:\python\condabin\activate
jupyter notebook
jupyter-lab
subst K: /D
cmd /k
:drivemap_B
subst B: miniconda3 >nul
umamba.exe install --no-shortcuts -r B:\python\ -n base -c conda-forge jupyter
umamba.exe install --no-shortcuts -r B:\python\ -n base -c conda-forge jupyterlab jupyterlab-git
call B:\python\condabin\activate
jupyter notebook
jupyter-lab
subst B: /D
cmd /k

Uninstall.bat (new file, 32 lines)

@@ -0,0 +1,32 @@
@echo off
cd /D %~dp0
TITLE KoboldAI Uninstall Helper
SET /P M=<loader.settings
IF %M%==3 subst /D B: >nul
IF %M%==1 subst /D K: >nul
IF "%1" == "FORCE" GOTO UNINSTALL
IF EXIST "Uninstall\unins000.exe" (
start Uninstall\unins000.exe
exit
) ELSE (
echo This will remove all KoboldAI folders that do not contain user data
pause
GOTO UNINSTALL
)
:UNINSTALL
echo Uninstallation in progress, please wait...
set DM=Y
attrib -h .git >nul
for /d %%D in (*) do if not "%%~nxD"=="stories" if not "%%~nxD"=="userscripts" if not "%%~nxD"=="settings" if not "%%~nxD"=="softprompts" if not "%%~nxD"=="models" if not "%%~nxD"=="Uninstall" rmdir /S /Q %%~nxD
for %%i in (*) do if not "%%i"=="Uninstall.bat" del /q "%%i"
set /P DM=Would you like to delete the models folder? (Y/n) :
IF %DM%==Y rmdir models /s /q
IF %DM%==y rmdir models /s /q
set DM=N
set /P DM=Would you like to delete all other user folders? (y/N) :
IF %DM%==Y rmdir stories userscripts settings softprompts /s /q
IF %DM%==y rmdir stories userscripts settings softprompts /s /q
del Uninstall.bat

File diff suppressed because it is too large


@@ -212,14 +212,17 @@ Copyright 2018 The Hugging Face team
import torch
from torch import nn
import torch.cuda.comm
import copy
import gc
import sys
import itertools
import bisect
import random
from typing import Optional
from transformers.modeling_outputs import BaseModelOutputWithPast
from transformers.modeling_outputs import BaseModelOutputWithPast, BaseModelOutputWithPastAndCrossAttentions
from transformers.utils import logging
logger = logging.get_logger(__name__)
@@ -230,22 +233,40 @@ gpu_blocks = []
primary_device = 0
def move_hidden_layers(transformer):
# Copied from transformers.models.bart.modeling_bart._expand_mask
def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None):
"""
Expands attention_mask from `[bsz, seq_len]` to `[bsz, 1, tgt_seq_len, src_seq_len]`.
"""
bsz, src_len = mask.size()
tgt_len = tgt_len if tgt_len is not None else src_len
expanded_mask = mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
inverted_mask = 1.0 - expanded_mask
return inverted_mask.masked_fill(inverted_mask.bool(), torch.finfo(dtype).min)
def move_hidden_layers(transformer, h=None):
if h is None:
h = transformer.h
assert len(gpu_blocks) <= torch.cuda.device_count()
assert sum(gpu_blocks) <= len(transformer.h)
ram_blocks = len(transformer.h) - sum(gpu_blocks)
assert sum(gpu_blocks) <= len(h)
ram_blocks = len(h) - sum(gpu_blocks)
transformer.extrastorage = {}
torch.cuda.empty_cache()
able_to_pin_layers = True
for i in range(ram_blocks):
transformer.h[i].to("cpu")
transformer.extrastorage[i] = copy.deepcopy(transformer.h[i])
h[i].to("cpu")
transformer.extrastorage[i] = copy.deepcopy(h[i])
smalltensor = torch.tensor(0).to(primary_device)
for param1 in transformer.h[i].parameters():
for param1 in h[i].parameters():
param1.data = smalltensor
transformer.h[i].to(primary_device)
h[i].to(primary_device)
for param in transformer.extrastorage[i].parameters():
param.requires_grad = False
param.data = param.data.detach()
@@ -259,34 +280,34 @@ def move_hidden_layers(transformer):
torch.cuda.empty_cache()
if ram_blocks:
for param1,param2 in zip(transformer.h[0].parameters(),transformer.extrastorage[0].parameters()):
for param1,param2 in zip(h[0].parameters(),transformer.extrastorage[0].parameters()):
param1.data = param2.data.to(primary_device, non_blocking=False).detach()
for param1,param2 in zip(transformer.h[ram_blocks-1].parameters(),transformer.extrastorage[ram_blocks-1].parameters()):
for param1,param2 in zip(h[ram_blocks-1].parameters(),transformer.extrastorage[ram_blocks-1].parameters()):
param1.data = param2.data.to(primary_device, non_blocking=False).detach()
i = ram_blocks
for j in range(len(gpu_blocks)):
for _ in range(gpu_blocks[j]):
transformer.h[i].to(j)
h[i].to(j)
i += 1
def new_forward(
self,
input_ids=None,
past_key_values=None,
attention_mask=None,
token_type_ids=None,
position_ids=None,
head_mask=None,
inputs_embeds=None,
use_cache=None,
output_attentions=None,
output_hidden_states=None,
return_dict=None,
embs=None,
):
def new_forward_neo(
self,
input_ids=None,
past_key_values=None,
attention_mask=None,
token_type_ids=None,
position_ids=None,
head_mask=None,
inputs_embeds=None,
use_cache=None,
output_attentions=None,
output_hidden_states=None,
return_dict=None,
embs=None,
):
assert len(gpu_blocks) <= torch.cuda.device_count()
assert sum(gpu_blocks) <= len(self.h)
ram_blocks = len(self.h) - sum(gpu_blocks)
@@ -477,3 +498,365 @@ def new_forward(
hidden_states=all_hidden_states,
attentions=all_self_attentions,
)
def new_forward_xglm(
self,
input_ids=None,
attention_mask=None,
encoder_hidden_states=None,
encoder_attention_mask=None,
head_mask=None,
cross_attn_head_mask=None,
past_key_values=None,
inputs_embeds=None,
use_cache=None,
output_attentions=None,
output_hidden_states=None,
return_dict=None,
):
assert len(gpu_blocks) <= torch.cuda.device_count()
assert sum(gpu_blocks) <= len(self.layers)
ram_blocks = len(self.layers) - sum(gpu_blocks)
cumulative_gpu_blocks = tuple(itertools.accumulate(gpu_blocks))
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
use_cache = use_cache if use_cache is not None else self.config.use_cache
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
# retrieve input_ids and inputs_embeds
if input_ids is not None and inputs_embeds is not None:
raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
elif input_ids is not None:
input_shape = input_ids.size()
input_ids = input_ids.view(-1, input_shape[-1])
elif inputs_embeds is not None:
input_shape = inputs_embeds.size()[:-1]
else:
raise ValueError("You have to specify either input_ids or inputs_embeds")
# past_key_values_length
past_key_values_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0
if inputs_embeds is None:
if breakmodel:
input_ids = input_ids.to(primary_device)
inputs_embeds = self.embed_tokens(input_ids) * self.embed_scale
attention_mask = self._prepare_decoder_attention_mask(
attention_mask, input_shape, inputs_embeds, past_key_values_length
)
# expand encoder attention mask
if encoder_hidden_states is not None and encoder_attention_mask is not None:
# [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
encoder_attention_mask = _expand_mask(encoder_attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1])
# embed positions
if breakmodel:
inputs_embeds = inputs_embeds.to(primary_device)
positions = self.embed_positions(input_ids, inputs_embeds, past_key_values_length)
if breakmodel:
positions = positions.to(primary_device)
hidden_states = inputs_embeds + positions
hidden_states = nn.functional.dropout(hidden_states, p=self.dropout, training=self.training)
# decoder layers
all_hidden_states = () if output_hidden_states else None
all_self_attns = () if output_attentions else None
all_cross_attentions = () if (output_attentions and encoder_hidden_states is not None) else None
next_decoder_cache = () if use_cache else None
if breakmodel and ram_blocks:
copystream = torch.cuda.Stream(device=primary_device, priority=-1)
# check if head_mask/cross_attn_head_mask has a correct number of layers specified if desired
for attn_mask, mask_name in zip([head_mask, cross_attn_head_mask], ["head_mask", "cross_attn_head_mask"]):
if attn_mask is not None:
assert attn_mask.size()[0] == (
len(self.layers)
), f"The `{mask_name}` should be specified for {len(self.layers)} layers, but it is for {head_mask.size()[0]}."
for idx, decoder_layer in enumerate(self.layers):
i = idx
if breakmodel:
if i in range(ram_blocks):
index1 = (i+1)%ram_blocks
for param1,param2 in zip(self.layers[index1].parameters(),self.layers[(i-1)%ram_blocks].parameters()):
param1.data = param2.data
for param1,param2 in zip(self.layers[index1].parameters(),self.extrastorage[index1].parameters()):
with torch.cuda.stream(copystream):
torch.cuda.comm.broadcast(param2.data,out = [param1.data])
# add LayerDrop (see https://arxiv.org/abs/1909.11556 for description)
if output_hidden_states:
all_hidden_states += (hidden_states,)
dropout_probability = random.uniform(0, 1)
if self.training and (dropout_probability < self.layerdrop):
continue
past_key_value = past_key_values[idx] if past_key_values is not None else None
if self.gradient_checkpointing and self.training:
if use_cache:
logger.warning(
"`use_cache = True` is incompatible with gradient checkpointing`. Setting `use_cache = False`..."
)
use_cache = False
def create_custom_forward(module):
def custom_forward(*inputs):
# None for past_key_value
return module(*inputs, output_attentions, use_cache)
return custom_forward
layer_outputs = torch.utils.checkpoint.checkpoint(
create_custom_forward(decoder_layer),
hidden_states,
attention_mask,
encoder_hidden_states,
encoder_attention_mask,
head_mask[idx] if head_mask is not None else None,
cross_attn_head_mask[idx] if cross_attn_head_mask is not None else None,
None,
)
else:
if breakmodel:
device = primary_device if i < ram_blocks else bisect.bisect_right(cumulative_gpu_blocks, i - ram_blocks)
layer_outputs = decoder_layer(
hidden_states.to(device) if breakmodel and hidden_states is not None else hidden_states,
attention_mask=attention_mask.to(device) if breakmodel and attention_mask is not None else attention_mask,
encoder_hidden_states=encoder_hidden_states.to(device) if breakmodel and encoder_hidden_states is not None else encoder_hidden_states,
encoder_attention_mask=encoder_attention_mask.to(device) if breakmodel and encoder_attention_mask is not None else encoder_attention_mask,
layer_head_mask=((head_mask[idx].to(device) if breakmodel and head_mask[idx] is not None else head_mask[idx]) if head_mask is not None else None),
cross_attn_layer_head_mask=(
(cross_attn_head_mask[idx].to(device) if breakmodel and cross_attn_head_mask[idx] is not None else cross_attn_head_mask[idx]) if cross_attn_head_mask is not None else None
),
past_key_value=tuple(v.to(device) for v in past_key_value if v is not None) if breakmodel and past_key_value is not None and i >= ram_blocks and len(past_key_value) and past_key_value[0].device.index != device else past_key_value,
output_attentions=output_attentions,
use_cache=use_cache,
)
hidden_states = layer_outputs[0]
if use_cache:
next_decoder_cache += (layer_outputs[3 if output_attentions else 1],)
if output_attentions:
all_self_attns += (layer_outputs[1],)
if encoder_hidden_states is not None:
all_cross_attentions += (layer_outputs[2],)
if breakmodel:
if i in range(ram_blocks):
torch.cuda.synchronize()
torch.cuda.empty_cache()
if breakmodel:
if ram_blocks:
del copystream
torch.cuda.empty_cache()
hidden_states = hidden_states.to(primary_device)
hidden_states = self.layer_norm(hidden_states)
if breakmodel:
hidden_states = hidden_states.to(primary_device)
# add hidden states from the last decoder layer
if output_hidden_states:
all_hidden_states += (hidden_states,)
next_cache = next_decoder_cache if use_cache else None
if not return_dict:
return tuple(
v
for v in [hidden_states, next_cache, all_hidden_states, all_self_attns, all_cross_attentions]
if v is not None
)
return BaseModelOutputWithPastAndCrossAttentions(
last_hidden_state=hidden_states,
past_key_values=next_cache,
hidden_states=all_hidden_states,
attentions=all_self_attns,
cross_attentions=all_cross_attentions,
)
def new_forward_opt(
self,
input_ids=None,
attention_mask=None,
head_mask=None,
past_key_values=None,
inputs_embeds=None,
use_cache=None,
output_attentions=None,
output_hidden_states=None,
return_dict=None,
):
assert len(gpu_blocks) <= torch.cuda.device_count()
assert sum(gpu_blocks) <= len(self.layers)
ram_blocks = len(self.layers) - sum(gpu_blocks)
cumulative_gpu_blocks = tuple(itertools.accumulate(gpu_blocks))
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
use_cache = use_cache if use_cache is not None else self.config.use_cache
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
# retrieve input_ids and inputs_embeds
if input_ids is not None and inputs_embeds is not None:
raise ValueError("You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time")
elif input_ids is not None:
input_shape = input_ids.size()
input_ids = input_ids.view(-1, input_shape[-1])
elif inputs_embeds is not None:
input_shape = inputs_embeds.size()[:-1]
else:
raise ValueError("You have to specify either decoder_input_ids or decoder_inputs_embeds")
past_key_values_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0
if inputs_embeds is None:
if breakmodel:
input_ids = input_ids.to(primary_device)
inputs_embeds = self.embed_tokens(input_ids)
# embed positions
if breakmodel:
    inputs_embeds = inputs_embeds.to(primary_device)
if attention_mask is None:
    attention_mask = torch.ones(inputs_embeds.shape[:2], dtype=torch.bool, device=inputs_embeds.device)
positions = self.embed_positions(attention_mask)[:, past_key_values_length:, :]
if breakmodel:
    positions = positions.to(primary_device)

attention_mask = self._prepare_decoder_attention_mask(
    attention_mask, input_shape, inputs_embeds, past_key_values_length
)

if self.project_in is not None:
    inputs_embeds = self.project_in(inputs_embeds)

hidden_states = inputs_embeds + positions

hidden_states = nn.functional.dropout(hidden_states, p=self.dropout, training=self.training)

# decoder layers
all_hidden_states = () if output_hidden_states else None
all_self_attns = () if output_attentions else None
next_decoder_cache = () if use_cache else None

if breakmodel and ram_blocks:
    copystream = torch.cuda.Stream(device=primary_device, priority=-1)

# check if head_mask has a correct number of layers specified if desired
for attn_mask, mask_name in zip([head_mask], ["head_mask"]):
    if attn_mask is not None:
        if attn_mask.size()[0] != (len(self.layers)):
            raise ValueError(
                f"The `{mask_name}` should be specified for {len(self.layers)} layers, but it is for"
                f" {head_mask.size()[0]}."
            )

for idx, decoder_layer in enumerate(self.layers):
    i = idx
    if breakmodel:
        if i in range(ram_blocks):
            index1 = (i + 1) % ram_blocks
            for param1, param2 in zip(self.layers[index1].parameters(), self.layers[(i - 1) % ram_blocks].parameters()):
                param1.data = param2.data
            for param1, param2 in zip(self.layers[index1].parameters(), self.extrastorage[index1].parameters()):
                with torch.cuda.stream(copystream):
                    torch.cuda.comm.broadcast(param2.data, out=[param1.data])

    # add LayerDrop (see https://arxiv.org/abs/1909.11556 for description)
    if output_hidden_states:
        all_hidden_states += (hidden_states,)
    dropout_probability = random.uniform(0, 1)
    if self.training and (dropout_probability < self.layerdrop):
        continue

    past_key_value = past_key_values[idx] if past_key_values is not None else None

    if self.gradient_checkpointing and self.training:
        if use_cache:
            logger.warning(
                "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
            )
            use_cache = False

        def create_custom_forward(module):
            def custom_forward(*inputs):
                # None for past_key_value
                return module(*inputs, output_attentions, None)

            return custom_forward

        layer_outputs = torch.utils.checkpoint.checkpoint(
            create_custom_forward(decoder_layer),
            hidden_states,
            attention_mask,
            head_mask[idx] if head_mask is not None else None,
            None,
        )
    else:
        if breakmodel:
            device = primary_device if i < ram_blocks else bisect.bisect_right(cumulative_gpu_blocks, i - ram_blocks)
        layer_outputs = decoder_layer(
            hidden_states.to(device) if breakmodel and hidden_states is not None else hidden_states,
            attention_mask=attention_mask.to(device) if breakmodel and attention_mask is not None else attention_mask,
            layer_head_mask=((head_mask[idx].to(device) if breakmodel and head_mask[idx] is not None else head_mask[idx]) if head_mask is not None else None),
            past_key_value=tuple(v.to(device) for v in past_key_value if v is not None) if breakmodel and past_key_value is not None and i >= ram_blocks and len(past_key_value) and past_key_value[0].device.index != device else past_key_value,
            output_attentions=output_attentions,
            use_cache=use_cache,
        )

    hidden_states = layer_outputs[0]

    if use_cache:
        next_decoder_cache += (layer_outputs[2 if output_attentions else 1],)

    if output_attentions:
        all_self_attns += (layer_outputs[1],)

    if breakmodel:
        if i in range(ram_blocks):
            torch.cuda.synchronize()
            torch.cuda.empty_cache()

if breakmodel:
    if ram_blocks:
        del copystream
    torch.cuda.empty_cache()
    hidden_states = hidden_states.to(primary_device)
if self.project_out is not None:
    hidden_states = self.project_out(hidden_states)
if breakmodel:
    hidden_states = hidden_states.to(primary_device)

# add hidden states from the last decoder layer
if output_hidden_states:
    all_hidden_states += (hidden_states,)

next_cache = next_decoder_cache if use_cache else None
if not return_dict:
    return tuple(v for v in [hidden_states, next_cache, all_hidden_states, all_self_attns] if v is not None)
return BaseModelOutputWithPast(
    last_hidden_state=hidden_states,
    past_key_values=next_cache,
    hidden_states=all_hidden_states,
    attentions=all_self_attns,
)
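
For reference, a minimal self-contained sketch of how the `bisect` expression above maps a decoder layer index to a device under breakmodel: layers below `ram_blocks` are cached in system RAM and streamed to the primary GPU, while the remaining layers live permanently on a GPU chosen by bisecting the cumulative per-GPU block counts. The split below is hypothetical (the real counts come from the user's breakmodel configuration), and `layer_device` is an illustrative helper, not a function from the codebase.

import bisect

# Hypothetical split: 2 RAM-cached layers, then 3 layers on GPU 0 and 5 on GPU 1.
ram_blocks = 2
cumulative_gpu_blocks = [3, 8]  # running totals of the per-GPU block counts
primary_device = 0

def layer_device(i):
    # Mirrors the expression in the decoder loop above: RAM-cached layers run
    # on the primary device; the rest map to a GPU index via bisect_right.
    if i < ram_blocks:
        return primary_device
    return bisect.bisect_right(cumulative_gpu_blocks, i - ram_blocks)

print([layer_device(i) for i in range(10)])  # -> [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]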

View File

@ -165,7 +165,7 @@ return function(_python, _bridged)
---@field num_outputs integer
---@field feedback string
---@field is_config_file_open boolean
local kobold = setmetatable({API_VERSION = 1.0}, metawrapper)
local kobold = setmetatable({API_VERSION = 1.1}, metawrapper)
local KoboldLib_mt = setmetatable({}, metawrapper)
local KoboldLib_getters = setmetatable({}, metawrapper)
local KoboldLib_setters = setmetatable({}, metawrapper)
@ -866,6 +866,7 @@ return function(_python, _bridged)
---@field settopp number
---@field settopk integer
---@field settfs number
---@field settypical number
---@field setreppen number
---@field setreppenslope number
---@field setreppenrange number
@ -882,6 +883,7 @@ return function(_python, _bridged)
---@field top_p number
---@field top_k integer
---@field tfs number
---@field typical number
---@field reppen number
---@field reppenslope number
---@field reppenrange number
@ -1048,11 +1050,34 @@ return function(_python, _bridged)
        return
    elseif not bridged.vars.gamestarted and v == "" then
        error("`KoboldLib.submission` must not be set to the empty string when the story is empty")
        return
    end
    bridged.vars.submission = v
end

--==========================================================================
-- Userscript API: Soft prompt
--==========================================================================

---@param t KoboldLib
---@return string?
function KoboldLib_getters.spfilename(t)
    return bridged.get_spfilename()
end

---@param t KoboldLib
---@param v string?
function KoboldLib_setters.spfilename(t, v)
    if v:find("/") or v:find("\\") then
        error("Cannot set `KoboldLib.spfilename` to a string that contains slashes")
    end
    if bridged.set_spfilename(v) then
        maybe_require_regeneration()
    end
end
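
-- For illustration only (hypothetical userscript usage, not part of the
-- bridge itself): assigning the filename of a softprompt in the user's
-- softprompts folder activates it and may trigger regeneration:
--   kobold.spfilename = "my_softprompt.zip"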
--==========================================================================
-- Userscript API: Model information
--==========================================================================

View File

@ -7,7 +7,7 @@
"private_outputs": true,
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyOKIa/NDLlYI5j63GXPtkXv",
"authorship_tag": "ABX9TyPbwW79K9/RkYH9i9rkYFyj",
"include_colab_link": true
},
"kernelspec": {
@ -68,14 +68,20 @@
"#@title <b><-- Click this to start KoboldAI</b>\n",
"#@markdown You can find a description of the models below along with instructions on how to start KoboldAI.\n",
"\n",
"Model = \"KoboldAI/GPT-Neo-2.7B-Janeway\" #@param [\"KoboldAI/GPT-Neo-2.7B-Janeway\", \"KoboldAI/GPT-Neo-2.7B-AID\", \"KoboldAI/GPT-Neo-2.7B-Picard\", \"KoboldAI/GPT-Neo-2.7B-Horni-LN\", \"KoboldAI/GPT-Neo-2.7B-Horni\", \"KoboldAI/GPT-Neo-2.7B-Shinen\", \"EleutherAI/gpt-neo-2.7B\"] {allow-input: true}\n",
"Model = \"KoboldAI/fairseq-dense-2.7B-Nerys\" #@param [\"KoboldAI/fairseq-dense-2.7B-Nerys\", \"KoboldAI/GPT-Neo-2.7B-Janeway\", \"KoboldAI/GPT-Neo-2.7B-AID\", \"KoboldAI/GPT-Neo-2.7B-Picard\", \"KoboldAI/GPT-Neo-2.7B-Horni-LN\", \"KoboldAI/GPT-Neo-2.7B-Horni\", \"KoboldAI/GPT-Neo-2.7B-Shinen\", \"EleutherAI/gpt-neo-2.7B\"] {allow-input: true}\n",
"Version = \"Official\" #@param [\"Official\", \"United\"] {allow-input: true}\n",
"Provider = \"Localtunnel\" #@param [\"Localtunnel\", \"Cloudflare\"]\n",
"\n",
"!nvidia-smi\n",
"from google.colab import drive\n",
"drive.mount('/content/drive/')\n",
"\n",
"!wget https://henk.tech/ckds -O - | bash /dev/stdin -m $Model -g $Version"
"if Provider == \"Localtunnel\":\n",
" tunnel = \"--localtunnel yes\"\n",
"else:\n",
" tunnel = \"\"\n",
"\n",
"!wget https://henk.tech/ckds -O - | bash /dev/stdin -m $Model -g $Version $tunnel"
],
"execution_count": null,
"outputs": []
@ -84,27 +90,32 @@
"cell_type": "markdown",
"source": [
"# GPU Edition Model Descriptions\n",
"| Model | Size | Style | Description |\n",
"| ------------------------------------------------------------ | -------- | ---------- | ------------------------------------------------------------ |\n",
"| [GPT-Neo-2.7B-Janeway](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Janeway) by Mr Seeker | 2.7B GPU | Novel | Janeway is a model created from Picard's dataset combined with a brand new collection of ebooks. This model is trained on 20% more content than Picard and has been trained on literature from various genres. Although the model is mainly focussed on SFW, romantic scenes might involve a degree of nudity. |\n",
"| [GPT-Neo-2.7B-Picard](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Picard) by Mr Seeker | 2.7B GPU | Novel | Picard is a model trained for SFW Novels based on GPT-Neo-2.7B. It is focused on Novel style writing without the NSFW bias. While the name suggests a sci-fi model this model is designed for Novels of a variety of genre's. It is meant to be used in KoboldAI's regular mode. |\n",
"| [GPT-Neo-2.7B-AID](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-AID) by melastacho | 2.7B GPU | Adventure | Also know as Adventure 2.7B this is a clone of the AI Dungeon Classic model and is best known for the epic wackey adventures that AI Dungeon Classic players love. |\n",
"| [GPT-Neo-2.7B-Horni-LN](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni-LN) by finetune | 2.7B GPU | Novel | This model is based on GPT-Neo-2.7B-Horni and retains its NSFW knowledge, but was then further biased towards SFW novel stories. If you seek a balance between a SFW Novel model and a NSFW model this model should be a good choice. |\n",
"| [GPT-Neo-2.7B-Horni](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni) by finetune | 2.7B GPU | NSFW | This model is tuned on Literotica to produce a Novel style model biased towards NSFW content. Can still be used for SFW stories but will have a bias towards NSFW content. It is meant to be used in KoboldAI's regular mode. |\n",
"| [GPT-Neo-2.7B-Shinen](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Shinen) by Mr Seeker | 2.7B GPU | NSFW | Shinen is an alternative to the Horni model designed to be more explicit. If Horni is to tame for you shinen might produce better results. While it is a Novel model it is unsuitable for SFW stories due to its heavy NSFW bias. Shinen will not hold back. It is meant to be used in KoboldAI's regular mode. |\n",
"| [GPT-Neo-2.7B](https://huggingface.co/EleutherAI/gpt-neo-2.7B) by EleutherAI | 2.7B GPU | Generic | This is the base model for all the other 2.7B models, it is best used when you have a use case that we have no other models available for, such as writing blog articles or programming. It can also be a good basis for the experience of some of the softprompts if your softprompt is not about a subject the other models cover. |\n",
"| Model | Size | Style | Description |\n",
"| --- | --- | --- | --- |\n",
"| [Fairseq-Dense-2.7B-Nerys](https://huggingface.co/KoboldAI/fairseq-dense-2.7B-Nerys) by Mr Seeker | 2.7B | Novel/Adventure | Nerys is a hybrid model based on Pike (A newer Janeway), on top of the Pike dataset you also get some Light Novels, Adventure mode support and a little bit of shinen thrown in the mix. The end result is a very diverse model that is heavily biased towards SFW novel writing, but one that can go beyond its novel training and make for an excellent adventure model to. Adventure mode is best played from a second person perspective, but can be played in first or third person as well. Novel writing can be done best from the first or third person. |\n",
"| [GPT-Neo-2.7B-Janeway](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Janeway) by Mr Seeker | 2.7B | Novel | Janeway is a model created from Picard's dataset combined with a brand new collection of ebooks. This model is trained on 20% more content than Picard and has been trained on literature from various genres. Although the model is mainly focussed on SFW, romantic scenes might involve a degree of nudity. |\n",
"| [GPT-Neo-2.7B-Picard](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Picard) by Mr Seeker | 2.7B | Novel | Picard is a model trained for SFW Novels based on GPT-Neo-2.7B. It is focused on Novel style writing without the NSFW bias. While the name suggests a sci-fi model this model is designed for Novels of a variety of genre's. It is meant to be used in KoboldAI's regular mode. |\n",
"| [GPT-Neo-2.7B-AID](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-AID) by melastacho | 2.7B | Adventure | Also know as Adventure 2.7B this is a clone of the AI Dungeon Classic model and is best known for the epic wackey adventures that AI Dungeon Classic players love. |\n",
"| [GPT-Neo-2.7B-Horni-LN](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni-LN) by finetune | 2.7B | Novel | This model is based on GPT-Neo-2.7B-Horni and retains its NSFW knowledge, but was then further biased towards SFW novel stories. If you seek a balance between a SFW Novel model and a NSFW model this model should be a good choice. |\n",
"| [GPT-Neo-2.7B-Horni](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni) by finetune | 2.7B | NSFW | This model is tuned on Literotica to produce a Novel style model biased towards NSFW content. Can still be used for SFW stories but will have a bias towards NSFW content. It is meant to be used in KoboldAI's regular mode. |\n",
"| [GPT-Neo-2.7B-Shinen](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Shinen) by Mr Seeker | 2.7B | NSFW | Shinen is an alternative to the Horni model designed to be more explicit. If Horni is to tame for you shinen might produce better results. While it is a Novel model it is unsuitable for SFW stories due to its heavy NSFW bias. Shinen will not hold back. It is meant to be used in KoboldAI's regular mode. |\n",
"| [GPT-Neo-2.7B](https://huggingface.co/EleutherAI/gpt-neo-2.7B) by EleutherAI | 2.7B | Generic | This is the base model for all the other 2.7B models, it is best used when you have a use case that we have no other models available for, such as writing blog articles or programming. It can also be a good basis for the experience of some of the softprompts if your softprompt is not about a subject the other models cover. |\n",
"\n",
"# [TPU Edition Model Descriptions](https://colab.research.google.com/github/KoboldAI/KoboldAI-Client/blob/main/colab/TPU.ipynb)\n",
"\n",
"| Model | Size | Style | Drive Space | Description |\n",
"| ------------------------------ | ------ | --------- | ----------- | ------------------------------------------------------------ |\n",
"| Skein 6B by VE_FORBDRYDERNE | 6B TPU | Hybrid | 0 GB | Skein is our flagship 6B model, it is a hybrid between a Adventure model and a Novel model. Best used with either Adventure mode or the You Bias userscript enabled. Skein has been trained on high quality Novels along with CYOA adventure stories and is not as wackey as the Adventure model. It also has tagging support. |\n",
"| Janeway 6B by Mr Seeker | 6B TPU | Novel | 0 GB | Janeway is a model created from Picard's dataset combined with a brand new collection of ebooks. This model is trained on 20% more content than Picard and has been trained on literature from various genres. Although the model is mainly focussed on SFW, romantic scenes might involve a degree of nudity. |\n",
"| Adventure 6B by VE_FORBRYDERNE | 6B TPU | Adventure | 0 GB | Adventure is a 6B model designed to mimick the behavior of AI Dungeon. It is exclusively for Adventure Mode and can take you on the epic and wackey adventures that AI Dungeon players love. It also features the many tropes of AI Dungeon as it has been trained on very similar data. It must be used in second person (You). |\n",
"| Lit 6B by Haru | 6B TPU | NSFW | 8 GB / 12 GB | Lit is a great NSFW model trained by Haru on both a large set of Literotica stories and high quality novels along with tagging support. Creating a high quality model for your NSFW stories. This model is exclusively a novel model and is best used in third person. |\n",
"| Shinen 6B by Mr Seeker | 6B TPU | NSFW | 0 GB | Shinen is an alternative to the Lit model designed to be more explicit. If Lit is to tame for you Shinen might produce better results. While it is a Novel model it is unsuitable for SFW stories due to its heavy NSFW bias. Shinen will not hold back. It is meant to be used in KoboldAI's regular mode. |\n",
"| Generic 6B by EleutherAI | 6B TPU | Generic | 10 GB / 12 GB | GPT-J-6B is what all other models are based on, if you need something that has no specific bias towards any particular subject this is the model for you. Best used when the other models are not suitable for what you wish to do. Such as homework assistance, blog writing, coding and more. It needs more hand holding than other models and is more prone to undesirable formatting changes. |\n",
"| C1 6B by Haru | 6B TPU | Chatbot | 8 GB / 12 GB | C1 has been trained on various internet chatrooms, it makes the basis for an interesting chatbot model and has been optimized to be used in the Chatmode. |\n",
"| Model | Size | Style | Description |\n",
"| --- | --- | --- | --- |\n",
"| [Nerys](https://huggingface.co/KoboldAI/fairseq-dense-13B-Nerys) by Mr Seeker | 13B | Novel/Adventure | Nerys is a hybrid model based on Pike (A newer Janeway), on top of the Pike dataset you also get some Light Novels, Adventure mode support and a little bit of shinen thrown in the mix. The end result is a very diverse model that is heavily biased towards SFW novel writing, but one that can go beyond its novel training and make for an excellent adventure model to. Adventure mode is best played from a second person perspective, but can be played in first or third person as well. Novel writing can be done best from the first or third person. |\n",
"| [Janeway](https://huggingface.co/KoboldAI/fairseq-dense-13B-Janeway) by Mr Seeker | 13B | Novel | Janeway is a model created from Picard's dataset combined with a brand new collection of ebooks. This model is trained on 20% more content than Picard and has been trained on literature from various genres. Although the model is mainly focussed on SFW, romantic scenes might involve a degree of nudity. |\n",
"| [Shinen](https://huggingface.co/KoboldAI/fairseq-dense-13B-Shinen) by Mr Seeker | 13B | NSFW | Shinen is an NSFW model designed to be more explicit. Trained on a variety of stories from the website Sexstories it contains many different kinks. |\n",
"| [Skein](https://huggingface.co/KoboldAI/GPT-J-6B-Skein) by VE\\_FORBRYDERNE | 6B | Adventure | Skein is best used with Adventure mode enabled, it consists of a 4 times larger adventure dataset than the Adventure model making it excellent for text adventure gaming. On top of that it also consists of light novel training further expanding its knowledge and writing capabilities. It can be used with the You filter bias if you wish to write Novels with it, but dedicated Novel models can perform better for this task. |\n",
"| [Adventure](https://huggingface.co/KoboldAI/GPT-J-6B-Adventure) by VE\\_FORBRYDERNE | 6B | Adventure | Adventure is a 6B model designed to mimick the behavior of AI Dungeon. It is exclusively for Adventure Mode and can take you on the epic and wackey adventures that AI Dungeon players love. It also features the many tropes of AI Dungeon as it has been trained on very similar data. It must be used in second person (You). |\n",
"| [Lit](https://huggingface.co/hakurei/lit-6B) by Haru | 6B | NSFW | Lit is a great NSFW model trained by Haru on both a large set of Literotica stories and high quality novels along with tagging support. Creating a high quality model for your NSFW stories. This model is exclusively a novel model and is best used in third person. |\n",
"| [Convo](https://huggingface.co/hitomi-team/convo-6B) by Hitomi Team | 6B | Chatbot | Convo-6B is a GPT-J 6B model fine-tuned on a collection of high quality open source datasets which amount to 6 million messages. The primary goal of the model is to provide improved performance and generalization when generating multi-turn dialogue for characters that were not present from within the fine tuning data. The prompted performance has especially improved over the predecessor model [C1-6B](https://huggingface.co/hakurei/c1-6B). |\n",
"| [C1](https://huggingface.co/hakurei/c1-6B) by Haru | 6B | Chatbot | C1 has been trained on various internet chatrooms, it makes the basis for an interesting chatbot model and has been optimized to be used in the Chatmode. |\n",
"| Neo(X) by EleutherAI | 20B | Generic | NeoX is the largest EleutherAI model currently available, being a generic model it is not particularly trained towards anything and can do a variety of writing, Q&A and coding tasks. 20B's performance is closely compared to the 13B models and it is worth trying both especially if you have a task that does not involve english writing. Its behavior will be similar to the GPT-J-6B model since they are trained on the same dataset but with more sensitivity towards repetition penalty and with more knowledge. |\n",
"| [Fairseq Dense](https://huggingface.co/KoboldAI/fairseq-dense-13B) | 13B | Generic | Trained by Facebook Researchers this model stems from the MOE research project within Fairseq. This particular version has been converted by us for use in KoboldAI. It is known to be on par with the larger 20B model from EleutherAI and considered as better for pop culture and language tasks. Because the model has never seen a new line (enter) it may perform worse on formatting and paragraphing. |\n",
"| [GPT-J-6B](https://huggingface.co/EleutherAI/gpt-j-6B) by EleutherAI | 6B | Generic | This model serves as the basis for most other 6B models (Some being based on Fairseq Dense instead). Being trained on the Pile and not biased towards anything in particular it is suitable for a variety of tasks such as writing, Q&A and coding tasks. You will likely get better result with larger generic models or finetuned models. |\n",
"\n",
"\n",
"| Style | Description |\n",
@ -113,7 +124,6 @@
"| NSFW | Indicates that the model is strongly biased towards NSFW content and is not suitable for children, work environments or livestreaming. Most NSFW models are also Novel models in nature. |\n",
"| Adventure | These models are excellent for people willing to play KoboldAI like a Text Adventure game and are meant to be used with Adventure mode enabled. Even if you wish to use it as a Novel style model you should always have Adventure mode on and set it to story. These models typically have a strong bias towards the use of the word You and without Adventure mode enabled break the story flow and write actions on your behalf. |\n",
"| Chatbot | These models are specifically trained for chatting and are best used with the Chatmode enabled. Typically trained on either public chatrooms or private chats. |\n",
"| Hybrid | Hybrid models are a blend between different styles, for example they are trained on both Novel stories and Adventure stories. These models are great variety models that you can use for multiple different playstyles and modes, but depending on your usage you may need to enable Adventure Mode or the You bias (in userscripts). |\n",
"| Generic | Generic models are not trained towards anything specific, typically used as a basis for other tasks and models. They can do everything the other models can do, but require much more handholding to work properly. Generic models are an ideal basis for tasks that we have no specific model for, or for experiencing a softprompt in its raw form. |\n",
"\n",
"# How to start KoboldAI in 7 simple steps\n",

View File

@ -18,7 +18,21 @@
"\n",
"For more information about KoboldAI check our our Github readme : https://github.com/KoboldAI/KoboldAI-Client/blob/main/readme.md\n",
"\n",
"More (smaller) models are available in the **[GPU edition](https://colab.research.google.com/github/koboldai/KoboldAI-Client/blob/main/colab/GPU.ipynb)**!"
"More (smaller) models are available in the **[GPU edition](https://colab.research.google.com/github/koboldai/KoboldAI-Client/blob/main/colab/GPU.ipynb)**!\n",
"\n",
"---\n",
"## How to load KoboldAI: Everything you need to know\n",
"1. On a phone? First put your browser in desktop mode because of a Google Colab bug. Otherwise nothing will happen when you click the play button. Then tap the play button next to \"<-- Tap This if you play on Mobile\", you will see an audio player. Keep the audio player playing so Colab does not get shut down in the background.\n",
"2. Select the desired model, you will find a description of all the available models further down the page.\n",
"3. Click the play button next to \"<-- Select your model below and then click this to start KoboldAI\".\n",
"4. Got a message saying no accelerator is available? Click cancel, and try again in a few minutes. If you do not manage to get a session when you frequently try again try at a different time of day, colab can be busy or your priority may have been lowered by frequent usage.\n",
"5. After everything is done loading you will get a link that you can use to open KoboldAI. In case of Localtunnel you will also be warned that some people are abusing Localtunnel for phishing, once you acknowledge this warning you will be taken to KoboldAI's interface. If you picked Cloudflare and get a 1033 error refresh the error page after waiting one minute.\n",
"\n",
"---\n",
"\n",
"Further down the page you can find descriptions of the models, and tips to get the most out of your Google Colab experience.\n",
"\n",
"Make sure to keep this page open while you are using KoboldAI, and check back regularly to see if you got a Captcha. Failure to complete the captcha's in time can result in termination of your session or a lower priority towards the TPUs."
],
"metadata": {
"id": "zrLGxVCEaqZx"
@ -47,11 +61,13 @@
},
"outputs": [],
"source": [
"#@title <b><-- Select your model below and then click this to start KoboldAI</b>\n",
"#@markdown You can find a description of the models below along with instructions on how to start KoboldAI.\n",
"\n",
"#@title <b><-- Click this to start KoboldAI</b>\n",
"Model = \"Skein 6B\" #@param [\"Skein 6B\", \"Janeway 6B\", \"Adventure 6B\", \"Lit 6B\", \"Shinen 6B\", \"Generic 6B\", \"C1 6B\"]\n",
"Model = \"Nerys 13B\" #@param [\"Nerys 13B\", \"Janeway 13B\", \"Shinen 13B\", \"Skein 6B\", \"Janeway 6B\", \"Adventure 6B\", \"Shinen 6B\", \"Lit 6B\", \"Convo 6B\", \"C1 6B\", \"NeoX 20B\", \"facebook/opt-13b\", \"KoboldAI/fairseq-dense-13B\", \"EleutherAI/gpt-j-6B\"] {allow-input: true}\n",
"Version = \"Official\" #@param [\"Official\", \"United\"] {allow-input: true}\n",
"Drive = \"Unextracted (Less Space)\" #@param [\"Unextracted (Less Space)\", \"Extracted (Faster Loading)\"]\n",
"#@markdown Extracted models take up more space but load faster the next time you use them, not all models use your Google Drive. See the Model list below for descriptions and space requirements. If your extracted model does not load the next time you try to launch KoboldAI delete the folder from your Google Drive and ensure enough space is available.\n",
"Provider = \"Localtunnel\" #@param [\"Localtunnel\", \"Cloudflare\"]\n",
"\n",
"import os\n",
"try:\n",
@ -64,113 +80,115 @@
"from google.colab import drive\n",
"drive.mount('/content/drive/')\n",
"\n",
"!wget https://henk.tech/ckds -O - | bash /dev/stdin -i drive\n",
"\n",
"if Model == \"Skein 6B\":\n",
" path = \"gpt-j-6b-skein-jax\"\n",
"if Model == \"Janeway 13B\":\n",
" Model = \"KoboldAI/fairseq-dense-13B-Janeway\"\n",
" path = \"\"\n",
" download = \"\"\n",
"elif Model == \"Nerys 13B\":\n",
" Model = \"KoboldAI/fairseq-dense-13B-Nerys\"\n",
" path = \"\"\n",
" download = \"\"\n",
"elif Model == \"Shinen 13B\":\n",
" Model = \"KoboldAI/fairseq-dense-13B-Shinen\"\n",
" path = \"\"\n",
" download = \"\"\n",
"elif Model == \"NeoX 20B\":\n",
" Model = \"TPUMeshTransformerGPTNeoX\"\n",
" path = \" -p gpt-neox-20b-jax\"\n",
" location = \"colab\"\n",
" download = \"-a https://storage.henk.tech/KoboldAI/skein-jax.txt\"\n",
" download = \" -a https://storage.henk.tech/KoboldAI/neox-20b.txt\"\n",
" extract = \"\"\n",
" Drive = \"Unextracted (Less Space)\"\n",
" ![[ -f /content/drive/MyDrive/KoboldAI/settings/gpt-j-6b-skein-jax.settings ]] || echo -e \"{\\n \\\"apikey\\\": \\\"\\\",\\n \\\"andepth\\\": 3,\\n \\\"temp\\\": 0.5,\\n \\\"top_p\\\": 0.9,\\n \\\"top_k\\\": 0,\\n \\\"tfs\\\": 1.0,\\n \\\"rep_pen\\\": 1.1,\\n \\\"genamt\\\": 80,\\n \\\"max_length\\\": 2048,\\n \\\"ikgen\\\": 200,\\n \\\"formatoptns\\\": {\\n \\\"frmttriminc\\\": true,\\n \\\"frmtrmblln\\\": false,\\n \\\"frmtrmspch\\\": false,\\n \\\"frmtadsnsp\\\": false\\n },\\n \\\"numseqs\\\": 1,\\n \\\"widepth\\\": 3,\\n \\\"useprompt\\\": true,\\n \\\"adventure\\\": false\\n}\" > /content/drive/MyDrive/KoboldAI/settings/gpt-j-6b-skein-jax.settings\n",
"if Model == \"Janeway 6B\":\n",
" path = \"gpt-j-6b-janeway-jax\"\n",
" location = \"colab\"\n",
" download = \"-a https://storage.henk.tech/KoboldAI/janeway-jax.txt\"\n",
" extract = \"\"\n",
" Drive = \"Unextracted (Less Space)\"\n",
" ![[ -f /content/drive/MyDrive/KoboldAI/settings/gpt-j-6b-janeway-jax.settings ]] || echo -e \"{\\n \\\"apikey\\\": \\\"\\\",\\n \\\"andepth\\\": 3,\\n \\\"temp\\\": 0.5,\\n \\\"top_p\\\": 0.9,\\n \\\"top_k\\\": 0,\\n \\\"tfs\\\": 1.0,\\n \\\"rep_pen\\\": 1.1,\\n \\\"rep_pen_slope\\\": 0.7,\\n \\\"rep_pen_range\\\": 1024.0,\\n \\\"genamt\\\": 80,\\n \\\"max_length\\\": 2048,\\n \\\"ikgen\\\": 200,\\n \\\"formatoptns\\\": {\\n \\\"frmttriminc\\\": true,\\n \\\"frmtrmblln\\\": false,\\n \\\"frmtrmspch\\\": false,\\n \\\"frmtadsnsp\\\": false,\\n \\\"singleline\\\": false\\n },\\n \\\"numseqs\\\": 1,\\n \\\"widepth\\\": 3,\\n \\\"useprompt\\\": true,\\n \\\"adventure\\\": false,\\n \\\"chatmode\\\": false,\\n \\\"chatname\\\": \\\"You\\\",\\n \\\"dynamicscan\\\": false,\\n \\\"nopromptgen\\\": false,\\n \\\"rngpersist\\\": false,\\n \\\"nogenmod\\\": false,\\n \\\"autosave\\\": false,\\n \\\"welcome\\\": false,\\n \\\"newlinemode\\\": \\\"n\\\",\\n \\\"antemplate\\\": \\\"[Genre: <|>]\\\",\\n \\\"userscripts\\\": [],\\n \\\"corescript\\\": \\\"default.lua\\\",\\n \\\"softprompt\\\": \\\"\\\"\\n}\" > /content/drive/MyDrive/KoboldAI/settings/gpt-j-6b-janeway-jax.settings\n",
"if Model == \"Adventure 6B\":\n",
" path = \"gpt-j-6b-adventure-jax\"\n",
" location = \"colab\"\n",
" download = \"-a https://api.wandb.ai/files/ve-forbryderne/adventure/carol-data/models/gpt-j-6b-adventure-jax/aria2.txt\"\n",
" extract = \"\"\n",
" Drive = \"Unextracted (Less Space)\"\n",
" ![[ -f /content/drive/MyDrive/KoboldAI/settings/gpt-j-6b-adventure-jax.settings ]] || echo -e \"{\\n \\\"apikey\\\": \\\"\\\",\\n \\\"andepth\\\": 3,\\n \\\"temp\\\": 0.5,\\n \\\"top_p\\\": 0.9,\\n \\\"top_k\\\": 0,\\n \\\"tfs\\\": 1.0,\\n \\\"rep_pen\\\": 1.1,\\n \\\"genamt\\\": 80,\\n \\\"max_length\\\": 2048,\\n \\\"ikgen\\\": 200,\\n \\\"formatoptns\\\": {\\n \\\"frmttriminc\\\": true,\\n \\\"frmtrmblln\\\": false,\\n \\\"frmtrmspch\\\": false,\\n \\\"frmtadsnsp\\\": false\\n },\\n \\\"numseqs\\\": 1,\\n \\\"widepth\\\": 3,\\n \\\"useprompt\\\": true,\\n \\\"adventure\\\": true\\n}\" > /content/drive/MyDrive/KoboldAI/settings/gpt-j-6b-adventure-jax.settings\n",
"if Model == \"Lit 6B\":\n",
" path = \"gpt-j-6b-lit-jax\"\n",
" location = \"drive\"\n",
" download = \"-a https://storage.henk.tech/KoboldAI/aria2.php?file=gpt-j-6b-lit-jax.7z\"\n",
" extract = \"-z gpt-j-6b-lit-jax.7z\"\n",
" ![[ -f /content/drive/MyDrive/KoboldAI/settings/gpt-j-6b-lit-jax.settings ]] || echo -e \"{\\n \\\"apikey\\\": \\\"\\\",\\n \\\"andepth\\\": 3,\\n \\\"temp\\\": 0.5,\\n \\\"top_p\\\": 0.9,\\n \\\"top_k\\\": 0,\\n \\\"tfs\\\": 1.0,\\n \\\"rep_pen\\\": 1.1,\\n \\\"genamt\\\": 80,\\n \\\"max_length\\\": 2048,\\n \\\"ikgen\\\": 200,\\n \\\"formatoptns\\\": {\\n \\\"frmttriminc\\\": true,\\n \\\"frmtrmblln\\\": false,\\n \\\"frmtrmspch\\\": false,\\n \\\"frmtadsnsp\\\": false\\n },\\n \\\"numseqs\\\": 1,\\n \\\"widepth\\\": 3,\\n \\\"useprompt\\\": true,\\n \\\"adventure\\\": false\\n}\" > /content/drive/MyDrive/KoboldAI/settings/gpt-j-6b-lit-jax.settings\n",
"if Model == \"Shinen 6B\":\n",
" path = \"gpt-j-6b-shinen-jax\"\n",
" location = \"colab\"\n",
" download = \"-a https://storage.henk.tech/KoboldAI/shinen-jax.txt\"\n",
" extract = \"\"\n",
" Drive = \"Unextracted (Less Space)\"\n",
" ![[ -f /content/drive/MyDrive/KoboldAI/settings/gpt-j-6b-shinen-jax.settings ]] || echo -e \"{\\n \\\"apikey\\\": \\\"\\\",\\n \\\"andepth\\\": 3,\\n \\\"temp\\\": 0.5,\\n \\\"top_p\\\": 0.9,\\n \\\"top_k\\\": 0,\\n \\\"tfs\\\": 1.0,\\n \\\"rep_pen\\\": 1.1,\\n \\\"rep_pen_slope\\\": 0.7,\\n \\\"rep_pen_range\\\": 1024.0,\\n \\\"genamt\\\": 80,\\n \\\"max_length\\\": 2048,\\n \\\"ikgen\\\": 200,\\n \\\"formatoptns\\\": {\\n \\\"frmttriminc\\\": true,\\n \\\"frmtrmblln\\\": false,\\n \\\"frmtrmspch\\\": false,\\n \\\"frmtadsnsp\\\": false,\\n \\\"singleline\\\": false\\n },\\n \\\"numseqs\\\": 1,\\n \\\"widepth\\\": 3,\\n \\\"useprompt\\\": true,\\n \\\"adventure\\\": false,\\n \\\"chatmode\\\": false,\\n \\\"chatname\\\": \\\"You\\\",\\n \\\"dynamicscan\\\": false,\\n \\\"nopromptgen\\\": false,\\n \\\"rngpersist\\\": false,\\n \\\"nogenmod\\\": false,\\n \\\"autosave\\\": false,\\n \\\"welcome\\\": false,\\n \\\"newlinemode\\\": \\\"n\\\",\\n \\\"antemplate\\\": \\\"[Genre: <|>]\\\",\\n \\\"userscripts\\\": [],\\n \\\"corescript\\\": \\\"default.lua\\\",\\n \\\"softprompt\\\": \\\"\\\"\\n}\" > /content/drive/MyDrive/KoboldAI/settings/gpt-j-6b-shinen-jax.settings\n",
"if Model == \"Generic 6B\":\n",
" path = \"step_383500\"\n",
" location = \"drive\"\n",
" download = \"-a https://storage.henk.tech/KoboldAI/aria2.php?file=step_383500_slim.tar.zstd\"\n",
" extract = \"-t step_383500_slim.tar.zstd\"\n",
" ![[ -f /content/drive/MyDrive/KoboldAI/settings/step_383500.settings ]] || echo -e \"{\\n \\\"apikey\\\": \\\"\\\",\\n \\\"andepth\\\": 3,\\n \\\"temp\\\": 0.5,\\n \\\"top_p\\\": 0.9,\\n \\\"top_k\\\": 0,\\n \\\"tfs\\\": 1.0,\\n \\\"rep_pen\\\": 1.1,\\n \\\"genamt\\\": 80,\\n \\\"max_length\\\": 2048,\\n \\\"ikgen\\\": 200,\\n \\\"formatoptns\\\": {\\n \\\"frmttriminc\\\": true,\\n \\\"frmtrmblln\\\": false,\\n \\\"frmtrmspch\\\": false,\\n \\\"frmtadsnsp\\\": false\\n },\\n \\\"numseqs\\\": 1,\\n \\\"widepth\\\": 3,\\n \\\"useprompt\\\": true,\\n \\\"adventure\\\": false\\n}\" > /content/drive/MyDrive/KoboldAI/settings/step_383500.settings\n",
"if Model == \"C1 6B\":\n",
" path = \"gpt-j-6b-c1-jax\"\n",
" location = \"drive\"\n",
" download = \"-a https://storage.henk.tech/KoboldAI/aria2.php?file=gpt-j-6b-c1-jax.7z\"\n",
" extract = \"-z gpt-j-6b-c1-jax.7z\"\n",
" ![[ -f /content/drive/MyDrive/KoboldAI/settings/gpt-j-6b-c1-jax.settings ]] || echo -e \"{\\n \\\"apikey\\\": \\\"\\\",\\n \\\"andepth\\\": 3,\\n \\\"temp\\\": 0.5,\\n \\\"top_p\\\": 0.9,\\n \\\"top_k\\\": 0,\\n \\\"tfs\\\": 1.0,\\n \\\"rep_pen\\\": 1.1,\\n \\\"genamt\\\": 80,\\n \\\"max_length\\\": 2048,\\n \\\"ikgen\\\": 200,\\n \\\"formatoptns\\\": {\\n \\\"frmttriminc\\\": true,\\n \\\"frmtrmblln\\\": false,\\n \\\"frmtrmspch\\\": false,\\n \\\"frmtadsnsp\\\": false\\n },\\n \\\"numseqs\\\": 1,\\n \\\"widepth\\\": 3,\\n \\\"useprompt\\\": true,\\n \\\"chatmode\\\": true\\n}\" > /content/drive/MyDrive/KoboldAI/settings/gpt-j-6b-c1-jax.settings\n",
" ![[ -f /content/drive/MyDrive/KoboldAI/settings/gpt-neox-20b-jax.settings ]] || echo -e \"{\\n \\\"apikey\\\": \\\"\\\",\\n \\\"andepth\\\": 3,\\n \\\"temp\\\": 0.5,\\n \\\"top_p\\\": 0.9,\\n \\\"top_k\\\": 0,\\n \\\"tfs\\\": 1.0,\\n \\\"rep_pen\\\": 1.03,\\n \\\"genamt\\\": 80,\\n \\\"max_length\\\": 2048,\\n \\\"ikgen\\\": 200,\\n \\\"formatoptns\\\": {\\n \\\"frmttriminc\\\": true,\\n \\\"frmtrmblln\\\": false,\\n \\\"frmtrmspch\\\": false,\\n \\\"frmtadsnsp\\\": false\\n },\\n \\\"numseqs\\\": 1,\\n \\\"widepth\\\": 3,\\n \\\"useprompt\\\": true,\\n \\\"adventure\\\": false\\n}\" > /content/drive/MyDrive/KoboldAI/settings/gpt-neox-20b-jax.settings\n",
"elif Model == \"Skein 6B\":\n",
" Model = \"KoboldAI/GPT-J-6B-Skein\"\n",
" path = \"\"\n",
" download = \"\"\n",
"elif Model == \"Janeway 6B\":\n",
" Model = \"KoboldAI/GPT-J-6B-Janeway\"\n",
" path = \"\"\n",
" download = \"\"\n",
"elif Model == \"Adventure 6B\":\n",
" Model = \"KoboldAI/GPT-J-6B-Adventure\"\n",
" path = \"\"\n",
" download = \"\"\n",
"elif Model == \"Lit 6B\":\n",
" Model = \"hakurei/lit-6B\"\n",
" path = \"\"\n",
" download = \"\"\n",
"elif Model == \"Shinen 6B\":\n",
" Model = \"KoboldAI/GPT-J-6B-Shinen\"\n",
" path = \"\"\n",
" download = \"\"\n",
"elif Model == \"Convo 6B\":\n",
" Model = \"hitomi-team/convo-6B\"\n",
" path = \"\"\n",
" download = \"\"\n",
"elif Model == \"C1 6B\":\n",
" Model = \"hakurei/c1-6B\"\n",
" path = \"\"\n",
" download = \"\"\n",
"else:\n",
" path = \"\"\n",
" download = \"\"\n",
"\n",
"if Drive == \"Unextracted (Less Space)\":\n",
" xloc = \"colab\"\n",
"if Drive == \"Extracted (Faster Loading)\":\n",
" xloc = \"drive\"\n",
"if Provider == \"Localtunnel\":\n",
" tunnel = \"--localtunnel yes\"\n",
"else:\n",
" tunnel = \"\"\n",
"\n",
"\n",
"!wget https://henk.tech/ckds -O - | bash /dev/stdin $download -l $location $extract -p $path -m TPUMeshTransformerGPTJ -g $Version -x $xloc"
"!wget https://henk.tech/ckds -O - | bash /dev/stdin $path$download -m $Model -g $Version $tunnel"
]
},
{
"cell_type": "markdown",
"source": [
"# TPU Edition Model Descriptions\n",
"## TPU Edition Model Descriptions\n",
"\n",
"| Model | Size | Style | Drive Space | Description |\n",
"| ------------------------------ | ------ | --------- | ----------- | ------------------------------------------------------------ |\n",
"| Skein 6B by VE_FORBRYDERNE | 6B TPU | Hybrid | 0 GB | Skein is our flagship 6B model, it is a hybrid between a Adventure model and a Novel model. Best used with either Adventure mode or the You Bias userscript enabled. Skein has been trained on high quality Novels along with CYOA adventure stories and is not as wackey as the Adventure model. It also has tagging support. |\n",
"| Janeway 6B by Mr Seeker | 6B TPU | Novel | 0 GB | Janeway is a model created from Picard's dataset combined with a brand new collection of ebooks. This model is trained on 20% more content than Picard and has been trained on literature from various genres. Although the model is mainly focussed on SFW, romantic scenes might involve a degree of nudity. |\n",
"| Adventure 6B by VE_FORBRYDERNE | 6B TPU | Adventure | 0 GB | Adventure is a 6B model designed to mimick the behavior of AI Dungeon. It is exclusively for Adventure Mode and can take you on the epic and wackey adventures that AI Dungeon players love. It also features the many tropes of AI Dungeon as it has been trained on very similar data. It must be used in second person (You). |\n",
"| Lit 6B by Haru | 6B TPU | NSFW | 8 GB / 12 GB | Lit is a great NSFW model trained by Haru on both a large set of Literotica stories and high quality novels along with tagging support. Creating a high quality model for your NSFW stories. This model is exclusively a novel model and is best used in third person. |\n",
"| Shinen 6B by Mr Seeker | 6B TPU | NSFW | 0 GB | Shinen is an alternative to the Lit model designed to be more explicit. If Lit is to tame for you Shinen might produce better results. While it is a Novel model it is unsuitable for SFW stories due to its heavy NSFW bias. Shinen will not hold back. It is meant to be used in KoboldAI's regular mode. |\n",
"| Generic 6B by EleutherAI | 6B TPU | Generic | 10 GB / 12 GB | GPT-J-6B is what all other models are based on, if you need something that has no specific bias towards any particular subject this is the model for you. Best used when the other models are not suitable for what you wish to do. Such as homework assistance, blog writing, coding and more. It needs more hand holding than other models and is more prone to undesirable formatting changes. |\n",
"| C1 6B by Haru | 6B TPU | Chatbot | 8 GB / 12 GB | C1 has been trained on various internet chatrooms, it makes the basis for an interesting chatbot model and has been optimized to be used in the Chatmode. |\n",
"| Model | Size | Style | Description |\n",
"| --- | --- | --- | --- |\n",
"| [Nerys](https://huggingface.co/KoboldAI/fairseq-dense-13B-Nerys) by Mr Seeker | 13B | Novel/Adventure | Nerys is a hybrid model based on Pike (A newer Janeway), on top of the Pike dataset you also get some Light Novels, Adventure mode support and a little bit of shinen thrown in the mix. The end result is a very diverse model that is heavily biased towards SFW novel writing, but one that can go beyond its novel training and make for an excellent adventure model to. Adventure mode is best played from a second person perspective, but can be played in first or third person as well. Novel writing can be done best from the first or third person. |\n",
"| [Janeway](https://huggingface.co/KoboldAI/fairseq-dense-13B-Janeway) by Mr Seeker | 13B | Novel | Janeway is a model created from Picard's dataset combined with a brand new collection of ebooks. This model is trained on 20% more content than Picard and has been trained on literature from various genres. Although the model is mainly focussed on SFW, romantic scenes might involve a degree of nudity. |\n",
"| [Shinen](https://huggingface.co/KoboldAI/fairseq-dense-13B-Shinen) by Mr Seeker | 13B | NSFW | Shinen is an NSFW model designed to be more explicit. Trained on a variety of stories from the website Sexstories it contains many different kinks. |\n",
"| [Skein](https://huggingface.co/KoboldAI/GPT-J-6B-Skein) by VE\\_FORBRYDERNE | 6B | Adventure | Skein is best used with Adventure mode enabled, it consists of a 4 times larger adventure dataset than the Adventure model making it excellent for text adventure gaming. On top of that it also consists of light novel training further expanding its knowledge and writing capabilities. It can be used with the You filter bias if you wish to write Novels with it, but dedicated Novel models can perform better for this task. |\n",
"| [Adventure](https://huggingface.co/KoboldAI/GPT-J-6B-Adventure) by VE\\_FORBRYDERNE | 6B | Adventure | Adventure is a 6B model designed to mimick the behavior of AI Dungeon. It is exclusively for Adventure Mode and can take you on the epic and wackey adventures that AI Dungeon players love. It also features the many tropes of AI Dungeon as it has been trained on very similar data. It must be used in second person (You). |\n",
"| [Lit](https://huggingface.co/hakurei/lit-6B) by Haru | 6B | NSFW | Lit is a great NSFW model trained by Haru on both a large set of Literotica stories and high quality novels along with tagging support. Creating a high quality model for your NSFW stories. This model is exclusively a novel model and is best used in third person. |\n",
"| [Convo](https://huggingface.co/hitomi-team/convo-6B) by Hitomi Team | 6B | Chatbot | Convo-6B is a GPT-J 6B model fine-tuned on a collection of high quality open source datasets which amount to 6 million messages. The primary goal of the model is to provide improved performance and generalization when generating multi-turn dialogue for characters that were not present from within the fine tuning data. The prompted performance has especially improved over the predecessor model [C1-6B](https://huggingface.co/hakurei/c1-6B). |\n",
"| [C1](https://huggingface.co/hakurei/c1-6B) by Haru | 6B | Chatbot | C1 has been trained on various internet chatrooms, it makes the basis for an interesting chatbot model and has been optimized to be used in the Chatmode. |\n",
"| Neo(X) by EleutherAI | 20B | Generic | NeoX is the largest EleutherAI model currently available, being a generic model it is not particularly trained towards anything and can do a variety of writing, Q&A and coding tasks. 20B's performance is closely compared to the 13B models and it is worth trying both especially if you have a task that does not involve english writing. Its behavior will be similar to the GPT-J-6B model since they are trained on the same dataset but with more sensitivity towards repetition penalty and with more knowledge. |\n",
"| [Fairseq Dense](https://huggingface.co/KoboldAI/fairseq-dense-13B) | 13B | Generic | Trained by Facebook Researchers this model stems from the MOE research project within Fairseq. This particular version has been converted by us for use in KoboldAI. It is known to be on par with the larger 20B model from EleutherAI and considered as better for pop culture and language tasks. Because the model has never seen a new line (enter) it may perform worse on formatting and paragraphing. |\n",
"| [GPT-J-6B](https://huggingface.co/EleutherAI/gpt-j-6B) by EleutherAI | 6B | Generic | This model serves as the basis for most other 6B models (Some being based on Fairseq Dense instead). Being trained on the Pile and not biased towards anything in particular it is suitable for a variety of tasks such as writing, Q&A and coding tasks. You will likely get better result with larger generic models or finetuned models. |\n",
"\n",
"\n",
"# [GPU Edition Model Descriptions](https://colab.research.google.com/github/KoboldAI/KoboldAI-Client/blob/main/colab/GPU.ipynb)\n",
"\n",
"| Model | Size | Style | Description |\n",
"| ------------------------------------------------------------ | -------- | ---------- | ------------------------------------------------------------ |\n",
"| [GPT-Neo-2.7B-Janeway](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Janeway) by Mr Seeker | 2.7B GPU | Novel | Janeway is a model created from Picard's dataset combined with a brand new collection of ebooks. This model is trained on 20% more content than Picard and has been trained on literature from various genres. Although the model is mainly focussed on SFW, romantic scenes might involve a degree of nudity. |\n",
"| [GPT-Neo-2.7B-Picard](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Picard) by Mr Seeker | 2.7B GPU | Novel | Picard is a model trained for SFW Novels based on GPT-Neo-2.7B. It is focused on Novel style writing without the NSFW bias. While the name suggests a sci-fi model this model is designed for Novels of a variety of genre's. It is meant to be used in KoboldAI's regular mode. |\n",
"| [GPT-Neo-2.7B-AID](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-AID) by melastacho | 2.7B GPU | Adventure | Also know as Adventure 2.7B this is a clone of the AI Dungeon Classic model and is best known for the epic wackey adventures that AI Dungeon Classic players love. |\n",
"| [GPT-Neo-2.7B-Horni-LN](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni-LN) by finetune | 2.7B GPU | Novel | This model is based on GPT-Neo-2.7B-Horni and retains its NSFW knowledge, but was then further biased towards SFW novel stories. If you seek a balance between a SFW Novel model and a NSFW model this model should be a good choice. |\n",
"| [GPT-Neo-2.7B-Horni](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni) by finetune | 2.7B GPU | NSFW | This model is tuned on Literotica to produce a Novel style model biased towards NSFW content. Can still be used for SFW stories but will have a bias towards NSFW content. It is meant to be used in KoboldAI's regular mode. |\n",
"| [GPT-Neo-2.7B-Shinen](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Shinen) by Mr Seeker | 2.7B GPU | NSFW | Shinen is an alternative to the Horni model designed to be more explicit. If Horni is to tame for you shinen might produce better results. While it is a Novel model it is unsuitable for SFW stories due to its heavy NSFW bias. Shinen will not hold back. It is meant to be used in KoboldAI's regular mode. |\n",
"| [GPT-Neo-2.7B](https://huggingface.co/EleutherAI/gpt-neo-2.7B) by EleutherAI | 2.7B GPU | Generic | This is the base model for all the other 2.7B models, it is best used when you have a use case that we have no other models available for, such as writing blog articles or programming. It can also be a good basis for the experience of some of the softprompts if your softprompt is not about a subject the other models cover. |\n",
"| Model | Size | Style | Description |\n",
"| --- | --- | --- | --- |\n",
"| [Fairseq-Dense-2.7B-Nerys](https://huggingface.co/KoboldAI/fairseq-dense-2.7B-Nerys) by Mr Seeker | 2.7B | Novel/Adventure | Nerys is a hybrid model based on Pike (A newer Janeway), on top of the Pike dataset you also get some Light Novels, Adventure mode support and a little bit of shinen thrown in the mix. The end result is a very diverse model that is heavily biased towards SFW novel writing, but one that can go beyond its novel training and make for an excellent adventure model to. Adventure mode is best played from a second person perspective, but can be played in first or third person as well. Novel writing can be done best from the first or third person. |\n",
"| [GPT-Neo-2.7B-Janeway](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Janeway) by Mr Seeker | 2.7B | Novel | Janeway is a model created from Picard's dataset combined with a brand new collection of ebooks. This model is trained on 20% more content than Picard and has been trained on literature from various genres. Although the model is mainly focussed on SFW, romantic scenes might involve a degree of nudity. |\n",
"| [GPT-Neo-2.7B-Picard](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Picard) by Mr Seeker | 2.7B | Novel | Picard is a model trained for SFW Novels based on GPT-Neo-2.7B. It is focused on Novel style writing without the NSFW bias. While the name suggests a sci-fi model this model is designed for Novels of a variety of genre's. It is meant to be used in KoboldAI's regular mode. |\n",
"| [GPT-Neo-2.7B-AID](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-AID) by melastacho | 2.7B | Adventure | Also know as Adventure 2.7B this is a clone of the AI Dungeon Classic model and is best known for the epic wackey adventures that AI Dungeon Classic players love. |\n",
"| [GPT-Neo-2.7B-Horni-LN](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni-LN) by finetune | 2.7B | Novel | This model is based on GPT-Neo-2.7B-Horni and retains its NSFW knowledge, but was then further biased towards SFW novel stories. If you seek a balance between a SFW Novel model and a NSFW model this model should be a good choice. |\n",
"| [GPT-Neo-2.7B-Horni](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni) by finetune | 2.7B | NSFW | This model is tuned on Literotica to produce a Novel style model biased towards NSFW content. Can still be used for SFW stories but will have a bias towards NSFW content. It is meant to be used in KoboldAI's regular mode. |\n",
"| [GPT-Neo-2.7B-Shinen](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Shinen) by Mr Seeker | 2.7B | NSFW | Shinen is an alternative to the Horni model designed to be more explicit. If Horni is to tame for you shinen might produce better results. While it is a Novel model it is unsuitable for SFW stories due to its heavy NSFW bias. Shinen will not hold back. It is meant to be used in KoboldAI's regular mode. |\n",
"| [GPT-Neo-2.7B](https://huggingface.co/EleutherAI/gpt-neo-2.7B) by EleutherAI | 2.7B | Generic | This is the base model for all the other 2.7B models, it is best used when you have a use case that we have no other models available for, such as writing blog articles or programming. It can also be a good basis for the experience of some of the softprompts if your softprompt is not about a subject the other models cover. |\n",
"\n",
"| Style | Description |\n",
"| --------- | ------------------------------------------------------------ |\n",
"| Novel | For regular story writing, not compatible with Adventure mode or other specialty modes. |\n",
"| NSFW | Indicates that the model is strongly biased towards NSFW content and is not suitable for children, work environments or livestreaming. Most NSFW models are also Novel models in nature. |\n",
"| Style | Description |\n",
"| --- | --- |\n",
"| Novel | For regular story writing, not compatible with Adventure mode or other specialty modes. |\n",
"| NSFW | Indicates that the model is strongly biased towards NSFW content and is not suitable for children, work environments or livestreaming. Most NSFW models are also Novel models in nature. |\n",
"| Adventure | These models are excellent for people willing to play KoboldAI like a Text Adventure game and are meant to be used with Adventure mode enabled. Even if you wish to use it as a Novel style model you should always have Adventure mode on and set it to story. These models typically have a strong bias towards the use of the word You and without Adventure mode enabled break the story flow and write actions on your behalf. |\n",
"| Chatbot | These models are specifically trained for chatting and are best used with the Chatmode enabled. Typically trained on either public chatrooms or private chats. |\n",
"| Hybrid | Hybrid models are a blend between different styles, for example they are trained on both Novel stories and Adventure stories. These models are great variety models that you can use for multiple different playstyles and modes, but depending on your usage you may need to enable Adventure Mode or the You bias (in userscripts). |\n",
"| Generic | Generic models are not trained towards anything specific, typically used as a basis for other tasks and models. They can do everything the other models can do, but require much more handholding to work properly. Generic models are an ideal basis for tasks that we have no specific model for, or for experiencing a softprompt in its raw form. |\n",
"| Chatbot | These models are specifically trained for chatting and are best used with the Chatmode enabled. Typically trained on either public chatrooms or private chats. |\n",
"| Generic | Generic models are not trained towards anything specific, typically used as a basis for other tasks and models. They can do everything the other models can do, but require much more handholding to work properly. Generic models are an ideal basis for tasks that we have no specific model for, or for experiencing a softprompt in its raw form. |\n",
"\n",
"## How to start KoboldAI in 7 simple steps\n",
"Using KoboldAI on Google Colab is easy! Simply follow these steps to get started:\n",
"1. Mobile phone? Tap the play button below next to \"<--- Tap this if you play on mobile\" to reveal an audio player, play the silent audio to keep the tab alive so Google will not shut you down when your using KoboldAI. If no audio player is revealed your phone browser does not support Google Colab in the mobile view, go to your browser menu and enable Desktop mode before you continue.\n",
"2. Select the model that most describes what you would like to do, by default we have the most recommended model for people willing to try out KoboldAI selected.\n",
"3. Click the play button next to \"<--- Click this to start KoboldAI\".\n",
"4. Allow Google Drive access, this typically happens trough a popup but sometimes Google Drive access may be requested trough the older method by asking you to click on a link and copy a code. This is normal behavior for Colab and only you will get access to your files, nothing is shared with us.\n",
"5. Now the automatic installation and Download process starts, for most models in the TPU edition expect the loading to take between 15 and 30 minutes on average depending on the current Colab download speeds and the model you selected. These downloads happen trough Google's internet connection, you will not be billed by your internet provider and it will not count towards any download limits.\n",
"6. After waiting a Trycloudflare link appears, click the link to enjoy KoboldAI. If you get a 1033 error Cloudflare is not done loading, in that case keep refreshing until it goes away. (If it keeps happening after 2 minutes Cloudflare has an issue, in that case you can use Runtime -> Restart and Run All to get a new link).\n",
"7. As you play KoboldAI, keep this Colab tab open in the background and check occationally for Captcha's so they do not shut your instance down. If you do get shut down you can always download a copy of your gamesave in the Save menu inside KoboldAI. Stories are never lost as long as you keep KoboldAI open in your browser.\n",
"\n",
"Get a error message saying you do not have access to a GPU/TPU instance? Do not continue and try again later, KoboldAI will not run correctly without them.\n",
"\n"
"---\n",
"## Tips to get the most out of Google Colab\n",
"- Google will occationally show a Captcha, typically after it has been open for 30 minutes but it can be more frequent if you often use Colab. Make sure to do these properly, or you risk getting your instance shut down and getting a lower priority towards the TPU's.\n",
"- KoboldAI uses Google Drive to store your files and settings, if you wish to upload a softprompt or userscript this can be done directly on the Google Drive website. You can also use this to download backups of your KoboldAI related files or upload models of your own.\n",
"- Don't want to save your stories on Google Drive for privacy reasons? Do not use KoboldAI's save function and instead click Download as .json, this will automatically download the story to your own computer without ever touching Google's harddrives. You can load this back trough the Load from file option.\n",
"- Google shut your instance down unexpectedly? You can still make use of the Download as .json button to recover your story as long as you did not close the KoboldAI window. You can then load this back up in your next session.\n",
"- Done with KoboldAI? Go to the Runtime menu, click on Manage Sessions and terminate your open sessions that you no longer need. This trick can help you maintain higher priority towards getting a TPU.\n",
"- Models stored on Google Drive typically load faster than models we need to download from the internet."
],
"metadata": {
"id": "i0-9ARA3c4Fx"

View File

@ -67,6 +67,7 @@
"!wget https://henk.tech/ckds -O - | bash /dev/stdin -g $Version -i only $Args\n",
"\n",
"!pip install colabcode\n",
"!pip install 'flask>=2.1.0'\n",
"from colabcode import ColabCode\n",
"ColabCode(authtoken=Authtoken)"
]

View File

@ -2,7 +2,7 @@
# KoboldAI Easy Colab Deployment Script by Henk717
# read the options
TEMP=`getopt -o m:i:p:c:d:x:a:l:z:g:t:n:b: --long model:,init:,path:,configname:,download:,aria2:,dloc:xloc:7z:git:tar:ngrok:branch: -- "$@"`
TEMP=`getopt -o m:i:p:c:d:x:a:l:z:g:t:n:b:s: --long model:,init:,path:,configname:,download:,aria2:,dloc:,xloc:,7z:,git:,tar:,ngrok:,branch:,savemodel:,localtunnel:,lt: -- "$@"`
eval set -- "$TEMP"
# extract options and their arguments into variables.
@ -17,7 +17,9 @@ while true ; do
    -c|--configname)
        configname=" --configname $2" ; shift 2 ;;
    -n|--ngrok)
        configname=" --ngrok" ; shift 2 ;;
        ngrok=" --ngrok" ; shift 2 ;;
    --lt|--localtunnel)
        localtunnel=" --localtunnel" ; shift 2 ;;
    -d|--download)
        download="$2" ; shift 2 ;;
    -a|--aria2)
@ -34,6 +36,8 @@ while true ; do
        git="$2" ; shift 2 ;;
    -b|--branch)
        branch="$2" ; shift 2 ;;
    -s|--savemodel)
        savemodel=" --savemodel" ; shift 2 ;;
    --) shift ; break ;;
    *) echo "Internal error!" ; exit 1 ;;
esac
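# For illustration, a hypothetical invocation exercising the new flags; the
# notebooks above fetch this script and pipe it through `bash /dev/stdin`,
# and the flag values here are examples only:
#   bash /dev/stdin -m KoboldAI/fairseq-dense-13B-Nerys -g Official --localtunnel yes --savemodel yes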
@ -48,8 +52,8 @@ function launch
        exit 0
    else
        cd /content/KoboldAI-Client
        echo "Launching KoboldAI with the following options : python3 aiserver.py$model$kmpath$configname$ngrok --remote --override_delete --override_rename"
        python3 aiserver.py$model$kmpath$configname$ngrok --colab
        echo "Launching KoboldAI with the following options : python3 aiserver.py$model$kmpath$configname$ngrok$localtunnel$savemodel --colab"
        python3 aiserver.py$model$kmpath$configname$ngrok$localtunnel$savemodel --colab
        exit
    fi
}
@ -134,28 +138,32 @@ if [ "$init" != "skip" ]; then
cd /content/KoboldAI-Client
cp -rn stories/*.* /content/drive/MyDrive/KoboldAI/stories/
cp -rn userscripts/*.* /content/drive/MyDrive/KoboldAI/userscripts/
cp -rn softprompts/*.* /content/drive/MyDrive/KoboldAI/softprompts/
cp -rn stories/* /content/drive/MyDrive/KoboldAI/stories/
cp -rn userscripts/* /content/drive/MyDrive/KoboldAI/userscripts/
cp -rn softprompts/* /content/drive/MyDrive/KoboldAI/softprompts/
rm stories
rm -rf stories/
rm userscripts
rm -rf userscripts/
rm softprompts
rm -rf softprompts/
rm models
rm -rf models/
ln -s /content/drive/MyDrive/KoboldAI/stories/ stories
ln -s /content/drive/MyDrive/KoboldAI/settings/ settings
ln -s /content/drive/MyDrive/KoboldAI/softprompts/ softprompts
ln -s /content/drive/MyDrive/KoboldAI/userscripts/ userscripts
ln -s /content/drive/MyDrive/KoboldAI/models/ models
if [ "$model" == " --model TPUMeshTransformerGPTJ" ]; then
if [ -n "${COLAB_TPU_ADDR+set}" ]; then
pip install -r requirements_mtj.txt
else
pip install -r requirements.txt
fi
# Make sure Colab has netbase
sudo apt install netbase -y
# Make sure Colab has the system dependencies
sudo apt install netbase aria2 -y
npm install -g localtunnel
fi
cd /content
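
The block above seeds Google Drive with the bundled defaults without overwriting user files (cp -rn), then replaces the local folders with symlinks into Drive. A hedged Python equivalent of that pattern, with illustrative paths:

```python
# Sketch of the "seed Drive once, then symlink" pattern used above (paths illustrative).
import os
import shutil

def seed_and_link(local, drive):
    os.makedirs(drive, exist_ok=True)
    if os.path.isdir(local) and not os.path.islink(local):
        for name in os.listdir(local):                 # cp -rn: copy defaults, never overwrite
            src, dst = os.path.join(local, name), os.path.join(drive, name)
            if not os.path.exists(dst):
                shutil.copytree(src, dst) if os.path.isdir(src) else shutil.copy2(src, dst)
        shutil.rmtree(local)                           # rm -rf: drop the local copy
    elif os.path.islink(local):
        os.remove(local)                               # rm: drop a stale symlink
    os.symlink(drive, local)                           # ln -s: point the folder at Drive

for folder in ("stories", "userscripts", "softprompts", "models"):
    seed_and_link(folder, "/content/drive/MyDrive/KoboldAI/" + folder)
```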
@ -178,8 +186,7 @@ fi
#Download routine for Aria2c scripts
if [ ! -z ${aria2+x} ]; then
apt install aria2 -y
curl -L $aria2 | aria2c -c -i- -d$dloc --user-agent=KoboldAI --file-allocation=none
curl -L $aria2 | aria2c -x 10 -s 10 -j 10 -c -i- -d$dloc --user-agent=KoboldAI --file-allocation=none
fi
#Extract the model with 7z

commandline-rocm.sh Executable file

@ -0,0 +1 @@
bin/micromamba run -r runtime -n koboldai-rocm bash


@ -10,18 +10,18 @@ IF %M%==3 GOTO drivemap_B
SET TEMP=%~DP0MINICONDA3
SET TMP=%~DP0MINICONDA3
call miniconda3\condabin\activate
cmd /k
cmd /k "%*"
:drivemap
subst K: miniconda3 >nul
SET TEMP=K:\
SET TMP=K:\
call K:\python\condabin\activate
cmd /k
cmd /k "%*"
:drivemap_B
subst B: miniconda3 >nul
SET TEMP=B:\
SET TMP=B:\
call B:\python\condabin\activate
cmd /k
cmd /k "%*"

commandline.sh Executable file

@ -0,0 +1 @@
bin/micromamba run -r runtime -n koboldai bash


@ -0,0 +1,7 @@
@echo off
SET /P M=<loader.settings
IF %M%==3 subst /D B:
IF %M%==1 subst /D K:
cls
echo KoboldAI Drive disconnected
pause

play-cuda.sh → docker-cuda.sh Normal file → Executable file


@ -6,4 +6,4 @@ WORKDIR /content/
COPY env.yml /home/micromamba/env.yml
RUN micromamba install -y -n base -f /home/micromamba/env.yml
USER root
RUN apt update && apt install xorg -y
RUN apt update && apt install xorg -y

docker-rocm.sh Executable file

@ -0,0 +1,4 @@
cd docker-rocm
xhost +local:docker
cp ../environments/rocm.yml env.yml
docker-compose run --service-ports koboldai bash -c "cd /content && python3 aiserver.py $*"


@ -14,7 +14,7 @@ dependencies:
- markdown
- bleach=4.1.0
- pip
- git
- git=2.35.1
- pip:
- git+https://github.com/finetuneanon/transformers@gpt-neo-localattention3-rp-b
- flask-cloudflared


@ -6,16 +6,18 @@ channels:
dependencies:
- colorama
- flask-socketio
- pytorch
- pytorch=1.11.*
- python=3.8.*
- cudatoolkit=11.1
- transformers
- eventlet
- markdown
- bleach=4.1.0
- pip
- git
- git=2.35.1
- sentencepiece
- protobuf
- pip:
- flask-cloudflared
- flask-ngrok
- lupa==1.10
- transformers>=4.17


@ -10,7 +10,7 @@ dependencies:
- markdown
- bleach=4.1.0
- pip
- git
- git=2.35.1
- pip:
- --find-links https://download.pytorch.org/whl/rocm4.2/torch_stable.html
- torch


@ -3,7 +3,6 @@ channels:
- conda-forge
- defaults
dependencies:
- transformers
- colorama
- flask-socketio
- python=3.8.*
@ -11,11 +10,14 @@ dependencies:
- markdown
- bleach=4.1.0
- pip
- git
- git=2.35.1
- sentencepiece
- protobuf
- pip:
- --find-links https://download.pytorch.org/whl/rocm4.2/torch_stable.html
- torch
- torchvision==0.11.1
- torch==1.10.*
- torchvision
- flask-cloudflared
- flask-ngrok
- lupa==1.10
- transformers>=4.17


@ -1,5 +1,3 @@
import tkinter as tk
from tkinter import filedialog
from os import getcwd, listdir, path
from typing import Tuple, Union, Optional
import os
@ -10,6 +8,8 @@ import zipfile
# Generic Method for prompting for file path
#==================================================================#
def getsavepath(dir, title, types):
import tkinter as tk
from tkinter import filedialog
root = tk.Tk()
root.attributes("-topmost", True)
path = tk.filedialog.asksaveasfile(
@ -28,6 +28,8 @@ def getsavepath(dir, title, types):
# Generic Method for prompting for file path
#==================================================================#
def getloadpath(dir, title, types):
import tkinter as tk
from tkinter import filedialog
root = tk.Tk()
root.attributes("-topmost", True)
path = tk.filedialog.askopenfilename(
@ -45,6 +47,8 @@ def getloadpath(dir, title, types):
# Generic Method for prompting for directory path
#==================================================================#
def getdirpath(dir, title):
import tkinter as tk
from tkinter import filedialog
root = tk.Tk()
root.attributes("-topmost", True)
path = filedialog.askdirectory(
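
Moving the tkinter imports out of the module top level and into the dialog helpers means fileops can be imported on headless machines (Colab, Docker) where no display or GUI toolkit is available; the import cost is only paid when a dialog actually opens. The pattern in isolation (a sketch; the real helpers pass more options):

```python
# Lazy-import sketch: tkinter is only loaded when a dialog is actually requested.
def getloadpath(dir, title, types):
    import tkinter as tk                    # deferred: headless imports of this module stay cheap
    from tkinter import filedialog
    root = tk.Tk()
    root.attributes("-topmost", True)       # keep the dialog above other windows
    path = filedialog.askopenfilename(initialdir=dir, title=title, filetypes=types)
    root.destroy()
    return path
```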
@ -61,30 +65,30 @@ def getdirpath(dir, title):
# Returns the path (as a string) to the given story by its name
#==================================================================#
def storypath(name):
return path.join(path.dirname(path.realpath(__file__)), "stories", name + ".json")
return path.join("stories", name + ".json")
#==================================================================#
# Returns the path (as a string) to the given soft prompt by its filename
#==================================================================#
def sppath(filename):
return path.join(path.dirname(path.realpath(__file__)), "softprompts", filename)
return path.join("softprompts", filename)
#==================================================================#
# Returns the path (as a string) to the given username by its filename
#==================================================================#
def uspath(filename):
return path.join(path.dirname(path.realpath(__file__)), "userscripts", filename)
return path.join("userscripts", filename)
#==================================================================#
# Returns an array of dicts containing story files in /stories
#==================================================================#
def getstoryfiles():
list = []
for file in listdir(path.dirname(path.realpath(__file__))+"/stories"):
for file in listdir("stories"):
if file.endswith(".json"):
ob = {}
ob["name"] = file.replace(".json", "")
f = open(path.dirname(path.realpath(__file__))+"/stories/"+file, "r")
f = open("stories/"+file, "r")
try:
js = json.load(f)
except:
@ -108,7 +112,7 @@ def checksp(filename: str, model_dimension: int) -> Tuple[Union[zipfile.ZipFile,
if 'np' not in globals():
import numpy as np
try:
z = zipfile.ZipFile(path.dirname(path.realpath(__file__))+"/softprompts/"+filename)
z = zipfile.ZipFile("softprompts/"+filename)
with z.open('tensor.npy') as f:
# Read only the header of the npy file, for efficiency reasons
version: Tuple[int, int] = np.lib.format.read_magic(f)
@ -118,7 +122,10 @@ def checksp(filename: str, model_dimension: int) -> Tuple[Union[zipfile.ZipFile,
shape, fortran_order, dtype = np.lib.format._read_array_header(f, version)
assert len(shape) == 2
except:
z.close()
try:
z.close()
except UnboundLocalError:
pass
return 1, None, None, None, None
if dtype not in ('V2', np.float16, np.float32):
z.close()
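
The surrounding code shows why the UnboundLocalError guard is needed: checksp validates a softprompt by reading just the .npy header from inside the zip (via NumPy's np.lib.format helpers, one of which is private), so a corrupt file can fail before z is ever bound. A standalone sketch of the header-only check, mirroring the calls used above:

```python
# Sketch: read only the .npy header inside a softprompt zip to get shape/dtype cheaply.
import zipfile
import numpy as np

def peek_tensor_header(zip_path):
    with zipfile.ZipFile(zip_path) as z, z.open("tensor.npy") as f:
        version = np.lib.format.read_magic(f)             # (major, minor) format version
        shape, fortran_order, dtype = np.lib.format._read_array_header(f, version)
    return shape, dtype                                   # the tensor data itself is never read

# Usage (illustrative path): shape, dtype = peek_tensor_header("softprompts/example.zip")
```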
@ -136,8 +143,8 @@ def checksp(filename: str, model_dimension: int) -> Tuple[Union[zipfile.ZipFile,
#==================================================================#
def getspfiles(model_dimension: int):
lst = []
os.makedirs(path.dirname(path.realpath(__file__))+"/softprompts", exist_ok=True)
for file in listdir(path.dirname(path.realpath(__file__))+"/softprompts"):
os.makedirs("softprompts", exist_ok=True)
for file in listdir("softprompts"):
if not file.endswith(".zip"):
continue
z, version, shape, fortran_order, dtype = checksp(file, model_dimension)
@ -170,8 +177,8 @@ def getspfiles(model_dimension: int):
#==================================================================#
def getusfiles(long_desc=False):
lst = []
os.makedirs(path.dirname(path.realpath(__file__))+"/userscripts", exist_ok=True)
for file in listdir(path.dirname(path.realpath(__file__))+"/userscripts"):
os.makedirs("userscripts", exist_ok=True)
for file in listdir("userscripts"):
if file.endswith(".lua"):
ob = {}
ob["filename"] = file


@ -51,8 +51,19 @@ gensettingstf = [
"min": 0.0,
"max": 1.0,
"step": 0.05,
"default": 0.0,
"default": 1.0,
"tooltip": "Alternative sampling method; it is recommended to disable top_p and top_k (set top_p to 1 and top_k to 0) if using this. 0.95 is thought to be a good value. (Put this value on 1 to disable its effect)"
},
{
"uitype": "slider",
"unit": "float",
"label": "Typical Sampling",
"id": "settypical",
"min": 0.0,
"max": 1.0,
"step": 0.05,
"default": 1.0,
"tooltip": "Alternative sampling method described in the paper \"Typical Decoding for Natural Language Generation\" (10.48550/ARXIV.2202.00666). The paper suggests 0.2 as a good value for this setting. Set this setting to 1 to disable its effect."
},
{
"uitype": "slider",
@ -207,6 +218,17 @@ gensettingstf = [
"step": 1,
"default": 0,
"tooltip": "Disables userscript generation modifiers."
},
{
"uitype": "toggle",
"unit": "bool",
"label": "Debug",
"id": "debug",
"min": 0,
"max": 1,
"step": 1,
"default": 0,
"tooltip": "Show debug info"
}
]
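
For context on the new slider: typical sampling (per the cited paper) keeps the tokens whose surprisal is closest to the distribution's entropy, rather than simply the most probable ones. A small NumPy illustration of the filtering step (illustrative only; KoboldAI's actual implementation lives in its samplers and operates on logits):

```python
# Illustrative typical-sampling filter over a next-token probability distribution.
import numpy as np

def typical_filter(probs, typical_p=0.2):
    probs = np.asarray(probs, dtype=np.float64)
    surprisal = -np.log(probs)                       # information content per token
    entropy = np.sum(probs * surprisal)              # expected surprisal of the distribution
    deviation = np.abs(surprisal - entropy)          # distance from "typical"
    order = np.argsort(deviation)                    # most typical tokens first
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, typical_p) + 1]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()                 # renormalize the kept mass

print(typical_filter([0.5, 0.3, 0.15, 0.05], typical_p=0.5))
```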
@ -341,6 +363,17 @@ gensettingsik =[{
"step": 1,
"default": 0,
"tooltip": "When enabled, the Memory text box in the Random Story dialog will be prefilled by default with your current story's memory instead of being empty."
},
{
"uitype": "toggle",
"unit": "bool",
"label": "Debug",
"id": "debug",
"min": 0,
"max": 1,
"step": 1,
"default": 0,
"tooltip": "Show debug info"
}
]


@ -45,6 +45,8 @@ subst B: miniconda3
SET TEMP=B:\
SET TMP=B:\
copy umamba.exe B:\umamba.exe
copy loader.settings B:\loader.settings
copy disconnect-kobold-drive.bat B:\disconnect-kobold-drive.bat
B:
umamba.exe create -r B:\python\ -n base
IF %B%==1 umamba.exe install --no-shortcuts -r B:\python\ -n base -f "%~dp0\environments\huggingface.yml" -y --always-copy

install_requirements.sh Executable file

@ -0,0 +1,16 @@
#!/bin/bash
if [[ $1 = "cuda" ]]; then
wget -qO- https://micromamba.snakepit.net/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
bin/micromamba create -f environments/huggingface.yml -r runtime -n koboldai -y
# Weird micromamba bug causes it to fail the first time, running it twice just to be safe, the second time is much faster
bin/micromamba create -f environments/huggingface.yml -r runtime -n koboldai -y
exit
fi
if [[ $1 = "rocm" ]]; then
wget -qO- https://micromamba.snakepit.net/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
bin/micromamba create -f environments/rocm.yml -r runtime -n koboldai-rocm -y
# Weird micromamba bug causes it to fail the first time, running it twice just to be safe, the second time is much faster
bin/micromamba create -f environments/rocm.yml -r runtime -n koboldai-rocm -y
exit
fi
echo Please specify either CUDA or ROCM

koboldai.ico Normal file (binary, 150 KiB, not shown)

koboldaiblue.ico Normal file (binary, 152 KiB, not shown)

koboldaigreen.ico Normal file (binary, 151 KiB, not shown)

maps/gpt_neo.json Normal file

@ -0,0 +1,32 @@
{
"mtj_compat": "neo",
"mtj_pe": "fixed",
"mtj_config_map": {
"d_model": "hidden_size",
"n_heads": "num_heads",
"layers": "num_layers"
},
"static_weights": {
"transformer.wte.weight": {"mtj": {"module": "embedding_shard/~/linear", "param": "w", "transforms": ["no_transpose", "vocab_pad"]}},
"transformer.wpe.weight": {"mtj": {"module": "embedding_shard", "param": "pos_embs", "transforms": ["no_transpose"]}},
"transformer.ln_f.weight": {"mtj": {"module": "projection_shard/~/replicated_layer_norm", "param": "scale"}},
"transformer.ln_f.bias": {"mtj": {"module": "projection_shard/~/replicated_layer_norm", "param": "offset"}}
},
"layer_weights": {
"transformer.h.{layer}.attn.attention.bias": {},
"transformer.h.{layer}.attn.attention.masked_bias": {},
"transformer.h.{layer}.attn.attention.q_proj.weight": {"mtj": {"module": "layer_{layer}/~/linear", "param": "w"}},
"transformer.h.{layer}.attn.attention.v_proj.weight": {"mtj": {"module": "layer_{layer}/~/linear_1", "param": "w"}},
"transformer.h.{layer}.attn.attention.k_proj.weight": {"mtj": {"module": "layer_{layer}/~/linear_2", "param": "w"}},
"transformer.h.{layer}.attn.attention.out_proj.weight": {"mtj": {"module": "layer_{layer}/~/linear_3", "param": "w"}},
"transformer.h.{layer}.attn.attention.out_proj.bias": {"mtj": {"module": "layer_{layer}/~/linear_3", "param": "b", "transforms": ["divide_by_shards"]}},
"transformer.h.{layer}.mlp.c_fc.weight": {"mtj": {"module": "layer_{layer}/~/linear_4", "param": "w"}},
"transformer.h.{layer}.mlp.c_fc.bias": {"mtj": {"module": "layer_{layer}/~/linear_4", "param": "b"}},
"transformer.h.{layer}.mlp.c_proj.weight": {"mtj": {"module": "layer_{layer}/~/linear_5", "param": "w"}},
"transformer.h.{layer}.mlp.c_proj.bias": {"mtj": {"module": "layer_{layer}/~/linear_5", "param": "b", "transforms": ["divide_by_shards"]}},
"transformer.h.{layer}.ln_1.weight": {"mtj": {"module": "layer_{layer}/~/replicated_layer_norm", "param": "scale"}},
"transformer.h.{layer}.ln_1.bias": {"mtj": {"module": "layer_{layer}/~/replicated_layer_norm", "param": "offset"}},
"transformer.h.{layer}.ln_2.weight": {"mtj": {"module": "layer_{layer}/~/replicated_layer_norm_1", "param": "scale"}},
"transformer.h.{layer}.ln_2.bias": {"mtj": {"module": "layer_{layer}/~/replicated_layer_norm_1", "param": "offset"}}
}
}
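
These map files describe how the TPU (Mesh Transformer JAX) backend translates Hugging Face checkpoint tensor names into MTJ module/parameter targets, with optional transforms such as no_transpose or vocab_pad; an empty entry (like the attention bias weights) marks a tensor to skip. A hedged sketch of walking such a map (the iterator below is illustrative, not KoboldAI's actual loader):

```python
# Illustrative iterator over a maps/*.json weight map; the loader itself is hypothetical.
import json

def iter_weight_targets(map_path, num_layers):
    with open(map_path) as f:
        spec = json.load(f)
    for name, target in spec["static_weights"].items():
        mtj = target["mtj"]
        yield name, mtj["module"], mtj["param"], mtj.get("transforms", [])
    for template, target in spec["layer_weights"].items():
        if not target:                                   # empty dict: tensor is ignored
            continue
        for layer in range(num_layers):
            mtj = target["mtj"]
            yield (template.format(layer=layer),
                   mtj["module"].format(layer=layer),
                   mtj["param"],
                   mtj.get("transforms", []))

# Usage: print every checkpoint tensor and its MTJ destination for a 2-layer toy config.
for name, module, param, transforms in iter_weight_targets("maps/gpt_neo.json", 2):
    print(f"{name} -> {module}/{param} {transforms}")
```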

maps/gptj.json Normal file

@ -0,0 +1,32 @@
{
"mtj_compat": "j",
"mtj_pe": "rotary",
"mtj_config_map": {
"pe_rotary_dims": ["rotary_dim", 64],
"d_model": "n_embd",
"n_heads": "n_head",
"layers": "n_layer"
},
"static_weights": {
"transformer.wte.weight": {"mtj": {"module": "embedding_shard/~/linear", "param": "w", "transforms": ["no_transpose", "vocab_pad"]}},
"transformer.wte.bias": {"mtj": {"module": "embedding_shard/~/linear", "param": "b"}},
"transformer.ln_f.weight": {"mtj": {"module": "projection_shard/~/replicated_layer_norm", "param": "scale"}},
"transformer.ln_f.bias": {"mtj": {"module": "projection_shard/~/replicated_layer_norm", "param": "offset"}},
"lm_head.weight": {"mtj": {"module": "projection_shard/~/linear", "param": "w", "transforms": ["vocab_pad"]}},
"lm_head.bias": {"mtj": {"module": "projection_shard/~/linear", "param": "b"}}
},
"layer_weights": {
"transformer.h.{layer}.attn.bias": {},
"transformer.h.{layer}.attn.masked_bias": {},
"transformer.h.{layer}.attn.q_proj.weight": {"mtj": {"module": "layer_{layer}/~/linear", "param": "w"}},
"transformer.h.{layer}.attn.v_proj.weight": {"mtj": {"module": "layer_{layer}/~/linear_1", "param": "w"}},
"transformer.h.{layer}.attn.k_proj.weight": {"mtj": {"module": "layer_{layer}/~/linear_2", "param": "w"}},
"transformer.h.{layer}.attn.out_proj.weight": {"mtj": {"module": "layer_{layer}/~/linear_3", "param": "w"}},
"transformer.h.{layer}.mlp.fc_in.weight": {"mtj": {"module": "layer_{layer}/~/linear_4", "param": "w"}},
"transformer.h.{layer}.mlp.fc_in.bias": {"mtj": {"module": "layer_{layer}/~/linear_4", "param": "b"}},
"transformer.h.{layer}.mlp.fc_out.weight": {"mtj": {"module": "layer_{layer}/~/linear_5", "param": "w"}},
"transformer.h.{layer}.mlp.fc_out.bias": {"mtj": {"module": "layer_{layer}/~/linear_5", "param": "b", "transforms": ["divide_by_shards"]}},
"transformer.h.{layer}.ln_1.weight": {"mtj": {"module": "layer_{layer}/~/replicated_layer_norm", "param": "scale"}},
"transformer.h.{layer}.ln_1.bias": {"mtj": {"module": "layer_{layer}/~/replicated_layer_norm", "param": "offset"}}
}
}

maps/opt.json Normal file

@ -0,0 +1,35 @@
{
"mtj_compat": "opt",
"mtj_pe": "fixed",
"mtj_config_map": {
"do_layer_norm_before": ["do_layer_norm_before", true],
"d_embed": "word_embed_proj_dim",
"d_model": "hidden_size",
"n_heads": "num_attention_heads",
"layers": "num_hidden_layers"
},
"static_weights": {
"decoder.embed_tokens.weight": {"mtj": {"module": "embedding_shard/~/linear", "param": "w", "transforms": ["no_transpose", "vocab_pad"]}},
"decoder.project_in.weight": {"mtj": {"module": "embedding_shard", "param": "project_in"}},
"decoder.embed_positions.weight": {"mtj": {"module": "embedding_shard", "param": "pos_embs", "transforms": ["no_transpose", "remove_first_two_rows"]}},
"decoder.project_out.weight": {"mtj": {"module": "projection_shard", "param": "project_out"}}
},
"layer_weights": {
"decoder.layers.{layer}.self_attn.q_proj.weight": {"mtj": {"module": "layer_{layer}/~/linear", "param": "w"}},
"decoder.layers.{layer}.self_attn.q_proj.bias": {"mtj": {"module": "layer_{layer}/~/linear", "param": "b"}},
"decoder.layers.{layer}.self_attn.v_proj.weight": {"mtj": {"module": "layer_{layer}/~/linear_1", "param": "w"}},
"decoder.layers.{layer}.self_attn.v_proj.bias": {"mtj": {"module": "layer_{layer}/~/linear_1", "param": "b"}},
"decoder.layers.{layer}.self_attn.k_proj.weight": {"mtj": {"module": "layer_{layer}/~/linear_2", "param": "w"}},
"decoder.layers.{layer}.self_attn.k_proj.bias": {"mtj": {"module": "layer_{layer}/~/linear_2", "param": "b"}},
"decoder.layers.{layer}.self_attn.out_proj.weight": {"mtj": {"module": "layer_{layer}/~/linear_3", "param": "w"}},
"decoder.layers.{layer}.self_attn.out_proj.bias": {"mtj": {"module": "layer_{layer}/~/linear_3", "param": "b", "transforms": ["divide_by_shards"]}},
"decoder.layers.{layer}.fc1.weight": {"mtj": {"module": "layer_{layer}/~/linear_4", "param": "w"}},
"decoder.layers.{layer}.fc1.bias": {"mtj": {"module": "layer_{layer}/~/linear_4", "param": "b"}},
"decoder.layers.{layer}.fc2.weight": {"mtj": {"module": "layer_{layer}/~/linear_5", "param": "w"}},
"decoder.layers.{layer}.fc2.bias": {"mtj": {"module": "layer_{layer}/~/linear_5", "param": "b", "transforms": ["divide_by_shards"]}},
"decoder.layers.{layer}.self_attn_layer_norm.weight": {"mtj": {"module": "layer_{layer}/~/replicated_layer_norm", "param": "scale"}},
"decoder.layers.{layer}.self_attn_layer_norm.bias": {"mtj": {"module": "layer_{layer}/~/replicated_layer_norm", "param": "offset"}},
"decoder.layers.{layer}.final_layer_norm.weight": {"mtj": {"module": "layer_{layer}/~/replicated_layer_norm_1", "param": "scale"}},
"decoder.layers.{layer}.final_layer_norm.bias": {"mtj": {"module": "layer_{layer}/~/replicated_layer_norm_1", "param": "offset"}}
}
}

maps/xglm.json Normal file

@ -0,0 +1,32 @@
{
"mtj_compat": "fairseq_lm",
"mtj_pe": "fairseq_sinusoidal",
"mtj_config_map": {
"d_model": "d_model",
"n_heads": "attention_heads",
"layers": "num_layers"
},
"static_weights": {
"model.embed_tokens.weight": {"mtj": {"module": "embedding_shard/~/linear", "param": "w", "transforms": ["no_transpose", "vocab_pad"]}},
"model.layer_norm.weight": {"mtj": {"module": "projection_shard/~/replicated_layer_norm", "param": "scale"}},
"model.layer_norm.bias": {"mtj": {"module": "projection_shard/~/replicated_layer_norm", "param": "offset"}}
},
"layer_weights": {
"model.layers.{layer}.self_attn.q_proj.weight": {"mtj": {"module": "layer_{layer}/~/linear", "param": "w"}},
"model.layers.{layer}.self_attn.q_proj.bias": {"mtj": {"module": "layer_{layer}/~/linear", "param": "b"}},
"model.layers.{layer}.self_attn.v_proj.weight": {"mtj": {"module": "layer_{layer}/~/linear_1", "param": "w"}},
"model.layers.{layer}.self_attn.v_proj.bias": {"mtj": {"module": "layer_{layer}/~/linear_1", "param": "b"}},
"model.layers.{layer}.self_attn.k_proj.weight": {"mtj": {"module": "layer_{layer}/~/linear_2", "param": "w"}},
"model.layers.{layer}.self_attn.k_proj.bias": {"mtj": {"module": "layer_{layer}/~/linear_2", "param": "b"}},
"model.layers.{layer}.self_attn.out_proj.weight": {"mtj": {"module": "layer_{layer}/~/linear_3", "param": "w"}},
"model.layers.{layer}.self_attn.out_proj.bias": {"mtj": {"module": "layer_{layer}/~/linear_3", "param": "b", "transforms": ["divide_by_shards"]}},
"model.layers.{layer}.fc1.weight": {"mtj": {"module": "layer_{layer}/~/linear_4", "param": "w"}},
"model.layers.{layer}.fc1.bias": {"mtj": {"module": "layer_{layer}/~/linear_4", "param": "b"}},
"model.layers.{layer}.fc2.weight": {"mtj": {"module": "layer_{layer}/~/linear_5", "param": "w"}},
"model.layers.{layer}.fc2.bias": {"mtj": {"module": "layer_{layer}/~/linear_5", "param": "b", "transforms": ["divide_by_shards"]}},
"model.layers.{layer}.self_attn_layer_norm.weight": {"mtj": {"module": "layer_{layer}/~/replicated_layer_norm", "param": "scale"}},
"model.layers.{layer}.self_attn_layer_norm.bias": {"mtj": {"module": "layer_{layer}/~/replicated_layer_norm", "param": "offset"}},
"model.layers.{layer}.final_layer_norm.weight": {"mtj": {"module": "layer_{layer}/~/replicated_layer_norm_1", "param": "scale"}},
"model.layers.{layer}.final_layer_norm.bias": {"mtj": {"module": "layer_{layer}/~/replicated_layer_norm_1", "param": "offset"}}
}
}


@ -0,0 +1,2 @@
Place extracted models in their own subfolders.
Downloaded models from the menu will automatically appear here.

play-rocm.sh Normal file → Executable file

@ -1,4 +1,5 @@
cd docker-rocm
xhost +local:docker
cp ../environments/rocm.yml env.yml
docker-compose run --service-ports koboldai bash -c "cd /content && python3 aiserver.py $*"
#!/bin/bash
if [ ! -f "runtime/envs/koboldai-rocm/bin/python" ]; then
./install_requirements.sh rocm
fi
bin/micromamba run -r runtime -n koboldai-rocm python aiserver.py $*


@ -16,20 +16,20 @@ cmd /k
:drivemap
ECHO Runtime launching in K: drive mode
subst /D K: >nul
subst K: miniconda3 >nul
SET TEMP=K:\
SET TMP=K:\
call K:\python\condabin\activate
python aiserver.py %*
subst K: /D
cmd /k
:drivemap_B
ECHO Runtime launching in B: drive mode
subst /D B: >nul
subst B: miniconda3 >nul
SET TEMP=B:\
SET TMP=B:\
call B:\python\condabin\activate
python aiserver.py %*
subst B: /D
cmd /k


@ -1,45 +0,0 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "KoboldAI Jupyter",
"provenance": [],
"authorship_tag": "ABX9TyMDTbAhtDnKJa+aIEaQjpsL"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "TPU"
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# KoboldAI Launcher for generic Jupyter Notebooks\n",
"This notebook is meant as a way to easily launch KoboldAI on existing Jupyter instances that already have KoboldAI installed (For example a custom Saturn Cloud or Paperspace instance).\n",
"\n",
"For Google Colab please check out our Google Colab edition available at : https://colab.research.google.com/github/KoboldAI/KoboldAI-Client/blob/main/colab/TPU.ipynb"
],
"metadata": {
"id": "hMRnGz42Xsy3"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "40B1QvI3Xv02"
},
"outputs": [],
"source": [
"!pip install -r requirements.txt\n",
"!python3 aiserver.py --remote"
]
}
]
}

play.sh Executable file

@ -0,0 +1,5 @@
#!/bin/bash
if [ ! -f "runtime/envs/koboldai/bin/python" ]; then
./install_requirements.sh cuda
fi
bin/micromamba run -r runtime -n koboldai python aiserver.py $*

readme.md

@ -1,4 +1,4 @@
# KoboldAI - Your gateway to GPT writing
## KoboldAI - Your gateway to GPT writing
This is a browser-based front-end for AI-assisted writing with multiple local & remote AI models. It offers the standard array of tools, including Memory, Author's Note, World Info, Save & Load, adjustable AI settings, formatting options, and the ability to import existing AI Dungeon adventures. You can also turn on Adventure mode and play the game like AI Dungeon Unleashed.
@ -12,9 +12,9 @@ By default KoboldAI will run in a generic mode optimized for writing, but with t
The gameplay will be slightly different than the gameplay in AI Dungeon because we adopted the style of the Unleashed fork, giving you full control over all the characters because we do not automatically adapt your sentences behind the scenes. This means you can more reliably control characters that are not you.
As a result, what you need to type is slightly different: in AI Dungeon you would type ***take the sword*** while in KoboldAI you would phrase it as a sentence such as ***You take the sword***, and this is best done with the word You instead of I.
As a result, what you need to type is slightly different: in AI Dungeon you would type _**take the sword**_ while in KoboldAI you would phrase it as a sentence such as _**You take the sword**_, and this is best done with the word You instead of I.
To speak simply type : *You say "We should probably gather some supplies first"*
To speak simply type : _You say "We should probably gather some supplies first"_
Just typing the quote might work, but the AI is at its best when you specify who does what in your commands.
If you want to do this with your friends we advise using the main character as You and using the other characters by their name if you are playing on a model trained for Adventures. These models assume there is a You in the story. This mode usually does not perform well on Novel models because they do not know how to handle this kind of input; those are best used for regular story writing where you take turns with the AI.
@ -27,7 +27,7 @@ If you want to use KoboldAI as a writing assistant this is best done in the regu
In chatbot mode you can use a suitable model as a chatbot; this mode automatically adds your name to the beginning of the sentences and prevents the AI from talking as you. To use it properly you must write your story opening as both characters in the following format (You can use your own text) :
``` ChatBot Opening Example
```plaintext
Bot : Hey!
You : Hey Boyname, how have you been?
Bot : Been good! How about you?
@ -42,8 +42,6 @@ This mode works the best on either a Generic model or a chatbot model specifical
Novel or Adventure models are not recommended for this feature; they might still work, but can derail away from the conversation format quickly.
## Play KoboldAI online for free on Google Colab (The easiest way to play)
If you would like to play KoboldAI online for free on a powerful computer you can use Google Colaboratory. We provide two editions, a TPU and a GPU edition, with a variety of models available. These run entirely on Google's servers and will automatically upload saves to your Google Drive if you choose to save a story (Alternatively, you can choose to download your save instead so that it never gets stored on Google Drive). Detailed instructions on how to use them are at the bottom of the Colabs.
@ -52,35 +50,71 @@ Each edition features different models and requires different hardware to run, t
### [Click here for the TPU Edition Colab](https://colab.research.google.com/github/KoboldAI/KoboldAI-Client/blob/main/colab/TPU.ipynb)
| Model | Size | Type | Drive Space | Description |
| ------------------------------ | ------ | --------- | ----------- | ------------------------------------------------------------ |
| Skein 6B by VE_FORBRYDERNE | 6B TPU | Hybrid | 0 GB | Skein is our flagship 6B model, it is a hybrid between an Adventure model and a Novel model. Best used with either Adventure mode or the You Bias userscript enabled. Skein has been trained on high quality Novels along with CYOA adventure stories and is not as wacky as the Adventure model. It also has tagging support. |
| Adventure 6B by VE_FORBRYDERNE | 6B TPU | Adventure | 0 GB | Adventure is a 6B model designed to mimic the behavior of AI Dungeon. It is exclusively for Adventure Mode and can take you on the epic and wacky adventures that AI Dungeon players love. It also features the many tropes of AI Dungeon as it has been trained on very similar data. It must be used in second person (You). |
| Lit 6B by Haru | 6B TPU | NSFW | 8 GB / 12 GB | Lit is a great NSFW model trained by Haru on both a large set of Literotica stories and high quality novels, along with tagging support, creating a high quality model for your NSFW stories. This model is exclusively a novel model and is best used in third person. |
| Generic 6B by EleutherAI | 6B TPU | Generic | 10 GB / 12 GB | GPT-J-6B is what all other models are based on; if you need something that has no specific bias towards any particular subject this is the model for you. Best used when the other models are not suitable for what you wish to do, such as homework assistance, blog writing, coding and more. It needs more handholding than other models and is more prone to undesirable formatting changes. |
| C1 6B by Haru | 6B TPU | Chatbot | 8 GB / 12 GB | C1 has been trained on various internet chatrooms; it forms the basis for an interesting chatbot model and has been optimized to be used in Chatmode. |
| Model | Size | Style | Description |
| --- | --- | --- | --- |
| [Nerys](https://huggingface.co/KoboldAI/fairseq-dense-13B-Nerys) by Mr Seeker | 13B | Novel/Adventure | Nerys is a hybrid model based on Pike (A newer Janeway); on top of the Pike dataset you also get some Light Novels, Adventure mode support and a little bit of Shinen thrown in the mix. The end result is a very diverse model that is heavily biased towards SFW novel writing, but one that can go beyond its novel training and make for an excellent adventure model too. Adventure mode is best played from a second person perspective, but can be played in first or third person as well. Novel writing can be done best from the first or third person. |
| [Janeway](https://huggingface.co/KoboldAI/fairseq-dense-13B-Janeway) by Mr Seeker | 13B | Novel | Janeway is a model created from Picard's dataset combined with a brand new collection of ebooks. This model is trained on 20% more content than Picard and has been trained on literature from various genres. Although the model is mainly focused on SFW, romantic scenes might involve a degree of nudity. |
| [Shinen](https://huggingface.co/KoboldAI/fairseq-dense-13B-Shinen) by Mr Seeker | 13B | NSFW | Shinen is an NSFW model designed to be more explicit. Trained on a variety of stories from the website Sexstories, it contains many different kinks. |
| [Skein](https://huggingface.co/KoboldAI/GPT-J-6B-Skein) by VE\_FORBRYDERNE | 6B | Adventure | Skein is best used with Adventure mode enabled; it consists of a 4 times larger adventure dataset than the Adventure model, making it excellent for text adventure gaming. On top of that it also includes light novel training, further expanding its knowledge and writing capabilities. It can be used with the You filter bias if you wish to write Novels with it, but dedicated Novel models can perform better for this task. |
| [Adventure](https://huggingface.co/KoboldAI/GPT-J-6B-Adventure) by VE\_FORBRYDERNE | 6B | Adventure | Adventure is a 6B model designed to mimic the behavior of AI Dungeon. It is exclusively for Adventure Mode and can take you on the epic and wacky adventures that AI Dungeon players love. It also features the many tropes of AI Dungeon as it has been trained on very similar data. It must be used in second person (You). |
| [Lit](https://huggingface.co/hakurei/lit-6B) by Haru | 6B | NSFW | Lit is a great NSFW model trained by Haru on both a large set of Literotica stories and high quality novels, along with tagging support, creating a high quality model for your NSFW stories. This model is exclusively a novel model and is best used in third person. |
| [Convo](https://huggingface.co/hitomi-team/convo-6B) by Hitomi Team | 6B | Chatbot | Convo-6B is a GPT-J 6B model fine-tuned on a collection of high quality open source datasets which amount to 6 million messages. The primary goal of the model is to provide improved performance and generalization when generating multi-turn dialogue for characters that were not present within the fine-tuning data. The prompted performance has especially improved over the predecessor model [C1-6B](https://huggingface.co/hakurei/c1-6B). |
| [C1](https://huggingface.co/hakurei/c1-6B) by Haru | 6B | Chatbot | C1 has been trained on various internet chatrooms; it forms the basis for an interesting chatbot model and has been optimized to be used in Chatmode. |
| Neo(X) by EleutherAI | 20B | Generic | NeoX is the largest EleutherAI model currently available; being a generic model it is not particularly trained towards anything and can do a variety of writing, Q&A and coding tasks. 20B's performance is closely comparable to the 13B models and it is worth trying both, especially if you have a task that does not involve English writing. Its behavior will be similar to the GPT-J-6B model since they are trained on the same dataset, but with more sensitivity towards repetition penalty and with more knowledge. |
| [Fairseq Dense](https://huggingface.co/KoboldAI/fairseq-dense-13B) | 13B | Generic | Trained by Facebook Researchers, this model stems from the MOE research project within Fairseq. This particular version has been converted by us for use in KoboldAI. It is known to be on par with the larger 20B model from EleutherAI and considered better for pop culture and language tasks. Because the model has never seen a new line (enter) it may perform worse on formatting and paragraphing. |
| [GPT-J-6B](https://huggingface.co/EleutherAI/gpt-j-6B) by EleutherAI | 6B | Generic | This model serves as the basis for most other 6B models (some being based on Fairseq Dense instead). Being trained on the Pile and not biased towards anything in particular, it is suitable for a variety of tasks such as writing, Q&A and coding. You will likely get better results with larger generic models or finetuned models. |
## [GPU Edition Model Descriptions](https://colab.research.google.com/github/KoboldAI/KoboldAI-Client/blob/main/colab/GPU.ipynb)
| Model | Size | Style | Description |
| --- | --- | --- | --- |
| [Fairseq-Dense-2.7B-Nerys](https://huggingface.co/KoboldAI/fairseq-dense-2.7B-Nerys) by Mr Seeker | 2.7B | Novel/Adventure | Nerys is a hybrid model based on Pike (A newer Janeway); on top of the Pike dataset you also get some Light Novels, Adventure mode support and a little bit of Shinen thrown in the mix. The end result is a very diverse model that is heavily biased towards SFW novel writing, but one that can go beyond its novel training and make for an excellent adventure model too. Adventure mode is best played from a second person perspective, but can be played in first or third person as well. Novel writing can be done best from the first or third person. |
| [GPT-Neo-2.7B-Janeway](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Janeway) by Mr Seeker | 2.7B | Novel | Janeway is a model created from Picard's dataset combined with a brand new collection of ebooks. This model is trained on 20% more content than Picard and has been trained on literature from various genres. Although the model is mainly focused on SFW, romantic scenes might involve a degree of nudity. |
| [GPT-Neo-2.7B-Picard](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Picard) by Mr Seeker | 2.7B | Novel | Picard is a model trained for SFW Novels based on GPT-Neo-2.7B. It is focused on Novel style writing without the NSFW bias. While the name suggests a sci-fi model, this model is designed for Novels of a variety of genres. It is meant to be used in KoboldAI's regular mode. |
| [GPT-Neo-2.7B-AID](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-AID) by melastacho | 2.7B | Adventure | Also known as Adventure 2.7B, this is a clone of the AI Dungeon Classic model and is best known for the epic wacky adventures that AI Dungeon Classic players love. |
| [GPT-Neo-2.7B-Horni-LN](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni-LN) by finetune | 2.7B | Novel | This model is based on GPT-Neo-2.7B-Horni and retains its NSFW knowledge, but was then further biased towards SFW novel stories. If you seek a balance between a SFW Novel model and a NSFW model, this model should be a good choice. |
| [GPT-Neo-2.7B-Horni](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni) by finetune | 2.7B | NSFW | This model is tuned on Literotica to produce a Novel style model biased towards NSFW content. Can still be used for SFW stories but will have a bias towards NSFW content. It is meant to be used in KoboldAI's regular mode. |
| [GPT-Neo-2.7B-Shinen](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Shinen) by Mr Seeker | 2.7B | NSFW | Shinen is an alternative to the Horni model, designed to be more explicit. If Horni is too tame for you, Shinen might produce better results. While it is a Novel model it is unsuitable for SFW stories due to its heavy NSFW bias. Shinen will not hold back. It is meant to be used in KoboldAI's regular mode. |
| [GPT-Neo-2.7B](https://huggingface.co/EleutherAI/gpt-neo-2.7B) by EleutherAI | 2.7B | Generic | This is the base model for all the other 2.7B models; it is best used when you have a use case that we have no other models available for, such as writing blog articles or programming. It can also be a good basis for the experience of some of the softprompts if your softprompt is not about a subject the other models cover. |
| Style | Description |
| --- | --- |
| Novel | For regular story writing, not compatible with Adventure mode or other specialty modes. |
| NSFW | Indicates that the model is strongly biased towards NSFW content and is not suitable for children, work environments or livestreaming. Most NSFW models are also Novel models in nature. |
| Adventure | These models are excellent for people willing to play KoboldAI like a Text Adventure game and are meant to be used with Adventure mode enabled. Even if you wish to use it as a Novel style model you should always have Adventure mode on and set it to story. These models typically have a strong bias towards the use of the word You and without Adventure mode enabled break the story flow and write actions on your behalf. |
| Chatbot | These models are specifically trained for chatting and are best used with the Chatmode enabled. Typically trained on either public chatrooms or private chats. |
| Generic | Generic models are not trained towards anything specific, typically used as a basis for other tasks and models. They can do everything the other models can do, but require much more handholding to work properly. Generic models are an ideal basis for tasks that we have no specific model for, or for experiencing a softprompt in its raw form. |
## Tips to get the most out of Google Colab
* Google will occasionally show a Captcha, typically after it has been open for 30 minutes, but it can be more frequent if you often use Colab. Make sure to do these properly, or you risk getting your instance shut down and getting a lower priority towards the TPUs.
* KoboldAI uses Google Drive to store your files and settings, if you wish to upload a softprompt or userscript this can be done directly on the Google Drive website. You can also use this to download backups of your KoboldAI related files or upload models of your own.
* Don't want to save your stories on Google Drive for privacy reasons? Do not use KoboldAI's save function and instead click Download as .json; this will download the story straight to your own computer without it ever touching Google's hard drives. You can load it back through the Load from file option.
* Google shut your instance down unexpectedly? You can still make use of the Download as .json button to recover your story as long as you did not close the KoboldAI window. You can then load it back up in your next session.
* Done with KoboldAI? Go to the Runtime menu, click on Manage Sessions and terminate the open sessions you no longer need. This trick can help you maintain a higher priority towards getting a TPU.
* Models stored on Google Drive typically load faster than models we need to download from the internet.
### [Click here for the GPU Edition Colab](https://colab.research.google.com/github/KoboldAI/KoboldAI-Client/blob/main/colab/GPU.ipynb)
| Model | Size | Type | Description |
| --- | --- | --- | --- |
| [GPT-Neo-2.7B-Picard](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Picard) by Mr Seeker | 2.7B GPU | Novel | Picard is a model trained for SFW Novels based on GPT-Neo-2.7B. It is focused on Novel style writing without the NSFW bias. While the name suggests a sci-fi model, this model is designed for Novels of a variety of genres. It is meant to be used in KoboldAI's regular mode. |
| [GPT-Neo-2.7B-AID](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-AID) by melastacho | 2.7B GPU | Adventure | Also known as Adventure 2.7B, this is a clone of the AI Dungeon Classic model and is best known for the epic wacky adventures that AI Dungeon Classic players love. |
| [GPT-Neo-2.7B-Horni-LN](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni-LN) by finetune | 2.7B GPU | Novel | This model is based on GPT-Neo-2.7B-Horni and retains its NSFW knowledge, but was then further biased towards SFW novel stories. If you seek a balance between a SFW Novel model and a NSFW model, this model should be a good choice. |
| [GPT-Neo-2.7B-Horni](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni) by finetune | 2.7B GPU | NSFW | This model is tuned on Literotica to produce a Novel style model biased towards NSFW content. Can still be used for SFW stories but will have a bias towards NSFW content. It is meant to be used in KoboldAI's regular mode. |
| [GPT-Neo-2.7B-Shinen](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Shinen) by Mr Seeker | 2.7B GPU | NSFW | Shinen is an alternative to the Horni model, designed to be more explicit. If Horni is too tame for you, Shinen might produce better results. While it is a Novel model it is unsuitable for SFW stories due to its heavy NSFW bias. Shinen will not hold back. It is meant to be used in KoboldAI's regular mode. |
| [GPT-Neo-2.7B](https://huggingface.co/EleutherAI/gpt-neo-2.7B) by EleutherAI | 2.7B GPU | Generic | This is the base model for all the other 2.7B models; it is best used when you have a use case that we have no other models available for, such as writing blog articles or programming. It can also be a good basis for the experience of some of the softprompts if your softprompt is not about a subject the other models cover. |
### Model Types
| Type | Description |
| --- | --- |
| Novel | For regular story writing, not compatible with Adventure mode or other specialty modes. |
| NSFW | Indicates that the model is strongly biased towards NSFW content and is not suitable for children, work environments or livestreaming. Most NSFW models are also Novel models in nature. |
| Adventure | These models are excellent for people who want to play KoboldAI like a Text Adventure game and are meant to be used with Adventure mode enabled. Even if you wish to use one as a Novel style model you should always have Adventure mode on and set it to story. These models typically have a strong bias towards the use of the word You and, without Adventure mode enabled, break the story flow and write actions on your behalf. |
| Chatbot | These models are specifically trained for chatting and are best used with Chatmode enabled. Typically trained on either public chatrooms or private chats. |
| Hybrid | Hybrid models are a blend between different types, for example they are trained on both Novel stories and Adventure stories. These models are great variety models that you can use for multiple different playstyles and modes, but depending on your usage you may need to enable Adventure Mode or the You bias (in userscripts). |
| Generic | Generic models are not trained towards anything specific, typically used as a basis for other tasks and models. They can do everything the other models can do, but require much more handholding to work properly. Generic models are an ideal basis for tasks that we have no specific model for, or for experiencing a softprompt in its raw form. |
## Install KoboldAI on your own computer
@ -94,33 +128,42 @@ The easiest way for Windows users is to use the [offline installer](https://sour
### Installing KoboldAI offline bundle on Windows 7 or higher using the KoboldAI Offline Installer (Easiest)
1. [Download the latest offline installer from here](https://sourceforge.net/projects/koboldai/files/latest/download)
2. Run the installer to place KoboldAI on a location of choice; KoboldAI is portable software and is not bound to a specific harddrive. (Because of long paths inside our dependencies you may not be able to extract it many folders deep.)
3. Update KoboldAI to the latest version with update-koboldai.bat if desired.
4. Use KoboldAI offline using play.bat or remotely with remote-play.bat.
### Installing KoboldAI Github release on Windows 10 or higher using the KoboldAI Runtime Installer
1. Extract the .zip to a location you wish to install KoboldAI; you will need roughly 20GB of free space for the installation (this does not include the models).
2. Open install_requirements.bat as **administrator**.
3. Choose the regular version of Transformers (Option 1); finetuneanon is deprecated and no longer recommended.
4. You will now be asked to choose the installation mode; we **strongly** recommend the Temporary B: drive option. This option eliminates most installation issues and also makes KoboldAI portable. The B: drive will be gone after a reboot and will automatically be recreated each time you play KoboldAI.
5. The installation will now automatically install its requirements; some stages may appear to freeze. Do not close the installer until it asks you to press a key. Before pressing a key to exit the installer please check if errors occurred. Most problems with the game crashing are related to installation/download errors. Disabling your antivirus can help if you get errors.
6. Use play.bat to start KoboldAI.
### Manual installation / Linux / Mac
### Installing KoboldAI on Linux using the KoboldAI Runtime (Easiest)
1. Clone the URL of this Github repository (For example git clone [https://github.com/koboldai/koboldai-client](https://github.com/koboldai/koboldai-client) )
2. AMD user? Make sure ROCm is installed if you want GPU support. Is yours not compatible with ROCm? Follow the usual instructions.
3. Run play.sh or if your AMD GPU supports ROCm use play-rocm.sh
KoboldAI will now automatically configure its dependencies and start up; everything is contained in its own conda runtime so we will not clutter your system. The files will be located in the runtime subfolder. If at any point you wish to force a reinstallation of the runtime you can do so with the install\_requirements.sh file. While you can run this manually it is not necessary.
### Manual installation / Mac
We can not provide a step by step guide for manual installation due to the vast differences between the existing software configuration and the systems of our users.
If you would like to manually install KoboldAI you will need some python/conda package management knowledge to manually do one of the following steps :
1. Use our bundled environments files to install your own conda environment, this should also automatically install CUDA (Recommended, you can get Miniconda from https://docs.conda.io/en/latest/miniconda.html#latest-miniconda-installer-links). The recommended configuration is huggingface.yml for CUDA users and rocm.yml for ROCm users.
2. If you have a working copy of Docker for either CUDA or ROCm try play-cuda.sh or play-rocm.sh to launch the docker versions. In this case the installation is mostly automatic.
3. If conda is proving difficult you could also look inside requirements.txt for the required dependencies and try to install them yourself. This will likely be a mixture of pip and your native package manager, just installing our requirements.txt is not recommended since to speed things up we do not force any version changes. For local installations definitely prioritize conda as that is a better way for us to enforce you have the latest compatible versions.
1. Use our bundled environments files to install your own conda environment, this should also automatically install CUDA (Recommended, you can get Miniconda from https://docs.conda.io/en/latest/miniconda.html#latest-miniconda-installer-links). The recommended configuration is huggingface.yml for CUDA users and rocm.yml for ROCm users.
2. If conda is proving difficult you could also look inside requirements.txt for the required dependencies and try to install them yourself. This will likely be a mixture of pip and your native package manager, just installing our requirements.txt is not recommended since we assume local users will run conda to get all dependencies. For local installations definitely prioritize conda as that is a better way for us to enforce that you have the compatible versions.
3. Clone our Github or download the zip file.
4. Now start KoboldAI with aiserver.py and not with our play.bat or play.sh files.
### AMD GPU's
### AMD GPU's (Linux only)
AMD GPU's have terrible compute support, this will currently not work on Windows and will only work for a select few Linux GPU's. [You can find a list of the compatible GPU's here](https://github.com/RadeonOpenCompute/ROCm#Hardware-and-Software-Support). Any GPU that is not listed is guaranteed not to work with KoboldAI and we will not be able to provide proper support on GPU's that are not compatible with the versions of ROCm we require.
AMD GPU's have terrible compute support, this will currently not work on Windows and will only work for a select few Linux GPU's. [You can find a list of the compatible GPU's here](https://github.com/RadeonOpenCompute/ROCm#Hardware-and-Software-Support). Any GPU that is not listed is guaranteed not to work with KoboldAI and we will not be able to provide proper support on GPU's that are not compatible with the versions of ROCm we require. Make sure to first install ROCm on your Linux system using a guide for your distribution, after that you can follow the usual linux instructions above.
### Troubleshooting
@ -140,45 +183,17 @@ In general, the less versions of Python you have on your system the higher your
GPU not found errors can be caused by a few things: you do not have a suitable Nvidia GPU (it needs Compute Capability 5.0 or higher to be able to play KoboldAI), your Nvidia GPU is supported by KoboldAI but is not yet supported by the latest version of CUDA, or you have a dependency conflict like the ones mentioned above.
Like with Python version conflicts we recommend uninstalling CUDA from your system if you have manually installed it and do not need it for anything else and trying again. If your GPU needs CUDA10 to function open environments\finetuneanon.yml and add a line that says - cudatoolkit=10.2 underneath dependencies: . After this you can run the installer again (Pick the option to delete the existing files) and it will download a CUDA10 compatible version.
Like with Python version conflicts we recommend uninstalling CUDA from your system if you have manually installed it and do not need it for anything else and trying again. If your GPU needs CUDA10 to function open environments\\finetuneanon.yml and add a line that says - cudatoolkit=10.2 underneath dependencies: . After this you can run the installer again (Pick the option to delete the existing files) and it will download a CUDA10 compatible version.
If you do not have a suitable Nvidia GPU that can run on CUDA10 or higher and that supports Compute Capability 5.0 or higher, we can not help you get the game detected on the GPU, unless you are following our ROCm guide with a compatible AMD GPU.
#### vocab.json / config.json is not found error
If you get these errors you either did not select the correct folder for your custom model, or the model you have downloaded is not (yet) compatible with KoboldAI. There are a few models out there that are compatible and provide a pytorch_model.bin file but do not ship all the required files. In this case, try downloading a compatible model of the same kind (for example another GPT-Neo if you downloaded a GPT-Neo model) and replace the pytorch_model.bin file with the one you are trying to run. Chances are this will work fine.
## KoboldAI Compatible Models
Most of the high quality models have been integrated in the menu, these models have their download link removed since the easiest way to obtain them is to run them directly from the menu. KoboldAI will automatically download and convert the models to a offline format for later use.
If you have old 6B versions which end in -hf, they will no longer be compatible with the newer versions of transformers and will no longer behave correctly. It is highly recommended that you install the official version of transformers (offline installers for KoboldAI contain this version by default) and redownload these models from the menu to get compatible versions. If you have very limited internet, we will for a limited time also offer finetuneanon's fork in the install_requirements.bat file; when using that option you will not be able to use the 6B models in our main menu, so definitely upgrade when your internet allows.
The VRAM amounts listed are the recommended amounts for fast, smooth play; playing with lower VRAM is possible, but then you may need to either lower the amount of tokens in the settings, or put fewer layers on your GPU, causing a significant performance loss.
**For CPU players, and during loading, regular RAM usage is double what we list here.** For example, a 6B model listed at 16GB will briefly need roughly 32GB of regular RAM while loading.
| **Model** | Type | **(V)RAM** | Repetition Penalty | Description |
| ------------------------------------------------------------ | --------------------------------- | ---------- | ------------------ | ------------------------------------------------------------ |
| Skein 6B by VE_FORBRYDERNE | Adventure Novel / 6B / Neo Custom | 16GB | 1.1 | Skein is our flagship 6B model; it is a hybrid between an Adventure model and a Novel model. Best used with either Adventure mode or the You Bias userscript enabled. Skein has been trained on high quality novels along with CYOA adventure stories and is not as wacky as the Adventure model. It also has tagging support. |
| Adventure 6B by VE_FORBRYDERNE | Adventure / 6B / Neo Custom | 16GB | 1.2 | Adventure is a 6B model designed to mimic the behavior of AI Dungeon. It is exclusively for Adventure Mode and can take you on the epic and wacky adventures that AI Dungeon players love. It also features the many tropes of AI Dungeon as it has been trained on very similar data. It must be used in second person (You). |
| Adventure 2.7B by melastashco | Adventure / 2.7B / Neo Custom | 8GB | 2.0 | This is one of the closest replications of the original AI Dungeon Classic model, tuned on the same data that got uploaded alongside AI Dungeon. In KoboldAI we noticed this model performs better than the conversions of the original AI Dungeon model. It has all the traits you expect of AI Dungeon Classic while not having as many artifacts, as this model was trained specifically for KoboldAI. Must be played with Adventure mode enabled to prevent it from doing actions on your behalf. |
| Horni 2.7B by finetuneanon | Novel / 2.7B / Neo Custom | 8GB | 2.0 | One of the best novel models available for 2.7B, focused on NSFW content. This model trains the AI to write in a story-like fashion using a very large collection of Literotica stories. It is one of the original finetuned models for 2.7B. |
| Horni-LN 2.7B by finetuneanon | Novel / 2.7B / Neo Custom | 8GB | 2.0 | This model is much like the one above, but has been additionally trained on regular light novels. It is more likely to go SFW and is more focused towards themes found in these light novels than general cultural references. This is a good model for novel writing, especially if you want to add erotica to the mix. |
| Picard 2.7B by Mr Seeker | Novel / 2.7B / Neo Custom | 8GB | 2.0 | Picard is another novel model, this time exclusively focused on SFW content of various genres. Despite the name, this goes far beyond Star Trek stories and is not exclusively sci-fi. |
| Janeway 2.7B by Mr Seeker | Novel / 2.7B / Neo Custom | 8GB | 2.0 | Janeway is a model created from Picard's dataset combined with a brand new collection of ebooks. This model is trained on 20% more content than Picard and has been trained on literature from various genres. Although the model is mainly focused on SFW content, romantic scenes might involve a degree of nudity. |
| Shinen 2.7B by Mr Seeker | Novel / 2.7B / Neo Custom | 8GB | 2.0 | The most NSFW of them all, Shinen WILL make things sexual. This model will assume that whatever you are doing is meant to be a sex story and will sexualize constantly. It is designed for people who find Horni too tame. It was trained on SexStories instead of Literotica, and on tags, making it easier to guide the AI to the right context. |
| [AID-16Bit](https://storage.henk.tech/KoboldAI/aid-16bit.zip) | Adventure / 1.5B / GPT-2 Custom | 4GB | 2.0 | The original AI Dungeon Classic model converted to Pytorch and then converted to a 16-bit Model making it half the size. |
| [model_v5_pytorch](https://storage.henk.tech/KoboldAI/model_v5_pytorch.zip) (AI Dungeon's Original Model) | Adventure / 1.5B / GPT-2 Custom | 8GB | 2.0 | This is the original AI Dungeon Classic model converted to the Pytorch format compatible with AI Dungeon Clover and KoboldAI. We consider this model inferior to the GPT-Neo version because it has more artifacting due to its conversion. This is however the most authentic you can get to AI Dungeon Classic. |
| [Novel 774M](https://storage.henk.tech/KoboldAI/Novel%20model%20774M.rar) | Novel / 774M / GPT-2 Custom | 4GB | 2.0 | Novel 774M is made by the AI Dungeon Clover community; because of its small size and novel bias it is more suitable for CPU players that want to play with speed over substance, or players who want to test a GPU with a low amount of VRAM. These performance savings come at the cost of story quality and you should not expect the kind of in-depth story capabilities that the larger models offer. It was trained for SFW stories. |
| [Smut 774M](https://storage.henk.tech/KoboldAI/Smut%20model%20774M%2030K.rar) | Novel / 774M / GPT-2 Custom | 4GB | 2.0 | The NSFW version of the above; it's a smaller GPT-2 based model made by the AI Dungeon Clover community. Gives decent speed on a CPU at the cost of story quality, like the other 774M models. |
| [Mia (GPT-Neo-125M-AID)](https://huggingface.co/KoboldAI/GPT-Neo-125M-AID) by Henk717 | Adventure / 125M / Neo Custom | 1GB | 2.0 | Mia is the smallest Adventure model; it runs at very fast speeds on the CPU, which makes it a good testing model for developers who do not have GPU access. Because of its small size it will constantly attempt to do actions on behalf of the player and it will not produce high quality stories. If you just need a small model for a quick test, or if you want to take the challenge of trying to run KoboldAI entirely on your phone, this would be an easy model to use due to its small RAM requirements and fast (loading) speeds. |
## Softprompts
Softprompts (also known as Modules in other products) are addons that can change the output of existing models. For example you may load a softprompt that biases the AI towards a certain subject and style like transcripts from your favorite TV show.
Since these softprompts are often based on existing franchises, we currently do not bundle any of them with KoboldAI due to copyright concerns (we do not want to put the entire project at risk). Instead, look at community resources like #softprompts on the [KoboldAI Discord](https://discord.gg/XuQWadgU9k) or the [community hosted mirror](https://storage.henk.tech/KoboldAI/softprompts/).
@ -188,10 +203,10 @@ Training softprompts can be done for free with the [mtj-softtuner colab](https:/
## Userscripts
Userscripts are scripts that can automate tasks in KoboldAI, or modify the AI behavior / input / output.
Scripting is done in Lua 5.4 (Lua does not need to be separately installed as long as you have all the Python requirements) and has sandboxing to help protect you from malicious behavior. Even with these measures in place, we strongly advise you only run userscripts from places you trust and/or understand; otherwise consult the community for advice on how safe the script might be.
Inside the userscripts folder you will find our kaipreset scripts; these are default scripts that we think will be useful for our users. These scripts are automatically overwritten when you update KoboldAI, so if you wish to modify them make sure to first rename them to something else that does not contain kaipreset so your changes are not lost. These scripts range from a You Bias filter that prevents the AI from addressing characters as you, to ways to prevent the AI from using certain words, word replacements, and more.
Along with our preset scripts we also ship examples in the examples folder that merely serve as a demonstration and do not enhance your usage of KoboldAI. To use these scripts make sure to move them out of the examples folder before either using or modifying the script.
@ -203,16 +218,17 @@ For our TPU versions keep in mind that scripts modifying AI behavior rely on a
This project contains work from the following contributors:
* The Gantian - Creator of KoboldAI, has created most features such as the interface, the different AI model / API integrations and in general the largest part of the project.
* VE FORBRYDERNE - Contributed many features such as the Editing overhaul, Adventure Mode, expansions to the world info section, breakmodel integration, scripting support, softprompts and much more, as well as vastly improving the TPU compatibility and integrating external code into KoboldAI so we could use official versions of Transformers with virtually no downsides.
* Henk717 - Contributed the installation scripts, this readme, the random story generator, the docker scripts, the foundation for the commandline interface and other smaller changes, as well as integrating multiple parts of the code of different forks to unite it all. He also optimized the model loading so that downloaded models get converted to efficient offline models and so that future models are more likely to work out of the box. Not all code Github attributes to Henk717 is by Henk717, as some of it has been integrations of other people's work. We try to clarify this in the contributors list as much as we can.
* Ebolam - Automatic Saving
* Frogging101 - top\_k / tfs support (Part of this support was later redone by VE to integrate what was originally inside of finetuneanon's transformers)
* UWUplus (Ralf) - Contributed storage systems for community colabs, as well as cleaning up and integrating the website dependencies/code better. He is also the maintainer of flask-cloudflared which we use to generate the cloudflare links.
* Javalar - Initial Performance increases on the story\_refresh
* LexSong - Initial environment file adaptation for conda that served as a basis for the install\_requirements.bat overhaul.
* Arrmansa - Breakmodel support for other projects that served as a basis for VE FORBRYDERNE's integration.
* Jojorne - Small improvements to the response selection for gens per action.
* OccultSage (GooseAI) - Improved support for GooseAI/OpenAI
As well as various Model creators who will be listed near their models, and all the testers who helped make this possible!
@ -222,4 +238,4 @@ Did we miss your contribution? Feel free to issue a commit adding your name to t
KoboldAI is licensed under the AGPL license; in short this means that it can be used by anyone for any purpose. However, if you decide to make a publicly available instance, your users are entitled to a copy of the source code including all modifications that you have made (which needs to be available through an interface such as a button on your website). You may also not distribute this project in a form that does not contain the source code (such as compiling / encrypting the code and distributing this version without also distributing the source code that includes the changes that you made; you are allowed to distribute this in a closed form if you also provide a separate archive with the source code).
umamba.exe is bundled for convenience because we observed that many of our users had trouble with command line download methods; it is not part of our project and does not fall under the AGPL license. It is licensed under the BSD-3-Clause license. Other files with differing licenses will have a reference or embedded version of this license within the file.


@ -1,11 +1,13 @@
transformers
transformers>=4.19
Flask
Flask-SocketIO
requests
torch
torch==1.11
flask-cloudflared
flask-ngrok
eventlet
lupa==1.10
markdown
bleach==4.1.0
sentencepiece
protobuf


@ -1,11 +1,11 @@
torch >= 1.9, <= 1.11
numpy
tqdm
requests
optax >= 0.0.5, <= 0.0.9
dm-haiku == 0.0.5
ray[default]
jax == 0.2.21
transformers
transformers >= 4.19
progressbar2
git+https://github.com/VE-FORBRYDERNE/mesh-transformer-jax@ck
flask


@ -25,6 +25,7 @@ var button_mode_label;
var button_send;
var button_actmem;
var button_actback;
var button_actfwd;
var button_actretry;
var button_actwi;
var game_text;
@ -38,6 +39,7 @@ var anote_menu;
var anote_input;
var anote_labelcur;
var anote_slider;
var debug_area;
var popup;
var popup_title;
var popup_content;
@ -49,6 +51,7 @@ var aidg_accept;
var aidg_close;
var saveaspopup;
var saveasinput;
var savepins;
var topic;
var saveas_accept;
var saveas_close;
@ -115,10 +118,33 @@ var adventure = false;
// Chatmode
var chatmode = false;
var sliders_throttle = getThrottle(200);
//=================================================================//
// METHODS
//=================================================================//
/**
* Returns a function that will automatically wait for X ms before executing the callback
* The timer is reset each time the returned function is called
* Useful for methods where something is overridden too fast
* @param ms milliseconds to wait before executing the callback
* @return {(function(*, *): void)|*} function that takes a timer id and a callback to execute after the wait
*/
function getThrottle(ms) {
var timer = {};
return function (id, callback) {
if (timer[id]) {
clearTimeout(timer[id]);
}
timer[id] = setTimeout(function () {
callback();
delete timer[id];
}, ms);
}
}
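// Example: sliders_throttle("settemp", function () { ... }) will run the
// callback once, 200 ms after the most recent call that used the "settemp" id.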
function addSetting(ob) {
// Add setting block to Settings Menu
if(ob.uitype == "slider"){
@ -127,9 +153,7 @@ function addSetting(ob) {
<div class=\"justifyleft\">\
"+ob.label+" <span class=\"helpicon\">?<span class=\"helptext\">"+ob.tooltip+"</span></span>\
</div>\
<div class=\"justifyright\" id=\""+ob.id+"cur\">\
"+ob.default+"\
</div>\
<input inputmode=\""+(ob.unit === "float" ? "decimal" : "numeric")+"\" class=\"justifyright flex-push-right\" id=\""+ob.id+"cur\" value=\""+ob.default+"\">\
</div>\
<div>\
<input type=\"range\" class=\"form-range airange\" min=\""+ob.min+"\" max=\""+ob.max+"\" step=\""+ob.step+"\" id=\""+ob.id+"\">\
@ -149,8 +173,37 @@ function addSetting(ob) {
window["setting_"+ob.id] = refin; // Is this still needed?
window["label_"+ob.id] = reflb; // Is this still needed?
// Add event function to input
refin.on("input", function () {
socket.send({'cmd': $(this).attr('id'), 'data': $(this).val()});
var updateLabelColor = function () {
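// Mark the numeric label red (see .setting-value-warning in custom.css)
// whenever the typed value falls outside the slider's min/max bounds.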
var value = (ob.unit === "float" ? parseFloat : parseInt)(reflb.val());
if(value > ob.max || value < ob.min) {
reflb.addClass("setting-value-warning");
} else {
reflb.removeClass("setting-value-warning");
}
}
var send = function () {
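// Throttle the socket updates so dragging a slider only sends the final
// value to the server instead of one message per step of movement.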
sliders_throttle(ob.id, function () {
socket.send({'cmd': $(refin).attr('id'), 'data': $(reflb).val()});
});
}
refin.on("input", function (event) {
reflb.val(refin.val());
updateLabelColor();
send();
}).on("change", updateLabelColor);
reflb.on("change", function (event) {
var value = (ob.unit === "float" ? parseFloat : parseInt)(event.target.value);
if(Number.isNaN(value) || (ob.min >= 0 && value < 0)) {
event.target.value = refin.val();
return;
}
if (ob.unit === "float") {
value = parseFloat(value.toFixed(3)); // Round to 3 decimal places to help avoid the number being too long to fit in the box
}
refin.val(value);
reflb.val(value);
updateLabelColor();
send();
});
} else if(ob.uitype == "toggle"){
settings_menu.append("<div class=\"settingitem\">\
@ -748,7 +801,7 @@ function enterMemoryMode() {
setchatnamevisibility(false);
showMessage("Edit the memory to be sent with each request to the AI.");
button_actmem.html("Cancel");
hide([button_actback, button_actretry, button_actwi]);
hide([button_actback, button_actfwd, button_actretry, button_actwi]);
// Display Author's Note field
anote_menu.slideDown("fast");
}
@ -759,7 +812,7 @@ function exitMemoryMode() {
setchatnamevisibility(chatmode);
hideMessage();
button_actmem.html("Memory");
show([button_actback, button_actretry, button_actwi]);
show([button_actback, button_actfwd, button_actretry, button_actwi]);
input_text.val("");
// Hide Author's Note field
anote_menu.slideUp("fast");
@ -768,7 +821,7 @@ function exitMemoryMode() {
function enterWiMode() {
showMessage("World Info will be added to memory only when the key appears in submitted text or the last action.");
button_actwi.html("Accept");
hide([button_actback, button_actmem, button_actretry, game_text]);
hide([button_actback, button_actfwd, button_actmem, button_actretry, game_text]);
setchatnamevisibility(false);
show([wi_menu]);
disableSendBtn();
@ -780,7 +833,7 @@ function exitWiMode() {
button_actwi.html("W Info");
hide([wi_menu]);
setchatnamevisibility(chatmode);
show([button_actback, button_actmem, button_actretry, game_text]);
show([button_actback, button_actfwd, button_actmem, button_actretry, game_text]);
enableSendBtn();
$("#gamescreen").removeClass("wigamescreen");
}
@ -884,7 +937,7 @@ function hideSaveAsPopup() {
}
function sendSaveAsRequest() {
socket.send({'cmd': 'saveasrequest', 'data': saveasinput.val()});
socket.send({'cmd': 'saveasrequest', 'data': {"name": saveasinput.val(), "pins": savepins.val()}});
}
function showLoadPopup() {
@ -1142,9 +1195,9 @@ function updateSPStatItems(items) {
function setStartState() {
enableSendBtn();
enableButtons([button_actmem, button_actwi]);
disableButtons([button_actback, button_actretry]);
disableButtons([button_actback, button_actfwd, button_actretry]);
hide([wi_menu]);
show([game_text, button_actmem, button_actwi, button_actback, button_actretry]);
show([game_text, button_actmem, button_actwi, button_actback, button_actfwd, button_actretry]);
hideMessage();
hideWaitAnimation();
button_actmem.html("Memory");
@ -1160,10 +1213,41 @@ function parsegenseqs(seqs) {
seqselcontents.html("");
var i;
for(i=0; i<seqs.length; i++) {
seqselcontents.append("<div class=\"seqselitem\" id=\"seqsel"+i+"\" n=\""+i+"\">"+seqs[i].generated_text+"</div>");
//setup selection data
text_data = "<table><tr><td width=100%><div class=\"seqselitem\" id=\"seqsel"+i+"\" n=\""+i+"\">"+seqs[i][0]+"</div></td><td width=10>"
//Now do the icon (pin/redo)
if (seqs[i][1] == "redo") {
text_data = text_data + "<span style=\"color: white\" class=\"oi oi-loop-circular\" title=\"Redo\" aria-hidden=\"true\" id=\"seqselpin"+i+"\" n=\""+i+"\"></span>"
} else if (seqs[i][1] == "pinned") {
text_data = text_data + "<span style=\"color: white\" class=\"oi oi-pin\" title=\"Pin\" aria-hidden=\"true\" id=\"seqselpin"+i+"\" n=\""+i+"\"></span>"
} else {
text_data = text_data + "<span style=\"color: grey\" class=\"oi oi-pin\" title=\"Pin\" aria-hidden=\"true\" id=\"seqselpin"+i+"\" n=\""+i+"\"></span>"
}
text_data = text_data + "</td></tr></table>"
seqselcontents.append(text_data);
//setup on-click actions
$("#seqsel"+i).on("click", function () {
socket.send({'cmd': 'seqsel', 'data': $(this).attr("n")});
});
//onclick for pin only
if (seqs[i][1] != "redo") {
$("#seqselpin"+i).on("click", function () {
socket.send({'cmd': 'seqpin', 'data': $(this).attr("n")});
if ($(this).attr("style") == "color: grey") {
console.log($(this).attr("style"));
$(this).css({"color": "white"});
console.log($(this).attr("style"));
} else {
console.log($(this).attr("style"));
$(this).css({"color": "grey"});
console.log($(this).attr("style"));
}
});
}
}
$('#seqselmenu').slideDown("slow");
}
@ -1756,6 +1840,7 @@ $(document).ready(function(){
button_send = $('#btnsend');
button_actmem = $('#btn_actmem');
button_actback = $('#btn_actundo');
button_actfwd = $('#btn_actredo');
button_actretry = $('#btn_actretry');
button_actwi = $('#btn_actwi');
game_text = $('#gametext');
@ -1765,6 +1850,7 @@ $(document).ready(function(){
settings_menu = $("#settingsmenu");
format_menu = $('#formatmenu');
anote_menu = $('#anoterowcontainer');
debug_area = $('#debugcontainer');
wi_menu = $('#wimenu');
anote_input = $('#anoteinput');
anote_labelcur = $('#anotecur');
@ -1780,6 +1866,7 @@ $(document).ready(function(){
aidg_close = $("#btn_aidgpopupclose");
saveaspopup = $("#saveascontainer");
saveasinput = $("#savename");
savepins = $("#savepins");
topic = $("#topic");
saveas_accept = $("#btn_saveasaccept");
saveas_close = $("#btn_saveasclose");
@ -1928,13 +2015,13 @@ $(document).ready(function(){
// Enable or Disable buttons
if(msg.data == "ready") {
enableSendBtn();
enableButtons([button_actmem, button_actwi, button_actback, button_actretry]);
enableButtons([button_actmem, button_actwi, button_actback, button_actfwd, button_actretry]);
hideWaitAnimation();
gamestate = "ready";
} else if(msg.data == "wait") {
gamestate = "wait";
disableSendBtn();
disableButtons([button_actmem, button_actwi, button_actback, button_actretry]);
disableButtons([button_actmem, button_actwi, button_actback, button_actfwd, button_actretry]);
showWaitAnimation();
} else if(msg.data == "start") {
setStartState();
@ -1988,74 +2075,81 @@ $(document).ready(function(){
newTextHighlight($("#n"+msg.data))
} else if(msg.cmd == "updatetemp") {
// Send current temp value to input
$("#settemp").val(parseFloat(msg.data));
$("#settempcur").html(msg.data);
$("#settempcur").val(msg.data);
$("#settemp").val(parseFloat(msg.data)).trigger("change");
} else if(msg.cmd == "updatetopp") {
// Send current top p value to input
$("#settopp").val(parseFloat(msg.data));
$("#settoppcur").html(msg.data);
$("#settoppcur").val(msg.data);
$("#settopp").val(parseFloat(msg.data)).trigger("change");
} else if(msg.cmd == "updatetopk") {
// Send current top k value to input
$("#settopk").val(parseFloat(msg.data));
$("#settopkcur").html(msg.data);
$("#settopkcur").val(msg.data);
$("#settopk").val(parseFloat(msg.data)).trigger("change");
} else if(msg.cmd == "updatetfs") {
// Send current tfs value to input
$("#settfs").val(parseFloat(msg.data));
$("#settfscur").html(msg.data);
$("#settfscur").val(msg.data);
$("#settfs").val(parseFloat(msg.data)).trigger("change");
} else if(msg.cmd == "updatetypical") {
// Send current typical value to input
$("#settypicalcur").val(msg.data);
$("#settypical").val(parseFloat(msg.data)).trigger("change");
} else if(msg.cmd == "updatereppen") {
// Send current rep pen value to input
$("#setreppen").val(parseFloat(msg.data));
$("#setreppencur").html(msg.data);
$("#setreppencur").val(msg.data);
$("#setreppen").val(parseFloat(msg.data)).trigger("change");
} else if(msg.cmd == "updatereppenslope") {
// Send current rep pen value to input
$("#setreppenslope").val(parseFloat(msg.data));
$("#setreppenslopecur").html(msg.data);
$("#setreppenslopecur").val(msg.data);
$("#setreppenslope").val(parseFloat(msg.data)).trigger("change");
} else if(msg.cmd == "updatereppenrange") {
// Send current rep pen value to input
$("#setreppenrange").val(parseFloat(msg.data));
$("#setreppenrangecur").html(msg.data);
$("#setreppenrangecur").val(msg.data);
$("#setreppenrange").val(parseFloat(msg.data)).trigger("change");
} else if(msg.cmd == "updateoutlen") {
// Send current output amt value to input
$("#setoutput").val(parseInt(msg.data));
$("#setoutputcur").html(msg.data);
$("#setoutputcur").val(msg.data);
$("#setoutput").val(parseInt(msg.data)).trigger("change");
} else if(msg.cmd == "updatetknmax") {
// Send current max tokens value to input
$("#settknmax").val(parseInt(msg.data));
$("#settknmaxcur").html(msg.data);
$("#settknmaxcur").val(msg.data);
$("#settknmax").val(parseInt(msg.data)).trigger("change");
} else if(msg.cmd == "updateikgen") {
// Send current max tokens value to input
$("#setikgen").val(parseInt(msg.data));
$("#setikgencur").html(msg.data);
$("#setikgencur").val(msg.data);
$("#setikgen").val(parseInt(msg.data)).trigger("change");
} else if(msg.cmd == "setlabeltemp") {
// Update setting label with value from server
$("#settempcur").html(msg.data);
$("#settempcur").val(msg.data);
} else if(msg.cmd == "setlabeltopp") {
// Update setting label with value from server
$("#settoppcur").html(msg.data);
$("#settoppcur").val(msg.data);
} else if(msg.cmd == "setlabeltopk") {
// Update setting label with value from server
$("#settopkcur").html(msg.data);
$("#settopkcur").val(msg.data);
} else if(msg.cmd == "setlabeltfs") {
// Update setting label with value from server
$("#settfscur").html(msg.data);
$("#settfscur").val(msg.data);
} else if(msg.cmd == "setlabeltypical") {
// Update setting label with value from server
$("#settypicalcur").val(msg.data);
} else if(msg.cmd == "setlabelreppen") {
// Update setting label with value from server
$("#setreppencur").html(msg.data);
$("#setreppencur").val(msg.data);
} else if(msg.cmd == "setlabelreppenslope") {
// Update setting label with value from server
$("#setreppenslopecur").html(msg.data);
$("#setreppenslopecur").val(msg.data);
} else if(msg.cmd == "setlabelreppenrange") {
// Update setting label with value from server
$("#setreppenrangecur").html(msg.data);
$("#setreppenrangecur").val(msg.data);
} else if(msg.cmd == "setlabeloutput") {
// Update setting label with value from server
$("#setoutputcur").html(msg.data);
$("#setoutputcur").val(msg.data);
} else if(msg.cmd == "setlabeltknmax") {
// Update setting label with value from server
$("#settknmaxcur").html(msg.data);
$("#settknmaxcur").val(msg.data);
} else if(msg.cmd == "setlabelikgen") {
// Update setting label with value from server
$("#setikgencur").html(msg.data);
$("#setikgencur").val(msg.data);
} else if(msg.cmd == "updateanotedepth") {
// Send current Author's Note depth value to input
anote_slider.val(parseInt(msg.data));
@ -2226,15 +2320,15 @@ $(document).ready(function(){
$("#setnumseqcur").html(msg.data);
} else if(msg.cmd == "updatenumseq") {
// Send current max tokens value to input
$("#setnumseq").val(parseInt(msg.data));
$("#setnumseqcur").html(msg.data);
$("#setnumseq").val(parseInt(msg.data)).trigger("change");
} else if(msg.cmd == "setlabelwidepth") {
// Update setting label with value from server
$("#setwidepthcur").html(msg.data);
} else if(msg.cmd == "updatewidepth") {
// Send current max tokens value to input
$("#setwidepth").val(parseInt(msg.data));
$("#setwidepthcur").html(msg.data);
$("#setwidepth").val(parseInt(msg.data)).trigger("change");
} else if(msg.cmd == "updateuseprompt") {
// Update toggle state
$("#setuseprompt").prop('checked', msg.data).change();
@ -2269,6 +2363,14 @@ $(document).ready(function(){
} else if(msg.cmd == "runs_remotely") {
remote = true;
hide([button_savetofile, button_import, button_importwi]);
} else if(msg.cmd == "debug_info") {
$("#debuginfo").val(msg.data);
} else if(msg.cmd == "set_debug") {
if(msg.data) {
debug_area.removeClass("hidden");
} else {
debug_area.addClass("hidden");
}
}
});
@ -2349,6 +2451,12 @@ $(document).ready(function(){
hidegenseqs();
});
button_actfwd.on("click", function(ev) {
hideMessage();
//hidegenseqs();
socket.send({'cmd': 'redo', 'data': ''});
});
button_actmem.on("click", function(ev) {
socket.send({'cmd': 'memory', 'data': ''});
});


@ -22,6 +22,25 @@ chunk.editing, chunk.editing * {
font-style: normal !important;
}
.setting-value-warning {
color: #ff7777;
}
.setting-value-warning:focus {
color: #ffaaaa !important;
}
.settinglabel input {
width: 5ch;
background-color: inherit;
border: none;
outline: none;
}
.settinglabel input:focus {
color: #cdf;
}
#gametext, chunk, chunk * {
outline: 0px solid transparent;
}
@ -1273,8 +1292,8 @@ body.connected .popupfooter, .popupfooter.always-available {
.settinglabel {
color: #ffffff;
display: grid;
grid-template-columns: 80% 20%;
display: flex;
flex-flow: wrap;
}
.settingminmax {


@ -19,10 +19,16 @@ class KoboldStoryRegister(collections.OrderedDict):
return self.popitem()[1]
def get_first_key(self) -> int:
return next(iter(self))
if len(self) == 0:
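# -1 is used as a sentinel key for an empty register, so callers do not
# have to catch the StopIteration that next() would otherwise raise.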
return -1
else:
return next(iter(self))
def get_last_key(self) -> int:
return next(reversed(self))
if len(self) == 0:
return -1
else:
return next(reversed(self))
def __getitem__(self, k: int) -> str:
return super().__getitem__(k)


@ -9,7 +9,7 @@
<link rel="stylesheet" href="static/bootstrap.min.css">
<link rel="stylesheet" href="static/bootstrap-toggle.min.css">
<link rel="stylesheet" href="static/open-iconic-bootstrap.min.css">
<link rel="stylesheet" href="static/custom.css?ver=1.17">
<link rel="stylesheet" href="static/custom.css?ver=1.18b">
<script src="static/jquery-3.6.0.min.js"></script>
<script src="static/jquery-ui.sortable.min.js"></script>
@ -17,7 +17,7 @@
<script src="static/bootstrap.min.js"></script>
<script src="static/bootstrap-toggle.min.js"></script>
<script src="static/rangy-core.min.js"></script>
<script src="static/application.js?ver=1.17a"></script>
<script src="static/application.js?ver=1.18b"></script>
</head>
<body>
<input type="file" id="remote-save-select" accept="application/json" style="display:none">
@ -123,6 +123,7 @@
<button type="button" class="btn btn-primary" id="btn_actmem">Memory</button>
<button type="button" class="btn btn-primary" id="btn_actwi">W Info</button>
<button type="button" class="btn btn-primary" id="btn_actundo">Back</button>
<button type="button" class="btn btn-primary" id="btn_actredo">Redo</button>
<button type="button" class="btn btn-primary" id="btn_actretry">Retry</button>
</div>
<input type="text" id="chatname" class="form-control hidden" placeholder="Chat name">
@ -185,6 +186,9 @@
</div>
</div>
</div>
<div class="hidden" id="debugcontainer">
<textarea class="form-control" placeholder="Debug Info" id="debuginfo"></textarea>
</div>
</div>
</div>
<div class="hidden" id="popupcontainer">
@ -228,7 +232,9 @@
<div class="popuptitletext">Enter Name For Save</div>
</div>
<div class="aidgpopupcontent">
<input class="form-control" type="text" placeholder="Save Name" id="savename">
<input class="form-control" type="text" placeholder="Save Name" id="savename"><br>
<input type="checkbox" data-toggle="toggle" data-onstyle="success" id="savepins" checked>
<div class="box-label">Save Pin Information</div>
</div>
<div class="popuperror hidden">
<span></span>

torch_lazy_loader.py Normal file

@ -0,0 +1,268 @@
'''
This file is AGPL-licensed.
Some of the code in this file is copied from PyTorch.
The license for PyTorch is shown below:
Copyright (c) 2016- Facebook, Inc (Adam Paszke)
Copyright (c) 2014- Facebook, Inc (Soumith Chintala)
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the names of Facebook, Deepmind Technologies, NYU, NEC Laboratories America
and IDIAP Research Institute nor the names of its contributors may be
used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
'''
import contextlib
from functools import reduce
import itertools
import zipfile
import pickle
import torch
from torch.nn import Module
from typing import Any, Callable, Dict, Optional, Tuple, Type, Union
_EXTRA_STATE_KEY_SUFFIX = '_extra_state'
STORAGE_TYPE_MAP = {
torch.float64: torch.DoubleStorage,
torch.float32: torch.FloatStorage,
torch.float16: torch.HalfStorage,
torch.int64: torch.LongStorage,
torch.int32: torch.IntStorage,
torch.int16: torch.ShortStorage,
torch.int8: torch.CharStorage,
torch.uint8: torch.ByteStorage,
torch.bool: torch.BoolStorage,
torch.bfloat16: torch.BFloat16Storage,
}
class LazyTensor:
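# A placeholder for a tensor in the checkpoint: it records the storage type,
# key, location and (later) shape/stride/dtype so the actual data can be
# read on demand by materialize().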
def __init__(self, storage_type: Type[torch._StorageBase], key: str, location: str, dtype: Optional[torch.dtype] = None, seek_offset: Optional[int] = None, shape: Optional[Tuple[int, ...]] = None, stride: Optional[Tuple[int, ...]] = None, requires_grad=False, backward_hooks: Any = None):
self.storage_type = storage_type
self.key = key
self.location = location
self.dtype = dtype
self.seek_offset = seek_offset
self.shape = shape
self.stride = stride
self.requires_grad = requires_grad
self.backward_hooks = backward_hooks
def __view(self, f: Callable):
return f"{type(self).__name__}(storage_type={f(self.storage_type)}, key={f(self.key)}, location={f(self.location)}, dtype={f(self.dtype)}, seek_offset={f(self.seek_offset)}, shape={f(self.shape)}, stride={f(self.stride)}, requires_grad={f(self.requires_grad)}, backward_hooks={f(self.backward_hooks)})"
def __repr__(self):
return self.__view(repr)
def materialize(self, checkpoint: Union[zipfile.ZipFile, zipfile.ZipExtFile], map_location=None, no_grad=True) -> torch.Tensor:
size = reduce(lambda x, y: x * y, self.shape, 1)
dtype = self.dtype
nbytes = size if dtype is torch.bool else size * ((torch.finfo if dtype.is_floating_point else torch.iinfo)(dtype).bits >> 3)
if isinstance(checkpoint, zipfile.ZipFile):
f = checkpoint.open(f"archive/data/{self.key}", "r")
f.read(self.seek_offset)
else:
f = checkpoint
try:
storage = STORAGE_TYPE_MAP[dtype].from_buffer(f.read(nbytes), "little")
finally:
if isinstance(checkpoint, zipfile.ZipFile):
f.close()
storage = torch.serialization._get_restore_location(map_location)(storage, self.location)
tensor = torch.tensor([], dtype=storage.dtype, device=storage.device)
tensor.set_(storage, 0, self.shape, self.stride)
tensor.requires_grad = not no_grad and self.requires_grad
tensor._backward_hooks = self.backward_hooks
return tensor
class _LazyUnpickler(pickle.Unpickler):
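# An unpickler whose persistent_load returns LazyTensor placeholders instead
# of reading the underlying storages out of the checkpoint.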
lazy_loaded_storages: Dict[str, LazyTensor]
def __init__(self, *args, **kwargs):
self.lazy_loaded_storages = {}
return super().__init__(*args, **kwargs)
def forced_persistent_load(self, saved_id):
assert isinstance(saved_id, tuple)
typename = saved_id[0]
assert typename == "storage", f"Unknown typename for persistent_load, expected 'storage' but got '{typename}'"
storage_type, key, location, _ = saved_id[1:]
return LazyTensor(storage_type, key, location)
def load(self, *args, **kwargs):
self.persistent_load = self.forced_persistent_load
retval = super().load(*args, **kwargs)
self.lazy_loaded_storages = {}
return retval
def _rebuild_tensor(lazy_storage: LazyTensor, storage_offset, shape, stride):
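# Substituted for torch._utils._rebuild_tensor: instead of building a real
# tensor, it records the shape, stride, dtype and byte offset on the
# placeholder so materialize() can seek straight to the data.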
lazy_storage.shape = shape
lazy_storage.stride = stride
dtype = lazy_storage.storage_type.dtype
if not isinstance(dtype, torch.dtype):
dtype = lazy_storage.storage_type(0).dtype
lazy_storage.dtype = dtype
lazy_storage.seek_offset = storage_offset if dtype is torch.bool else storage_offset * ((torch.finfo if dtype.is_floating_point else torch.iinfo)(dtype).bits >> 3)
return lazy_storage
# Modified version of https://github.com/pytorch/pytorch/blob/v1.11.0-rc4/torch/nn/modules/module.py#L1346-L1438
def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs):
for hook in self._load_state_dict_pre_hooks.values():
hook(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)
persistent_buffers = {k: v for k, v in self._buffers.items() if k not in self._non_persistent_buffers_set}
local_name_params = itertools.chain(self._parameters.items(), persistent_buffers.items())
local_state = {k: v for k, v in local_name_params if v is not None}
for name, param in local_state.items():
key = prefix + name
if key in state_dict:
input_param = state_dict[key]
if not torch.overrides.is_tensor_like(input_param):
error_msgs.append('While copying the parameter named "{}", '
'expected torch.Tensor or Tensor-like object from checkpoint but '
'received {}'
.format(key, type(input_param)))
continue
# This is used to avoid copying uninitialized parameters into
# non-lazy modules, since they dont have the hook to do the checks
# in such case, it will error when accessing the .shape attribute.
is_param_lazy = torch.nn.parameter.is_lazy(param)
# Backward compatibility: loading 1-dim tensor from 0.3.* to version 0.4+
if not is_param_lazy and len(param.shape) == 0 and len(input_param.shape) == 1:
input_param = input_param[0]
if not is_param_lazy and input_param.shape != param.shape:
# local shape should match the one in checkpoint
error_msgs.append('size mismatch for {}: copying a param with shape {} from checkpoint, '
'the shape in current model is {}.'
.format(key, input_param.shape, param.shape))
continue
try:
with torch.no_grad():
#param.copy_(input_param)
new_param = torch.nn.Parameter(input_param, requires_grad=param.requires_grad) # This line is new
if name in self._parameters: # This line is new
self._parameters[name] = new_param # This line is new
if name in persistent_buffers: # This line is new
self._buffers[name] = new_param # This line is new
except Exception as ex:
error_msgs.append('While copying the parameter named "{}", '
'whose dimensions in the model are {} and '
'whose dimensions in the checkpoint are {}, '
'an exception occurred : {}.'
.format(key, param.size(), input_param.size(), ex.args))
elif strict:
missing_keys.append(key)
extra_state_key = prefix + _EXTRA_STATE_KEY_SUFFIX
if hasattr(Module, "set_extra_state") and getattr(self.__class__, "set_extra_state", Module.set_extra_state) is not Module.set_extra_state: # if getattr(self.__class__, "set_extra_state", Module.set_extra_state) is not Module.set_extra_state:
if extra_state_key in state_dict:
self.set_extra_state(state_dict[extra_state_key])
elif strict:
missing_keys.append(extra_state_key)
elif strict and (extra_state_key in state_dict):
unexpected_keys.append(extra_state_key)
if strict:
for key in state_dict.keys():
if key.startswith(prefix) and key != extra_state_key:
input_name = key[len(prefix):]
input_name = input_name.split('.', 1)[0] # get the name of param/buffer/child
if input_name not in self._modules and input_name not in local_state:
unexpected_keys.append(key)
@contextlib.contextmanager
def use_lazy_torch_load(enable=True, callback: Optional[Callable] = None, dematerialized_modules=False):
if not enable:
yield False
return
try:
old_unpickler = pickle.Unpickler
pickle.Unpickler = _LazyUnpickler
old_rebuild_tensor = torch._utils._rebuild_tensor
torch._utils._rebuild_tensor = _rebuild_tensor
old_torch_load = torch.load
def torch_load(f, map_location=None, pickle_module=pickle, **pickle_load_args):
retval = old_torch_load(f=f, map_location=map_location, pickle_module=pickle_module, **pickle_load_args)
if callback is not None:
callback(retval, f=f, map_location=map_location, pickle_module=pickle_module, **pickle_load_args)
return retval
torch.load = torch_load
if dematerialized_modules:
old_linear_init = torch.nn.Linear.__init__
old_embedding_init = torch.nn.Embedding.__init__
old_layernorm_init = torch.nn.LayerNorm.__init__
def linear_init(self, *args, device=None, **kwargs):
return old_linear_init(self, *args, device="meta", **kwargs)
def embedding_init(self, *args, device=None, **kwargs):
return old_embedding_init(self, *args, device="meta", **kwargs)
def layernorm_init(self, *args, device=None, **kwargs):
return old_layernorm_init(self, *args, device="meta", **kwargs)
torch.nn.Linear.__init__ = linear_init
torch.nn.Embedding.__init__ = embedding_init
torch.nn.LayerNorm.__init__ = layernorm_init
old_load_from_state_dict = torch.nn.Module._load_from_state_dict
torch.nn.Module._load_from_state_dict = _load_from_state_dict
yield True
finally:
pickle.Unpickler = old_unpickler
torch._utils._rebuild_tensor = old_rebuild_tensor
torch.load = old_torch_load
if dematerialized_modules:
torch.nn.Linear.__init__ = old_linear_init
torch.nn.Embedding.__init__ = old_embedding_init
torch.nn.LayerNorm.__init__ = old_layernorm_init
torch.nn.Module._load_from_state_dict = old_load_from_state_dict
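For illustration, a minimal usage sketch of this module (the checkpoint path is hypothetical, and callers can also hook the loading process through the callback argument instead of materializing tensors by hand):

```python
import zipfile
import torch
import torch_lazy_loader

# Inside the context manager, torch.load returns LazyTensor placeholders
# instead of reading the tensor data into memory.
with torch_lazy_loader.use_lazy_torch_load():
    state_dict = torch.load("pytorch_model.bin")  # hypothetical checkpoint path

# Each placeholder can then be materialized on demand straight from the
# checkpoint's zip archive.
with zipfile.ZipFile("pytorch_model.bin") as checkpoint:
    first_key = next(iter(state_dict))
    tensor = state_dict[first_key].materialize(checkpoint)
```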


@ -27,23 +27,31 @@ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
'''
import utils
import multiprocessing
from typing import Any, Callable, Dict, List, Optional, Tuple, TypeVar
import progressbar
import time
import os
import sys
import json
import zipfile
import requests
import random
import jax
import jax.dlpack
from jax.config import config
from jax.experimental import maps
import jax.numpy as jnp
import numpy as np
import optax
import haiku as hk
import transformers
from transformers import AutoTokenizer, GPT2TokenizerFast, AutoModelForCausalLM, GPTNeoForCausalLM
from tokenizers import Tokenizer
from mesh_transformer.checkpoint import read_ckpt_lowmem
from mesh_transformer.transformer_shard import CausalTransformer, CausalTransformerShard
from mesh_transformer.transformer_shard import CausalTransformer, CausalTransformerShard, PlaceholderTensor
from mesh_transformer.util import to_bf16
params: Dict[str, Any] = {}
@ -61,6 +69,7 @@ def settings_callback() -> dict:
"temp": 0.5,
"top_k": 0,
"tfs": 1.0,
"typical": 1.0,
"repetition_penalty": 1.0,
"rpslope": 0.0,
"rprange": 0,
@ -149,11 +158,11 @@ def apply_repetition_penalty_dynamic(logits, tokens, repetition_penalty, generat
logits[tokens] = penalty_logits
return logits
def kobold_sample_dynamic(key, logits, top_p=0.9, temp=0.5, top_k=0, tfs=1.0):
def kobold_sample_dynamic(key, logits, top_p=0.9, temp=0.5, top_k=0, tfs=1.0, typical=1.0):
'''
This gets called by generate_loop_fn to apply a series of 4 filters
to the logits (top-k, then top-p, then TFS, then temperature) before
picking one token using the modified logits
This gets called by generate_loop_fn to apply a series of 5 filters
to the logits (top-k, then top-p, then TFS, then typical, then temperature)
before picking one token using the modified logits
'''
# Top-k (keep only the k tokens with the highest logits and remove
# the rest, by setting their logits to negative infinity)
@ -240,6 +249,37 @@ def kobold_sample_dynamic(key, logits, top_p=0.9, temp=0.5, top_k=0, tfs=1.0):
return np.where(indices_to_remove, -np.inf, logits)
if tfs < 1.0:
logits = tail_free_filter(logits)
# Typical sampling (https://arxiv.org/pdf/2202.00666.pdf)
def typical_filter(logits):
# Compute softmax probabilities and the natural logarithms of them
probs = jax.nn.softmax(logits)
with np.errstate(divide="ignore"):
log_probs = np.log(probs)
# Compute the negative of entropy, which is the sum of p*ln(p) for all p
# in the set of softmax probabilities of the logits
neg_entropy = np.nansum(probs * log_probs, axis=-1, keepdims=True)
# Determine absolute difference between the negative entropy and the
# log probabilities
entropy_deviation = np.abs(neg_entropy - log_probs)
# Keep certain tokens such that the sum of the entropy_deviation of the
# kept tokens is the smallest possible value such that the sum of the
# softmax probabilities of the kept tokens is at least the threshold
# value (by sorting the tokens in ascending order of entropy_deviation
# and then keeping the smallest possible number of tokens from the
# beginning such that sum of softmax probabilities is at or above the
# threshold)
_, sorted_logits = jax.lax.sort_key_val(entropy_deviation, probs)
sorted_indices_to_remove = np.cumsum(sorted_logits, axis=-1) >= typical
sorted_indices_to_remove = np.roll(sorted_indices_to_remove, 1, axis=-1)
sorted_indices_to_remove[0] = False
# Unsort and remove
_, indices_to_remove = jax.lax.sort_key_val(
jnp.argsort(entropy_deviation),
sorted_indices_to_remove,
)
return np.where(indices_to_remove, -jnp.inf, logits)
if typical < 1.0:
logits = typical_filter(logits)
# Temperature (just divide the logits by the temperature)
logits /= temp
# Finally, pick one token using the softmax thingy again (it gives
@ -292,11 +332,11 @@ def apply_repetition_penalty_static(logits, tokens, repetition_penalty, generate
# positions in the logits array
return logits.at[tokens].set(penalty_logits)
def kobold_sample_static(key, logits, top_p=0.9, temp=0.5, top_k=0, tfs=1.0):
def kobold_sample_static(key, logits, top_p=0.9, temp=0.5, top_k=0, tfs=1.0, typical=1.0):
'''
This gets called by generate_loop_fn to apply a series of 4 filters
to the logits (top-k, then top-p, then TFS, then temperature) before
picking one token using the modified logits
This gets called by generate_loop_fn to apply a series of 5 filters
to the logits (top-k, then top-p, then TFS, then typical, then temperature)
before picking one token using the modified logits
'''
# Top-k (keep only the k tokens with the highest logits and remove
# the rest, by setting their logits to negative infinity)
@ -380,6 +420,35 @@ def kobold_sample_static(key, logits, top_p=0.9, temp=0.5, top_k=0, tfs=1.0):
)
return jnp.where(indices_to_remove, -jnp.inf, logits)
logits = jax.lax.cond(tfs < 1.0, tail_free_filter, lambda x: x, logits)
# Typical sampling (https://arxiv.org/pdf/2202.00666.pdf)
def typical_filter(logits):
# Compute softmax probabilities and the natural logarithms of them
probs = jax.nn.softmax(logits)
log_probs = jnp.log(probs)
# Compute the negative of entropy, which is the sum of p*ln(p) for all p
# in the set of softmax probabilities of the logits
neg_entropy = jnp.nansum(probs * log_probs, axis=-1, keepdims=True)
# Determine absolute difference between the negative entropy and the
# log probabilities
entropy_deviation = jnp.abs(neg_entropy - log_probs)
# Keep certain tokens such that the sum of the entropy_deviation of the
# kept tokens is the smallest possible value such that the sum of the
# softmax probabilities of the kept tokens is at least the threshold
# value (by sorting the tokens in ascending order of entropy_deviation
# and then keeping the smallest possible number of tokens from the
# beginning such that sum of softmax probabilities is at or above the
# threshold)
_, sorted_logits = jax.lax.sort_key_val(entropy_deviation, probs)
sorted_indices_to_remove = jnp.cumsum(sorted_logits, axis=-1) >= typical
sorted_indices_to_remove = jnp.roll(sorted_indices_to_remove, 1, axis=-1)
sorted_indices_to_remove = sorted_indices_to_remove.at[0].set(False)
# Unsort and remove
_, indices_to_remove = jax.lax.sort_key_val(
jnp.argsort(entropy_deviation),
sorted_indices_to_remove,
)
return jnp.where(indices_to_remove, -jnp.inf, logits)
logits = jax.lax.cond(typical < 1.0, typical_filter, lambda x: x, logits)
# Temperature (just divide the logits by the temperature)
def temp_filter(logits):
return logits / temp
@ -443,9 +512,9 @@ def sample_func(data, key, numseqs_aux, badwords, repetition_penalty, generated_
return carry
class PenalizingCausalTransformer(CausalTransformer):
def __init__(self, config):
def __init__(self, config, **kwargs):
# Initialize
super().__init__(config)
super().__init__(config, **kwargs)
def generate_static(state, key, ctx, ctx_length, gen_length, numseqs_aux, sampler_options, soft_embeddings=None):
compiling_callback()
numseqs = numseqs_aux.shape[0]
@ -736,6 +805,7 @@ def infer_static(
temp=0.5,
top_k=0,
tfs=1.0,
typical=1.0,
repetition_penalty=1.0,
rpslope=0.0,
rprange=0,
@ -758,6 +828,7 @@ def infer_static(
"temp": temp * np.ones(total_batch),
"top_p": top_p * np.ones(total_batch),
"tfs": tfs * np.ones(total_batch),
"typical": typical * np.ones(total_batch),
"repetition_penalty": repetition_penalty * np.ones(total_batch),
"rpslope": rpslope * np.ones(total_batch),
"rprange": np.full(total_batch, rprange, dtype=np.uint32),
@ -776,7 +847,142 @@ def infer_static(
return samples
def load_model(path: str, driver_version="tpu_driver0.1_dev20210607", **kwargs) -> None:
def reshard_reverse(x, total_shards, old_shape):
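# Expands an unsharded checkpoint tensor back into the per-shard layout
# old_shape (replicated tensors are tiled across shards, split tensors
# are reshaped).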
assert len(x.shape) != 1
if len(x.shape) == 2:
if old_shape[1] == x.shape[1]:
out = x[0:1].tile((total_shards, 1))
else:
out = x.reshape(old_shape)
elif len(x.shape) == 3:
if x.shape[0] * x.shape[2] == old_shape[2]:
out = x.reshape(old_shape)
elif x.shape[0] * x.shape[1] == old_shape[1]:
out = x.reshape((old_shape[1], old_shape[0], old_shape[2])).permute((1, 0, 2))
else:
assert False
else:
assert False
return out
def get_old_shape(t, total_shards, dim=2):
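# Computes the shape a single shard should have when tensor t is split
# across total_shards along dimension dim.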
if len(t.shape) == 2:
shard_shape = t.shape
if dim == 1:
assert shard_shape[0] % total_shards == 0
return (shard_shape[0] // total_shards, shard_shape[1])
elif dim == 2:
assert shard_shape[1] % total_shards == 0
return (shard_shape[0], shard_shape[1] // total_shards)
else:
raise ValueError(f"Unsupported dim {dim}")
if len(t.shape) == 1:
assert t.shape[0] % total_shards == 0
return (t.shape[0] // total_shards,)
else:
raise ValueError(f"Unsupported shape {t.shape}")
def read_neox_checkpoint(state, path, config, checkpoint_shards=2):
assert config["cores_per_replica"] % checkpoint_shards == 0
output_shards = config["cores_per_replica"] // checkpoint_shards
import torch
import torch.utils.dlpack
from tqdm.auto import tqdm
move_xmap = jax.experimental.maps.xmap(
fun=lambda x, _: to_bf16(x),
in_axes=(["shard", ...], ["batch", ...]),
out_axes=["shard", ...],
axis_resources={'shard': 'mp', 'batch': 'dp'}
)
path_template = os.path.join(path, "layer_{layer:02d}-model_{shard:02d}-model_states.pt")
static_mapping = {
"word_embeddings.weight": {"module": "embedding_shard/~/linear", "param": "w", "axis": 1},
"final_linear.weight": {"module": "projection_shard/~/linear", "param": "w", "axis": 2},
"norm.weight": {"module": "projection_shard/~/replicated_layer_norm", "param": "scale", "axis": None},
"norm.bias": {"module": "projection_shard/~/replicated_layer_norm", "param": "offset", "axis": None},
}
layer_mapping = {
"attention.query_key_value.weight": {"module": "combined_qkv", "param": "w", "axis": 2},
"attention.query_key_value.bias": {"module": "combined_qkv", "param": "b", "axis": 1},
"attention.dense.weight": {"module": "linear_3", "param": "w", "axis": 1},
"attention.dense.bias": {"module": "linear_3", "param": "b", "axis": None},
"mlp.dense_h_to_4h.weight": {"module": "linear_4", "param": "w", "axis": 2},
"mlp.dense_h_to_4h.bias": {"module": "linear_4", "param": "b", "axis": 1},
"mlp.dense_4h_to_h.weight": {"module": "linear_5", "param": "w", "axis": 1},
"mlp.dense_4h_to_h.bias": {"module": "linear_5", "param": "b", "axis": None},
"input_layernorm.weight": {"module": "replicated_layer_norm", "param": "scale", "axis": None},
"input_layernorm.bias": {"module": "replicated_layer_norm", "param": "offset", "axis": None},
"post_attention_layernorm.weight": {"module": "replicated_layer_norm_1", "param": "scale", "axis": None},
"post_attention_layernorm.bias": {"module": "replicated_layer_norm_1", "param": "offset", "axis": None},
}
tqdm_length = len(static_mapping) + config["layers"]*len(layer_mapping)
bar = tqdm(total=tqdm_length, desc="Loading from NeoX checkpoint")
for checkpoint_layer in range(config["layers"] + 5):
if checkpoint_layer in (1, config["layers"] + 2):
continue
layer = checkpoint_layer - 2
shards = []
for checkpoint_shard in range(checkpoint_shards):
shards.append(torch.load(path_template.format(layer=checkpoint_layer, shard=checkpoint_shard), map_location="cpu"))
for key in shards[0]:
if key == "attention.rotary_emb.inv_freq":
continue
elif key in static_mapping:
target_module = "causal_transformer_shard/~/" + static_mapping[key]["module"]
target_param = static_mapping[key]["param"]
target_axis = static_mapping[key]["axis"]
elif key in layer_mapping:
target_module = f"causal_transformer_shard/~/layer_{layer}/~/" + layer_mapping[key]["module"]
target_param = layer_mapping[key]["param"]
target_axis = layer_mapping[key]["axis"]
else:
error = f"{repr(key)} not found in mapping"
print("\n\nERROR: ", error, file=sys.stderr)
raise RuntimeError(error)
original_shape = shards[0][key].shape
for checkpoint_shard in range(checkpoint_shards):
if key in ("attention.dense.bias", "mlp.dense_4h_to_h.bias"):
shards[checkpoint_shard][key] /= output_shards
if key != "word_embeddings.weight" and shards[checkpoint_shard][key].ndim == 2:
shards[checkpoint_shard][key] = shards[checkpoint_shard][key].T
tensor = shards[checkpoint_shard][key]
if target_axis is not None:
target_shape = (output_shards,) + get_old_shape(tensor, total_shards=output_shards, dim=target_axis)
else:
target_shape = (output_shards, tensor.shape[0])
shards[checkpoint_shard][key] = reshard_reverse(tensor.unsqueeze_(0), output_shards, target_shape)
#print(key, ":", original_shape, "->", shards[0][key].shape)
tensor = torch.cat([shards[s][key] for s in range(checkpoint_shards)], dim=0)
target_shape = state["params"][target_module][target_param].shape
if tensor.shape != target_shape:
error = f"Weight {repr(key)} has shape {tensor.shape} in checkpoint but shape {target_shape} was requested by MTJ for {target_module} {target_param}"
print("\n\nERROR: ", error, file=sys.stderr)
raise RuntimeError(error)
if tensor.dtype is torch.float16 or tensor.dtype is torch.float32:
tensor = tensor.bfloat16()
state["params"][target_module][target_param] = move_xmap(
jax.dlpack.from_dlpack(torch.utils.dlpack.to_dlpack(tensor)).copy(),
np.zeros(config["cores_per_replica"]),
)
bar.update(1)
for mk, mv in state["params"].items():
for pk, pv in mv.items():
if isinstance(pv, PlaceholderTensor):
error = f"{mk} {pk} could not be found in the model checkpoint"
print("\n\nERROR: " + error, file=sys.stderr)
raise RuntimeError(error)
def load_model(path: str, driver_version="tpu_driver0.1_dev20210607", hf_checkpoint=False, **kwargs) -> None:
global thread_resources_env, seq, tokenizer, network, params
default_params = {
@@ -791,12 +997,96 @@ def load_model(path: str, driver_version="tpu_driver0.1_dev20210607", **kwargs)
"pe_rotary_dims": 64,
"seq": 2048,
"cores_per_replica": 8,
"tokenizer_class": "GPT2TokenizerFast",
"tokenizer": "gpt2",
}
params = kwargs
if vars.model == "TPUMeshTransformerGPTNeoX":
default_params = {
"compat": "neox",
"layers": 44,
"d_model": 6144,
"n_heads": 64,
"n_vocab": 50432,
"n_vocab_padding": 0,
"norm": "doublelayernorm",
"pe": "neox_rotary",
"pe_rotary_dims": 24,
"seq": 2048,
"cores_per_replica": 8,
"tokenizer_class": "GPT2TokenizerFast",
"tokenizer": "gpt2",
}
# Try to convert HF config.json to MTJ config
if hf_checkpoint:
spec_path = os.path.join("maps", vars.model_type + ".json")
if not os.path.isfile(spec_path):
raise NotImplementedError(f"Unsupported model type {repr(vars.model_type)}")
with open(spec_path) as f:
lazy_load_spec = json.load(f)
if "mtj_compat" in lazy_load_spec:
params["compat"] = lazy_load_spec["mtj_compat"]
if "mtj_pe" in lazy_load_spec:
params["pe"] = lazy_load_spec["mtj_pe"]
for k, v in lazy_load_spec.get("mtj_config_map", {}).items():
if type(v) is not list:
params[k] = params[v]
continue
for i in range(len(v)):
if i == len(v) - 1:
params[k] = v[i]
elif v[i] in params:
params[k] = params[v[i]]
break
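# Each mtj_config_map entry maps an MTJ parameter either directly to an HF
# config key, or to a list of candidate keys whose final element is a literal
# default used when none of the earlier candidates exist in the config.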
params["n_vocab"] = params["vocab_size"]
if "activation_function" in params:
params["activation"] = params["activation_function"]
# Both the number of attention heads and the embedding dimension of the
# model need to be divisible by the number of TPU cores that we use, and
# JAX also requires the number of TPU cores to be even if we're using more
# than one core. We therefore pick the largest possible even number of TPU
# cores such that the number of attention heads and the embedding dimension
# are both divisible by it, falling back to one core if no even number of
# cores works.
for c in (8, 6, 4, 2, 1):
if 0 == params["n_heads"] % c == params.get("d_embed", params["d_model"]) % c:
params["cores_per_replica"] = c
break
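# For example, a hypothetical config with n_heads = 28 and d_model = 3584
# fails this check for c = 8 and c = 6 (28 is not divisible by either) but
# passes for c = 4, so 4 TPU cores would be selected.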
# The vocabulary size of the model also has to be divisible by the
# number of TPU cores, so we pad the vocabulary with the minimum
# possible number of dummy tokens such that it's divisible.
params["n_vocab_padding"] = -(params["n_vocab"] % -params["cores_per_replica"])
if "compat" in params:
default_params["compat"] = params["compat"]
if default_params["compat"] == "fairseq_lm":
default_params["tokenizer"] = "KoboldAI/fairseq-dense-125M"
for param in default_params:
if param not in params:
params[param] = default_params[param]
# Load tokenizer
if vars.model == "TPUMeshTransformerGPTNeoX":
tokenizer = Tokenizer.from_file(os.path.join(path, "20B_tokenizer.json"))
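# The raw `tokenizers` Tokenizer returns an Encoding object from encode(),
# so wrap it to return plain token IDs the way the transformers tokenizers'
# encode() does.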
def new_encode(old_encode):
def encode(s, *args, **kwargs):
return old_encode(s).ids
return encode
tokenizer.encode = new_encode(tokenizer.encode)
elif not hf_checkpoint:
if not isinstance(params["tokenizer_class"], str) or not any(params["tokenizer_class"].endswith(s) for s in ("Tokenizer", "TokenizerFast")):
raise ValueError("`tokenizer_class` must be a string ending in 'Tokenizer' or 'TokenizerFast'")
tokenizer_class = getattr(__import__("transformers"), params["tokenizer_class"])
tokenizer = tokenizer_class.from_pretrained(params["tokenizer"])
# Disable JAX warnings about these two functions having been renamed
jax.host_count = jax.process_count
jax.host_id = jax.process_index
@@ -804,13 +1094,18 @@ def load_model(path: str, driver_version="tpu_driver0.1_dev20210607", **kwargs)
print("Connecting to your Colab instance's TPU", flush=True)
spinner = multiprocessing.Process(target=show_spinner, args=())
spinner.start()
colab_tpu_addr = os.environ['COLAB_TPU_ADDR'].split(':')[0]
url = f'http://{colab_tpu_addr}:8475/requestversion/{driver_version}'
if os.environ.get('COLAB_TPU_ADDR', '') != '':
tpu_address = os.environ['COLAB_TPU_ADDR'] # Colab
else:
tpu_address = os.environ['TPU_NAME'] # Kaggle
tpu_address = tpu_address.replace("grpc://", "")
tpu_address_without_port = tpu_address.split(':', 1)[0]
url = f'http://{tpu_address_without_port}:8475/requestversion/{driver_version}'
config.FLAGS.jax_xla_backend = "tpu_driver"
config.FLAGS.jax_backend_target = "grpc://" + tpu_address
requests.post(url)
spinner.terminate()
print()
config.FLAGS.jax_xla_backend = "tpu_driver"
config.FLAGS.jax_backend_target = "grpc://" + os.environ['COLAB_TPU_ADDR']
cores_per_replica = params["cores_per_replica"]
seq = params["seq"]
@@ -819,7 +1114,6 @@ def load_model(path: str, driver_version="tpu_driver0.1_dev20210607", **kwargs)
devices = np.array(jax.devices()[:cores_per_replica]).reshape(mesh_shape)
thread_resources_env = maps.ResourceEnv(maps.Mesh(devices, ('dp', 'mp')), ())
maps.thread_resources.env = thread_resources_env
tokenizer = transformers.GPT2TokenizerFast.from_pretrained('gpt2')
global shard_xmap, batch_xmap
shard_xmap = __shard_xmap()
@@ -832,6 +1126,198 @@ def load_model(path: str, driver_version="tpu_driver0.1_dev20210607", **kwargs)
if not path.endswith("/"):
path += "/"
network = PenalizingCausalTransformer(params)
network.state = read_ckpt_lowmem(network.state, path, devices.shape[1])
network.state = network.move_xmap(network.state, np.zeros(cores_per_replica))
network = PenalizingCausalTransformer(params, dematerialized=True)
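# With dematerialized=True above, the network is created with placeholder
# parameters, so the real weights can be streamed in below without being
# allocated twice.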
if not hf_checkpoint and vars.model != "TPUMeshTransformerGPTNeoX":
network.state = read_ckpt_lowmem(network.state, path, devices.shape[1])
#network.state = network.move_xmap(network.state, np.zeros(cores_per_replica))
return
if vars.model == "TPUMeshTransformerGPTNeoX":
print("\n\n\nThis model has ", f"{hk.data_structures.tree_size(network.state['params']):,d}".replace(",", " "), " parameters.\n")
read_neox_checkpoint(network.state, path, params)
return
# Convert from HF checkpoint
move_xmap = jax.experimental.maps.xmap(
fun=lambda x, _: to_bf16(x),
in_axes=(["shard", ...], ["batch", ...]),
out_axes=["shard", ...],
axis_resources={'shard': 'mp', 'batch': 'dp'}
)
model_spec = {}
for key, spec in lazy_load_spec.get("static_weights", {}).items():
if spec.get("mtj") is not None:
model_spec[key] = spec["mtj"].copy()
model_spec[key]["module"] = "causal_transformer_shard/~/" + model_spec[key]["module"]
for _key, spec in lazy_load_spec.get("layer_weights", {}).items():
for layer in range(params["layers"]):
if spec.get("mtj") is not None:
key = _key.format(layer=layer)
model_spec[key] = spec["mtj"].copy()
model_spec[key]["module"] = "causal_transformer_shard/~/" + model_spec[key]["module"].format(layer=layer)
import torch_lazy_loader
import torch
from tqdm.auto import tqdm
import functools
def callback(model_dict, f, **_):
if callback.nested:
return
callback.nested = True
with zipfile.ZipFile(f, "r") as z:
try:
last_storage_key = None
f = None
current_offset = 0
if utils.current_shard == 0:
print("\n\n\nThis model has ", f"{hk.data_structures.tree_size(network.state['params']):,d}".replace(",", " "), " parameters.\n")
if utils.num_shards is None or utils.current_shard == 0:
if utils.num_shards is not None:
num_tensors = len(utils.get_sharded_checkpoint_num_tensors(utils.from_pretrained_model_name, utils.from_pretrained_index_filename, **utils.from_pretrained_kwargs))
else:
num_tensors = len(model_dict)
utils.bar = tqdm(total=num_tensors, desc="Loading model tensors")
if utils.num_shards is not None:
utils.current_shard += 1
for key in sorted(model_dict.keys(), key=lambda k: (model_dict[k].key, model_dict[k].seek_offset)):
# Some model weights are used by transformers but not by MTJ.
# We have to materialize these weights anyways because
# transformers will throw a tantrum otherwise. To attain
# the least possible memory usage, we create them as meta
# tensors, which don't take up any actual CPU or TPU memory.
if key not in model_spec:
model_dict[key] = torch.empty(model_dict[key].shape, dtype=model_dict[key].dtype, device="meta")
utils.bar.update(1)
continue
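# Tensors in a zip-serialized checkpoint are stored sequentially within each
# storage file, so we stream forward through the currently open storage and
# reopen only when the storage changes or we would have to seek backwards.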
storage_key = model_dict[key].key
if storage_key != last_storage_key or model_dict[key].seek_offset < current_offset:
last_storage_key = storage_key
if isinstance(f, zipfile.ZipExtFile):
f.close()
f = z.open(f"archive/data/{storage_key}")
current_offset = 0
if current_offset != model_dict[key].seek_offset:
f.read(model_dict[key].seek_offset - current_offset)
current_offset = model_dict[key].seek_offset
spec = model_spec[key]
transforms = set(spec.get("transforms", ()))
if not isinstance(model_dict[key], torch_lazy_loader.LazyTensor):
error = f"Duplicate key {repr(key)}"
print("\n\nERROR: " + error, file=sys.stderr)
raise RuntimeError(error)
size = functools.reduce(lambda x, y: x * y, model_dict[key].shape, 1)
dtype = model_dict[key].dtype
nbytes = size if dtype is torch.bool else size * ((torch.finfo if dtype.is_floating_point else torch.iinfo)(dtype).bits >> 3)
tensor = model_dict[key].materialize(f, map_location="cpu")
model_dict[key] = tensor.to("meta")
current_offset += nbytes
# MTJ requires certain mathematical operations to be performed
# on tensors in order for them to be in the correct format
if "remove_first_two_rows" in transforms:
tensor = tensor[2:]
if "divide_by_shards" in transforms:
tensor /= params["cores_per_replica"]
if "vocab_pad" in transforms:
tensor = torch.nn.functional.pad(tensor, (0, 0, 0, params["n_vocab_padding"]))
if "no_transpose" not in transforms and tensor.ndim == 2:
tensor = tensor.T
tensor.unsqueeze_(0)
if tensor.dtype is torch.float16 or tensor.dtype is torch.float32:
tensor = tensor.bfloat16()
# Shard the tensor so that parts of the tensor can be used
# on different TPU cores
network.state["params"][spec["module"]][spec["param"]] = move_xmap(
jax.dlpack.from_dlpack(torch.utils.dlpack.to_dlpack(
reshard_reverse(
tensor,
params["cores_per_replica"],
network.state["params"][spec["module"]][spec["param"]].shape,
)
)).copy(),
np.empty(params["cores_per_replica"]),
)
utils.bar.update(1)
if utils.num_shards is not None and utils.current_shard < utils.num_shards:
return
# Check for tensors that MTJ needs that were not provided in the
# HF model
for mk, mv in network.state["params"].items():
for pk, pv in mv.items():
if isinstance(pv, PlaceholderTensor):
# The transformers GPT-J models apparently do not
# have embedding bias, whereas MTJ GPT-J models do,
# so we have to supplement an embedding bias tensor
# by creating a tensor with the necessary shape, filled
# with zeros.
if mk == "causal_transformer_shard/~/embedding_shard/~/linear" and pk == "b":
mv[pk] = move_xmap(jnp.zeros(mv[pk].shape, dtype=jnp.bfloat16), np.empty(params["cores_per_replica"]))
else:
error = f"{mk} {pk} could not be found in the model checkpoint"
print("\n\nERROR: " + error, file=sys.stderr)
raise RuntimeError(error)
finally:
if utils.num_shards is None or utils.current_shard >= utils.num_shards:
utils.bar.close()
utils.bar = None
callback.nested = False
if isinstance(f, zipfile.ZipExtFile):
f.close()
callback.nested = False
if os.path.isdir(vars.model.replace('/', '_')):
import shutil
shutil.move(vars.model.replace('/', '_'), "models/{}".format(vars.model.replace('/', '_')))
print("\n", flush=True)
with torch_lazy_loader.use_lazy_torch_load(callback=callback, dematerialized_modules=True):
if(os.path.isdir(vars.custmodpth)):
try:
tokenizer = AutoTokenizer.from_pretrained(vars.custmodpth, revision=vars.revision, cache_dir="cache")
except Exception as e:
try:
tokenizer = GPT2TokenizerFast.from_pretrained(vars.custmodpth, revision=vars.revision, cache_dir="cache")
except Exception as e:
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2", revision=vars.revision, cache_dir="cache")
try:
model = AutoModelForCausalLM.from_pretrained(vars.custmodpth, revision=vars.revision, cache_dir="cache")
except Exception as e:
model = GPTNeoForCausalLM.from_pretrained(vars.custmodpth, revision=vars.revision, cache_dir="cache")
elif(os.path.isdir("models/{}".format(vars.model.replace('/', '_')))):
try:
tokenizer = AutoTokenizer.from_pretrained("models/{}".format(vars.model.replace('/', '_')), revision=vars.revision, cache_dir="cache")
except Exception as e:
try:
tokenizer = GPT2TokenizerFast.from_pretrained("models/{}".format(vars.model.replace('/', '_')), revision=vars.revision, cache_dir="cache")
except Exception as e:
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2", revision=vars.revision, cache_dir="cache")
try:
model = AutoModelForCausalLM.from_pretrained("models/{}".format(vars.model.replace('/', '_')), revision=vars.revision, cache_dir="cache")
except Exception as e:
model = GPTNeoForCausalLM.from_pretrained("models/{}".format(vars.model.replace('/', '_')), revision=vars.revision, cache_dir="cache")
else:
try:
tokenizer = AutoTokenizer.from_pretrained(vars.model, revision=vars.revision, cache_dir="cache")
except Exception as e:
try:
tokenizer = GPT2TokenizerFast.from_pretrained(vars.model, revision=vars.revision, cache_dir="cache")
except Exception as e:
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2", revision=vars.revision, cache_dir="cache")
try:
model = AutoModelForCausalLM.from_pretrained(vars.model, revision=vars.revision, cache_dir="cache")
except Exception as e:
model = GPTNeoForCausalLM.from_pretrained(vars.model, revision=vars.revision, cache_dir="cache")
#network.state = network.move_xmap(network.state, np.zeros(cores_per_replica))

Binary file not shown.


@@ -50,4 +50,4 @@ git remote add origin %origin%
git fetch --all
git checkout %branch% -f
git reset --hard origin/%branch%
cmd /k
%windir%\system32\timeout -t 10


@@ -56,6 +56,7 @@
<li><a href="#kobold.num_outputs">kobold.num_outputs</a></li>
<li><a href="#kobold.outputs">kobold.outputs</a></li>
<li><a href="#kobold.settings">kobold.settings</a></li>
<li><a href="#kobold.spfilename">kobold.spfilename</a></li>
<li><a href="#kobold.story">kobold.story</a>
<ul>
<li></li>
@@ -172,6 +173,7 @@
<li><code>kobold.num_outputs</code></li>
<li><code>kobold.outputs</code></li>
<li><code>kobold.settings</code></li>
<li><code>kobold.spfilename</code></li>
<li><code>kobold.story</code></li>
<li><code>kobold.submission</code></li>
<li><code>kobold.worldinfo</code></li>
@@ -394,6 +396,14 @@
<li><code>kobold.settings.setwidepth</code> (World Info Depth)</li>
<li><code>kobold.settings.setuseprompt</code> (Always Use Prompt)</li>
</ul>
<h1 id="kobold.spfilename">kobold.spfilename</h1>
<p><em><strong>Readable from:</strong></em> anywhere<br>
<em><strong>Writable from:</strong></em> anywhere</p>
<pre class=" language-lua"><code class="prism language-lua">field kobold<span class="token punctuation">.</span>spfilename<span class="token punctuation">:</span> string?
</code></pre>
<p>The name of the soft prompt file to use (as a string), including the file extension. If not using a soft prompt, this is <code>nil</code> instead.</p>
<p>You can also set the soft prompt to use by setting this to a string or <code>nil</code>.</p>
<p>Modifying this field from inside of a generation modifier triggers a regeneration, which means that the context is recomputed after modification and generation begins again with the new context and previously generated tokens. This incurs a small performance penalty and should not be performed in excess.</p>
<h1 id="kobold.story">kobold.story</h1>
<p><em><strong>Readable from:</strong></em> anywhere<br>
<em><strong>Writable from:</strong></em> nowhere</p>


@@ -29,6 +29,7 @@ global kobold: KoboldLib
* `kobold.num_outputs`
* `kobold.outputs`
* `kobold.settings`
* `kobold.spfilename`
* `kobold.story`
* `kobold.submission`
* `kobold.worldinfo`
@@ -372,6 +373,21 @@ Modifying certain fields from inside of a generation modifier triggers a regener
* `kobold.settings.setwidepth` (World Info Depth)
* `kobold.settings.setuseprompt` (Always Use Prompt)
# kobold.spfilename
***Readable from:*** anywhere
***Writable from:*** anywhere
```lua
field kobold.spfilename: string?
```
The name of the soft prompt file to use (as a string), including the file extension. If not using a soft prompt, this is `nil` instead.
You can also set the soft prompt to use by setting this to a string or `nil`.
Modifying this field from inside of a generation modifier triggers a regeneration, which means that the context is recomputed after modification and generation begins again with the new context and previously generated tokens. This incurs a small performance penalty and should not be performed in excess.
# kobold.story
***Readable from:*** anywhere

utils.py

@@ -1,5 +1,24 @@
from threading import Timer
import re
import shutil
import json
import subprocess
import tempfile
import requests
import requests.adapters
import time
from tqdm.auto import tqdm
import os
import itertools
from typing import Optional
vars = None
num_shards: Optional[int] = None
current_shard = 0
from_pretrained_model_name = ""
from_pretrained_index_filename: Optional[str] = None
from_pretrained_kwargs = {}
bar = None
#==================================================================#
# Decorator to prevent a function's actions from being run until
@@ -111,8 +130,171 @@ def cleanfilename(filename):
filename = "".join(c for c in filename if c not in filteredcharacters).rstrip()
return filename
#==================================================================#
# Newline substitution for fairseq models
#==================================================================#
def encodenewlines(txt):
if(vars.newlinemode == "s"):
return txt.replace('\n', "</s>")
return txt
def decodenewlines(txt):
if(vars.newlinemode == "s"):
return txt.replace("</s>", '\n')
if(vars.newlinemode == "ns"):
return txt.replace("</s>", '')
return txt
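# For example, with newlinemode "s", "foo\nbar" encodes to "foo</s>bar" and
# decodes back again; with newlinemode "ns", "</s>" is simply stripped on decode.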
#==================================================================#
# Returns number of layers given an HF model config
#==================================================================#
def num_layers(config):
return config.num_layers if hasattr(config, "num_layers") else config.n_layer if hasattr(config, "n_layer") else config.num_hidden_layers
#==================================================================#
# Downloads huggingface checkpoints using aria2c if possible
#==================================================================#
def aria2_hook(pretrained_model_name_or_path: str, force_download=False, cache_dir=None, proxies=None, resume_download=False, local_files_only=False, use_auth_token=None, user_agent=None, revision=None, mirror=None, **kwargs):
import transformers
import transformers.modeling_utils
from huggingface_hub import HfFolder
if shutil.which("aria2c") is None: # Don't do anything if aria2 is not installed
return
if local_files_only: # If local_files_only is true, we obviously don't need to download anything
return
if os.path.isdir(pretrained_model_name_or_path) or os.path.isfile(pretrained_model_name_or_path) or os.path.isfile(pretrained_model_name_or_path + ".index") or transformers.modeling_utils.is_remote_url(pretrained_model_name_or_path):
return
if proxies:
print("WARNING: KoboldAI does not support using aria2 to download models from huggingface.co through a proxy. Disabling aria2 download mode.")
return
if use_auth_token:
if isinstance(use_auth_token, str):
token = use_auth_token
else:
token = HfFolder.get_token()
if token is None:
raise EnvironmentError("You specified use_auth_token=True, but a huggingface token was not found.")
_cache_dir = str(cache_dir) if cache_dir is not None else transformers.TRANSFORMERS_CACHE
sharded = False
headers = {"user-agent": transformers.file_utils.http_user_agent(user_agent)}
if use_auth_token:
headers["authorization"] = f"Bearer {use_auth_token}"
def is_cached(url):
try:
transformers.file_utils.get_from_cache(url, cache_dir=cache_dir, local_files_only=True)
except FileNotFoundError:
return False
return True
while True: # Try to get the huggingface.co URL of the model's pytorch_model.bin or pytorch_model.bin.index.json file
try:
filename = transformers.modeling_utils.WEIGHTS_INDEX_NAME if sharded else transformers.modeling_utils.WEIGHTS_NAME
except AttributeError:
return
url = transformers.file_utils.hf_bucket_url(pretrained_model_name_or_path, filename, revision=revision, mirror=mirror)
if is_cached(url) or requests.head(url, allow_redirects=True, proxies=proxies, headers=headers):
break
if sharded:
return
else:
sharded = True
if not sharded: # If the model has a pytorch_model.bin file, that's the only file to download
filenames = [transformers.modeling_utils.WEIGHTS_NAME]
else: # Otherwise download the pytorch_model.bin.index.json and then let aria2 download all the pytorch_model-#####-of-#####.bin files mentioned inside it
map_filename = transformers.file_utils.cached_path(url, cache_dir=cache_dir, force_download=force_download, proxies=proxies, resume_download=resume_download, use_auth_token=use_auth_token, user_agent=user_agent)
with open(map_filename) as f:
map_data = json.load(f)
filenames = set(map_data["weight_map"].values())
urls = [transformers.file_utils.hf_bucket_url(pretrained_model_name_or_path, n, revision=revision, mirror=mirror) for n in filenames]
if not force_download:
urls = [u for u in urls if not is_cached(u)]
if not urls:
return
etags = [h.get("X-Linked-Etag") or h.get("ETag") for u in urls for h in [requests.head(u, headers=headers, allow_redirects=False, proxies=proxies, timeout=10).headers]]
headers = [requests.head(u, headers=headers, allow_redirects=True, proxies=proxies, timeout=10).headers for u in urls]
filenames = [transformers.file_utils.url_to_filename(u, t) for u, t in zip(urls, etags)]
for n in filenames:
path = os.path.join(_cache_dir, "kai-tempfile." + n + ".aria2")
if os.path.exists(path):
os.remove(path)
path = os.path.join(_cache_dir, "kai-tempfile." + n)
if os.path.exists(path):
os.remove(path)
if force_download:
path = os.path.join(_cache_dir, n + ".json")
if os.path.exists(path):
os.remove(path)
path = os.path.join(_cache_dir, n)
if os.path.exists(path):
os.remove(path)
total_length = sum(int(h["Content-Length"]) for h in headers)
lengths = {}
aria2_config = "\n".join(f"{u}\n out=kai-tempfile.{n}" for u, n in zip(urls, filenames)).encode()
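# aria2c's input-file format is a URI line optionally followed by indented
# per-download options; here "out=" selects the temporary output filename.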
s = requests.Session()
s.mount("http://", requests.adapters.HTTPAdapter(max_retries=requests.adapters.Retry(total=120, backoff_factor=1)))
bar = None
done = False
secret = os.urandom(17).hex()
try:
with tempfile.NamedTemporaryFile("w+b", delete=False) as f:
f.write(aria2_config)
f.flush()
p = subprocess.Popen(["aria2c", "-x", "10", "-s", "10", "-j", "10", "--enable-rpc=true", f"--rpc-secret={secret}", "--rpc-listen-port", str(vars.aria2_port), "--disable-ipv6", "--file-allocation=trunc", "--allow-overwrite", "--auto-file-renaming=false", "-d", _cache_dir, "-i", f.name, "-U", transformers.file_utils.http_user_agent(user_agent)] + (["-c"] if not force_download else []) + ([f"--header='Authorization: Bearer {token}'"] if use_auth_token else []), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
while p.poll() is None:
r = s.post(f"http://localhost:{vars.aria2_port}/jsonrpc", json={"jsonrpc": "2.0", "id": "kai", "method": "aria2.tellActive", "params": [f"token:{secret}"]}).json()["result"]
if not r:
s.close()
if bar is not None:
bar.n = bar.total
bar.close()
p.terminate()
done = True
break
if bar is None:
bar = tqdm(total=total_length, desc=f"[aria2] Downloading model", unit="B", unit_scale=True, unit_divisor=1000)
visited = set()
for x in r:
filename = x["files"][0]["path"]
lengths[filename] = (int(x["completedLength"]), int(x["totalLength"]))
visited.add(filename)
for k, v in lengths.items():
if k not in visited:
lengths[k] = (v[1], v[1])
bar.n = sum(v[0] for v in lengths.values())
bar.update()
time.sleep(0.1)
path = f.name
except Exception as e:
p.terminate()
raise e
finally:
try:
os.remove(path)
except OSError:
pass
code = p.wait()
if not done and code:
raise OSError(f"aria2 exited with exit code {code}")
for u, t, n in zip(urls, etags, filenames):
os.rename(os.path.join(_cache_dir, "kai-tempfile." + n), os.path.join(_cache_dir, n))
with open(os.path.join(_cache_dir, n + ".json"), "w") as f:
json.dump({"url": u, "etag": t}, f)
#==================================================================#
# Given the path to a pytorch_model.bin.index.json, returns how many
# shards there are in the model
#==================================================================#
def get_num_shards(filename):
with open(filename) as f:
map_data = json.load(f)
return len(set(map_data["weight_map"].values()))
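# A pytorch_model.bin.index.json maps tensor names to shard filenames, e.g.
# (illustrative names) {"weight_map": {"wte.weight": "pytorch_model-00001-of-00002.bin", ...}},
# so the shard count is the number of distinct filenames among the values.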
#==================================================================#
# Given the name/path of a sharded model and the path to a
# pytorch_model.bin.index.json, returns a list of weight names in the
# sharded model. Requires lazy loader to be enabled to work properly.
#==================================================================#
def get_sharded_checkpoint_num_tensors(pretrained_model_name_or_path, filename, cache_dir=None, force_download=False, proxies=None, resume_download=False, local_files_only=False, use_auth_token=None, user_agent=None, revision=None, mirror=None, **kwargs):
import transformers.modeling_utils
import torch
shard_paths, _ = transformers.modeling_utils.get_checkpoint_shard_files(pretrained_model_name_or_path, filename, cache_dir=cache_dir, force_download=force_download, proxies=proxies, resume_download=resume_download, local_files_only=local_files_only, use_auth_token=use_auth_token, user_agent=user_agent, revision=revision, mirror=mirror)
return list(itertools.chain(*(torch.load(p, map_location="cpu").keys() for p in shard_paths)))


@@ -62,7 +62,7 @@ class TailFreeLogitsWarper(LogitsWarper):
def __init__(self, tfs: float, filter_value: float = -float("Inf"), min_tokens_to_keep: int = 1):
tfs = float(tfs)
if tfs < 0 or tfs > 1.0:
raise ValueError(f"`tfs` has to be a float > 0 and < 1, but is {tfs}")
raise ValueError(f"`tfs` has to be a float >= 0 and <= 1, but is {tfs}")
self.tfs = tfs
self.filter_value = filter_value
self.min_tokens_to_keep = min_tokens_to_keep
@@ -98,3 +98,53 @@ class TailFreeLogitsWarper(LogitsWarper):
indices_to_remove = sorted_indices_to_remove.scatter(1, sorted_indices, sorted_indices_to_remove)
scores = scores.masked_fill(indices_to_remove, self.filter_value)
return scores
class TypicalLogitsWarper(LogitsWarper):
'''
Typical sampling, described in https://arxiv.org/pdf/2202.00666.pdf
'''
def __init__(self, typical: float, filter_value: float = -float("Inf"), min_tokens_to_keep: int = 1):
typical = float(typical)
if typical < 0 or typical > 1.0:
raise ValueError(f"`typical` has to be a float >= 0 and <= 1, but is {typical}")
self.typical = typical
self.filter_value = filter_value
self.min_tokens_to_keep = min_tokens_to_keep
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
if self.typical >= 1.0:
return scores
# Compute softmax probabilities and the natural logarithms of them
probs = scores.softmax(dim=-1)
log_probs = probs.log()
# Compute the negative of entropy, which is the sum of p*ln(p) for all p
# in the set of softmax probabilities of the logits
neg_entropy = (probs * log_probs).nansum(dim=-1, keepdim=True)
# Determine absolute difference between the negative entropy and the
# log probabilities
entropy_deviation = (neg_entropy - log_probs).abs()
# Keep certain tokens such that the sum of the entropy_deviation of the
# kept tokens is the smallest possible value such that the sum of the
# softmax probabilities of the kept tokens is at least the threshold
# value (by sorting the tokens in ascending order of entropy_deviation
# and then keeping the smallest possible number of tokens from the
# beginning such that sum of softmax probabilities is at or above the
# threshold)
_, sorted_indices = torch.sort(entropy_deviation)
sorted_logits = probs.gather(-1, sorted_indices)
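# (Despite its name, sorted_logits holds softmax probabilities, gathered in
# ascending order of entropy deviation.)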
sorted_indices_to_remove = sorted_logits.cumsum(dim=-1) >= self.typical
sorted_indices_to_remove = sorted_indices_to_remove.roll(1, dims=-1)
min_tokens_to_keep = max(self.min_tokens_to_keep, 1)
# Keep at least min_tokens_to_keep
sorted_indices_to_remove[..., : min_tokens_to_keep] = 0
indices_to_remove = sorted_indices_to_remove.scatter(1, sorted_indices, sorted_indices_to_remove)
scores = scores.masked_fill(indices_to_remove, self.filter_value)
return scores
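# A minimal usage sketch, under stated assumptions: this module is importable
# as `warpers` (hypothetical name) and the vocabulary size is illustrative.
#
#   import torch
#   from warpers import TypicalLogitsWarper
#
#   warper = TypicalLogitsWarper(typical=0.9)
#   scores = torch.randn(1, 50257)                   # one batch of raw logits
#   input_ids = torch.zeros(1, 0, dtype=torch.long)  # unused by this warper
#   filtered = warper(input_ids, scores)             # atypical tokens -> -inf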