Promote Colabcpp

Crash without a GPU
Echidna
2024-01-02 14:08:53 +01:00 · 2023-11-05 01:42:13 +01:00 · 2023-10-28 03:05:02 +02:00 · 2023-10-27 15:58:49 +02:00 · 2023-10-27 15:57:08 +02:00 · 2023-10-27 15:52:37 +02:00
19 changed files with 451 additions and 299 deletions
--- a/README.md
+++ b/README.md
@@ -48,7 +48,7 @@ If you would like to play KoboldAI online for free on a powerful computer you ca

 Each edition features different models and requires different hardware to run, this means that if you are unable to obtain a TPU or a GPU you might still be able to use the other version. The models you can use are listed underneath the edition. To open a Colab click the big link featuring the editions name.

-## [TPU Edition Model Descriptions](https://colab.research.google.com/github/KoboldAI/KoboldAI-Client/blob/main/colab/TPU.ipynb)
+## [Models the TPU can run:](https://colab.research.google.com/github/KoboldAI/KoboldAI-Client/blob/main/colab/TPU.ipynb)

 | Model | Style | Description |
 | --- | --- | --- |
@@ -64,22 +64,26 @@ Each edition features different models and requires different hardware to run, t
 | [Fairseq Dense](https://huggingface.co/KoboldAI/fairseq-dense-13B) | Generic | Trained by Facebook Researchers this model stems from the MOE research project within Fairseq. This particular version has been converted by us for use in KoboldAI. It is known to be on par with the larger 20B model from EleutherAI and considered as better for pop culture and language tasks. Because the model has never seen a new line (enter) it may perform worse on formatting and paragraphing. Compared to other models the dataset focuses primarily on literature and contains little else. |
 | [GPT-J-6B](https://huggingface.co/EleutherAI/gpt-j-6B) by EleutherAI | Generic | This model serves as the basis for most other 6B models (Some being based on Fairseq Dense instead). Being trained on the Pile and not biased towards anything in particular it is suitable for a variety of tasks such as writing, Q&A and coding tasks. You will likely get better result with larger generic models or finetuned models. |

-## [GPU Edition Model Descriptions](https://colab.research.google.com/github/KoboldAI/KoboldAI-Client/blob/main/colab/GPU.ipynb)
+## [Models the Colab GPU can run:](https://colab.research.google.com/github/KoboldAI/KoboldAI-Client/blob/main/colab/GPU.ipynb)

 | Model | Style | Description |
 | --- | --- | --- |
 | [Nerys](https://huggingface.co/KoboldAI/fairseq-dense-2.7B-Nerys) by Mr Seeker | Novel/Adventure | Nerys is a hybrid model based on Pike (A newer Janeway), on top of the Pike dataset you also get some Light Novels, Adventure mode support and a little bit of Shinen thrown in the mix. The end result is a very diverse model that is heavily biased towards SFW novel writing, but one that can go beyond its novel training and make for an excellent adventure model to. Adventure mode is best played from a second person perspective, but can be played in first or third person as well. Novel writing can be done best from the first or third person. |
-| [Erebus](https://huggingface.co/KoboldAI/OPT-2.7B-Erebus) by Mr Seeker | NSFW | Erebus is our community's flagship NSFW model, being a combination of multiple large datasets that include Literotica, Shinen and erotic novels from Nerys and featuring thourough tagging support it covers the vast majority of erotic writing styles. This model is capable of replacing both the Lit and Shinen models in terms of content and style and has been well received as (one of) the best NSFW models out there. If you wish to use this model for commercial or non research usage we recommend choosing the 20B version as that one is not subject to the restrictive OPT license. |
+| [Tiefighter 13B by KoboldAI](https://huggingface.co/KoboldAI/LLaMA2-13B-Tiefighter) | Hybrid | Tiefighter 13B is a very versitile fiction Hybrid, it can write, chat and play adventure games and can also answer regular instructions (Although we do not recommend this model for factual use due to its fictional nature). This is an excellent starting model, for the best results avoid using Second person writing in your chats unless you are wanting it to become a text adventure.|
 | [Janeway](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Janeway) by Mr Seeker | Novel | Janeway is a model created from Picard's dataset combined with a brand new collection of ebooks. This model is trained on 20% more content than Picard and has been trained on literature from various genres. Although the model is mainly focussed on SFW, romantic scenes might involve a degree of nudity. |
 | [Picard](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Picard) by Mr Seeker | Novel | Picard is a model trained for SFW Novels based on Neo 2.7B. It is focused on Novel style writing without the NSFW bias. While the name suggests a sci-fi model this model is designed for Novels of a variety of genre's. It is meant to be used in KoboldAI's regular mode. |
 | [AID](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-AID) by melastacho | Adventure | Also know as Adventure 2.7B this is a clone of the AI Dungeon Classic model and is best known for the epic wackey adventures that AI Dungeon Classic players love. |
-| [Horni LN](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni-LN) by finetune | Novel | This model is based on Horni 2.7B and retains its NSFW knowledge, but was then further biased towards SFW novel stories. If you seek a balance between a SFW Novel model and a NSFW model this model should be a good choice. |
-| [Horni](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni) by finetune | NSFW | This model is tuned on Literotica to produce a Novel style model biased towards NSFW content. Can still be used for SFW stories but will have a bias towards NSFW content. It is meant to be used in KoboldAI's regular mode. |
-| [Shinen](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Shinen) by Mr Seeker | NSFW | Shinen is an alternative to the Horni model designed to be more explicit. If Horni is to tame for you Shinen might produce better results. While it is a Novel model it is unsuitable for SFW stories due to its heavy NSFW bias. Shinen will not hold back. It is meant to be used in KoboldAI's regular mode. |
 | [OPT](https://huggingface.co/facebook/opt-2.7b) by Metaseq | Generic | OPT is considered one of the best base models as far as content goes, its behavior has the strengths of both GPT-Neo and Fairseq Dense. Compared to Neo duplicate and unnecessary content has been left out, while additional literature was added in similar to the Fairseq Dense model. The Fairseq Dense model however lacks the broader data that OPT does have. The biggest downfall of OPT is its license, which prohibits any commercial usage, or usage beyond research purposes. |
 | [Fairseq Dense](https://huggingface.co/KoboldAI/fairseq-dense-2.7B) | Generic | Trained by Facebook Researchers this model stems from the MOE research project within Fairseq. This particular version has been converted by us for use in KoboldAI. It is known to be on par with the larger models from EleutherAI and considered as better for pop culture and language tasks. Because the model has never seen a new line (enter) it may perform worse on formatting and paragraphing. Compared to other models the dataset focuses primarily on literature and contains little else. |
+| [MythoMax 13B](https://huggingface.co/TheBloke/MythoMax-L2-13B-GPTQ) by Gryphe | Roleplay | An improved, potentially even perfected variant of MythoMix, my MythoLogic-L2 and Huginn merge using a highly experimental tensor type merge technique¹. |
+| [Holomax 13B by KoboldAI](https://huggingface.co/KoboldAI/LLaMA2-13B-Holomax) | Adventure | This is an expansion merge to the well-praised MythoMax model from Gryphe (60%) using MrSeeker's KoboldAI Holodeck model (40%). The goal of this model is to enhance story-writing capabilities while preserving the desirable traits of the MythoMax model as much as possible (It does limit chat reply length). |
+| [Airoboros 13B](https://huggingface.co/jondurbin/airoboros-13b) by Jon Durbin | Generic | This is an instruction fine-tuned llama-2 model, using synthetic instructions generated by airoboros⁵. |
+| [Emerhyst 13B](https://huggingface.co/Undi95/Emerhyst-13B) by Undi | Roleplay | An attempt using BlockMerge_Gradient to get better result. In addition, LimaRP v3 was used⁷. |
+| [Chronos 13B](https://huggingface.co/elinas/chronos-13b) by Elinas | Generic | This model is primarily focused on chat, roleplay, and storywriting, but can accomplish other tasks such as simple reasoning and coding. Chronos generates very long outputs with coherent text, largely due to the human inputs it was trained on. |
+| [Spring Dragon by Henk717](https://huggingface.co/Henk717/spring-dragon) | Adventure | This model is a recreation attempt of the AI Dungeon 2 Dragon model. To achieve this, the "text_adventures.txt" dataset was used, which was bundled with the original AI Dungeon 2 GitHub release prior to the online service. It is worth noting that the same dataset file was used to create the Dragon model, where Dragon is a GPT-3 175B Davinci model from 2020. |
+| [Holodeck By KoboldAI](https://huggingface.co/KoboldAI/LLAMA2-13B-Holodeck-1) | Adventure |LLAMA2 13B-Holodeck is a finetune created using Meta's llama 2 model.The training data contains around 3000 ebooks in various genres. Most parts of the dataset have been prepended using the following text: [Genre: <genre1>, <genre2>|
 | [Neo](https://huggingface.co/EleutherAI/gpt-neo-2.7B) by EleutherAI | Generic | This is the base model for all the other 2.7B models, it is best used when you have a use case that we have no other models available for, such as writing blog articles or programming. It can also be a good basis for the experience of some of the softprompts if your softprompt is not about a subject the other models cover. |
-
+| [Various 2.7b models]() by various | Various smaller models are also possible to load in GPU colab. | |
 ### Styles

 | Type | Description |
@@ -105,7 +109,7 @@ KoboldAI has a large number of dependencies you will need to install on your com

 ### Downloading the latest version of KoboldAI

-KoboldAI is a rolling release on our github, the code you see is also the game. You can the software by clicking on the green Code button at the top of the page and clicking Download ZIP.
+KoboldAI is a rolling release on our github, the code you see is also the game. You can download the software by clicking on the green Code button at the top of the page and clicking Download ZIP, or use the `git clone` command instead. Then, on Windows you need to you run install_requirements.bat (using admin mode is recommanded to avoid errors), and once it's done, or if you're on Linux, either play.bat/sh or remote-play.bat/sh to run it.

 The easiest way for Windows users is to use the [offline installer](https://sourceforge.net/projects/koboldai/files/latest/download) below.

@@ -228,4 +232,4 @@ Did we miss your contribution? Feel free to issue a commit adding your name to t

 KoboldAI is licensed with a AGPL license, in short this means that it can be used by anyone for any purpose. However, if you decide to make a publicly available instance your users are entitled to a copy of the source code including all modifications that you have made (which needs to be available trough an interface such as a button on your website), you may also not distribute this project in a form that does not contain the source code (Such as compiling / encrypting the code and distributing this version without also distributing the source code that includes the changes that you made. You are allowed to distribute this in a closed form if you also provide a separate archive with the source code.).

-umamba.exe is bundled for convenience because we observed that many of our users had trouble with command line download methods, it is not part of our project and does not fall under the AGPL license. It is licensed under the BSD-3-Clause license. Other files with differing licenses will have a reference or embedded version of this license within the file. It has been sourced from https://anaconda.org/conda-forge/micromamba/files and its source code can be found here : https://github.com/mamba-org/mamba/tree/master/micromamba
+umamba.exe is bundled for convenience because we observed that many of our users had trouble with command line download methods, it is not part of our project and does not fall under the AGPL license. It is licensed under the BSD-3-Clause license. Other files with differing licenses will have a reference or embedded version of this license within the file. It has been sourced from https://anaconda.org/conda-forge/micromamba/files and its source code can be found here : https://github.com/mamba-org/mamba/tree/master/micromamba
--- a/aiserver.py
+++ b/aiserver.py
@@ -165,13 +165,16 @@ model_menu = {
        ],
    'nsfwlist': [
        ["Erebus 20B (NSFW)", "KoboldAI/GPT-NeoX-20B-Erebus", "64GB", False],
+        ["Nerybus 13B (NSFW)", "KoboldAI/OPT-13B-Nerybus-Mix", "32GB", False],
        ["Erebus 13B (NSFW)", "KoboldAI/OPT-13B-Erebus", "32GB", False],
        ["Shinen FSD 13B (NSFW)", "KoboldAI/fairseq-dense-13B-Shinen", "32GB", False],
+        ["Nerybus 6.7B (NSFW)", "KoboldAI/OPT-6.7B-Nerybus-Mix", "16GB", False],
        ["Erebus 6.7B (NSFW)", "KoboldAI/OPT-6.7B-Erebus", "16GB", False],
        ["Shinen FSD 6.7B (NSFW)", "KoboldAI/fairseq-dense-6.7B-Shinen", "16GB", False],
        ["Lit V2 6B (NSFW)", "hakurei/litv2-6B-rev3", "16GB", False],
        ["Lit 6B (NSFW)", "hakurei/lit-6B", "16GB", False],
        ["Shinen 6B (NSFW)", "KoboldAI/GPT-J-6B-Shinen", "16GB", False],
+        ["Nerybus 2.7B (NSFW)", "KoboldAI/OPT-2.7B-Nerybus-Mix", "8GB", False],
        ["Erebus 2.7B (NSFW)", "KoboldAI/OPT-2.7B-Erebus", "8GB", False],
        ["Horni 2.7B (NSFW)", "KoboldAI/GPT-Neo-2.7B-Horni", "8GB", False],
        ["Shinen 2.7B (NSFW)", "KoboldAI/GPT-Neo-2.7B-Shinen", "8GB", False],
@@ -1511,7 +1514,7 @@ def get_model_info(model, directory=""):
        models_on_url = True
        url = True
        key = True
-        default_url = 'https://koboldai.net'
+        default_url = 'https://horde.koboldai.net'
        multi_online_models = True
        if path.exists(get_config_filename(model)):
            with open(get_config_filename(model), "r") as file:
@@ -1590,13 +1593,13 @@ def get_layer_count(model, directory=""):
                model = directory
            from transformers import AutoConfig
            if(os.path.isdir(model.replace('/', '_'))):
-                model_config = AutoConfig.from_pretrained(model.replace('/', '_'), revision=vars.revision, cache_dir="cache")
+                model_config = AutoConfig.from_pretrained(model.replace('/', '_'), revision=args.revision, cache_dir="cache")
            elif(os.path.isdir("models/{}".format(model.replace('/', '_')))):
-                model_config = AutoConfig.from_pretrained("models/{}".format(model.replace('/', '_')), revision=vars.revision, cache_dir="cache")
+                model_config = AutoConfig.from_pretrained("models/{}".format(model.replace('/', '_')), revision=args.revision, cache_dir="cache")
            elif(os.path.isdir(directory)):
-                model_config = AutoConfig.from_pretrained(directory, revision=vars.revision, cache_dir="cache")
+                model_config = AutoConfig.from_pretrained(directory, revision=args.revision, cache_dir="cache")
            else:
-                model_config = AutoConfig.from_pretrained(model, revision=vars.revision, cache_dir="cache")
+                model_config = AutoConfig.from_pretrained(model, revision=args.revision, cache_dir="cache")
        try:
            if ((utils.HAS_ACCELERATE and model_config.model_type != 'gpt2') or model_config.model_type in ("gpt_neo", "gptj", "xglm", "opt")) and not vars.nobreakmodel:
                return utils.num_layers(model_config)
@@ -1671,7 +1674,7 @@ def get_cluster_models(msg):
    # Get list of models from public cluster
    logger.init("KAI Horde Models", status="Retrieving")
    try:
-        req = requests.get("{}/api/v1/models".format(url))
+        req = requests.get(f"{url}/api/v2/status/models?type=text")
    except requests.exceptions.ConnectionError:
        logger.init_err("KAI Horde Models", status="Failed")
        logger.error("Provided KoboldAI Horde URL unreachable")
@@ -1687,10 +1690,11 @@ def get_cluster_models(msg):
    engines = req.json()
    logger.debug(engines)
    try:
-        engines = [[en, en] for en in engines]
+        engines = [[en["name"], en["name"]] for en in engines]
    except:
        logger.error(engines)
        raise
+    logger.debug(engines)
    
    online_model = ""
    changed=False
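Review note on the two `get_cluster_models` hunks above: the new endpoint appears to return a list of objects rather than bare strings, which is why the comprehension now reads `en["name"]`. A minimal sketch of that conversion with explicit validation, assuming each entry is a dict with a `name` key. `engines_to_menu` and the sample payload are hypothetical and not part of this commit:

```python
from typing import Any, List


def engines_to_menu(payload: Any) -> List[List[str]]:
    """Turn a decoded model-list response into [label, value] pairs.

    Assumption: every entry is a dict carrying a "name" key, as the
    comprehension in the diff implies. Any other keys are ignored.
    """
    if not isinstance(payload, list):
        # An error response is often a dict; fail with a clear message
        # instead of a TypeError from indexing a string key.
        raise ValueError(f"expected a list of models, got {type(payload).__name__}")
    menu = []
    for entry in payload:
        if not isinstance(entry, dict) or "name" not in entry:
            raise ValueError(f"model entry without a name: {entry!r}")
        menu.append([entry["name"], entry["name"]])
    return menu


# Hypothetical payload for illustration, not captured from a real server.
sample = [{"name": "example/model-a", "count": 2}, {"name": "example/model-b"}]
print(engines_to_menu(sample))
```

Raising a `ValueError` with the offending entry keeps the failure readable, where the bare `except:` in the hunk only logs the payload and re-raises.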
@@ -1936,34 +1940,26 @@ def patch_transformers():

    from torch.nn import functional as F

-    class ProbabilityVisualizerLogitsProcessor(LogitsProcessor):
-        def __init__(self):
-            pass
+    def visualize_probabilities(scores: torch.FloatTensor) -> None:
+        assert scores.ndim == 2

-        def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
-            assert scores.ndim == 2
-            assert input_ids.ndim == 2
+        if vars.numseqs > 1 or not vars.show_probs:
+            return

-            if vars.numseqs > 1 or not vars.show_probs:
-                return scores
+        probs = F.softmax(scores, dim = -1).cpu().numpy()[0]
+        token_prob_info = []
+        for token_id, score in sorted(enumerate(probs), key=lambda x: x[1], reverse=True)[:8]:
+            token_prob_info.append({
+                "tokenId": token_id,
+                "decoded": utils.decodenewlines(tokenizer.decode(token_id)),
+                "score": float(score),
+            })

-            probs = F.softmax(scores, dim = -1).cpu().numpy()[0]
-
-            token_prob_info = []
-            for token_id, score in sorted(enumerate(probs), key=lambda x: x[1], reverse=True)[:8]:
-                token_prob_info.append({
-                    "tokenId": token_id,
-                    "decoded": utils.decodenewlines(tokenizer.decode(token_id)),
-                    "score": float(score),
-                })
-
-            vars.token_stream_queue.probability_buffer = token_prob_info
-            return scores
+        vars.token_stream_queue.probability_buffer = token_prob_info
    
    def new_get_logits_processor(*args, **kwargs) -> LogitsProcessorList:
        processors = new_get_logits_processor.old_get_logits_processor(*args, **kwargs)
        processors.insert(0, LuaLogitsProcessor())
-        processors.append(ProbabilityVisualizerLogitsProcessor())
        return processors
    new_get_logits_processor.old_get_logits_processor = transformers.generation_utils.GenerationMixin._get_logits_processor
    transformers.generation_utils.GenerationMixin._get_logits_processor = new_get_logits_processor
@@ -1985,6 +1981,7 @@ def patch_transformers():
                sampler_order = [6] + sampler_order
            for k in sampler_order:
                scores = self.__warper_list[k](input_ids, scores, *args, **kwargs)
+            visualize_probabilities(scores)
            return scores

    def new_get_logits_warper(beams: int = 1,) -> LogitsProcessorList:
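Review note: with this change the top-8 token probabilities are computed after the sampler loop instead of in a logits processor, so the displayed values should now reflect the scores after the samplers have been applied. The ranking step can be sketched without torch, assuming a plain list of logits for a single sequence. `top_token_probs` is a hypothetical helper, not code from this commit:

```python
import math
from typing import Dict, List, Sequence, Union


def top_token_probs(logits: Sequence[float], k: int = 8) -> List[Dict[str, Union[int, float]]]:
    """Softmax the logits and return the k most probable token ids.

    Assumption: at least one logit is finite. Logits of -inf (tokens a
    sampler has masked out) end up with probability 0.0.
    """
    peak = max(logits)
    # Subtracting the maximum keeps exp() from overflowing.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)[:k]
    return [{"tokenId": token_id, "score": float(score)} for token_id, score in ranked]


# Token 2 has been masked out, as a sampler might do.
for row in top_token_probs([0.5, 2.0, float("-inf"), 1.0], k=3):
    print(row)
```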
@@ -2238,19 +2235,19 @@ def load_model(use_gpu=True, gpu_layers=None, disk_layers=None, initial_load=Fal
        from transformers import AutoConfig
        if(os.path.isdir(vars.custmodpth.replace('/', '_'))):
            try:
-                model_config = AutoConfig.from_pretrained(vars.custmodpth.replace('/', '_'), revision=vars.revision, cache_dir="cache")
+                model_config = AutoConfig.from_pretrained(vars.custmodpth.replace('/', '_'), revision=args.revision, cache_dir="cache")
                vars.model_type = model_config.model_type
            except ValueError as e:
                vars.model_type = "not_found"
        elif(os.path.isdir("models/{}".format(vars.custmodpth.replace('/', '_')))):
            try:
-                model_config = AutoConfig.from_pretrained("models/{}".format(vars.custmodpth.replace('/', '_')), revision=vars.revision, cache_dir="cache")
+                model_config = AutoConfig.from_pretrained("models/{}".format(vars.custmodpth.replace('/', '_')), revision=args.revision, cache_dir="cache")
                vars.model_type = model_config.model_type
            except ValueError as e:
                vars.model_type = "not_found"
        else:
            try:
-                model_config = AutoConfig.from_pretrained(vars.custmodpth, revision=vars.revision, cache_dir="cache")
+                model_config = AutoConfig.from_pretrained(vars.custmodpth, revision=args.revision, cache_dir="cache")
                vars.model_type = model_config.model_type
            except ValueError as e:
                vars.model_type = "not_found"
@@ -2387,6 +2384,7 @@ def load_model(use_gpu=True, gpu_layers=None, disk_layers=None, initial_load=Fal
                    with zipfile.ZipFile(f, "r") as z:
                        try:
                            last_storage_key = None
+                            zipfolder = os.path.basename(os.path.normpath(f)).split('.')[0]
                            f = None
                            current_offset = 0
                            able_to_pin_layers = True
@@ -2398,7 +2396,10 @@ def load_model(use_gpu=True, gpu_layers=None, disk_layers=None, initial_load=Fal
                                    last_storage_key = storage_key
                                    if isinstance(f, zipfile.ZipExtFile):
                                        f.close()
-                                    f = z.open(f"archive/data/{storage_key}")
+                                    try:
+                                        f = z.open(f"archive/data/{storage_key}")
+                                    except:
+                                        f = z.open(f"{zipfolder}/data/{storage_key}")
                                    current_offset = 0
                                if current_offset != model_dict[key].seek_offset:
                                    f.read(model_dict[key].seek_offset - current_offset)
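Review note: the two hunks above make the lazy loader fall back from `archive/data/<key>` to `<zipfolder>/data/<key>`, where `zipfolder` is the checkpoint file name without its extension. A narrower version of the same fallback is sketched below. It catches only `KeyError`, which is what `ZipFile.open` raises for a missing member, instead of using a bare `except:`. `open_storage` and the in-memory archive are hypothetical and exist only for illustration:

```python
import io
import os
import zipfile


def open_storage(z: zipfile.ZipFile, checkpoint_path: str, storage_key: str):
    """Open a storage entry, trying the 'archive/' root first.

    Falls back to a root folder named after the checkpoint file stem,
    mirroring the zipfolder logic in the diff.
    """
    zipfolder = os.path.basename(os.path.normpath(checkpoint_path)).split(".")[0]
    try:
        return z.open(f"archive/data/{storage_key}")
    except KeyError:
        # The member is not stored under 'archive/', so try the other layout.
        return z.open(f"{zipfolder}/data/{storage_key}")


# Build a tiny archive that only uses the file-stem layout.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("pytorch_model/data/0", b"\x01\x02")

with zipfile.ZipFile(buf, "r") as z:
    with open_storage(z, "models/demo/pytorch_model.bin", "0") as f:
        print(f.read())
```

Catching only `KeyError` lets unrelated failures, such as a corrupt archive, surface at the first `open` call instead of being retried under the second path.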
@@ -2485,19 +2486,19 @@ def load_model(use_gpu=True, gpu_layers=None, disk_layers=None, initial_load=Fal
                with(maybe_use_float16()):
                    try:
                        if os.path.exists(vars.custmodpth):
-                            model = GPT2LMHeadModel.from_pretrained(vars.custmodpth, revision=vars.revision, cache_dir="cache")
-                            tokenizer = GPT2Tokenizer.from_pretrained(vars.custmodpth, revision=vars.revision, cache_dir="cache")
+                            model = GPT2LMHeadModel.from_pretrained(vars.custmodpth, revision=args.revision, cache_dir="cache")
+                            tokenizer = GPT2Tokenizer.from_pretrained(vars.custmodpth, revision=args.revision, cache_dir="cache")
                        elif os.path.exists(os.path.join("models/", vars.custmodpth)):
-                            model = GPT2LMHeadModel.from_pretrained(os.path.join("models/", vars.custmodpth), revision=vars.revision, cache_dir="cache")
-                            tokenizer = GPT2Tokenizer.from_pretrained(os.path.join("models/", vars.custmodpth), revision=vars.revision, cache_dir="cache")
+                            model = GPT2LMHeadModel.from_pretrained(os.path.join("models/", vars.custmodpth), revision=args.revision, cache_dir="cache")
+                            tokenizer = GPT2Tokenizer.from_pretrained(os.path.join("models/", vars.custmodpth), revision=args.revision, cache_dir="cache")
                        else:
-                            model = GPT2LMHeadModel.from_pretrained(vars.custmodpth, revision=vars.revision, cache_dir="cache")
-                            tokenizer = GPT2Tokenizer.from_pretrained(vars.custmodpth, revision=vars.revision, cache_dir="cache")
+                            model = GPT2LMHeadModel.from_pretrained(vars.custmodpth, revision=args.revision, cache_dir="cache")
+                            tokenizer = GPT2Tokenizer.from_pretrained(vars.custmodpth, revision=args.revision, cache_dir="cache")
                    except Exception as e:
                        if("out of memory" in traceback.format_exc().lower()):
                            raise RuntimeError("One of your GPUs ran out of memory when KoboldAI tried to load your model.")
                        raise e
-                tokenizer = GPT2Tokenizer.from_pretrained(vars.custmodpth, revision=vars.revision, cache_dir="cache")
+                tokenizer = GPT2Tokenizer.from_pretrained(vars.custmodpth, revision=args.revision, cache_dir="cache")
                model.save_pretrained("models/{}".format(vars.model.replace('/', '_')), max_shard_size="500MiB")
                tokenizer.save_pretrained("models/{}".format(vars.model.replace('/', '_')))
                vars.modeldim = get_hidden_size_from_model(model)
@@ -2544,38 +2545,38 @@ def load_model(use_gpu=True, gpu_layers=None, disk_layers=None, initial_load=Fal
                        lowmem = {}
                    if(os.path.isdir(vars.custmodpth)):
                        try:
-                            tokenizer = AutoTokenizer.from_pretrained(vars.custmodpth, revision=vars.revision, cache_dir="cache", use_fast=False)
+                            tokenizer = AutoTokenizer.from_pretrained(vars.custmodpth, revision=args.revision, cache_dir="cache", use_fast=False)
                        except Exception as e:
                            try:
-                                tokenizer = AutoTokenizer.from_pretrained(vars.custmodpth, revision=vars.revision, cache_dir="cache")
+                                tokenizer = AutoTokenizer.from_pretrained(vars.custmodpth, revision=args.revision, cache_dir="cache")
                            except Exception as e:
                                try:
-                                    tokenizer = GPT2Tokenizer.from_pretrained(vars.custmodpth, revision=vars.revision, cache_dir="cache")
+                                    tokenizer = GPT2Tokenizer.from_pretrained(vars.custmodpth, revision=args.revision, cache_dir="cache")
                                except Exception as e:
-                                    tokenizer = GPT2Tokenizer.from_pretrained("gpt2", revision=vars.revision, cache_dir="cache")
+                                    tokenizer = GPT2Tokenizer.from_pretrained("gpt2", revision=args.revision, cache_dir="cache")
                        try:
-                            model     = AutoModelForCausalLM.from_pretrained(vars.custmodpth, revision=vars.revision, cache_dir="cache", **lowmem)
+                            model     = AutoModelForCausalLM.from_pretrained(vars.custmodpth, revision=args.revision, cache_dir="cache", **lowmem)
                        except Exception as e:
                            if("out of memory" in traceback.format_exc().lower()):
                                raise RuntimeError("One of your GPUs ran out of memory when KoboldAI tried to load your model.")
-                            model     = GPTNeoForCausalLM.from_pretrained(vars.custmodpth, revision=vars.revision, cache_dir="cache", **lowmem)
+                            model     = GPTNeoForCausalLM.from_pretrained(vars.custmodpth, revision=args.revision, cache_dir="cache", **lowmem)
                    elif(os.path.isdir("models/{}".format(vars.model.replace('/', '_')))):
                        try:
-                            tokenizer = AutoTokenizer.from_pretrained("models/{}".format(vars.model.replace('/', '_')), revision=vars.revision, cache_dir="cache", use_fast=False)
+                            tokenizer = AutoTokenizer.from_pretrained("models/{}".format(vars.model.replace('/', '_')), revision=args.revision, cache_dir="cache", use_fast=False)
                        except Exception as e:
                            try:
-                                tokenizer = AutoTokenizer.from_pretrained("models/{}".format(vars.model.replace('/', '_')), revision=vars.revision, cache_dir="cache")
+                                tokenizer = AutoTokenizer.from_pretrained("models/{}".format(vars.model.replace('/', '_')), revision=args.revision, cache_dir="cache")
                            except Exception as e:
                                try:
-                                    tokenizer = GPT2Tokenizer.from_pretrained("models/{}".format(vars.model.replace('/', '_')), revision=vars.revision, cache_dir="cache")
+                                    tokenizer = GPT2Tokenizer.from_pretrained("models/{}".format(vars.model.replace('/', '_')), revision=args.revision, cache_dir="cache")
                                except Exception as e:
-                                    tokenizer = GPT2Tokenizer.from_pretrained("gpt2", revision=vars.revision, cache_dir="cache")
+                                    tokenizer = GPT2Tokenizer.from_pretrained("gpt2", revision=args.revision, cache_dir="cache")
                        try:
-                            model     = AutoModelForCausalLM.from_pretrained("models/{}".format(vars.model.replace('/', '_')), revision=vars.revision, cache_dir="cache", **lowmem)
+                            model     = AutoModelForCausalLM.from_pretrained("models/{}".format(vars.model.replace('/', '_')), revision=args.revision, cache_dir="cache", **lowmem)
                        except Exception as e:
                            if("out of memory" in traceback.format_exc().lower()):
                                raise RuntimeError("One of your GPUs ran out of memory when KoboldAI tried to load your model.")
-                            model     = GPTNeoForCausalLM.from_pretrained("models/{}".format(vars.model.replace('/', '_')), revision=vars.revision, cache_dir="cache", **lowmem)
+                            model     = GPTNeoForCausalLM.from_pretrained("models/{}".format(vars.model.replace('/', '_')), revision=args.revision, cache_dir="cache", **lowmem)
                    else:
                        old_rebuild_tensor = torch._utils._rebuild_tensor
                        def new_rebuild_tensor(storage: Union[torch_lazy_loader.LazyTensor, torch.Storage], storage_offset, shape, stride):
@@ -2591,21 +2592,21 @@ def load_model(use_gpu=True, gpu_layers=None, disk_layers=None, initial_load=Fal
                        torch._utils._rebuild_tensor = new_rebuild_tensor

                        try:
-                            tokenizer = AutoTokenizer.from_pretrained(vars.model, revision=vars.revision, cache_dir="cache", use_fast=False)
+                            tokenizer = AutoTokenizer.from_pretrained(vars.model, revision=args.revision, cache_dir="cache", use_fast=False)
                        except Exception as e:
                            try:
-                                tokenizer = AutoTokenizer.from_pretrained(vars.model, revision=vars.revision, cache_dir="cache")
+                                tokenizer = AutoTokenizer.from_pretrained(vars.model, revision=args.revision, cache_dir="cache")
                            except Exception as e:
                                try:
-                                    tokenizer = GPT2Tokenizer.from_pretrained(vars.model, revision=vars.revision, cache_dir="cache")
+                                    tokenizer = GPT2Tokenizer.from_pretrained(vars.model, revision=args.revision, cache_dir="cache")
                                except Exception as e:
-                                    tokenizer = GPT2Tokenizer.from_pretrained("gpt2", revision=vars.revision, cache_dir="cache")
+                                    tokenizer = GPT2Tokenizer.from_pretrained("gpt2", revision=args.revision, cache_dir="cache")
                        try:
-                            model     = AutoModelForCausalLM.from_pretrained(vars.model, revision=vars.revision, cache_dir="cache", **lowmem)
+                            model     = AutoModelForCausalLM.from_pretrained(vars.model, revision=args.revision, cache_dir="cache", **lowmem)
                        except Exception as e:
                            if("out of memory" in traceback.format_exc().lower()):
                                raise RuntimeError("One of your GPUs ran out of memory when KoboldAI tried to load your model.")
-                            model     = GPTNeoForCausalLM.from_pretrained(vars.model, revision=vars.revision, cache_dir="cache", **lowmem)
+                            model     = GPTNeoForCausalLM.from_pretrained(vars.model, revision=args.revision, cache_dir="cache", **lowmem)

                        torch._utils._rebuild_tensor = old_rebuild_tensor

@ -2622,10 +2623,10 @@ def load_model(use_gpu=True, gpu_layers=None, disk_layers=None, initial_load=Fal
                                import huggingface_hub
                                legacy = packaging.version.parse(transformers_version) < packaging.version.parse("4.22.0.dev0")
                                # Save the config.json
-                                shutil.move(os.path.realpath(huggingface_hub.hf_hub_download(vars.model, transformers.configuration_utils.CONFIG_NAME, revision=vars.revision, cache_dir="cache", local_files_only=True, legacy_cache_layout=legacy)), os.path.join("models/{}".format(vars.model.replace('/', '_')), transformers.configuration_utils.CONFIG_NAME))
+                                shutil.move(os.path.realpath(huggingface_hub.hf_hub_download(vars.model, transformers.configuration_utils.CONFIG_NAME, revision=args.revision, cache_dir="cache", local_files_only=True, legacy_cache_layout=legacy)), os.path.join("models/{}".format(vars.model.replace('/', '_')), transformers.configuration_utils.CONFIG_NAME))
                                if(utils.num_shards is None):
                                    # Save the pytorch_model.bin of an unsharded model
-                                    shutil.move(os.path.realpath(huggingface_hub.hf_hub_download(vars.model, transformers.modeling_utils.WEIGHTS_NAME, revision=vars.revision, cache_dir="cache", local_files_only=True, legacy_cache_layout=legacy)), os.path.join("models/{}".format(vars.model.replace('/', '_')), transformers.modeling_utils.WEIGHTS_NAME))
+                                    shutil.move(os.path.realpath(huggingface_hub.hf_hub_download(vars.model, transformers.modeling_utils.WEIGHTS_NAME, revision=args.revision, cache_dir="cache", local_files_only=True, legacy_cache_layout=legacy)), os.path.join("models/{}".format(vars.model.replace('/', '_')), transformers.modeling_utils.WEIGHTS_NAME))
                                else:
                                    with open(utils.from_pretrained_index_filename) as f:
                                        map_data = json.load(f)
@ -2634,7 +2635,7 @@ def load_model(use_gpu=True, gpu_layers=None, disk_layers=None, initial_load=Fal
                                    shutil.move(os.path.realpath(utils.from_pretrained_index_filename), os.path.join("models/{}".format(vars.model.replace('/', '_')), transformers.modeling_utils.WEIGHTS_INDEX_NAME))
                                    # Then save the pytorch_model-#####-of-#####.bin files
                                    for filename in filenames:
-                                        shutil.move(os.path.realpath(huggingface_hub.hf_hub_download(vars.model, filename, revision=vars.revision, cache_dir="cache", local_files_only=True, legacy_cache_layout=legacy)), os.path.join("models/{}".format(vars.model.replace('/', '_')), filename))
+                                        shutil.move(os.path.realpath(huggingface_hub.hf_hub_download(vars.model, filename, revision=args.revision, cache_dir="cache", local_files_only=True, legacy_cache_layout=legacy)), os.path.join("models/{}".format(vars.model.replace('/', '_')), filename))
                            shutil.rmtree("cache/")

                if(vars.badwordsids is vars.badwordsids_default and vars.model_type not in ("gpt2", "gpt_neo", "gptj")):
@ -2680,7 +2681,7 @@ def load_model(use_gpu=True, gpu_layers=None, disk_layers=None, initial_load=Fal
        
        else:
            from transformers import GPT2Tokenizer
-            tokenizer = GPT2Tokenizer.from_pretrained("gpt2", revision=vars.revision, cache_dir="cache")
+            tokenizer = GPT2Tokenizer.from_pretrained("gpt2", revision=args.revision, cache_dir="cache")
    else:
        from transformers import PreTrainedModel
        from transformers import modeling_utils
@ -2779,11 +2780,11 @@ def load_model(use_gpu=True, gpu_layers=None, disk_layers=None, initial_load=Fal
        # If we're running Colab or OAI, we still need a tokenizer.
        if(vars.model in ("Colab", "API", "CLUSTER")):
            from transformers import GPT2Tokenizer
-            tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B", revision=vars.revision, cache_dir="cache")
+            tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B", revision=args.revision, cache_dir="cache")
            loadsettings()
        elif(vars.model == "OAI"):
            from transformers import GPT2Tokenizer
-            tokenizer = GPT2Tokenizer.from_pretrained("gpt2", revision=vars.revision, cache_dir="cache")
+            tokenizer = GPT2Tokenizer.from_pretrained("gpt2", revision=args.revision, cache_dir="cache")
            loadsettings()
        # Load the TPU backend if requested
        elif(vars.use_colab_tpu or vars.model in ("TPUMeshTransformerGPTJ", "TPUMeshTransformerGPTNeoX")):
@ -3040,7 +3041,7 @@ def lua_decode(tokens):
    if("tokenizer" not in globals()):
        from transformers import GPT2Tokenizer
        global tokenizer
-        tokenizer = GPT2Tokenizer.from_pretrained("gpt2", revision=vars.revision, cache_dir="cache")
+        tokenizer = GPT2Tokenizer.from_pretrained("gpt2", revision=args.revision, cache_dir="cache")
    return utils.decodenewlines(tokenizer.decode(tokens))

 #==================================================================#
@ -3052,7 +3053,7 @@ def lua_encode(string):
    if("tokenizer" not in globals()):
        from transformers import GPT2Tokenizer
        global tokenizer
-        tokenizer = GPT2Tokenizer.from_pretrained("gpt2", revision=vars.revision, cache_dir="cache")
+        tokenizer = GPT2Tokenizer.from_pretrained("gpt2", revision=args.revision, cache_dir="cache")
    return tokenizer.encode(utils.encodenewlines(string), max_length=int(4e9), truncation=True)

 #==================================================================#
@ -4201,19 +4202,19 @@ def actionsubmit(data, actionmode=0, force_submit=False, force_prompt_gen=False,
                try:
                    if(os.path.isdir(tokenizer_id)):
                        try:
-                            tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, revision=vars.revision, cache_dir="cache")
+                            tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, revision=args.revision, cache_dir="cache")
                        except:
-                            tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, revision=vars.revision, cache_dir="cache", use_fast=False)
+                            tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, revision=args.revision, cache_dir="cache", use_fast=False)
                    elif(os.path.isdir("models/{}".format(tokenizer_id.replace('/', '_')))):
                        try:
-                            tokenizer = AutoTokenizer.from_pretrained("models/{}".format(tokenizer_id.replace('/', '_')), revision=vars.revision, cache_dir="cache")
+                            tokenizer = AutoTokenizer.from_pretrained("models/{}".format(tokenizer_id.replace('/', '_')), revision=args.revision, cache_dir="cache")
                        except:
-                            tokenizer = AutoTokenizer.from_pretrained("models/{}".format(tokenizer_id.replace('/', '_')), revision=vars.revision, cache_dir="cache", use_fast=False)
+                            tokenizer = AutoTokenizer.from_pretrained("models/{}".format(tokenizer_id.replace('/', '_')), revision=args.revision, cache_dir="cache", use_fast=False)
                    else:
                        try:
-                            tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, revision=vars.revision, cache_dir="cache")
+                            tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, revision=args.revision, cache_dir="cache")
                        except:
-                            tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, revision=vars.revision, cache_dir="cache", use_fast=False)
+                            tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, revision=args.revision, cache_dir="cache", use_fast=False)
                except:
                    logger.warning(f"Unknown tokenizer {repr(tokenizer_id)}")
                vars.api_tokenizer_id = tokenizer_id
@ -4625,7 +4626,7 @@ def calcsubmitbudget(actionlen, winfo, mem, anotetxt, actions, submission=None,
    if("tokenizer" not in globals()):
        from transformers import GPT2Tokenizer
        global tokenizer
-        tokenizer = GPT2Tokenizer.from_pretrained("gpt2", revision=vars.revision, cache_dir="cache")
+        tokenizer = GPT2Tokenizer.from_pretrained("gpt2", revision=args.revision, cache_dir="cache")

    lnheader = len(tokenizer._koboldai_header)

@ -5272,15 +5273,21 @@ def sendtocluster(txt, min, max):
    cluster_metadata = {
        'prompt': txt,
        'params': reqdata,
-        'api_key': vars.apikey,
        'models': vars.cluster_requested_models,
-    }
+        'trusted_workers': False,
+    }    
+    client_agent = "KoboldAI:1.19.3:koboldai.org"
+    cluster_headers = {
+        'apikey': vars.apikey,
+        "Client-Agent": client_agent
+    }    
    logger.debug(f"Horde Payload: {cluster_metadata}")
    try:
        # Create request
        req = requests.post(
-            vars.colaburl[:-8] + "/api/v1/generate/sync",
+            vars.colaburl[:-8] + "/api/v2/generate/text/async",
            json=cluster_metadata,
+            headers=cluster_headers,
        )
    except requests.exceptions.ConnectionError:
        errmsg = f"Horde unavailable. Please try again later"
@ -5308,13 +5315,76 @@ def sendtocluster(txt, min, max):
        emit('from_server', {'cmd': 'errmsg', 'data': errmsg}, broadcast=True)
        set_aibusy(0)
        return
-    gen_servers = [(cgen['server_name'],cgen['server_id']) for cgen in js]
-    logger.info(f"Generations by: {gen_servers}")
+
+    request_id = js["id"]
+    logger.debug("Horde Request ID: {}".format(request_id))
+    
+    cluster_agent_headers = {
+        "Client-Agent": client_agent
+    }            
+    finished = False
+
+    while not finished:
+        try: 
+            req = requests.get(vars.colaburl[:-8] + "/api/v2/generate/text/status/" + request_id, headers=cluster_agent_headers)
+        except requests.exceptions.ConnectionError:
+            errmsg = f"Horde unavailable. Please try again later"
+            logger.error(errmsg)
+            emit('from_server', {'cmd': 'errmsg', 'data': errmsg}, broadcast=True)
+            set_aibusy(0)
+            return
+
+        if not req.ok:
+            errmsg = f"KoboldAI API Error: Failed to get a standard reply from the Horde. Please check the console."
+            logger.error(req.text)
+            emit('from_server', {'cmd': 'errmsg', 'data': errmsg}, broadcast=True)
+            set_aibusy(0)
+            return
+
+        try:
+            req_status = req.json()
+        except requests.exceptions.JSONDecodeError:
+            errmsg = f"Unexpected message received from the KoboldAI Horde: '{req.text}'"
+            logger.error(errmsg)
+            emit('from_server', {'cmd': 'errmsg', 'data': errmsg}, broadcast=True)
+            set_aibusy(0)
+            return
+
+        if "done" not in req_status:
+            errmsg = f"Unexpected response received from the KoboldAI Horde: '{req_status}'"
+            logger.error(errmsg)
+            emit('from_server', {'cmd': 'errmsg', 'data': errmsg}, broadcast=True)
+            set_aibusy(0)
+            return
+
+        finished = req_status["done"]
+
+        if not finished:
+            logger.debug(req_status)
+            time.sleep(1)
+    
+    logger.debug("Last Horde Status Message: {}".format(req_status))
+    if req_status["faulted"]:
+        errmsg = "Horde Text generation faulted! Please try again"
+        logger.error(errmsg)
+        emit('from_server', {'cmd': 'errmsg', 'data': errmsg}, broadcast=True)
+        set_aibusy(0)
+        return
+    
+    generations = req_status['generations']
+    gen_workers = [(cgen['worker_name'],cgen['worker_id']) for cgen in generations]
+    logger.info(f"Generations by: {gen_workers}")
+
+
+
+
+
+
    # Just in case we want to announce it to the user
-    if len(js) == 1:        
-        warnmsg = f"Text generated by {js[0]['server_name']}"
+    if len(generations) == 1:        
+        warnmsg = f"Text generated by {gen_workers[0][0]}"
        emit('from_server', {'cmd': 'warnmsg', 'data': warnmsg}, broadcast=True)
-    genout = [cgen['text'] for cgen in js]
+    genout = [cgen['text'] for cgen in generations]

    for i in range(vars.numseqs):
        vars.lua_koboldbridge.outputs[i+1] = genout[i]
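
The `sendtocluster` change above replaces a single blocking call with a submit-then-poll flow: post the job, read back its `id`, then query the status endpoint until `done` is true and check `faulted`. A minimal sketch of that polling step follows. The function name `poll_until_done`, its parameters, and the `max_polls` cap are assumptions made for illustration (the loop in the diff has no upper bound), and the status source is a plain callable so the sketch runs without network access.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 1.0,
    max_polls: int = 600,  # hypothetical cap; the diff's loop polls indefinitely
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status() until the reply reports done=True, then return it."""
    for _ in range(max_polls):
        status = fetch_status()
        # A reply without a "done" key is treated as malformed, as in the diff.
        if "done" not in status:
            raise ValueError(f"Unexpected status reply: {status!r}")
        if status["done"]:
            if status.get("faulted"):
                raise RuntimeError("Generation faulted")
            return status
        sleep(interval)
    raise TimeoutError(f"Job not finished after {max_polls} polls")


# Canned replies stand in for the HTTP GET on the status endpoint.
replies = iter([
    {"done": False},
    {"done": False},
    {
        "done": True,
        "faulted": False,
        "generations": [{"text": "Hello", "worker_name": "w1", "worker_id": "1"}],
    },
])
result = poll_until_done(lambda: next(replies), sleep=lambda seconds: None)
texts = [gen["text"] for gen in result["generations"]]
```

Passing the fetcher and the sleep function as arguments keeps the retry logic separate from the HTTP and UI code, which makes the loop easy to exercise with canned replies.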
--- a/colab/GPU.ipynb
+++ b/colab/GPU.ipynb
@ -1,23 +1,4 @@
 {
-  "nbformat": 4,
-  "nbformat_minor": 0,
-  "metadata": {
-    "colab": {
-      "name": "ColabKobold GPU",
-      "private_outputs": true,
-      "provenance": [],
-      "collapsed_sections": [],
-      "include_colab_link": true
-    },
-    "kernelspec": {
-      "display_name": "Python 3",
-      "name": "python3"
-    },
-    "language_info": {
-      "name": "python"
-    },
-    "accelerator": "GPU"
-  },
  "cells": [
    {
      "cell_type": "markdown",
@ -35,60 +16,99 @@
        "id": "kX9y5koxa58q"
      },
      "source": [
+        "## [You can get faster generations and higher context with our Koboldcpp Notebook](https://koboldai.org/colabcpp)\n",
+        "\n",
        "# Welcome to KoboldAI on Google Colab, GPU Edition!\n",
        "KoboldAI is a powerful and easy way to use a variety of AI based text generation experiences. You can use it to write stories, blog posts, play a text adventure game, use it like a chatbot and more! In some cases it might even help you with an assignment or programming task (But always make sure the information the AI mentions is correct, it loves to make stuff up).\n",
        "\n",
         "For more information about KoboldAI check out our GitHub readme: https://github.com/KoboldAI/KoboldAI-Client/blob/main/readme.md\n",
        "\n",
-        "For the larger AI models (That are typically more coherent) check out our **[TPU edition](https://colab.research.google.com/github/KoboldAI/KoboldAI-Client/blob/main/colab/TPU.ipynb)**!"
+        "---\n",
+        "## How to load KoboldAI: Everything you need to know\n",
+        "1. On a phone? First put your browser in desktop mode because of a Google Colab bug. Otherwise nothing will happen when you click the play button. Then tap the play button next to \"<-- Tap this if you play on Mobile\"; you will see an audio player. Keep the audio player playing so Colab does not get shut down in the background.\n",
+        "2. Select the desired model; a description of all the available models is further down the page.\n",
+        "3. Click the play button next to \"<-- Select your model below and then click this to start KoboldAI\".\n",
+        "4. Got a message saying no accelerator is available? Click cancel, and try again in a few minutes. If repeated attempts still do not get you a session, try at a different time of day; Colab can be busy or your priority may have been lowered by frequent usage.\n",
+        "5. After everything is done loading you will get a link that you can use to open KoboldAI. In case of Localtunnel you will also be warned that some people are abusing Localtunnel for phishing, once you acknowledge this warning you will be taken to KoboldAI's interface. If you picked Cloudflare and get a 1033 error refresh the error page after waiting one minute.\n",
+        "\n",
+        "---\n",
+        "\n",
+        "Further down the page you can find descriptions of the models, and tips to get the most out of your Google Colab experience.\n",
+        "\n",
+        "Make sure to keep this page open while you are using KoboldAI, and check back regularly to see if you got a Captcha. Failure to complete the captchas in time can result in termination of your session or a lower priority towards the TPUs.\n",
+        "\n",
+        "Firefox users need to disable the enhanced tracking protection or use a different browser in order to be able to use Google Colab without errors (This is not something we can do anything about, the cookie blocker breaks the Google Drive integration because it uses different domains)."
      ]
    },
    {
      "cell_type": "code",
+      "execution_count": null,
      "metadata": {
        "id": "ewkXkyiFP2Hq"
      },
+      "outputs": [],
      "source": [
        "#@title <-- Tap this if you play on Mobile { display-mode: \"form\" }\n",
        "%%html\n",
        "<b>Press play on the music player to keep the tab alive, then start KoboldAI below (Uses only 13MB of data)</b><br/>\n",
-        "<audio src=\"https://henk.tech/colabkobold/silence.m4a\" controls>"
-      ],
-      "execution_count": null,
-      "outputs": []
+        "<audio src=\"https://raw.githubusercontent.com/KoboldAI/KoboldAI-Client/main/colab/silence.m4a\" controls>"
+      ]
    },
    {
      "cell_type": "code",
+      "execution_count": null,
      "metadata": {
-        "id": "lVftocpwCoYw",
-        "cellView": "form"
+        "cellView": "form",
+        "id": "lVftocpwCoYw"
      },
+      "outputs": [],
      "source": [
        "#@title <b><-- Select your model below and then click this to start KoboldAI</b>\n",
        "#@markdown You can find a description of the models below along with instructions on how to start KoboldAI.\n",
        "\n",
-        "Model = \"Nerys 2.7B\" #@param [\"Nerys 2.7B\", \"AID 2.7B\", \"Erebus 2.7B\", \"Janeway 2.7B\", \"Picard 2.7B\", \"Horni LN 2.7B\", \"Horni 2.7B\", \"Shinen 2.7B\", \"OPT 2.7B\", \"Fairseq Dense 2.7B\", \"Neo 2.7B\"] {allow-input: true}\n",
+        "Model = \"Nerys V2 6B\" #@param [\"Tiefighter 13B (United)\", \"Echidna 13B (United)\", \"HoloMax 13B (United)\", \"Emerhyst 13B (United)\", \"MythoMax 13B (United)\", \"Huginn 13B (United)\", \"Chronos 13B (United)\", \"Airoboros M2.0 13B (United)\", \"Holodeck 13B (United)\", \"Spring Dragon 13B (United)\", \"Nerys V2 6B\", \"Skein 6B\", \"Janeway 6B\", \"Adventure 6B\", \"Nerys 2.7B\", \"AID 2.7B\", \"Janeway 2.7B\", \"Picard 2.7B\", \"OPT 2.7B\", \"Fairseq Dense 2.7B\", \"Neo 2.7B\"] {allow-input: true}\n",
+        "Revision = \"\" #@param [\"\"]{allow-input: true}\n",
        "Version = \"Official\" #@param [\"Official\", \"United\"] {allow-input: true}\n",
-        "Provider = \"Localtunnel\" #@param [\"Localtunnel\", \"Cloudflare\"]\n",
-		"use_google_drive = True #@param {type:\"boolean\"}\n",
+        "Provider = \"Cloudflare\" #@param [\"Localtunnel\", \"Cloudflare\"]\n",
+        "use_google_drive = True #@param {type:\"boolean\"}\n",
+        "\n",
+        "import os\n",
+        "if not os.path.isfile(\"/opt/bin/nvidia-smi\"):\n",
+        "  raise RuntimeError(\"⚠️Colab did not give you a GPU due to usage limits, this can take a few hours before they let you back in. Check out https://lite.koboldai.net for a free alternative (that does not provide an API link but can load KoboldAI saves and chat cards) or subscribe to Colab Pro for immediate access.⚠️\")\n",
        "\n",
        "!nvidia-smi\n",
        "from google.colab import drive\n",
        "if use_google_drive:\n",
-		"  drive.mount('/content/drive/')\n",
-		"else:\n",
-		"  import os\n",
-		"  if not os.path.exists(\"/content/drive\"):\n",
-		"    os.mkdir(\"/content/drive\")\n",
-		"  if not os.path.exists(\"/content/drive/MyDrive/\"):\n",
-		"    os.mkdir(\"/content/drive/MyDrive/\")\n",
+        "  drive.mount('/content/drive/')\n",
+        "else:\n",
+        "  import os\n",
+        "  if not os.path.exists(\"/content/drive\"):\n",
+        "    os.mkdir(\"/content/drive\")\n",
+        "  if not os.path.exists(\"/content/drive/MyDrive/\"):\n",
+        "    os.mkdir(\"/content/drive/MyDrive/\")\n",
        "\n",
-        "if Model == \"Nerys 2.7B\":\n",
-        "  Model = \"KoboldAI/fairseq-dense-2.7B-Nerys\"\n",
+        "if Model == \"Nerys V2 6B\":\n",
+        "  Model = \"KoboldAI/OPT-6B-nerys-v2\"\n",
        "  path = \"\"\n",
        "  download = \"\"\n",
-        "elif Model == \"Erebus 2.7B\":\n",
-        "  Model = \"KoboldAI/OPT-2.7B-Erebus\"\n",
+        "elif Model == \"Skein 6B\":\n",
+        "  Model = \"KoboldAI/GPT-J-6B-Skein\"\n",
+        "  path = \"\"\n",
+        "  download = \"\"\n",
+        "elif Model == \"Janeway 6B\":\n",
+        "  Model = \"KoboldAI/GPT-J-6B-Janeway\"\n",
+        "  path = \"\"\n",
+        "  download = \"\"\n",
+        "elif Model == \"Adventure 6B\":\n",
+        "  Model = \"KoboldAI/GPT-J-6B-Adventure\"\n",
+        "  path = \"\"\n",
+        "  download = \"\"\n",
+        "elif Model == \"Shinen 6B\":\n",
+        "  Model = \"KoboldAI/GPT-J-6B-Shinen\"\n",
+        "  path = \"\"\n",
+        "  download = \"\"\n",
+        "elif Model == \"Nerys 2.7B\":\n",
+        "  Model = \"KoboldAI/fairseq-dense-2.7B-Nerys\"\n",
        "  path = \"\"\n",
        "  download = \"\"\n",
        "elif Model == \"Janeway 2.7B\":\n",
@ -103,18 +123,6 @@
        "  Model = \"KoboldAI/GPT-Neo-2.7B-AID\"\n",
        "  path = \"\"\n",
        "  download = \"\"\n",
-        "elif Model == \"Horni LN 2.7B\":\n",
-        "  Model = \"KoboldAI/GPT-Neo-2.7B-Horni-LN\"\n",
-        "  path = \"\"\n",
-        "  download = \"\"\n",
-        "elif Model == \"Horni 2.7B\":\n",
-        "  Model = \"KoboldAI/GPT-Neo-2.7B-Horni\"\n",
-        "  path = \"\"\n",
-        "  download = \"\"\n",
-        "elif Model == \"Shinen 2.7B\":\n",
-        "  Model = \"KoboldAI/GPT-Neo-2.7B-Shinen\"\n",
-        "  path = \"\"\n",
-        "  download = \"\"\n",
        "elif Model == \"Fairseq Dense 2.7B\":\n",
        "  Model = \"KoboldAI/fairseq-dense-2.7B\"\n",
        "  path = \"\"\n",
@ -127,55 +135,95 @@
        "  Model = \"EleutherAI/gpt-neo-2.7B\"\n",
        "  path = \"\"\n",
        "  download = \"\"\n",
+        "elif Model == \"Tiefighter 13B (United)\":\n",
+        "  Model = \"KoboldAI/LLaMA2-13B-Tiefighter\"\n",
+        "  path = \"\"\n",
+        "  download = \"\"\n",
+        "  Version = \"United\"\n",
+        "elif Model == \"Echidna 13B (United)\":\n",
+        "  Model = \"NeverSleep/Echidna-13b-v0.3\"\n",
+        "  path = \"\"\n",
+        "  download = \"\"\n",
+        "  Version = \"United\"\n",
+        "elif Model == \"Huginn 13B (United)\":\n",
+        "  Model = \"The-Face-Of-Goonery/Huginn-13b-v1.2\"\n",
+        "  path = \"\"\n",
+        "  download = \"\"\n",
+        "  Version = \"United\"\n",
+        "elif Model == \"Chronos 13B (United)\":\n",
+        "  Model = \"elinas/chronos-13b-v2\"\n",
+        "  path = \"\"\n",
+        "  download = \"\"\n",
+        "  Version = \"United\"\n",
+        "elif Model == \"Airoboros M2.0 13B (United)\":\n",
+        "  Model = \"jondurbin/airoboros-l2-13b-gpt4-m2.0\"\n",
+        "  path = \"\"\n",
+        "  download = \"\"\n",
+        "  Version = \"United\"\n",
+        "elif Model == \"Emerhyst 13B (United)\":\n",
+        "  Model = \"Undi95/Emerhyst-13B\"\n",
+        "  path = \"\"\n",
+        "  download = \"\"\n",
+        "  Version = \"United\"\n",
+        "elif Model == \"MythoMax 13B (United)\":\n",
+        "  Model = \"Gryphe/MythoMax-L2-13b\"\n",
+        "  Revision = \"\"\n",
+        "  path = \"\"\n",
+        "  download = \"\"\n",
+        "  Version = \"United\"\n",
+        "elif Model == \"Spring Dragon 13B (United)\":\n",
+        "  Model = \"Henk717/spring-dragon\"\n",
+        "  path = \"\"\n",
+        "  download = \"\"\n",
+        "  Version = \"United\"\n",
+        "elif Model == \"Holodeck 13B (United)\":\n",
+        "  Model = \"KoboldAI/LLAMA2-13B-Holodeck-1\"\n",
+        "  path = \"\"\n",
+        "  download = \"\"\n",
+        "  Version = \"United\"\n",
+        "elif Model == \"HoloMax 13B (United)\":\n",
+        "  Model = \"KoboldAI/LLaMA2-13B-Holomax\"\n",
+        "  path = \"\"\n",
+        "  download = \"\"\n",
+        "  Version = \"United\"\n",
        "\n",
        "if Provider == \"Localtunnel\":\n",
        "  tunnel = \"--localtunnel yes\"\n",
        "else:\n",
        "  tunnel = \"\"\n",
        "\n",
-        "!wget https://koboldai.org/ckds -O - | bash /dev/stdin -m $Model -g $Version $tunnel"
-      ],
-      "execution_count": null,
-      "outputs": []
+        "!wget https://koboldai.org/ckds -O - | bash /dev/stdin -m $Model -g $Version $Revision $tunnel"
+      ]
    },
    {
      "cell_type": "markdown",
+      "metadata": {
+        "id": "Lrm840I33hkC"
+      },
      "source": [
        "# GPU Edition Model Descriptions\n",
        "| Model | Style | Description |\n",
        "| --- | --- | --- |\n",
         "| [Nerys](https://huggingface.co/KoboldAI/fairseq-dense-2.7B-Nerys) by Mr Seeker | Novel/Adventure | Nerys is a hybrid model based on Pike (A newer Janeway), on top of the Pike dataset you also get some Light Novels, Adventure mode support and a little bit of Shinen thrown in the mix. The end result is a very diverse model that is heavily biased towards SFW novel writing, but one that can go beyond its novel training and make for an excellent adventure model too. Adventure mode is best played from a second person perspective, but can be played in first or third person as well. Novel writing can be done best from the first or third person. |\n",
-        "| [Erebus](https://huggingface.co/KoboldAI/OPT-2.7B-Erebus) by Mr Seeker | NSFW | Erebus is our community's flagship NSFW model, being a combination of multiple large datasets that include Literotica, Shinen and erotic novels from Nerys and featuring thourough tagging support it covers the vast majority of erotic writing styles. This model is capable of replacing both the Lit and Shinen models in terms of content and style and has been well received as (one of) the best NSFW models out there. If you wish to use this model for commercial or non research usage we recommend choosing the 20B version as that one is not subject to the restrictive OPT license. |\n",
+        "| [Tiefighter 13B by KoboldAI](https://huggingface.co/KoboldAI/LLaMA2-13B-Tiefighter) | Hybrid | Tiefighter 13B is a very versatile fiction hybrid: it can write, chat and play adventure games, and can also answer regular instructions (although we do not recommend this model for factual use due to its fictional nature). This is an excellent starting model; for the best results, avoid second person writing in your chats unless you want it to become a text adventure. |\n",
        "| [Janeway](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Janeway) by Mr Seeker | Novel | Janeway is a model created from Picard's dataset combined with a brand new collection of ebooks. This model is trained on 20% more content than Picard and has been trained on literature from various genres. Although the model is mainly focussed on SFW, romantic scenes might involve a degree of nudity. |\n",
         "| [Picard](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Picard) by Mr Seeker | Novel | Picard is a model trained for SFW Novels based on Neo 2.7B. It is focused on Novel style writing without the NSFW bias. While the name suggests a sci-fi model this model is designed for Novels of a variety of genres. It is meant to be used in KoboldAI's regular mode. |\n",
         "| [AID](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-AID) by melastacho | Adventure | Also known as Adventure 2.7B this is a clone of the AI Dungeon Classic model and is best known for the epic wacky adventures that AI Dungeon Classic players love. |\n",
-        "| [Horni LN](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni-LN) by finetune | Novel | This model is based on Horni 2.7B and retains its NSFW knowledge, but was then further biased towards SFW novel stories. If you seek a balance between a SFW Novel model and a NSFW model this model should be a good choice. |\n",
-        "| [Horni](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni) by finetune | NSFW | This model is tuned on Literotica to produce a Novel style model biased towards NSFW content. Can still be used for SFW stories but will have a bias towards NSFW content. It is meant to be used in KoboldAI's regular mode. |\n",
-        "| [Shinen](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Shinen) by Mr Seeker | NSFW | Shinen is an alternative to the Horni model designed to be more explicit. If Horni is to tame for you Shinen might produce better results. While it is a Novel model it is unsuitable for SFW stories due to its heavy NSFW bias. Shinen will not hold back. It is meant to be used in KoboldAI's regular mode. |\n",
        "| [OPT](https://huggingface.co/facebook/opt-2.7b) by Metaseq | Generic | OPT is considered one of the best base models as far as content goes, its behavior has the strengths of both GPT-Neo and Fairseq Dense. Compared to Neo duplicate and unnecessary content has been left out, while additional literature was added in similar to the Fairseq Dense model. The Fairseq Dense model however lacks the broader data that OPT does have. The biggest downfall of OPT is its license, which prohibits any commercial usage, or usage beyond research purposes. |\n",
        "| [Fairseq Dense](https://huggingface.co/KoboldAI/fairseq-dense-2.7B) | Generic | Trained by Facebook Researchers this model stems from the MOE research project within Fairseq. This particular version has been converted by us for use in KoboldAI. It is known to be on par with the larger models from EleutherAI and considered as better for pop culture and language tasks. Because the model has never seen a new line (enter) it may perform worse on formatting and paragraphing. Compared to other models the dataset focuses primarily on literature and contains little else. |\n",
+        "| [MythoMax 13B](https://huggingface.co/TheBloke/MythoMax-L2-13B-GPTQ) by Gryphe | Roleplay | An improved, potentially even perfected variant of MythoMix, my MythoLogic-L2 and Huginn merge using a highly experimental tensor type merge technique¹. |\n",
+        "| [Holomax 13B by KoboldAI](https://huggingface.co/KoboldAI/LLaMA2-13B-Holomax) | Adventure | This is an expansion merge to the well-praised MythoMax model from Gryphe (60%) using MrSeeker's KoboldAI Holodeck model (40%). The goal of this model is to enhance story-writing capabilities while preserving the desirable traits of the MythoMax model as much as possible (It does limit chat reply length). |\n",
+        "| [Airoboros 13B](https://huggingface.co/jondurbin/airoboros-13b) by Jon Durbin | Generic | This is an instruction fine-tuned llama-2 model, using synthetic instructions generated by airoboros⁵. |\n",
+        "| [Emerhyst 13B](https://huggingface.co/Undi95/Emerhyst-13B) by Undi | Roleplay | An attempt using BlockMerge_Gradient to get better result. In addition, LimaRP v3 was used⁷. |\n",
+        "| [Chronos 13B](https://huggingface.co/elinas/chronos-13b) by Elinas | Generic | This model is primarily focused on chat, roleplay, and storywriting, but can accomplish other tasks such as simple reasoning and coding. Chronos generates very long outputs with coherent text, largely due to the human inputs it was trained on. |\n",
+        "| [Spring Dragon by Henk717](https://huggingface.co/Henk717/spring-dragon) | Adventure | This model is a recreation attempt of the AI Dungeon 2 Dragon model. To achieve this, the \"text_adventures.txt\" dataset was used, which was bundled with the original AI Dungeon 2 GitHub release prior to the online service. It is worth noting that the same dataset file was used to create the Dragon model, where Dragon is a GPT-3 175B Davinci model from 2020. |\n",
+        "| [Holodeck By KoboldAI](https://huggingface.co/KoboldAI/LLAMA2-13B-Holodeck-1) | Adventure | LLAMA2 13B-Holodeck is a finetune created using Meta's llama 2 model. The training data contains around 3000 ebooks in various genres. Most parts of the dataset have been prepended using the following text: [Genre: <genre1>, <genre2>] |\n",
        "| [Neo](https://huggingface.co/EleutherAI/gpt-neo-2.7B) by EleutherAI | Generic | This is the base model for all the other 2.7B models, it is best used when you have a use case that we have no other models available for, such as writing blog articles or programming. It can also be a good basis for the experience of some of the softprompts if your softprompt is not about a subject the other models cover. |\n",
        "\n",
-        "# [TPU Edition Model Descriptions](https://colab.research.google.com/github/KoboldAI/KoboldAI-Client/blob/main/colab/TPU.ipynb)\n",
-        "\n",
-        "| Model | Style | Description |\n",
-        "| --- | --- | --- |\n",
-        "| [Nerys](https://huggingface.co/KoboldAI/fairseq-dense-13B-Nerys) by Mr Seeker | Novel/Adventure | Nerys is a hybrid model based on Pike (A newer Janeway), on top of the Pike dataset you also get some Light Novels, Adventure mode support and a little bit of Shinen thrown in the mix. The end result is a very diverse model that is heavily biased towards SFW novel writing, but one that can go beyond its novel training and make for an excellent adventure model to. Adventure mode is best played from a second person perspective, but can be played in first or third person as well. Novel writing can be done best from the first or third person. |\n",
-        "| [Erebus](https://huggingface.co/KoboldAI/OPT-13B-Erebus) by Mr Seeker | NSFW | Erebus is our community's flagship NSFW model, being a combination of multiple large datasets that include Literotica, Shinen and erotic novels from Nerys and featuring thourough tagging support it covers the vast majority of erotic writing styles. This model is capable of replacing both the Lit and Shinen models in terms of content and style and has been well received as (one of) the best NSFW models out there. If you wish to use this model for commercial or non research usage we recommend choosing the 20B version as that one is not subject to the restrictive OPT license. |\n",
-        "| [Janeway](https://huggingface.co/KoboldAI/fairseq-dense-13B-Janeway) by Mr Seeker | Novel | Janeway is a model created from Picard's dataset combined with a brand new collection of ebooks. This model is trained on 20% more content than Picard and has been trained on literature from various genres. Although the model is mainly focussed on SFW, romantic scenes might involve a degree of nudity. |\n",
-        "| [Shinen](https://huggingface.co/KoboldAI/fairseq-dense-13B-Shinen) by Mr Seeker | NSFW | Shinen is an NSFW model trained on a variety of stories from the website Sexstories it contains many different kinks. It has been merged into the larger (and better) Erebus model. |\n",
-        "| [Skein](https://huggingface.co/KoboldAI/GPT-J-6B-Skein) by VE\\_FORBRYDERNE | Adventure | Skein is best used with Adventure mode enabled, it consists of a 4 times larger adventure dataset than the Adventure model making it excellent for text adventure gaming. On top of that it also consists of light novel training further expanding its knowledge and writing capabilities. It can be used with the You filter bias if you wish to write Novels with it, but dedicated Novel models can perform better for this task. |\n",
-        "| [Adventure](https://huggingface.co/KoboldAI/GPT-J-6B-Adventure) by VE\\_FORBRYDERNE | Adventure | Adventure is a 6B model designed to mimick the behavior of AI Dungeon. It is exclusively for Adventure Mode and can take you on the epic and wackey adventures that AI Dungeon players love. It also features the many tropes of AI Dungeon as it has been trained on very similar data. It must be used in second person (You). |\n",
-        "| [Lit](https://huggingface.co/hakurei/lit-6B) ([V2](https://huggingface.co/hakurei/litv2-6B-rev3)) by Haru | NSFW | Lit is a great NSFW model trained by Haru on both a large set of Literotica stories and high quality novels along with tagging support. Creating a high quality model for your NSFW stories. This model is exclusively a novel model and is best used in third person. |\n",
-        "| [OPT](https://huggingface.co/facebook/opt-13b) by Metaseq | Generic | OPT is considered one of the best base models as far as content goes, its behavior has the strengths of both GPT-Neo and Fairseq Dense. Compared to Neo duplicate and unnecessary content has been left out, while additional literature was added in similar to the Fairseq Dense model. The Fairseq Dense model however lacks the broader data that OPT does have. The biggest downfall of OPT is its license, which prohibits any commercial usage, or usage beyond research purposes. |\n",
-        "| [Neo(X)](https://huggingface.co/EleutherAI/gpt-neox-20b) by EleutherAI | Generic | NeoX is the largest EleutherAI model currently available, being a generic model it is not particularly trained towards anything and can do a variety of writing, Q&A and coding tasks. 20B's performance is closely compared to the 13B models and it is worth trying both especially if you have a task that does not involve english writing. Its behavior will be similar to the GPT-J-6B model since they are trained on the same dataset but with more sensitivity towards repetition penalty and with more knowledge. |\n",
-        "| [Fairseq Dense](https://huggingface.co/KoboldAI/fairseq-dense-13B) | Generic | Trained by Facebook Researchers this model stems from the MOE research project within Fairseq. This particular version has been converted by us for use in KoboldAI. It is known to be on par with the larger 20B model from EleutherAI and considered as better for pop culture and language tasks. Because the model has never seen a new line (enter) it may perform worse on formatting and paragraphing. Compared to other models the dataset focuses primarily on literature and contains little else. |\n",
-        "| [GPT-J-6B](https://huggingface.co/EleutherAI/gpt-j-6B) by EleutherAI | Generic | This model serves as the basis for most other 6B models (Some being based on Fairseq Dense instead). Being trained on the Pile and not biased towards anything in particular it is suitable for a variety of tasks such as writing, Q&A and coding tasks. You will likely get better result with larger generic models or finetuned models. |\n",
        "\n",
        "| Style     | Description                                                  |\n",
        "| --------- | ------------------------------------------------------------ |\n",
        "| Novel     | For regular story writing, not compatible with Adventure mode or other specialty modes. |\n",
-        "| NSFW      | Indicates that the model is strongly biased towards NSFW content and is not suitable for children, work environments or livestreaming. Most NSFW models are also Novel models in nature. |\n",
        "| Adventure | These models are excellent for people willing to play KoboldAI like a Text Adventure game and are meant to be used with Adventure mode enabled. Even if you wish to use it as a Novel style model you should always have Adventure mode on and set it to story. These models typically have a strong bias towards the use of the word You and without Adventure mode enabled break the story flow and write actions on your behalf. |\n",
        "| Generic   | Generic models are not trained towards anything specific, typically used as a basis for other tasks and models. They can do everything the other models can do, but require much more handholding to work properly. Generic models are an ideal basis for tasks that we have no specific model for, or for experiencing a softprompt in its raw form. |\n",
        "\n",
@ -191,10 +239,39 @@
        "7. As you play KoboldAI, keep this Colab tab open in the background and check occasionally for Captchas so they do not shut your instance down. If you do get shut down you can always download a copy of your gamesave in the Save menu inside KoboldAI. Stories are never lost as long as you keep KoboldAI open in your browser.\n",
        "\n",
        "Get an error message saying you do not have access to a GPU/TPU instance? Do not continue and try again later, KoboldAI will not run correctly without them."
-      ],
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
      "metadata": {
-        "id": "Lrm840I33hkC"
-      }
+        "cellView": "form",
+        "id": "5k8fK4F6UiTs"
+      },
+      "outputs": [],
+      "source": [
+        "#@title <b>Model Cleaner</b>\n",
+        "#@markdown Out of space? Run this to remove all cached models (Google Drive models are not affected).\n",
+        "!rm -rf /content/KoboldAI-Client/cache/*\n"
+      ]
    }
-  ]
+  ],
+  "metadata": {
+    "accelerator": "GPU",
+    "colab": {
+      "name": "ColabKobold GPU",
+      "private_outputs": true,
+      "provenance": [],
+      "include_colab_link": true
+    },
+    "kernelspec": {
+      "display_name": "Python 3",
+      "name": "python3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
 }
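The Model Cleaner cell above shells out to `rm -rf` on everything under the cache folder. As a rough illustration only, the same cleanup could be written in plain Python. The function name `clear_cache` is hypothetical and not part of the notebook; skipping dot-files is an assumption made to mirror how the shell glob `*` behaves by default.

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir: str) -> int:
    """Delete the visible contents of cache_dir, keeping the directory itself.

    Returns the number of top-level entries removed. Hidden entries
    (names starting with ".") are left alone, as a default shell glob would.
    """
    root = Path(cache_dir)
    if not root.is_dir():
        return 0  # nothing to clean
    removed = 0
    for entry in root.iterdir():
        if entry.name.startswith("."):
            continue  # "*" does not match dot-files by default
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # real directory: remove recursively
        else:
            entry.unlink()  # regular file or symlink
        removed += 1
    return removed
```

In the notebook's case the argument would be `/content/KoboldAI-Client/cache`; models stored on Google Drive live elsewhere and would not be touched.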
--- a/colab/TPU.ipynb
+++ b/colab/TPU.ipynb
@ -46,7 +46,7 @@
        "#@title <-- Tap this if you play on Mobile { display-mode: \"form\" }\n",
        "%%html\n",
        "<b>Press play on the music player to keep the tab alive, then start KoboldAI below (Uses only 13MB of data)</b><br/>\n",
-        "<audio src=\"https://henk.tech/colabkobold/silence.m4a\" controls>"
+        "<audio src=\"https://raw.githubusercontent.com/KoboldAI/KoboldAI-Client/main/colab/silence.m4a\" controls>"
      ],
      "metadata": {
        "id": "ZIL7itnNaw5V"
@ -66,10 +66,10 @@
        "#@title <b><-- Select your model below and then click this to start KoboldAI</b>\n",
        "#@markdown You can find a description of the models below along with instructions on how to start KoboldAI.\n",
        "\n",
-        "Model = \"Nerys 13B V2\" #@param [\"Nerys 13B V2\", \"Erebus 13B\", \"Janeway 13B\", \"Shinen 13B\", \"Skein 20B\", \"Erebus 20B\", \"Skein 6B\", \"Janeway 6B\", \"Adventure 6B\", \"Shinen 6B\", \"Lit V2 6B\", \"Lit 6B\", \"NeoX 20B\", \"OPT 13B\", \"Fairseq Dense 13B\", \"GPT-J-6B\"] {allow-input: true}\n",
+        "Model = \"Nerys 13B V2\" #@param [\"Nerys 13B V2\", \"Janeway 13B\", \"Skein 20B\", \"Skein 6B\", \"Janeway 6B\", \"Adventure 6B\", \"NeoX 20B\", \"OPT 13B\", \"Fairseq Dense 13B\", \"GPT-J-6B\"] {allow-input: true}\n",
        "Version = \"Official\" #@param [\"Official\", \"United\"] {allow-input: true}\n",
-        "Provider = \"Localtunnel\" #@param [\"Localtunnel\", \"Cloudflare\"]\n",
-		"use_google_drive = True #@param {type:\"boolean\"}\n",
+        "Provider = \"Cloudflare\" #@param [\"Localtunnel\", \"Cloudflare\"]\n",
+        "use_google_drive = True #@param {type:\"boolean\"}\n",
        "\n",
        "import os\n",
        "try:\n",
@ -81,13 +81,15 @@
        "print('Now we will need your Google Drive to store settings and saves, you must login with the same account you used for Colab.')\n",
        "from google.colab import drive\n",
        "if use_google_drive:\n",
-		"  drive.mount('/content/drive/')\n",
-		"else:\n",
-		"  import os\n",
-		"  if not os.path.exists(\"/content/drive\"):\n",
-		"    os.mkdir(\"/content/drive\")\n",
-		"  if not os.path.exists(\"/content/drive/MyDrive/\"):\n",
-		"    os.mkdir(\"/content/drive/MyDrive/\")\n",
+        "  drive.mount('/content/drive/')\n",
+        "else:\n",
+        "  import os\n",
+        "  if not os.path.exists(\"/content/drive\"):\n",
+        "    os.mkdir(\"/content/drive\")\n",
+        "  if not os.path.exists(\"/content/drive/MyDrive/\"):\n",
+        "    os.mkdir(\"/content/drive/MyDrive/\")\n",
+        "\n",
+        "Revision = \"\"\n",
        "\n",
        "if Model == \"Janeway 13B\":\n",
        "  Model = \"KoboldAI/fairseq-dense-13B-Janeway\"\n",
@ -97,18 +99,6 @@
        "  Model = \"KoboldAI/OPT-13B-Nerys-v2\"\n",
        "  path = \"\"\n",
        "  download = \"\"\n",
-        "elif Model == \"Erebus 13B\":\n",
-        "  Model = \"KoboldAI/OPT-13B-Erebus\"\n",
-        "  path = \"\"\n",
-        "  download = \"\"\n",
-        "elif Model == \"Shinen 13B\":\n",
-        "  Model = \"KoboldAI/fairseq-dense-13B-Shinen\"\n",
-        "  path = \"\"\n",
-        "  download = \"\"\n",
-        "elif Model == \"Erebus 20B\":\n",
-        "  Model = \"KoboldAI/GPT-NeoX-20B-Erebus\"\n",
-        "  path = \"\"\n",
-        "  download = \"\"\n",
        "elif Model == \"Skein 20B\":\n",
        "  Model = \"KoboldAI/GPT-NeoX-20B-Skein\"\n",
        "  path = \"\"\n",
@ -129,18 +119,6 @@
        "  Model = \"KoboldAI/GPT-J-6B-Adventure\"\n",
        "  path = \"\"\n",
        "  download = \"\"\n",
-        "elif Model == \"Lit V2 6B\":\n",
-        "  Model = \"hakurei/litv2-6B-rev3\"\n",
-        "  path = \"\"\n",
-        "  download = \"\"\n",
-        "elif Model == \"Lit 6B\":\n",
-        "  Model = \"hakurei/lit-6B\"\n",
-        "  path = \"\"\n",
-        "  download = \"\"\n",
-        "elif Model == \"Shinen 6B\":\n",
-        "  Model = \"KoboldAI/GPT-J-6B-Shinen\"\n",
-        "  path = \"\"\n",
-        "  download = \"\"\n",
        "elif Model == \"OPT 13B\":\n",
        "  Model = \"facebook/opt-13b\"\n",
        "  path = \"\"\n",
@ -162,7 +140,7 @@
        "else:\n",
        "  tunnel = \"\"\n",
        "\n",
-        "!wget https://koboldai.org/ckds -O - | bash /dev/stdin $path$download -m $Model -g $Version $tunnel"
+        "!wget https://koboldai.org/ckds -O - | bash /dev/stdin $path$download -m $Model -g $Version $tunnel $Revision"
      ]
    },
    {
@ -173,12 +151,9 @@
        "| Model | Style | Description |\n",
        "| --- | --- | --- |\n",
        "| [Nerys](https://huggingface.co/KoboldAI/fairseq-dense-13B-Nerys) by Mr Seeker | Novel/Adventure | Nerys is a hybrid model based on Pike (A newer Janeway), on top of the Pike dataset you also get some Light Novels, Adventure mode support and a little bit of Shinen thrown in the mix. The end result is a very diverse model that is heavily biased towards SFW novel writing, but one that can go beyond its novel training and make for an excellent adventure model too. Adventure mode is best played from a second person perspective, but can be played in first or third person as well. Novel writing can be done best from the first or third person. |\n",
-        "| [Erebus](https://huggingface.co/KoboldAI/OPT-13B-Erebus) by Mr Seeker | NSFW | Erebus is our community's flagship NSFW model, being a combination of multiple large datasets that include Literotica, Shinen and erotic novels from Nerys and featuring thourough tagging support it covers the vast majority of erotic writing styles. This model is capable of replacing both the Lit and Shinen models in terms of content and style and has been well received as (one of) the best NSFW models out there. If you wish to use this model for commercial or non research usage we recommend choosing the 20B version as that one is not subject to the restrictive OPT license. |\n",
        "| [Janeway](https://huggingface.co/KoboldAI/fairseq-dense-13B-Janeway) by Mr Seeker | Novel | Janeway is a model created from Picard's dataset combined with a brand new collection of ebooks. This model is trained on 20% more content than Picard and has been trained on literature from various genres. Although the model is mainly focussed on SFW, romantic scenes might involve a degree of nudity. |\n",
-        "| [Shinen](https://huggingface.co/KoboldAI/fairseq-dense-13B-Shinen) by Mr Seeker | NSFW | Shinen is an NSFW model trained on a variety of stories from the website Sexstories it contains many different kinks. It has been merged into the larger (and better) Erebus model. |\n",
        "| [Skein](https://huggingface.co/KoboldAI/GPT-J-6B-Skein) by VE\\_FORBRYDERNE | Adventure | Skein is best used with Adventure mode enabled, it consists of a 4 times larger adventure dataset than the Adventure model making it excellent for text adventure gaming. On top of that it also consists of light novel training further expanding its knowledge and writing capabilities. It can be used with the You filter bias if you wish to write Novels with it, but dedicated Novel models can perform better for this task. |\n",
        "| [Adventure](https://huggingface.co/KoboldAI/GPT-J-6B-Adventure) by VE\\_FORBRYDERNE | Adventure | Adventure is a 6B model designed to mimic the behavior of AI Dungeon. It is exclusively for Adventure Mode and can take you on the epic and wacky adventures that AI Dungeon players love. It also features the many tropes of AI Dungeon as it has been trained on very similar data. It must be used in second person (You). |\n",
-        "| [Lit](https://huggingface.co/hakurei/lit-6B) ([V2](https://huggingface.co/hakurei/litv2-6B-rev3)) by Haru | NSFW | Lit is a great NSFW model trained by Haru on both a large set of Literotica stories and high quality novels along with tagging support. Creating a high quality model for your NSFW stories. This model is exclusively a novel model and is best used in third person. |\n",
        "| [OPT](https://huggingface.co/facebook/opt-13b) by Metaseq | Generic | OPT is considered one of the best base models as far as content goes, its behavior has the strengths of both GPT-Neo and Fairseq Dense. Compared to Neo duplicate and unnecessary content has been left out, while additional literature was added in similar to the Fairseq Dense model. The Fairseq Dense model however lacks the broader data that OPT does have. The biggest downfall of OPT is its license, which prohibits any commercial usage, or usage beyond research purposes. |\n",
        "| [Neo(X)](https://huggingface.co/EleutherAI/gpt-neox-20b) by EleutherAI | Generic | NeoX is the largest EleutherAI model currently available, being a generic model it is not particularly trained towards anything and can do a variety of writing, Q&A and coding tasks. 20B's performance is closely compared to the 13B models and it is worth trying both especially if you have a task that does not involve english writing. Its behavior will be similar to the GPT-J-6B model since they are trained on the same dataset but with more sensitivity towards repetition penalty and with more knowledge. |\n",
        "| [Fairseq Dense](https://huggingface.co/KoboldAI/fairseq-dense-13B) | Generic | Trained by Facebook Researchers this model stems from the MOE research project within Fairseq. This particular version has been converted by us for use in KoboldAI. It is known to be on par with the larger 20B model from EleutherAI and considered as better for pop culture and language tasks. Because the model has never seen a new line (enter) it may perform worse on formatting and paragraphing. Compared to other models the dataset focuses primarily on literature and contains little else. |\n",
@ -189,13 +164,9 @@
        "| Model | Style | Description |\n",
        "| --- | --- | --- |\n",
        "| [Nerys](https://huggingface.co/KoboldAI/fairseq-dense-2.7B-Nerys) by Mr Seeker | Novel/Adventure | Nerys is a hybrid model based on Pike (A newer Janeway), on top of the Pike dataset you also get some Light Novels, Adventure mode support and a little bit of Shinen thrown in the mix. The end result is a very diverse model that is heavily biased towards SFW novel writing, but one that can go beyond its novel training and make for an excellent adventure model too. Adventure mode is best played from a second person perspective, but can be played in first or third person as well. Novel writing can be done best from the first or third person. |\n",
-        "| [Erebus](https://huggingface.co/KoboldAI/OPT-2.7B-Erebus) by Mr Seeker | NSFW | Erebus is our community's flagship NSFW model, being a combination of multiple large datasets that include Literotica, Shinen and erotic novels from Nerys and featuring thourough tagging support it covers the vast majority of erotic writing styles. This model is capable of replacing both the Lit and Shinen models in terms of content and style and has been well received as (one of) the best NSFW models out there. If you wish to use this model for commercial or non research usage we recommend choosing the 20B version as that one is not subject to the restrictive OPT license. |\n",
        "| [Janeway](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Janeway) by Mr Seeker | Novel | Janeway is a model created from Picard's dataset combined with a brand new collection of ebooks. This model is trained on 20% more content than Picard and has been trained on literature from various genres. Although the model is mainly focussed on SFW, romantic scenes might involve a degree of nudity. |\n",
        "| [Picard](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Picard) by Mr Seeker | Novel | Picard is a model trained for SFW Novels based on Neo 2.7B. It is focused on Novel style writing without the NSFW bias. While the name suggests a sci-fi model this model is designed for Novels of a variety of genres. It is meant to be used in KoboldAI's regular mode. |\n",
        "| [AID](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-AID) by melastacho | Adventure | Also known as Adventure 2.7B this is a clone of the AI Dungeon Classic model and is best known for the epic wacky adventures that AI Dungeon Classic players love. |\n",
-        "| [Horni LN](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni-LN) by finetune | Novel | This model is based on Horni 2.7B and retains its NSFW knowledge, but was then further biased towards SFW novel stories. If you seek a balance between a SFW Novel model and a NSFW model this model should be a good choice. |\n",
-        "| [Horni](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Horni) by finetune | NSFW | This model is tuned on Literotica to produce a Novel style model biased towards NSFW content. Can still be used for SFW stories but will have a bias towards NSFW content. It is meant to be used in KoboldAI's regular mode. |\n",
-        "| [Shinen](https://huggingface.co/KoboldAI/GPT-Neo-2.7B-Shinen) by Mr Seeker | NSFW | Shinen is an alternative to the Horni model designed to be more explicit. If Horni is to tame for you Shinen might produce better results. While it is a Novel model it is unsuitable for SFW stories due to its heavy NSFW bias. Shinen will not hold back. It is meant to be used in KoboldAI's regular mode. |\n",
        "| [OPT](https://huggingface.co/facebook/opt-2.7b) by Metaseq | Generic | OPT is considered one of the best base models as far as content goes, its behavior has the strengths of both GPT-Neo and Fairseq Dense. Compared to Neo duplicate and unnecessary content has been left out, while additional literature was added in similar to the Fairseq Dense model. The Fairseq Dense model however lacks the broader data that OPT does have. The biggest downfall of OPT is its license, which prohibits any commercial usage, or usage beyond research purposes. |\n",
        "| [Fairseq Dense](https://huggingface.co/KoboldAI/fairseq-dense-2.7B) | Generic | Trained by Facebook Researchers this model stems from the MOE research project within Fairseq. This particular version has been converted by us for use in KoboldAI. It is known to be on par with the larger models from EleutherAI and considered as better for pop culture and language tasks. Because the model has never seen a new line (enter) it may perform worse on formatting and paragraphing. Compared to other models the dataset focuses primarily on literature and contains little else. |\n",
        "| [Neo](https://huggingface.co/EleutherAI/gpt-neo-2.7B) by EleutherAI | Generic | This is the base model for all the other 2.7B models, it is best used when you have a use case that we have no other models available for, such as writing blog articles or programming. It can also be a good basis for the experience of some of the softprompts if your softprompt is not about a subject the other models cover. |\n",
@ -204,7 +175,6 @@
        "| Style | Description |\n",
        "| --- | --- |\n",
        "| Novel | For regular story writing, not compatible with Adventure mode or other specialty modes. |\n",
-        "| NSFW | Indicates that the model is strongly biased towards NSFW content and is not suitable for children, work environments or livestreaming. Most NSFW models are also Novel models in nature. |\n",
        "| Adventure | These models are excellent for people willing to play KoboldAI like a Text Adventure game and are meant to be used with Adventure mode enabled. Even if you wish to use it as a Novel style model you should always have Adventure mode on and set it to story. These models typically have a strong bias towards the use of the word You and without Adventure mode enabled break the story flow and write actions on your behalf. |\n",
        "| Generic | Generic models are not trained towards anything specific, typically used as a basis for other tasks and models. They can do everything the other models can do, but require much more handholding to work properly. Generic models are an ideal basis for tasks that we have no specific model for, or for experiencing a softprompt in its raw form. |\n",
        "\n",
@ -240,7 +210,6 @@
      "name": "ColabKobold TPU",
      "provenance": [],
      "private_outputs": true,
-      "collapsed_sections": [],
      "include_colab_link": true
    },
    "kernelspec": {
--- a/colab/silence.m4a
+++ b/colab/silence.m4a
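The model-selection cell in `colab/TPU.ipynb` maps each dropdown label to a Hugging Face repository through a long `if`/`elif` chain, which is why removing a model means deleting a whole branch. A table-driven version is one possible alternative, sketched below. Only label/repo pairs that appear together in the hunks above are included. The names `MODEL_REPOS` and `resolve_model` are hypothetical, and passing unknown labels through unchanged is an assumption based on the dropdown's `allow-input: true` setting.

```python
# Dropdown label -> Hugging Face repo id (only pairs visible in the diff).
MODEL_REPOS = {
    "Janeway 13B": "KoboldAI/fairseq-dense-13B-Janeway",
    "Skein 20B": "KoboldAI/GPT-NeoX-20B-Skein",
    "OPT 13B": "facebook/opt-13b",
}


def resolve_model(name: str) -> str:
    """Return the repo id for a known label.

    Unknown names are returned unchanged so that a user-typed repo id
    still works (assumed behaviour, not confirmed by the notebook).
    """
    return MODEL_REPOS.get(name, name)
```

With this layout, dropping a model from the notebook would be a one-line change to the dictionary.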
--- a/docker-cuda/Dockerfile
+++ b/docker-cuda/Dockerfile
@ -6,4 +6,4 @@ WORKDIR /content/
 COPY env.yml /home/micromamba/env.yml
 RUN micromamba install -y -n base -f /home/micromamba/env.yml
 USER root
-RUN apt update && apt install xorg -y
+RUN apt update && apt install xorg aria2 -y
--- a/docker-cuda/docker-compose.yml
+++ b/docker-cuda/docker-compose.yml
@ -5,6 +5,8 @@ services:
    environment:
      - DISPLAY=${DISPLAY} 
    network_mode: "host"
+    security_opt:
+      - label:disable
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix
      - /etc/protocols:/etc/protocols:ro
--- a/docker-rocm/Dockerfile
+++ b/docker-rocm/Dockerfile
@ -3,4 +3,4 @@ WORKDIR /content/
 COPY env.yml /home/micromamba/env.yml
 RUN micromamba install -y -n base -f /home/micromamba/env.yml
 USER root
-RUN apt update && apt install xorg libsqlite3-0 -y
+RUN apt update && apt install xorg libsqlite3-0 aria2 -y
--- a/docker-rocm/docker-compose.yml
+++ b/docker-rocm/docker-compose.yml
@ -5,6 +5,8 @@ services:
    environment:
      - DISPLAY=${DISPLAY} 
    network_mode: "host"
+    security_opt:
+      - label:disable
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix
      - /etc/protocols:/etc/protocols:ro
--- a/docker-standalone/docker-helper.sh
+++ b/docker-standalone/docker-helper.sh
@ -9,7 +9,7 @@ if [[ ! -v KOBOLDAI_DATADIR ]];then
 fi

 mkdir $KOBOLDAI_DATADIR/stories
-if [[ ! -v KOBOLDAI_MODELDIR ]];then
+if [[ -v KOBOLDAI_MODELDIR ]];then
 	mkdir $KOBOLDAI_MODELDIR/models
 fi
 mkdir $KOBOLDAI_DATADIR/settings
@ -28,7 +28,7 @@ rm -rf userscripts/
 rm softprompts
 rm -rf softprompts/

-if [[ ! -v KOBOLDAI_MODELDIR ]];then
+if [[ -v KOBOLDAI_MODELDIR ]];then
 	rm models
 	rm -rf models/
 	#rm cache
@ -39,7 +39,7 @@ ln -s $KOBOLDAI_DATADIR/stories/ stories
 ln -s $KOBOLDAI_DATADIR/settings/ settings
 ln -s $KOBOLDAI_DATADIR/softprompts/ softprompts
 ln -s $KOBOLDAI_DATADIR/userscripts/ userscripts
-if [[ ! -v KOBOLDAI_MODELDIR ]];then
+if [[ -v KOBOLDAI_MODELDIR ]];then
 	ln -s $KOBOLDAI_MODELDIR/models/ models
 	#ln -s $KOBOLDAI_MODELDIR/cache/ cache
 fi
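The three hunks above flip `[[ ! -v KOBOLDAI_MODELDIR ]]` to `[[ -v KOBOLDAI_MODELDIR ]]`, so the model directory is now handled only when the variable is set. In bash, `-v` tests whether a variable is set at all, so an empty string still counts. The sketch below shows the equivalent check in Python for comparison. The helper name `model_dir_from_env` is hypothetical and does not exist in the repository.

```python
import os


def model_dir_from_env(environ=None):
    """Return "<KOBOLDAI_MODELDIR>/models" if the variable is set, else None.

    Membership in the mapping mirrors bash's [[ -v VAR ]]: a variable
    that is set to the empty string is still considered set.
    """
    env = os.environ if environ is None else environ
    if "KOBOLDAI_MODELDIR" in env:
        return os.path.join(env["KOBOLDAI_MODELDIR"], "models")
    return None
```

Before this change the condition was inverted, so the `models` steps ran only when the variable was unset, in which case the paths expanded to `/models`.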
--- a/environments/huggingface.yml
+++ b/environments/huggingface.yml
@ -5,12 +5,15 @@ channels:
  - defaults
 dependencies:
  - colorama
-  - flask-socketio
-  - flask-session
+  - flask=2.2.3
+  - flask-socketio=5.3.2
+  - flask-session=0.4.0
+  - python-socketio=5.7.2
  - pytorch=1.11.*
  - python=3.8.*
  - cudatoolkit=11.1
-  - eventlet
+  - eventlet=0.33.3
+  - dnspython=2.2.1
  - markdown
  - bleach=4.1.0
  - pip
@ -23,10 +26,12 @@ dependencies:
  - termcolor
  - psutil
  - pip:
-    - flask-cloudflared
+    - flask-cloudflared==0.0.10
    - flask-ngrok
+    - Werkzeug==2.3.7
    - lupa==1.10
-    - transformers>=4.20.1
-    - huggingface_hub>=0.10.1
+    - transformers==4.24.0
+    - huggingface_hub==0.12.1
+    - safetensors
    - accelerate
    - git+https://github.com/VE-FORBRYDERNE/mkultra
--- a/environments/rocm.yml
+++ b/environments/rocm.yml
@ -4,10 +4,13 @@ channels:
  - defaults
 dependencies:
  - colorama
-  - flask-socketio
-  - flask-session
+  - flask=2.2.3
+  - flask-socketio=5.3.2
+  - flask-session=0.4.0
+  - python-socketio=5.7.2
  - python=3.8.*
-  - eventlet
+  - eventlet=0.33.3
+  - dnspython=2.2.1
  - markdown
  - bleach=4.1.0
  - pip
@ -21,12 +24,13 @@ dependencies:
  - psutil
  - pip:
    - --extra-index-url https://download.pytorch.org/whl/rocm5.1.1
-    - torch
-    - torchvision
-    - flask-cloudflared
+    - torch==1.12.1+rocm5.1.1
+    - flask-cloudflared==0.0.10
    - flask-ngrok
+    - Werkzeug==2.3.7
    - lupa==1.10
-    - transformers>=4.20.1
-    - huggingface_hub>=0.10.1
+    - transformers==4.24.0
+    - huggingface_hub==0.12.1
+    - safetensors
    - accelerate
    - git+https://github.com/VE-FORBRYDERNE/mkultra
--- a/install_requirements.sh
+++ b/install_requirements.sh
@ -1,12 +1,12 @@
 #!/bin/bash
-if [[ $1 = "cuda" ]]; then
+if [[ $1 = "cuda" || $1 = "CUDA" ]]; then
 wget -qO- https://micromamba.snakepit.net/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
 bin/micromamba create -f environments/huggingface.yml -r runtime -n koboldai -y
 # Weird micromamba bug causes it to fail the first time, running it twice just to be safe, the second time is much faster
 bin/micromamba create -f environments/huggingface.yml -r runtime -n koboldai -y
 exit
 fi
-if [[ $1 = "rocm" ]]; then
+if [[ $1 = "rocm" || $1 = "ROCM" ]]; then
 wget -qO- https://micromamba.snakepit.net/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
 bin/micromamba create -f environments/rocm.yml -r runtime -n koboldai-rocm -y
 # Weird micromamba bug causes it to fail the first time, running it twice just to be safe, the second time is much faster
--- a/requirements.txt
+++ b/requirements.txt
@ -1,21 +1,25 @@
-transformers>=4.20.1
-huggingface_hub>=0.10.1
-Flask
-Flask-SocketIO
+transformers==4.24.0
+huggingface_hub==0.12.1
+Flask==2.2.3
+Flask-SocketIO==5.3.2
+Werkzeug==2.3.7
+python-socketio==5.7.2
 requests
 torch >= 1.9, < 1.13
-flask-cloudflared
+flask-cloudflared==0.0.10
 flask-ngrok
-eventlet
+eventlet==0.33.3
+dnspython==2.2.1
 lupa==1.10
 markdown
 bleach==4.1.0
 sentencepiece
 protobuf
 accelerate
-flask-session
+flask-session==0.4.0
 marshmallow>=3.13
 apispec-webframeworks
 loguru
 termcolor
+safetensors
 git+https://github.com/VE-FORBRYDERNE/mkultra
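Most of this hunk replaces open-ended requirements (`>=` or no version at all) with exact `==` pins. A small parser makes it easy to list which packages are pinned, for example to compare against an installed environment. The following is a minimal sketch, not the project's tooling: `parse_pins` is a hypothetical helper, and it handles only simple `name==version` lines, not the full requirements-file grammar.

```python
import re

# Matches "name==version" with optional spaces around "==".
PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*==\s*([^\s#;]+)")


def parse_pins(text: str) -> dict:
    """Map lower-cased package names to their exact pinned versions.

    Lines without "==" (ranges, bare names, VCS URLs) are ignored.
    """
    pins = {}
    for line in text.splitlines():
        match = PIN.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins
```

Each pinned version could then be compared with `importlib.metadata.version(name)` to spot drift in an existing environment.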
--- a/requirements_mtj.txt
+++ b/requirements_mtj.txt
@ -2,22 +2,26 @@ torch >= 1.9, < 1.13
 numpy
 tqdm
 requests
-dm-haiku == 0.0.5
-jax == 0.2.21
-jaxlib >= 0.1.69, <= 0.3.7
-transformers >= 4.20.1
-huggingface_hub >= 0.10.1
+dm-haiku==0.0.9
+jax==0.3.25
+jaxlib==0.3.25
+chex == 0.1.5
+transformers == 4.24.0
+huggingface_hub==0.12.1
 progressbar2
 git+https://github.com/VE-FORBRYDERNE/mesh-transformer-jax@ck
-flask
-Flask-SocketIO
-flask-cloudflared >= 0.0.5
+Flask==2.2.3
+Flask-SocketIO==5.3.2
+python-socketio==5.7.2
+flask-cloudflared==0.0.10
 flask-ngrok
-eventlet
+Werkzeug==2.3.7
+eventlet==0.33.3
+dnspython==2.2.1
 lupa==1.10
 markdown
 bleach==4.1.0
-flask-session
+flask-session==0.4.0
 marshmallow>=3.13
 apispec-webframeworks
 loguru
--- a/static/application.js
+++ b/static/application.js
@ -3492,28 +3492,26 @@ $(document).ready(function(){

 	// Shortcuts
 	$(window).keydown(function (ev) {
-		// Only ctrl prefixed (for now)
-		if (!ev.ctrlKey) return;
-
-		let handled = true;
-		switch (ev.key) {
-			// Ctrl+Z - Back
-			case "z":
-				button_actback.click();
-				break;
-			// Ctrl+Y - Forward
-			case "y":
-				button_actfwd.click();
-				break;
-			// Ctrl+E - Retry
-			case "e":
-				button_actretry.click();
-				break;
-			default:
-				handled = false;
+		if (ev.altKey)
+			switch (ev.key) {
+				// Alt+Z - Back
+				case "z":
+					button_actback.click();
+					break;
+				// Alt+Y - Forward
+				case "y":
+					button_actfwd.click();
+					break;
+				// Alt+R - Retry
+				case "r":
+					button_actretry.click();
+					break;
+				default:
+					return;
+		} else {
+			return;
 		}
-
-		if (handled) ev.preventDefault();
+		ev.preventDefault();
 	});

 	$("#anotetemplate").on("input", function() {
@ -3796,4 +3794,4 @@ function getSelectedOptions(element) {
 		output.push(item.value);
 	}
    return output;
-}
+}
--- a/torch_lazy_loader.py
+++ b/torch_lazy_loader.py
@ -54,6 +54,7 @@ import numpy as np
 import collections
 import _codecs
 import utils
+import os
 from torch.nn import Module
 from typing import Any, Callable, Dict, Optional, Tuple, Type, Union

@ -93,12 +94,16 @@ class LazyTensor:
    def __repr__(self):
        return self.__view(repr)

-    def materialize(self, checkpoint: Union[zipfile.ZipFile, zipfile.ZipExtFile], map_location=None, no_grad=True) -> torch.Tensor:
+    def materialize(self, checkpoint: Union[zipfile.ZipFile, zipfile.ZipExtFile], map_location=None, no_grad=True, filename="pytorch_model.bin") -> torch.Tensor:
+        filename = os.path.basename(os.path.normpath(filename)).split('.')[0]
        size = reduce(lambda x, y: x * y, self.shape, 1)
        dtype = self.dtype
        nbytes = size if dtype is torch.bool else size * ((torch.finfo if dtype.is_floating_point else torch.iinfo)(dtype).bits >> 3)
        if isinstance(checkpoint, zipfile.ZipFile):
-            f = checkpoint.open(f"archive/data/{self.key}", "r")
+            try:
+                f = checkpoint.open(f"archive/data/{self.key}", "r")
+            except:
+                f = checkpoint.open(f"{filename}/data/{self.key}", "r")
            f.read(self.seek_offset)
        else:
            f = checkpoint
--- a/tpu_mtj_backend.py
+++ b/tpu_mtj_backend.py
@ -1049,7 +1049,7 @@ def read_neox_checkpoint(state, path, config, checkpoint_shards=2):
                raise RuntimeError(error)


-def load_model(path: str, driver_version="tpu_driver0.1_dev20210607", hf_checkpoint=False, **kwargs) -> None:
+def load_model(path: str, driver_version="tpu_driver_20221109", hf_checkpoint=False, socketio_queue=None, initial_load=False, logger=None, **kwargs) -> None:
    global thread_resources_env, seq, tokenizer, network, params, pad_token_id

    if "pad_token_id" in kwargs:
@ -1149,7 +1149,8 @@ def load_model(path: str, driver_version="tpu_driver0.1_dev20210607", hf_checkpo
            params[param] = default_params[param]

    # Use an optimization that will allow us to avoid one extra transpose operation
-    params["transposed_linear"] = True
+    if hf_checkpoint:
+        params["transposed_linear"] = True

    # Load tokenizer
    if vars.model == "TPUMeshTransformerGPTNeoX":
@ -1194,10 +1195,6 @@ def load_model(path: str, driver_version="tpu_driver0.1_dev20210607", hf_checkpo
    thread_resources_env = maps.ResourceEnv(maps.Mesh(devices, ('dp', 'mp')), ())
    maps.thread_resources.env = thread_resources_env

-    global shard_xmap, batch_xmap
-    shard_xmap = __shard_xmap()
-    batch_xmap = __batch_xmap(shard_dim=cores_per_replica)
-
    global badwords
    # These are the tokens that we don't want the AI to ever write
    badwords = jnp.array(vars.badwordsids).squeeze()
@ -1243,6 +1240,7 @@ def load_model(path: str, driver_version="tpu_driver0.1_dev20210607", hf_checkpo
    from tqdm.auto import tqdm
    import functools

+
    def callback(model_dict, f, **_):
        if callback.nested:
            return
@ -1250,6 +1248,7 @@ def load_model(path: str, driver_version="tpu_driver0.1_dev20210607", hf_checkpo
        with zipfile.ZipFile(f, "r") as z:
            try:
                last_storage_key = None
+                zipfolder = os.path.basename(os.path.normpath(f)).split('.')[0]
                f = None
                current_offset = 0
                if utils.current_shard == 0:
@ -1282,7 +1281,10 @@ def load_model(path: str, driver_version="tpu_driver0.1_dev20210607", hf_checkpo
                        last_storage_key = storage_key
                        if isinstance(f, zipfile.ZipExtFile):
                            f.close()
-                        f = z.open(f"archive/data/{storage_key}")
+                        try:
+                            f = z.open(f"archive/data/{storage_key}")
+                        except:
+                            f = z.open(f"{zipfolder}/data/{storage_key}")
                        current_offset = 0
                    if current_offset != model_dict[key].seek_offset:
                        f.read(model_dict[key].seek_offset - current_offset)
@ -1312,19 +1314,20 @@ def load_model(path: str, driver_version="tpu_driver0.1_dev20210607", hf_checkpo
                    #if "no_transpose" not in transforms and tensor.ndim == 2:
                    #    tensor = tensor.T
                    tensor.unsqueeze_(0)
-                    if tensor.dtype is torch.float16 or tensor.dtype is torch.float32:
-                        tensor = tensor.bfloat16()
+                    

                    # Shard the tensor so that parts of the tensor can be used
                    # on different TPU cores
+                    tensor = reshard_reverse(
+                        tensor,
+                        params["cores_per_replica"],
+                        network.state["params"][spec["module"]][spec["param"]].shape,
+                    )
+                    tensor = jnp.array(tensor.detach())
+                    if tensor.dtype is torch.float16 or tensor.dtype is torch.float32:
+                        tensor = tensor.bfloat16()
                    network.state["params"][spec["module"]][spec["param"]] = move_xmap(
-                        jax.dlpack.from_dlpack(torch.utils.dlpack.to_dlpack(
-                            reshard_reverse(
-                                tensor,
-                                params["cores_per_replica"],
-                                network.state["params"][spec["module"]][spec["param"]].shape,
-                            )
-                        )).copy(),
+                        tensor,
                        np.empty(params["cores_per_replica"]),
                    )

@ -1411,3 +1414,6 @@ def load_model(path: str, driver_version="tpu_driver0.1_dev20210607", hf_checkpo
                model     = GPTNeoForCausalLM.from_pretrained(vars.model, revision=vars.revision, cache_dir="cache")

    #network.state = network.move_xmap(network.state, np.zeros(cores_per_replica))
+    global shard_xmap, batch_xmap
+    shard_xmap = __shard_xmap()
+    batch_xmap = __batch_xmap(shard_dim=cores_per_replica)
--- a/utils.py
+++ b/utils.py
@ -261,7 +261,7 @@ def _transformers22_aria2_hook(pretrained_model_name_or_path: str, force_downloa
            if token is None:
                raise EnvironmentError("You specified use_auth_token=True, but a huggingface token was not found.")
    _cache_dir = str(cache_dir) if cache_dir is not None else transformers.TRANSFORMERS_CACHE
-    _revision = revision if revision is not None else huggingface_hub.constants.DEFAULT_REVISION
+    _revision = args.revision if args.revision is not None else huggingface_hub.constants.DEFAULT_REVISION
    sharded = False
    headers = {"user-agent": transformers.file_utils.http_user_agent(user_agent)}
    if use_auth_token:
@ -272,7 +272,7 @@ def _transformers22_aria2_hook(pretrained_model_name_or_path: str, force_downloa

    def is_cached(filename):
        try:
-            huggingface_hub.hf_hub_download(pretrained_model_name_or_path, filename, cache_dir=cache_dir, local_files_only=True)
+            huggingface_hub.hf_hub_download(pretrained_model_name_or_path, filename, cache_dir=cache_dir, local_files_only=True, revision=_revision)
        except ValueError:
            return False
        return True
@ -281,7 +281,7 @@ def _transformers22_aria2_hook(pretrained_model_name_or_path: str, force_downloa
            filename = transformers.modeling_utils.WEIGHTS_INDEX_NAME if sharded else transformers.modeling_utils.WEIGHTS_NAME
        except AttributeError:
            return
-        url = huggingface_hub.hf_hub_url(pretrained_model_name_or_path, filename, revision=revision)
+        url = huggingface_hub.hf_hub_url(pretrained_model_name_or_path, filename, revision=_revision)
        if is_cached(filename) or requests.head(url, allow_redirects=True, proxies=proxies, headers=headers):
            break
        if sharded:
@ -295,7 +295,7 @@ def _transformers22_aria2_hook(pretrained_model_name_or_path: str, force_downloa
        with open(map_filename) as f:
            map_data = json.load(f)
        filenames = set(map_data["weight_map"].values())
-    urls = [huggingface_hub.hf_hub_url(pretrained_model_name_or_path, n, revision=revision) for n in filenames]
+    urls = [huggingface_hub.hf_hub_url(pretrained_model_name_or_path, n, revision=_revision) for n in filenames]
    if not force_download:
        urls = [u for u, n in zip(urls, filenames) if not is_cached(n)]
        if not urls:
@ -460,6 +460,7 @@ def aria2_hook(pretrained_model_name_or_path: str, force_download=False, cache_d
    import transformers
    import transformers.modeling_utils
    from huggingface_hub import HfFolder
+    _revision = args.revision if args.revision is not None else huggingface_hub.constants.DEFAULT_REVISION
    if shutil.which("aria2c") is None:  # Don't do anything if aria2 is not installed
        return
    if local_files_only:  # If local_files_only is true, we obviously don't need to download anything
@ -494,7 +495,7 @@ def aria2_hook(pretrained_model_name_or_path: str, force_download=False, cache_d
            filename = transformers.modeling_utils.WEIGHTS_INDEX_NAME if sharded else transformers.modeling_utils.WEIGHTS_NAME
        except AttributeError:
            return
-        url = huggingface_hub.hf_hub_url(pretrained_model_name_or_path, filename, revision=revision)
+        url = huggingface_hub.hf_hub_url(pretrained_model_name_or_path, filename, revision=_revision)
        if is_cached(url) or requests.head(url, allow_redirects=True, proxies=proxies, headers=headers):
            break
        if sharded:
@ -508,7 +509,7 @@ def aria2_hook(pretrained_model_name_or_path: str, force_download=False, cache_d
        with open(map_filename) as f:
            map_data = json.load(f)
        filenames = set(map_data["weight_map"].values())
-    urls = [huggingface_hub.hf_hub_url(pretrained_model_name_or_path, n, revision=revision) for n in filenames]
+    urls = [huggingface_hub.hf_hub_url(pretrained_model_name_or_path, n, revision=_revision) for n in filenames]
    if not force_download:
        urls = [u for u in urls if not is_cached(u)]
        if not urls:
@ -555,7 +556,8 @@ def get_num_shards(filename):
 def get_sharded_checkpoint_num_tensors(pretrained_model_name_or_path, filename, cache_dir=None, force_download=False, proxies=None, resume_download=False, local_files_only=False, use_auth_token=None, user_agent=None, revision=None, **kwargs):
    import transformers.modeling_utils
    import torch
-    shard_paths, _ = transformers.modeling_utils.get_checkpoint_shard_files(pretrained_model_name_or_path, filename, cache_dir=cache_dir, force_download=force_download, proxies=proxies, resume_download=resume_download, local_files_only=local_files_only, use_auth_token=use_auth_token, user_agent=user_agent, revision=revision)
+    _revision = args.revision if args.revision is not None else huggingface_hub.constants.DEFAULT_REVISION
+    shard_paths, _ = transformers.modeling_utils.get_checkpoint_shard_files(pretrained_model_name_or_path, filename, cache_dir=cache_dir, force_download=force_download, proxies=proxies, resume_download=resume_download, local_files_only=local_files_only, use_auth_token=use_auth_token, user_agent=user_agent, revision=_revision)
    return list(itertools.chain(*(torch.load(p, map_location="cpu").keys() for p in shard_paths)))

 #==================================================================#
Author	SHA1	Message	Date
henk717	f49d763e2a	Promote Colabcpp	2024-01-02 14:08:53 +01:00
henk717	fd24d95981	Crash without a GPU	2023-11-05 01:42:13 +01:00
henk717	61a0042c66	Echidna	2023-10-28 03:05:02 +02:00
henk717	8b7ab2f93b	Match colab description for Tiefighter	2023-10-27 15:58:49 +02:00
henk717	0ea758b789	Better Tiefighter description	2023-10-27 15:57:08 +02:00
henk717	2db1812ee4	Merge pull request #409 from RecoveredApparatus/main Updated Model list and description in Read.md and GPU.ipynb markdown	2023-10-27 15:52:37 +02:00
anhad	3287328fe4	Update the model list in both Read.md and Colab markdown	2023-10-25 14:53:00 +05:30
anhad	a92951f47e	Updated Readme.md	2023-10-24 10:08:34 +05:30
henk717	7d39b353c0	Tiefighter on Colab	2023-10-19 20:07:01 +02:00
henk717	58b4c48fdb	Disable GPTQ for now to enable higher context	2023-10-16 21:23:27 +02:00
henk717	bf61e5ef02	Emerhyst	2023-10-09 05:06:26 +02:00
Henk	386fd1f034	Werkzeug Fix	2023-10-06 13:51:52 +02:00
henk717	d86f61151b	Working revision support	2023-08-23 22:07:37 +02:00
henk717	ebab774aab	Add Holomax	2023-08-14 18:19:03 +02:00
henk717	ee93fe6e4a	Add model cleaner	2023-08-11 22:39:49 +02:00
henk717	9cb93d6b4c	Add some 13B's for easier beta testing	2023-08-10 23:56:44 +02:00
henk717	d6b1ff513d	More cleanup - TPU	2023-05-11 15:24:25 +02:00
henk717	c11a269493	Model cleanup - GPU	2023-05-11 02:55:28 +02:00
henk717	148f900324	Cleaned up model list - TPU	2023-05-11 02:52:45 +02:00
henk717	b66110ea54	Created using Colaboratory	2023-05-08 18:54:41 +02:00
henk717	d2b399d7bc	Merge pull request #311 from SmolBleat/main Add Nerybus Models	2023-05-08 16:59:24 +02:00
henk717	f2b643a639	Merge pull request #239 from waffshappen/patch-2 Allow Project File Access with Podman+Selinux	2023-05-08 16:58:51 +02:00
Henk	1499763472	Flask fix	2023-04-29 02:44:41 +02:00
SmolBleat	692fe2e5ee	Add Nerybus Models	2023-04-24 21:01:29 +02:00
henk717	c3bf89a94f	Missed a spot	2023-04-24 19:20:35 +02:00
henk717	1ae1d499e8	Remove banned model	2023-04-24 19:15:17 +02:00
henk717	b808f039ab	Pin TPU driver	2023-04-23 20:21:28 +02:00
henk717	d88f109073	TPU Fix Fix	2023-04-23 18:49:25 +02:00
henk717	b4cb09590f	Update requirements_mtj.txt	2023-04-23 18:23:38 +02:00
henk717	5f0e2001a7	Remove broken TPU disclaimer	2023-04-23 17:50:03 +02:00
henk717	dddde7dbc3	Merge pull request #306 from Zurnaz/tpu_fix Fix: TPU driver error	2023-04-23 12:32:14 +02:00
Bogdan Drema	92a0bf9524	Fix: TPU driver error to_dlpack/from_dlpack was causing issues with tensor with new jax version	2023-04-23 00:49:42 +01:00
henk717	e4c15fe1f6	Update install_requirements.sh	2023-04-21 03:00:52 +02:00
henk717	b432d55d99	Merge pull request #291 from Relys/patch-1 Update install_requirements.sh	2023-04-20 14:03:35 +02:00
henk717	ee6e7e9b72	Colab description changes	2023-04-17 22:59:55 +02:00
Syler Clayton	860b697a70	Update install_requirements.sh Made parameter case insensitive.	2023-04-15 09:51:45 -07:00
henk717	29c2d4b7a6	Removing Pygmalion from the TPU colab to get it unbanned	2023-04-04 19:51:18 +02:00
henk717	fd12214091	Clean the description of the GPU colab	2023-04-04 19:40:22 +02:00
henk717	bb51127bbf	We no longer support Pygmalion on Colab due to Google's Pygmalion ban	2023-04-04 19:37:15 +02:00
henk717	72b4669563	Fix the chex dependency	2023-03-30 23:41:35 +02:00
henk717	ab779efe0e	Merge pull request #276 from YellowRoseCx/stable-branch Update README and remove unavailable model from gpu.ipynb	2023-03-30 00:50:15 +02:00
YellowRoseCx	3c48a77a52	Update README.md changed Colab GPU models listed to their higher quality counter parts	2023-03-29 17:44:44 -05:00
YellowRoseCx	f826930c02	Update GPU.ipynb removed litv2-6B-rev3	2023-03-29 17:41:01 -05:00
henk717	66264d38c4	Add Mixes	2023-03-28 00:23:10 +02:00
henk717	94eb8ff825	TPU Message	2023-03-19 14:52:14 +01:00
Henk	219b824b9b	SocketIO Requirements Pin	2023-03-17 01:28:59 +01:00
henk717	ffa5c0bc13	Empty Revision Fix	2023-03-08 20:52:03 +01:00
henk717	487739911a	Restore Pygmalion 6B Dev	2023-03-08 18:44:03 +01:00
Henk	2ed6cdb411	Huggingface Hub Pin	2023-03-08 18:03:36 +01:00
henk717	142cb354f9	Nerybus 13B - TPU colab	2023-03-01 22:33:11 +01:00
Henk	93bf023bd7	Use our own horde URL	2023-03-01 17:54:39 +01:00
henk717	750cc3d2dc	Merge pull request #245 from db0/kaimergemain2 Makes prod version of KAI work with merged hordes in stablehorde.net	2023-03-01 17:52:58 +01:00
Henk	0e06fc371f	Modeldir Fix	2023-02-27 17:46:33 +01:00
Divided by Zer0	6426e3ca24	changes	2023-02-23 18:34:46 +01:00
Divided by Zer0	2de9672b95	attempt1	2023-02-23 18:27:11 +01:00
henk717	c27faf56e6	Updated Silence Audio - GPU	2023-02-20 18:43:06 +01:00
henk717	5962a6cb4f	Updated Audio Link - TPU	2023-02-20 18:41:06 +01:00
Henk	1378fe8beb	Silence file for colab	2023-02-20 18:28:58 +01:00
waffshappen	a0d4497c95	Also update CUDA container	2023-02-16 10:37:58 +00:00
waffshappen	d026bd79cb	Allow Project File Access with Podman+Selinux With SELinux-enabled distros, containers accessing KoboldAI's main directory as content, as planned here, will likely be denied (at least with Podman). Option 1 would be to mark it with the right label, like :z, but that has other implications for the content directory. The other fix, if uglier, is to run the container without labels being enforced and thus allow the file access as the same user, with no further side effects on the project file labelling.	2023-02-15 23:32:41 +00:00
Henk	cc01ad730a	Don't install safetensors for MTJ	2023-02-11 11:20:21 +01:00
Henk	b58daa1ba1	Pin Flask-cloudflared	2023-02-10 19:11:13 +01:00
henk717	661bd5c99e	Hide Pygmalion 6B Dev, currently only supported on the GPU	2023-01-31 19:24:19 +01:00
Henk	257a535be5	Revision Fixes Fixes	2023-01-31 05:17:34 +01:00
Henk	739cccd8ed	Revision Fixes	2023-01-31 04:48:46 +01:00
henk717	e9cf9fa6d0	Pygmalion Dev support	2023-01-27 05:20:09 +01:00
henk717	031c06347f	Streamlining Revision Support	2023-01-24 13:51:08 +01:00
henk717	a185cbd015	Fix Defaults	2023-01-24 13:31:45 +01:00
henk717	a046db4ded	Created using Colaboratory	2023-01-24 13:16:49 +01:00
henk717	47a27fa906	Cloudflare as default again - GPU	2023-01-23 18:15:37 +01:00
henk717	24f50d6fb7	Download Manager Support docker-rocm	2023-01-18 02:04:45 +01:00
henk717	22acde1ab7	Download Manager Support docker-cuda	2023-01-18 02:04:14 +01:00
Henk	e9859cf17d	DNSPython workaround DNSPython had an update eventlet is not ready for. We now manually cap DNSPython to ensure the installations still happen correctly.	2023-01-16 16:32:17 +01:00
Henk	307fc97b9d	ROCm Dependency Bump/Fix	2023-01-13 22:49:32 +01:00
henk717	4a88e41d14	Pygmalion 6B	2023-01-10 17:22:03 +01:00
henk717	1628b789d1	Add Pygmalion	2023-01-09 23:36:43 +01:00
Henk	857476ef6b	ROCm torch version pin	2023-01-08 17:10:59 +01:00
Henk	7fc5c46c1d	Add Safetensors Having the dependency adds basic support for safetensor models.	2023-01-06 16:40:56 +01:00
henk717	1dbc987048	6B models now Colab Free has beefier GPU's	2022-12-21 16:52:41 +01:00
henk717	a04f99891f	Merge pull request #194 from Gouvernathor/patch-1 Update usage instructions for git-clone use	2022-12-21 15:36:15 +01:00
henk717	75fecb86cc	Merge pull request #196 from henk717/united Improved model support & Shortcut Fixes	2022-12-18 20:18:13 +01:00
Gouvernathor	a4f49c097a	Add git clone command and Linux case	2022-12-17 12:13:51 +01:00
Gouvernathor	55cf5f2f67	Update usage instructions for git-clone use	2022-12-17 04:10:56 +01:00
henk717	23b2d3a99e	Merge pull request #236 from one-some/united Move shortcuts to Alt	2022-12-17 00:37:18 +01:00
somebody	9efbe381cf	Move shortcuts to Alt from Ctrl	2022-12-16 16:47:22 -06:00
henk717	0a926e41e4	Merge pull request #235 from VE-FORBRYDERNE/patch Fix materialize function for galactica models	2022-12-12 20:15:54 +01:00
vfbd	33ba3e7e27	Fix materialize function for galactica models	2022-12-12 14:11:08 -05:00
Henk	eeb1774d42	Cleaner implementation of zipfolder	2022-12-10 19:23:08 +01:00
Henk	9a8e8a0005	New pytorch zipfile support	2022-12-10 19:11:07 +01:00
henk717	dd7363548c	Merge pull request #191 from henk717/united Probability Viewer Fix	2022-12-09 21:56:41 +01:00
henk717	686845cd21	Merge pull request #234 from one-some/united Move probability visualization to after logitwarpers	2022-12-09 21:22:33 +01:00
somebody	e6656d68a1	Move probability visualization to after logitwarpers	2022-12-09 13:47:38 -06:00
henk717	55ef53f39b	Typo fix	2022-12-08 15:17:10 +01:00
henk717	0b3e22ee13	Merge pull request #185 from henk717/united Pin transformers version	2022-12-02 02:03:23 +01:00
Henk	d0cb463c53	Pin transformers version To avoid breaking changes, let's force the exact transformers version we code against. This will be automatically picked up by all the automatic updaters.	2022-12-02 01:48:12 +01:00
henk717	e8245478d6	Merge pull request #184 from henk717/united Cap transformers version	2022-12-02 01:27:18 +01:00
henk717	f72ceeadd0	Cap transformers version Since MTJ is low level, we force a fixed transformers version to have more controlled updates when needed	2022-12-02 01:10:59 +01:00
henk717	04d9172fcd	Merge pull request #180 from VE-FORBRYDERNE/patch Only enable TPU transpose optimization if loading from HF model	2022-11-21 20:02:14 +01:00
vfbd	9a3f0eaab2	Only enable TPU transpose optimization if loading from HF model	2022-11-21 13:47:18 -05:00