SillyTavern/public/scripts/gpt-2-3-tokenizer/README.md
2023-07-20 20:32:15 +03:00

29 lines
1020 B
Markdown

# GPT-2/3 Tokenizer
GPT-2/3 byte pair encoder/decoder/tokenizer based on [@latitudegames/GPT-3-Encoder](https://github.com/latitudegames/GPT-3-Encoder) that works in the browser and Deno.
See also: [JS byte pair encoder for OpenAI's CLIP model](https://github.com/josephrocca/clip-bpe-js).
```js
import {encode, decode} from "https://deno.land/x/gpt_2_3_tokenizer@v0.0.2/mod.js";
let text = "hello world";
console.log(encode(text)); // [258, 18798, 995]
console.log(decode(encode(text))); // "hello world"
```
or:
```js
let mod = await import("https://deno.land/x/gpt_2_3_tokenizer@v0.0.2/mod.js");
mod.encode("hello world"); // [258, 18798, 995]
```
or to include it as a global variable in the browser:
```html
<script type=module>
import tokenizer from "https://deno.land/x/gpt_2_3_tokenizer@v0.0.2/mod.js";
window.tokenizer = tokenizer;
</script>
```
# License
The [original code is MIT Licensed](https://github.com/latitudegames/GPT-3-Encoder/blob/master/LICENSE) and so are any changes made by this repo.