13 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Gnome Ann | e7f65cee09 | XGLM breakmodel | 2022-02-01 13:04:35 -05:00 |
| Gnome Ann | 95aff61781 | Don't pin CPU layers after running out of pinned memory | 2021-11-26 10:31:15 -05:00 |
| Gnome Ann | 25c9be5d02 | Breakmodel support for GPTJModel | 2021-11-25 18:09:16 -05:00 |
| Gnome Ann | f8bcc3411b | In breakmodel mode, move layers to GPU as soon as model loads. Rather than during the first generation. | 2021-11-25 11:44:41 -05:00 |
| Gnome Ann | 3649ba9fa4 | Breakmodel's CUDA stream should be on primary device | 2021-10-06 12:04:56 -04:00 |
| Gnome Ann | f9e6a6da17 | Slightly increased performance in breakmodel mode. Commit a283d34b2731abfe7f5f1e939117491f0755cedb made breakmodel mode slower. Performance has been restored to how it was before that commit. | 2021-10-05 10:25:06 -04:00 |
| Gnome Ann | a283d34b27 | Multiple GPU support | 2021-10-05 09:38:57 -04:00 |
| Gnome Ann | 0937bb33e7 | Clarify licensing for breakmodel.py | 2021-10-02 12:19:37 -04:00 |
| Gnome Ann | 4d9eab3785 | K80 test | 2021-09-23 20:57:18 -04:00 |
| Gnome Ann | b5c28f4e07 | Fix for when breakmodel layers is 0 | 2021-08-28 02:19:51 -04:00 |
| Gnome Ann | 8bfcf86a8b | Fix for non-rotary models without "rotary" in config.json | 2021-08-20 13:00:53 -04:00 |
| Gnome Ann | eef0db8dee | Specifically import torch.cuda.comm in breakmodel.py | 2021-08-20 10:47:54 -04:00 |
| Gnome Ann | b1c13f832a | Implement arrmansa's low VRAM patch | 2021-08-20 10:25:03 -04:00 |