Yahooo, decoupling from koboldai_vars. This makes the generation test
pass in `test_generation.py`, and makes full determinism outside of
core_generate work.
We now do some basic tests for:
- hf torch loading (normal, lazy, lowmem)
- hf torch generation (shape batches, shape tokencount, faulty
determinism)
Currently full determinism is failing; yahoo, the tests work!
All of the tests initally failed (note the test environment functions
different from the aiserver environment due to aiserver doing a lot of
initalizing stuff, working on phasing that out) but now only one fails.
Very useful for finding bugs!
A rather embarassing way to spend an hour debugging after I told myself
"I'd better remember to add this important thing to the torch side".
Samplers were being applied when in their "off values" causing
boring mathmatical operations to take place (ie anything x 0 is always
0)