After extensive testing, I've adjusted the repetition penalty slightly so that it matches both simple-proxy-for-tavern's default preset and ooba's LLaMA-Precise preset. This stopped some models from talking/acting as User.
Here are two presets I've found very useful for Llama 2-based models:
- Deterministic takes away the randomness and is good for testing/comparing models, because the same input always produces the same output.
- Storywriter-Llama2 is the Storywriter preset adjusted for Llama 2's 4K context size. It also works well against Llama 2's repetition/looping issues.
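To make the two ideas above a bit more concrete, here is a rough Python sketch of how a repetition penalty and deterministic (greedy) token selection are commonly implemented. It is not the code of any of the tools mentioned here: the function names, the three-token vocabulary, the scores, and the penalty value of 1.2 are all made up for illustration. It assumes the common scheme where a penalized token's score is divided by the penalty if positive and multiplied by it if negative.

```python
def apply_repetition_penalty(logits, previous_tokens, penalty):
    """Return a copy of logits with already-generated tokens made less likely.

    Assumed scheme: positive scores are divided by the penalty,
    negative scores are multiplied by it.
    """
    adjusted = list(logits)
    for token_id in set(previous_tokens):
        score = adjusted[token_id]
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


def greedy_pick(logits):
    """Deterministic choice: always take the highest-scoring token, no sampling."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical scores for a tiny 3-token vocabulary.
logits = [2.0, 1.9, -1.0]
history = [0]  # token 0 has already been generated once

# Illustrative penalty value, not taken from any preset.
penalized = apply_repetition_penalty(logits, history, penalty=1.2)

print(greedy_pick(logits))     # choice without a penalty
print(greedy_pick(penalized))  # choice once the repeated token is penalized
```

Because `greedy_pick` involves no random sampling, running it twice on the same scores gives the same token, which is what makes a deterministic preset handy for comparing models. The penalty only pushes down tokens that already appeared, so it can nudge the model away from looping without adding any randomness.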