Rather than using a separate BERT model to classify the last message,
use the LLM itself to get the classified expression label as a JSON
and set that as the current sprite. Doing this should take more information
into consideration and cut down on extra processing.
This is made possible by the use of constrained generation with JSON
schemas. Only available to TabbyAPI since it's the only backend that
supports the use of JSON schemas, but there can hopefully be a way
to use this with other backends as well.
Intercepts the generation and sets top_k = 1 (for greedy sampling)
and the json_schema to an emotion enum. Doing this also prevents
reingestion of the entire context every time a message is sent and
then asked to be classified, which doesn't compromise the chat
experience.
Signed-off-by: kingbri <bdashore3@proton.me>