by Tepix 3 days ago

Sounds good. I saw that you use the FP8 version of the model. Do you also quantize the KV cache?

sacrelege 3 days ago

No, I don't, since there seems to be a silent degradation bug.