by scottcha 3 hours ago
Hi, I'm the CTO of neuralwatt and would love to hear your feedback on your experience. Feel free to email me at scott@neuralwatt.com. Also, for GLM5.2 we run the FP8 quantization at 1M context, which is a common deployment target.