A quantization setup used for GLM-4.7:

  • Weights: NVFP4
  • KV cache: BF16 (left unquantized)
  • Tooling: NVIDIA/Model-Optimizer
  • Deployment: TensorRT-LLM
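
To make the recipe concrete, here is a minimal pure-Python sketch of NVFP4-style block quantization, assuming the commonly described layout: weights stored as 4-bit E2M1 floats with one shared scale per small block (in real NVFP4 the block size is 16 and the scale is stored in FP8 E4M3; here the scale is kept as a plain float for illustration). The helper names are hypothetical, not part of Model-Optimizer's API.

```python
# Positive magnitudes representable by a 4-bit E2M1 float
# (the sign bit covers negatives; zero is included).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block_nvfp4(block):
    """Quantize one block of floats: choose a scale so the block's max
    magnitude maps to 6.0 (the largest E2M1 value), then round each
    element to the nearest representable E2M1 magnitude."""
    amax = max(abs(x) for x in block) or 1.0   # avoid divide-by-zero on all-zero blocks
    scale = amax / 6.0                         # stored as FP8 E4M3 in actual NVFP4

    def to_e2m1(x):
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        return mag if x >= 0 else -mag

    return scale, [to_e2m1(x) for x in block]

def dequantize_block(scale, codes):
    """Recover approximate values by rescaling the 4-bit codes."""
    return [scale * c for c in codes]
```

Values that already sit on the scaled E2M1 grid round-trip exactly; everything else lands on the nearest grid point, which is where the quantization error comes from. The KV cache in this setup skips this step entirely and stays in BF16.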
Downloads last month: 63

Safetensors
  • Model size: 177B params
  • Tensor types: BF16, F32, F8_E4M3, U8
Model tree for soundsgoodai/GLM-4.7-NVFP4-KV-cache-BF16

  • Base model: zai-org/GLM-4.7
  • This model is one of 42 quantized variants of the base model.