Quantization

Research papers, repositories, and articles about quantization

Creating the NVIDIA Nemotron 3 Ultra NVFP4 Checkpoint with NVIDIA Model Optimizer

NVIDIA walks through how it quantized the 550B-parameter Nemotron 3 Ultra model to NVFP4 while matching BF16 accuracy and gaining up to roughly 5.9x throughput. The post shares concrete layer-by-layer recipes and scaling tricks such as "four-over-six" FP4 scaling. If you want cheaper training or serving on Blackwell, it is a detailed playbook. ([developer.nvidia.com](https://developer.nvidia.com/blog/creating-the-nvidia-nemotron-3-ultra-nvfp4-checkpoint-with-nvidia-model-optimizer/))
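For readers new to block-scaled FP4, here is a minimal NumPy sketch of the general idea behind a "four-over-six" style choice. It is an illustration under stated assumptions, not NVIDIA's recipe or the Model Optimizer API. It assumes 16-element blocks and the FP4 E2M1 value grid, and it assumes "four-over-six" means trying two scales per block (largest magnitude mapped to 6 or to 4) and keeping the one with lower error. The names `fake_quant_block` and `quantize_adaptive` are hypothetical, and block scales stay in full precision here instead of being stored as FP8.

```python
import numpy as np

# Non-negative magnitudes representable in FP4 E2M1 (sign is handled separately).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def fake_quant_block(block, target_max):
    """Quantize and dequantize one block so its largest magnitude maps to target_max."""
    amax = np.max(np.abs(block))
    if amax == 0.0:
        return block.copy()
    # Assumption: the scale is kept in full precision; real NVFP4 stores it in FP8.
    scale = amax / target_max
    scaled = np.abs(block) / scale
    # Round each scaled magnitude to the nearest representable FP4 value.
    idx = np.argmin(np.abs(scaled[:, None] - FP4_GRID[None, :]), axis=1)
    return np.sign(block) * FP4_GRID[idx] * scale


def quantize_adaptive(x, block_size=16):
    """Per block, try target maxima 6 and 4 and keep the lower-error result."""
    x = np.asarray(x, dtype=np.float64).ravel()
    out = np.empty_like(x)
    for start in range(0, x.size, block_size):
        block = x[start:start + block_size]
        candidates = [fake_quant_block(block, t) for t in (6.0, 4.0)]
        errors = [np.mean((c - block) ** 2) for c in candidates]
        out[start:start + block_size] = candidates[int(np.argmin(errors))]
    return out
```

Calling `quantize_adaptive(weights)` on a flat array returns the dequantized values, so the result can be compared directly against the input. Because a target of 6 is always one of the two candidates, the per-block error in this sketch is never worse than plain max-to-6 scaling.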

NVIDIA Technical Blog