Qwen Image Edit ModelOpt FP8 SGLang Transformer

This repository contains an SGLang-ready ModelOpt FP8 transformer override for Qwen/Qwen-Image-Edit. It replaces only the transformer weights; the tokenizer, image encoder, scheduler, VAE, and other non-transformer components are loaded from the original base model.
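The component split above can be sketched as a small resolver that decides which repository each pipeline component is loaded from. This is an illustrative helper, not the actual SGLang loader:

```python
# Sketch of the component split: only the transformer comes from this repo;
# everything else loads from the base model. Hypothetical helper, not SGLang code.
BASE_REPO = "Qwen/Qwen-Image-Edit"
TRANSFORMER_REPO = "BBuf/Qwen-Image-Edit-ModelOpt-FP8-SGLang"

def repo_for(component: str) -> str:
    """Return the repository a pipeline component should be loaded from."""
    return TRANSFORMER_REPO if component == "transformer" else BASE_REPO

for name in ["tokenizer", "image_encoder", "scheduler", "vae", "transformer"]:
    print(f"{name:14s} <- {repo_for(name)}")
```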

The checkpoint is intended for SGLang Diffusion with the Qwen Image Edit FP8 support from sgl-project/sglang#23155.

Usage

Use your own input image, or download the validation input image from this repository:

```shell
huggingface-cli download BBuf/Qwen-Image-Edit-ModelOpt-FP8-SGLang \
  validation/assets/qwen_image_edit_input.png \
  --local-dir /tmp/qwen-image-edit-fp8

sglang generate \
  --backend=sglang \
  --model-id=Qwen-Image-Edit \
  --model-path Qwen/Qwen-Image-Edit \
  --transformer-path BBuf/Qwen-Image-Edit-ModelOpt-FP8-SGLang \
  --prompt "A clean product photo of a small ceramic teapot on a wooden table, soft daylight, sharp details." \
  --image-path /tmp/qwen-image-edit-fp8/validation/assets/qwen_image_edit_input.png \
  --width=512 \
  --height=512 \
  --num-inference-steps=8 \
  --guidance-scale=4.0 \
  --seed=42 \
  --num-gpus=1 \
  --dit-cpu-offload false \
  --dit-layerwise-offload false \
  --warmup \
  --save-output
```
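For parameter sweeps it can be convenient to build the same invocation programmatically. The sketch below only constructs the argv list; the flags mirror the CLI call above:

```python
# Build the `sglang generate` invocation as an argv list (e.g. for seed sweeps).
# The flags mirror the CLI example above; run it with subprocess.run(..., check=True).
import subprocess

def build_cmd(prompt: str, image_path: str, seed: int = 42) -> list[str]:
    return [
        "sglang", "generate",
        "--backend=sglang",
        "--model-id=Qwen-Image-Edit",
        "--model-path", "Qwen/Qwen-Image-Edit",
        "--transformer-path", "BBuf/Qwen-Image-Edit-ModelOpt-FP8-SGLang",
        "--prompt", prompt,
        "--image-path", image_path,
        "--width=512", "--height=512",
        "--num-inference-steps=8",
        "--guidance-scale=4.0",
        f"--seed={seed}",
        "--num-gpus=1",
        "--dit-cpu-offload", "false",
        "--dit-layerwise-offload", "false",
        "--warmup",
        "--save-output",
    ]

# Example (requires an installed sglang CLI and a GPU):
# subprocess.run(build_cmd("A clean product photo ...", "/tmp/input.png"), check=True)
```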

H100 Validation Snapshot

Validation was run on a single H100 GPU (rank 0) with --backend=sglang. The FP8 output below comes from the fixed checkpoint, which keeps the validated set of numerically sensitive Qwen Image fallback tensors in BF16.

Artifacts:

  • Input image
  • BF16 edit output (512x512, 8 steps)
  • FP8 fixed edit output (512x512, 8 steps)

Benchmark, warmup excluded:

| Metric | BF16 | FP8 fixed | Delta | Speedup |
|---|---|---|---|---|
| E2E latency | 6.792 s | 6.085 s | -0.707 s (-10.4%) | 1.12x |
| Denoising stage | 5.204 s | 4.524 s | -0.680 s (-13.1%) | 1.15x |
| Decoding stage | 154.77 ms | 121.06 ms | -33.72 ms (-21.8%) | 1.28x |
| Image encoding | 1.316 s | 1.328 s | +0.011 s (+0.9%) | 0.99x |
| Image VAE encoding | 100.62 ms | 94.93 ms | -5.69 ms (-5.7%) | 1.06x |
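The Delta and Speedup columns follow directly from the two timing columns (speedup is the BF16 time divided by the FP8 time). A quick check of the E2E row:

```python
# Recompute the delta and speedup columns from the raw E2E timings in the table.
bf16, fp8 = 6.792, 6.085           # E2E latency in seconds

delta = fp8 - bf16                 # negative means FP8 is faster
pct = 100.0 * delta / bf16
speedup = bf16 / fp8

print(f"delta={delta:+.3f} s ({pct:+.1f}%), speedup={speedup:.2f}x")
# -> delta=-0.707 s (-10.4%), speedup=1.12x
```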

Notes:

  • Validation prompt: A clean product photo of a small ceramic teapot on a wooden table, soft daylight, sharp details.
  • Validation settings: 512x512, 8 inference steps, guidance_scale=4.0, seed=42, --dit-cpu-offload false, --dit-layerwise-offload false, --warmup.

Conversion Notes

The checkpoint was converted from an NVIDIA ModelOpt FP8 export with SGLang's build_modelopt_fp8_transformer tool. Most linear weights are FP8. The validated fallback set keeps numerically sensitive tensors in BF16, including the Qwen Image image-MLP output-projection family, which must stay in BF16 for normal image quality.
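A BF16 fallback set of this kind can be applied during conversion by matching tensor names against fallback patterns: matching tensors keep high precision instead of being cast to FP8. The patterns below are illustrative only, not the exact list used for this checkpoint:

```python
# Sketch of applying a BF16 fallback set during FP8 conversion. Tensors whose
# names match a fallback pattern keep BF16; the rest are exported as FP8.
# The pattern list is hypothetical, not the checkpoint's actual fallback set.
import re

FALLBACK_PATTERNS = [
    r"\.img_mlp\.net\.2\.",   # hypothetical: image-MLP output projections
    r"\.norm",                # hypothetical: norm layers stay high precision
]

def target_dtype(tensor_name: str) -> str:
    """Decide the export dtype for one weight tensor."""
    if any(re.search(p, tensor_name) for p in FALLBACK_PATTERNS):
        return "bfloat16"
    return "float8_e4m3fn"

print(target_dtype("blocks.0.attn.to_q.weight"))      # -> float8_e4m3fn
print(target_dtype("blocks.0.img_mlp.net.2.weight"))  # -> bfloat16
```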
