Qwen Image Edit ModelOpt FP8 SGLang Transformer

This repository contains an SGLang-ready ModelOpt FP8 transformer override for Qwen/Qwen-Image-Edit. It replaces only the transformer weights; the tokenizer, image encoder, scheduler, VAE, and other non-transformer components are loaded from the original base model.
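The component split above can be sketched as a small resolver that decides which repository each pipeline component is loaded from. This is an illustrative helper, not the actual SGLang loader:

```python
# Sketch of the component split: only the transformer comes from this repo;
# everything else loads from the base model. Hypothetical helper, not SGLang code.
BASE_REPO = "Qwen/Qwen-Image-Edit"
TRANSFORMER_REPO = "BBuf/Qwen-Image-Edit-ModelOpt-FP8-SGLang"

def repo_for(component: str) -> str:
    """Return the repository a pipeline component should be loaded from."""
    return TRANSFORMER_REPO if component == "transformer" else BASE_REPO

for name in ["tokenizer", "image_encoder", "scheduler", "vae", "transformer"]:
    print(f"{name:14s} <- {repo_for(name)}")
```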

The checkpoint is intended for SGLang Diffusion with the Qwen Image Edit FP8 support from sgl-project/sglang#23155.

Usage

Use your own input image, or download the validation input image from this repository:

```shell
huggingface-cli download BBuf/Qwen-Image-Edit-ModelOpt-FP8-SGLang \
  validation/assets/qwen_image_edit_input.png \
  --local-dir /tmp/qwen-image-edit-fp8

sglang generate \
  --backend=sglang \
  --model-id=Qwen-Image-Edit \
  --model-path Qwen/Qwen-Image-Edit \
  --transformer-path BBuf/Qwen-Image-Edit-ModelOpt-FP8-SGLang \
  --prompt "A clean product photo of a small ceramic teapot on a wooden table, soft daylight, sharp details." \
  --image-path /tmp/qwen-image-edit-fp8/validation/assets/qwen_image_edit_input.png \
  --width=512 \
  --height=512 \
  --num-inference-steps=8 \
  --guidance-scale=4.0 \
  --seed=42 \
  --num-gpus=1 \
  --dit-cpu-offload false \
  --dit-layerwise-offload false \
  --warmup \
  --save-output
```
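For parameter sweeps it can be convenient to build the same invocation programmatically. The sketch below only constructs the argv list; the flags mirror the CLI call above:

```python
# Build the `sglang generate` invocation as an argv list (e.g. for seed sweeps).
# The flags mirror the CLI example above; run it with subprocess.run(..., check=True).
import subprocess

def build_cmd(prompt: str, image_path: str, seed: int = 42) -> list[str]:
    return [
        "sglang", "generate",
        "--backend=sglang",
        "--model-id=Qwen-Image-Edit",
        "--model-path", "Qwen/Qwen-Image-Edit",
        "--transformer-path", "BBuf/Qwen-Image-Edit-ModelOpt-FP8-SGLang",
        "--prompt", prompt,
        "--image-path", image_path,
        "--width=512", "--height=512",
        "--num-inference-steps=8",
        "--guidance-scale=4.0",
        f"--seed={seed}",
        "--num-gpus=1",
        "--dit-cpu-offload", "false",
        "--dit-layerwise-offload", "false",
        "--warmup",
        "--save-output",
    ]

# Example (requires an installed sglang CLI and a GPU):
# subprocess.run(build_cmd("A clean product photo ...", "/tmp/input.png"), check=True)
```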

H100 Validation Snapshot

Validation was run on a single H100 GPU (rank 0) with --backend=sglang. The FP8 output below comes from the fixed checkpoint, which keeps the validated set of numerically sensitive Qwen Image fallback tensors in BF16.

Artifacts:

  • Input image
  • BF16 edit output (512x512, 8 steps)
  • FP8 fixed edit output (512x512, 8 steps)

Benchmark, warmup excluded:

| Metric | BF16 | FP8 fixed | Delta | Speedup |
|---|---|---|---|---|
| E2E latency | 6.792 s | 6.085 s | -0.707 s (-10.4%) | 1.12x |
| Denoising stage | 5.204 s | 4.524 s | -0.680 s (-13.1%) | 1.15x |
| Decoding stage | 154.77 ms | 121.06 ms | -33.72 ms (-21.8%) | 1.28x |
| Image encoding | 1.316 s | 1.328 s | +0.011 s (+0.9%) | 0.99x |
| Image VAE encoding | 100.62 ms | 94.93 ms | -5.69 ms (-5.7%) | 1.06x |
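The Delta and Speedup columns follow directly from the two timing columns (speedup is the BF16 time divided by the FP8 time). A quick check of the E2E row:

```python
# Recompute the delta and speedup columns from the raw E2E timings in the table.
bf16, fp8 = 6.792, 6.085           # E2E latency in seconds

delta = fp8 - bf16                 # negative means FP8 is faster
pct = 100.0 * delta / bf16
speedup = bf16 / fp8

print(f"delta={delta:+.3f} s ({pct:+.1f}%), speedup={speedup:.2f}x")
# -> delta=-0.707 s (-10.4%), speedup=1.12x
```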

Notes:

  • Validation prompt: A clean product photo of a small ceramic teapot on a wooden table, soft daylight, sharp details.
  • Validation settings: 512x512, 8 inference steps, guidance_scale=4.0, seed=42, --dit-cpu-offload false, --dit-layerwise-offload false, --warmup.

Conversion Notes

The checkpoint was converted from an NVIDIA ModelOpt FP8 export with SGLang's build_modelopt_fp8_transformer tool. Most linear weights are FP8. The validated fallback set keeps numerically sensitive tensors in BF16, including the Qwen Image image-MLP output-projection family, which must stay in BF16 for normal image quality.
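A BF16 fallback set of this kind can be applied during conversion by matching tensor names against fallback patterns: matching tensors keep high precision instead of being cast to FP8. The patterns below are illustrative only, not the exact list used for this checkpoint:

```python
# Sketch of applying a BF16 fallback set during FP8 conversion. Tensors whose
# names match a fallback pattern keep BF16; the rest are exported as FP8.
# The pattern list is hypothetical, not the checkpoint's actual fallback set.
import re

FALLBACK_PATTERNS = [
    r"\.img_mlp\.net\.2\.",   # hypothetical: image-MLP output projections
    r"\.norm",                # hypothetical: norm layers stay high precision
]

def target_dtype(tensor_name: str) -> str:
    """Decide the export dtype for one weight tensor."""
    if any(re.search(p, tensor_name) for p in FALLBACK_PATTERNS):
        return "bfloat16"
    return "float8_e4m3fn"

print(target_dtype("blocks.0.attn.to_q.weight"))      # -> float8_e4m3fn
print(target_dtype("blocks.0.img_mlp.net.2.weight"))  # -> bfloat16
```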
