ZwZ-4B-GGUF
This repository provides GGUF-format weights for ZwZ-4B, split into two components:
- Language model (LLM): FP16, Q8_0, Q4_K_M
- Vision encoder (mmproj): FP16, Q8_0, Q4_K_M
These files are compatible with llama.cpp, Ollama, and other GGUF-based tools, supporting inference on CPU, NVIDIA GPUs (CUDA), Apple Silicon (Metal), Intel GPUs (SYCL), and more. You can mix precision levels for the language and vision components to match your hardware and performance needs, and you can even perform custom quantization starting from the FP16 weights.
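As a sketch of the custom-quantization path mentioned above, the `llama-quantize` tool that ships with llama.cpp can re-quantize the FP16 language-model weights to another type. The file names and the Q5_K_M target below are illustrative assumptions, not files shipped in this repository:

```shell
# Re-quantize the FP16 LLM weights to another type (Q5_K_M shown as an example).
# Input/output paths are illustrative; adjust them to your local files.
llama-quantize path/to/ZwZ-4B-F16.gguf path/to/ZwZ-4B-Q5_K_M.gguf Q5_K_M
```

The vision encoder (mmproj) is typically left at FP16 or Q8_0, since it is small relative to the language model.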
Enjoy running this multimodal model on your personal device! 🚀
How to Use
To use these models with llama.cpp, make sure you are running the latest version, either by building from source or by downloading the most recent release for your device.
You can run inference via the command line or through a web-based chat interface.
CLI Inference (llama-mtmd-cli)
For example, to run ZwZ-4B with a Q8_0 quantized LLM and a Q8_0 vision encoder:
```shell
llama-mtmd-cli \
  -m path/to/ZwZ-4B-Q8_0.gguf \
  --mmproj mmproj-ZwZ-4B-Q8_0.gguf \
  --image test.jpeg \
  -p "What is the publisher name of the newspaper?" \
  --temp 1.0 --top-k 20 --top-p 0.95 -n 1024
```
Web Chat (using llama-server)
To serve ZwZ-4B via an OpenAI-compatible API with a web UI:
```shell
llama-server \
  -m path/to/ZwZ-4B-Q8_0.gguf \
  --mmproj mmproj-ZwZ-4B-Q8_0.gguf
```
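Once the server is running, you can open its web UI in a browser or query the OpenAI-compatible API from the command line. The host, port (llama-server defaults to 8080), and payload shape below are assumptions based on llama-server's standard `/v1/chat/completions` endpoint; adjust them if you started the server with `--host` or `--port`:

```shell
# Send a simple text-only chat request to the OpenAI-compatible endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Describe this model in one sentence."}
        ],
        "max_tokens": 128
      }'
```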
Citation
```bibtex
@article{wei2026zooming,
  title={Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception},
  author={Wei, Lai and He, Liangbo and Lan, Jun and Dong, Lingzhong and Cai, Yutong and Li, Siyuan and Zhu, Huijia and Wang, Weiqiang and Kong, Linghe and Wang, Yue and Zhang, Zhuosheng and Huang, Weiran},
  journal={arXiv preprint arXiv:2602.11858},
  year={2026}
}
```
License
This model is released under the Apache 2.0 License.