Qwen3-VL-2B GGUF

This is a GGUF conversion of Qwen/Qwen3-VL-2B-Instruct, a vision-language model. The conversion is optimized for on-device inference with llama.cpp.

Model Details

Property             Value
Original Model       Qwen3-VL-2B-Instruct
Parameters           2 billion
Quantization         Q8_0
Model Size           ~1.7 GB
Vision Encoder Size  ~424 MB (Q8_0)
Context Window       8,192 tokens
Architecture         Qwen3-VL with native vision encoder
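
The file sizes above do not include the KV cache, which grows with the context window. The sketch below gives a rough memory budget. The layer count, KV head count, and head dimension are assumptions borrowed from the Qwen3-1.7B text model and are not stated in this card, so check them against the GGUF metadata before relying on the result.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Size of an unquantized KV cache: one K and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def estimate_total_gb(model_gb, mmproj_gb, kv_bytes):
    """Weights plus KV cache, in decimal gigabytes (compute buffers excluded)."""
    return model_gb + mmproj_gb + kv_bytes / 1e9


# ASSUMED architecture values (not from this model card): 28 layers,
# 8 KV heads, head dimension 128, fp16 cache (2 bytes per element).
kv = kv_cache_bytes(n_layers=28, n_kv_heads=8, head_dim=128, n_ctx=8192)

# File sizes are taken from the table above: ~1.7 GB model, ~424 MB encoder.
total = estimate_total_gb(model_gb=1.7, mmproj_gb=0.424, kv_bytes=kv)
print(f"KV cache: {kv / 1e9:.2f} GB, estimated total: {total:.2f} GB")
```

The estimate leaves out llama.cpp compute buffers and per-image activations, so treat it as a lower bound.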

Files

  • Qwen3VL-2B-Instruct-Q8_0.gguf - Main language model
  • mmproj-Qwen3VL-2B-Instruct-Q8_0.gguf - Vision encoder (mmproj)
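
Both files are needed for image input, so it can help to check them before launching llama.cpp. The sketch below confirms that each file exists and starts with the four-byte GGUF magic. The helper names and the `models` directory are hypothetical.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # GGUF files begin with these four bytes

EXPECTED_FILES = (
    "Qwen3VL-2B-Instruct-Q8_0.gguf",         # main language model
    "mmproj-Qwen3VL-2B-Instruct-Q8_0.gguf",  # vision encoder (mmproj)
)


def is_gguf(path):
    """Return True if the path is a regular file that starts with the GGUF magic."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as f:
        return f.read(4) == GGUF_MAGIC


def missing_or_invalid(model_dir, names=EXPECTED_FILES):
    """List the expected files that are absent or do not look like GGUF."""
    return [name for name in names if not is_gguf(Path(model_dir) / name)]


if __name__ == "__main__":
    # "models" is a hypothetical download directory.
    problems = missing_or_invalid("models")
    print("All files look fine" if not problems else f"Check these files: {problems}")
```

This only inspects the header. It does not detect truncated downloads, so comparing file sizes or checksums is still worthwhile.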

Intended Use

This model is optimized for:

  • Mobile/Edge Deployment: Runs on iOS devices with 8GB+ RAM
  • llama.cpp Integration: Compatible with llama.cpp vision features
  • On-Device AI: Private, offline image understanding

Capabilities

  • Image Captioning: Describe images in detail
  • Visual Q&A: Answer questions about images
  • Document OCR: Extract text from documents and photos
  • Scene Understanding: Analyze complex visual scenes
  • Superior Quality: Claimed best-in-class among 2B-parameter VLMs

Usage with llama.cpp

# Current llama.cpp builds ship the multimodal CLI as llama-mtmd-cli (formerly llama-llava-cli)
./llama-mtmd-cli -m Qwen3VL-2B-Instruct-Q8_0.gguf \
  --mmproj mmproj-Qwen3VL-2B-Instruct-Q8_0.gguf \
  --image your_image.jpg \
  -p "Describe this image in detail"
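
To call the same command from a script, a thin wrapper can assemble the argument list. This is a minimal sketch: `build_command` and `describe_image` are hypothetical helpers, the binary path is passed in because its name differs between llama.cpp builds, and the optional `-c` flag sets the context size.

```python
import subprocess


def build_command(binary, model, mmproj, image, prompt, ctx_size=None):
    """Assemble the CLI arguments shown in the command above."""
    cmd = [binary, "-m", model, "--mmproj", mmproj, "--image", image, "-p", prompt]
    if ctx_size is not None:
        cmd += ["-c", str(ctx_size)]  # context size in tokens
    return cmd


def describe_image(binary, image, prompt="Describe this image in detail"):
    """Run the CLI once and return whatever it writes to stdout."""
    cmd = build_command(
        binary,
        "Qwen3VL-2B-Instruct-Q8_0.gguf",
        "mmproj-Qwen3VL-2B-Instruct-Q8_0.gguf",
        image,
        prompt,
        ctx_size=8192,
    )
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or quotes.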

Prompt Format

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
<|vision_start|><|vision_end|>
{prompt}<|im_end|>
<|im_start|>assistant
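
When building the prompt string by hand, the template above can be filled in with a small helper. This is a sketch and `render_prompt` is a hypothetical name. It assumes the runtime places the image embeddings between the vision tags, which this card does not spell out.

```python
DEFAULT_SYSTEM = "You are a helpful assistant."


def render_prompt(prompt, system=DEFAULT_SYSTEM):
    """Fill the chat template from this section with a system and user message."""
    return (
        "<|im_start|>system\n"
        f"{system}<|im_end|>\n"
        "<|im_start|>user\n"
        "<|vision_start|><|vision_end|>\n"  # assumed slot for the image embeddings
        f"{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"  # left open so the model writes the reply
    )


if __name__ == "__main__":
    print(render_prompt("What text appears in this photo?"))
```

The string ends with the open assistant turn, so generation continues from there.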

License

This model inherits the Apache 2.0 license from the original Qwen3-VL model.

Attribution
