# Qwen3-VL-32B-Instruct-EXL3-4.0bpw

ExLlamaV3 quantization of [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct), a vision-language model for multimodal tasks.
## Quantization Details
| Parameter | Value |
|---|---|
| Bits per Weight | 4.0 bpw |
| Head Bits | 6 bpw |
| Calibration Rows | 128 |
| Calibration Context | 4096 tokens |
| Format | ExLlamaV3 (EXL3) |
| Size | ~19 GB |
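As a rough sanity check, the on-disk size follows from the bits-per-weight figure. The sketch below is illustrative arithmetic only; the parameter split between the 4.0 bpw body and the 6 bpw head is an assumption, and the remaining gap to ~19 GB comes from the vision tower, embeddings, and format overhead.

```python
# Back-of-envelope weight footprint from bits-per-weight (illustrative only).
# Parameter counts are ASSUMPTIONS -- verify against the original model card.
def quantized_size_gb(params_billion: float, bpw: float) -> float:
    """Approximate weight footprint in GB at a given bits-per-weight."""
    return params_billion * 1e9 * bpw / 8 / 1e9

body = quantized_size_gb(32.0, 4.0)  # bulk of the network at 4.0 bpw -> ~16 GB
head = quantized_size_gb(1.0, 6.0)   # assumed ~1B params kept at 6 bpw (head/embeddings)
print(f"~{body + head:.1f} GB before vision tower and metadata")
```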
## Model Capabilities
- Vision Understanding: Process images at various resolutions
- Video Analysis: Frame-by-frame understanding
- Context Window: Up to 128K tokens
- Instruction Following: Fine-tuned for chat and task completion
- Multilingual: Strong performance across languages
## Hardware Requirements
| GPU | VRAM | Notes |
|---|---|---|
| RTX 4090 | 24 GB | Good fit, comfortable with images |
| RTX 3090 | 24 GB | Works well |
| A100 40GB | 40 GB | Plenty of headroom |
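When budgeting VRAM beyond the ~19 GB of weights, the KV cache is the next largest consumer. The sketch below uses the generic GQA cache-size formula; the layer count, KV-head count, and head dimension are assumed values for a 32B-class Qwen3 model and should be read from the model's `config.json` rather than trusted here.

```python
# Rough KV-cache size estimate for a GQA model (illustrative sketch only).
# num_layers / num_kv_heads / head_dim are ASSUMED values for a 32B-class model;
# read the real ones from the model's config.json.
def kv_cache_gb(seq_len: int, num_layers: int = 64, num_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: float = 0.5) -> float:
    """2 (K and V) x layers x kv_heads x head_dim x tokens x bytes per element."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

# Q4 cache stores roughly 0.5 bytes per element (plus a small scale overhead).
print(f"16K context, Q4 cache:   ~{kv_cache_gb(16384, bytes_per_elem=0.5):.1f} GB")
print(f"16K context, FP16 cache: ~{kv_cache_gb(16384, bytes_per_elem=2.0):.1f} GB")
```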
## Use Cases
- Live Assistant: Real-time screen understanding
- Document Processing: Extract and analyze document content
- Image Description: Detailed visual descriptions
- Visual Coding: Understand code in screenshots
- Chart/Graph Analysis: Interpret data visualizations
## Usage with TabbyAPI

```yaml
# config.yml
model:
  model_dir: models
  model_name: Qwen3-VL-32B-Instruct-EXL3-4.0bpw

network:
  host: 0.0.0.0
  port: 5000

model_defaults:
  max_seq_len: 16384
  cache_mode: Q4
```
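Once the server is running, requests go through TabbyAPI's OpenAI-compatible endpoint. The snippet below is a minimal client sketch using the `openai` Python package; the host, port, API key, and image path are placeholders, and it assumes your TabbyAPI build accepts OpenAI-style `image_url` content for this model.

```python
# Minimal client sketch against TabbyAPI's OpenAI-compatible endpoint.
# Assumes the server from config.yml above is running on localhost:5000 and that
# your TabbyAPI build accepts image_url content for this vision model.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="your-tabbyapi-key")

# Encode a local image as a data URL (path is a placeholder).
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="Qwen3-VL-32B-Instruct-EXL3-4.0bpw",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this screenshot in detail."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    max_tokens=512,
)
print(response.choices[0].message.content)
```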
## Recommended Settings
- Temperature: 0.7
- Top-P: 0.8
- Top-K: 20
- Repetition Penalty: 1.05
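A hedged sketch of passing these settings per request follows, reusing the `client` from the previous example. `top_k` and `repetition_penalty` are not part of the stock OpenAI schema, so they are sent through `extra_body`; the assumption is that your TabbyAPI version accepts these field names as sampler extensions.

```python
# Passing the recommended sampler settings per request (sketch; reuses the
# `client` object from the previous example).
response = client.chat.completions.create(
    model="Qwen3-VL-32B-Instruct-EXL3-4.0bpw",
    messages=[{"role": "user", "content": "Give a one-sentence summary of EXL3 quantization."}],
    temperature=0.7,
    top_p=0.8,
    # top_k and repetition_penalty are not in the stock OpenAI schema, so they
    # go through extra_body (assumption: your TabbyAPI accepts these names).
    extra_body={"top_k": 20, "repetition_penalty": 1.05},
    max_tokens=512,
)
```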
## Comparison with Thinking Variant
| Model | Best For |
|---|---|
| This (Instruct) | Fast responses, direct answers, general tasks |
| Thinking variant | Complex reasoning, step-by-step analysis |
## Original Model

This is a quantization of [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct). All credit for the base model goes to the Qwen team at Alibaba.
## License
Apache 2.0 (inherited from base model)