Qwen3-Coder-480B-A35B-Instruct GGUF

GGUF quantizations of Qwen/Qwen3-Coder-480B-A35B-Instruct for use with llama.cpp and Ollama.

Model Overview

Qwen3-Coder-480B-A35B-Instruct is the largest agentic coding model in Alibaba's Qwen3-Coder family. Key features:

480B total parameters with 35B active per token (MoE architecture)
256K-token native context (extendable to 1M tokens with YaRN)
Performance comparable to Claude Sonnet 4 on agentic coding tasks, according to the Qwen team's reported benchmarks
Apache 2.0 license, with openly available weights

Available Quantizations

Quantization	Size	Files	RAM Required	Quality	Description
IQ2_XS	133GB	4	~150GB	Good	Extreme 2-bit, for limited RAM
IQ3_M	218GB	6	~240GB	Better	Balanced 3-bit (coming soon)
IQ4_XS	257GB	7	~280GB	Great	Recommended 4-bit (coming soon)
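
As a rough sanity check on the sizes in the table, the average number of bits stored per weight can be estimated from the file size and the total parameter count. This is a minimal sketch, assuming that "GB" means 10^9 bytes and that the model has exactly 480e9 parameters; both are simplifications, and the function name is illustrative.

```python
# Rough bits-per-weight estimate from the file sizes listed in the table.
# Assumptions: "GB" means 10**9 bytes; the parameter count is exactly 480e9.
TOTAL_PARAMS = 480e9

SIZES_GB = {"IQ2_XS": 133, "IQ3_M": 218, "IQ4_XS": 257}


def bits_per_weight(size_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Average number of bits stored per parameter for a given file size."""
    return size_gb * 1e9 * 8 / params


for name, size in SIZES_GB.items():
    print(f"{name}: {bits_per_weight(size):.2f} bits/weight")
```

Under these assumptions IQ2_XS works out to a little over 2 bits per weight, which matches its description as a 2-bit quantization once per-block scale data is taken into account.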

Quick Start with Ollama

# IQ2_XS quantization
ollama run richardyoung/qwen3-coder:iq2_xs "Write a Python REST API with FastAPI"

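# With extended context: `ollama run` has no --num-ctx flag, so set num_ctx inside the interactive session
ollama run richardyoung/qwen3-coder:iq2_xs
>>> /set parameter num_ctx 65536
>>> Analyze this codebase...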
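
When calling Ollama programmatically, the context window is set through the `num_ctx` entry of the `options` object in its REST API. The sketch below assumes a local Ollama server on the default port 11434 and that the model tag above has already been pulled; `build_request` and `generate` are illustrative helper names, not part of any library.

```python
import json
import urllib.request

# Default endpoint of a locally running Ollama server (assumption: default port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, num_ctx: int = 65536,
                  model: str = "richardyoung/qwen3-coder:iq2_xs") -> dict:
    """Build the JSON payload for a single non-streaming generation."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # return one JSON object instead of a stream
        "options": {"num_ctx": num_ctx},  # context window size in tokens
    }


def generate(prompt: str, num_ctx: int = 65536) -> str:
    """Send the prompt to the local server and return the generated text."""
    body = json.dumps(build_request(prompt, num_ctx)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `print(generate("Analyze this codebase..."))` would send one request. Larger `num_ctx` values increase memory use on top of the model weights.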

Quick Start with llama.cpp

# Download all IQ2_XS shards
huggingface-cli download richardyoung/Qwen3-Coder-480B-GGUF --include "IQ2_XS/*" --local-dir .

# Run with llama.cpp: point -m at the first shard; the remaining shards are loaded automatically
./llama-cli -m IQ2_XS/Qwen_Qwen3-Coder-480B-A35B-Instruct-IQ2_XS-00001-of-00004.gguf \
  -c 32768 -n 2048 \
  -p "Write a binary search tree implementation in Python"
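
Because the model is split across several files, it can be worth checking that every shard is present before launching. The sketch below assumes the shards follow the `-0000N-of-0000M.gguf` naming visible in the command above; `shard_names` and `missing_shards` are hypothetical helpers written for this example.

```python
from pathlib import Path


def shard_names(prefix: str, total: int) -> list[str]:
    """Expected file names for a model split into `total` GGUF shards."""
    return [f"{prefix}-{i:05d}-of-{total:05d}.gguf" for i in range(1, total + 1)]


def missing_shards(directory: str, prefix: str, total: int) -> list[str]:
    """Names of expected shards that are not present in `directory`."""
    base = Path(directory)
    return [name for name in shard_names(prefix, total) if not (base / name).is_file()]


if __name__ == "__main__":
    # Prefix taken from the llama-cli command; IQ2_XS is listed as 4 files.
    prefix = "Qwen_Qwen3-Coder-480B-A35B-Instruct-IQ2_XS"
    missing = missing_shards("IQ2_XS", prefix, 4)
    print("All shards present" if not missing else f"Missing: {missing}")
```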

System Requirements

Quantization	Minimum RAM	Recommended
IQ2_XS	150GB	192GB+ unified memory (e.g. M2 Ultra 192GB, M3 Ultra 256GB)
IQ3_M	240GB	256GB+
IQ4_XS	280GB	320GB+
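
The minimum-RAM column can be turned into a simple lookup that picks the largest quantization fitting a given machine. This is a sketch only: the thresholds are copied from the table, the function name is illustrative, and IQ3_M and IQ4_XS are still listed as coming soon, so they may not be downloadable yet.

```python
# Minimum RAM in GB per quantization, copied from the System Requirements table.
MIN_RAM_GB = {"IQ2_XS": 150, "IQ3_M": 240, "IQ4_XS": 280}


def largest_fitting_quant(available_gb: float):
    """Return the most demanding quantization whose minimum RAM fits, or None."""
    fitting = [(need, name) for name, need in MIN_RAM_GB.items() if need <= available_gb]
    return max(fitting)[1] if fitting else None


for ram in (128, 192, 256, 512):
    print(f"{ram} GB -> {largest_fitting_quant(ram)}")
```

These figures cover only the minimum needed to load the model; long contexts need additional memory for the KV cache.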

Model Capabilities

Complex code generation across many programming languages
Multi-file refactoring and architecture design
Debugging and code analysis
Tool use and function calling
Long-context code understanding
Agentic workflows with planning and execution

Chat Template

<|im_start|>system
You are Qwen3-Coder, an expert AI coding assistant.<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
{assistant_response}<|im_end|>
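
llama.cpp and Ollama normally apply the chat template stored with the model, so manual formatting is only needed when sending raw prompts. For that case, a minimal sketch of rendering the template above from a list of messages is shown here; `format_chatml` is a hypothetical helper, and the newline after each `<|im_end|>` is an assumption based on the layout shown.

```python
def format_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render role/content messages in the <|im_start|> ... <|im_end|> layout."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = format_chatml([
    {"role": "system", "content": "You are Qwen3-Coder, an expert AI coding assistant."},
    {"role": "user", "content": "Write a binary search tree implementation in Python"},
])
print(prompt)
```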

Credits

Original Model: Qwen/Qwen3-Coder-480B-A35B-Instruct by Alibaba
GGUF Quantization: bartowski
Distribution: Richard Young (deepneuro.ai)

License

Apache 2.0 - Free for commercial and personal use.
