Qwen3-Coder-480B-A35B-Instruct GGUF

GGUF quantizations of Qwen/Qwen3-Coder-480B-A35B-Instruct for use with llama.cpp and Ollama.

Model Overview

Qwen3-Coder-480B-A35B-Instruct is the largest agentic coding model in the Qwen3-Coder family from Alibaba's Qwen team. Key features:

  • 480B total parameters with 35B active (MoE architecture)
  • 256K native context (extendable to 1M with YaRN)
  • Results comparable to Claude Sonnet on agentic coding tasks, as reported by the Qwen team
  • Apache 2.0 license - fully open source

Available Quantizations

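| Quantization | Size   | Files | RAM Required | Quality | Description                      |
|--------------|--------|-------|--------------|---------|----------------------------------|
| IQ2_XS       | 133 GB | 4     | ~150 GB      | Good    | Extreme 2-bit, for limited RAM   |
| IQ3_M        | 218 GB | 6     | ~240 GB      | Better  | Balanced 3-bit (coming soon)     |
| IQ4_XS       | 257 GB | 7     | ~280 GB      | Great   | Recommended 4-bit (coming soon)  |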
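
As a rough sanity check on the sizes listed above, each file size can be converted to an effective bits-per-weight figure. The sketch below assumes the sizes are decimal gigabytes (10^9 bytes) and that the parameter count is exactly 480 billion; neither is stated in the table, so treat the results as approximate.

```python
# Rough effective bits per weight for each quantization.
# Assumptions (not stated in the table): sizes are decimal GB (1e9 bytes)
# and the model has exactly 480e9 parameters.
TOTAL_PARAMS = 480e9

SIZES_GB = {"IQ2_XS": 133, "IQ3_M": 218, "IQ4_XS": 257}


def bits_per_weight(size_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Return the average number of bits stored per parameter."""
    return size_gb * 1e9 * 8 / params


if __name__ == "__main__":
    for name, size in SIZES_GB.items():
        print(f"{name}: ~{bits_per_weight(size):.2f} bits/weight")
```

Under these assumptions the three quantizations come out at roughly 2.2, 3.6, and 4.3 bits per weight, which is in line with their 2-bit, 3-bit, and 4-bit labels once quantization scales and other metadata are counted.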

Quick Start with Ollama

# IQ2_XS quantization
ollama run richardyoung/qwen3-coder:iq2_xs "Write a Python REST API with FastAPI"

# With extended context (ollama run has no --num-ctx flag; set num_ctx inside the session)
ollama run richardyoung/qwen3-coder:iq2_xs
>>> /set parameter num_ctx 65536
>>> Analyze this codebase...
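
The context size can also be set per request through Ollama's REST API, which accepts `num_ctx` inside the `options` object of `/api/generate`. The following is a minimal sketch, assuming a local Ollama server on the default port 11434; the helper names `build_payload` and `generate` are illustrative and not part of any library.

```python
import json
import urllib.request

# Default local Ollama endpoint (assumed; adjust if the server runs elsewhere).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, num_ctx: int = 65536,
                  model: str = "richardyoung/qwen3-coder:iq2_xs") -> dict:
    """Build a non-streaming /api/generate request body with a custom context size."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


def generate(prompt: str, num_ctx: int = 65536) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    data = json.dumps(build_payload(prompt, num_ctx)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Example (requires a running Ollama server with the model pulled):
#   print(generate("Analyze this codebase..."))
```

A larger `num_ctx` increases memory use for the KV cache on top of the model weights, so it is worth starting small and raising it only as far as the available RAM allows.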

Quick Start with llama.cpp

# Download all IQ2_XS shards
huggingface-cli download richardyoung/Qwen3-Coder-480B-GGUF --include "IQ2_XS/*" --local-dir .

# Run with llama.cpp (pass the first shard; the remaining shards are loaded automatically)
./llama-cli -m IQ2_XS/Qwen_Qwen3-Coder-480B-A35B-Instruct-IQ2_XS-00001-of-00004.gguf \
  -c 32768 -n 2048 \
  -p "Write a binary search tree implementation in Python"
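
The split GGUF files follow the `NNNNN-of-NNNNN` naming pattern visible in the command above. Before launching, it can help to confirm that every shard was downloaded. The sketch below lists the expected names and reports any that are missing; the helper names are hypothetical, and the directory layout is assumed to match the `--include "IQ2_XS/*"` download.

```python
from pathlib import Path

# Filename prefix taken from the llama-cli command above.
PREFIX = "Qwen_Qwen3-Coder-480B-A35B-Instruct-IQ2_XS"


def shard_names(prefix: str, total: int) -> list[str]:
    """Return the expected shard filenames, e.g. prefix-00001-of-00004.gguf."""
    return [f"{prefix}-{i:05d}-of-{total:05d}.gguf" for i in range(1, total + 1)]


def missing_shards(directory: str, prefix: str, total: int) -> list[str]:
    """Return the expected shard names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in shard_names(prefix, total) if not (base / name).is_file()]


if __name__ == "__main__":
    # IQ2_XS is listed as 4 files in the quantization table.
    missing = missing_shards("IQ2_XS", PREFIX, 4)
    print("all shards present" if not missing else f"missing: {missing}")
```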

System Requirements

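| Quantization | Minimum RAM | Recommended                                            |
|--------------|-------------|--------------------------------------------------------|
| IQ2_XS       | 150 GB      | 192 GB+ unified memory (e.g. Apple M2 Ultra or M3 Ultra) |
| IQ3_M        | 240 GB      | 256 GB+                                                |
| IQ4_XS       | 280 GB      | 320 GB+                                                |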
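
The minimum-RAM figures above can be turned into a simple lookup that picks the largest quantization fitting a given machine. This is only a sketch: the function name is illustrative, the thresholds are copied from the table, and it ignores extra memory needed for long contexts. Note that IQ3_M and IQ4_XS are still marked as coming soon.

```python
from typing import Optional

# Minimum RAM per quantization in GB, copied from the table above.
MIN_RAM_GB = {"IQ2_XS": 150, "IQ3_M": 240, "IQ4_XS": 280}


def best_quantization(available_gb: float) -> Optional[str]:
    """Return the largest quantization whose minimum RAM fits, or None if none fit."""
    fitting = [(ram, name) for name, ram in MIN_RAM_GB.items() if ram <= available_gb]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for ram in (128, 192, 256, 512):
        print(f"{ram} GB -> {best_quantization(ram)}")
```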

Model Capabilities

  • Complex code generation across a wide range of programming languages
  • Multi-file refactoring and architecture design
  • Debugging and code analysis
  • Tool use and function calling
  • Long-context code understanding
  • Agentic workflows with planning and execution

Chat Template

<|im_start|>system
You are Qwen3-Coder, an expert AI coding assistant.<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
{assistant_response}<|im_end|>
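
llama.cpp and Ollama normally apply the chat template stored with the model, so manual formatting is usually only needed when sending a raw completion prompt. For that case, the layout above can be reproduced with a small helper. This is a minimal sketch; `format_chatml` is a hypothetical name, and it handles plain text messages only, not tool calls.

```python
def format_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render {"role", "content"} messages in the ChatML layout shown above."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the response.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = format_chatml([
        {"role": "system", "content": "You are Qwen3-Coder, an expert AI coding assistant."},
        {"role": "user", "content": "Write a binary search tree implementation in Python"},
    ])
    print(prompt)
```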

License

Apache 2.0 - Free for commercial and personal use.
