---
title: 3B Thinking (vLLM + Controller)
emoji: 🏆
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: true
license: apache-2.0
---

This Space wraps `meta-llama/Llama-3.2-3B-Instruct` with a simple
**controller**: brainstorm (high T) → critic (low T) → finalize (low T).

**Setup**
- Attach a GPU (T4 small is fine).
- Add a Space **Secret** `HF_TOKEN` so the app can pull gated weights.

**Notes**
- Uses the tokenizer's chat template for correct formatting.
- Private reasoning stays inside `<THINK>…</THINK>`; only `<FINAL>…</FINAL>` is shown to the user.