---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- agixt
- agent
- fine-tuned
- qwen
- function-calling
- tool-use
- unsloth
model-index:
- name: AGiXT Fine-Tuned Models
results: []
---
# Introducing AGiXT Fine-Tuned Models: Purpose-Built AI for Intelligent Agents
We're excited to announce the release of four specialized fine-tuned models designed specifically for AGiXT agent interactions. These models represent a significant step forward in creating AI agents that truly understand AGiXT's unique command execution patterns, extension system, and agentic workflows.
## The Training Data
Before diving into the models, let's talk about what makes them special: **the training data**.
### Agent Interaction Dataset (936 examples)
This dataset captures real AGiXT agent behavior patterns including:
- **AGiXT Command Syntax**: Proper `<execute>` formatting wrapping the command name and its argument values
- **Thinking/Answer Structure**: Using `<thinking>` tags for reasoning and `<answer>` tags for responses (see the parsing sketch after this list)
- **Tool Delegation Patterns**: When to use "Ask GitHub Copilot" for coding tasks vs. handling requests directly
- **Extension Command Usage**: Correct invocation of 778+ AGiXT commands across extensions like:
- `github_copilot` - Code generation and repository management
- `web_browsing` - Web search, page interaction, arXiv research
- `postgres_database` - Natural language SQL queries
- `essential_abilities` - File operations, workspace management
- `google_sso`, `microsoft365`, `slack` - Third-party integrations
- **Multi-Turn Conversations**: Maintaining context while executing multiple commands
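To make that structure concrete, here is a minimal Python sketch of how a client might pull those sections out of a raw model response. The `parse_agent_response` helper is hypothetical, and it assumes the `<thinking>`/`<execute>`/`<answer>` tag names described above:

```python
import re

def parse_agent_response(text: str) -> dict:
    """Extract the reasoning, command, and answer sections from a response.

    Hypothetical helper; assumes the <thinking>/<execute>/<answer> structure.
    """
    def section(tag: str) -> str | None:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return match.group(1).strip() if match else None

    return {
        "thinking": section("thinking"),
        "execute": section("execute"),
        "answer": section("answer"),
    }
```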
### AbilitySelect + Complexity Dataset (11,140 examples)
A specialized dataset for combined ability selection and complexity scoring:
- **Intent-to-Command Mapping**: Given a user request, select the most appropriate AGiXT command
- **Complexity Scoring (0-100)**: Determine task difficulty for intelligent model routing
- **Extension-Aware Routing**: Understanding which extension provides which capability
- **Dual-Purpose Output**: Single inference returns both `{score}|{ability}` for efficient routing
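A minimal sketch of how a caller might split that dual-purpose output, assuming the raw completion is exactly `{score}|{ability}` (the `parse_ability_output` name is ours, not part of AGiXT):

```python
def parse_ability_output(raw: str) -> tuple[int, str]:
    """Split a '{score}|{ability}' completion, e.g. '45|Write to File'."""
    score_text, _, ability = raw.strip().partition("|")
    score = int(score_text)
    if not 0 <= score <= 100:
        raise ValueError(f"complexity score out of range: {score}")
    return score, ability
```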
## The Models
### 🖼️ AGiXT-Qwen3-VL-4B
**Vision-Language Model | 4B Parameters**
Our flagship multimodal model, fine-tuned from Qwen3-VL-4B-Instruct on the Agent Interaction Dataset.
**What It Learned:**
- AGiXT's XML-based command execution format (`<execute>`, `<thinking>`, `<answer>` tags)
- When to delegate coding tasks to GitHub Copilot vs. using other extensions
- Proper parameter formatting for all 778+ AGiXT commands
- Multi-step reasoning patterns for complex agent workflows
**Vision Capabilities:**
- Analyze screenshots to understand UI state during web automation tasks
- Process images shared in conversations for context-aware responses
- Support the `View Image` command with intelligent image analysis
**Available Formats:** SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K)
---
### 🖼️ AGiXT-Qwen3-VL-2B
**Compact Vision-Language Model | 2B Parameters**
Same AGiXT training as VL-4B but in a lighter package, fine-tuned from Qwen3-VL-2B-Instruct.
**Ideal For:**
- Resource-constrained deployments (runs on 4GB+ VRAM with quantization)
- Edge deployments and local-first setups
- Faster inference when vision capabilities are needed but latency matters
**Same Training Quality:** Trained on the identical Agent Interaction Dataset as the 4B model, with the same command understanding and the same AGiXT fluency.
**Available Formats:** SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K)
---
### 💬 AGiXT-Qwen3-4B
**Text Model | 4B Parameters**
Our core text model, fine-tuned from Qwen3-4B-Instruct-2507 on the Agent Interaction Dataset.
**What It Learned:**
- **AGiXT Command Execution**: Native understanding of the `<execute>` XML format with proper command names and parameters
- **Thinking-First Approach**: Uses `<thinking>` blocks to reason through problems before executing commands
- **Tool Delegation**: Knows when to use "Ask GitHub Copilot" for coding vs. using built-in abilities
- **Extension Awareness**: Understands capabilities across github_copilot, web_browsing, postgres_database, essential_abilities, and dozens more
- **Structured Responses**: Consistent `<answer>` formatting for clean integration with AGiXT's response parsing
**Available Formats:** SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K)
---
### ⚡ AGiXT-AbilitySelect-270m
**Combined Ability Selection + Complexity Scoring | 270M Parameters**
An ultra-compact dual-purpose model fine-tuned from Gemma-3-270m on the **AbilitySelect + Complexity Dataset (11,140 examples)**, trained to output both the best command and a complexity score in a single inference.
**Output Format:** `{score}|{ability}` (e.g., `45|Write to File`)
**What It Learned:**
- **Intent Classification**: Map natural language requests to specific AGiXT commands
- **Complexity Scoring**: Rate task difficulty from 0-100 based on:
- Task type (code generation, file ops, research, debugging)
- Number of steps required
- Whether expert-level reasoning is needed
- **Extension Routing**: Know which of the 778+ commands best matches a request
- **Unified Decision Making**: Score and ability inform each other for better accuracy
**How It's Used in AGiXT:**
This model runs as a fast "router" before the main agent model:
1. User sends a request
2. AbilitySelect returns `score|ability` in sub-50ms
3. AGiXT routes to the appropriate model based on complexity:
   - **Score 0-25** → VL-2B (simple tasks: greetings, time, file listing)
   - **Score 26-50** → VL-4B (moderate: file editing, searches)
   - **Score 51-75** → VL-4B + thinking mode (complex: code generation, multi-step)
   - **Score 76-100** → External API like Claude, Gemini, etc. (expert: multi-step code, debugging, architecture)
4. Result: Right-sized model for every task, faster responses, lower cost
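As an illustration of those thresholds, here is a small Python sketch of the routing step. The tiers match the list above, but the function itself is ours, not AGiXT's actual implementation:

```python
def route_by_complexity(score: int) -> tuple[str, bool]:
    """Map a 0-100 complexity score to (model, thinking_mode) per the tiers above."""
    if score <= 25:
        return ("AGiXT-Qwen3-VL-2B", False)   # simple: greetings, time, listings
    if score <= 50:
        return ("AGiXT-Qwen3-VL-4B", False)   # moderate: file edits, searches
    if score <= 75:
        return ("AGiXT-Qwen3-VL-4B", True)    # complex: extended thinking mode
    return ("external-api", True)             # expert: Claude, Gemini, etc.
```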
**Why a Combined Model?**
- **One inference, two decisions**: Complexity and ability in a single call
- **Speed**: 270M parameters = lightning-fast inference (<50ms)
- **Coherent routing**: Score and ability naturally inform each other
- **Resource Efficiency**: Runs alongside larger models without competing for VRAM
- **Simpler architecture**: One router model instead of two
**Available Formats:** SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K), ONNX (CPU inference)
---
## Why Fine-Tuned Models Matter for AGiXT
### The Problem with Generic LLMs
Out-of-the-box models don't know AGiXT exists. They struggle with:
- AGiXT's specific XML command syntax (`<execute>...</execute>`)
- The thinking/answer response structure agents expect
- When to delegate to GitHub Copilot vs. using other tools
- The 778+ available commands and their proper parameters
- Maintaining consistent behavior across multi-turn agent sessions
### What Fine-Tuning Fixes
Our models were trained on **real AGiXT interaction patterns**:
- ✅ Native command syntax: no more malformed XML
- ✅ Proper delegation: coding tasks go to Copilot, searches go to web_browsing
- ✅ Correct parameters: knows what each command needs
- ✅ Consistent structure: `<thinking>`, then `<execute>`, then `<answer>`
- ✅ Extension awareness: understands the full AGiXT ecosystem
## How AGiXT Uses These Models Together
These four models work as an integrated system within AGiXT, not as standalone alternatives:
```
User Request: "Write a Python script to process CSV files"
        │
        ▼
┌───────────────────────────────────────┐
│      AGiXT-AbilitySelect-270m         │
│    Single inference, dual output      │
│    (sub-50ms on CPU via ONNX)         │
└───────────────────────────────────────┘
        │
        ▼ Returns: "65|Write to File"
        │ (complexity=65, ability=Write to File)
        │
┌───────────────────────────────────────┐
│    Complexity-Based Model Routing     │
│    Score 65 = High complexity         │
│    + Check if images attached         │
└───────────────────────────────────────┘
        │
        ├─── Score 0-25 ─────────────► AGiXT-Qwen3-VL-2B (simple tasks)
        │    "What time is it?" → 8
        │
        ├─── Score 26-50 ────────────► AGiXT-Qwen3-VL-4B (moderate tasks)
        │    "Search for Python docs" → 35
        │
        ├─── Score 51-75 ────────────► AGiXT-Qwen3-VL-4B + thinking (complex)
        │    "Write a CSV processor" → 65  ◄── This request
        │
        └─── Score 76-100 ───────────► External API (Claude, Gemini, etc.)
             "Debug this race condition" → 85
```
### The Flow Explained
1. **AbilitySelect First**: Every request hits the 270M model first. In a single sub-50ms inference, it returns both the complexity score (0-100) AND the most appropriate ability. No separate complexity calculation needed.
2. **Intelligent Routing**: The complexity score directly determines which model handles the request:
- **0-25 (Simple)**: VL-2B handles greetings, time queries, basic file listings
- **26-50 (Moderate)**: VL-4B for file editing, web searches, data retrieval
- **51-75 (Complex)**: VL-4B with extended thinking for code generation, multi-step tasks
- **76-100 (Expert)**: Routes to external APIs (Claude, Gemini, GPT-4, etc.) for multi-step code generation, debugging, architecture
3. **Ability Context**: The selected ability helps the main model focus. If AbilitySelect returns `65|Write to File`, the main model knows this is a file-writing task requiring code generation.
4. **Consistent Quality**: Because all three main models were trained on the same AGiXT dataset, they all produce properly formatted commands with correct `<thinking>`, `<execute>`, and `<answer>` structure. The routing is about efficiency: using the right-sized model for each task.
5. **Cost & Speed Optimization**: Simple queries get fast responses from VL-2B. Complex tasks get the full reasoning power of VL-4B. Expert tasks leverage external APIs. You're not paying 4B-model latency for "what time is it?"
## Deployment Options
### Full Precision (16-bit SafeTensors)
Best for: Maximum quality, further fine-tuning, or when VRAM isn't a concern
### GGUF Quantizations
| Quantization | Use Case | Memory Savings |
|-------------|----------|----------------|
| **Q6_K** | Best quality, production deployments | ~50% reduction |
| **Q5_K_M** | Balanced quality and efficiency | ~60% reduction |
| **Q4_K_M** | Resource-constrained environments | ~70% reduction |
## Getting Started
All models are available on HuggingFace:
- [JoshXT/AGiXT-Qwen3-VL-4B](https://huggingface.co/JoshXT/AGiXT-Qwen3-VL-4B) | [GGUF](https://huggingface.co/JoshXT/AGiXT-Qwen3-VL-4B-GGUF)
- [JoshXT/AGiXT-Qwen3-VL-2B](https://huggingface.co/JoshXT/AGiXT-Qwen3-VL-2B) | [GGUF](https://huggingface.co/JoshXT/AGiXT-Qwen3-VL-2B-GGUF)
- [JoshXT/AGiXT-Qwen3-4B](https://huggingface.co/JoshXT/AGiXT-Qwen3-4B) | [GGUF](https://huggingface.co/JoshXT/AGiXT-Qwen3-4B-GGUF)
- [JoshXT/AGiXT-AbilitySelect-270m](https://huggingface.co/JoshXT/AGiXT-AbilitySelect-270m) | [GGUF](https://huggingface.co/JoshXT/AGiXT-AbilitySelect-270m-GGUF) | [ONNX](https://huggingface.co/JoshXT/AGiXT-AbilitySelect-270m-ONNX)
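The SafeTensors checkpoints load with standard `transformers` APIs. A minimal sketch (chat-template details and prompt contents are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "JoshXT/AGiXT-Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Illustrative request; the model responds in the AGiXT tag format above.
messages = [{"role": "user", "content": "List the files in my workspace."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```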
### Usage with ezLocalai (Recommended)
[ezLocalai](https://github.com/DevXT-LLC/ezlocalai) is our recommended local inference server; it's designed to work seamlessly with AGiXT and supports all the features these models need.
**Why ezLocalai?** We built it to be as easy as possible. Just tell it which model you want, and ezLocalai handles everything else:
- **Auto-detects your hardware**: Finds your GPU (NVIDIA/AMD) or falls back to CPU automatically
- **Optimal settings out of the box**: Calculates max context length, temperature, top_p based on your available VRAM/RAM
- **No configuration required**: No editing config files, no tuning parameters, no figuring out quantization levels
- **Just start talking**: Pick a model, wait for download, start chatting
```bash
# Install the CLI
pip install ezlocalai
# Start with AGiXT models
ezlocalai start --model JoshXT/AGiXT-Qwen3-VL-4B-GGUF
# Or run multiple models (comma-separated)
ezlocalai start --model JoshXT/AGiXT-Qwen3-VL-4B-GGUF,JoshXT/AGiXT-AbilitySelect-270m-GGUF
```
Models are downloaded automatically on first use. Once running, access the OpenAI-compatible API at `http://localhost:8091`.
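Since the server speaks the OpenAI API, any OpenAI-compatible client works against it. A sketch with the `openai` Python package (the `/v1` path and placeholder API key are assumptions; check your ezLocalai configuration):

```python
from openai import OpenAI

# Point the client at the local ezLocalai server started above.
client = OpenAI(base_url="http://localhost:8091/v1", api_key="none")

response = client.chat.completions.create(
    model="AGiXT-Qwen3-VL-4B",
    messages=[{"role": "user", "content": "Summarize today's tasks."}],
)
print(response.choices[0].message.content)
```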
**CLI Commands:**
```bash
ezlocalai stop # Stop the container
ezlocalai restart # Restart the container
ezlocalai status # Check if running and show configuration
ezlocalai logs # Show container logs
ezlocalai update # Pull/rebuild latest images
# Send prompts directly from CLI
ezlocalai prompt "Hello, world!"
ezlocalai prompt "What's in this image?" -image ./photo.jpg
```
ezLocalai handles:
- Automatic GGUF downloading from HuggingFace
- Vision model support with proper image handling
- OpenAI-compatible API that AGiXT expects
- GPU memory management for running multiple models
### Usage with Ollama
```bash
# Create a Modelfile for each model
cat > Modelfile << EOF
FROM ./AGiXT-Qwen3-4B.Q5_K_M.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
EOF
ollama create agixt-qwen3-4b -f Modelfile
ollama run agixt-qwen3-4b
```
### Usage with AGiXT
Configure your AGiXT agent to use these models via the ezLocalai provider:
```yaml
# Agent settings
provider: ezlocalai
model: AGiXT-Qwen3-4B
vision_model: AGiXT-Qwen3-VL-4B
ability_select_model: AGiXT-AbilitySelect-270m # Returns score|ability
# Complexity-based routing thresholds (optional, these are defaults)
complexity_routing:
  simple_max: 25     # Score 0-25   -> VL-2B
  moderate_max: 50   # Score 26-50  -> VL-4B
  complex_max: 75    # Score 51-75  -> VL-4B + thinking
                     # Score 76-100 -> External API (Claude, Gemini, etc.)
```
AGiXT will automatically:
1. Run every request through AbilitySelect (sub-50ms via ONNX)
2. Parse the `score|ability` response
3. Route to the appropriate model based on complexity score
4. Pass the selected ability as context to the main model
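Pieced together, those four steps look roughly like the Python sketch below. This is an illustration of the flow, not AGiXT's actual source; the endpoint, model names, and system-prompt wording follow the config above but are our assumptions:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8091/v1", api_key="none")

def handle(request: str) -> str:
    # 1. Ask the 270M router for "{score}|{ability}" in a single inference.
    routed = client.chat.completions.create(
        model="AGiXT-AbilitySelect-270m",
        messages=[{"role": "user", "content": request}],
    ).choices[0].message.content

    # 2. Parse the dual-purpose output.
    score_text, _, ability = routed.strip().partition("|")
    score = int(score_text)

    # 3. Pick a model tier from the thresholds in the config above
    #    (external-API tiers omitted to keep the sketch short).
    model = "AGiXT-Qwen3-VL-2B" if score <= 25 else "AGiXT-Qwen3-VL-4B"

    # 4. Pass the selected ability as context to the main model.
    return client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": f"Selected ability: {ability}"},
            {"role": "user", "content": request},
        ],
    ).choices[0].message.content
```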
## What's Next
This release is version 1 of our AGiXT-optimized models. We're already working on:
- **Larger Model Variants**: 7B and 14B versions for users who want maximum capability
- **Expanded Training Data**: More extension coverage, more edge cases, more multi-turn examples
- **Domain-Specific Fine-Tunes**: Models optimized for coding agents, research agents, automation agents
- **Continuous Improvement**: As AGiXT adds new extensions, we'll update the training data and retrain
## Training Details
- **Framework**: [Unsloth](https://github.com/unslothai/unsloth) (2x faster training, 60% less memory)
- **Hardware**: NVIDIA RTX 4090 (24GB)
- **Training Method**: LoRA fine-tuning (r=64, alpha=128)
- **Epochs**: 2 per model
- **Quantization**: GGUF via llama.cpp (Q4_K_M, Q5_K_M, Q6_K)
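For reference, a hedged sketch of the corresponding Unsloth setup for the 4B text model. The rank and alpha come from the list above; the sequence length, 4-bit loading, and target modules are our assumptions, not confirmed training settings:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Instruct-2507",  # base model named above
    max_seq_length=8192,                       # assumption: matches num_ctx above
    load_in_4bit=True,                         # assumption: QLoRA-style loading
)
model = FastLanguageModel.get_peft_model(
    model,
    r=64,            # LoRA rank, as listed above
    lora_alpha=128,  # LoRA alpha, as listed above
    target_modules=[  # assumption: standard attention + MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```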
## Acknowledgments
These models were fine-tuned using [Unsloth](https://github.com/unslothai/unsloth), which enabled 2x faster training with significant memory savings. Base models provided by [Qwen](https://huggingface.co/Qwen) and [Google](https://huggingface.co/google).
---
**License:** Apache 2.0
**Questions or Feedback?** Open an issue on [AGiXT GitHub](https://github.com/Josh-XT/AGiXT) or join our community discussions.