---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- agixt
- agent
- fine-tuned
- qwen
- function-calling
- tool-use
model-index:
- name: AGiXT Fine-Tuned Models
  results: []
---


# Introducing AGiXT Fine-Tuned Models: Purpose-Built AI for Intelligent Agents

We're excited to announce the release of four specialized fine-tuned models designed specifically for AGiXT agent interactions. These models represent a significant step forward in creating AI agents that truly understand AGiXT's unique command execution patterns, extension system, and agentic workflows.

## The Training Data

Before diving into the models, let's talk about what makes them special: **the training data**.

### Agent Interaction Dataset (936 examples)

This dataset captures real AGiXT agent behavior patterns, including:

- **AGiXT Command Syntax**: Proper XML formatting of command names and argument values
- **Thinking/Answer Structure**: Using `<thinking>` tags for reasoning and `<answer>` tags for responses
- **Tool Delegation Patterns**: When to use "Ask GitHub Copilot" for coding tasks vs. handling requests directly
- **Extension Command Usage**: Correct invocation of 778+ AGiXT commands across extensions such as:
  - `github_copilot` - Code generation and repository management
  - `web_browsing` - Web search, page interaction, arXiv research
  - `postgres_database` - Natural language SQL queries
  - `essential_abilities` - File operations, workspace management
  - `google_sso`, `microsoft365`, `slack` - Third-party integrations
- **Multi-Turn Conversations**: Maintaining context while executing multiple commands

### AbilitySelect + Complexity Dataset (11,140 examples)

A specialized dataset for combined ability selection and complexity scoring:

- **Intent-to-Command Mapping**: Given a user request, select the most appropriate AGiXT command
- **Complexity Scoring (0-100)**: Determine task difficulty for intelligent model routing
- **Extension-Aware Routing**: Understanding which extension provides which capability
- **Dual-Purpose Output**: A single inference returns `{score}|{ability}` for efficient routing

## The Models

### πŸ–ΌοΈ AGiXT-Qwen3-VL-4B

**Vision-Language Model | 4B Parameters**

Our flagship multimodal model,
fine-tuned from Qwen3-VL-4B-Instruct on the Agent Interaction Dataset.

**What It Learned:**

- AGiXT's XML-based command execution format (thinking, command execution, and answer tags)
- When to delegate coding tasks to GitHub Copilot vs. using other extensions
- Proper parameter formatting for all 778+ AGiXT commands
- Multi-step reasoning patterns for complex agent workflows

**Vision Capabilities:**

- Analyze screenshots to understand UI state during web automation tasks
- Process images shared in conversations for context-aware responses
- Support the `View Image` command with intelligent image analysis

**Available Formats:** SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K)

---

### πŸ–ΌοΈ AGiXT-Qwen3-VL-2B

**Compact Vision-Language Model | 2B Parameters**

The same AGiXT training as VL-4B in a lighter package, fine-tuned from Qwen3-VL-2B-Instruct.

**Ideal For:**

- Resource-constrained deployments (runs on 4GB+ VRAM with quantization)
- Edge deployments and local-first setups
- Faster inference when vision capabilities are needed but latency matters

**Same Training Quality:** Identical Agent Interaction Dataset as the 4B model: same command understanding, same AGiXT fluency.

**Available Formats:** SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K)

---

### πŸ’¬ AGiXT-Qwen3-4B

**Text Model | 4B Parameters**

Our core text model, fine-tuned from Qwen3-4B-Instruct-2507 on the Agent Interaction Dataset.

**What It Learned:**

- **AGiXT Command Execution**: Native understanding of AGiXT's XML execution format with proper command names and parameters
- **Thinking-First Approach**: Uses `<thinking>` blocks to reason through problems before executing commands
- **Tool Delegation**: Knows when to use "Ask GitHub Copilot" for coding vs.
using built-in abilities
- **Extension Awareness**: Understands capabilities across github_copilot, web_browsing, postgres_database, essential_abilities, and dozens more
- **Structured Responses**: Consistent `<answer>` formatting for clean integration with AGiXT's response parsing

**Available Formats:** SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K)

---

### ⚑ AGiXT-AbilitySelect-270m

**Combined Ability Selection + Complexity Scoring | 270M Parameters**

An ultra-compact dual-purpose model fine-tuned from Gemma-3-1B on the **AbilitySelect + Complexity Dataset (11,140 examples)**, trained to output both the best command AND a complexity score in a single inference.

**Output Format:** `{score}|{ability}` (e.g., `45|Write to File`)

**What It Learned:**

- **Intent Classification**: Map natural language requests to specific AGiXT commands
- **Complexity Scoring**: Rate task difficulty from 0-100 based on:
  - Task type (code generation, file ops, research, debugging)
  - Number of steps required
  - Whether expert-level reasoning is needed
- **Extension Routing**: Know which of the 778+ commands best matches a request
- **Unified Decision Making**: Score and ability inform each other for better accuracy

**How It's Used in AGiXT:** This model runs as a fast "router" before the main agent model:

1. User sends a request
2. AbilitySelect returns `score|ability` in under 100ms
3. AGiXT routes to the appropriate model based on complexity:
   - **Score 0-25** β†’ VL-2B (simple tasks: greetings, time, file listing)
   - **Score 26-50** β†’ VL-4B (moderate: file editing, searches)
   - **Score 51-75** β†’ VL-4B + thinking mode (complex: code generation, multi-step)
   - **Score 76-100** β†’ External API like Claude, Gemini, etc. (expert: multi-step code, debugging, architecture)
4.
   Result: the right-sized model for every task, faster responses, lower cost

**Why a Combined Model?**

- **One inference, two decisions**: Complexity and ability in a single call
- **Speed**: 270M parameters means lightning-fast inference (<50ms)
- **Coherent routing**: Score and ability naturally inform each other
- **Resource Efficiency**: Runs alongside larger models without competing for VRAM
- **Simpler architecture**: One router model instead of two

**Available Formats:** SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K), ONNX (CPU inference)

---

## Why Fine-Tuned Models Matter for AGiXT

### The Problem with Generic LLMs

Out-of-the-box models don't know AGiXT exists. They struggle with:

- AGiXT's specific XML command syntax
- The thinking/answer response structure agents expect
- When to delegate to GitHub Copilot vs. using other tools
- The 778+ available commands and their proper parameters
- Maintaining consistent behavior across multi-turn agent sessions

### What Fine-Tuning Fixes

Our models were trained on **real AGiXT interaction patterns**:

- βœ… Native command syntax: no more malformed XML
- βœ… Proper delegation: coding tasks go to Copilot, searches go to web_browsing
- βœ… Correct parameters: knows what each command needs
- βœ… Consistent structure: thinking, then command execution, then answer
- βœ… Extension awareness: understands the full AGiXT ecosystem

## How AGiXT Uses These Models Together

These four models work as an integrated system within AGiXT, not as standalone alternatives:

```
User Request: "Write a Python script to process CSV files"
                    β”‚
                    β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚       AGiXT-AbilitySelect-270m        β”‚
  β”‚     Single inference, dual output     β”‚
  β”‚      (sub-50ms on CPU via ONNX)       β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚
                    β–Ό
  Returns: "65|Write to File"
  (complexity=65, ability=Write to File)
                    β”‚
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚    Complexity-Based Model Routing     β”‚
  β”‚       Score 65 = High complexity      β”‚
  β”‚       + Check if images attached      β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚
                    β”œβ”€β”€β”€ Score 0-25  ──────► AGiXT-Qwen3-VL-2B (simple tasks)
                    β”‚    "What time is it?" β†’ 8
                    β”‚
                    β”œβ”€β”€β”€ Score 26-50 ──────► AGiXT-Qwen3-VL-4B (moderate tasks)
                    β”‚    "Search for Python docs" β†’ 35
                    β”‚
                    β”œβ”€β”€β”€ Score 51-75 ──────► AGiXT-Qwen3-VL-4B + thinking (complex)
                    β”‚    "Write a CSV processor" β†’ 65 ◄── This request
                    β”‚
                    └─── Score 76-100 ─────► External API (Claude, Gemini, etc.)
                         "Debug this race condition" β†’ 85
```

### The Flow Explained

1. **AbilitySelect First**: Every request hits the 270M model first. In a single sub-50ms inference, it returns both the complexity score (0-100) AND the most appropriate ability. No separate complexity calculation is needed.
2. **Intelligent Routing**: The complexity score directly determines which model handles the request:
   - **0-25 (Simple)**: VL-2B handles greetings, time queries, basic file listings
   - **26-50 (Moderate)**: VL-4B for file editing, web searches, data retrieval
   - **51-75 (Complex)**: VL-4B with extended thinking for code generation and multi-step tasks
   - **76-100 (Expert)**: Routes to external APIs (Claude, Gemini, GPT-4, etc.) for multi-step code generation, debugging, and architecture work
3. **Ability Context**: The selected ability helps the main model focus. If AbilitySelect returns `65|Write to File`, the main model knows this is a file-writing task requiring code generation.
4. **Consistent Quality**: Because all three main models were trained on the same AGiXT dataset, they all produce properly formatted commands with the correct thinking, execution, and answer structure. The routing is about efficiency: using the right-sized model for each task.
5.
   **Cost & Speed Optimization**: Simple queries get fast responses from VL-2B. Complex tasks get the full reasoning power of VL-4B. Expert tasks leverage external APIs. You're not paying 4B-model latency for "what time is it?"

## Deployment Options

### Full Precision (16-bit SafeTensors)

Best for: maximum quality, further fine-tuning, or when VRAM isn't a concern.

### GGUF Quantizations

| Quantization | Use Case | Memory Savings |
|-------------|----------|----------------|
| **Q6_K** | Best quality, production deployments | ~50% reduction |
| **Q5_K_M** | Balanced quality and efficiency | ~60% reduction |
| **Q4_K_M** | Resource-constrained environments | ~70% reduction |

## Getting Started

All models are available on HuggingFace:

- [JoshXT/AGiXT-Qwen3-VL-4B](https://huggingface.co/JoshXT/AGiXT-Qwen3-VL-4B) | [GGUF](https://huggingface.co/JoshXT/AGiXT-Qwen3-VL-4B-GGUF)
- [JoshXT/AGiXT-Qwen3-VL-2B](https://huggingface.co/JoshXT/AGiXT-Qwen3-VL-2B) | [GGUF](https://huggingface.co/JoshXT/AGiXT-Qwen3-VL-2B-GGUF)
- [JoshXT/AGiXT-Qwen3-4B](https://huggingface.co/JoshXT/AGiXT-Qwen3-4B) | [GGUF](https://huggingface.co/JoshXT/AGiXT-Qwen3-4B-GGUF)
- [JoshXT/AGiXT-AbilitySelect-270m](https://huggingface.co/JoshXT/AGiXT-AbilitySelect-270m) | [GGUF](https://huggingface.co/JoshXT/AGiXT-AbilitySelect-270m-GGUF) | [ONNX](https://huggingface.co/JoshXT/AGiXT-AbilitySelect-270m-ONNX)

### Usage with ezLocalai (Recommended)

[ezLocalai](https://github.com/DevXT-LLC/ezlocalai) is our recommended local inference server; it's designed to work seamlessly with AGiXT and supports all the features these models need.

**Why ezLocalai?** We built it to be as easy as possible.
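(A quick aside before setup: the complexity-based routing described above boils down to a few lines of code. Here is a minimal Python sketch of parsing the `score|ability` output and applying the default thresholds; the `route` helper and the model labels it returns are illustrative assumptions, not AGiXT's actual implementation.)

```python
def route(router_output: str) -> tuple[str, str]:
    """Map an AbilitySelect '{score}|{ability}' string to a model tier.

    Thresholds mirror the defaults described above; names are illustrative.
    """
    score_text, ability = router_output.split("|", 1)
    score = int(score_text)
    if score <= 25:
        model = "AGiXT-Qwen3-VL-2B"             # simple: greetings, time, listings
    elif score <= 50:
        model = "AGiXT-Qwen3-VL-4B"             # moderate: file edits, searches
    elif score <= 75:
        model = "AGiXT-Qwen3-VL-4B (thinking)"  # complex: code gen, multi-step
    else:
        model = "external-api"                  # expert: debugging, architecture
    return model, ability

print(route("65|Write to File"))  # complex tier, file-writing ability
```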
Just tell it which model you want; ezLocalai handles everything else:

- **Auto-detects your hardware**: Finds your GPU (NVIDIA/AMD) or falls back to CPU automatically
- **Optimal settings out of the box**: Calculates max context length, temperature, and top_p based on your available VRAM/RAM
- **No configuration required**: No editing config files, no tuning parameters, no figuring out quantization levels
- **Just start talking**: Pick a model, wait for the download, start chatting

```bash
# Install the CLI
pip install ezlocalai

# Start with AGiXT models
ezlocalai start --model JoshXT/AGiXT-Qwen3-VL-4B-GGUF

# Or run multiple models (comma-separated)
ezlocalai start --model JoshXT/AGiXT-Qwen3-VL-4B-GGUF,JoshXT/AGiXT-AbilitySelect-270m-GGUF
```

Models are downloaded automatically on first use. Once running, access the OpenAI-compatible API at `http://localhost:8091`.

**CLI Commands:**

```bash
ezlocalai stop     # Stop the container
ezlocalai restart  # Restart the container
ezlocalai status   # Check if running and show configuration
ezlocalai logs     # Show container logs
ezlocalai update   # Pull/rebuild latest images

# Send prompts directly from CLI
ezlocalai prompt "Hello, world!"
ezlocalai prompt "What's in this image?" \
  -image ./photo.jpg
```

ezLocalai handles:

- Automatic GGUF downloading from HuggingFace
- Vision model support with proper image handling
- The OpenAI-compatible API that AGiXT expects
- GPU memory management for running multiple models

### Usage with Ollama

```bash
# Create a Modelfile for each model
cat > Modelfile << EOF
FROM ./AGiXT-Qwen3-4B.Q5_K_M.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
EOF

ollama create agixt-qwen3-4b -f Modelfile
ollama run agixt-qwen3-4b
```

### Usage with AGiXT

Configure your AGiXT agent to use these models via the ezLocalai provider:

```yaml
# Agent settings
provider: ezlocalai
model: AGiXT-Qwen3-4B
vision_model: AGiXT-Qwen3-VL-4B
ability_select_model: AGiXT-AbilitySelect-270m  # Returns score|ability

# Complexity-based routing thresholds (optional, these are defaults)
complexity_routing:
  simple_max: 25    # Score 0-25   -> VL-2B
  moderate_max: 50  # Score 26-50  -> VL-4B
  complex_max: 75   # Score 51-75  -> VL-4B + thinking
                    # Score 76-100 -> External API (GitHub Copilot)
```

AGiXT will automatically:

1. Run every request through AbilitySelect (sub-50ms via ONNX)
2. Parse the `score|ability` response
3. Route to the appropriate model based on the complexity score
4. Pass the selected ability as context to the main model

## What's Next

This release is version 1 of our AGiXT-optimized models.
We're already working on:

- **Larger Model Variants**: 7B and 14B versions for users who want maximum capability
- **Expanded Training Data**: More extension coverage, more edge cases, more multi-turn examples
- **Domain-Specific Fine-Tunes**: Models optimized for coding agents, research agents, and automation agents
- **Continuous Improvement**: As AGiXT adds new extensions, we'll update the training data and retrain

## Training Details

- **Framework**: [Unsloth](https://github.com/unslothai/unsloth) (2x faster training, 60% less memory)
- **Hardware**: NVIDIA RTX 4090 (24GB)
- **Training Method**: LoRA fine-tuning (r=64, alpha=128)
- **Epochs**: 2 per model
- **Quantization**: GGUF via llama.cpp (Q4_K_M, Q5_K_M, Q6_K)

## Acknowledgments

These models were fine-tuned using [Unsloth](https://github.com/unslothai/unsloth), which enabled 2x faster training with significant memory savings. Base models provided by [Qwen](https://huggingface.co/Qwen) and [Google](https://huggingface.co/google).

---

**License:** Apache 2.0

**Questions or Feedback?** Open an issue on [AGiXT GitHub](https://github.com/Josh-XT/AGiXT) or join our community discussions.