---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- agixt
- agent
- fine-tuned
- qwen
- function-calling
- tool-use
model-index:
- name: AGiXT Fine-Tuned Models
  results: []
---


# Introducing AGiXT Fine-Tuned Models: Purpose-Built AI for Intelligent Agents

We're excited to announce the release of four specialized fine-tuned models designed specifically for AGiXT agent interactions. These models represent a significant step forward in creating AI agents that truly understand AGiXT's unique command execution patterns, extension system, and agentic workflows.

## The Training Data

Before diving into the models, let's talk about what makes them special: **the training data**.

### Agent Interaction Dataset (936 examples)

This dataset captures real AGiXT agent behavior patterns, including:

- **AGiXT Command Syntax**: Proper XML formatting of command names and argument values
- **Thinking/Answer Structure**: Using `<thinking>` tags for reasoning and `<answer>` tags for responses
- **Tool Delegation Patterns**: When to use "Ask GitHub Copilot" for coding tasks vs. handling requests directly
- **Extension Command Usage**: Correct invocation of 778+ AGiXT commands across extensions such as:
  - `github_copilot` - Code generation and repository management
  - `web_browsing` - Web search, page interaction, arXiv research
  - `postgres_database` - Natural language SQL queries
  - `essential_abilities` - File operations, workspace management
  - `google_sso`, `microsoft365`, `slack` - Third-party integrations
- **Multi-Turn Conversations**: Maintaining context while executing multiple commands

### AbilitySelect + Complexity Dataset (11,140 examples)

A specialized dataset for combined ability selection and complexity scoring:

- **Intent-to-Command Mapping**: Given a user request, select the most appropriate AGiXT command
- **Complexity Scoring (0-100)**: Determine task difficulty for intelligent model routing
- **Extension-Aware Routing**: Understanding which extension provides which capability
- **Dual-Purpose Output**: A single inference returns `{score}|{ability}` for efficient routing

## The Models

### πŸ–ΌοΈ AGiXT-Qwen3-VL-4B

**Vision-Language Model | 4B Parameters**

Our flagship multimodal model,
fine-tuned from Qwen3-VL-4B-Instruct on the Agent Interaction Dataset.

**What It Learned:**

- AGiXT's XML-based command execution format (thinking, command execution, and answer tags)
- When to delegate coding tasks to GitHub Copilot vs. using other extensions
- Proper parameter formatting for all 778+ AGiXT commands
- Multi-step reasoning patterns for complex agent workflows

**Vision Capabilities:**

- Analyze screenshots to understand UI state during web automation tasks
- Process images shared in conversations for context-aware responses
- Support the `View Image` command with intelligent image analysis

**Available Formats:** SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K)

---

### πŸ–ΌοΈ AGiXT-Qwen3-VL-2B

**Compact Vision-Language Model | 2B Parameters**

The same AGiXT training as VL-4B in a lighter package, fine-tuned from Qwen3-VL-2B-Instruct.

**Ideal For:**

- Resource-constrained deployments (runs on 4GB+ VRAM with quantization)
- Edge deployments and local-first setups
- Faster inference when vision capabilities are needed but latency matters

**Same Training Quality:** Identical Agent Interaction Dataset as the 4B model: same command understanding, same AGiXT fluency.

**Available Formats:** SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K)

---

### πŸ’¬ AGiXT-Qwen3-4B

**Text Model | 4B Parameters**

Our core text model, fine-tuned from Qwen3-4B-Instruct-2507 on the Agent Interaction Dataset.

**What It Learned:**

- **AGiXT Command Execution**: Native understanding of AGiXT's XML execution format with proper command names and parameters
- **Thinking-First Approach**: Uses `<thinking>` blocks to reason through problems before executing commands
- **Tool Delegation**: Knows when to use "Ask GitHub Copilot" for coding vs.
using built-in abilities
- **Extension Awareness**: Understands capabilities across github_copilot, web_browsing, postgres_database, essential_abilities, and dozens more
- **Structured Responses**: Consistent `<answer>` formatting for clean integration with AGiXT's response parsing

**Available Formats:** SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K)

---

### ⚑ AGiXT-AbilitySelect-270m

**Combined Ability Selection + Complexity Scoring | 270M Parameters**

An ultra-compact dual-purpose model fine-tuned from Gemma-3-1B on the **AbilitySelect + Complexity Dataset (11,140 examples)**, trained to output both the best command AND a complexity score in a single inference.

**Output Format:** `{score}|{ability}` (e.g., `45|Write to File`)

**What It Learned:**

- **Intent Classification**: Map natural language requests to specific AGiXT commands
- **Complexity Scoring**: Rate task difficulty from 0-100 based on:
  - Task type (code generation, file ops, research, debugging)
  - Number of steps required
  - Whether expert-level reasoning is needed
- **Extension Routing**: Know which of the 778+ commands best matches a request
- **Unified Decision Making**: Score and ability inform each other for better accuracy

**How It's Used in AGiXT:** This model runs as a fast "router" before the main agent model:

1. User sends a request
2. AbilitySelect returns `score|ability` in under 100ms
3. AGiXT routes to the appropriate model based on complexity:
   - **Score 0-25** β†’ VL-2B (simple tasks: greetings, time, file listing)
   - **Score 26-50** β†’ VL-4B (moderate: file editing, searches)
   - **Score 51-75** β†’ VL-4B + thinking mode (complex: code generation, multi-step)
   - **Score 76-100** β†’ External API like Claude, Gemini, etc. (expert: multi-step code, debugging, architecture)
4.
   Result: the right-sized model for every task, faster responses, lower cost

**Why a Combined Model?**

- **One inference, two decisions**: Complexity and ability in a single call
- **Speed**: 270M parameters means lightning-fast inference (<50ms)
- **Coherent routing**: Score and ability naturally inform each other
- **Resource Efficiency**: Runs alongside larger models without competing for VRAM
- **Simpler architecture**: One router model instead of two

**Available Formats:** SafeTensors (16-bit), GGUF (Q4_K_M, Q5_K_M, Q6_K), ONNX (CPU inference)

---

## Why Fine-Tuned Models Matter for AGiXT

### The Problem with Generic LLMs

Out-of-the-box models don't know AGiXT exists. They struggle with:

- AGiXT's specific XML command syntax
- The thinking/answer response structure agents expect
- When to delegate to GitHub Copilot vs. using other tools
- The 778+ available commands and their proper parameters
- Maintaining consistent behavior across multi-turn agent sessions

### What Fine-Tuning Fixes

Our models were trained on **real AGiXT interaction patterns**:

- βœ… Native command syntax: no more malformed XML
- βœ… Proper delegation: coding tasks go to Copilot, searches go to web_browsing
- βœ… Correct parameters: knows what each command needs
- βœ… Consistent structure: thinking, then command execution, then answer
- βœ… Extension awareness: understands the full AGiXT ecosystem

## How AGiXT Uses These Models Together

These four models work as an integrated system within AGiXT, not as standalone alternatives:

```
User Request: "Write a Python script to process CSV files"
                    β”‚
                    β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚       AGiXT-AbilitySelect-270m        β”‚
  β”‚     Single inference, dual output     β”‚
  β”‚      (sub-50ms on CPU via ONNX)       β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚
                    β–Ό
  Returns: "65|Write to File"
  (complexity=65, ability=Write to File)
                    β”‚
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚    Complexity-Based Model Routing     β”‚
  β”‚       Score 65 = High complexity      β”‚
  β”‚       + Check if images attached      β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚
                    β”œβ”€β”€β”€ Score 0-25  ──────► AGiXT-Qwen3-VL-2B (simple tasks)
                    β”‚    "What time is it?" β†’ 8
                    β”‚
                    β”œβ”€β”€β”€ Score 26-50 ──────► AGiXT-Qwen3-VL-4B (moderate tasks)
                    β”‚    "Search for Python docs" β†’ 35
                    β”‚
                    β”œβ”€β”€β”€ Score 51-75 ──────► AGiXT-Qwen3-VL-4B + thinking (complex)
                    β”‚    "Write a CSV processor" β†’ 65 ◄── This request
                    β”‚
                    └─── Score 76-100 ─────► External API (Claude, Gemini, etc.)
                         "Debug this race condition" β†’ 85
```

### The Flow Explained

1. **AbilitySelect First**: Every request hits the 270M model first. In a single sub-50ms inference, it returns both the complexity score (0-100) AND the most appropriate ability. No separate complexity calculation is needed.
2. **Intelligent Routing**: The complexity score directly determines which model handles the request:
   - **0-25 (Simple)**: VL-2B handles greetings, time queries, basic file listings
   - **26-50 (Moderate)**: VL-4B for file editing, web searches, data retrieval
   - **51-75 (Complex)**: VL-4B with extended thinking for code generation and multi-step tasks
   - **76-100 (Expert)**: Routes to external APIs (Claude, Gemini, GPT-4, etc.) for multi-step code generation, debugging, and architecture work
3. **Ability Context**: The selected ability helps the main model focus. If AbilitySelect returns `65|Write to File`, the main model knows this is a file-writing task requiring code generation.
4. **Consistent Quality**: Because all three main models were trained on the same AGiXT dataset, they all produce properly formatted commands with the correct thinking, execution, and answer structure. The routing is about efficiency: using the right-sized model for each task.
5.
   **Cost & Speed Optimization**: Simple queries get fast responses from VL-2B. Complex tasks get the full reasoning power of VL-4B. Expert tasks leverage external APIs. You're not paying 4B-model latency for "what time is it?"

## Deployment Options

### Full Precision (16-bit SafeTensors)

Best for: maximum quality, further fine-tuning, or when VRAM isn't a concern.

### GGUF Quantizations

| Quantization | Use Case | Memory Savings |
|-------------|----------|----------------|
| **Q6_K** | Best quality, production deployments | ~50% reduction |
| **Q5_K_M** | Balanced quality and efficiency | ~60% reduction |
| **Q4_K_M** | Resource-constrained environments | ~70% reduction |

## Getting Started

All models are available on HuggingFace:

- [JoshXT/AGiXT-Qwen3-VL-4B](https://huggingface.co/JoshXT/AGiXT-Qwen3-VL-4B) | [GGUF](https://huggingface.co/JoshXT/AGiXT-Qwen3-VL-4B-GGUF)
- [JoshXT/AGiXT-Qwen3-VL-2B](https://huggingface.co/JoshXT/AGiXT-Qwen3-VL-2B) | [GGUF](https://huggingface.co/JoshXT/AGiXT-Qwen3-VL-2B-GGUF)
- [JoshXT/AGiXT-Qwen3-4B](https://huggingface.co/JoshXT/AGiXT-Qwen3-4B) | [GGUF](https://huggingface.co/JoshXT/AGiXT-Qwen3-4B-GGUF)
- [JoshXT/AGiXT-AbilitySelect-270m](https://huggingface.co/JoshXT/AGiXT-AbilitySelect-270m) | [GGUF](https://huggingface.co/JoshXT/AGiXT-AbilitySelect-270m-GGUF) | [ONNX](https://huggingface.co/JoshXT/AGiXT-AbilitySelect-270m-ONNX)

### Usage with ezLocalai (Recommended)

[ezLocalai](https://github.com/DevXT-LLC/ezlocalai) is our recommended local inference server; it's designed to work seamlessly with AGiXT and supports all the features these models need.

**Why ezLocalai?** We built it to be as easy as possible.
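(A quick aside before setup: the complexity-based routing described above boils down to a few lines of code. Here is a minimal Python sketch of parsing the `score|ability` output and applying the default thresholds; the `route` helper and the model labels it returns are illustrative assumptions, not AGiXT's actual implementation.)

```python
def route(router_output: str) -> tuple[str, str]:
    """Map an AbilitySelect '{score}|{ability}' string to a model tier.

    Thresholds mirror the defaults described above; names are illustrative.
    """
    score_text, ability = router_output.split("|", 1)
    score = int(score_text)
    if score <= 25:
        model = "AGiXT-Qwen3-VL-2B"             # simple: greetings, time, listings
    elif score <= 50:
        model = "AGiXT-Qwen3-VL-4B"             # moderate: file edits, searches
    elif score <= 75:
        model = "AGiXT-Qwen3-VL-4B (thinking)"  # complex: code gen, multi-step
    else:
        model = "external-api"                  # expert: debugging, architecture
    return model, ability

print(route("65|Write to File"))  # complex tier, file-writing ability
```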
Just tell it which model you want; ezLocalai handles everything else:

- **Auto-detects your hardware**: Finds your GPU (NVIDIA/AMD) or falls back to CPU automatically
- **Optimal settings out of the box**: Calculates max context length, temperature, and top_p based on your available VRAM/RAM
- **No configuration required**: No editing config files, no tuning parameters, no figuring out quantization levels
- **Just start talking**: Pick a model, wait for the download, start chatting

```bash
# Install the CLI
pip install ezlocalai

# Start with AGiXT models
ezlocalai start --model JoshXT/AGiXT-Qwen3-VL-4B-GGUF

# Or run multiple models (comma-separated)
ezlocalai start --model JoshXT/AGiXT-Qwen3-VL-4B-GGUF,JoshXT/AGiXT-AbilitySelect-270m-GGUF
```

Models are downloaded automatically on first use. Once running, access the OpenAI-compatible API at `http://localhost:8091`.

**CLI Commands:**

```bash
ezlocalai stop     # Stop the container
ezlocalai restart  # Restart the container
ezlocalai status   # Check if running and show configuration
ezlocalai logs     # Show container logs
ezlocalai update   # Pull/rebuild latest images

# Send prompts directly from CLI
ezlocalai prompt "Hello, world!"
ezlocalai prompt "What's in this image?" \
  -image ./photo.jpg
```

ezLocalai handles:

- Automatic GGUF downloading from HuggingFace
- Vision model support with proper image handling
- The OpenAI-compatible API that AGiXT expects
- GPU memory management for running multiple models

### Usage with Ollama

```bash
# Create a Modelfile for each model
cat > Modelfile << EOF
FROM ./AGiXT-Qwen3-4B.Q5_K_M.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
EOF

ollama create agixt-qwen3-4b -f Modelfile
ollama run agixt-qwen3-4b
```

### Usage with AGiXT

Configure your AGiXT agent to use these models via the ezLocalai provider:

```yaml
# Agent settings
provider: ezlocalai
model: AGiXT-Qwen3-4B
vision_model: AGiXT-Qwen3-VL-4B
ability_select_model: AGiXT-AbilitySelect-270m  # Returns score|ability

# Complexity-based routing thresholds (optional, these are defaults)
complexity_routing:
  simple_max: 25    # Score 0-25   -> VL-2B
  moderate_max: 50  # Score 26-50  -> VL-4B
  complex_max: 75   # Score 51-75  -> VL-4B + thinking
                    # Score 76-100 -> External API (GitHub Copilot)
```

AGiXT will automatically:

1. Run every request through AbilitySelect (sub-50ms via ONNX)
2. Parse the `score|ability` response
3. Route to the appropriate model based on the complexity score
4. Pass the selected ability as context to the main model

## What's Next

This release is version 1 of our AGiXT-optimized models.
We're already working on:

- **Larger Model Variants**: 7B and 14B versions for users who want maximum capability
- **Expanded Training Data**: More extension coverage, more edge cases, more multi-turn examples
- **Domain-Specific Fine-Tunes**: Models optimized for coding agents, research agents, and automation agents
- **Continuous Improvement**: As AGiXT adds new extensions, we'll update the training data and retrain

## Training Details

- **Framework**: [Unsloth](https://github.com/unslothai/unsloth) (2x faster training, 60% less memory)
- **Hardware**: NVIDIA RTX 4090 (24GB)
- **Training Method**: LoRA fine-tuning (r=64, alpha=128)
- **Epochs**: 2 per model
- **Quantization**: GGUF via llama.cpp (Q4_K_M, Q5_K_M, Q6_K)

## Acknowledgments

These models were fine-tuned using [Unsloth](https://github.com/unslothai/unsloth), which enabled 2x faster training with significant memory savings. Base models provided by [Qwen](https://huggingface.co/Qwen) and [Google](https://huggingface.co/google).

---

**License:** Apache 2.0

**Questions or Feedback?** Open an issue on [AGiXT GitHub](https://github.com/Josh-XT/AGiXT) or join our community discussions.