Instructions for using AdvRahul/Axion-4B with libraries, inference providers, notebooks, and local apps.
- Libraries
- llama-cpp-python
How to use AdvRahul/Axion-4B with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="AdvRahul/Axion-4B",
    filename="qwen3-4b-instruct-2507-q4_k_m.gguf",
)
llm.create_chat_completion(
    messages=[
        # Example chat message; replace with your own prompt
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
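The call returns an OpenAI-style response dict. A minimal sketch of capturing and printing the generated reply (the prompt is an arbitrary example):

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
# llama-cpp-python mirrors the OpenAI schema: choices -> message -> content
print(response["choices"][0]["message"]["content"])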
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use AdvRahul/Axion-4B with llama.cpp:
Install via Homebrew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf AdvRahul/Axion-4B:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf AdvRahul/Axion-4B:Q4_K_M
Install via WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf AdvRahul/Axion-4B:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf AdvRahul/Axion-4B:Q4_K_M
Use pre-built binary
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf AdvRahul/Axion-4B:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf AdvRahul/Axion-4B:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf AdvRahul/Axion-4B:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf AdvRahul/Axion-4B:Q4_K_M
Use Docker
docker model run hf.co/AdvRahul/Axion-4B:Q4_K_M
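Whichever install path you use, llama-server exposes an OpenAI-compatible API, by default at http://localhost:8080/v1. A minimal sketch of querying it from Python with requests (the port and model name are assumptions based on the default commands above):

import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "AdvRahul/Axion-4B:Q4_K_M",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])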
- LM Studio
- Jan
- Ollama
How to use AdvRahul/Axion-4B with Ollama:
ollama run hf.co/AdvRahul/Axion-4B:Q4_K_M
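Once pulled, Ollama also serves the model over its local HTTP API (port 11434 by default). A minimal sketch from Python, assuming the model keeps the hf.co/... name used above:

import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "hf.co/AdvRahul/Axion-4B:Q4_K_M",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False,  # return a single JSON object instead of a stream
    },
    timeout=120,
)
print(resp.json()["message"]["content"])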
- Unsloth Studio
How to use AdvRahul/Axion-4B with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for AdvRahul/Axion-4B to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for AdvRahul/Axion-4B to start chatting
Use Hugging Face Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for AdvRahul/Axion-4B to start chatting
- Pi
How to use AdvRahul/Axion-4B with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf AdvRahul/Axion-4B:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent

# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "AdvRahul/Axion-4B:Q4_K_M" }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi
- Hermes Agent
How to use AdvRahul/Axion-4B with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf AdvRahul/Axion-4B:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default AdvRahul/Axion-4B:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use AdvRahul/Axion-4B with Docker Model Runner:
docker model run hf.co/AdvRahul/Axion-4B:Q4_K_M
- Lemonade
How to use AdvRahul/Axion-4B with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull AdvRahul/Axion-4B:Q4_K_M
Run and chat with the model
lemonade run user.Axion-4B-Q4_K_M
List all available models
lemonade list
AdvRahul/Axion-4B
A safety-enhanced version of Qwen3-4B-Instruct, optimized for reliable and responsible AI applications.
Axion-4B is a fine-tuned version of the powerful Qwen/Qwen3-4B-Instruct-2507 model. The primary enhancement in this version is its robust safety alignment, making it a more dependable choice for production environments and user-facing applications.
Model Details
- Model Creator: AdvRahul
- Base Model: Qwen/Qwen3-4B-Instruct-2507
- Fine-tuning Focus: Enhanced Safety & Harmlessness via Red-Teaming
- Architecture: Qwen3
- Context Length: 262,144 tokens
- License: Inherits the base model's license (Qwen/Qwen3-4B-Instruct-2507 is released under Apache 2.0).
Model Description
Enhanced for Safety
The core purpose of Axion-4B is to provide a safer alternative for developers. The base model underwent extensive red-team testing using advanced protocols to significantly minimize the generation of harmful, biased, or inappropriate content.
Powerful Core Capabilities
While adding a crucial safety layer, Axion-4B retains the exceptional capabilities of its base model, including:
- Strong Logical Reasoning: Excels at complex problems in math, science, and logic.
- Advanced Instruction Following: Reliably adheres to user commands and constraints.
- Multi-lingual Knowledge: Covers a wide range of languages and cultural contexts.
- Massive 256K Context Window: Capable of understanding and processing very long documents.
- Excellent Coding & Tool Use: Proficient in code generation and agentic tasks.
Quickstart
You can use this model directly with the transformers library (version 4.51.0 or newer is recommended).
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# IMPORTANT: Use the model name for this repository
model_name = "AdvRahul/Axion-4B"
# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype="auto",
device_map="auto"
)
# Prepare the model input
prompt = "Give me a short introduction to large language models and their safety considerations."
messages = [
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# Generate text
generated_ids = model.generate(
**model_inputs,
max_new_tokens=512 # Limiting for a concise example
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("Response:", content)
Optimized Deployment
For high-throughput, production-ready deployment, you can use frameworks like vLLM or SGLang to serve the model via an OpenAI-compatible API.
vLLM:
vllm serve AdvRahul/Axion-4B --max-model-len 262144
SGLang:
python -m sglang.launch_server --model-path AdvRahul/Axion-4B --context-length 262144
Note: If you encounter out-of-memory (OOM) issues, consider reducing the max context length (e.g., --max-model-len 32768).
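Both servers speak the OpenAI chat-completions protocol, so any OpenAI client can talk to them. A minimal sketch with the openai Python package, assuming vLLM's default address (SGLang defaults to port 30000; adjust base_url accordingly):

from openai import OpenAI

# Local OpenAI-compatible endpoint; the API key is unused but required by the client
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="AdvRahul/Axion-4B",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)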
Ethical Considerations and Limitations
This model was fine-tuned with the explicit goal of improving safety and reducing harmful outputs. However, no AI model is completely immune to risks.
- No Guarantees: While the safety alignment is significantly improved, it does not guarantee perfectly harmless outputs in all scenarios.
- Inherited Biases: The model may still reflect biases present in the vast amount of data used to train its base model.
- Factual Accuracy: Always fact-check critical information, as the model can generate plausible but incorrect statements.
- Best Practice: It is strongly recommended that developers implement their own content moderation filters and safety guardrails as part of a comprehensive, defense-in-depth strategy. Thoroughly evaluate the model's performance and safety for your specific use case before deploying it to a live audience (a minimal guardrail sketch follows this list).
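As one illustration, the sketch below wraps generation in a post-hoc moderation check. It is a hypothetical skeleton, not a production filter: is_flagged and its blocklist are placeholders for whatever moderation classifier or API you actually deploy.

# Hypothetical defense-in-depth sketch: screen both the prompt and the reply.
def is_flagged(text: str) -> bool:
    # Placeholder policy -- swap in a real moderation model or API call.
    blocklist = ["example-banned-term"]
    return any(term in text.lower() for term in blocklist)

def safe_generate(generate_fn, prompt: str) -> str:
    # generate_fn is any callable mapping a prompt to model text,
    # e.g. a wrapper around the transformers quickstart above.
    reply = generate_fn(prompt)
    if is_flagged(prompt) or is_flagged(reply):
        return "Sorry, I can't help with that request."
    return reply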