# Qwen2.5-7B-Instruct - Agent SFT
Fine-tuned model for AgentBench tasks (DB Bench + ALFWorld).
## Base Model
- Model: Qwen/Qwen2.5-7B-Instruct
- Max sequence length (training): 3584
## Training Configuration
- Method: SFT with High-Rank LoRA (merged)
- LoRA rank: 128
- LoRA alpha: 256
- LoRA dropout: 0
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Learning rate: 2e-05
- Epochs: 2
- Batch size: 1 (x8 gradient accumulation)
- Warmup ratio: 0.1
- LR scheduler: cosine
- Weight decay: 0.05
- Max grad norm: 1.0
- Precision: bf16
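The adapter hyperparameters above can be expressed as a peft-style config sketch. The dict keys below follow `peft.LoraConfig` naming conventions as an assumption; the actual training script is not included in this repository. Note that with alpha = 2 × rank, the adapter updates are scaled by a factor of 2, and the per-device batch size of 1 with 8 accumulation steps yields an effective batch size of 8.

```python
# Hypothetical reconstruction of the LoRA hyperparameters listed above,
# using peft-style key names (an assumption; the training script is not shown).
lora_config = {
    "r": 128,
    "lora_alpha": 256,
    "lora_dropout": 0.0,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

# LoRA scales each adapter update by alpha / r.
scaling = lora_config["lora_alpha"] / lora_config["r"]  # 2.0

# Per-device batch size 1 with 8 gradient-accumulation steps
# gives an effective batch size of 8 per device.
effective_batch_size = 1 * 8

print(scaling, effective_batch_size)
```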
## Training Data
- data/processed/db_bench_raw.jsonl
- data/processed/alfworld_augmented.jsonl
- data/synthetic/db_bench/trajectories-ranking-other.jsonl
## Usage

This model can be served with vLLM:

```bash
vllm serve <model_path> --max-model-len 8192 --gpu-memory-utilization 0.95
```
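`vllm serve` exposes an OpenAI-compatible HTTP API (by default at `http://localhost:8000/v1`). The sketch below only builds a chat-completion request payload; the model name and prompt are placeholders, and actually sending it requires a running server.

```python
import json

# Build an OpenAI-compatible chat-completion payload for the served model.
# "<model_path>" is the same placeholder as in the serve command above.
payload = {
    "model": "<model_path>",
    "messages": [
        {"role": "user", "content": "List all tables in the database."}
    ],
    "max_tokens": 512,
    "temperature": 0.0,
}

body = json.dumps(payload)
# With a server running, this could be POSTed to
# http://localhost:8000/v1/chat/completions, e.g. via requests.post(...).
print(len(body))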
## License
Apache-2.0 (following the base model license)