Qwen 3 4B – Multilingual Fine-Tuned

This is a fine-tuned version of Qwen 3 4B, optimized using the agentlans/multilingual-sft dataset to improve performance across 100+ languages and dialects.

Compared to the original Qwen 3 4B, this model favours clear, concise outputs and minimizes verbose reasoning. It is designed as a compact multilingual alternative, similar in behaviour to the Aya models.

πŸ’‘ Intended Use

  • Enhanced multilingual support for over 100 languages
  • Generates short, direct answers rather than long chain-of-thought responses
  • Suitable for general-purpose multilingual tasks where speed and clarity matter
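
A minimal inference sketch, assuming the standard Hugging Face transformers chat-template workflow (the prompt, generation length, and dtype below are illustrative choices, not settings from this card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentlans/Qwen3-4B-multilingual-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Single-turn, multilingual prompt (French: "Briefly explain photosynthesis.")
messages = [{"role": "user", "content": "Explique brièvement la photosynthèse."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```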

⚠️ Limitations

  • Inherits known biases and limitations from the base Qwen 3 4B model
  • Performance may vary across languages and specific domains
  • Not intended for highly specialized or low-resource language tasks
  • Optimized for single-turn question answering, not for long conversations

πŸ› οΈ Training Details

Dataset

  • agentlans/multilingual-sft – the multilingual SFT dataset described above, covering 100+ languages and dialects

Method

  • Fine-tuned using LoRA (Low-Rank Adaptation)
    • rank=32, alpha=64, dropout=0.3
  • Quantized to 4-bit precision with bitsandbytes (BnB)
  • Attention computed with FlashAttention 2 (see the configuration sketch below)
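
A sketch of how this setup could be expressed with peft and bitsandbytes. Only the LoRA rank/alpha/dropout come from this card; the base checkpoint id, NF4 quantization type, compute dtype, and target_modules are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit BnB quantization (NF4 + bf16 compute are assumed, not stated in the card)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B",                      # assumed base checkpoint
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)

# LoRA adapters with the values listed above
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.3,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed projection layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```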

Hyperparameters

  • Learning rate: 5e-5
  • Batch size: 1 (with gradient accumulation for an effective batch size of 8)
  • Optimizer: AdamW (betas=(0.9, 0.999), epsilon=1e-8)
  • Scheduler: Cosine decay
  • Epochs: 1
  • Random seed: 42
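
Expressed as transformers.TrainingArguments, these settings would look roughly like the following sketch (the output directory and bf16 flag are assumptions, not values from the card):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-4b-multilingual-sft",  # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size of 8
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                       # assumed mixed-precision setting
)
```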

Software

  • peft==0.15.1
  • transformers==4.51.3
  • torch==2.6.0+cu124
  • datasets==3.5.0
  • tokenizers==0.21.1

πŸ“„ License

This model is released under the Apache 2.0 License.
