Qwen2.5-3B-Instruct Fine-tuned on Urdu GSM8K
Model Description
This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct, trained on the PuristanLabs1/GSM8K_Urdu dataset to solve grade-school math problems with step-by-step reasoning in Urdu (اردو).
- Base Model: Qwen2.5-3B-Instruct
- Fine-tuning Method: QLoRA (4-bit quantization)
- Dataset: PuristanLabs1/GSM8K_Urdu (~6,365 examples)
- Training Duration: 3 epochs, 1,074 steps
- Language: Urdu (اردو)
- Task: Mathematical reasoning and problem solving
Training Details
Training Configuration
| Parameter | Value |
|---|---|
| Base Model | Qwen2.5-3B-Instruct |
| Training Method | QLoRA (4-bit) |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0.0 |
| Learning Rate | 2e-4 |
| Scheduler | Cosine with warmup |
| Warmup Ratio | 0.1 |
| Batch Size | 2 per device |
| Gradient Accumulation | 8 steps |
| Effective Batch Size | 16 |
| Max Sequence Length | 1024 tokens |
| Optimizer | AdamW 8-bit |
| Training Epochs | 3 |
| Total Steps | 1,074 |
| Training Time | ~15 hours |
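For reference, the cosine-with-warmup schedule in the table can be sketched numerically. This is a sketch assuming the common linear-warmup-then-cosine-decay-to-zero shape; the exact minimum learning rate used in training is not recorded in this card:

```python
import math

def lr_at(step, total_steps=1074, warmup_ratio=0.1, peak_lr=2e-4):
    """Linear warmup to peak_lr, then cosine decay toward zero."""
    warmup_steps = int(total_steps * warmup_ratio)  # 107 warmup steps here
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))
```

At step 107 the rate peaks at 2e-4, then decays smoothly to zero by step 1,074.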
Training Metrics
| Metric | Initial | Final | Improvement |
|---|---|---|---|
| Training Loss | 0.8758 | 0.5461 | ↓ 37.6% |
| Validation Loss | 0.8272 | 0.5502 | ↓ 33.5% |
| Best Validation Loss | - | 0.5502 | @ step 400 |
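The improvement percentages in the table follow directly from the initial and final losses:

```python
init_train, final_train = 0.8758, 0.5461
init_val, final_val = 0.8272, 0.5502

train_impr = 100 * (init_train - final_train) / init_train  # ~37.6%
val_impr = 100 * (init_val - final_val) / init_val          # ~33.5%
```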
Dataset Statistics
- Total Examples: 6,365
- Training Set: 5,728 examples (90%)
- Validation Set: 637 examples (10%)
- Average Question Length: 242 characters
- Average Reasoning Length: 265 characters
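The 90/10 split above can be reproduced with simple integer arithmetic (the shuffling seed used for the released split, if any, is not specified in this card):

```python
n_total = 6365
n_train = int(n_total * 0.9)   # 5,728 training examples
n_val = n_total - n_train      # 637 validation examples
```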
Trainable Parameters
- Total Parameters: 3,115,872,256
- Trainable Parameters: 29,933,568 (0.96%)
- Training Method: Parameter-efficient fine-tuning with LoRA
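The 0.96% figure is simply the adapter-to-base ratio. For intuition, a rank-r LoRA adapter on a single weight matrix of shape (d_out, d_in) adds r·(d_in + d_out) parameters; the matrix dimensions below are illustrative, not Qwen2.5's actual shapes:

```python
total_params = 3_115_872_256
trainable_params = 29_933_568
pct = 100 * trainable_params / total_params  # ~0.96%

def lora_params(d_in, d_out, r=16):
    """Parameters added by one rank-r LoRA adapter: A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)

example = lora_params(2048, 2048)  # hypothetical square projection
```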
Usage
Installation
```bash
pip install unsloth transformers accelerate
```
Basic Usage
```python
from unsloth import FastLanguageModel

# Load the fine-tuned model in 4-bit
model, tokenizer = FastLanguageModel.from_pretrained(
    "PuristanLabs1/qwen2.5-3B-GSM8K-urdu",
    max_seq_length=1024,
    load_in_4bit=True,
)

# Enable inference mode
FastLanguageModel.for_inference(model)

# Your question ("Ahmed has 15 apples. He wants to divide them equally
# among his 3 friends. How many apples will each friend get?")
question = "احمد کے پاس 15 سیب ہیں۔ وہ اپنے 3 دوستوں میں برابر تقسیم کرنا چاہتا ہے۔ ہر دوست کو کتنے سیب ملیں گے؟"

# ChatML-style prompt. The system message says: "You are a math expert who
# solves problems in Urdu. Solve every problem step by step."
prompt = f"""<|im_start|>system
آپ ایک ریاضی کے ماہر ہیں جو اردو میں مسائل حل کرتے ہیں۔ ہر مسئلے کو قدم بہ قدم حل کریں۔<|im_end|>
<|im_start|>user
{question}<|im_end|>
<|im_start|>assistant
"""

# Generate a response (do_sample=True so temperature/top_p take effect)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    eos_token_id=tokenizer.encode("<|im_end|>", add_special_tokens=False)[0],
)

# Extract the assistant's answer
response = tokenizer.decode(outputs[0], skip_special_tokens=False)
answer = response.split("<|im_start|>assistant")[-1].split("<|im_end|>")[0].strip()
print(answer)
```
Output:
ہر دوست کو 15/3=<<15/3=5>>5 سیب ملتے ہیں۔
(Translation: Each friend gets 15/3 = 5 apples.)
Advanced Usage with Custom Parameters
```python
# For more creative responses
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
    repetition_penalty=1.1,
)

# For more deterministic responses (use do_sample=False for fully greedy decoding)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.3,
    top_p=0.9,
    do_sample=True,
)
```
Example Outputs
Example 1: Simple Division
Question: احمد کے پاس 15 سیب ہیں۔ وہ اپنے 3 دوستوں میں برابر تقسیم کرنا چاہتا ہے۔ ہر دوست کو کتنے سیب ملیں گے؟
(Translation: Ahmed has 15 apples. He wants to divide them equally among his 3 friends. How many apples will each friend get?)
Model Output:
ہر دوست کو 15/3=<<15/3=5>>5 سیب ملتے ہیں۔
(Translation: Each friend gets 15/3 = 5 apples.)
Example 2: Multi-step Problem
Question: ایک دکان میں 5 قلم 125 روپے میں ملتے ہیں۔ اگر میں 12 قلم خریدوں تو کتنے روپے خرچ ہوں گے؟
(Translation: A shop sells 5 pens for 125 rupees. If I buy 12 pens, how much will I spend?)
Model Output:
ایک قلم کی قیمت 125/5 = <<125/5=25>>25 ڈالر ہے۔
لہذا، 12 قلم کی قیمت 25*12 = <<25*12=300>>300 ڈالر ہے۔
(Translation: One pen costs 125/5 = 25 dollars; so 12 pens cost 25*12 = 300 dollars. Note the "ڈالر"/dollars artifact; see Known Limitations.)
Example 3: Complex Word Problem
Question: احمد کے پاس 50 روپے تھے۔ اس نے 15 روپے کی کتاب اور 12 روپے کا قلم خریدا۔ پھر اس کے والد نے اسے 30 روپے دیے۔ اب احمد کے پاس کتنے روپے ہیں؟
(Translation: Ahmed had 50 rupees. He bought a book for 15 rupees and a pen for 12 rupees. Then his father gave him 30 rupees. How many rupees does Ahmed have now?)
Model Output:
اُس نے کتابوں اور قلم پر $15 + $12 = $<<15+12=27>>27 خرچ کیے۔
اُس کے پاس $50 - $27 = $<<50-27=23>>23 باقی بچے۔
اُس کے والد نے اُسے $30 مزید ملے، تو اب اُس کے پاس $23 + $30 = $<<23+30=53>>53 ہیں۔
(Translation: He spent $15 + $12 = $27 on the book and pen, leaving $50 - $27 = $23; his father gave him $30 more, so he now has $23 + $30 = $53. Note the "$" artifact; see Known Limitations.)
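The `<<expression=result>>` markers in these outputs are GSM8K-style calculator annotations, which makes automated checking straightforward. A sketch, not part of the released code; `eval` is used here only because the expressions are simple arithmetic from a trusted source:

```python
import re

def check_annotations(text):
    """Return one bool per <<expr=result>> annotation: does expr evaluate to result?"""
    checks = []
    for expr, result in re.findall(r"<<([^<>=]+)=([^<>]+)>>", text):
        checks.append(abs(eval(expr) - float(result)) < 1e-9)  # trusted arithmetic only
    return checks

output = "ایک قلم کی قیمت 125/5 = <<125/5=25>>25 ہے۔ 12 قلم: 25*12 = <<25*12=300>>300"
```

`check_annotations(output)` verifies both calculations in the second example above.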
Performance
Accuracy on Test Set
- Mathematical Correctness: correct on all manually tested examples (a small, informal sample; a full GSM8K_Urdu test-set evaluation has not yet been run)
- Step-by-step Reasoning: Excellent
- Urdu Fluency: Very Good
- Multi-step Problems: Handled well
Strengths
✅ Accurate Calculations - Performs arithmetic operations correctly
✅ Step-by-step Reasoning - Shows work using <<calculation>> format
✅ Multi-step Problems - Handles complex word problems with multiple operations
✅ Urdu Fluency - Generates natural Urdu text
✅ Consistent Format - Follows GSM8K-style reasoning format
Known Limitations
Currency Symbol Inconsistency
The model sometimes uses "$" or "ڈالر" (dollars) instead of "روپے" (rupees) in responses, even when the question uses "روپے". This is an artifact of the original GSM8K dataset, which uses dollars.
Impact: This does not affect mathematical accuracy, only the currency symbol used in the output.
Planned Fix: This will be addressed in the next version.
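Until the v2 fix lands, a lightweight post-processing pass can normalize the currency artifact. A workaround sketch; it only handles the simple `$123` and `ڈالر` patterns seen in the examples above:

```python
import re

def normalize_currency(text):
    """Replace dollar artifacts with روپے (rupees) in model output."""
    text = text.replace("ڈالر", "روپے")
    return re.sub(r"\$(\d+(?:\.\d+)?)", r"\1 روپے", text)
```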
Real-world Constraints
The model may not always recognize practical constraints (e.g., calculating 7.5 students per group when dividing 45 students into 6 groups). It provides mathematically correct answers but may not account for real-world impossibilities.
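For the 45-students example, exact division gives a fractional result that is physically impossible; integer arithmetic gives the practical one. A check an application could layer on top of the model's answer:

```python
students, groups = 45, 6
exact = students / groups                       # 7.5 -- mathematically correct, physically impossible
per_group, leftover = divmod(students, groups)  # 7 students per group, 3 left over
```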
Other Limitations
- Trained on grade-school level math (GSM8K difficulty)
- May struggle with very advanced mathematical concepts
- Limited to problems that can be solved with basic arithmetic
- Best performance on problems similar to training data
Intended Use
Primary Use Cases
✅ Educational tools for Urdu-speaking students
✅ Math tutoring applications
✅ Automated homework assistance
✅ Mathematical reasoning research
✅ Urdu NLP benchmarking
Out of Scope
❌ Advanced mathematics (calculus, linear algebra, etc.)
❌ Financial calculations requiring precision
❌ Real-time production systems without validation
❌ Medical or safety-critical applications
Ethical Considerations
- Educational Aid: This model is designed to assist learning, not replace teachers
- Verification Required: Always verify model outputs, especially in educational settings
- Language Preservation: Contributes to Urdu language technology development
- Accessibility: Makes mathematical reasoning tools available in Urdu
Future Improvements
The following improvements are planned for v2:
- Currency Symbol Fix - Replace "$" with "روپے" in outputs
- Extended Training - More epochs for better convergence
- Larger Dataset - Include more diverse Urdu math problems
- Real-world Constraints - Add training data for practical limitations
- Advanced Math - Expand to higher-level mathematical concepts
Model Card Authors
PuristanLabs
Citation
If you use this model in your research or applications, please cite:
@misc{qwen25-3b-urdu-gsm8k-2025,
  author = {PuristanLabs},
  title = {Qwen2.5-3B-Instruct Fine-tuned on Urdu GSM8K},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/PuristanLabs1/qwen2.5-3B-GSM8K-urdu}},
}
License
This model is released under the Apache 2.0 License, consistent with the base Qwen2.5-3B-Instruct model.
Contact
For questions, issues, or collaborations:
- HuggingFace: @PuristanLabs1
- Model Repository: qwen2.5-3B-GSM8K-urdu
Made with ❤️ for the Urdu-speaking community