Phi4-ThinkMode
This is a fine-tuned version of unsloth/Phi-4 with enhanced reasoning capabilities using GRPO (1000 step) on the dataset gsm8k
Model details
- Base model: unsloth/Phi-4
- Fine-tuning: 16-bit precision
- Use case: Improved reasoning and thinking mode