🇮🇳 AI Kurukh (Oraon) Translator Model
Model Description
This is a fine-tuned Google mT5 (Multilingual Text-to-Text Transfer Transformer) model optimized for translating between Hindi and Kurukh (Kurux), a Dravidian language spoken by nearly 2 million people in India (Jharkhand, Chhattisgarh, Odisha, West Bengal).
This model was developed to bridge the digital divide for tribal communities and support language preservation efforts using Artificial Intelligence.
- Developed by: ankitklakra
- Model Type: Encoder-Decoder Transformer (mT5-small)
- Language(s): Hindi (hi) ↔ Kurukh (kru)
- Fine-tuned from: google/mt5-small
Intended Uses & Limitations
Intended Use
- Education: Assisting students in translating basic study materials.
- Communication: Bridging the gap between Hindi speakers and Kurukh tribal communities.
- Research: Serving as a baseline for future low-resource language models.
Limitations
- Data Scarcity: The model was trained on a small dataset (~1,000 sentences), so it may hallucinate (invent words) when given complex or unseen sentences.
- Context: It works best on short, daily-life sentences. It is not suitable for legal or medical translation yet.
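Because the model is most reliable on short, everyday sentences, a caller may want to flag inputs that fall outside that range before translating. The following is a minimal sketch; the function name `looks_in_scope` and the 12-word threshold are illustrative assumptions, not values derived from any evaluation of this model.

```python
def looks_in_scope(sentence: str, max_words: int = 12) -> bool:
    """Return True if the sentence is non-empty and at most max_words long.

    max_words is a hypothetical cutoff chosen for illustration only.
    """
    words = sentence.split()
    return 0 < len(words) <= max_words


print(looks_in_scope("निघै नामे इन्द्रा हिकै?"))  # 4 words -> True
```

Inputs that fail the check could be split into shorter sentences or routed to a human translator.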
Training Data
The model was trained on a custom-curated parallel corpus containing daily conversation pairs, agricultural terms, and general vocabulary.
- Optimization: Trained with the Adafactor optimizer, chosen to avoid numerical precision problems during fine-tuning.
- Training Epochs: 60 (deliberately aggressive fine-tuning, so that the model memorizes the small corpus).
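To make the training setup concrete, here is a rough sketch of how the step count and optimizer could be configured. Only the corpus size (~1,000 sentences), the 60 epochs, and the choice of Adafactor come from this card; the batch size, the learning rate, and the fixed-learning-rate Adafactor flags are assumptions for illustration and may differ from what was actually used.

```python
import math

NUM_SENTENCES = 1000   # approximate corpus size stated in this card
NUM_EPOCHS = 60        # stated in this card
BATCH_SIZE = 8         # assumption, not stated in this card


def total_training_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Number of optimizer steps, assuming no gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


def build_optimizer(model):
    """Create an Adafactor optimizer with a fixed learning rate (assumed settings)."""
    # Imported lazily so the step arithmetic above runs without transformers installed.
    from transformers.optimization import Adafactor

    return Adafactor(
        model.parameters(),
        lr=1e-3,                # assumption
        scale_parameter=False,
        relative_step=False,    # must be False when an explicit lr is given
        warmup_init=False,
    )


print(total_training_steps(NUM_SENTENCES, BATCH_SIZE, NUM_EPOCHS))  # 125 * 60 = 7500
```

With these assumed values, each sentence is seen 60 times over 7,500 updates, which is consistent with the memorization-oriented training described above.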
How to Use
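from transformers import pipeline
# Load the fine-tuned checkpoint from the Hugging Face Hub.
translator = pipeline("text2text-generation", model="ankitklakra/kurukh-to-hindi")
# The pipeline returns a list of dicts; the translation is under "generated_text".
result = translator("निघै नामे इन्द्रा हिकै?")
print(result[0]["generated_text"])
# Output: तुम्हारा नाम क्या है?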
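For translating several sentences at once, a small wrapper can unpack the pipeline's output into plain strings. This is a sketch, assuming the translator returns a `{"generated_text": ...}` dict (possibly wrapped in a list) for each input; `translate_all` is a hypothetical helper name, and the stub below only stands in for the real pipeline so that the example runs offline.

```python
from typing import Callable, List


def translate_all(translator: Callable, sentences: List[str]) -> List[str]:
    """Run the translator on each sentence and return plain strings."""
    translations = []
    for sentence in sentences:
        output = translator(sentence)
        # Unwrap nested lists until we reach the result dict.
        while isinstance(output, list):
            output = output[0]
        translations.append(output["generated_text"].strip())
    return translations


def stub_translator(text: str):
    """Stand-in that mimics the pipeline's output shape (not a real model)."""
    return [{"generated_text": f"<translation of: {text}>"}]


print(translate_all(stub_translator, ["a", "b"]))
```

With the real model, pass the `translator` pipeline object in place of `stub_translator`.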