🇮🇳 AI Kurukh (Oraon) Translator Model

Model Description

This is a fine-tuned Google mT5 (Multilingual Text-to-Text Transfer Transformer) model for translating between Hindi and Kurukh (Kurux), a Dravidian language spoken by nearly 2 million people in India, mainly in Jharkhand, Chhattisgarh, Odisha, and West Bengal.

This model was developed to bridge the digital divide for tribal communities and support language preservation efforts using Artificial Intelligence.

  • Developed by: ankitklakra
  • Model Type: Encoder-Decoder Transformer (mT5-small)
  • Language(s): Hindi (hi) ↔ Kurukh (kru)
  • Fine-tuned from: google/mt5-small

Intended Uses & Limitations

Intended Use

  • Education: Assisting students in translating basic study materials.
  • Communication: Bridging the gap between Hindi speakers and Kurukh tribal communities.
  • Research: Serving as a baseline for future low-resource language models.

Limitations

  • Data Scarcity: The model was trained on a small dataset (~1,000 sentences), so it may hallucinate (make up words) on complex or unseen sentences.
  • Context: It works best on short, daily-life sentences. It is not yet suitable for legal or medical translation.
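
Because the model works best on short sentences, a caller may want to screen inputs before translating them. The following is a minimal sketch of such a check, not part of the model itself. The function name `is_in_scope` and the 12-word threshold are assumptions chosen for illustration; the card does not define what counts as "short".

```python
def is_in_scope(sentence: str, max_words: int = 12) -> bool:
    """Return True for a non-empty sentence of at most max_words words.

    max_words=12 is a hypothetical threshold, not a value from the model card.
    Words are counted by splitting on whitespace.
    """
    words = sentence.split()
    return 0 < len(words) <= max_words


# The example sentence from this card has four whitespace-separated words.
print(is_in_scope("निघै नामे इन्द्रा हिकै?"))  # True
```

Inputs that fail the check could be split into shorter sentences or flagged for human review rather than sent to the model.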

Training Data

The model was trained on a custom-curated parallel corpus containing daily conversation pairs, agricultural terms, and general vocabulary.

  • Optimization: Trained with the Adafactor optimizer to avoid precision loss during training.
  • Training Epochs: 60 (aggressive fine-tuning, favoring memorization of the small corpus).
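
To give a sense of how aggressive 60 epochs is on roughly 1,000 sentence pairs, the sketch below works out the number of optimizer steps. The base model, optimizer, epoch count, and corpus size come from this card. The batch size of 8 and the `FineTuneConfig` class are assumptions for illustration only, since the card does not state them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    base_model: str = "google/mt5-small"  # from the model card
    optimizer: str = "adafactor"          # from the model card
    num_epochs: int = 60                  # from the model card
    num_sentences: int = 1000             # approximate, from the model card
    batch_size: int = 8                   # ASSUMPTION: not stated in the card

    def steps_per_epoch(self) -> int:
        # The last batch may be partial, so round up.
        return math.ceil(self.num_sentences / self.batch_size)

    def total_steps(self) -> int:
        return self.steps_per_epoch() * self.num_epochs


config = FineTuneConfig()
print(config.steps_per_epoch())  # 125
print(config.total_steps())      # 7500
```

Whatever the batch size, every sentence pair is seen 60 times, which is consistent with the card's note that training favors memorization.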

How to Use

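from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub.
translator = pipeline("text2text-generation", model="ankitklakra/kurukh-to-hindi")

# The pipeline returns a list of dicts; the translation is under "generated_text".
result = translator("निघै नामे इन्द्रा हिकै?")
print(result[0]["generated_text"])
# Expected output: तुम्हारा नाम क्या है? ("What is your name?")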
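
For more than one sentence, a small wrapper can call the translator once per input and collect only the translated text. This is a sketch under stated assumptions: `translate_all` and `load_translator` are hypothetical helper names, and the wrapper relies only on the pipeline returning a list whose first item has a "generated_text" key for a single string input.

```python
from typing import Callable, Dict, List

# Anything that maps one string to [{"generated_text": "..."}] will work,
# which is the shape the text2text-generation pipeline returns for one input.
Translator = Callable[[str], List[Dict[str, str]]]


def translate_all(translator: Translator, sentences: List[str]) -> List[str]:
    """Translate sentences one at a time and return only the output text."""
    outputs: List[str] = []
    for sentence in sentences:
        cleaned = sentence.strip()
        if not cleaned:
            outputs.append("")  # keep positions aligned with the input list
            continue
        result = translator(cleaned)
        outputs.append(result[0]["generated_text"])
    return outputs


def load_translator():
    """Build the pipeline shown above (downloads the model on first use)."""
    from transformers import pipeline  # lazy import; requires transformers

    return pipeline("text2text-generation", model="ankitklakra/kurukh-to-hindi")
```

A typical call would be `translate_all(load_translator(), ["निघै नामे इन्द्रा हिकै?"])`. Passing the translator in as an argument also makes it easy to substitute a stub when testing without downloading the model.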