# hwr_text_ocr_rus

Handwritten word-level OCR (HWR) model for Russian.
This model recognizes cropped text snippets / single words from handwritten notebook images. It is not a full-page OCR system: pair it with a word detector (e.g. kotmayyaka/hwr_text_detection_rus) and feed it tight word crops (or short token crops) with minimal surrounding background.
## What's inside

- Checkpoint:
  - `ocr_model.ckpt`
- Inference helper code:
  - `hwr_ocr.py` — `HWRTextOCR` class (load + preprocess + decode)
  - `inference.py` — CLI example
## Intended use
- ✅ Word-level handwritten recognition (Russian)
- ✅ Small cropped regions of text (one token / short piece)
- ❌ Not a full-page OCR pipeline (you need word/line detection & cropping)
- ❌ Not guaranteed to generalize to very different handwriting styles, paper types, or scanning conditions
## Quickstart (inference)

1) Install dependencies

```bash
pip install torch torchvision pillow
```

2) Run CLI inference

```bash
python inference_ocr.py --image /path/to/word_crop.png --checkpoint ocr_model.ckpt
```
3) Use from Python

```python
from PIL import Image
from hwr_ocr import HWRTextOCR

ocr = HWRTextOCR(checkpoint_path="ocr_model.ckpt", device="cpu")
img = Image.open("word_crop.png").convert("RGB")
text = ocr.predict(img)
print(text)
```
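To recognize many word crops, the per-image call above can be wrapped in a small loop. This is a generic sketch, not part of the shipped API: `recognize_crops` and its `predict` argument are names introduced here for illustration; in practice you would pass `ocr.predict` from the quickstart.

```python
from typing import Callable, Dict, Iterable
from PIL import Image

def recognize_crops(predict: Callable[[Image.Image], str],
                    paths: Iterable[str]) -> Dict[str, str]:
    # Open each crop file, normalize to RGB (matching the quickstart),
    # and collect a mapping of path -> recognized text.
    results = {}
    for path in paths:
        img = Image.open(path).convert("RGB")
        results[path] = predict(img)
    return results
```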
## Input recommendations
- Prefer tight crops around a single word.
- Avoid large margins; background clutter reduces accuracy.
- If you have a full line/page image, run a detector/segmenter first and then recognize each crop.
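The cropping step itself can be as simple as PIL's `Image.crop` with a small padding margin around each detected box. A sketch, assuming you already have word bounding boxes as `(left, top, right, bottom)` pixel tuples (e.g. from a detector such as the one mentioned above); `tight_word_crop` and the `pad` parameter are names introduced here:

```python
from PIL import Image

def tight_word_crop(page: Image.Image, box, pad: int = 4) -> Image.Image:
    # Expand the detector box by a few pixels, clamped to the page bounds,
    # so strokes touching the box edge are not cut off.
    left, top, right, bottom = box
    left = max(0, left - pad)
    top = max(0, top - pad)
    right = min(page.width, right + pad)
    bottom = min(page.height, bottom + pad)
    return page.crop((left, top, right, bottom))
```

Keeping `pad` small preserves the tight-crop recommendation above while avoiding clipped strokes.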
## Output
- The model outputs a single string (recognized word/text snippet).
## Evaluation
Metrics reported in the model card header were obtained on an internal mixed validation split based on:
- ai-forever/school_notebooks_RU
- ai-forever/school_notebooks_EN
## License

- MIT
## Datasets used to train kotmayyaka/hwr_text_ocr_rus

- ai-forever/school_notebooks_RU
- ai-forever/school_notebooks_EN

## Evaluation results

Internal evaluation on a mixed validation split of ai-forever/school_notebooks_RU and ai-forever/school_notebooks_EN:

| Metric | Value |
|---|---|
| Character Error Rate (CER) | 0.049 |
| Word Error Rate (WER) | 0.197 |
| Loss | 1.064 |
| Average Accuracy | 0.815 |
| Fuzzy score | 95.038 |
| Normalized Levenshtein distance | 0.255 |
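For reference, CER and WER are standard edit-distance metrics: Levenshtein distance divided by reference length, over characters and whitespace-split words respectively. The exact evaluation script for this model is not published; a minimal sketch of how such numbers are typically computed:

```python
def levenshtein(a, b) -> int:
    # Classic dynamic-programming edit distance
    # (insertions, deletions, substitutions), row by row.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    # Character Error Rate: edit distance over characters / reference length.
    return levenshtein(ref, hyp) / max(1, len(ref))

def wer(ref: str, hyp: str) -> float:
    # Word Error Rate: edit distance over whitespace-split tokens.
    ref_w, hyp_w = ref.split(), hyp.split()
    return levenshtein(ref_w, hyp_w) / max(1, len(ref_w))
```

Lower is better for both; a CER of 0.049 means roughly one character error per twenty reference characters.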