🌐 Github | 🤗 Hugging Face | 📚 Cookbooks
🖥️ Demo

# Arabic-Qwen3.5-OCR-v4
Arabic-Qwen3.5-OCR-v4 is an advanced Optical Character Recognition (OCR) model built on Qwen/Qwen3.5-0.8B. It is designed specifically for Arabic text, with particularly strong performance on printed documents, and handles a wide range of text types, including handwritten text, classical texts, and diacritical marks.

During training, the model was given a "thinking" step at each stage of page reading and text generation. This helps it resolve complex context in the middle and at the end of a sentence, turning the raw signal from attention into genuine language understanding.

This version brings an improved methodology and significant enhancements to data generation, with a focus on complex layouts, low-quality document images, PDFs, photos, and diacritical marks.
- 🔤 Full support for Arabic scripts.
- 📝 Diverse text types: reads handwritten, printed, classical, and voweled text.
- ⚡ Fast inference: optimized for speed (~4 images/second).
- 🎯 High accuracy:
  - CER < 5% for clear printed text.
  - CER ~5-25% for complex handwritten text.
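CER (Character Error Rate) is the character-level edit distance between the model output and the reference transcription, divided by the reference length. As an illustration only (this is not the evaluation script used for this model), a minimal implementation:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance normalized by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)

print(cer("كتاب", "كتب"))  # one deletion over 4 chars -> 0.25
```

A CER of 0.05 therefore means roughly one wrong character per twenty.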
## Datasets Used

The model was trained on a mix of synthetic and real-world data, including:

- Internal synthetic data, generated with extensive variations.
- Public datasets and data from previous model versions, enhanced with new samples for better robustness.
## Data Augmentation & Layouts

- Fonts: over 70 Arabic fonts (e.g., Amiri, Traditional Arabic, Sakkal Majalla, Scheherazade).
- Degradation: physics-based optical simulation applying realistic scan artifacts (paper texture, ink bleed, blur, warping, ISO noise) at intensities ranging from light to heavy.
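As a loose sketch of what this kind of scan degradation looks like in code (the actual augmentation pipeline is not published, so the steps and parameters here are invented for illustration), a Pillow/NumPy example applying blur plus ISO-style sensor noise:

```python
import numpy as np
from PIL import Image, ImageFilter

def degrade(img: Image.Image, blur_radius: float = 1.0,
            noise_sigma: float = 8.0, seed: int = 0) -> Image.Image:
    """Apply a light Gaussian blur, then additive Gaussian (ISO-like) noise."""
    img = img.convert("RGB").filter(ImageFilter.GaussianBlur(blur_radius))
    arr = np.asarray(img, dtype=np.float32)
    rng = np.random.default_rng(seed)
    arr += rng.normal(0.0, noise_sigma, arr.shape)   # sensor-style noise
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

page = Image.new("RGB", (256, 64), "white")          # stand-in for a scanned page
noisy = degrade(page)
print(noisy.size)  # (256, 64)
```

A real pipeline of the kind described above would add paper texture, ink bleed, and geometric warping on top of this, with intensity sampled per image.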
🧠 Understands complex page structures:
- Poetry formatting (multi-line verse)
- Footnotes & references
- Marginal annotations
- Multi-column academic layouts
## Comparison: v3 vs v4

| Performance Metric | v3 | v4 | Delta |
|---|---|---|---|
| ⏱️ Time per Image | 0.31 s | 0.25 s | ~19% less time |
| 📊 Images per Second | 3.23 | 4.0 | ~24% higher throughput |
| ⚡ Printed Performance | 70% | 90% | +20 percentage points |
| 📄 Avg. Time per Page | n/a | 3.5 s | averaged over 100 samples |
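The timing deltas follow directly from the raw per-image measurements; a quick sanity check:

```python
# Derive the v3 -> v4 deltas from the raw per-image timings in the table.
t_v3, t_v4 = 0.31, 0.25                      # seconds per image

throughput_gain = (t_v3 / t_v4 - 1) * 100    # more images in the same time
time_saved = (1 - t_v4 / t_v3) * 100         # less time per image

print(f"throughput: {1/t_v3:.2f} -> {1/t_v4:.2f} img/s (+{throughput_gain:.0f}%)")
print(f"time per image: -{time_saved:.0f}%")
```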
Layout:

- Single column / multi-column
- Headers / footers
- Page numbers
- Footnotes
- Poetry blocks
- Tables
- Marginal notes

Page characteristics:

- ✅ 2-column layout detection
- ✅ Poetry-style split lines
- ✅ Footnotes at bottom
- ✅ Marginalia (side notes)
- ✅ Numbered lists
- ✅ Sparse English technical terms
- ✅ Diacritics (تشكيل)
## 🔮 Future Roadmap (Next Version: v5)

Planned improvements for the upcoming release include:

- 📊 Table extraction (structured OCR)
- 🧾 Invoice & receipt parsing
- 📄 Full document understanding (DocAI)
- 🔤 Improved support for additional languages
- ⚡ Ultra-light quantized version (<500 MB)
## 💡 Why this model?
Unlike traditional OCR systems, this model:
- Understands layout (not just text)
- Handles Arabic diacritics natively
- Works on both printed and handwritten text
- Is optimized for real-world noisy scans
## 🖼️ Visualizations
## 🛠️ How to use
```python
import os
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration
from qwen_vl_utils import process_vision_info

# ==================== ⚙️ Device settings ====================
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# ==================== 📂 Load the model ====================
print("[INFO] Loading model...")
model_path = "sherif1313/Arabic-Qwen3.5-OCR-v4"  # <- change to your model path

processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    model_path,
    dtype=dtype,
    device_map="auto" if device == "cuda" else None,
    trust_remote_code=True
)
model.eval()
print("[INFO] Model loaded!")

# ==================== 📝 Inference function (defined first!) ====================
def extract_text(image_path: str, prompt: str = None) -> str:
    """Extract the text from a single image."""
    if prompt is None:
        # "Read the full text in this image from beginning to end."
        prompt = "اقرأ النص في هذه الصورة كاملاً من البداية إلى النهاية."

    image = Image.open(image_path).convert("RGB")

    # Round the dimensions up to multiples of 64
    w, h = image.size
    new_w = ((w + 63) // 64) * 64
    new_h = ((h + 63) // 64) * 64
    image = image.resize((new_w, new_h), Image.LANCZOS)

    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt}
        ]
    }]

    text_input = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, _ = process_vision_info(messages)

    inputs = processor(
        text=[text_input],
        images=image_inputs,
        padding=True,
        return_tensors="pt"
    ).to(device)

    with torch.no_grad():
        generated_ids = model.generate(
            **inputs,
            max_new_tokens=512,
            do_sample=False,
            repetition_penalty=1.2,
            no_repeat_ngram_size=3,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
        )

    input_len = inputs.input_ids.shape[1]
    output_text = processor.batch_decode(
        generated_ids[:, input_len:],
        skip_special_tokens=True,
        clean_up_tokenization_spaces=False
    )[0]
    return output_text.strip()

# ==================== 🚀 Entry point (runs after the function is defined) ====================
if __name__ == "__main__":
    # ✅ extract_text can be called here because it is defined above
    image_path = "/home/sheriff/Downloads/PIC.png"
    if os.path.exists(image_path):
        print(f"📄 Processing: {image_path}")
        result = extract_text(image_path)
        print(f"📝 Extracted Text:\n{result}")
    else:
        print(f"❌ File not found: {image_path}")
```
## 🛠️ How to use it (web)
```python
import os
import time
import torch
from PIL import Image
import gradio as gr
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration
from qwen_vl_utils import process_vision_info

# ==================== ⚙️ Device settings ====================
if torch.cuda.is_available():
    device = "cuda"
    dtype = torch.float16
    print(f"✅ Using GPU: {torch.cuda.get_device_name(0)}")
elif torch.backends.mps.is_available():
    device = "mps"
    dtype = torch.float16
    print("✅ Using Apple Silicon (MPS)")
else:
    device = "cpu"
    dtype = torch.float32
    print("⚠️ Using CPU (slower inference)")

print(f"[INFO] Device: {device} | Dtype: {dtype}")

# ==================== 📂 Load the model ====================
def load_model():
    """Load the model and processor with memory management."""
    model_path = os.getenv("MODEL_PATH", "sherif1313/Arabic-Qwen3.5-OCR-v4")
    print(f"[INFO] Loading model from: {model_path}")
    processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
    model = Qwen3_5ForConditionalGeneration.from_pretrained(
        model_path,
        dtype=dtype,
        device_map="auto" if device == "cuda" else None,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
    )
    model.eval()
    print("[INFO] Model loaded successfully!")
    return model, processor

# Global load (runs once at app startup)
try:
    model, processor = load_model()
except Exception as e:
    print(f"[ERROR] Failed to load model: {e}")
    model = None
    processor = None

# ==================== 🧹 Helper functions ====================
def prepare_image(image: Image.Image, max_size: int = 768) -> Image.Image:
    """Prepare the image: downscale, then round dimensions to multiples of 64."""
    if max(image.size) > max_size:
        image.thumbnail((max_size, max_size), Image.Resampling.LANCZOS)
    w, h = image.size
    new_w = ((w + 63) // 64) * 64
    new_h = ((h + 63) // 64) * 64
    if (new_w, new_h) != image.size:
        image = image.resize((new_w, new_h), Image.Resampling.LANCZOS)
    return image

def clean_output(text: str, max_repetitions: int = 2) -> str:
    """Remove excessive repetition from the output."""
    if not text:
        return text
    import re
    text = re.sub(r'(.)\1{4,}', r'\1\1\1', text)
    lines = text.strip().split('\n')
    cleaned = []
    seen = {}
    for line in lines:
        line_stripped = line.strip()
        if not line_stripped:
            continue
        count = seen.get(line_stripped, 0) + 1
        if count <= max_repetitions:
            cleaned.append(line)
        seen[line_stripped] = count
    return '\n'.join(cleaned).strip()

# ==================== 📝 Inference function ====================
def extract_text(image, prompt: str = None) -> tuple[str, str]:
    """Extract the text from an image."""
    if model is None or processor is None:
        return "❌ Error: Model not loaded", "0.00"
    if image is None:
        return "⚠️ Please upload an image", "0.00"
    start_time = time.time()
    try:
        if isinstance(image, str):
            image_pil = Image.open(image).convert("RGB")
        elif isinstance(image, Image.Image):
            image_pil = image.convert("RGB")
        else:
            image_pil = Image.fromarray(image).convert("RGB")
        image_pil = prepare_image(image_pil)
        if prompt is None or not prompt.strip():
            # "Read the full text in this image from beginning to end."
            prompt = "اقرأ النص في هذه الصورة كاملاً من البداية إلى النهاية."

        messages = [{
            "role": "user",
            "content": [
                {"type": "image", "image": image_pil},
                {"type": "text", "text": prompt}
            ]
        }]
        text_input = processor.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        image_inputs, _ = process_vision_info(messages)
        inputs = processor(
            text=[text_input],
            images=image_inputs,
            padding=True,
            return_tensors="pt"
        ).to(device)
        with torch.inference_mode():
            generated_ids = model.generate(
                **inputs,
                max_new_tokens=512,
                do_sample=False,
                repetition_penalty=1.2,
                no_repeat_ngram_size=3,
                pad_token_id=processor.tokenizer.pad_token_id,
                eos_token_id=processor.tokenizer.eos_token_id,
            )
        input_len = inputs.input_ids.shape[1]
        output_text = processor.batch_decode(
            generated_ids[:, input_len:],
            skip_special_tokens=True,
            clean_up_tokenization_spaces=False
        )[0]
        output_text = clean_output(output_text.strip())
        elapsed = time.time() - start_time
        return output_text, f"{elapsed:.2f} seconds"
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()
        return "❌ Out of Memory. Try a smaller image.", "0.00"
    except Exception as e:
        print(f"[ERROR] {e}")
        import traceback
        traceback.print_exc()
        return f"❌ Error: {str(e)}", "0.00"

# ==================== 🎨 Gradio interface ====================
def create_interface():
    """Build the user interface."""
    with gr.Blocks(
        title="Arabic OCR - Qwen3.5-0.8B",
        theme=gr.themes.Soft(),
        css="""
        .header { text-align: center; margin-bottom: 20px; }
        .output-box { min-height: 200px; }
        """
    ) as demo:
        gr.Markdown("""
        # 📖 Arabic Handwritten & Printed OCR V4
        ### Powered by Qwen3.5-0.8B
        Upload an image containing Arabic text, and the model will extract it.
        ✨ **Features:**
        - 🔤 Arabic support
        - ✍️ Handwritten & printed text
        - 📝 Preserves diacritics (تشكيل)
        - ⚡ Full precision (no quantization)
        """, elem_classes="header")
        with gr.Row():
            with gr.Column(scale=1):
                # ✅ Define the components first
                image_input = gr.Image(
                    label="📷 Upload Image",
                    type="pil",
                    height=300,
                    sources=["upload", "clipboard"]
                )
                prompt_input = gr.Textbox(
                    label="📝 Custom Prompt (Optional)",
                    placeholder="اقرأ النص في هذه الصورة...",
                    value="اقرأ النص في هذه الصورة كاملاً من البداية إلى النهاية.",
                    lines=2
                )
                submit_btn = gr.Button(
                    "🚀 Extract Text",
                    variant="primary",
                    size="lg"
                )
                # ✅ Examples use local paths only (no external links).
                # To add examples, copy the images into an 'examples/' folder
                # in the Space repository, then use:
                #   examples=[["examples/sample1.jpg"], ...]
                gr.Examples(
                    label="📚 Examples (Optional)",
                    examples=[
                    ],  # leave empty or use local paths
                    inputs=[image_input],  # ✅ works because image_input is defined above
                    cache_examples=False
                )
            with gr.Column(scale=1):
                output_text = gr.Textbox(
                    label="📄 Extracted Text",
                    lines=12,
                    show_copy_button=True,
                    elem_classes="output-box"
                )
                time_output = gr.Textbox(
                    label="⏱️ Inference Time",
                    interactive=False,
                    value="-"
                )
                clear_btn = gr.Button("🗑️ Clear", variant="secondary")
        # ✅ Wire up the events (after all components are defined)
        submit_btn.click(
            fn=extract_text,
            inputs=[image_input, prompt_input],
            outputs=[output_text, time_output]
        )
        clear_btn.click(
            fn=lambda: (None, "", "", "-"),
            inputs=[],
            outputs=[image_input, prompt_input, output_text, time_output]
        )
        gr.Markdown("""
        ### 💡 Tips for Best Results:
        1. Use clear, well-lit images
        2. Crop to the text region if possible
        3. For handwritten text, ensure good contrast
        4. Custom prompts can improve accuracy for specific formats
        """)
    return demo  # ✅ return the demo

# ==================== 🚀 Entry point ====================
if __name__ == "__main__":
    print("[INFO] Creating Gradio interface...")
    demo = create_interface()
    # Launch settings for Spaces
    demo.launch(
        server_name="0.0.0.0",
        server_port=int(os.getenv("PORT", 7860)),
        share=False,
        debug=os.getenv("DEBUG", "false").lower() == "true",
        show_error=True
    )
```
## 🛠️ How to use it (PDF)

```bash
# The script below only needs PyMuPDF (imported as fitz)
pip install pymupdf
python pdf.py --pdf /home/sheriff/Desktop/222.pdf --output result.txt
```
```python
import os
import time
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration
from qwen_vl_utils import process_vision_info
import fitz  # PyMuPDF

# ==================== ⚙️ Device settings ====================
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32
print(f"[INFO] Using device: {DEVICE} | dtype: {DTYPE}")

# ==================== 📂 Load the model ====================
def load_model(model_path: str):
    """Load the model and processor."""
    print(f"[INFO] Loading model from: {model_path}")
    processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
    model = Qwen3_5ForConditionalGeneration.from_pretrained(
        model_path,
        torch_dtype=DTYPE,
        device_map="auto" if DEVICE == "cuda" else None,
        trust_remote_code=True,
        low_cpu_mem_usage=True
    )
    model.eval()
    print("[INFO] ✅ Model loaded successfully!")
    return model, processor

# ==================== 🖼️ Convert a PDF page to an image ====================
def pdf_page_to_image(pdf_path: str, page_num: int, dpi: int = 150) -> Image.Image:
    """Render one page of a PDF file as a PIL image."""
    doc = fitz.open(pdf_path)
    page = doc[page_num]
    # Build the zoom matrix for the requested resolution
    zoom = dpi / 72  # 72 DPI is the PDF default
    mat = fitz.Matrix(zoom, zoom)
    # Render the page
    pix = page.get_pixmap(matrix=mat)
    img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
    doc.close()
    return img

# ==================== 🧹 Clean repetition out of the output ====================
def clean_output(text: str, max_repetitions: int = 2) -> str:
    """Remove excessive repetition from the extracted text."""
    import re
    if not text:
        return text
    # Remove excessive character repetition
    text = re.sub(r'(.)\1{4,}', r'\1\1\1', text)
    # Remove repeated lines
    lines = text.strip().split('\n')
    cleaned = []
    seen = {}
    for line in lines:
        line_stripped = line.strip()
        if not line_stripped:
            continue
        count = seen.get(line_stripped, 0) + 1
        if count <= max_repetitions:
            cleaned.append(line)
        seen[line_stripped] = count
    return '\n'.join(cleaned).strip()

# ==================== 📝 Extract text from an image ====================
def extract_text_from_image(model, processor, image: Image.Image, prompt: str = None) -> str:
    """Extract the text from a single image using the model."""
    if prompt is None:
        # "Read the full text in this image from beginning to end."
        prompt = "اقرأ النص في هذه الصورة كاملاً من البداية إلى النهاية."
    # Prepare the image: round dimensions to multiples of 64
    w, h = image.size
    new_w = ((w + 63) // 64) * 64
    new_h = ((h + 63) // 64) * 64
    if (new_w, new_h) != (w, h):
        image = image.resize((new_w, new_h), Image.Resampling.LANCZOS)
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt}
        ]
    }]
    text_input = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, _ = process_vision_info(messages)
    inputs = processor(
        text=[text_input],
        images=image_inputs,
        padding=True,
        return_tensors="pt"
    ).to(DEVICE)
    with torch.inference_mode():
        generated_ids = model.generate(
            **inputs,
            max_new_tokens=2048,
            do_sample=False,
            repetition_penalty=1.2,
            no_repeat_ngram_size=3,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
        )
    input_len = inputs.input_ids.shape[1]
    output_text = processor.batch_decode(
        generated_ids[:, input_len:],
        skip_special_tokens=True,
        clean_up_tokenization_spaces=False
    )[0]
    return clean_output(output_text.strip())

# ==================== 📚 Process a whole PDF file ====================
def process_pdf(
    pdf_path: str,
    model,
    processor,
    output_path: str = None,
    start_page: int = 0,
    end_page: int = None,
    dpi: int = 150,
    prompt: str = None
) -> dict:
    """
    Process a whole PDF file and extract the text from every page.

    Args:
        pdf_path: path to the PDF file
        model: the loaded model
        processor: the model's processor
        output_path: output file path (optional)
        start_page: first page number (0-indexed)
        end_page: last page number (None = to the end)
        dpi: resolution used when rendering pages to images
        prompt: the prompt used for extraction

    Returns:
        dict: {
            'total_pages': int,
            'processed_pages': int,
            'results': [ { 'page': int, 'text': str, 'time': float }, ... ],
            'total_time': float
        }
    """
    doc = fitz.open(pdf_path)
    total_pages = len(doc)
    if end_page is None:
        end_page = total_pages
    results = []
    total_start = time.time()
    print(f"[INFO] Processing: {pdf_path}")
    print(f"[INFO] Pages: {start_page+1} to {end_page} of {total_pages}")
    for page_num in range(start_page, min(end_page, total_pages)):
        page_start = time.time()
        try:
            # Render the page as an image
            image = pdf_page_to_image(pdf_path, page_num, dpi=dpi)
            # Extract the text
            text = extract_text_from_image(model, processor, image, prompt)
            page_time = time.time() - page_start
            results.append({
                'page': page_num + 1,  # pages indexed from 1
                'text': text,
                'time': round(page_time, 2),
                'image_size': image.size
            })
            print(f"[✓] Page {page_num+1}/{total_pages} | Time: {page_time:.2f}s | Chars: {len(text)}")
        except Exception as e:
            print(f"[✗] Page {page_num+1} Error: {str(e)}")
            results.append({
                'page': page_num + 1,
                'text': f"[ERROR: {str(e)}]",
                'time': 0,
                'error': True
            })
    total_time = time.time() - total_start
    doc.close()
    # Save the results to a file if requested
    if output_path:
        save_results_to_file(results, output_path)
        print(f"[INFO] Results saved to: {output_path}")
    return {
        'total_pages': total_pages,
        'processed_pages': len(results),
        'results': results,
        'total_time': round(total_time, 2),
        'avg_time_per_page': round(total_time / len(results), 2) if results else 0
    }

# ==================== 💾 Save the results ====================
def save_results_to_file(results: list, output_path: str, format: str = 'txt'):
    """Save the extraction results to a file."""
    os.makedirs(os.path.dirname(output_path) or '.', exist_ok=True)
    if format == 'txt':
        with open(output_path, 'w', encoding='utf-8') as f:
            for item in results:
                f.write(f"\n{'='*60}\n")
                f.write(f"📄 Page {item['page']}\n")
                f.write(f"⏱️ Time: {item['time']} s\n")
                f.write(f"{'='*60}\n\n")
                f.write(item['text'])
                f.write("\n\n")
    elif format == 'json':
        import json
        with open(output_path, 'w', encoding='utf-8') as f:
            json.dump(results, f, ensure_ascii=False, indent=2)
    elif format == 'md':
        with open(output_path, 'w', encoding='utf-8') as f:
            f.write("# 📚 PDF Text Extraction Results\n\n")
            for item in results:
                f.write(f"## Page {item['page']}\n")
                f.write(f"- ⏱️ Time: {item['time']} s\n")
                f.write(f"- 📏 Image size: {item['image_size']}\n\n")
                f.write("```text\n")
                f.write(item['text'])
                f.write("\n```\n\n")

# ==================== 🚀 Entry point ====================
if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="📚 Arabic OCR for PDF using Qwen3.5-0.8B")
    parser.add_argument('--pdf', type=str, required=True, help='Path to the PDF file')
    parser.add_argument('--model', type=str, default='sherif1313/Arabic-Qwen3.5-OCR-v4', help='Model path')
    parser.add_argument('--output', type=str, default=None, help='Output file path')
    parser.add_argument('--pages', type=str, default='all', help='Pages: all, 1-5, or 3')
    parser.add_argument('--dpi', type=int, default=150, help='Rendering resolution (default: 150)')
    parser.add_argument('--prompt', type=str, default=None, help='Custom prompt')
    parser.add_argument('--format', type=str, default='txt', choices=['txt', 'json', 'md'], help='Output format')
    args = parser.parse_args()

    # Parse the page range
    if args.pages == 'all':
        start_page, end_page = 0, None
    elif '-' in args.pages:
        parts = args.pages.split('-')
        start_page = int(parts[0]) - 1
        end_page = int(parts[1]) if len(parts) > 1 and parts[1] else None
    else:
        page = int(args.pages) - 1
        start_page, end_page = page, page + 1

    # Load the model
    model, processor = load_model(args.model)

    # Process the PDF
    results = process_pdf(
        pdf_path=args.pdf,
        model=model,
        processor=processor,
        output_path=args.output,
        start_page=start_page,
        end_page=end_page,
        dpi=args.dpi,
        prompt=args.prompt
    )

    # Print a summary
    print(f"\n{'='*60}")
    print("📊 Processing summary")
    print(f"{'='*60}")
    print(f"📄 Total pages: {results['total_pages']}")
    print(f"✅ Processed pages: {results['processed_pages']}")
    print(f"⏱️ Total time: {results['total_time']} s")
    print(f"⚡ Avg. time/page: {results['avg_time_per_page']} s")
    print(f"{'='*60}")
```
## 📖 Citation

If you use this model, please cite it as follows:

```bibtex
@misc{arabic-qwen-ocr-v4,
  title  = {sherif1313/Arabic-Qwen3.5-OCR-v4},
  author = {Sheriff},
  year   = {2026},
  url    = {https://huggingface.co/sherif1313/Arabic-Qwen3.5-OCR-v4}
}
```