
💜 Github   |   🤗 Hugging Face   |   📚 Cookbooks
🖥️ Demo

🕌 Arabic-Qwen3.5-OCR-v4

Arabic-Qwen3.5-OCR-v4 is an advanced Optical Character Recognition (OCR) model fine-tuned from Qwen/Qwen3.5-0.8B. It is designed specifically for Arabic, with enhanced performance on printed text, and also reads handwritten and classical text as well as diacritical marks (tashkeel).

In this training round, the model was given a "thinking" step at each stage of page reading and text generation. As a result, it better understands complex context in the middle and at the end of a sentence, turning raw attention signals into genuine language understanding.

This version offers an improved methodology and significant enhancements to data generation, focusing on complex formats, low-quality document images, PDFs, photos, and diacritical marks.

๐ŸŒ Full support for Arabic scripts. ๐Ÿ“ Diverse Text Types: Capable of reading Handwritten, Printed, Classical, and Voweled text. โšก Fast Inference: Optimized for speed ~4 images/second . ๐ŸŽฏ High Accuracy:

CER < 5% for clear printed text. CER ~5-25% for complex handwritten text.
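For reference, CER here is the standard character error rate: the character-level Levenshtein edit distance divided by the reference length. A minimal sketch of the metric (the exact evaluation script behind the numbers above is not published):

```python
# Character Error Rate (CER): Levenshtein distance over characters,
# normalized by reference length. Illustrative sketch only.
def cer(reference: str, hypothesis: str) -> float:
    m, n = len(reference), len(hypothesis)
    dp = list(range(n + 1))  # single-row edit-distance DP
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,      # deletion
                        dp[j - 1] + 1,  # insertion
                        prev + (reference[i - 1] != hypothesis[j - 1]))  # substitution
            prev = cur
    return dp[n] / max(m, 1)

print(cer("كتاب", "كتاب"))  # 0.0  — exact match
print(cer("كتاب", "كتب"))   # 0.25 — one dropped character out of four
```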

Datasets Used

The model was trained on a mix of synthetic and real-world data, including:

  • Internal synthetic data: generated with extensive variations.
  • Public datasets and previous model data: enhanced with new samples for better robustness.

Data Augmentation & Layouts

  • Fonts: over 70 Arabic fonts (e.g., Amiri, Traditional Arabic, Sakkal Majalla, Scheherazade).
  • Degradation: physics-based optical simulation applying realistic scan artifacts (paper texture, ink bleed, blur, warping, ISO noise) at intensities ranging from light to heavy.
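The degradation stage can be approximated with off-the-shelf image operations. The sketch below is an illustrative stand-in, not the physics-based simulator used in training; the intensity presets and noise values are assumptions:

```python
import numpy as np
from PIL import Image, ImageFilter

def degrade(img: Image.Image, intensity: str = "light") -> Image.Image:
    """Blur plus additive Gaussian noise as a crude stand-in for scan artifacts."""
    levels = {"light": (0.5, 5), "medium": (1.0, 12), "heavy": (2.0, 25)}
    blur_radius, noise_sigma = levels[intensity]  # hypothetical presets
    img = img.convert("L").filter(ImageFilter.GaussianBlur(blur_radius))
    arr = np.asarray(img, dtype=np.float32)
    arr += np.random.normal(0.0, noise_sigma, arr.shape)  # ISO-style sensor noise
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

page = Image.new("L", (256, 256), color=255)  # stand-in for a rendered text page
out = degrade(page, "medium")
```

A real pipeline would additionally apply paper texture, ink bleed, and geometric warping to rendered text pages before training.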

🧠 Understands complex page structures:

  • Poetry formatting (multi-line verse)
  • Footnotes & references
  • Marginal annotations
  • Multi-column academic layouts

Comparison: v3 vs v4

| Performance Metric | v3 | v4 | Delta |
|---|---|---|---|
| ⏱️ Time per Image | 0.31 s | 0.25 s | ~24% faster |
| 🚀 Images per Second | 3.23 | 4.0 | +24% throughput |
| ⚡ Printed-text accuracy | 70% | 90% | +20 percentage points |
| 📄 Time per Page | n/a | 3.5 s | averaged over 100 samples |
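Throughput numbers like those above ("averaged over 100 samples") can be reproduced with a small timing harness. This is a generic sketch; a no-op stands in for the model call so the snippet is self-contained, and with the real model you would pass the `extract_text` function from the usage section and a list of 100 image paths:

```python
import time

def throughput(fn, samples, warmup=1):
    """Run fn over samples after a warmup pass; return (items/sec, sec/item)."""
    for s in samples[:warmup]:
        fn(s)  # warmup: the first call often pays one-time costs (caches, kernels)
    start = time.perf_counter()
    for s in samples:
        fn(s)
    elapsed = time.perf_counter() - start
    return len(samples) / elapsed, elapsed / len(samples)

# a no-op stands in for the model call here
images_per_sec, sec_per_image = throughput(lambda s: s, list(range(100)))
```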

Layout:

  • Single column / multi-column
  • Headers / footers
  • Page numbers
  • Footnotes
  • Poetry blocks
  • Tables
  • Marginal notes

Page Characteristics:

  ✅ 2-column layout detection
  ✅ Poetry-style split lines
  ✅ Footnotes at bottom
  ✅ Marginalia (side notes)
  ✅ Numbered lists
  ✅ Sparse English technical terms
  ✅ Diacritics (تشكيل)

🔮 Future Roadmap (Next Version v5)

Planned improvements for the upcoming release include:

  • 📊 Table extraction (structured OCR)
  • 🧾 Invoice & receipt parsing
  • 📚 Full document understanding (DocAI)
  • 🔤 Improved support for additional languages
  • ⚡ Ultra-light quantized version (<500MB)

💡 Why this model?

Unlike traditional OCR systems, this model:

  • Understands layout (not just text)
  • Handles Arabic diacritics natively
  • Works on both printed and handwritten text
  • Is optimized for real-world noisy scans

๐Ÿ–ผ๏ธ Visualizations.

๐Ÿ› ๏ธ How to use


import os
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration
from qwen_vl_utils import process_vision_info

# ==================== ⚙️ Device setup ====================
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# ==================== 🔄 Load model ====================
print("[INFO] Loading model...")
model_path = "sherif1313/Arabic-Qwen3.5-OCR-v4"  # ← change to your model path

processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    model_path,
    dtype=dtype,
    device_map="auto" if device == "cuda" else None,
    trust_remote_code=True
)
model.eval()
print("[INFO] Model loaded!")

# ==================== 🔍 Inference function (define it first!) ====================
def extract_text(image_path: str, prompt: str = None) -> str:
    """Extract the text from a single image."""
    if prompt is None:
        prompt = "اقرأ النص في هذه الصورة كاملاً من البداية إلى النهاية."  # "Read the text in this image in full, from beginning to end."
    
    image = Image.open(image_path).convert("RGB")
    
    # round dimensions up to multiples of 64
    w, h = image.size
    new_w = ((w + 63) // 64) * 64
    new_h = ((h + 63) // 64) * 64
    image = image.resize((new_w, new_h), Image.LANCZOS)
    
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt}
        ]
    }]
    
    text_input = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, _ = process_vision_info(messages)
    
    inputs = processor(
        text=[text_input],
        images=image_inputs,
        padding=True,
        return_tensors="pt"
    ).to(device)
    
    with torch.no_grad():
        generated_ids = model.generate(
            **inputs,
            max_new_tokens=512,
            do_sample=False,
            repetition_penalty=1.2,
            no_repeat_ngram_size=3,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
        )
    
    input_len = inputs.input_ids.shape[1]
    output_text = processor.batch_decode(
        generated_ids[:, input_len:],
        skip_special_tokens=True,
        clean_up_tokenization_spaces=False
    )[0]
    
    return output_text.strip()

# ==================== 🚀 Entry point (runs after the function is defined) ====================
if __name__ == "__main__":
    # extract_text can be called now because it is defined above
    image_path = "/home/sheriff/Downloads/PIC.png"
    
    if os.path.exists(image_path):
        print(f"🔍 Processing: {image_path}")
        result = extract_text(image_path)
        print(f"📝 Extracted Text:\n{result}")
    else:
        print(f"❌ File not found: {image_path}")

๐Ÿ› ๏ธ How to use it web

import os
import time
import torch
from PIL import Image
import gradio as gr
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration
from qwen_vl_utils import process_vision_info

# ==================== ⚙️ Device setup ====================
if torch.cuda.is_available():
    device = "cuda"
    dtype = torch.float16
    print(f"✅ Using GPU: {torch.cuda.get_device_name(0)}")
elif torch.backends.mps.is_available():
    device = "mps"
    dtype = torch.float16
    print("✅ Using Apple Silicon (MPS)")
else:
    device = "cpu"
    dtype = torch.float32
    print("⚠️ Using CPU (slower inference)")

print(f"[INFO] Device: {device} | Dtype: {dtype}")

# ==================== 🔄 Load model ====================
def load_model():
    """Load the model and processor with memory management."""
    model_path = os.getenv("MODEL_PATH", "sherif1313/Arabic-Qwen3.5-OCR-v4")
    
    print(f"[INFO] Loading model from: {model_path}")
    
    processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
    
    model = Qwen3_5ForConditionalGeneration.from_pretrained(
        model_path,
        dtype=dtype,
        device_map="auto" if device == "cuda" else None,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
    )
    
    model.eval()
    print("[INFO] Model loaded successfully!")
    return model, processor

# global load (runs once at app startup)
try:
    model, processor = load_model()
except Exception as e:
    print(f"[ERROR] Failed to load model: {e}")
    model = None
    processor = None

# ==================== 🧹 Helper functions ====================
def prepare_image(image: Image.Image, max_size: int = 768) -> Image.Image:
    """Prepare the image: downscale, then round dimensions up to multiples of 64."""
    if max(image.size) > max_size:
        image.thumbnail((max_size, max_size), Image.Resampling.LANCZOS)
    
    w, h = image.size
    new_w = ((w + 63) // 64) * 64
    new_h = ((h + 63) // 64) * 64
    if (new_w, new_h) != image.size:
        image = image.resize((new_w, new_h), Image.Resampling.LANCZOS)
    
    return image

def clean_output(text: str, max_repetitions: int = 2) -> str:
    """Remove excessive repetition from the output."""
    if not text:
        return text
    
    import re
    text = re.sub(r'(.)\1{4,}', r'\1\1\1', text)
    
    lines = text.strip().split('\n')
    cleaned = []
    seen = {}
    for line in lines:
        line_stripped = line.strip()
        if not line_stripped:
            continue
        count = seen.get(line_stripped, 0) + 1
        if count <= max_repetitions:
            cleaned.append(line)
        seen[line_stripped] = count
    
    return '\n'.join(cleaned).strip()

# ==================== 🔍 Inference function ====================
def extract_text(image, prompt: str = None) -> tuple[str, str]:
    """Extract the text from an image."""
    if model is None or processor is None:
        return "❌ Error: Model not loaded", "0.00"

    if image is None:
        return "⚠️ Please upload an image", "0.00"
    
    start_time = time.time()
    
    try:
        if isinstance(image, str):
            image_pil = Image.open(image).convert("RGB")
        elif isinstance(image, Image.Image):
            image_pil = image.convert("RGB")
        else:
            image_pil = Image.fromarray(image).convert("RGB")
        
        image_pil = prepare_image(image_pil)
        
        if prompt is None or not prompt.strip():
            prompt = "اقرأ النص في هذه الصورة كاملاً من البداية إلى النهاية."  # "Read the text in this image in full, from beginning to end."
        
        messages = [{
            "role": "user",
            "content": [
                {"type": "image", "image": image_pil},
                {"type": "text", "text": prompt}
            ]
        }]
        
        text_input = processor.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        image_inputs, _ = process_vision_info(messages)
        
        inputs = processor(
            text=[text_input],
            images=image_inputs,
            padding=True,
            return_tensors="pt"
        ).to(device)
        
        with torch.inference_mode():
            generated_ids = model.generate(
                **inputs,
                max_new_tokens=512,
                do_sample=False,
                repetition_penalty=1.2,
                no_repeat_ngram_size=3,
                pad_token_id=processor.tokenizer.pad_token_id,
                eos_token_id=processor.tokenizer.eos_token_id,
            )
        
        input_len = inputs.input_ids.shape[1]
        output_text = processor.batch_decode(
            generated_ids[:, input_len:],
            skip_special_tokens=True,
            clean_up_tokenization_spaces=False
        )[0]
        
        output_text = clean_output(output_text.strip())
        
        elapsed = time.time() - start_time
        
        return output_text, f"{elapsed:.2f} seconds"
        
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()
        return "❌ Out of Memory. Try a smaller image.", "0.00"
    except Exception as e:
        print(f"[ERROR] {e}")
        import traceback
        traceback.print_exc()
        return f"❌ Error: {str(e)}", "0.00"

# ==================== 🎨 Gradio UI ====================
def create_interface():
    """Build the user interface."""
    
    with gr.Blocks(
        title="Arabic OCR - Qwen3.5-0.8B",
        theme=gr.themes.Soft(),
        css="""
        .header { text-align: center; margin-bottom: 20px; }
        .output-box { min-height: 200px; }
        """
    ) as demo:
        
        gr.Markdown("""
        # 📝 Arabic Handwritten & Printed OCR V4
        ### Powered by Qwen3.5-0.8B

        Upload an image containing Arabic text, and the model will extract it.

        ✨ **Features:**
        - 🌍 Arabic support
        - ✍️ Handwritten & printed text
        - 🔤 Preserves diacritics (تشكيل)
        - ⚡ Full precision (no quantization)
        """, elem_classes="header")
        
        with gr.Row():
            with gr.Column(scale=1):
                # define the components first
                image_input = gr.Image(
                    label="📷 Upload Image",
                    type="pil",
                    height=300,
                    sources=["upload", "clipboard"]
                )
                
                prompt_input = gr.Textbox(
                    label="📝 Custom Prompt (Optional)",
                    placeholder="اقرأ النص في هذه الصورة...",
                    value="اقرأ النص في هذه الصورة كاملاً من البداية إلى النهاية.",
                    lines=2
                )
                
                submit_btn = gr.Button(
                    "🔍 Extract Text",
                    variant="primary",
                    size="lg"
                )
                
                # local example paths only (no external links):
                # to add examples, copy images into an 'examples/' folder in the Space repo,
                # then use: examples=[["examples/sample1.jpg"], ...]
                gr.Examples(
                    label="📋 Examples (Optional)",
                    examples=[],  # leave empty or use local paths
                    inputs=[image_input],  # works because image_input is defined above
                    cache_examples=False
                )
                
            with gr.Column(scale=1):
                output_text = gr.Textbox(
                    label="📄 Extracted Text",
                    lines=12,
                    show_copy_button=True,
                    elem_classes="output-box"
                )

                time_output = gr.Textbox(
                    label="⏱️ Inference Time",
                    interactive=False,
                    value="-"
                )

                clear_btn = gr.Button("🗑️ Clear", variant="secondary")
        
        # wire up events (after all components are defined)
        submit_btn.click(
            fn=extract_text,
            inputs=[image_input, prompt_input],
            outputs=[output_text, time_output]
        )
        
        clear_btn.click(
            fn=lambda: (None, "", "", "-"),
            inputs=[],
            outputs=[image_input, prompt_input, output_text, time_output]
        )
        
        gr.Markdown("""
        ### 💡 Tips for Best Results:
        1. Use clear, well-lit images
        2. Crop to the text region if possible
        3. For handwritten text, ensure good contrast
        4. Custom prompts can improve accuracy for specific formats
        """)
    
    return demo

# ==================== 🚀 Entry point ====================
if __name__ == "__main__":
    print("[INFO] Creating Gradio interface...")
    
    demo = create_interface()
    
    # launch settings for Spaces
    demo.launch(
        server_name="0.0.0.0",
        server_port=int(os.getenv("PORT", 7860)),
        share=False,
        debug=os.getenv("DEBUG", "false").lower() == "true",
        show_error=True
    )

๐Ÿ› ๏ธ How to use it PDF

pip install pymupdf   # the script below uses PyMuPDF (fitz); pdf2image is not used, and poppler-utils is a system package, not a pip package

python pdf.py --pdf /home/sheriff/Desktop/222.pdf --output result.txt

import os
import sys
import time
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration
from qwen_vl_utils import process_vision_info
import fitz  # PyMuPDF

# ==================== ⚙️ Device setup ====================
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32
print(f"[INFO] Using device: {DEVICE} | dtype: {DTYPE}")

# ==================== 🔄 Load model ====================
def load_model(model_path: str):
    """Load the model and processor."""
    print(f"[INFO] Loading model from: {model_path}")
    
    processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
    
    model = Qwen3_5ForConditionalGeneration.from_pretrained(
        model_path,
        torch_dtype=DTYPE,
        device_map="auto" if DEVICE == "cuda" else None,
        trust_remote_code=True,
        low_cpu_mem_usage=True
    )
    model.eval()
    print("[INFO] ✅ Model loaded successfully!")
    return model, processor

# ==================== 🖼️ Convert a PDF page to an image ====================
def pdf_page_to_image(pdf_path: str, page_num: int, dpi: int = 150) -> Image.Image:
    """Render one page of a PDF file as a PIL image."""
    doc = fitz.open(pdf_path)
    page = doc[page_num]
    
    # zoom matrix for the requested resolution
    zoom = dpi / 72  # 72 DPI is the PDF default
    mat = fitz.Matrix(zoom, zoom)
    
    # render the page to a pixmap
    pix = page.get_pixmap(matrix=mat)
    img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
    
    doc.close()
    return img

# ==================== 🧹 Clean repetition from the output ====================
def clean_output(text: str, max_repetitions: int = 2) -> str:
    """Remove excessive repetition from the extracted text."""
    import re
    if not text:
        return text
    
    # collapse runs of repeated characters
    text = re.sub(r'(.)\1{4,}', r'\1\1\1', text)
    
    # drop lines repeated too many times
    lines = text.strip().split('\n')
    cleaned = []
    seen = {}
    for line in lines:
        line_stripped = line.strip()
        if not line_stripped:
            continue
        count = seen.get(line_stripped, 0) + 1
        if count <= max_repetitions:
            cleaned.append(line)
        seen[line_stripped] = count
    
    return '\n'.join(cleaned).strip()

# ==================== 🔍 Extract text from an image ====================
def extract_text_from_image(model, processor, image: Image.Image, prompt: str = None) -> str:
    """Extract the text from a single image using the model."""
    if prompt is None:
        prompt = "اقرأ النص في هذه الصورة كاملاً من البداية إلى النهاية."  # "Read the text in this image in full, from beginning to end."
    
    # prepare the image: round dimensions up to multiples of 64
    w, h = image.size
    new_w = ((w + 63) // 64) * 64
    new_h = ((h + 63) // 64) * 64
    if (new_w, new_h) != (w, h):
        image = image.resize((new_w, new_h), Image.Resampling.LANCZOS)
    
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt}
        ]
    }]
    
    text_input = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, _ = process_vision_info(messages)
    
    inputs = processor(
        text=[text_input],
        images=image_inputs,
        padding=True,
        return_tensors="pt"
    ).to(DEVICE)
    
    with torch.inference_mode():
        generated_ids = model.generate(
            **inputs,
            max_new_tokens=2048,
            do_sample=False,
            repetition_penalty=1.2,
            no_repeat_ngram_size=3,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
        )
    
    input_len = inputs.input_ids.shape[1]
    output_text = processor.batch_decode(
        generated_ids[:, input_len:],
        skip_special_tokens=True,
        clean_up_tokenization_spaces=False
    )[0]
    
    return clean_output(output_text.strip())

# ==================== 📄 Process a full PDF file ====================
def process_pdf(
    pdf_path: str,
    model,
    processor,
    output_path: str = None,
    start_page: int = 0,
    end_page: int = None,
    dpi: int = 150,
    prompt: str = None
) -> dict:
    """
    Process an entire PDF file and extract the text from every page.

    Args:
        pdf_path: path to the PDF file
        model: the loaded model
        processor: the model's processor
        output_path: output file path (optional)
        start_page: first page to process (0-indexed)
        end_page: last page (None = through the end)
        dpi: resolution used to rasterize each page
        prompt: the prompt used for extraction

    Returns:
        dict: {
            'total_pages': int,
            'processed_pages': int,
            'results': [ { 'page': int, 'text': str, 'time': float }, ... ],
            'total_time': float
        }
    """
    
    doc = fitz.open(pdf_path)
    total_pages = len(doc)
    
    if end_page is None:
        end_page = total_pages
    
    results = []
    total_start = time.time()
    
    print(f"[INFO] Processing: {pdf_path}")
    print(f"[INFO] Pages: {start_page+1} to {end_page} of {total_pages}")
    
    for page_num in range(start_page, min(end_page, total_pages)):
        page_start = time.time()
        
        try:
            # rasterize the page
            image = pdf_page_to_image(pdf_path, page_num, dpi=dpi)

            # run OCR on the page image
            text = extract_text_from_image(model, processor, image, prompt)
            
            page_time = time.time() - page_start
            
            results.append({
                'page': page_num + 1,  # pages are 1-indexed in the report
                'text': text,
                'time': round(page_time, 2),
                'image_size': image.size
            })
            
            print(f"[✓] Page {page_num+1}/{total_pages} | Time: {page_time:.2f}s | Chars: {len(text)}")
            
        except Exception as e:
            print(f"[✗] Page {page_num+1} Error: {str(e)}")
            results.append({
                'page': page_num + 1,
                'text': f"[ERROR: {str(e)}]",
                'time': 0,
                'error': True
            })
    
    total_time = time.time() - total_start
    doc.close()
    
    # ุญูุธ ุงู„ู†ุชุงุฆุฌ ููŠ ู…ู„ู ู†ุตูŠ ุฅุฐุง ุทูู„ุจ
    if output_path:
        save_results_to_file(results, output_path)
        print(f"[INFO] Results saved to: {output_path}")
    
    return {
        'total_pages': total_pages,
        'processed_pages': len(results),
        'results': results,
        'total_time': round(total_time, 2),
        'avg_time_per_page': round(total_time / len(results), 2) if results else 0
    }

# ==================== 💾 Save results ====================
def save_results_to_file(results: list, output_path: str, format: str = 'txt'):
    """Save the extraction results to a file."""
    os.makedirs(os.path.dirname(output_path) or '.', exist_ok=True)
    
    if format == 'txt':
        with open(output_path, 'w', encoding='utf-8') as f:
            for item in results:
                f.write(f"\n{'='*60}\n")
                f.write(f"📄 Page {item['page']}\n")
                f.write(f"⏱️ Time: {item['time']} s\n")
                f.write(f"{'='*60}\n\n")
                f.write(item['text'])
                f.write("\n\n")
    
    elif format == 'json':
        import json
        with open(output_path, 'w', encoding='utf-8') as f:
            json.dump(results, f, ensure_ascii=False, indent=2)
    
    elif format == 'md':
        with open(output_path, 'w', encoding='utf-8') as f:
            f.write("# 📄 PDF text-extraction results\n\n")
            for item in results:
                f.write(f"## Page {item['page']}\n")
                f.write(f"- ⏱️ Time: {item['time']} s\n")
                f.write(f"- 📏 Image size: {item.get('image_size', 'n/a')}\n\n")  # error pages have no image_size
                f.write("```text\n")
                f.write(item['text'])
                f.write("\n```\n\n")

# ==================== 🚀 Entry point ====================
if __name__ == "__main__":
    import argparse
    
    parser = argparse.ArgumentParser(description="📄 Arabic OCR for PDF using Qwen3.5-0.8B")
    parser.add_argument('--pdf', type=str, required=True, help='path to the PDF file')
    parser.add_argument('--model', type=str, default='sherif1313/Arabic-Qwen3.5-OCR-v4', help='model path')
    parser.add_argument('--output', type=str, default=None, help='output file path')
    parser.add_argument('--pages', type=str, default='all', help='pages: all, 1-5, or 3')
    parser.add_argument('--dpi', type=int, default=150, help='rasterization DPI (default: 150)')
    parser.add_argument('--prompt', type=str, default=None, help='custom prompt')
    parser.add_argument('--format', type=str, default='txt', choices=['txt', 'json', 'md'], help='output format')
    
    args = parser.parse_args()
    
    # parse the page range
    if args.pages == 'all':
        start_page, end_page = 0, None
    elif '-' in args.pages:
        parts = args.pages.split('-')
        start_page = int(parts[0]) - 1
        end_page = int(parts[1]) if len(parts) > 1 and parts[1] else None
    else:
        page = int(args.pages) - 1
        start_page, end_page = page, page + 1
    
    # load the model
    model, processor = load_model(args.model)
    
    # ู…ุนุงู„ุฌุฉ ุงู„ู€ PDF
    results = process_pdf(
        pdf_path=args.pdf,
        model=model,
        processor=processor,
        output_path=args.output,
        start_page=start_page,
        end_page=end_page,
        dpi=args.dpi,
        prompt=args.prompt
    )
    
    # print a summary
    print(f"\n{'='*60}")
    print("📊 Processing summary")
    print(f"{'='*60}")
    print(f"📄 Total pages: {results['total_pages']}")
    print(f"✅ Processed pages: {results['processed_pages']}")
    print(f"⏱️ Total time: {results['total_time']} s")
    print(f"⚡ Avg time per page: {results['avg_time_per_page']} s")
    print(f"{'='*60}")

๐Ÿ“ Citation

If you use this model, please cite it as follows:

@misc{arabic-qwen-ocr-v4,
  title={sherif1313/Arabic-Qwen3.5-OCR-v4},
  author={Sheriff},
  year={2026},
  url={https://huggingface.co/sherif1313/Arabic-Qwen3.5-OCR-v4}
}
