malkhuzanie committed · Commit 99e397f · verified · 1 Parent(s): 80d3e16

Create README.md

Files changed (1)
  1. README.md +66 -0
README.md ADDED
@@ -0,0 +1,66 @@
+ ---
+ language:
+ - ar
+ tags:
+ - punctuation-restoration
+ - arabic
+ - pytorch
+ - bilstm
+ - text-processing
+ pipeline_tag: text-classification
+ widget:
+ - text: "هل تساءلت يوما عن معنى الحياة ما هي الأسئلة التي تشغل بالك"
+   example_title: "Question Example"
+ - text: "الطقس جميل اليوم لا اعتقد انها ستمطر"
+   example_title: "Statement Example"
+ ---
+
+ # Arabic Punctuation Restoration Model (BiLSTM)
+
+ This **Bidirectional LSTM (BiLSTM)** model restores punctuation in raw Arabic text: it takes unpunctuated text as input and inserts the appropriate punctuation marks.
+
+ ## Model Details
+ - **Architecture:** BiLSTM (2 layers, hidden dimension 256)
+ - **Embeddings:** AraVec (Twitter-CBOW, 300d)
+ - **Vocabulary Size:** ~50k words
+ - **Input:** Raw Arabic text (with or without diacritics)
+ - **Output:** Text with restored punctuation marks
+
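To give a rough sense of model size, the figures listed above can be turned into a parameter count. This is only a back-of-the-envelope sketch: it assumes that the hidden dimension of 256 applies per direction, that the LSTM follows the PyTorch `nn.LSTM` parameter layout (two bias vectors per gate), and that a single linear layer maps the concatenated states to the seven labels the README lists. The helper names are hypothetical and are not taken from the repository.

```python
def lstm_direction_params(input_dim, hidden_dim):
    """Parameters of one LSTM direction in one layer (PyTorch nn.LSTM layout)."""
    # Four gates, each with input weights, recurrent weights, and two bias vectors.
    return 4 * (input_dim * hidden_dim + hidden_dim * hidden_dim + 2 * hidden_dim)


def bilstm_params(input_dim, hidden_dim, num_layers):
    """Parameters of a stacked bidirectional LSTM."""
    total = 0
    layer_input = input_dim
    for _ in range(num_layers):
        total += 2 * lstm_direction_params(layer_input, hidden_dim)
        # Deeper layers see the forward and backward states concatenated.
        layer_input = 2 * hidden_dim
    return total


EMBED_DIM = 300      # AraVec vector size from the README
HIDDEN_DIM = 256     # assumed to be per direction
NUM_LAYERS = 2
NUM_CLASSES = 7      # label IDs 0..6
VOCAB_SIZE = 50_000  # the README only says "~50k", so this is approximate

lstm = bilstm_params(EMBED_DIM, HIDDEN_DIM, NUM_LAYERS)
classifier = 2 * HIDDEN_DIM * NUM_CLASSES + NUM_CLASSES  # assumed single linear layer
embedding = VOCAB_SIZE * EMBED_DIM

print(f"BiLSTM: {lstm:,}  classifier: {classifier:,}  embedding (approx.): {embedding:,}")
```

Under these assumptions the embedding table dominates the parameter budget, which is typical for word-level taggers with a large vocabulary.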
+ ## Supported Punctuation Marks
+ The model predicts one of the following labels:
+
+ | ID | Mark | Name |
+ |---|---|---|
+ | 0 | (None) | No Punctuation |
+ | 1 | **؟** | Arabic Question Mark |
+ | 2 | **،** | Arabic Comma |
+ | 3 | **:** | Colon |
+ | 4 | **؛** | Arabic Semicolon |
+ | 5 | **!** | Exclamation Mark |
+ | 6 | **.** | Period / Full Stop |
+
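The label table can be read as a lookup from predicted IDs to marks. The sketch below shows one way such IDs might be turned back into text. It assumes the model emits exactly one label per whitespace-separated word and that the label names the mark following that word; the README does not state this, and `apply_labels` is a hypothetical helper, not part of `inference.py`.

```python
# Label IDs from the table above; ID 0 means "no mark after this word".
ID_TO_MARK = {0: "", 1: "؟", 2: "،", 3: ":", 4: "؛", 5: "!", 6: "."}


def apply_labels(words, label_ids):
    """Attach the predicted mark (if any) to the end of each word."""
    if len(words) != len(label_ids):
        raise ValueError("expected exactly one label per word")
    return " ".join(word + ID_TO_MARK[label] for word, label in zip(words, label_ids))


words = "الطقس جميل اليوم لا اعتقد انها ستمطر".split()
labels = [0, 0, 6, 0, 0, 0, 6]  # hand-written for illustration, not real model output
print(apply_labels(words, labels))
```

Keeping the mapping in a single dictionary makes it easy to check against the table if the label set changes.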
+ ## How to Use
+
+ Because this is a custom PyTorch model, you need to load both the model definition and the vocabulary.
+
+ ### Method 1: Using the Inference Script (Recommended)
+ The simplest route is to download `inference.py` from this repository and load it as a module.
+
+ ```python
+ import importlib.util
+
+ from huggingface_hub import hf_hub_download
+
+ # 1. Download the inference script from the model repository
+ script_path = hf_hub_download(
+     repo_id="malkhuzanie/arabic-punctuation-checkpoints",
+     filename="inference.py",
+ )
+
+ # 2. Load the downloaded script as a Python module
+ spec = importlib.util.spec_from_file_location("inference", script_path)
+ inference = importlib.util.module_from_spec(spec)
+ spec.loader.exec_module(inference)
+
+ # 3. Initialize the restorer and predict
+ model = inference.PunctuationRestorer()
+
+ text = "هل تساءلت يوما عن معنى الحياة ما هي الأسئلة التي تشغل بالك"
+ print(model.predict(text))
+ # Output: هل تساءلت يوماً عن معنى الحياة؟ ما هي الأسئلة التي تشغل بالك؟
+ ```