CamelParser-Dialects
CamelParser-Dialects is a neural dependency parsing model for dialectal Arabic and Modern Standard Arabic (MSA), designed under the CATiB dependency formalism.
It is based on the biaffine attention parser architecture introduced by Dozat and Manning (2017), implemented using SuPar. The model leverages CamelBERT-MIX, a pretrained language model trained on a large and diverse Arabic corpus.
Full details are available in our paper: "Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights"
Model Variants and LAS (Labeled Attachment Score) on TEST
| | Checkpoint | Training Data | MSA | EGY | GLF | AVG |
|---|---|---|---|---|---|---|
| | CAMeL-Lab/camelparser-dialects-MSA | CamelTB, PATB | 87.3 | 73.0 | 73.3 | 77.9 |
| | CAMeL-Lab/camelparser-dialects-EGY | ARZTB | 79.2 | 83.9 | 68.7 | 77.3 |
| | CAMeL-Lab/camelparser-dialects-GLF | CamelTB-Gumar | 65.4 | 58.7 | 73.8 | 66.0 |
| | CAMeL-Lab/camelparser-dialects-MSA-EGY | CamelTB, PATB, ARZTB | 87.1 | 84.4 | 70.1 | 79.8 |
| | CAMeL-Lab/camelparser-dialects-MSA-GLF | CamelTB, PATB, CamelTB-Gumar | 87.2 | 74.4 | 81.0 | 80.9 |
| | CAMeL-Lab/camelparser-dialects-EGY-GLF | ARZTB, CamelTB-Gumar | 80.0 | 83.8 | 79.4 | 81.1 |
| ⭐ | CAMeL-Lab/camelparser-dialects-MSA-EGY-GLF | CamelTB, PATB, ARZTB, CamelTB-Gumar | 87.2 | 84.2 | 80.3 | 83.9 |
The recommended checkpoint is the all-variety model (MSA-EGY-GLF), which achieves the highest average LAS (83.9) across the MSA, EGY, and GLF test sets.
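For readers unfamiliar with the metric: LAS is the percentage of tokens whose predicted head and relation label both match the gold tree, while UAS checks the head only. The following is a minimal sketch of that definition, not the evaluation script used for the table above; the function name, the `(head, label)` token representation, and the example labels are illustrative assumptions, and details such as punctuation handling may differ in the official evaluation.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) in percent for one or more concatenated sentences.

    gold, pred: equal-length lists of (head_index, relation_label) tuples,
    one tuple per token. This representation is an assumption made for
    illustration only.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty token list")

    # UAS: the head matches. LAS: head and label both match.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * uas_hits / n, 100.0 * las_hits / n


if __name__ == "__main__":
    # Toy four-token sentence; head 0 denotes the root.
    gold = [(2, "SBJ"), (0, "---"), (2, "OBJ"), (3, "MOD")]
    pred = [(2, "SBJ"), (0, "---"), (2, "MOD"), (2, "MOD")]
    uas, las = attachment_scores(gold, pred)
    print(f"UAS = {uas:.1f}, LAS = {las:.1f}")
```

Because LAS requires the label to be correct as well, it can never exceed UAS on the same predictions.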
Model Architecture
- Encoder: CamelBERT-MIX
- Parser: Deep biaffine attention (Dozat & Manning, 2017)
- Framework: SuPar
- Formalism: CATiB dependency scheme
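To make the parser component above more concrete, here is a small dependency-free sketch of biaffine arc scoring in the style of Dozat and Manning (2017): the score of token j as head of token i is `head[j]^T U dep[i] + head[j] . u`. It is an illustration only and not the SuPar implementation. It assumes the `dep` and `head` vectors have already been produced from the encoder output by separate projection layers, and it omits the label scorer and tree decoding; all names are hypothetical.

```python
def biaffine_arc_scores(dep, head, U, u):
    """Score every (dependent, head) pair.

    dep, head: lists of equal-sized vectors, one per token.
    U: square weight matrix (list of rows); u: head-only bias vector.
    Returns scores where scores[i][j] = head[j]^T U dep[i] + head[j] . u
    """
    scores = []
    for d in dep:
        # Project the dependent vector once: Ud = U @ d
        Ud = [sum(U[r][c] * d[c] for c in range(len(d))) for r in range(len(U))]
        # head^T (U d + u) equals the bilinear term plus the head-only bias term
        row = [sum(h[r] * (Ud[r] + u[r]) for r in range(len(h))) for h in head]
        scores.append(row)
    return scores


def greedy_heads(scores):
    """Pick the highest-scoring head per token (no tree constraint enforced)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


if __name__ == "__main__":
    dep = [[1, 0], [0, 1]]
    head = [[1, 0], [0, 1]]
    U = [[1, 0], [0, 1]]
    u = [0, 0.5]
    scores = biaffine_arc_scores(dep, head, U, u)
    print(scores, greedy_heads(scores))
```

Greedy selection is used here only to keep the sketch short; it does not guarantee a well-formed tree.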
Training Data
The models are trained on combinations of the following treebanks:
- CamelTB (MSA): camel_treebank_1.1.zip
- PATB (Penn Arabic Treebank): LDC2010T13, LDC2011T09, LDC2010T08
- ARZTB (Egyptian Arabic Treebank): LDC2018T23
- CamelTB-Gumar (Gulf Arabic): CamelTB-Gumar.1.0.zip
Intended Use
This model is intended for:
- Dependency parsing of Arabic text
- Linguistic analysis of dialectal Arabic
Usage
For usage instructions and code, please refer to the official repository:
https://github.com/CAMeL-Lab/camel_parser_dialects
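As a convenience, the sketch below shows one possible way to fetch a checkpoint's files before following the repository's instructions. It assumes the checkpoint names listed in the table are Hugging Face Hub repository IDs and that the third-party `huggingface_hub` package is installed; `checkpoint_id` and `download_checkpoint` are hypothetical helpers, not part of the official code. Loading the model and running the parser are covered by the repository above.

```python
VARIETIES = ("MSA", "EGY", "GLF")  # order used in the checkpoint names


def checkpoint_id(varieties):
    """Build a checkpoint name such as 'CAMeL-Lab/camelparser-dialects-MSA-GLF'."""
    wanted = {v.upper() for v in varieties}
    unknown = wanted - set(VARIETIES)
    if not wanted or unknown:
        raise ValueError(f"expected a non-empty subset of {VARIETIES}, got {sorted(wanted)}")
    # Keep the canonical MSA, EGY, GLF order regardless of input order.
    suffix = "-".join(v for v in VARIETIES if v in wanted)
    return f"CAMeL-Lab/camelparser-dialects-{suffix}"


def download_checkpoint(varieties=VARIETIES):
    """Download the checkpoint files and return the local directory path.

    Assumption: the checkpoints are hosted on the Hugging Face Hub under
    the IDs produced by checkpoint_id().
    """
    from huggingface_hub import snapshot_download  # third-party dependency

    return snapshot_download(repo_id=checkpoint_id(varieties))


if __name__ == "__main__":
    # Prints the ID only; call download_checkpoint() to fetch the files.
    print(checkpoint_id(["glf", "msa"]))
```

Calling `download_checkpoint()` with no arguments targets the recommended all-variety model.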
Citation
If you use this model, please cite:
@inproceedings{Elshabrawy:2026:camelparser-dialects,
title = "{Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights}",
author = {Ahmed Elshabrawy and
Go Inoue and
Muhammed AbuOdeh and
Nizar Habash} ,
booktitle = {Proceedings of The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT)},
year = "2026",
address = "Palma, Spain"
}