CamelParser-Dialects

CamelParser-Dialects is a neural dependency parsing model for dialectal Arabic and Modern Standard Arabic (MSA), designed under the CATiB dependency formalism.

It is based on the biaffine attention parser architecture introduced by Dozat and Manning (2017) and is implemented with SuPar. The encoder is CamelBERT-MIX, a pretrained language model trained on a mix of MSA, dialectal, and Classical Arabic text.

Full details are available in our paper: "Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights"


📊 Model Variants and LAS (Labeled Attachment Score) on TEST

| Checkpoint | Training Data | MSA | EGY | GLF | AVG |
|---|---|---|---|---|---|
| CAMeL-Lab/camelparser-dialects-MSA | CamelTB, PATB | 87.3 | 73.0 | 73.3 | 77.9 |
| CAMeL-Lab/camelparser-dialects-EGY | ARZTB | 79.2 | 83.9 | 68.7 | 77.3 |
| CAMeL-Lab/camelparser-dialects-GLF | CamelTB-Gumar | 65.4 | 58.7 | 73.8 | 66.0 |
| CAMeL-Lab/camelparser-dialects-MSA-EGY | CamelTB, PATB, ARZTB | 87.1 | 84.4 | 70.1 | 79.8 |
| CAMeL-Lab/camelparser-dialects-MSA-GLF | CamelTB, PATB, CamelTB-Gumar | 87.2 | 74.4 | 81.0 | 80.9 |
| CAMeL-Lab/camelparser-dialects-EGY-GLF | ARZTB, CamelTB-Gumar | 80.0 | 83.8 | 79.4 | 81.1 |
| ☑️ CAMeL-Lab/camelparser-dialects-MSA-EGY-GLF | CamelTB, PATB, ARZTB, CamelTB-Gumar | 87.2 | 84.2 | 80.3 | 83.9 |

The recommended checkpoint is the all-variety model (MSA-EGY-GLF), which has the highest average LAS (83.9) across the MSA, EGY, and GLF test sets.
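
For readers unfamiliar with the metric, LAS counts a token as correct only when both its predicted head and its dependency label match the gold annotation. Below is a minimal sketch of that computation. The function name `attachment_scores`, the toy sentence, and the labels are illustrative assumptions, not the evaluation script behind the numbers in this card, which may treat punctuation or tokenization differently.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) as percentages.

    `gold` and `pred` are aligned lists of (head_index, label) pairs,
    one pair per token, with head index 0 standing for the root.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted parses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty parse")
    # UAS: the head alone must match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: head and label must both match.
    full_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * head_hits / n, 100.0 * full_hits / n


# Hypothetical 4-token sentence; the labels are placeholders for illustration.
gold = [(2, "SBJ"), (0, "---"), (2, "OBJ"), (3, "MOD")]
pred = [(2, "SBJ"), (0, "---"), (2, "MOD"), (2, "MOD")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.1f}, LAS = {las:.1f}")
```

In this toy case the third token has the right head but the wrong label, so it counts toward UAS only, and the fourth token has the wrong head, so it counts toward neither.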


🧠 Model Architecture

  • Encoder: CamelBERT-MIX
  • Parser: Deep biaffine attention (Dozat & Manning, 2017)
  • Framework: SuPar
  • Formalism: CATiB dependency scheme
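
To make the biaffine component concrete, here is a minimal pure-Python sketch of biaffine arc scoring in the spirit of Dozat and Manning (2017): each candidate arc gets a bilinear term between the dependent and head representations plus a head-only bias term. The helper names, the toy 2-dimensional vectors, and the greedy head selection are assumptions for illustration. This is not SuPar's implementation, which computes the representations with MLPs over the encoder output, also scores labels, and may decode with a tree-constrained algorithm.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    """Multiply matrix `m` (list of rows) by vector `v`."""
    return [dot(row, v) for row in m]


def biaffine_arc_scores(dep_vecs, head_vecs, U, u):
    """scores[i][j] = head_vecs[j]^T U dep_vecs[i] + u^T head_vecs[j].

    dep_vecs:  one vector per token (token 1..n).
    head_vecs: one vector per head candidate; index 0 is an artificial ROOT.
    """
    scores = []
    for d in dep_vecs:
        Ud = matvec(U, d)
        scores.append([dot(h, Ud) + dot(u, h) for h in head_vecs])
    return scores


def greedy_heads(scores):
    """Pick the highest-scoring head per token (no tree constraint)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy inputs (hypothetical values, 2 tokens plus ROOT).
head_vecs = [[1, 0], [0, 1], [1, 1]]   # ROOT, token 1, token 2
dep_vecs = [[0, 2], [2, 0]]            # token 1, token 2
U = [[1, 0], [0, 1]]                   # bilinear weight matrix
u = [0.5, -0.5]                        # head-only bias weights

scores = biaffine_arc_scores(dep_vecs, head_vecs, U, u)
heads = greedy_heads(scores)
print(scores)  # one row of head scores per token
print(heads)   # chosen head index per token (0 = ROOT)
```

With these toy values, token 1 attaches to token 2 and token 2 attaches to ROOT. The greedy step is only for illustration, since picking heads independently does not guarantee a well-formed tree.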

📚 Training Data

The models are trained on combinations of the following treebanks:

  • CamelTB and PATB (MSA)
  • ARZTB (EGY)
  • CamelTB-Gumar (GLF)

🚀 Intended Use

This model is intended for:

  • Dependency parsing of Arabic text
  • Linguistic analysis of dialectal Arabic

🔧 Usage

For usage instructions and code, please refer to the official repository:

👉 https://github.com/CAMeL-Lab/camel_parser_dialects

📖 Citation

If you use this model, please cite:

@inproceedings{Elshabrawy:2026:camelparser-dialects,
    title = "{Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights}",
    author = {Ahmed Elshabrawy and
              Go Inoue and
              Muhammed AbuOdeh and
              Nizar Habash} ,
    booktitle = {Proceedings of The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT)},
    year = "2026",
    address = "Palma, Spain"
}