Model Card for FlauBERT-Wikt-base-verb

Model Description

This model is a French language model based on FlauBERT-base-cased, fine-tuned on verb usage examples from the French Wiktionary with a supervised contrastive learning objective: occurrences of the same verb sense are pulled together in embedding space, while occurrences of different senses are pushed apart. This fine-tuning improves token-level semantic representations, particularly for tasks like Word-in-Context (WiC) and Word Sense Disambiguation (WSD).

Although trained on verbs, the model shows enhanced representation quality across the lexicon.
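The supervised contrastive objective described above can be sketched as follows. This is an illustrative reimplementation under common assumptions (cosine similarity with a temperature, mean log-probability over same-sense pairs), not the authors' released training code:

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.07):
    """Supervised contrastive loss over a batch of token embeddings.

    embeddings: (N, d) tensor of target-token vectors
    labels:     (N,) tensor of sense ids; same id = same sense
    """
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T / temperature                      # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # positives: other occurrences annotated with the same sense
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    # mean log-probability over same-sense pairs, per anchor
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    return loss.mean()
```

Minimizing this loss makes contextual vectors of same-sense occurrences more similar to each other than to vectors of other senses, which is exactly the property WiC and WSD benefit from.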

  • Developed by: Anna Mosolova, Marie Candito, Carlos Ramisch
  • Funded by: ANR Selexini
  • Model type: BERT-based transformer (FlauBERT)
  • Language: French
  • License: MIT
  • Finetuned from model: flaubert/flaubert-base-cased

Uses

The model is intended for extracting token-level embeddings for French, with improved sense separation.

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModel

# The fine-tuned model reuses the original FlauBERT tokenizer
tokenizer = AutoTokenizer.from_pretrained("flaubert/flaubert_base_cased")
model = AutoModel.from_pretrained("annamos/FlauBERT-Wikt-base-verb")

sentence = "Les avions ne peuvent pas voler en ce moment"
tokenized = tokenizer(sentence, return_tensors="pt")
# [0] is the last hidden state: one contextual vector per subword token,
# with shape (batch_size, sequence_length, hidden_size)
embeddings = model(**tokenized)[0]
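Given embeddings of shape (1, sequence_length, hidden_size) for two sentences, a WiC-style comparison reduces to the cosine similarity between the target verb's token vectors in each context. The helper below is an illustrative sketch (the function name and the single-index assumption are ours; in practice a verb may be split into several subwords, whose indices you would need to recover from the tokenizer output):

```python
import torch
import torch.nn.functional as F

def token_cosine(emb_a, emb_b, idx_a, idx_b):
    """Cosine similarity between one token vector from each of two
    embedding tensors of shape (1, seq_len, hidden_size)."""
    va = emb_a[0, idx_a]
    vb = emb_b[0, idx_b]
    return F.cosine_similarity(va, vb, dim=0).item()
```

A high similarity suggests the two occurrences share a sense; a low one suggests different senses. Any decision threshold would have to be tuned on WiC-style development data.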