Model Card for FlauBERT-Wikt-base-verb
Model Description
This model is a French language model based on FlauBERT-base-cased, fine-tuned using verb examples from French Wiktionary via supervised contrastive learning. The fine-tuning improves token-level semantic representations, particularly for tasks like Word-in-Context (WiC) and Word Sense Disambiguation (WSD).
Although trained on verbs, the model shows enhanced representation quality across the lexicon.
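For intuition, the sketch below shows a supervised contrastive objective of the kind described by Khosla et al. (2020), applied to token embeddings grouped by Wiktionary sense. This is a minimal illustration under our own assumptions (function name, batching, temperature), not the authors' actual training code.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, sense_ids, temperature=0.07):
    """SupCon-style loss sketch: occurrences of a verb that share a
    Wiktionary sense are pulled together; other senses are pushed apart.

    embeddings: (N, d) contextual embeddings of target-verb tokens
    sense_ids:  (N,)   integer sense labels from Wiktionary examples
    """
    z = F.normalize(embeddings, dim=1)           # cosine similarity as dot product
    sim = z @ z.T / temperature                  # (N, N) pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)       # exclude self-pairs
    # Positives: other occurrences carrying the same sense label
    pos_mask = (sense_ids.unsqueeze(0) == sense_ids.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    n_pos = pos_mask.sum(dim=1)
    keep = n_pos > 0                             # anchors with at least one positive
    loss = -(log_prob * pos_mask)[keep].sum(dim=1) / n_pos[keep]
    return loss.mean()
```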
- Developed by: Anna Mosolova, Marie Candito, Carlos Ramisch
- Funded by: ANR Selexini
- Model type: BERT-based transformer (FlauBERT)
- Language: French
- License: MIT
- Finetuned from model: flaubert/flaubert-base-cased
Model Sources
- Repository: https://github.com/anya-bel/contrastive_learning_transfer
- Paper: Raffinage des représentations des tokens dans les modèles de langue pré-entraînés avec l’apprentissage contrastif : une étude entre modèles et entre langues (Refining token representations in pre-trained language models with contrastive learning: a study across models and across languages)
Uses
The model is intended for extracting token-level contextual embeddings for French text, with improved separation between word senses.
How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModel

# The fine-tuned weights reuse the original FlauBERT tokenizer
tokenizer = AutoTokenizer.from_pretrained("flaubert/flaubert_base_cased")
model = AutoModel.from_pretrained("annamos/FlauBERT-Wikt-base-verb")

sentence = 'Les avions ne peuvent pas voler en ce moment'
tokenized = tokenizer(sentence, return_tensors='pt')

# Last hidden state: one contextual embedding per (sub)token
embeddings = model(**tokenized)[0]
```
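As a quick illustration of sense separation, the hypothetical snippet below (continuing from the code above) compares the contextual embedding of "voler" in its "to fly" and "to steal" senses. The `token_embedding` helper and the assumption that "voler" survives BPE as a single piece are illustrative; for general use, align subword pieces to words explicitly.

```python
import torch
import torch.nn.functional as F

def token_embedding(sentence, word):
    """Return the contextual embedding of `word` in `sentence`
    (assumes `word` is kept as a single BPE piece)."""
    enc = tokenizer(sentence, return_tensors='pt')
    tokens = tokenizer.convert_ids_to_tokens(enc['input_ids'][0])
    with torch.no_grad():
        hidden = model(**enc)[0][0]        # (seq_len, hidden_dim)
    # XLM-style BPE marks word ends with '</w>'
    return hidden[tokens.index(word + '</w>')]

fly = token_embedding('Les avions ne peuvent pas voler en ce moment', 'voler')
same = token_embedding('Les oiseaux aiment voler au-dessus du lac', 'voler')
steal = token_embedding('Il a tenté de voler le sac de la touriste', 'voler')

# Same-sense pairs should score higher than cross-sense pairs
print(F.cosine_similarity(fly, same, dim=0))   # to fly vs. to fly
print(F.cosine_similarity(fly, steal, dim=0))  # to fly vs. to steal
```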