---
library_name: transformers
tags: []
---

# Model Card for FlauBERT-Wikt-base-verb

### Model Description

This model is a French language model based on FlauBERT-base-cased, fine-tuned with supervised contrastive learning on verb examples from the French Wiktionary. Fine-tuning improves token-level semantic representations, particularly for tasks such as Word-in-Context (WiC) and Word Sense Disambiguation (WSD). Although trained only on verbs, the model shows improved representation quality across the whole lexicon.

- **Developed by:** Anna Mosolova, Marie Candito, Carlos Ramisch
- **Funded by:** [ANR Selexini](https://selexini.lis-lab.fr)
- **Model type:** BERT-based transformer (FlauBERT)
- **Language:** French
- **License:** MIT
- **Finetuned from model:** [flaubert/flaubert_base_cased](https://huggingface.co/flaubert/flaubert_base_cased)

### Model Sources

- **Repository:** [https://github.com/anya-bel/contrastive_learning_transfer](https://github.com/anya-bel/contrastive_learning_transfer)
- **Paper:** [Raffinage des représentations des tokens dans les modèles de langue pré-entraînés avec l’apprentissage contrastif : une étude entre modèles et entre langues](https://coria-taln-2025.lis-lab.fr/wp-content/uploads/2025/06/CORIA-TALN_2025_paper_139.pdf)

## Uses

The model is intended for extracting token-level embeddings for French text, with improved separation between word senses.

## How to Get Started with the Model

```python
from transformers import AutoModel, AutoTokenizer

# The tokenizer is loaded from the base FlauBERT model
tokenizer = AutoTokenizer.from_pretrained("flaubert/flaubert_base_cased")
model = AutoModel.from_pretrained("annamos/FlauBERT-Wikt-base-verb")

sentence = "Les avions ne peuvent pas voler en ce moment"
tokenized = tokenizer(sentence, return_tensors="pt")
embeddings = model(**tokenized)[0]  # last hidden state: (1, seq_len, hidden_size)
```
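Since the model's stated strength is sense separation, a natural use is comparing the contextual embedding of the same word in different contexts. The sketch below is an illustration, not part of the official repository: it mean-pools the subword vectors of a target word (FlauBERT ships a slow tokenizer, so `word_ids()` is unavailable and the word is located by matching its subword ids, which assumes the word tokenizes to the same subwords in isolation as in context), then compares "voler" ("to fly") with "voler" ("to steal") via cosine similarity. The second example sentence is invented for illustration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Tokenizer from the base model, fine-tuned weights from this repository
tokenizer = AutoTokenizer.from_pretrained("flaubert/flaubert_base_cased")
model = AutoModel.from_pretrained("annamos/FlauBERT-Wikt-base-verb")
model.eval()

def word_embedding(sentence: str, word: str) -> torch.Tensor:
    """Mean-pool the subword vectors of `word` inside `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc)[0]  # (1, seq_len, hidden_size)
    # Locate the word by matching its subword ids (no word_ids() on
    # FlauBERT's slow tokenizer); assumes identical BPE segmentation
    # in isolation and in context.
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(word_ids) + 1):
        if ids[i : i + len(word_ids)] == word_ids:
            return hidden[0, i : i + len(word_ids)].mean(dim=0)
    raise ValueError(f"{word!r} not found in {sentence!r}")

# Same verb, two senses: "fly" vs. "steal" (second sentence is illustrative)
fly = word_embedding("Les avions ne peuvent pas voler en ce moment", "voler")
steal = word_embedding("Il a essayé de voler mon portefeuille", "voler")
print(torch.cosine_similarity(fly, steal, dim=0).item())
```

A model with better sense separation should assign these two occurrences a lower similarity than occurrences of "voler" used in the same sense.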