Article Nano-BEIR: A Multilingual Information Retrieval Benchmark with Quality-Enhanced Queries sionic-ai • Dec 22, 2025 • 10
Article Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers tomaarsen • 26 days ago • 70
Article Multimodal Embedding & Reranker Models with Sentence Transformers tomaarsen • Apr 9 • 58
Article Introducing Storage Buckets on the Hugging Face Hub Wauplin, coyotte508, XciD, victor, julien-c, lhoestq, pierric, Sylvestre, hlarcher, rajatarya, seanses, assafvayner • Mar 10 • 194
ColBERT-Zero 🐶 Collection First large-scale fully pre-trained ColBERT model using only public data, outperforming GTE-ModernColBERT and GTE-ModernBERT • 10 items • Updated Apr 7 • 21
Bharat-NanoBEIR: Indian Language Retrieval Benchmarks Collection NanoBEIR retrieval benchmarks translated into 22 Indian languages across 13 datasets. • 22 items • Updated Dec 13, 2025 • 5
CoRNStack Collection State-of-the-art code retrieval and re-ranking models and datasets • 9 items • Updated Mar 26, 2025 • 21
Article ModernVBERT: Towards Smaller Visual Document Retrievers paultltc • Oct 3, 2025 • 45
NanoBEIR datasets Collection These datasets are compatible with the (Sparse)NanoBEIREvaluator with Sentence Transformers v5.2+. Also CrossEncoderNanoBEIREvaluator if bm25 column • 16 items • Updated Mar 2 • 17
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 70 items • Updated Dec 10, 2025 • 168
Article Streaming datasets: 100x More Efficient andito, lhoestq, burtenshaw, pcuenq, merve • Oct 27, 2025 • 86
Article Provence: efficient and robust context pruning for retrieval-augmented generation nadiinchi • Jan 28, 2025 • 26
Article huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning Wauplin, celinah, lysandre, julien-c • Oct 27, 2025 • 75
Article Introducing RTEB: A New Standard for Retrieval Evaluation fzliu, KennethEnevoldsen, Samoed, isaacchung, tomaarsen, fzoll • Oct 1, 2025 • 143
Article Nemotron-Personas-Japan: Synthesized Data for Sovereign AI nvidia • Sep 23, 2025 • 27
Article mmBERT: ModernBERT goes Multilingual mmarone, orionweller, will-fleshman, eugene-yang, dlawrie, vandurme • Sep 9, 2025 • 146