Article Illustrating Reinforcement Learning from Human Feedback (RLHF) • natolambert, LouisCastricato, lvwerra, Dahoas • Dec 9, 2022 • 413
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective Paper • 2506.14965 • Published Jun 17, 2025 • 50
ELECTRA release Collection This collection regroups the ELECTRA models released by the Google team. • 6 items • Updated Mar 12 • 13
Article Open-R1: a fully open reproduction of DeepSeek-R1 • eliebak, lvwerra, lewtun • Jan 28, 2025 • 889
LINKS: English-English Mnemonics Collection Investigate the potential of mining linguistic knowledge/reasoning from LLM to generate mnemonic devices that aid vocabulary learning. • 5 items • Updated Mar 2 • 1
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory • 15 items • Updated Mar 12 • 218
Tools 4 learning AI Collection This is a collection of tools on the hub that teachers and students can use to learn AI! • 9 items • Updated Mar 2 • 67
Article You could have designed state of the art positional encoding • FL33TW00D-HF • Nov 25, 2024 • 479
Article Welcome Llama 4 Maverick & Scout on Hugging Face • burtenshaw, reach-vb, pcuenq, clem, rajatarya, jsulz, lysandre • Apr 5, 2025 • 149
Article SmolLM - blazingly fast and remarkably powerful • loubnabnl, anton-l, eliebak • Jul 16, 2024 • 455
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM • ariG23498, merve, pcuenq, reach-vb • Mar 12, 2025 • 497
Article Fixing Gradient Accumulation • lysandre, ArthurZ, muellerzr, ydshieh, BenjaminB, pcuenq • Oct 16, 2024 • 66
Article SmolVLM - small yet mighty Vision Language Model • andito, merve, mfarre, eliebak, pcuenq • Nov 26, 2024 • 417
Great Models Think Alike and this Undermines AI Oversight Paper • 2502.04313 • Published Feb 6, 2025 • 33
Article Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models • loubnabnl, anton-l, davanstrien • Mar 20, 2024 • 113