view article Article Mixture of Experts Explained +4 osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.13k
view article Article SmolLM3: smol, multilingual, long-context reasoner +21 eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 775
Running on CPU Upgrade Featured 3.18k The Smol Training Playbook 📚 3.18k The secrets to building world-class LLMs
The Ultimate Collection of Code Classifiers Collection 🔥 15 classifiers, 124M parameters, one per programming language— for assessing the educational value of GitHub code • 15 items • Updated May 5, 2025 • 16
Essential-Web v1.0: 24T tokens of organized web data Paper • 2506.14111 • Published Jun 17, 2025 • 46
view article Article nanoJAXGPT: A pedagogical introduction to JAX/Equinox sachithgunasekara • Oct 23, 2024 • 7
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published Apr 17, 2025 • 98
Running 3.85k The Ultra-Scale Playbook 🌌 3.85k The ultimate guide to training LLM on large GPU Clusters
Running 93 Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks 📝 93 Evaluate multilingual models using FineTasks