Embarrassingly Simple Self-Distillation Improves Code Generation Paper • 2604.01193 • Published Apr 1 • 56
Article Assisted Generation: a new direction toward low-latency text generation joaogante • May 11, 2023 • 79
Article A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes ybelkada, timdettmers • Aug 17, 2022 • 134
Article OpenEvolve: An Open Source Implementation of Google DeepMind's AlphaEvolve codelion • May 20, 2025 • 69
Article KV Cache from scratch in nanoVLM ariG23498, kashif, lusxvr, andito, pcuenq • Jun 4, 2025 • 120
Article Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective LinkedIn • Jan 27 • 77
Article Continuous batching from first principles ror, ArthurZ, mcpotato • Nov 25, 2025 • 403
Article Key Insights into the Law of Vision Representations in MLLMs Borise • Sep 2, 2024 • 20
Article KV Caching Explained: Optimizing Transformer Inference Efficiency not-lain • Jan 30, 2025 • 344
Article An Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instruct leonardlin • Jun 11, 2024 • 69
Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data danaaubakirova, andito, merve, ariG23498, fracapuano, loubnabnl, pcuenq, mshukor, cadene • Jun 3, 2025 • 351
Article TimeScope: How Long Can Your Video Large Multimodal Model Go? orrzohar, ruili0, andito, nicholswang • Jul 23, 2025 • 48
Article ⚡ nano-vLLM: Lightweight, Low-Latency LLM Inference from Scratch zamal • Jun 28, 2025 • 44
Article SmolVLM2: Bringing Video Understanding to Every Device orrzohar, mfarre, andito, merve, pcuenq, cyrilzakka, Xenova • Feb 20, 2025 • 340
Article Introducing Cosmos Predict-2: A Foundation For Your Own World Model nvidia • Jun 17, 2025 • 9