Article DeepSeek-V4: a million-token context that agents can actually use 13 days ago • 42
DFlash Collection Block Diffusion for Flash Speculative Decoding • 19 items • Updated 1 day ago • 97
Nemotron-Personas Collection A collection of multilingual, region-specific synthetic persona datasets that support sovereign AI development across many countries and regions. • 7 items • Updated 16 days ago • 41
PGC Psychiatric GWAS Summary Statistics Collection ~1 billion rows of genome-wide association study (GWAS) NOTE: We are in the process to transfer these datasets to the Psychiatric Genomics Consortiu • 12 items • Updated 23 days ago • 91
Article Fine-Tuning Your First Large Language Model (LLM) with PyTorch and Hugging Face Feb 11, 2025 • 122
Article Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries +7 Mar 10 • 142
Article Nemotron-Personas: Improve AI Training With the First Synthetic Personas Dataset Aligned to Real-World Distributions Jun 10, 2025 • 25
Spanish PII & De-Identification Collection 33 models for Spanish PII detection & de-identification. 55+ entity types. HIPAA & GDPR compliant. Apache 2.0. • 35 items • Updated 23 days ago • 4