In-Context Reinforcement Learning for Tool Use in Large Language Models • Paper • 2603.08068 • Published 27 days ago • 42
Language Models Can Learn from Verbal Feedback Without Scalar Rewards • Paper • 2509.22638 • Published Sep 26, 2025 • 70
Optimizing Anytime Reasoning via Budget Relative Policy Optimization • Paper • 2505.13438 • Published May 19, 2025 • 36
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization • Paper • 2503.01328 • Published Mar 3, 2025 • 16
🔱 Sailor2 Language Models • Collection • Sailing in South-East Asia with Inclusive Multilingual LLMs • 32 items • Updated Mar 2 • 30
Balancing Pipeline Parallelism with Vocabulary Parallelism • Paper • 2411.05288 • Published Nov 8, 2024 • 20