Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs • Paper • 2509.25779 • Published Sep 30
SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens • Paper • 2510.24940 • Published Oct 28
Scaling Up Efficient Small Language Models Serving and Deployment for Semantic Job Search • Paper • 2510.22101 • Published Oct 25