Embarrassingly Simple Self-Distillation Improves Code Generation Paper • 2604.01193 • Published 12 days ago • 36
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration Paper • 2603.03823 • Published Mar 4 • 7
Scalable Prompt Routing via Fine-Grained Latent Task Discovery Paper • 2603.19415 • Published 25 days ago • 7
τ-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge Paper • 2603.04370 • Published Mar 4 • 3
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale Paper • 2602.23866 • Published Feb 27 • 88
On Data Engineering for Scaling LLM Terminal Capabilities Paper • 2602.21193 • Published Feb 24 • 102
LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces Paper • 2602.14337 • Published Feb 15 • 15
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing Paper • 2512.02589 • Published Dec 2, 2025 • 73
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning Paper • 2511.16043 • Published Nov 20, 2025 • 111
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs Paper • 2510.07499 • Published Oct 8, 2025 • 49
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18, 2025 • 141
Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding Paper • 2502.10392 • Published Feb 14, 2025 • 6
Agentic End-to-End De Novo Protein Design for Tailored Dynamics Using a Language Diffusion Model Paper • 2502.10173 • Published Feb 14, 2025 • 4
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published Feb 13, 2025 • 150
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks Paper • 2502.08235 • Published Feb 12, 2025 • 59