Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits Paper • 2512.20578 • Published 29 days ago • 80
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19, 2025 • 134
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework Paper • 2411.06176 • Published Nov 9, 2024 • 45
Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions Paper • 2405.20267 • Published May 30, 2024 • 1