HaluMem: Evaluating Hallucinations in Memory Systems of Agents Paper • 2511.03506 • Published Nov 5 • 93 • 3
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations Paper • 2504.10481 • Published Apr 14 • 85 • 2