Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions Paper • 2505.00675 • Published May 1, 2025 • 3
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph Paper • 2311.09174 • Published Nov 15, 2023
AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation Paper • 2402.10646 • Published Feb 16, 2024
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers Paper • 2509.03059 • Published Sep 3, 2025 • 24
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents Paper • 2510.07172 • Published Oct 8, 2025 • 28
Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents Paper • 2512.20092 • Published 11 days ago • 8
Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents Paper • 2512.20092 • Published 11 days ago • 8
MotionEdit: Benchmarking and Learning Motion-Centric Image Editing Paper • 2512.10284 • Published 23 days ago • 25
Guided Self-Evolving LLMs with Minimal Human Supervision Paper • 2512.02472 • Published Dec 2, 2025 • 50
Learning GUI Grounding with Spatial Reasoning from Visual Feedback Paper • 2509.21552 • Published Sep 25, 2025 • 11
KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection Paper • 2310.09044 • Published Oct 13, 2023
CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning Paper • 2401.07286 • Published Jan 14, 2024
DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects Paper • 2410.02730 • Published Oct 3, 2024
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly Paper • 2505.10610 • Published May 15, 2025 • 54
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly Paper • 2505.10610 • Published May 15, 2025 • 54
Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression Paper • 2503.02812 • Published Mar 4, 2025 • 10
OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas Paper • 2501.15427 • Published Jan 26, 2025 • 6
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory Paper • 2410.10813 • Published Oct 14, 2024 • 14
LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks Paper • 2410.01744 • Published Oct 2, 2024 • 26
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Paper • 2409.07703 • Published Sep 12, 2024 • 67