Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction • Paper • 2512.18880 • Published 26 days ago • 24
Humanizing Machines: Rethinking LLM Anthropomorphism Through a Multi-Level Framework of Design • Paper • 2508.17573 • Published Aug 25, 2025 • 1
Embracing Contradiction: Theoretical Inconsistency Will Not Impede the Road of Building Responsible AI Systems • Paper • 2505.18139 • Published May 23, 2025
JiraiBench: A Bilingual Benchmark for Evaluating Large Language Models' Detection of Human Self-Destructive Behavior Content in Jirai Community • Paper • 2503.21679 • Published Mar 27, 2025 • 1
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation • Paper • 2503.10497 • Published Mar 13, 2025 • 2
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents • Paper • 2601.07264 • Published 4 days ago • 22