cua-bench: A Framework for Benchmarking, Training Data, and RL Environments for Computer-Use Agents Article • 21 days ago
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Paper • 2409.08264 • Published Sep 12, 2024