RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards • Paper • 2509.21319 • Published Sep 25 • 5
Reward Models 10-2025 • Collection • A collection of great reward models for research and production • 7 items • Updated 3 days ago • 9
DevQuasar/nvidia.Qwen3-Nemotron-32B-GenRM-Principle-GGUF • Text Generation • 33B • Updated Nov 3 • 270 • 1