Mellum: Production-Grade in-IDE Contextual Code Completion with Multi-File Project Understanding Paper • 2510.05788 • Published Oct 7, 2025 • 2
Confidence and Stability of Global and Pairwise Scores in NLP Evaluation Paper • 2507.01633 • Published Jul 2, 2025
Best Prompts for Text-to-Image Models and How to Find Them Paper • 2209.11711 • Published Sep 23, 2022 • 3
Quartet: Native FP4 Training Can Be Optimal for Large Language Models Paper • 2505.14669 • Published May 20, 2025 • 78
IMDB-WIKI-SbS: An Evaluation Dataset for Crowdsourced Pairwise Comparisons Paper • 2110.14990 • Published Oct 28, 2021
Reliable, Reproducible, and Really Fast Leaderboards with Evalica Paper • 2412.11314 • Published Dec 15, 2024 • 2