LLM Eval - a mariagrandury Collection


Updated Feb 4, 2025

Instruction-Following Evaluation for Large Language Models

Paper • arXiv 2311.07911 • Published Nov 14, 2023
HuggingFaceH4/mt_bench_prompts

Dataset • Updated Jul 3, 2023
vectara/hallucination_evaluation_model

Model • Text Classification • 0.1B parameters • Updated Oct 20, 2025
GAIA: a benchmark for General AI Assistants

Paper • arXiv 2311.12983 • Published Nov 21, 2023
gaia-benchmark/GAIA

Dataset • Updated Oct 28, 2025
cais/mmlu

Dataset • Updated Mar 8, 2024
Hallucinations Leaderboard

Space • View and submit LLM evaluations
A Survey on Evaluation of Large Language Models

Paper • arXiv 2307.03109 • Published Jul 6, 2023
Open Medical-LLM Leaderboard

Space • Explore and submit models for benchmarking
La Leaderboard

Space • Evaluate open LLMs in the languages of Spain and Latin America (LATAM).