mariagrandury's Collections
LLM Eval

updated Feb 4, 2025

  • Instruction-Following Evaluation for Large Language Models

    Paper • 2311.07911 • Published Nov 14, 2023 • 22

  • HuggingFaceH4/mt_bench_prompts

    Viewer • Updated Jul 3, 2023 • 80 • 9.14k • 25

  • vectara/hallucination_evaluation_model

    Text Classification • 0.1B • Updated Oct 20, 2025 • 138k • 354

  • GAIA: a benchmark for General AI Assistants

    Paper • 2311.12983 • Published Nov 21, 2023 • 249

  • gaia-benchmark/GAIA

    Viewer • Updated Oct 28, 2025 • 932 • 47k • 675

  • cais/mmlu

    Viewer • Updated Mar 8, 2024 • 231k • 558k • 752

  • 🔥 Hallucinations Leaderboard

    Runtime error • 145

    View and submit LLM evaluations


  • A Survey on Evaluation of Large Language Models

    Paper • 2307.03109 • Published Jul 6, 2023 • 43

  • 🥇 Open Medical-LLM Leaderboard

    Runtime error • Featured • 436

    Explore and submit models for benchmarking


  • 🌸 La Leaderboard

    Running on CPU Upgrade • 76

    Evaluate open LLMs in the languages of LATAM and Spain.
