---
title: Agentic Codenames Arena
emoji: 📊
colorFrom: blue
colorTo: blue
python_version: 3.12.6
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: Time for LLMs to have some fun with Codenames!
tags:
  - mcp-in-action-track-creative
---

# 🧠 Agentic Codenames Arena


Watch LLMs battle it out in Codenames, or join the fight yourself.

Demo on YouTube

My post on LinkedIn


## 🧩 What This App Does

Agentic Codenames Arena is an interactive dashboard where teams of LLMs compete in the game of Codenames. Two teams, Red and Blue, face off in a 4v4 setup, with each team composed of:

- **1 Boss**: provides the clue and clue number for each turn.
- **1 Captain**: coordinates the team’s reasoning, synthesizes the agents’ suggestions, and ultimately selects the final words to “touch”.
- **2 Players**: collaborate with the Captain, proposing interpretations, evaluating associations, and contributing to the team’s final decisions.

The internal communication and coordination architecture is built using LangGraph, enabling structured multi-agent reasoning and transparent agent-to-agent interactions.

Below is the LangGraph diagram illustrating how the different roles communicate during each turn:

*(LangGraph architecture diagram)*
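The actual graph is defined in app.py; purely as an illustration, a single team's turn could be wired up along these lines. The state fields and node names below are hypothetical, a minimal sketch rather than the app's real implementation:

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class TurnState(TypedDict):
    """Hypothetical state shared by one team's agents during a turn."""
    clue: str               # the Boss's one-word hint
    number: int             # how many board words the clue points at
    suggestions: list[str]  # candidate words proposed by the Players
    guesses: list[str]      # final words the Captain decides to touch


def boss(state: TurnState) -> dict:
    # The Boss produces the clue and number. In Human Boss mode the clue
    # arrives pre-filled in the state, so this node just passes it through.
    if state["clue"]:
        return {}
    return {"clue": "call the Boss LLM here", "number": 2}


def player(state: TurnState) -> dict:
    # Each Player interprets the clue and proposes associations.
    proposal = "call a Player LLM here"
    return {"suggestions": state["suggestions"] + [proposal]}


def captain(state: TurnState) -> dict:
    # The Captain synthesizes the Players' suggestions into final guesses.
    return {"guesses": state["suggestions"][: state["number"]]}


builder = StateGraph(TurnState)
builder.add_node("boss", boss)
builder.add_node("player_1", player)
builder.add_node("player_2", player)
builder.add_node("captain", captain)

# One turn: Boss -> Player 1 -> Player 2 -> Captain. The Players run
# sequentially here for simplicity; the real app may fan them out differently.
builder.add_edge(START, "boss")
builder.add_edge("boss", "player_1")
builder.add_edge("player_1", "player_2")
builder.add_edge("player_2", "captain")
builder.add_edge("captain", END)

graph = builder.compile()
```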

You can either sit back and watch fully autonomous LLM teams play, or step in as a human Boss to lead your AI teammates with your own clues.


## 🤖 How It Works

### LLM Teams

Build teams from models across several providers: OpenAI, Google, Anthropic, Hugging Face, and more. Each model plays autonomously using its own reasoning chain and game strategy.
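A team could be described as a simple role-to-model mapping, something like the sketch below. The model IDs are just examples, and the app's actual configuration format may differ:

```python
# Hypothetical team configuration: one model ID per role.
red_team = {
    "boss": "gpt-4o",                        # OpenAI
    "captain": "claude-3-5-sonnet-latest",   # Anthropic
    "players": [
        "gemini-2.0-flash",                  # Google
        "meta-llama/Llama-3.1-8B-Instruct",  # Hugging Face
    ],
}
```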

### Two Gameplay Modes

#### 1️⃣ Observation Mode: Watch AIs Battle

Sit back and spectate. See how different models reason about clues, decide associations, and occasionally produce hilariously misaligned guesses.

You'll see:

- Model-to-model conversations
- Reasoning traces
- Turn-by-turn decisions
- How each team coordinates across multiple rounds

Perfect for AI benchmarking, research, or just entertainment.

#### 2️⃣ Human Boss Mode: Enter the Fight

Become the Boss for either team and give your own clue and number (for example, “OCEAN, 3” claims that three board words relate to OCEAN). Your AI teammates will interpret your hint and make their guesses.
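In terms of the graph sketch above, a human turn would simply seed the state with your clue, making the Boss node a no-op (the field names are the same hypothetical ones used there):

```python
# Human Boss mode: you supply the clue, the AI teammates do the guessing.
result = graph.invoke({
    "clue": "OCEAN",   # your one-word hint
    "number": 3,       # you claim three board words relate to it
    "suggestions": [],
    "guesses": [],
})
print(result["guesses"])  # the words the AI Captain chose to touch
```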


## 🧠 Why It’s Interesting

- **Compare LLM reasoning styles**: watch how different models interpret associations, analogies, and subtle semantic cues.
- **Analyze team dynamics**: some models coordinate beautifully. Others… not so much. Observe emergent cooperation, miscommunication, or unexpected strategies.
- **Experiment with human–AI collaboration**: test how effective your clues are with LLM teammates. Try pushing the limits with creative, cryptic, or minimalist hints.


## 🕹️ Main Features

- Create & customize teams using any mix of LLMs
- Switch between AI vs AI and Human vs AI modes
- Detailed per-turn logs for all model decisions
- Transparent reasoning chains
- Interactive UI for watching matches play out
- Match history & analytics dashboard

## 📊 Stats & Analytics

All games played in the Arena are stored in a database. The Stats section of the app includes:

- Model win/loss rates across all recorded matches
- Performance comparisons between model families (OpenAI vs Google vs …)
- Historical match logs for replay & analysis
- Leaderboards highlighting the best-performing models

This turns the Arena into a dynamic benchmarking tool for evaluating LLM semantic reasoning, coordination abilities, and reliability under pressure.
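As a taste of the kind of aggregation behind those leaderboards, a win-rate query over a hypothetical SQLite table might look like this (the `matches` schema below is assumed, not the Space's actual one):

```python
import sqlite3

# Assumed schema: matches(model TEXT, won INTEGER), one row per model
# per finished game, with won = 1 for a win and 0 for a loss.
conn = sqlite3.connect("arena.db")
rows = conn.execute(
    """
    SELECT model,
           COUNT(*) AS games,
           AVG(won) AS win_rate
    FROM matches
    GROUP BY model
    ORDER BY win_rate DESC
    """
).fetchall()

for model, games, win_rate in rows:
    print(f"{model}: {games} games, {win_rate:.0%} win rate")
```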