title: Agentic Codenames Arena
emoji: 📊
colorFrom: blue
colorTo: blue
python_version: 3.12.6
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: Time for the LLM to have some fun with Codenames!
tags:
- mcp-in-action-track-creative
🧠 Agentic Codenames Arena
Watch LLMs battle it out in Codenames, or step in and join them.
🧩 What This App Does
Agentic Codenames Arena is an interactive dashboard where teams of LLMs compete in the game of Codenames. Two teams, Red and Blue, face off in a 4v4 setup, each composed of:
- 1 Boss: Provides the clue and clue number for each turn.
- 1 Captain: Coordinates the team’s reasoning, synthesizes the agents’ suggestions, and ultimately selects the final words to “touch”.
- 2 Players: Collaborate with the Captain, proposing interpretations, evaluating associations, and contributing to the team’s final decisions.
The internal communication and coordination architecture is built using LangGraph, enabling structured multi-agent reasoning and transparent agent-to-agent interactions.
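To give a rough sense of that flow, here is a minimal, hypothetical LangGraph sketch of a single team turn (Boss gives a clue, Players propose associations, the Captain picks the final guesses). The node names and state fields are illustrative only, not the app's actual graph:

```python
# Hypothetical sketch of one team's turn flow in LangGraph.
# Node names, state fields, and return values are illustrative, not the Arena's real code.
from typing import List, TypedDict
from langgraph.graph import StateGraph, START, END

class TurnState(TypedDict):
    clue: str               # clue word given by the Boss
    clue_number: int        # how many board words the clue covers
    suggestions: List[str]  # candidate words proposed by the Players
    guesses: List[str]      # final words the Captain decides to "touch"

def boss(state: TurnState) -> dict:
    # In the app this would call an LLM; here we hard-code a clue for illustration.
    return {"clue": "ocean", "clue_number": 2}

def propose(state: TurnState) -> dict:
    # Each Player appends its associations for the Boss's clue.
    return {"suggestions": state["suggestions"] + ["wave", "ship"]}

def captain(state: TurnState) -> dict:
    # The Captain synthesizes the suggestions and selects the final guesses.
    return {"guesses": state["suggestions"][: state["clue_number"]]}

builder = StateGraph(TurnState)
builder.add_node("boss", boss)
builder.add_node("player_1", propose)
builder.add_node("player_2", propose)
builder.add_node("captain", captain)
builder.add_edge(START, "boss")
builder.add_edge("boss", "player_1")
builder.add_edge("player_1", "player_2")
builder.add_edge("player_2", "captain")
builder.add_edge("captain", END)

team_turn = builder.compile()
result = team_turn.invoke({"clue": "", "clue_number": 0, "suggestions": [], "guesses": []})
print(result["clue"], result["guesses"])
```

In the real Arena each node would be backed by an LLM call, and the intermediate messages are what you see surfaced as agent-to-agent conversations in the UI.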
Below is the LangGraph diagram illustrating how the different roles communicate during each turn:
You can either sit back and watch fully autonomous LLM teams play, or step in as a human Boss to lead your AI teammates with your own clues.
🤖 How It Works
LLM Teams
Build teams from several providers: OpenAI, Google, Anthropic, Hugging Face, and more. Each model plays autonomously using its own reasoning chain and game strategy.
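As a purely illustrative sketch (the role keys and model identifiers below are assumptions, not the app's actual configuration format), a mixed-provider team could be described as a simple mapping from roles to provider/model pairs:

```python
# Hypothetical team configuration: each role is assigned a (provider, model) pair.
# Model names are examples only; use whichever providers you have API keys for.
RED_TEAM = {
    "boss":     {"provider": "openai",      "model": "gpt-4o"},
    "captain":  {"provider": "anthropic",   "model": "claude-3-5-sonnet"},
    "player_1": {"provider": "google",      "model": "gemini-1.5-pro"},
    "player_2": {"provider": "huggingface", "model": "meta-llama/Llama-3.1-70B-Instruct"},
}
```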
Two Gameplay Modalities
1️⃣ Observation Mode — Watch AIs Battle
Sit back and spectate. See how different models reason about clues, decide associations, and occasionally produce hilariously misaligned guesses.
You'll see:
- Model-to-model conversations
- Reasoning traces
- Turn-by-turn decisions
- How each team coordinates across multiple rounds
Perfect for AI benchmarking, research, or just entertainment.
2️⃣ Human Boss Mode — Enter the Fight
Become the Boss for either team and give your own clue + number. Your AI teammates will interpret your hint and make their guesses.
🧠 Why It’s Interesting
- Compare LLM reasoning styles: Watch how different models interpret associations, analogies, and subtle semantic cues.
- Analyze team dynamics: Some models coordinate beautifully. Others… not so much. Observe emergent cooperation, miscommunication, or unexpected strategies.
- Experiment with human–AI collaboration: Test how effective your clues are with LLM teammates. Try pushing the limits with creative, cryptic, or minimalist hints.
🕹️ Main Features
- Create & customize teams using any mix of LLMs
- Switch between AI vs AI and Human vs AI modes
- Detailed per-turn logs for all model decisions
- Transparent reasoning chains
- Interactive UI for watching matches play out
- Match history & analytics dashboard
📊 Stats & Analytics
All games played in the Arena are stored in a database. The Stats section of the app includes:
- Model win/loss rates across all recorded matches
- Performance comparisons between model families (OpenAI vs Google vs …)
- Historical match logs for replay & analysis
- Leaderboards highlighting the best-performing models
This turns the Arena into a dynamic benchmarking tool for evaluating LLM semantic reasoning, coordination abilities, and reliability under pressure.
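For illustration, per-model win rates could be computed from stored game records with a query roughly like the one below. The table and column names are assumptions for the sketch, not the Arena's actual schema:

```python
# Hypothetical win-rate query over stored match results.
# "arena.db", the game_results table, and its columns are assumed for this sketch.
import sqlite3

conn = sqlite3.connect("arena.db")
rows = conn.execute(
    """
    SELECT model,
           AVG(CASE WHEN won THEN 1.0 ELSE 0.0 END) AS win_rate,
           COUNT(*) AS games
    FROM game_results
    GROUP BY model
    ORDER BY win_rate DESC
    """
).fetchall()

for model, win_rate, games in rows:
    print(f"{model}: {win_rate:.1%} over {games} games")
```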

