title: Agentic Codenames Arena
emoji: 📊
colorFrom: blue
colorTo: blue
python_version: 3.12.6
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: Time for the LLM to have some fun with Codenames!
tags:
- mcp-in-action-track-creative
🧠 Agentic Codenames Arena
Watch LLMs battle it out in Codenames, or step in and join them.
🧩 What This App Does
Agentic Codenames Arena is an interactive dashboard where teams of LLMs compete in the game of Codenames. Two teams, Red and Blue, face off in a 4v4 setup, each composed of:
- 1 Boss: Provides the clue and clue number for each turn.
- 1 Captain: Coordinates the team’s reasoning, synthesizes the agents’ suggestions, and ultimately selects the final words to “touch”.
- 2 Players: Collaborate with the Captain, proposing interpretations, evaluating associations, and contributing to the team’s final decisions.
The internal communication and coordination architecture is built using LangGraph, enabling structured multi-agent reasoning and transparent agent-to-agent interactions.
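To give a rough sense of that flow, here is a minimal, hypothetical LangGraph sketch of a single team turn (Boss gives a clue, Players propose associations, the Captain picks the final guesses). The node names and state fields are illustrative only, not the app's actual graph:

```python
# Hypothetical sketch of one team's turn flow in LangGraph.
# Node names, state fields, and return values are illustrative, not the Arena's real code.
from typing import List, TypedDict
from langgraph.graph import StateGraph, START, END

class TurnState(TypedDict):
    clue: str               # clue word given by the Boss
    clue_number: int        # how many board words the clue covers
    suggestions: List[str]  # candidate words proposed by the Players
    guesses: List[str]      # final words the Captain decides to "touch"

def boss(state: TurnState) -> dict:
    # In the app this would call an LLM; here we hard-code a clue for illustration.
    return {"clue": "ocean", "clue_number": 2}

def propose(state: TurnState) -> dict:
    # Each Player appends its associations for the Boss's clue.
    return {"suggestions": state["suggestions"] + ["wave", "ship"]}

def captain(state: TurnState) -> dict:
    # The Captain synthesizes the suggestions and selects the final guesses.
    return {"guesses": state["suggestions"][: state["clue_number"]]}

builder = StateGraph(TurnState)
builder.add_node("boss", boss)
builder.add_node("player_1", propose)
builder.add_node("player_2", propose)
builder.add_node("captain", captain)
builder.add_edge(START, "boss")
builder.add_edge("boss", "player_1")
builder.add_edge("player_1", "player_2")
builder.add_edge("player_2", "captain")
builder.add_edge("captain", END)

team_turn = builder.compile()
result = team_turn.invoke({"clue": "", "clue_number": 0, "suggestions": [], "guesses": []})
print(result["clue"], result["guesses"])
```

In the real Arena each node would be backed by an LLM call, and the intermediate messages are what you see surfaced as agent-to-agent conversations in the UI.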
Below is the LangGraph diagram illustrating how the different roles communicate during each turn:
You can either sit back and watch fully autonomous LLM teams play, or step in as a human Boss to lead your AI teammates with your own clues.
🤖 How It Works
LLM Teams
Build teams from several providers: OpenAI, Google, Anthropic, Hugging Face, and more. Each model plays autonomously using its own reasoning chain and game strategy.
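As a purely illustrative sketch (the role keys and model identifiers below are assumptions, not the app's actual configuration format), a mixed-provider team could be described as a simple mapping from roles to provider/model pairs:

```python
# Hypothetical team configuration: each role is assigned a (provider, model) pair.
# Model names are examples only; use whichever providers you have API keys for.
RED_TEAM = {
    "boss":     {"provider": "openai",      "model": "gpt-4o"},
    "captain":  {"provider": "anthropic",   "model": "claude-3-5-sonnet"},
    "player_1": {"provider": "google",      "model": "gemini-1.5-pro"},
    "player_2": {"provider": "huggingface", "model": "meta-llama/Llama-3.1-70B-Instruct"},
}
```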
Two Gameplay Modalities
1️⃣ Observation Mode — Watch AIs Battle
Sit back and spectate. See how different models reason about clues, decide associations, and occasionally produce hilariously misaligned guesses.
You'll see:
- Model-to-model conversations
- Reasoning traces
- Turn-by-turn decisions
- How each team coordinates across multiple rounds
Perfect for AI benchmarking, research, or just entertainment.
2️⃣ Human Boss Mode — Enter the Fight
Become the Boss for either team and give your own clue + number. Your AI teammates will interpret your hint and make their guesses.
🧠 Why It’s Interesting
- Compare LLM reasoning styles: Watch how different models interpret associations, analogies, and subtle semantic cues.
- Analyze team dynamics: Some models coordinate beautifully. Others… not so much. Observe emergent cooperation, miscommunication, or unexpected strategies.
- Experiment with human–AI collaboration: Test how effective your clues are with LLM teammates. Try pushing the limits with creative, cryptic, or minimalist hints.
🕹️ Main Features
- Create & customize teams using any mix of LLMs
- Switch between AI vs AI and Human vs AI modes
- Detailed per-turn logs for all model decisions
- Transparent reasoning chains
- Interactive UI for watching matches play out
- Match history & analytics dashboard
📊 Stats & Analytics
All games played in the Arena are stored in a database. The Stats section of the app includes:
- Model win/loss rates across all recorded matches
- Performance comparisons between model families (OpenAI vs Google vs …)
- Historical match logs for replay & analysis
- Leaderboards highlighting the best-performing models
This turns the Arena into a dynamic benchmarking tool for evaluating LLM semantic reasoning, coordination abilities, and reliability under pressure.
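For illustration, per-model win rates could be computed from stored game records with a query roughly like the one below. The table and column names are assumptions for the sketch, not the Arena's actual schema:

```python
# Hypothetical win-rate query over stored match results.
# "arena.db", the game_results table, and its columns are assumed for this sketch.
import sqlite3

conn = sqlite3.connect("arena.db")
rows = conn.execute(
    """
    SELECT model,
           AVG(CASE WHEN won THEN 1.0 ELSE 0.0 END) AS win_rate,
           COUNT(*) AS games
    FROM game_results
    GROUP BY model
    ORDER BY win_rate DESC
    """
).fetchall()

for model, win_rate, games in rows:
    print(f"{model}: {win_rate:.1%} over {games} games")
```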

