Static benchmarks have traditionally been the yardstick for measuring AI, and they continue to work well for evaluating LLMs and reasoning systems. Evaluating frontier AI agent systems, however, requires new tools that measure:

  • Exploration
  • Percept → Plan → Action
  • Memory
  • Goal Acquisition
  • Alignment

By building agents that can play ARC-AGI-3, you’re directly contributing to the frontier of AI research. Watch the Quick Start tutorial video. Learn more about ARC-AGI-3.

Human playing LS20

Can you build an agent to beat this game?

Run your first agent against ARC-AGI-3

1. Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh
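
If uv is not found right away, restart your shell (or follow the PATH instructions the installer prints), then confirm the install:

uv --version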

2. Clone and install the ARC-AGI-3-Agents Repo

git clone https://github.com/arcprize/ARC-AGI-3-Agents.git && cd ARC-AGI-3-Agents && uv sync

3. Set up environment variables

cp .env-example .env
You will need to set ARC_API_KEY in the .env file. You can get your ARC_API_KEY from your user profile after registering on the ARC-AGI-3 website.
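
For example, the resulting .env might look something like this (the values below are placeholders; OPENAI_API_KEY is only needed if you later try the LLM agent mentioned in the next steps):

ARC_API_KEY=your_arc_api_key_here
# Optional: only required for the LLM agent template
OPENAI_API_KEY=your_openai_api_key_here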

4. Run your first agent

# Run 'random' agent against 'ls20' game
uv run main.py --agent=random --game=ls20
🎉 Congratulations! You just ran your first agent against ARC-AGI-3. A link to view your agent’s replay (example replay) is provided in the output.
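
To see the other flags main.py accepts, a standard Python argument parser would expose them via --help (this is an assumption; check the repo README if the flag is not supported):

uv run main.py --help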

Next Steps

After running your first agent:
  1. Explore your agent’s scorecard - View your scorecard (ex: https://three.arcprize.org/scorecards/<scorecard_id>)
  2. Explore a game’s replay - Via your scorecard, view the per-game replays of your agent (ex: https://three.arcprize.org/replay/ls20-016295f7601e/794795bf-d05f-4bf5-885a-b8a8f37a89fd)
  3. Try a different game - Run uv run main.py --agent=random --game=<game_id>. A list of available games is published at three.arcprize.org or via the API (see the example after this list).
  4. Try using an LLM - Try uv run main.py --agent=llm --game=ls20 (requires an OPENAI_API_KEY in .env) or explore the other agent templates.
  5. Build your own agent - Follow the Agents Quickstart guide and view the agent tutorial.
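
For step 3, you can also fetch the list of available games from the command line. The snippet below is a minimal sketch: it assumes the list endpoint is /api/games, that the key is sent in an X-API-Key header, and that ARC_API_KEY is exported in your shell (it only lives in .env otherwise); check the API docs at three.arcprize.org if any of these assumptions are off.

# List available games (endpoint and header name are assumptions; see the API docs)
curl -H "X-API-Key: $ARC_API_KEY" https://three.arcprize.org/api/games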