Traditionally, static benchmarks have been the yardstick for measuring AI. These continue to work well for evaluating systems like LLMs and AI reasoning models. However, to evaluate frontier AI agent systems, we need new tools that measure:
- Exploration
- Percept → Plan → Action
- Memory
- Goal Acquisition
- Alignment
By building agents that can play ARC-AGI-3, you’re directly contributing to the frontier of AI research. Watch the Quick Start tutorial video. Learn more about ARC-AGI-3.

Can you build an agent to beat this game?
Run your first agent against ARC-AGI-3
1. Install uv
2. Clone and install the ARC-AGI-3-Agents repo
3. Set up environment variables: add your `ARC_API_KEY` to the `.env` file. You can get your `ARC_API_KEY` from your user profile after registering on the ARC-AGI-3 website.
4. Run your first agent (the commands for all four steps are sketched below)
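A minimal command sketch of these steps, assuming a POSIX shell. The repository URL and the `.env.example` file name are assumptions; substitute whatever the ARC-AGI-3-Agents README specifies if it differs.

```bash
# 1. Install uv (Astral's official installer script)
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Clone and install the ARC-AGI-3-Agents repo
#    (repo URL assumed; use the link from the ARC-AGI-3 site if it differs)
git clone https://github.com/arcprize/ARC-AGI-3-Agents.git
cd ARC-AGI-3-Agents
uv sync

# 3. Set up environment variables
#    (.env.example is assumed; otherwise create .env by hand)
cp .env.example .env
# Paste the ARC_API_KEY from your ARC-AGI-3 profile into .env:
#   ARC_API_KEY=<your_key>

# 4. Run your first agent (the random agent against the ls20 game)
uv run main.py --agent=random --game=ls20
```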
Next Steps
After running your first agent:
- Explore your agent’s scorecard: view your scorecard (ex: `https://three.arcprize.org/scorecards/<scorecard_id>`).
- Explore a game’s replay: via your scorecard, view the per-game replays of your agent (ex: `https://three.arcprize.org/replay/ls20-016295f7601e/794795bf-d05f-4bf5-885a-b8a8f37a89fd`).
- Try a different game: run `uv run main.py --agent=random --game=<>`. See the list of available games at three.arcprize.org or via the API.
- Try using an LLM: run `uv run main.py --agent=llm --game=ls20` (requires an `OPENAI_API_KEY` in `.env`), or explore the other templates. A sketch of this flow is shown below.
- Build your own agent: follow the Agents Quickstart guide and view the agent tutorial.
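A rough sketch of the LLM next step, under the same `.env` setup as above; how your run reports its scorecard id is an assumption, so check the agent output or your profile on three.arcprize.org.

```bash
# Add an OpenAI key alongside ARC_API_KEY in .env (required by the llm template)
echo "OPENAI_API_KEY=<your_openai_key>" >> .env

# Run the LLM-backed template agent against the ls20 game
uv run main.py --agent=llm --game=ls20

# Review the results on your scorecard
# (scorecard id assumed to be reported by the run or visible from your profile):
#   https://three.arcprize.org/scorecards/<scorecard_id>
```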