ARC-AGI-3 is an Interactive Reasoning Benchmark designed to measure an AI Agent’s ability to generalize in novel, unseen environments.
Traditionally, static benchmarks have been the yardstick for measuring AI. They continue to work well for evaluating LLMs and AI reasoning systems. However, to evaluate frontier AI agent systems, we need new tools that measure an agent's ability to act and generalize in novel, interactive environments.
By building agents that can play ARC-AGI-3, you're directly contributing to the frontier of AI research. Watch the Quick Start tutorial video. Learn more about ARC-AGI-3.
Can you build an agent to beat this game?
Set your ARC_API_KEY in the .env file. You can get your ARC_API_KEY from your user profile after registering on the ARC-AGI-3 website.
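As a quick sanity check, here is a minimal sketch of reading the key from .env in Python, assuming the python-dotenv package is installed (the agent templates in this repo may load it for you):

```python
# Minimal sketch: confirm ARC_API_KEY is available to your agent.
# Assumes the python-dotenv package; the bundled templates may load it differently.
import os

from dotenv import load_dotenv

load_dotenv()  # read key=value pairs from the .env file into the environment

api_key = os.getenv("ARC_API_KEY")
if not api_key:
    raise RuntimeError("ARC_API_KEY is not set; add it to your .env file")
print("ARC_API_KEY loaded")
```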
View your agent's scorecard at https://three.arcprize.org/scorecards/<scorecard_id>
Example replay: https://three.arcprize.org/replay/ls20-016295f7601e/794795bf-d05f-4bf5-885a-b8a8f37a89fd
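For convenience, a small sketch that builds the scorecard link from an ID and opens it in your browser; the ID below is a placeholder, not a real scorecard:

```python
# Sketch: open a finished run's scorecard in the default browser.
# Replace the placeholder with the scorecard ID printed by your agent run.
import webbrowser

scorecard_id = "<scorecard_id>"  # placeholder, not a real ID
webbrowser.open(f"https://three.arcprize.org/scorecards/{scorecard_id}")
```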
Run the random agent against a game of your choice (replace <game_id> with a game ID; see the list of available games at three.arcprize.org or via the API):

uv run main.py --agent=random --game=<game_id>

Run the LLM agent against the ls20 game (requires an OPENAI_API_KEY in .env):

uv run main.py --agent=llm --game=ls20

Or explore the other agent templates.
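If you prefer launching runs from a script rather than the shell, here is a hedged sketch that wraps the same uv command with Python's subprocess module; the agent name and game ID are just the examples shown above:

```python
# Sketch: launch an agent run by wrapping the CLI commands shown above.
# The agent name and game ID are illustrative; use any template and game you like.
import subprocess

def run_agent(agent: str, game: str) -> int:
    """Run `uv run main.py --agent=... --game=...` and return its exit code."""
    cmd = ["uv", "run", "main.py", f"--agent={agent}", f"--game={game}"]
    return subprocess.run(cmd, check=False).returncode

if __name__ == "__main__":
    # Example: the bundled random agent against the ls20 game.
    code = run_agent("random", "ls20")
    print(f"agent run exited with code {code}")
```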