Currently in beta, our Benchmarking Agent will be the standard way to measure AI performance across model providers.

Documentation Index
Fetch the complete documentation index at: https://docs.arcprize.org/llms.txt
Use this file to discover all available pages before exploring further.
ARC Harness (arcagi3)
This is a developer harness for building and benchmarking agentic research workflows on the ARC-AGI-3 corpus of environments.
When to use it
- Compare model versions or prompt strategies on the same game set.
- Detect regressions after code or prompt changes.
- Generate official scorecards and replays for sharing.
- Experiment with multiple custom agentic architectures.
Quickstart
Prerequisites
- Python: 3.9+
- uv: recommended package manager. Install from uv.pm or run `curl -LsSf https://astral.sh/uv/install.sh | sh`
- ARC-AGI-3 API key: required to talk to the ARC server. Sign up for a key here.
Install
Clone the repository and install dependencies with uv:
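A typical clone-and-install sequence looks like the following sketch; the repository URL is an assumption here, so verify it against the official docs:

```shell
# Clone the harness repository (URL is an assumption -- check the docs for the canonical one)
git clone https://github.com/arcprize/ARC-AGI-3-Agents.git
cd ARC-AGI-3-Agents

# Install dependencies into a project-local virtual environment with uv
uv sync
```

`uv sync` reads the project's lockfile and creates a matching virtual environment, so everyone benchmarking gets the same dependency versions.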
Setting up your environment
Set the ARC API key and your provider keys. You can put them in a `.env` file (see `.env.example`) or export them in your shell.
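As an illustration, a minimal `.env` might look like the sketch below; only `ARC_API_KEY` is named in this guide, and the provider variable names are assumptions, so use the names listed in `.env.example`:

```shell
# .env -- sketch only; key names other than ARC_API_KEY are illustrative assumptions
ARC_API_KEY=your-arc-api-key
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```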
Provider key links:
Check configuration:
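The exact check command isn't shown here, but the idea can be sketched as a small shell function that reports any required environment variables that are unset. `ARC_API_KEY` comes from this guide; `OPENAI_API_KEY` is an illustrative provider key, not a requirement of the harness:

```shell
# check_env: print the names of any environment variables that are unset.
# ARC_API_KEY is from the docs; OPENAI_API_KEY is an illustrative assumption.
check_env() {
  missing=""
  for var in "$@"; do
    # Expand the variable by name; empty or unset counts as missing.
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      missing="$missing $var"
    fi
  done
  echo "$missing"
}

check_env ARC_API_KEY OPENAI_API_KEY
```

If the function prints nothing, every listed key is set and the harness should be able to reach the ARC server and your model providers.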

