random agent for examples. Let's move on to LLM-guided agents.
Available LLM Agents
The ARC-AGI-3 repo provides several pre-built LLM agent variants, each optimized for different use cases:
LLM Agent
- A standard OpenAI API agent that observes the game state and chooses actions via function calling; it keeps a conversation history capped at 10 messages (see the sketch after this entry).
- Default Model: gpt-4o-mini
- Usage:
--agent=llm
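As a rough illustration of this setup, the sketch below defines one function-calling tool per valid action and trims the history to the last 10 messages before each request. The ACTIONS list, the choose_action helper, and the tool descriptions are assumptions made for illustration, not the repo's actual code; only the action names and the 10-message limit come from the description above.

```python
from openai import OpenAI

client = OpenAI()

# One tool per valid action; the model picks an action by calling a tool.
ACTIONS = ["RESET", "ACTION1", "ACTION2", "ACTION3", "ACTION4", "ACTION5", "ACTION6"]
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": name,
            "description": f"Take the {name} action in the current game.",
            "parameters": {"type": "object", "properties": {}},
        },
    }
    for name in ACTIONS
]

def choose_action(history: list[dict]) -> str:
    """Ask the model for the next action, keeping only the last 10 messages."""
    trimmed = history[-10:]  # the 10-message history limit described above
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=trimmed,
        tools=TOOLS,
        tool_choice="required",  # force the model to pick exactly one action
    )
    call = response.choices[0].message.tool_calls[0]
    return call.function.name
```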
Fast LLM Agent
- Skips the observation step entirely (DO_OBSERVATION=False), making decisions faster but potentially less informed; it trades accuracy for speed.
- Default Model: gpt-4o-mini
- Usage:
--agent=fastllm
ReasoningLLM
- Uses OpenAI’s o4-mini model and captures detailed reasoning metadata, including reasoning-token counts and the thought process, in the action.reasoning field (a sketch follows this entry).
- Default Model: o4-mini
- Usage:
--agent=reasoningllm
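One way this kind of metadata capture could look is sketched below. The Action dataclass and build_action helper are hypothetical, and reading completion_tokens_details assumes a recent OpenAI Python SDK; only the action.reasoning field name comes from the description above.

```python
from dataclasses import dataclass, field

from openai import OpenAI

client = OpenAI()


@dataclass
class Action:
    name: str
    reasoning: dict = field(default_factory=dict)  # reasoning metadata lives here


def build_action(messages: list[dict]) -> Action:
    """Ask o4-mini for an action and record reasoning metadata alongside it."""
    response = client.chat.completions.create(model="o4-mini", messages=messages)
    details = response.usage.completion_tokens_details  # reasoning-token counts (recent SDKs)
    text = response.choices[0].message.content or ""
    return Action(
        name=text.strip(),
        reasoning={
            "model": "o4-mini",
            "reasoning_tokens": getattr(details, "reasoning_tokens", None),
            "response": text,
        },
    )
```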
GuidedLLM
- Uses the most advanced o3 model with high reasoning effort and includes explicit game-specific rules and strategy in the prompt. This template is for educational purposes only; it won’t generalize to other games.
- Default Model: o3
- Usage:
--agent=guidedllm
Example Usage
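A minimal sketch of running the agents from a shell. The main.py entry point, the --game flag, and the game-ID placeholder are assumptions about the repo's CLI; the --agent values are the ones documented above.

```bash
export OPENAI_API_KEY=...        # the LLM agents call the OpenAI API

python main.py --agent=llm --game=<game_id>           # standard LLM agent
python main.py --agent=fastllm --game=<game_id>       # skips observation step
python main.py --agent=reasoningllm --game=<game_id>  # o4-mini with reasoning metadata
python main.py --agent=guidedllm --game=<game_id>     # o3 with game-specific guidance
```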
Handling Malformed Outputs
LLM agents are expected to return exactly one of the valid action names (RESET, ACTION1–ACTION6).
In the reference implementation we simply call .strip() on the model response and forward the resulting string. In practice a model might return an empty string, additional commentary, or a token that is not a valid action. When that happens, the agent raises a ValueError and the current game terminates.
To make your agent more robust you can:
- Post-process the model output, e.g. extract the first token that looks like an action using a regular expression (see the sketch after this list).
- Fall back to a safe action: if parsing fails, choose a random valid action or repeat the previous one.
- Log the bad response in the reasoning field; this makes debugging much easier when you review the replay in the UI.
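The sketch below combines all three ideas in one helper. The parse_action name, the fallback policy, and the returned reasoning note are illustrative choices, not part of the reference implementation.

```python
import random
import re

VALID_ACTIONS = ["RESET", "ACTION1", "ACTION2", "ACTION3", "ACTION4", "ACTION5", "ACTION6"]
ACTION_PATTERN = re.compile(r"\b(RESET|ACTION[1-6])\b")


def parse_action(raw: str, previous: str | None = None) -> tuple[str, str]:
    """Return (action, reasoning_note) from a raw model response.

    Extracts the first valid action name it can find; if none is present,
    falls back to the previous action (or a random valid one) and records
    what happened so the bad response shows up in the replay UI.
    """
    match = ACTION_PATTERN.search(raw.strip().upper())
    if match:
        action = match.group(1)
        return action, f"parsed '{action}' from model output"
    fallback = previous or random.choice(VALID_ACTIONS)
    return fallback, f"unparseable model output {raw!r}; falling back to {fallback}"
```

For example, parse_action("I think ACTION3 is best here") returns ("ACTION3", ...), while parse_action("", previous="ACTION1") quietly repeats ACTION1 instead of raising.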

