# Models

> **Context** — Every model call an agent makes is configured by the manifest's `llms` block and routed through the InteractiveAI **LLM router** (the agent never calls a model provider directly). This page explains the two-lane design — the most operationally important thing to understand about model configuration.
>
> YAML examples follow **manifest schema 6.1.1**. Manifest and content shapes are schema-versioned and differ across runtime versions — see [Versioning & compatibility](/agents/operations/versioning.md).

## The `llms` block

```yaml
agent_config:
  llms:
    default: anthropic/claude-haiku-4.5          # chat primary (required)
    api_key: ${ROUTER_API_KEY}                   # router credential (required)
    fallback:                                    # chat fallbacks (optional list)
      - anthropic/claude-sonnet-4-6
    evaluation: google/gemini-3-flash-preview    # evaluation primary (optional)
    evaluation_fallback: google/gemini-3.1-pro-preview  # evaluation backup (optional)
```

| Field                 | Required | Default                                     | Lane                                                                                             |
| --------------------- | -------- | ------------------------------------------- | ------------------------------------------------------------------------------------------------ |
| `default`             | yes      | `interactive/agent`                         | Chat primary                                                                                     |
| `fallback`            | no       | `[]`                                        | Chat fallbacks (ordered list)                                                                    |
| `api_key`             | yes      | —                                           | Router credential (`${VAR}` env-ref), shared by **all** calls: chat, evaluation, and embeddings  |
| `evaluation`          | no       | `interactive/google/gemini-3-flash-preview` | Evaluation primary                                                                               |
| `evaluation_fallback` | no       | `interactive/google/gemini-3.1-pro-preview` | Evaluation backup                                                                                |

Model names are `provider/model` aliases as served by the router's model catalog. There is no environment-variable override for model IDs: to change a model, edit the manifest and redeploy.
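
As a rough illustration of how the table's defaults combine with a manifest, the sketch below fills in the optional fields of an `llms` block. `resolve_llms` is a hypothetical helper, not part of the runtime; it treats `default` and `api_key` as required, following the Required column, and takes the default values for the optional fields from the table above.

```python
# Hypothetical helper (not a runtime API): applies the documented defaults
# for the optional fields of an `llms` block.
EVALUATION_DEFAULT = "interactive/google/gemini-3-flash-preview"
EVALUATION_FALLBACK_DEFAULT = "interactive/google/gemini-3.1-pro-preview"


def resolve_llms(llms: dict) -> dict:
    """Return a copy of an `llms` block with optional fields filled in."""
    for required in ("default", "api_key"):
        if not llms.get(required):
            raise ValueError(f"llms.{required} is required")
    return {
        "default": llms["default"],
        "api_key": llms["api_key"],
        "fallback": list(llms.get("fallback", [])),
        "evaluation": llms.get("evaluation", EVALUATION_DEFAULT),
        "evaluation_fallback": llms.get(
            "evaluation_fallback", EVALUATION_FALLBACK_DEFAULT
        ),
    }


# A minimal block: only the two required fields are set.
resolved = resolve_llms(
    {"default": "anthropic/claude-haiku-4.5", "api_key": "${ROUTER_API_KEY}"}
)
```

With only the required fields set, the chat lane has no fallbacks and the evaluation lane uses both documented defaults.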

## Two lanes, two jobs

### Chat lane — what the customer reads

`default` (with `fallback` behind it) serves every **customer-visible** generation: replies, preambles, content the agent writes, and the tool-call reasoning for the agent's real tools. Optimise this lane for voice quality and instruction-following.

**Fallback semantics:** `fallback` is an ordered list forwarded to the router along with `default`; when the primary fails, the router tries each fallback in order. One request, router-side failover.
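
The ordered failover described above can be pictured with a small simulation. This is only a sketch of the ordering, under the assumption that any error on one model moves on to the next; `route_chat` and `fake_provider` are hypothetical names, and the real failover happens inside the router, not in agent code.

```python
from typing import Callable, Sequence


def route_chat(
    default: str, fallback: Sequence[str], call: Callable[[str], str]
) -> str:
    """Try the primary, then each fallback in order; return the first success."""
    errors = []
    for model in [default, *fallback]:
        try:
            return call(model)
        except Exception as exc:  # assumption: any failure triggers failover
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all chat models failed: " + "; ".join(errors))


def fake_provider(model: str) -> str:
    """Stand-in for a provider call: the primary is down, others answer."""
    if model == "anthropic/claude-haiku-4.5":
        raise TimeoutError("primary unavailable")
    return f"reply from {model}"


reply = route_chat(
    "anthropic/claude-haiku-4.5",
    ["anthropic/claude-sonnet-4-6"],
    fake_provider,
)
```

Here the primary fails, so `reply` comes from the first entry in the fallback list. From the agent's side this is still a single request.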

### Evaluation lane — decisions the customer never sees

`evaluation` (with `evaluation_fallback` behind it) serves the engine's **internal structured-JSON decisions**:

* policy matching (does this condition apply?),
* routine activation, next-step selection, and backtrack checks,
* routine metadata evaluation at startup (step reachability).

These are narrow, high-volume, schema-constrained calls — a fast, inexpensive model is the right default. Quality shows up as *correct routing*, not prose.

**Fallback semantics (different from chat!):** each evaluation call gets up to **3 attempts** on the primary; if all three fail (typically because the model's output does not conform to the required schema, or because of transport errors), the runtime switches to `evaluation_fallback` and makes up to 3 more attempts on that model. This is per call, not per process: one stubborn decision escalates alone. Every escalation is logged with a `[retry-fallback]` marker; see [Observability](/agents/guides/observability.md#retry-fallback-signals).
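
The retry-then-escalate behaviour for a single evaluation call can be sketched roughly as follows. The attempt count and the `[retry-fallback]` marker come from the text; the function name, the logger name, the log wording, and the WARNING level for the escalation line are assumptions made for illustration.

```python
import logging
from typing import Callable

log = logging.getLogger("evaluation")  # hypothetical logger name
ATTEMPTS_PER_MODEL = 3  # up to 3 attempts on the primary, then 3 on the fallback


def evaluate(primary: str, fallback: str, call: Callable[[str], dict]) -> dict:
    """Run one evaluation call: exhaust the primary, then escalate once."""
    last_error = None
    for index, model in enumerate((primary, fallback)):
        if index == 1:
            # Assumed log shape; only the marker itself is documented.
            log.warning("[retry-fallback] escalating from %s to %s", primary, fallback)
        for _ in range(ATTEMPTS_PER_MODEL):
            try:
                return call(model)
            except Exception as exc:  # schema or transport failure
                last_error = exc
    log.error("evaluation failed on both models: %s", last_error)
    raise RuntimeError("evaluation exhausted primary and fallback") from last_error


# Simulated model: the primary never conforms, the fallback succeeds on
# its second attempt.
attempts = []


def flaky(model: str) -> dict:
    attempts.append(model)
    if model == "primary" or attempts.count("fallback") < 2:
        raise ValueError("output did not match schema")
    return {"applies": True}


decision = evaluate("primary", "fallback", flaky)
```

Because the state lives inside the call, a later evaluation call starts on the primary again, which matches the per-call escalation described above.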

### The lanes never cross

A failed chat call falls through the chat `fallback` list — never to the evaluation models. A failed evaluation call escalates to `evaluation_fallback` — never to a chat model. This is deliberate:

* The chat lane optimises for **voice**; the evaluation lane for **cheap, fast, narrow JSON**. A model great at one is often mediocre at the other.
* Their failure profiles are orthogonal. A throughput problem on the chat model shouldn't degrade routing decisions, and schema-conformance issues on the evaluation model shouldn't change the agent's voice.
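
One way to picture the separation is that each lane resolves to its own ordered list of models, and no list borrows from the other. The sketch below assumes a resolved `llms` mapping with the field names from the manifest; `model_chain` is a hypothetical helper, not a runtime API.

```python
def model_chain(llms: dict, lane: str) -> list:
    """Return the ordered models a lane may use; lanes never share a chain."""
    if lane == "chat":
        return [llms["default"], *llms.get("fallback", [])]
    if lane == "evaluation":
        return [llms["evaluation"], llms["evaluation_fallback"]]
    raise ValueError(f"unknown lane: {lane}")


llms = {
    "default": "anthropic/claude-haiku-4.5",
    "fallback": ["anthropic/claude-sonnet-4-6"],
    "evaluation": "google/gemini-3-flash-preview",
    "evaluation_fallback": "google/gemini-3.1-pro-preview",
}
chat_chain = model_chain(llms, "chat")
evaluation_chain = model_chain(llms, "evaluation")
```

With the example manifest from this page, the two chains have no model in common, so a failure in one lane cannot pull in a model from the other.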

## Embeddings

When the [knowledge base](/agents/concepts/knowledge-base.md) is `type: pgvector`, query embeddings use the manifest's `search.embedding_model`, routed through the same router with the same `api_key`. There is no separate embeddings credential.

## When each lane fires

| Moment                                                | Lane                     | Notes                                                                                            |
| ----------------------------------------------------- | ------------------------ | ------------------------------------------------------------------------------------------------ |
| Every reply / preamble                                | Chat                     |                                                                                                  |
| Tool-call inference for your tools                    | Chat                     |                                                                                                  |
| Knowledge-base query rewrite                          | Chat                     | One completion on the chat primary per retrieval                                                 |
| Policy matching, each turn                            | Evaluation               | Batched: `policy_batch_size` policies per call                                                   |
| Routine activation / next-step / backtrack, each turn | Evaluation               |                                                                                                  |
| Routine metadata evaluation, at boot (cold cache)     | Evaluation               | The slow part of cold startup — see [Startup evaluation](/agents/concepts/startup-evaluation.md) |
| Knowledge-base embedding                              | `search.embedding_model` |                                                                                                  |
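
To estimate how many evaluation calls policy matching adds per turn, the batching note in the table can be turned into simple arithmetic. This sketch assumes a final partial batch still costs one call; the policy count and batch size in the example are illustrative, not documented defaults.

```python
import math


def policy_match_calls(policy_count: int, policy_batch_size: int) -> int:
    """Evaluation calls needed to match all policies in one turn."""
    if policy_batch_size < 1:
        raise ValueError("policy_batch_size must be at least 1")
    return math.ceil(policy_count / policy_batch_size)


# Illustrative numbers: 23 policies in batches of 10 need 3 calls.
calls = policy_match_calls(23, 10)
```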

## Operational guidance

* **Token ceiling:** the operator env var `ROUTER_MAX_TOKENS` (default 100000) caps context size on router calls. See [Environment variables](/agents/reference/environment.md).
* **Watch the escalation rate.** Frequent `[retry-fallback]` lines mean the evaluation primary is struggling with your content's complexity: either simplify conditions or promote a stronger `evaluation` model. If both models exhaust their attempts (logged at ERROR), the turn fails.
* **Changing `evaluation` invalidates nothing.** Startup routine evaluation results are cached by *content* hash, so a model change does not bust the cache. Re-evaluate deliberately if you change models and want fresh metadata (see [Startup evaluation](/agents/concepts/startup-evaluation.md)).
* **Credential:** one router key serves everything. Rotate by updating the secret and restarting; see [Security](/agents/operations/security.md).
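
A quick way to track the escalation rate is to count log lines carrying the `[retry-fallback]` marker against the number of evaluation calls in the same window. The sketch below assumes one marker line per escalation and uses made-up log lines; only the marker string itself comes from this page.

```python
from typing import Iterable

MARKER = "[retry-fallback]"


def escalation_rate(log_lines: Iterable[str], evaluation_calls: int) -> float:
    """Fraction of evaluation calls that escalated to the fallback model."""
    escalations = sum(1 for line in log_lines if MARKER in line)
    return escalations / evaluation_calls if evaluation_calls else 0.0


# Hypothetical log excerpt; real log lines will look different.
sample = [
    "INFO turn completed",
    "WARNING [retry-fallback] policy matching escalated",
    "INFO turn completed",
    "WARNING [retry-fallback] next-step selection escalated",
]
rate = escalation_rate(sample, evaluation_calls=8)
```

In this sample, two of eight evaluation calls escalated, giving a rate of 0.25. What counts as "frequent" depends on your content, so treat any alert threshold as something to tune.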

## See also

* [Conversation lifecycle](/agents/concepts/conversation-lifecycle.md) — the calls in context
* [Limits & defaults](/agents/reference/limits-and-defaults.md) — every default in one table
* [Observability](/agents/guides/observability.md) — tracing model calls

