# Startup evaluation

> **Context** — Before an agent serves traffic, the engine studies every routine and policy and pre-computes the behavioural metadata it will lean on at runtime. This page explains the stages of that **startup evaluation** and the purpose of each, plus how it's cached and tuned. It assumes [Routines](/agents/concepts/routines.md) (nodes, transitions) and [Policies](/agents/concepts/policies.md); watching it run is in [Observability](/agents/guides/observability.md).
>
> YAML examples follow **manifest schema 6.1.1**. Manifest and content shapes are schema-versioned and differ across runtime versions — see [Versioning & compatibility](/agents/operations/versioning.md).

## Why evaluation exists

Your routines and policies are written in natural language: a node says "Send it to the manual-review queue", a policy condition says "the customer asked about their balance". To run a turn quickly and consistently, the engine needs sharper, structured answers about that language — *is this step a pure tool call or does it speak? does it need the customer to reply before the routine moves on? which node can come next, and when?*

Working those answers out is itself model-driven and not cheap. Doing it on **every turn** would make every turn slow and its behaviour non-deterministic. So the engine does it **once, at startup**, derives the metadata, and [caches it](#caching-cold-vs-warm-boots). Runtime turns then read precomputed answers instead of re-deriving them.

## Where it sits in boot

Startup evaluation is the last phase of the [boot sequence](/agents/concepts/architecture.md#boot-sequence), and it runs **before the agent begins serving**: the HTTP port binds and health checks start passing only once evaluation settles. On a warm cache it's a no-op (every item is a cache hit), so the agent comes up immediately; on a cold cache it can take minutes for a content-heavy agent, and the agent is simply unreachable until it finishes. That's why the cache is [pre-warmed](#caching-cold-vs-warm-boots) — so a content change deploys fast instead of waiting through a full evaluation.

## What gets evaluated

Two things, independently:

* **Every routine**, node by node — each node's action and its outgoing transitions.
* **Every policy** — both the agent-wide policies and each routine's activation conditions.

Routines are evaluated in parallel with one another, and the work within a routine is parallelised too; concurrency is an implementation detail with one operator knob — see [Tuning live evaluation](#tuning-live-evaluation). What matters conceptually is **what** is derived and **why**.

## The stages

### Stage 1 — Understand each action

The engine reads each routine node and each policy and works out a handful of properties that decide how that step behaves at runtime:

| Derived property                       | What it answers                                                                                                     | Why the runtime needs it                                                                                                                                                                                                                           |
| -------------------------------------- | ------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Clarified action**                   | What, precisely, is this step instructing — phrased unambiguously and aligned with the tool it calls?               | Raw wording is often vague ("handle it"). A sharpened instruction is what message generation and next-step selection actually work from.                                                                                                           |
| **Tool-only vs. speaks**               | Is this node a pure tool call with no customer-facing message?                                                      | If tool-only, the engine runs the tool and **skips message generation** — no reply to compose, no risk of inventing one. This is the [TOOL-node](/agents/concepts/routines.md#tool-node-completes-when-the-tool-executes) behaviour, decided here. |
| **Customer-dependent**                 | Does completing this step require the customer to respond, or is it purely agent-side?                              | If customer-dependent, the engine **waits for the customer's reply before advancing** instead of moving on the moment it has spoken. This is what makes an "ask, then act on the answer" routine pause at the right place.                         |
| **Continuous vs. one-shot** (policies) | Should this policy stay in force every turn, or apply once and retire?                                              | The matcher keeps continuous policies (e.g. "always speak formally") active across turns and drops one-shot ones after they fire — so a standing rule isn't forgotten after the first turn it applies.                                             |
| **Prospective condition** (policies)   | Does the condition describe something the agent is *about to do*, or something already in the conversation history? | The matcher evaluates a prospective condition against the agent's *next* move rather than searching the past for it — so "you are going to ask for ID" matches at the right moment.                                                                |

The first three apply to routine nodes; the last two additionally shape how policies match. Together they're why the same routine text produces "run the tool silently" in one node and "say this, then wait" in another — the distinction is computed here, not guessed per turn.
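As a rough illustration of how these properties might be represented and consumed, here is a minimal Python sketch. The `ActionMetadata` record, its field names, and the `node_plan` helper are all hypothetical; the engine's real data model is not described on this page, and a real node may combine a tool call with a message in ways this simplification ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActionMetadata:
    """Hypothetical record of the Stage 1 properties for one node or policy."""
    clarified_action: str
    tool_only: bool = False             # pure tool call, no customer-facing message
    customer_dependent: bool = False    # needs the customer's reply to complete
    continuous: Optional[bool] = None   # policies only: stays in force every turn
    prospective: Optional[bool] = None  # policies only: condition is about the next move


def node_plan(meta: ActionMetadata) -> list[str]:
    """Return an illustrative sequence of runtime steps for a routine node."""
    steps = []
    if meta.tool_only:
        steps.append("run_tool")            # message generation is skipped
    else:
        steps.append("compose_message")
    if meta.customer_dependent:
        steps.append("wait_for_customer")   # do not advance until the reply arrives
    steps.append("advance")
    return steps


silent = ActionMetadata(
    "Send the failed ID verification result to the manual-review queue",
    tool_only=True,
)
ask = ActionMetadata(
    "Ask the customer for their account number",
    customer_dependent=True,
)

print(node_plan(silent))  # ['run_tool', 'advance']
print(node_plan(ask))     # ['compose_message', 'wait_for_customer', 'advance']
```

The point of the sketch is that the branch is a lookup on precomputed flags, not a per-turn judgement about the node's wording.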

### Stage 2 — Make each action self-contained

Routine steps are often written relative to their neighbours — "send **it** to review", "notify the customer of **the outcome**". At runtime the engine fires one step's instruction without re-reading the whole routine, so a dangling reference would force it to reconstruct the antecedent (slow, and a chance to get it wrong).

This stage rewrites those steps to **stand alone** — "send the failed ID verification result to the manual-review queue" — resolving the reference using the routine's structure. Steps that were already self-contained are left untouched.

### Stage 3 — Map the routes

For each node, the engine computes its **reachable follow-ups**: which nodes can come next, and the exact condition under which each path is taken. This is the routine's routing table, derived from the nodes' outgoing transitions and their conditions, built up from the leaves inward so each node's map accounts for what lies beyond its immediate children.

At runtime, when a turn needs to decide where the routine goes next, the engine consults this precomputed table instead of re-analysing the whole graph on every turn — which would be both expensive and error-prone (easy to miss a transitive path or misread an edge condition). It's the machine-readable form of the `transitions` you author; see [Routines](/agents/concepts/routines.md#transitions-terminals-and-movement).
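One way to picture the "leaves inward" construction is a memoised depth-first walk, sketched below in Python. The routine, its node ids, the condition strings, and the `build_routing_table` function are hypothetical, and the sketch assumes the routine graph has no cycles; how the engine actually represents or handles routes may differ.

```python
# Hypothetical routine: node id -> list of (condition, target node id).
TRANSITIONS = {
    "verify_id": [
        ("ID check passed", "open_account"),
        ("ID check failed", "manual_review"),
    ],
    "manual_review": [
        ("reviewer approved", "open_account"),
        ("reviewer rejected", "decline"),
    ],
    "open_account": [],
    "decline": [],
}


def build_routing_table(transitions):
    """Map each node to every reachable follow-up and the conditions on the path.

    Assumes an acyclic graph. Children are resolved before their parents, so
    each node's entry already includes what lies beyond its direct children.
    """
    table = {}

    def visit(node):
        if node in table:
            return table[node]
        routes = []
        for condition, child in transitions[node]:
            routes.append((child, (condition,)))
            # Extend the child's own routes with the condition that leads to it.
            for target, conditions in visit(child):
                routes.append((target, (condition,) + conditions))
        table[node] = routes
        return routes

    for node in transitions:
        visit(node)
    return table


table = build_routing_table(TRANSITIONS)
for target, conditions in table["verify_id"]:
    print(target, "when", " and then ".join(conditions))
```

Here `verify_id` ends up with four routes, including the transitive path to `decline` via a failed ID check followed by a reviewer rejection, which is the kind of path that is easy to miss when re-reading a graph ad hoc.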

## What it produces, and where it shows up at runtime

The stages above attach, to each routine node and policy, the metadata the engine reads during a turn:

| Runtime decision                                         | Driven by                       |
| -------------------------------------------------------- | ------------------------------- |
| Skip composing a reply for a pure tool step              | Stage 1 — tool-only             |
| Wait for the customer before advancing a routine         | Stage 1 — customer-dependent    |
| Keep a standing policy in force across turns             | Stage 1 — continuous            |
| Evaluate a policy condition as a future intent           | Stage 1 — prospective condition |
| Fire a step's instruction without re-reading the routine | Stage 2 — self-contained action |
| Pick the next node when a turn advances                  | Stage 3 — reachable follow-ups  |

None of this changes *what you authored* — it's the engine's prepared reading of it. The per-turn mechanics that consume this metadata are in [Conversation lifecycle](/agents/concepts/conversation-lifecycle.md).

## Caching: cold vs. warm boots

Evaluation results are **keyed by content hash** — a policy by its condition, action, and tools; a routine by its id and the ordered ids of its nodes and transitions. At boot the engine looks each item up by hash:

* **Hit** → the metadata loads instantly, with no model calls.
* **Miss** → the item is evaluated live (the slow path), and the result is written back to the cache.

Because the key is the content, results stay valid until the content changes, and editing one routine invalidates only that routine — every other item is still a hit.
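The hit/miss behaviour can be sketched in a few lines of Python. This is an illustration under assumptions: the `routine_key` and `boot` functions, the dictionary shape of a routine, the use of SHA-256 over JSON, and the in-memory `dict` cache are all hypothetical stand-ins, chosen only to mirror the key description above.

```python
import hashlib
import json


def routine_key(routine: dict) -> str:
    """Hash a routine's id plus the ordered ids of its nodes and transitions."""
    payload = {
        "id": routine["id"],
        "nodes": [n["id"] for n in routine["nodes"]],
        "transitions": [t["id"] for t in routine["transitions"]],
    }
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def boot(routines, cache, evaluate_live):
    """Load metadata for every routine, evaluating live only on a cache miss."""
    hits, misses = 0, 0
    metadata = {}
    for routine in routines:
        key = routine_key(routine)
        if key in cache:
            hits += 1
        else:
            cache[key] = evaluate_live(routine)  # slow path, written back
            misses += 1
        metadata[routine["id"]] = cache[key]
    return metadata, hits, misses


def fake_evaluate(routine):
    # Stand-in for the model-driven evaluation.
    return {"evaluated": routine["id"]}


routines = [
    {"id": "onboarding", "nodes": [{"id": "n1"}, {"id": "n2"}], "transitions": [{"id": "t1"}]},
    {"id": "refund", "nodes": [{"id": "n1"}], "transitions": []},
]
cache = {}

_, cold_hits, cold_misses = boot(routines, cache, fake_evaluate)
_, warm_hits, warm_misses = boot(routines, cache, fake_evaluate)

routines[1]["nodes"].append({"id": "n2"})  # edit one routine only
_, edit_hits, edit_misses = boot(routines, cache, fake_evaluate)

print(cold_hits, cold_misses)  # 0 2
print(warm_hits, warm_misses)  # 2 0
print(edit_hits, edit_misses)  # 1 1
```

The third boot shows the property described above: the edited routine misses while the untouched one remains a hit.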

**The platform warms the cache as part of deploying content**: when you deploy a manifest pinning new content versions, the platform pre-computes their evaluations so the agent boots from a warm cache and comes up fast. If an agent ever boots cold (because an item wasn't pre-warmed), it still comes up correctly: it evaluates live on first boot, which costs minutes rather than causing a failure. As noted above, though, the agent isn't reachable until that finishes. So a cold cache doesn't break anything; it just makes the deploy slower to go live.

### What invalidates the cache

The content hash makes the rules simple:

* **No effect** (cache stays valid): runtime upgrades, manifest tuning-knob changes, secret rotations — none change content hashes.
* **Invalidated automatically** (new hash → re-evaluated on next deploy): any edit to a policy's condition / action / tools, or a routine's nodes / transitions.
* **The one blind spot:** changing the evaluation *model* (`llms.evaluation`) does **not** change content hashes, so cached metadata is reused as-is. To recompute under a new evaluation model, ask your platform operator to force a re-evaluation of the content set.
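The blind spot follows directly from what goes into the key. The Python sketch below assumes a policy key built from condition, action, and tools as described earlier; the `policy_key` function, the hashing scheme, the tool name, and the model names are hypothetical placeholders.

```python
import hashlib
import json


def policy_key(policy: dict) -> str:
    """Hash only the policy's content: condition, action, and tools."""
    payload = {k: policy[k] for k in ("condition", "action", "tools")}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


policy = {
    "condition": "the customer asked about their balance",
    "action": "Look up and report the current balance",
    "tools": ["get_balance"],  # hypothetical tool name
}
before = policy_key(policy)

# Swapping the evaluation model changes the manifest, not the policy content,
# so the key (and therefore the cached metadata) is unchanged.
manifest = {"llms": {"evaluation": "model-a"}}
manifest["llms"]["evaluation"] = "model-b"
same_after_model_change = policy_key(policy) == before

# Editing the condition changes the content, so the key changes.
edited = dict(policy, condition="the customer asked about their balance or recent transactions")
changed_after_edit = policy_key(edited) != before

print(same_after_model_change, changed_after_edit)  # True True
```

Nothing in the key sees `llms.evaluation`, which is why recomputing under a new model needs an explicit forced re-evaluation.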

## Tuning live evaluation

When evaluation does run live, one operator knob shapes it: `EVAL_NODE_PARALLELISM` (default 50) caps how many per-node evaluation calls run concurrently. Higher values are faster but bounded by the LLM router's rate budget; `1` forces fully sequential evaluation for debugging. It's a platform/operator setting (see [Environment variables](/agents/reference/environment.md)). The model doing the work is `llms.evaluation`, with the standard retry/fallback behaviour; see [Models](/agents/concepts/models.md).
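A concurrency cap of this kind is commonly implemented with a semaphore. The Python sketch below shows that general technique using `asyncio`; it is not the engine's implementation, and `evaluate_all`, `measure_peak`, and the fake per-node evaluation are hypothetical. Only the variable name `EVAL_NODE_PARALLELISM` and its default of 50 come from this page.

```python
import asyncio
import os

# Read the knob the way an operator would set it; 50 is the documented default.
PARALLELISM = int(os.environ.get("EVAL_NODE_PARALLELISM", "50"))


async def evaluate_all(nodes, evaluate_node, parallelism):
    """Evaluate every node, with at most `parallelism` calls in flight at once."""
    semaphore = asyncio.Semaphore(parallelism)

    async def bounded(node):
        async with semaphore:
            return await evaluate_node(node)

    return await asyncio.gather(*(bounded(n) for n in nodes))


def measure_peak(parallelism, node_count=10):
    """Run fake evaluations and report the highest observed concurrency."""
    state = {"in_flight": 0, "peak": 0}

    async def fake_evaluate(node):
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(0.01)  # stand-in for a model call
        state["in_flight"] -= 1
        return f"metadata for node {node}"

    results = asyncio.run(evaluate_all(range(node_count), fake_evaluate, parallelism))
    return state["peak"], results


peak_three, results = measure_peak(3)
peak_one, _ = measure_peak(1)
print(peak_three, peak_one, len(results))  # 3 1 10
```

With a cap of `1` the calls run strictly one after another, which matches the debugging use described above.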

## Watching it

Evaluation emits one log line per routine — `Routine '<title>' evaluated: N nodes in Xs` (`N=0` means it was served from cache) — with per-stage detail at debug level. See [Observability](/agents/guides/observability.md#boot-time-evaluation-logs).
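If you want to tell cache-served routines from live-evaluated ones in a boot log, a small parser is enough. The Python sketch below assumes the line format quoted above with `X` rendered as a plain decimal number; the sample lines, routine titles, and the `summarize` helper are made up for illustration, and real log lines may carry timestamps or other prefixes.

```python
import re

# Matches: Routine '<title>' evaluated: N nodes in Xs
LINE = re.compile(
    r"Routine '(?P<title>.*)' evaluated: (?P<nodes>\d+) nodes in (?P<seconds>\d+(?:\.\d+)?)s"
)


def summarize(lines):
    """Split routines into cache-served (N=0) and live-evaluated ones."""
    cached, live = [], []
    for line in lines:
        match = LINE.search(line)
        if match is None:
            continue  # not an evaluation line
        if int(match["nodes"]) == 0:
            cached.append(match["title"])
        else:
            live.append((match["title"], int(match["nodes"]), float(match["seconds"])))
    return cached, live


# Hypothetical sample lines, written to fit the documented format.
sample = [
    "Routine 'Open account' evaluated: 0 nodes in 0.0s",
    "Routine 'Refund request' evaluated: 12 nodes in 41.7s",
    "HTTP server listening",
]

cached, live = summarize(sample)
print(cached)  # ['Open account']
print(live)    # [('Refund request', 12, 41.7)]
```

A boot where `live` is empty was served entirely from cache.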

## See also

* [Conversation lifecycle](/agents/concepts/conversation-lifecycle.md) — how a turn consumes this metadata
* [Routines](/agents/concepts/routines.md) and [Policies](/agents/concepts/policies.md) — the inputs being evaluated
* [Architecture](/agents/concepts/architecture.md#boot-sequence) — where evaluation sits in boot

