
# Architecture

> **Context** — Start here. This page names every moving part; the rest of the documentation assumes you know these names. No prior knowledge needed.
>
> YAML examples follow **manifest schema 6.1.1**. Manifest and content shapes are schema-versioned and differ across runtime versions — see [Versioning & compatibility](/agents/operations/versioning.md).

## The pieces

```
                     ┌─────────────────────────────────────┐
                     │         InteractiveAI platform       │
                     │   content catalog · LLM router ·     │
                     │            traces backend            │
                     └───────────┬─────────────┬────────────┘
          context fetch (boot),  │             │  LLM calls (chat + evaluation),
          secrets, hosting       │             │  OTel traces (per turn)
                                 ▼             ▼
  ┌──────────────┐  SDK / REST   ┌───────────────────────────┐   MCP    ┌──────────────┐
  │     Your     │◄─────────────►│        Agent server        │◄────────►│   Your MCP   │
  │ integration  │  sessions,    │   one agent per container  │   tool   │ tool servers │
  │  (UI, CRM,   │  events,      │   engine · sessions ·      │   calls  └──────────────┘
  │ backend job) │  triggers     │   webhooks · /chat UI      │
  └──────────────┘               └───────┬─────────────┬──────┘
                                          │             │
                            Postgres /    │             │   pgvector /
                            in-memory     ▼             ▼   HTTP endpoint
                                ┌────────────────┐ ┌──────────────────┐
                                │  session store │ │  knowledge base  │
                                │   (optional)   │ │    (optional)    │
                                └────────────────┘ └──────────────────┘
```

| Component                  | Owned by                                                                             | Role                                                                                                                                                                                                                                                       |
| -------------------------- | ------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Agent server**           | InteractiveAI (the platform hosts and runs it)                                       | The runtime. One container = one agent. Hosts the engine, the HTTP API, session state, and webhook entry points.                                                                                                                                           |
| **InteractiveAI platform** | InteractiveAI                                                                        | Stores versioned content (system prompt, routines, policies, glossaries, macros), serves the **LLM router**, and receives traces. The agent fetches its content from the platform once at boot.                                                            |
| **LLM router**             | InteractiveAI                                                                        | Single endpoint for all model calls. The agent never talks to a model provider directly — every inference call (chat, evaluation, embeddings) goes through `<platform base_url>/api/v1/` with your router API key.                                         |
| **MCP tool servers**       | You — a service you deploy in the platform, or a remote server you host              | HTTP services implementing the Model Context Protocol. Each server's tools become callable by the agent under a namespace (`crm:search`, `cars:create_booking`). See [Tools](/agents/concepts/tools.md).                                                   |
| **Session store**          | You — a database in the platform, a remote Postgres, or in-memory                    | Conversation history, customers, and context variables. Omit the `database` block for ephemeral in-memory storage; point it at Postgres (in the platform or remote) for persistence. See [Sessions, memory & state](/agents/concepts/memory-and-state.md). |
| **Knowledge base**         | You — a database in the platform, a remote pgvector store, or your own HTTP endpoint | Optional retrieval grounding: a pgvector collection the agent searches directly, or an HTTP search endpoint you own. See [Knowledge base & retrieval](/agents/concepts/knowledge-base.md).                                                                 |
| **Your integration**       | You                                                                                  | The service that connects a channel (web chat, Zendesk, Slack, IVR, a backend job) to the agent via the Python SDK or raw REST. See [Integrating the SDK](/agents/guides/integrating-the-sdk.md).                                                          |

## The engine

Inside the agent server sits the **Interactive Agents engine** — the loop that turns an incoming event into a reply or a typed result. Per turn, it:

1. Matches active **policies** against the conversation.
2. Evaluates which **routine** applies and which step of it comes next.
3. Decides whether to call **tools**, run a **think** step, or speak.
4. Iterates (up to `max_engine_iterations`, default 5) until the turn is complete, then emits typed **events**.

The full walkthrough is in [Conversation lifecycle](/agents/concepts/conversation-lifecycle.md).

The engine makes two distinct kinds of model calls — customer-facing **chat** calls and internal **evaluation** calls — routed to independently configured models. This split matters operationally; see [Models](/agents/concepts/models.md).
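
The bounded iteration in step 4 can be pictured with a minimal sketch. It is not the engine's implementation: `TurnState`, `run_turn`, `demo_step`, and the event labels are hypothetical names chosen for illustration. Only the cap of 5 mirrors the documented default of `max_engine_iterations`.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_ENGINE_ITERATIONS = 5  # mirrors the documented default of max_engine_iterations


@dataclass
class TurnState:
    """Hypothetical per-turn state: emitted events plus a completion flag."""
    events: list[str] = field(default_factory=list)
    done: bool = False


def run_turn(step: Callable[[TurnState, int], None],
             max_iterations: int = MAX_ENGINE_ITERATIONS) -> TurnState:
    """Call `step` until it marks the turn done or the iteration cap is reached."""
    state = TurnState()
    for i in range(max_iterations):
        step(state, i)
        if state.done:
            break
    return state


def demo_step(state: TurnState, i: int) -> None:
    # Hypothetical decision logic: one tool call, then a spoken reply.
    if i == 0:
        state.events.append("tool_call")
    else:
        state.events.append("message")
        state.done = True


result = run_turn(demo_step)
```

The cap matters because a step that never marks the turn complete would otherwise loop forever; with the cap, the turn ends after at most `max_iterations` passes.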

## Configuration model

Everything that defines the agent comes from one **manifest** plus the versioned content it references:

```yaml
name: DriveAway Demo
id: driveaway-demo
version: "1"
agent_config:
  runtime:
    api_key: ${AGENT_API_KEY}
  interactive_platform:
    public_key: ${INTERACTIVEAI_PUBLIC_KEY}
    secret_key: ${INTERACTIVEAI_SECRET_KEY}
  llms:
    default: anthropic/claude-haiku-4.5
    api_key: ${ROUTER_API_KEY}
  context:
    system_prompt:
      id: system-prompt
      version: 1
    language: match_user
    routines:
      - id: car-search
        version: 1
    policies:
      - id: stay-on-topic
        version: 1
  mcps:
    - id: cars
      hostname: http://cars-mcp
      port: 8765
      transport: streamable-http
```

Three rules govern the manifest:

1. **Content is referenced, not inlined.** Routines, policies, glossaries, and the system prompt live in the platform's versioned catalog — you create and publish them in the InteractiveAI platform, where each save produces a new immutable version, and the document's catalog name is the `id` the manifest references. The manifest pins exact versions; updating behaviour means publishing a new content version and bumping the pin.
2. **Secrets are env-refs, never literals.** Every credential field takes a `${VAR_NAME}` reference; the platform supplies the value from your secret bundle at boot. A missing variable fails the boot with the variable's name — there is no fallback path. See [Environment variables](/agents/reference/environment.md).
3. **One manifest, one agent, one container.** There is no multi-tenant mode. **Autoscaling is on by default and managed by the platform**: it adds and removes replicas of your agent as demand changes, so you do not configure or operate scaling yourself.

The complete field-by-field schema is in [Manifest & content schemas](/agents/reference/manifest.md).
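
To make rule 2 concrete, here is a minimal sketch of env-ref resolution over an already-parsed manifest. It assumes a `${VAR_NAME}` pattern restricted to letters, digits, and underscores. `resolve_env_refs` and `MissingSecretError` are hypothetical names, not part of the runtime. The sketch only illustrates the documented behaviour that a missing variable fails with the variable's name and has no fallback.

```python
import re

# Assumed pattern for ${VAR_NAME} references.
ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


class MissingSecretError(RuntimeError):
    """Raised when a referenced variable is absent from the secret bundle."""


def resolve_env_refs(node, secrets: dict[str, str]):
    """Recursively replace ${VAR} references in a parsed manifest."""
    if isinstance(node, dict):
        return {k: resolve_env_refs(v, secrets) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_env_refs(v, secrets) for v in node]
    if isinstance(node, str):
        def substitute(match: re.Match) -> str:
            name = match.group(1)
            if name not in secrets:
                # No fallback: fail and name the variable.
                raise MissingSecretError(f"missing required variable: {name}")
            return secrets[name]
        return ENV_REF.sub(substitute, node)
    return node  # numbers, booleans, None pass through unchanged


manifest = {"agent_config": {"runtime": {"api_key": "${AGENT_API_KEY}"}}}
resolved = resolve_env_refs(manifest, {"AGENT_API_KEY": "s3cret"})
```

Calling `resolve_env_refs(manifest, {})` raises `MissingSecretError` with `AGENT_API_KEY` in the message.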

## Boot sequence

Understanding boot order explains most startup-time behaviour:

1. **Parse & validate the manifest.** Structural errors fail immediately with every violation listed (the platform also runs this validation when you upload the manifest).
2. **Resolve secrets** — every `${VAR}` env-ref is dereferenced against the secret bundle the platform injected. Any missing required variable aborts boot, naming the variable.
3. **Fetch content** — the server pulls every referenced routine, policy, glossary, macro, and prompt from the platform at the pinned versions (up to 5 fetches in parallel). A missing reference aborts boot.
4. **Connect tool servers** — each declared MCP server is contacted and its tools are catalogued.
5. **Apply configuration** — the agent, policies, routines, retrievers, and webhooks are wired into the engine.
6. **Routine evaluation runs.** The engine pre-computes behavioural metadata for every routine and policy (which steps speak vs. call tools, which depend on customer input, each node's reachable follow-ups). These are model calls and can take minutes on a cold cache; results are cached by content hash, so warm boots skip this entirely. See [Startup evaluation](/agents/concepts/startup-evaluation.md) for the stages and their purpose, and its [caching section](/agents/concepts/startup-evaluation.md#caching-cold-vs-warm-boots) for caching and pre-warming.
7. **The agent starts serving.** The HTTP port binds, and `GET /health/ready` begins answering `200`, **only after evaluation settles**. Because the port is not open before then, a cold-cache evaluation delays the moment the agent becomes reachable at all. Pre-warming the evaluation cache avoids that wait, so deploys come up without sitting through a full evaluation.

If startup fails at any stage, the process exits non-zero within 10 seconds so that the platform restarts it instead of leaving a half-configured agent serving traffic.
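
The content-hash caching mentioned in step 6 can be sketched as follows. This is an illustration under stated assumptions: the hash is taken as SHA-256 over canonical JSON, the routine shape with a `steps` list is invented for the demo, and `EvaluationCache` is a hypothetical name. The runtime's actual key derivation and storage are not described on this page.

```python
import hashlib
import json


def content_hash(content: dict) -> str:
    """Stable SHA-256 over canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class EvaluationCache:
    """Hypothetical cache: evaluate content once per distinct hash."""

    def __init__(self, evaluate):
        self._evaluate = evaluate  # stands in for the expensive model calls
        self._results: dict[str, object] = {}
        self.misses = 0

    def get(self, content: dict):
        key = content_hash(content)
        if key not in self._results:
            self.misses += 1
            self._results[key] = self._evaluate(content)
        return self._results[key]


cache = EvaluationCache(lambda c: {"step_count": len(c.get("steps", []))})
routine = {"id": "car-search", "version": 1, "steps": ["ask", "search"]}
cache.get(routine)        # cold: evaluated
cache.get(dict(routine))  # warm: identical content, served from the cache
```

Keying on the content rather than on the id alone means an unchanged routine is never re-evaluated, while any change to the content produces a new key and a fresh evaluation.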

## Network surface

All inbound paths on the agent server:

| Path                                                                              | Auth                          | Purpose                                                                                                                        |
| --------------------------------------------------------------------------------- | ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| `GET /health/live`, `GET /health/ready`, `GET /health`                            | none                          | Probes                                                                                                                         |
| `POST /sessions/{id}/events`                                                      | Bearer                        | Post a customer or system message into a session (this is how a turn is started)                                               |
| `GET /sessions/{id}/events`                                                       | Bearer                        | **Receive the agent's events** — the long-poll / SSE stream of replies, tool calls, and status, resumable from an offset       |
| Sessions & customers API (`POST /sessions`, `GET /sessions/{id}`, `/customers/…`) | Bearer                        | Create/read sessions and customers, set variables and metadata                                                                 |
| `POST /routines/{routine_id}/trigger`                                             | Bearer                        | Fire an autonomous routine                                                                                                     |
| `POST /webhooks/{name}`                                                           | HMAC over raw body            | Third-party webhook entry                                                                                                      |
| `POST /sessions/{session_id}/tool_events`                                         | Bearer                        | Inject external context as a synthetic tool result                                                                             |
| `GET /journeys/{journey_id}/graph`                                                | Bearer                        | Routine graph for dashboards (the `journeys` path segment is legacy wire naming for routines)                                  |
| `GET /chat`                                                                       | Bearer or `agent_auth` cookie | Built-in browser **chat UI** for trying the agent directly (cookie login via `/auth/login`; not meant as a production channel) |

The conversation surface — posting messages and receiving events — is normally driven through the SDK rather than called directly; see [Integrating the SDK](/agents/guides/integrating-the-sdk.md) and the full [HTTP API](/agents/reference/http-api.md) reference.
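
The probe paths in the table are the part of the surface that a smoke test or integration is most likely to call directly. The sketch below waits for `GET /health/ready` to answer `200` and treats a refused connection as "not ready yet", since the port is not bound until startup evaluation settles. `http_ready` and `wait_until_ready` are hypothetical helpers, and the timeout and polling interval are arbitrary choices.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ready(base_url: str) -> bool:
    """True when GET /health/ready answers 200; connection errors count as not ready."""
    try:
        with urllib.request.urlopen(f"{base_url}/health/ready", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe: Callable[[], bool], timeout_s: float,
                     interval_s: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller might use `wait_until_ready(lambda: http_ready(base_url), timeout_s=600)`, where `base_url` is wherever your agent is reachable. A generous timeout allows for a cold-cache boot.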

Bearer auth compares `Authorization: Bearer <token>` in constant time against the manifest's `runtime.api_key`. Outbound, the agent calls: the platform (boot-time content fetch), the LLM router (every turn), your MCP servers (tool calls), your knowledge base, your callback/webhook URLs, and the traces backend. There are **no other outbound calls** — in particular, the agent never pushes conversation replies to your integration unless you opt in to event webhook delivery (see [Integrating the SDK](/agents/guides/integrating-the-sdk.md)).
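
The two inbound auth checks can be sketched with Python's standard `hmac` module. The bearer check follows the constant-time comparison described above. The webhook check is an assumption throughout: this page says only "HMAC over raw body", so the SHA-256 digest, the hex encoding, and the function names are illustrative rather than the documented scheme.

```python
import hashlib
import hmac


def bearer_ok(authorization_header: str, api_key: str) -> bool:
    """Constant-time comparison of the presented bearer token against the key."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme != "Bearer":
        return False
    return hmac.compare_digest(token.encode("utf-8"), api_key.encode("utf-8"))


def webhook_signature_ok(raw_body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Assumed scheme: hex-encoded HMAC-SHA256 over the exact bytes received."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode("ascii"),
                               signature_hex.encode("utf-8"))
```

Signing the raw body, rather than a re-serialised copy, matters because any change in whitespace or key order after parsing would produce a different digest.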

## Where to go next

* The turn-by-turn engine walkthrough: [Conversation lifecycle](/agents/concepts/conversation-lifecycle.md)
* Build something: [Quickstart](/agents/guides/quickstart.md)
* Field-level reference: [Manifest & content schemas](/agents/reference/manifest.md)

