# Load Balancing & Model Fallback

The InteractiveAI Router distributes requests across available providers to optimize for uptime and performance. By default, the Router load balances traffic among the highest-ranked providers for your selected model, automatically routing around failures.

To customize this behavior, include the `provider` object in your request body for [Chat Completions](/llm-router/api-reference/chat.md). This object controls provider selection, fallback logic, data handling, and performance constraints.

### Provider Configuration

The `provider` object accepts the following fields:

#### `order`

**Type:** `string[]` **Default:** —

Specifies the sequence of provider slugs to attempt. The Router tries providers in the order listed until one succeeds.

```json
"order": ["anthropic", "openai", "azure"]
```

***

#### `allow_fallbacks`

**Type:** `boolean` **Default:** `true`

Determines whether the Router should attempt backup providers when the primary is unavailable. Set to `false` to restrict requests to your specified providers only.
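
As a minimal sketch, the fragment below pins a request to two providers and disables fallbacks. The slugs are reused from the `order` example above; how the Router responds when both listed providers are unavailable is not specified on this page.

```json
"order": ["anthropic", "openai"],
"allow_fallbacks": false
```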

***

#### `require_parameters`

**Type:** `boolean` **Default:** `false`

When enabled, the Router routes only to providers that support every parameter in your request. Use this when your request includes parameters such as `response_format` that not every provider supports.
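
For illustration, a request that asks for structured output might look like the sketch below. The `response_format` value uses the common `{"type": "json_object"}` shape, which is an assumption and may differ for your model; the model slug is taken from the example configuration on this page.

```json
{
  "models": ["openai/gpt-4o"],
  "messages": [
    {"role": "user", "content": "Return the key points as JSON."}
  ],
  "response_format": {"type": "json_object"},
  "provider": {
    "require_parameters": true
  }
}
```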

***

#### `data_collection`

**Type:** `"allow"` | `"deny"` **Default:** `"allow"`

Controls whether requests may be routed to providers that store request data or use it for training. Set to `"deny"` to exclude those providers.

***

#### `zdr`

**Type:** `boolean` **Default:** —

Restricts routing exclusively to Zero Data Retention endpoints. When enabled, requests only reach providers with contractual ZDR guarantees.
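
A sketch of a privacy-focused fragment that sets both data-handling fields. Combining them is an assumption made for illustration; this page does not state how `zdr` and `data_collection` interact.

```json
"data_collection": "deny",
"zdr": true
```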

***

#### `enforce_distillable_text`

**Type:** `boolean` **Default:** —

Limits routing to models that permit text distillation. Enable this when you intend to use outputs for model training or fine-tuning.

***

#### `only`

**Type:** `string[]` **Default:** —

Explicitly allowlists providers for this request. Only providers in this array will receive traffic, regardless of other settings.

```json
"only": ["anthropic", "google"]
```

***

#### `ignore`

**Type:** `string[]` **Default:** —

Excludes specific providers from consideration. The Router skips any provider listed here, even if it would otherwise be selected.

```json
"ignore": ["azure"]
```

***

#### `quantizations`

**Type:** `string[]` **Default:** —

Filters providers by supported quantization levels. Use this to target specific model precision variants.

```json
"quantizations": ["int4", "int8"]
```

***

#### `sort`

**Type:** `string` | `object` **Default:** —

Determines how the Router ranks available providers. Accepts a string value (`"price"`, `"throughput"`, or `"latency"`) or an object with `by` and `partition` fields for advanced sorting logic.

```json
"sort": "price"
```
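
The object form is sketched below. Only the field names `by` and `partition` come from the description above; the `partition` value shown is a hypothetical placeholder, so check the API reference for the values it accepts.

```json
"sort": {
  "by": "throughput",
  "partition": "model"
}
```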

***

#### `preferred_min_throughput`

**Type:** `number` | `object` **Default:** —

Sets a minimum throughput threshold in tokens per second. Providers below this threshold are deprioritized. Accepts a flat number or an object with percentile cutoffs (`p50`, `p75`, `p90`, `p99`).
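
As an illustrative sketch, the object form below sets separate thresholds at two of the listed percentiles. The numbers are arbitrary example values in tokens per second.

```json
"preferred_min_throughput": {
  "p50": 100,
  "p90": 50
}
```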

***

#### `preferred_max_latency`

**Type:** `number` | `object` **Default:** —

Sets a maximum acceptable latency in seconds. Providers exceeding this threshold are deprioritized. Accepts a flat number or an object with percentile cutoffs (`p50`, `p75`, `p90`, `p99`).
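
The percentile form can express a tighter bound for typical requests and a looser one for the tail. A minimal sketch, assuming each key bounds the provider's latency at that percentile, with arbitrary values in seconds:

```json
"preferred_max_latency": {
  "p50": 1.5,
  "p99": 5
}
```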

***

#### `max_price`

**Type:** `object` **Default:** —

Sets the maximum price per token you're willing to pay. Providers priced above this cap are excluded from routing.

***

### Example Configuration

```json
{
  "models": [
    "anthropic/claude-sonnet-4-20250514",
    "google/gemini-2.0-flash",
    "openai/gpt-4o"
  ],
  "messages": [
    {"role": "user", "content": "Summarize this document."}
  ],
  "provider": {
    "allow_fallbacks": true,
    "data_collection": "deny",
    "preferred_max_latency": 2.5
  }
}
```

This configuration attempts the models in array order: Claude first, then Gemini, then GPT-4o if needed. It excludes providers that retain data and deprioritizes any provider with latency above 2.5 seconds.
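
A second sketch, using only fields documented above, ranks providers by price while skipping one provider and requiring full parameter support. The slug and model name are reused from earlier examples, and the combination is illustrative, not a recommended setup.

```json
{
  "models": ["openai/gpt-4o"],
  "messages": [
    {"role": "user", "content": "Summarize this document."}
  ],
  "provider": {
    "sort": "price",
    "ignore": ["azure"],
    "require_parameters": true
  }
}
```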

