# Load Balancing & Model Fallback

The InteractiveAI Router distributes requests across available providers to optimize for uptime and performance. By default, the Router load balances traffic among the highest-ranked providers for your selected model, automatically routing around failures.

To customize this behavior, include the `provider` object in your request body for [Chat Completions](https://docs.interactive.ai/llm-router/api-reference/chat). This object controls provider selection, fallback logic, data handling, and performance constraints.

### Provider Configuration

The `provider` object accepts the following fields:

#### `order`

**Type:** `string[]` **Default:** —

Specifies the sequence of provider slugs to attempt. The Router tries providers in the order listed until one succeeds.

```json
"order": ["anthropic", "openai", "azure"]
```

***

#### `allow_fallbacks`

**Type:** `boolean` **Default:** `true`

Determines whether the Router should attempt backup providers when the primary is unavailable. Set to `false` to restrict requests to your specified providers only.
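For example, to keep traffic strictly on the providers named in `order` (slugs shown are illustrative) and fail rather than route elsewhere:

```json
"provider": {
  "order": ["anthropic", "openai"],
  "allow_fallbacks": false
}
```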

***

#### `require_parameters`

**Type:** `boolean` **Default:** `false`

When enabled, the Router only routes to providers that support all parameters in your request. Use this when your request includes provider-specific features like `response_format` that not all providers handle.
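A request that depends on structured output might combine this flag with `response_format` (the `response_format` value below is illustrative, not specific to this Router):

```json
"provider": {
  "require_parameters": true
},
"response_format": {"type": "json_object"}
```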

***

#### `data_collection`

**Type:** `"allow"` | `"deny"` **Default:** `"allow"`

Controls whether requests may be routed to providers that store or use data for training purposes. Set to `"deny"` to exclude providers with data retention policies.
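```json
"data_collection": "deny"
```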

***

#### `zdr`

**Type:** `boolean` **Default:** —

Restricts routing exclusively to Zero Data Retention endpoints. When enabled, requests only reach providers with contractual ZDR guarantees.
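```json
"zdr": true
```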

***

#### `enforce_distillable_text`

**Type:** `boolean` **Default:** —

Limits routing to models that permit text distillation. Enable this when you intend to use outputs for model training or fine-tuning.
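```json
"enforce_distillable_text": true
```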

***

#### `only`

**Type:** `string[]` **Default:** —

Explicitly allowlists providers for this request. Only providers in this array will receive traffic, regardless of other settings.

```json
"only": ["anthropic", "google"]
```

***

#### `ignore`

**Type:** `string[]` **Default:** —

Excludes specific providers from consideration. The Router skips any provider listed here, even if it would otherwise be selected.

```json
"ignore": ["azure"]
```

***

#### `quantizations`

**Type:** `string[]` **Default:** —

Filters providers by supported quantization levels. Use this to target specific model precision variants.

```json
"quantizations": ["int4", "int8"]
```

***

#### `sort`

**Type:** `string` | `object` **Default:** —

Determines how the Router ranks available providers. Accepts a string value (`"price"`, `"throughput"`, or `"latency"`) or an object with `by` and `partition` fields for advanced sorting logic.

```json
"sort": "price"
```
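The object form is sketched below; the accepted values for `by` and `partition` are not enumerated here, so treat these as illustrative:

```json
"sort": {
  "by": "throughput",
  "partition": "provider"
}
```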

***

#### `preferred_min_throughput`

**Type:** `number` | `object` **Default:** —

Sets a minimum throughput threshold in tokens per second. Providers below this threshold are deprioritized. Accepts a flat number or an object with percentile cutoffs (`p50`, `p75`, `p90`, `p99`).
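A flat threshold is written as `"preferred_min_throughput": 50`. The percentile form, assuming it requires the given percentile of a provider's measured throughput to meet the threshold (values illustrative, in tokens per second):

```json
"preferred_min_throughput": {"p90": 80}
```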

***

#### `preferred_max_latency`

**Type:** `number` | `object` **Default:** —

Sets a maximum acceptable latency in seconds. Providers exceeding this threshold are deprioritized. Accepts a flat number or an object with percentile cutoffs (`p50`, `p75`, `p90`, `p99`).
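A flat threshold is written as `"preferred_max_latency": 2.5`. The percentile form, assuming it caps the given percentile of a provider's measured latency (values illustrative, in seconds):

```json
"preferred_max_latency": {"p90": 1.5}
```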

***

#### `max_price`

**Type:** `object` **Default:** —

Caps the price per token you're willing to pay. Providers charging above this threshold are excluded from routing.
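The object's fields are not enumerated here; a plausible shape, assuming hypothetical per-token fields named `prompt` and `completion`, would be:

```json
"max_price": {
  "prompt": 0.000003,
  "completion": 0.000015
}
```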

***

### Example Configuration

```json
{
  "models": [
    "anthropic/claude-sonnet-4-20250514",
    "google/gemini-2.0-flash",
    "openai/gpt-4o"
  ],
  "messages": [
    {"role": "user", "content": "Summarize this document."}
  ],
  "provider": {
    "allow_fallbacks": true,
    "data_collection": "deny",
    "preferred_max_latency": 2.5
  }
}
```

This configuration attempts the models in array order: Claude first, then Gemini, then GPT-4o if needed. It excludes providers that store or train on request data and deprioritizes any provider with latency above 2.5 seconds.
