# Streaming

The InteractiveAI Router supports streaming responses from any model. Streaming is essential for chat interfaces and applications where the UI needs to update progressively as the model generates output.

To enable streaming, set the `stream` parameter to `true` in your request. The Router then delivers the response as a stream of Server-Sent Events (SSE), sending content in chunks as the model generates it rather than waiting for the full completion.

### Basic Streaming Example

```python
import requests
import json

question = "Extract the action items from this meeting transcript."

url = "https://app.interactive.ai/api/v1/chat/completions"
headers = {
  "Authorization": f"Bearer <LLMROUTER_API_KEY>",
  "Content-Type": "application/json"
}

payload = {
  "model": "anthropic/claude-3-sonnet",
  "messages": [{"role": "user", "content": question}],
  "stream": True
}

buffer = ""
with requests.post(url, headers=headers, json=payload, stream=True) as r:
  for chunk in r.iter_content(chunk_size=1024, decode_unicode=True):
    buffer += chunk
    while True:
      try:
        # Find the next complete SSE line
        line_end = buffer.find('\n')
        if line_end == -1:
          break

        line = buffer[:line_end].strip()
        buffer = buffer[line_end + 1:]

        if line.startswith('data: '):
          data = line[6:]
          if data == '[DONE]':
            break

          try:
            data_obj = json.loads(data)
            content = data_obj["choices"][0]["delta"].get("content")
            if content:
              print(content, end="", flush=True)
          except json.JSONDecodeError:
            pass
      except Exception:
        break

```

#### Additional Information

For SSE (Server-Sent Events) streams, the Router periodically sends comments to prevent connection timeouts. These comments appear as:

```
: LLMROUTER PROCESSING
```

These comment payloads can be safely ignored per the [SSE specification](https://html.spec.whatwg.org/multipage/server-sent-events.html#event-stream-interpretation). However, you can use them to improve UX, such as displaying a dynamic loading indicator.
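
For instance, a line-parsing helper can treat these comment lines as a cue to refresh a loading indicator. The sketch below assumes a hypothetical `update_loading_indicator` UI hook; the data handling otherwise follows the chunk format from the basic example above:

```python
import json

def update_loading_indicator() -> None:
    # Hypothetical UI hook: refresh a spinner or progress message.
    print(".", end="", flush=True)

def handle_sse_line(line: str) -> None:
    # Keep-alive comments (e.g. ": LLMROUTER PROCESSING") start with a colon
    # and carry no data; use them to keep the UI's loading state fresh.
    if line.startswith(':'):
        update_loading_indicator()
        return
    if line.startswith('data: '):
        data = line[6:]
        if data == '[DONE]':
            return
        try:
            chunk = json.loads(data)
            content = chunk["choices"][0]["delta"].get("content")
            if content:
                print(content, end="", flush=True)
        except json.JSONDecodeError:
            pass
```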

Some SSE client implementations do not strip comment lines per the spec, which can cause uncaught errors when these non-JSON payloads are passed to `JSON.parse`. The following clients handle comments correctly:

* [eventsource-parser](https://github.com/rexxars/eventsource-parser)
* [OpenAI SDK](https://www.npmjs.com/package/openai)
* [Vercel AI SDK](https://www.npmjs.com/package/ai)

### Stream Cancellation

Cancel streaming requests by aborting the connection. When using a supported provider, this stops model processing immediately and prevents further billing.

#### Provider Support

<table><thead><tr><th valign="middle">Supported</th><th>Not Currently Supported</th></tr></thead><tbody><tr><td valign="middle">OpenAI, Azure, Anthropic</td><td>AWS Bedrock, Groq, Modal</td></tr><tr><td valign="middle">Fireworks, Mancer, Recursal</td><td>Google, Google AI Studio, Minimax</td></tr><tr><td valign="middle">AnyScale, Lepton, OctoAI</td><td>HuggingFace, Replicate, Perplexity</td></tr><tr><td valign="middle">Novita, DeepInfra, Together</td><td>Mistral, AI21, Featherless</td></tr><tr><td valign="middle">Cohere, Hyperbolic, Infermatic</td><td>Lynn, Lambda, Reflection</td></tr><tr><td valign="middle">Avian, XAI, Cloudflare</td><td>SambaNova, Inflection, ZeroOneAI</td></tr><tr><td valign="middle">SFCompute, Nineteen, Liquid</td><td>AionLabs, Alibaba, Nebius</td></tr><tr><td valign="middle">Friendli, Chutes, DeepSeek</td><td>Kluster, Targon, InferenceNet</td></tr></tbody></table>

{% hint style="info" %}
Cancellation only works for streaming requests with supported providers. For non-streaming requests or unsupported providers, the model continues processing and you will be billed for the complete response.
{% endhint %}

#### Cancellation Examples

```python
import requests
from threading import Event, Thread

def stream_with_cancellation(prompt: str, cancel_event: Event):
    with requests.Session() as session:
        response = session.post(
            "https://app.interactive.ai/api/v1/chat/completions",
            headers={"Authorization": f"Bearer <LLMROUTER_API_KEY>"},
            json={"model": "{{MODEL}}", "messages": [{"role": "user", "content": prompt}], "stream": True},
            stream=True
        )

        try:
            for line in response.iter_lines():
                if cancel_event.is_set():
                    response.close()
                    return
                if line:
                    print(line.decode(), end="", flush=True)
        finally:
            response.close()

# Example usage:
cancel_event = Event()
stream_thread = Thread(target=lambda: stream_with_cancellation("Generate a detailed compliance checklist for SOC 2 audits.", cancel_event))
stream_thread.start()

# To cancel the stream:
cancel_event.set()
```

#### Handling Errors During Streaming

The Router's error handling differs based on when the error occurs during streaming.

**Errors Before Any Tokens Are Sent**

If an error happens before any tokens reach the client, the Router returns a standard JSON error with the corresponding HTTP status code:

```json
{
  "error": {
    "code": 400,
    "message": "Invalid model specified"
  }
}
```

Common HTTP status codes include:

* **400**: Bad Request (invalid parameters)
* **401**: Unauthorized (invalid API key)
* **402**: Payment Required (insufficient credits)
* **429**: Too Many Requests (rate limited)
* **502**: Bad Gateway (provider error)
* **503**: Service Unavailable (no available providers)
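
As an illustration, a client might branch on these codes before reading the stream. The sketch below assumes a retry policy (retrying 429, 502, and 503 with exponential backoff) that is a client-side choice, not Router behavior:

```python
import time
import requests

RETRYABLE = {429, 502, 503}  # rate limited, provider error, no available providers

def post_with_retry(url, headers, payload, attempts=3):
    # Sketch only: retry transient errors with backoff, surface the Router's
    # JSON error body for everything else.
    for attempt in range(attempts):
        response = requests.post(url, headers=headers, json=payload, stream=True)
        if response.status_code == 200:
            return response
        error = response.json().get("error", {})
        if response.status_code in RETRYABLE and attempt < attempts - 1:
            time.sleep(2 ** attempt)  # simple exponential backoff
            continue
        raise RuntimeError(f"HTTP {response.status_code}: {error.get('message')}")
```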

**Errors After Tokens Have Been Sent (Mid-Stream)**

When an error occurs after streaming has started, the HTTP status code is already locked at 200 OK. The Router then delivers the error as a Server-Sent Event with a unified structure:

```
data: {"id":"cmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"claude-3-sonnet","provider":"anthropic","error":{"code":"server_error","message":"Provider disconnected unexpectedly"},"choices":[{"index":0,"delta":{"content":""},"finish_reason":"error"}]}
```

Key characteristics of mid-stream errors:

* The error appears at the top level alongside standard response fields (`id`, `object`, `created`, etc.)
* A `choices` array is included with `finish_reason: "error"` to properly terminate the stream
* The HTTP status remains `200 OK` since headers were already sent
* The stream terminates after this unified error event

**Error Handling Examples**

```python
import requests
import json

def stream_with_error_handling(prompt):
    response = requests.post(
        'https://app.interactive.ai/api/v1/chat/completions',
        headers={'Authorization': 'Bearer <LLMROUTER_API_KEY>'},
        json={
            'model': '{{MODEL}}',
            'messages': [{'role': 'user', 'content': prompt}],
            'stream': True
        },
        stream=True
    )

    # Check initial HTTP status for pre-stream errors
    if response.status_code != 200:
        error_data = response.json()
        print(f"Error: {error_data['error']['message']}")
        return

    # Process stream and handle mid-stream errors
    for line in response.iter_lines():
        if line:
            line_text = line.decode('utf-8')
            if line_text.startswith('data: '):
                data = line_text[6:]
                if data == '[DONE]':
                    break

                try:
                    parsed = json.loads(data)

                    # Check for mid-stream error
                    if 'error' in parsed:
                        print(f"Stream error: {parsed['error']['message']}")
                        # Check finish_reason if needed
                        if parsed.get('choices', [{}])[0].get('finish_reason') == 'error':
                            print("Stream terminated due to error")
                        break

                    # Process normal content
                    content = parsed['choices'][0]['delta'].get('content')
                    if content:
                        print(content, end='', flush=True)

                except json.JSONDecodeError:
                    pass
```

**API-Specific Behavior**

Streaming error behavior varies slightly across API endpoints:

* **Chat Completions API**: Returns `ErrorResponse` directly if no chunks were processed, or includes error information in the response if some chunks were processed.
* **Responses API**: May transform certain error codes (like `context_length_exceeded`) into a successful response with `finish_reason: "length"` instead of treating them as errors.
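
A minimal sketch of how a client might classify the end of a stream under these rules, assuming a Chat Completions-style chunk as shown earlier; the exact Responses API payload may differ:

```python
def classify_stream_end(last_chunk: dict) -> str:
    # Sketch only: interpret the final streamed chunk using the fields
    # documented above.
    finish_reason = last_chunk.get("choices", [{}])[0].get("finish_reason")
    if "error" in last_chunk or finish_reason == "error":
        return "mid-stream error"
    if finish_reason == "length":
        return "truncated (e.g. context_length_exceeded surfaced as length)"
    if finish_reason == "stop":
        return "completed normally"
    return "unknown"
```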
