# Monitoring

Once your services are deployed through the InteractiveAI Deployment CLI, the Monitoring page provides real-time infrastructure visibility into every running workload. While the Reporting dashboards focus on LLM-level metrics like traces, latency, and token costs, Deployment Monitoring operates at the infrastructure layer: how much traffic your containers are handling, whether requests are succeeding, how fast responses are being served, and whether your pods have enough CPU and memory to operate reliably.

This distinction matters. A service can return correct LLM outputs while silently running out of memory, or it can show healthy pod metrics while returning degraded responses due to a bad prompt version. Deployment Monitoring and Reporting Dashboards are complementary views: one watches the infrastructure, the other watches the AI behavior. Together, they give you complete operational coverage.

<div data-with-frame="true"><figure><img src="https://708770081-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F1ICwJbq7EJdn5kBgXnQu%2Fuploads%2FIqYx43LDnlP0DJZqBgsf%2Fimage.png?alt=media&#x26;token=d0b57136-8036-4d3c-9343-70f9b4521508" alt=""><figcaption></figcaption></figure></div>

***

### Dashboard Controls

The top of the page provides three controls for customizing your view:

**Workload filter**: The "All workloads" dropdown lets you select which deployed services to display. Use this to isolate a specific workload when debugging or to compare multiple services side by side.

**Time range**: Choose the observation window: 5m, 15m, 1h, 6h (default), 24h, or 7d. Shorter windows are useful for debugging active incidents, while longer windows help identify trends and patterns.

**Auto-refresh**: Toggle automatic dashboard refresh on or off. When enabled, the charts update periodically without requiring manual page reloads.

***

### Traffic Metrics

#### Total Requests per Domain

Displays the request volume over time for each deployed domain. Each workload's domain appears as a separate series in the chart legend. Hover over any data point to see the exact request count at that timestamp.

Use this chart to understand traffic patterns, identify unexpected spikes, and verify that your service is receiving requests after deployment.

<div data-with-frame="true"><figure><img src="https://708770081-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F1ICwJbq7EJdn5kBgXnQu%2Fuploads%2FQ7XSA5d7ZOOp4GvXftvJ%2Fimage.png?alt=media&#x26;token=cacd2500-fa2d-44bc-b873-a1a8b87e35c2" alt=""><figcaption></figcaption></figure></div>

#### Total Requests per Status Code

Breaks down request volume by HTTP status code category: 2xx (success, green), 4xx (client errors, yellow), and 5xx (server errors, red). Hover over any bar to see the exact count for each category at that timestamp.

A healthy service shows predominantly green bars. A sudden increase in red (5xx) bars indicates server-side failures that need investigation. Yellow (4xx) bars may indicate client-side issues like malformed requests or authentication problems.
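The categories follow the standard HTTP status-class boundaries. As an illustrative sketch (the function names are ours, not part of the product), bucketing a batch of response codes into the dashboard's three categories looks like this:

```python
from collections import Counter

def status_bucket(code: int) -> str:
    """Map an HTTP status code to its dashboard category."""
    if 200 <= code < 300:
        return "2xx"  # success (green)
    if 400 <= code < 500:
        return "4xx"  # client error (yellow)
    if 500 <= code < 600:
        return "5xx"  # server error (red)
    return "other"

# Aggregate a sample batch of response codes into per-category counts
codes = [200, 201, 404, 200, 503, 200, 401]
counts = Counter(status_bucket(c) for c in codes)
# counts -> Counter({"2xx": 4, "4xx": 2, "5xx": 1})
```

A spike in the `5xx` count for one timestamp bucket is exactly what shows up as a taller red bar in the chart.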

<div data-with-frame="true"><figure><img src="https://708770081-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F1ICwJbq7EJdn5kBgXnQu%2Fuploads%2FI5DwVa05w38PvOzREF1M%2Fimage.png?alt=media&#x26;token=ce02881b-a72f-4dc8-8b17-19405267555d" alt=""><figcaption></figcaption></figure></div>

### Performance Metrics

#### Response Time

Tracks response latency over time using three percentile lines: p50 (median, blue), p95 (teal), and p99 (orange). Hover over the chart to see exact values at any point in time.

The gap between p50 and p99 reveals tail latency. If p50 is 9ms but p99 is 2.5s, most requests are fast but a small percentage experience significant delays. This pattern often points to cold starts, resource contention, or specific request types that trigger slower code paths.

<div data-with-frame="true"><figure><img src="https://708770081-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F1ICwJbq7EJdn5kBgXnQu%2Fuploads%2FxEuotb2oHkKdqvbS0oWu%2Fimage.png?alt=media&#x26;token=ea7e8d6c-0d96-43f9-9918-dd8847a5518b" alt=""><figcaption></figcaption></figure></div>

### Resource Metrics

#### CPU Usage

Displays CPU consumption in cores over time for each workload. Hover over the chart to see the exact core usage at any point.

#### CPU Resources

A summary table showing current CPU allocation and utilization per workload:

| Column      | Description                                               |
| ----------- | --------------------------------------------------------- |
| Name        | Workload name                                             |
| Replicas    | Number of running replicas                                |
| Allocated   | Total CPU cores allocated (per replica)                   |
| Usage       | Current CPU consumption in cores                          |
| Usage %     | Current usage as a percentage of allocated CPU            |
| Max Usage % | Peak usage percentage observed in the selected time range |

<div data-with-frame="true"><figure><img src="https://708770081-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F1ICwJbq7EJdn5kBgXnQu%2Fuploads%2FOlNjQ6kotTIXDeiGNSK9%2Fimage.png?alt=media&#x26;token=ac4198b7-93cf-4c25-868a-e41d8413fbe9" alt=""><figcaption></figcaption></figure></div>

If Usage % consistently approaches 100%, your workload needs more CPU allocation or additional replicas. If it stays very low, you may be over-provisioned.
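As a sketch of how the table's Usage % relates to the sizing guidance above (assuming Usage % is measured against the total allocation across replicas, and with illustrative thresholds of our choosing):

```python
def usage_percent(usage_cores: float, allocated_per_replica: float, replicas: int) -> float:
    """Current usage as a percentage of total allocated CPU across all replicas."""
    total_allocated = allocated_per_replica * replicas
    return 100.0 * usage_cores / total_allocated

def provisioning_hint(usage_pct: float, high: float = 90.0, low: float = 10.0) -> str:
    """Rough sizing hint based on sustained usage percentage."""
    if usage_pct >= high:
        return "under-provisioned: add CPU or replicas"
    if usage_pct <= low:
        return "possibly over-provisioned"
    return "ok"

# Two replicas with 1 core each, currently consuming 1.5 cores total
pct = usage_percent(1.5, allocated_per_replica=1.0, replicas=2)  # 75.0
hint = provisioning_hint(pct)  # "ok"
```

In practice the decision should be based on Max Usage % over a representative window, not a single snapshot, since short bursts can exceed the steady-state average.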

#### Memory Usage

Displays memory consumption over time in MB for each workload. Hover over the chart to see the exact memory usage at any point.

#### Memory Resources

A summary table showing current memory allocation and utilization per workload:

| Column      | Description                                               |
| ----------- | --------------------------------------------------------- |
| Name        | Workload name                                             |
| Replicas    | Number of running replicas                                |
| Allocated   | Total memory allocated (per replica)                      |
| Usage       | Current memory consumption                                |
| Usage %     | Current usage as a percentage of allocated memory         |
| Max Usage % | Peak usage percentage observed in the selected time range |

<div data-with-frame="true"><figure><img src="https://708770081-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F1ICwJbq7EJdn5kBgXnQu%2Fuploads%2FLJkKcEVENiRSrhJEC87x%2Fimage.png?alt=media&#x26;token=7707d3e8-d88f-47e2-875f-bd580b428867" alt=""><figcaption></figcaption></figure></div>

Memory that grows steadily over time without releasing may indicate a memory leak. If Usage % approaches 100%, the container risks being killed by the orchestrator (OOMKilled), which causes service interruptions.
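Both warning signs can be checked with simple heuristics. A minimal sketch (the function names and thresholds are illustrative, not part of the product):

```python
def looks_like_leak(samples_mb: list[float], min_points: int = 5) -> bool:
    """Flag memory that increases at every sample in the window, a common leak signature."""
    if len(samples_mb) < min_points:
        return False  # too few points to judge a trend
    return all(later > earlier for earlier, later in zip(samples_mb, samples_mb[1:]))

def oom_risk(usage_mb: float, allocated_mb: float, threshold_pct: float = 90.0) -> bool:
    """True when usage is near the limit and the container risks an OOM kill."""
    return 100.0 * usage_mb / allocated_mb >= threshold_pct

# Steadily climbing memory over five samples -> likely leak
assert looks_like_leak([100, 120, 140, 160, 180])

# Memory that rises and falls with load -> normal behavior
assert not looks_like_leak([100, 120, 110, 160, 150])
```

A real monitor would smooth over garbage-collection sawtooth patterns before applying such a check; strictly monotonic growth is the simplest version of the idea.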
