> For the complete documentation index, see [llms.txt](https://docs.interactive.ai/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.interactive.ai/deploy/monitor.md).

# Monitor

Once your services are deployed, the Monitoring page provides **real-time infrastructure visibility** into every running workload. While the [Reporting](/report/reporting.md) dashboards focus on LLM-level metrics like traces, latency, and token costs, Monitor operates at the infrastructure layer: how much traffic your containers are handling, whether requests are succeeding, how fast responses are being served, and whether your pods have enough CPU and memory to operate reliably.

This distinction matters. A service can return correct LLM outputs while silently running out of memory, or it can show healthy pod metrics while returning degraded responses due to a bad prompt version. Monitor and Reporting are complementary views: one watches the infrastructure, the other watches the AI behavior. Together, they give you complete operational coverage.

***

### Dashboard Controls

The top of the page provides four controls for customizing your view:

**Agent selector**: A dropdown in the breadcrumb bar that filters the monitoring view by a specific agent. It lists all agents in the project with a search field, and includes an "All agents" option to show everything and a "+ New agent" shortcut. Use this to isolate a single agent when debugging or to compare traffic patterns across agents.

**Workload filter**: The "All workloads" dropdown lets you select which deployed services to display. Use this to isolate a specific workload when debugging or to compare multiple services side by side.

**Time range**: Choose the observation window: 5m, 15m, 1h, 6h (default), 24h, or 7d. Shorter windows are useful for debugging active incidents, while longer windows help identify trends and patterns.

**Auto-refresh**: Toggle automatic dashboard refresh on or off. When enabled, the charts update periodically without requiring manual page reloads.

***

### Traffic Metrics

#### Total Requests per Domain

Displays the request volume over time for each deployed domain. Each workload's domain appears as a separate series in the chart legend. Hover over any data point to see the exact request count at that timestamp.

Use this chart to understand traffic patterns, identify unexpected spikes, and verify that your service is receiving requests after deployment.

#### Total Requests per Status Code

Breaks down request volume by HTTP status code category: 2xx (success, green), 4xx (client errors, yellow), and 5xx (server errors, red). Hover over any bar to see the exact count for each category at that timestamp.

A healthy service shows predominantly green bars. A sudden increase in red (5xx) bars indicates server-side failures that need investigation. Yellow (4xx) bars may indicate client-side issues like malformed requests or authentication problems.
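To make the status code breakdown concrete, the following minimal sketch groups raw HTTP status codes into the same categories the chart uses and computes each category's share of traffic. The function names and the sample data are hypothetical and only illustrate the arithmetic; they are not part of the Monitor product or any API it exposes.

```python
from collections import Counter


def status_category(code: int) -> str:
    """Map an HTTP status code to its category label, e.g. 503 -> '5xx'."""
    return f"{code // 100}xx"


def category_shares(status_codes: list[int]) -> dict[str, float]:
    """Return the fraction of requests that fall into each status category."""
    counts = Counter(status_category(code) for code in status_codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: count / total for category, count in counts.items()}


# Hypothetical sample: 8 successes, 1 client error, 1 server error.
sample = [200] * 8 + [404] + [503]
shares = category_shares(sample)
print(shares)
```

In this sample the 5xx share is 10%. Whether that is acceptable depends on your own service targets; the page above only states that a sudden increase in 5xx responses needs investigation.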

### Performance Metrics

#### Response Time

Tracks response latency over time using three percentile lines: p50 (median, blue), p95 (teal), and p99 (orange). Hover over the chart to see exact values at any point in time.

The gap between p50 and p99 reveals tail latency. If p50 is 9ms but p99 is 2.5s, most requests are fast but a small percentage experience significant delays. This pattern often points to cold starts, resource contention, or specific request types that trigger slower code paths.
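The p50/p99 gap described above can be reproduced with a small sketch. It uses the nearest-rank percentile definition, which is an assumption: the page does not say how Monitor computes its percentile lines, and other definitions interpolate between samples. The latency values are made up to mirror the 9ms versus 2.5s example.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [9.0] * 98 + [2500.0] * 2

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(f"p50={p50}ms p95={p95}ms p99={p99}ms")
```

Here p50 and p95 are both 9ms while p99 is 2500ms: only 2% of requests are slow, so the median and p95 hide them and only the p99 line exposes the tail.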

### Resource Metrics

#### CPU Usage

Displays CPU consumption in cores over time for each workload. Hover over the chart to see the exact core usage at any point.

#### CPU Resources

A summary table showing current CPU allocation and utilization per workload:

| Column      | Description                                               |
| ----------- | --------------------------------------------------------- |
| Name        | Workload name                                             |
| Replicas    | Number of running replicas                                |
| Allocated   | Total CPU cores allocated (per replica)                   |
| Usage       | Current CPU consumption in cores                          |
| Usage %     | Current usage as a percentage of allocated CPU            |
| Max Usage % | Peak usage percentage observed in the selected time range |

If Usage % consistently approaches 100%, your workload needs more CPU allocation or additional replicas. If it stays very low, you may be over-provisioned.

#### Memory Usage

Displays memory consumption over time in MB for each workload. Hover over the chart to see the exact memory usage at any point.

#### Memory Resources

A summary table showing current memory allocation and utilization per workload:

| Column      | Description                                               |
| ----------- | --------------------------------------------------------- |
| Name        | Workload name                                             |
| Replicas    | Number of running replicas                                |
| Allocated   | Total memory allocated (per replica)                      |
| Usage       | Current memory consumption                                |
| Usage %     | Current usage as a percentage of allocated memory         |
| Max Usage % | Peak usage percentage observed in the selected time range |
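The Usage % and Max Usage % columns in the resource tables follow from simple arithmetic on usage and allocation. The sketch below shows one plausible way to derive them from a series of usage samples; the function names and sample values are hypothetical, and it assumes usage and allocation are expressed in the same unit for a single replica. How Monitor aggregates across replicas is not specified on this page.

```python
def usage_percent(usage: float, allocated: float) -> float:
    """Usage as a percentage of the allocated amount (same unit for both)."""
    if allocated <= 0:
        raise ValueError("allocated must be positive")
    return usage / allocated * 100


def summarize(samples: list[float], allocated: float) -> tuple[float, float]:
    """Return (current usage %, max usage %) for samples ordered oldest to newest."""
    if not samples:
        raise ValueError("samples must not be empty")
    percents = [usage_percent(sample, allocated) for sample in samples]
    return percents[-1], max(percents)


# Hypothetical memory samples in MB over the selected time range, 512 MB allocated.
memory_mb = [256.0, 384.0, 448.0, 320.0]
current_pct, max_pct = summarize(memory_mb, allocated=512.0)
print(f"Usage %: {current_pct:.1f}, Max Usage %: {max_pct:.1f}")
```

In this sample the current value is 62.5% while the peak in the window was 87.5%. Comparing the two is useful because a workload can look comfortable right now while having come close to its allocation earlier in the selected time range.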

Memory that grows steadily over time without releasing may indicate a memory leak. If Usage % approaches 100%, the container risks being killed by the orchestrator (OOMKilled), which causes service interruptions.