Annotations
Annotation enables domain experts to manually evaluate AI outputs by adding scores and comments to traces, observations, or sessions. This approach establishes ground truth for your evaluation infrastructure and provides reference points for benchmarking automated evaluators.
Before creating annotations, you need at least one Score Config defined in your project. Score Configs determine which scoring dimensions are available during annotation. See Score Configs for details.
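As a rough mental model, a Score Config pairs a name with a data type and its allowed range or categories. The dictionary below is only an illustrative sketch; the field names are assumptions, not the product's actual schema:

```python
# Hypothetical shape of a Score Config. Field names (name, dataType,
# minValue, maxValue) are illustrative assumptions, not the real schema.
quality_config = {
    "name": "answer_quality",
    "dataType": "NUMERIC",   # e.g. NUMERIC, CATEGORICAL, or BOOLEAN
    "minValue": 0,
    "maxValue": 10,
    "description": "Overall answer quality as judged by a reviewer",
}
```

A categorical config would instead list its allowed labels (for example, "correct" / "partially correct" / "incorrect").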
Why Annotation Matters
Automated evaluation can scale, but human judgment remains essential for establishing what "good" actually means in your specific context. Human annotation serves two purposes:
Direct Assessment allows multiple team members to review AI outputs and score them against defined criteria. This builds high-quality labeled datasets and surfaces issues that automated systems might miss.
Calibration aligns your LLM-as-a-Judge evaluators with human judgment. By comparing automated scores against expert annotations, you can identify where your evaluators drift from human expectations and adjust accordingly.
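Calibration in this sense can be as simple as comparing the two score sets per trace and flagging large disagreements. The snippet below is a minimal sketch with made-up data and an arbitrary drift threshold:

```python
# Sketch: compare LLM-as-a-Judge scores against human annotations
# for the same traces. Trace IDs, scores, and the threshold of 2
# are illustrative, not real data.
judge = {"t1": 8, "t2": 3, "t3": 9}   # automated evaluator scores
human = {"t1": 7, "t2": 6, "t3": 9}   # expert annotation scores

# Absolute disagreement per trace.
drift = {tid: abs(judge[tid] - human[tid]) for tid in judge}

# Traces where the evaluator drifts far from human judgment.
needs_review = [tid for tid, d in drift.items() if d >= 2]
print(needs_review)  # → ['t2']
```

Traces surfaced this way are good candidates for refining the evaluator's prompt or rubric.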
Annotation Queues
Annotation Queues streamline the process of working through batches of items that need review. Instead of hunting through traces one by one, queues let you organize work and track progress across your team.
Creating a Queue
Navigate to Improvement → Annotations
Click + New Queue
Configure the queue:
Name: Identifier for this annotation task
Description: Optional context about what reviewers should focus on
Score Config: Select which scoring dimensions annotators will use

Adding Items to a Queue
You can add traces to a queue individually or in bulk from the Traces view:
Single item
Open a trace, click the Annotate dropdown, select a Score Config, and score the trace
Bulk selection
Select traces using the checkboxes, click the Actions dropdown, then + Add to Annotation Queue
Use filters on the Traces view to narrow down to the specific subset you want reviewed before adding to a queue.
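The filter-then-enqueue pattern amounts to selecting a subset of traces and adding only those IDs to the queue. A minimal sketch with made-up trace records:

```python
# Illustrative only: narrow traces to a subset worth reviewing,
# then collect their IDs for an annotation queue. The "flagged"
# tag and trace records are made-up example data.
traces = [
    {"id": "t1", "tags": ["flagged"]},
    {"id": "t2", "tags": []},
    {"id": "t3", "tags": ["flagged"]},
]

queue_items = [t["id"] for t in traces if "flagged" in t["tags"]]
print(queue_items)  # → ['t1', 't3']
```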
Processing a Queue
Click Process Queue to enter the annotation interface. Each item displays its content alongside the scoring panel.

Focused view shows Input, Output, and Metadata in a simplified layout for rapid review.
Detailed view displays the full trace with execution graph, latency, cost, and the Run/Scores tabs. Use this when you need complete context to make a judgment.
For each item:
Review the content in the main panel
Enter scores for each dimension in the Annotate panel on the right
Click Complete + Next to save and move to the next item
The queue tracks completion status and who completed each item, giving you visibility into annotation progress.
Single-Trace Annotation
For ad-hoc review outside of queues, you can annotate any trace directly:
Open a trace, observation, or session detail view
Click Annotate
Select which Score Configs to use
Enter score values
Scores appear in the Scores tab on the detail view

This approach works well for investigating flagged issues, validating specific outputs, or quick spot-checks.
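Scores attached this way conceptually carry a trace reference, a config reference, a value, and an optional comment. The payload below is a hypothetical sketch only; field names such as `traceId` and `configId` are assumptions, not the SDK's real schema:

```python
# Hypothetical score payload for annotating a single trace.
# Every field name here is an assumption for illustration.
score = {
    "traceId": "trace-123",
    "configId": "answer_quality",
    "value": 8,
    "comment": "Accurate answer, slightly verbose",
    "source": "API",  # scores created via the SDK surface as source API
}
```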
Viewing Annotation Results
Scores created through annotation appear in multiple places:
Trace detail view: Click the Scores tab to see all scores attached to that trace
Scores page: View all scores across your project with filtering and export options
Dashboards: Aggregate annotation data appears in your quality metrics
Use the Source filter on the Scores page to isolate human annotations (ANNOTATION) from automated evaluations (EVAL) or scores created via the InteractiveAI SDK (API).
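The same separation is easy to reproduce in your own analysis: partition exported scores by their source field. The records below are made-up example data:

```python
# Sketch: isolate human annotations from other score sources.
# Records are illustrative; only the three source labels
# (ANNOTATION, EVAL, API) come from the documentation above.
scores = [
    {"value": 8, "source": "ANNOTATION"},
    {"value": 7, "source": "EVAL"},
    {"value": 9, "source": "API"},
]

human = [s for s in scores if s["source"] == "ANNOTATION"]
print(len(human))  # → 1
```

This is useful, for example, when computing a human-only quality baseline to benchmark automated evaluators against.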