Annotations

Annotation enables domain experts to manually evaluate AI outputs by adding scores and comments to traces, observations, or sessions. This approach establishes ground truth for your evaluation infrastructure and provides reference points for benchmarking automated evaluators.

Before creating annotations, you need at least one Score Config defined in your project. Score Configs determine which scoring dimensions are available during annotation. See Score Configs for details.
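
If you want to set up Score Configs from code rather than the UI, a config could be expressed as a simple API payload. The sketch below is illustrative only: the endpoint path, field names, environment variables, and auth scheme are assumptions, not documented details of this page.

```python
import os
import requests

# Hypothetical sketch: endpoint path, payload fields, and auth scheme are
# assumptions for illustration, not a documented API.
BASE_URL = os.environ.get("INTERACTIVEAI_HOST", "https://cloud.example.com")
auth = (os.environ["INTERACTIVEAI_PUBLIC_KEY"], os.environ["INTERACTIVEAI_SECRET_KEY"])

score_config = {
    "name": "answer_quality",        # scoring dimension shown to annotators
    "dataType": "NUMERIC",           # e.g. NUMERIC, CATEGORICAL, or BOOLEAN
    "minValue": 0,
    "maxValue": 1,
    "description": "How well the answer addresses the user's question",
}

resp = requests.post(f"{BASE_URL}/api/public/score-configs", json=score_config, auth=auth)
resp.raise_for_status()
print(resp.json())
```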

Why Annotation Matters

Automated evaluation can scale, but human judgment remains essential for establishing what "good" actually means in your specific context. Human annotation serves two purposes:

  • Direct Assessment allows multiple team members to review AI outputs and score them against defined criteria. This builds high-quality labeled datasets and surfaces issues that automated systems might miss.

  • Calibration aligns your LLM-as-a-Judge evaluators with human judgment. By comparing automated scores against expert annotations, you can identify where your evaluators drift from human expectations and adjust accordingly.
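
A common way to calibrate is to export scores and compare the human and automated values for each trace. The sketch below assumes a CSV export from the Scores page with trace_id, name, source, and value columns, and a score named answer_quality; the column and score names are assumptions, not a documented schema.

```python
import pandas as pd

# Assumed CSV export from the Scores page with columns:
# trace_id, name, source (ANNOTATION / EVAL / API), value
scores = pd.read_csv("scores_export.csv")

human = scores[(scores["source"] == "ANNOTATION") & (scores["name"] == "answer_quality")]
judge = scores[(scores["source"] == "EVAL") & (scores["name"] == "answer_quality")]

# Join human and automated scores on the trace they belong to
paired = human.merge(judge, on="trace_id", suffixes=("_human", "_judge"))

# Correlation and mean absolute gap as rough calibration signals
correlation = paired["value_human"].corr(paired["value_judge"])
mean_gap = (paired["value_human"] - paired["value_judge"]).abs().mean()

print(f"traces compared: {len(paired)}")
print(f"correlation:     {correlation:.2f}")
print(f"mean abs gap:    {mean_gap:.2f}")
```

A low correlation or a large gap on a given dimension is a signal to revisit the evaluator's prompt or scoring rubric for that dimension.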


Annotation Queues

Annotation Queues streamline the process of working through batches of items that need review. Instead of hunting through traces one by one, queues let you organize work and track progress across your team.

Creating a Queue

  1. Navigate to Improvement → Annotations

  2. Click + New Queue

  3. Configure the queue:

    • Name: Identifier for this annotation task

    • Description: Optional context about what reviewers should focus on

    • Score Config: Select which scoring dimensions annotators will use

Adding Items to a Queue

You can add traces to a queue individually or in bulk from the Traces view:

  • Single item: Open a trace, click the Annotate dropdown, select the Score Config of your choosing, and score that trace.

  • Bulk selection: Select traces using the checkboxes, click the Actions dropdown, then + Add to Annotation Queue.

Use filters on the Traces view to narrow down to the specific subset you want reviewed before adding to a queue.
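
For large batches, the same workflow could be scripted: fetch the trace IDs that match your filter, then push each one onto the queue. This page documents the UI flow, not an API, so the endpoints, query parameters, and response shape below are assumptions for illustration only.

```python
import os
import requests

BASE_URL = os.environ.get("INTERACTIVEAI_HOST", "https://cloud.example.com")
auth = (os.environ["INTERACTIVEAI_PUBLIC_KEY"], os.environ["INTERACTIVEAI_SECRET_KEY"])
QUEUE_ID = "<annotation-queue-id>"  # hypothetical queue identifier

# Hypothetical: fetch traces matching the same filter you would apply in the Traces view
traces = requests.get(
    f"{BASE_URL}/api/public/traces",
    params={"tags": "needs-review", "limit": 100},
    auth=auth,
).json()["data"]

# Hypothetical: enqueue each matching trace for annotation
for trace in traces:
    requests.post(
        f"{BASE_URL}/api/public/annotation-queues/{QUEUE_ID}/items",
        json={"objectId": trace["id"], "objectType": "TRACE"},
        auth=auth,
    ).raise_for_status()
```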

Processing a Queue

Click Process Queue to enter the annotation interface. Each item displays its content alongside the scoring panel.

  • Focused view shows Input, Output, and Metadata in a simplified layout for rapid review.

  • Detailed view displays the full trace with execution graph, latency, cost, and the Run/Scores tabs. Use this when you need complete context to make a judgment.

For each item:

  1. Review the content in the main panel

  2. Enter scores for each dimension in the Annotate panel on the right

  3. Click Complete + Next to save and move to the next item

The queue tracks completion status and who completed each item, giving you visibility into annotation progress.


Single-Trace Annotation

For ad-hoc review outside of queues, you can annotate any trace directly:

  1. Open a trace, observation, or session detail view

  2. Click Annotate

  3. Select which Score Configs to use

  4. Enter score values

  5. Scores appear in the Scores tab on the detail view

This approach works well for investigating flagged issues, validating specific outputs, or quick spot-checks.
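
Ad-hoc scores do not have to come from the UI: a score can also be attached to a trace from code, in which case it carries the source API rather than ANNOTATION (see the note under Viewing Annotation Results). The endpoint and field names in this sketch are assumptions, not a documented interface.

```python
import os
import requests

BASE_URL = os.environ.get("INTERACTIVEAI_HOST", "https://cloud.example.com")
auth = (os.environ["INTERACTIVEAI_PUBLIC_KEY"], os.environ["INTERACTIVEAI_SECRET_KEY"])

# Hypothetical endpoint and fields: attach a score to an existing trace.
# Scores created this way would show up with source API, not ANNOTATION.
score = {
    "traceId": "<trace-id>",
    "name": "answer_quality",
    "value": 0.5,
    "comment": "Partially correct; misses the refund policy edge case",
}

resp = requests.post(f"{BASE_URL}/api/public/scores", json=score, auth=auth)
resp.raise_for_status()
```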


Viewing Annotation Results

Scores created through annotation appear in multiple places:

  • Trace detail view: Click the Scores tab to see all scores attached to that trace

  • Scores page: View all scores across your project with filtering and export options

  • Dashboards: Aggregate annotation data appears in your quality metrics

Use the Source filter on the Scores page to isolate human annotations (ANNOTATION) from automated evaluations (EVAL) and scores created via the InteractiveAI SDK (API).
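
The same split can be applied when pulling scores programmatically for analysis. As above, the endpoint, query parameters, and response shape are assumptions used for illustration.

```python
import os
import requests

BASE_URL = os.environ.get("INTERACTIVEAI_HOST", "https://cloud.example.com")
auth = (os.environ["INTERACTIVEAI_PUBLIC_KEY"], os.environ["INTERACTIVEAI_SECRET_KEY"])

# Hypothetical: pull only human annotation scores for downstream analysis
resp = requests.get(
    f"{BASE_URL}/api/public/scores",
    params={"source": "ANNOTATION", "limit": 100},
    auth=auth,
)
resp.raise_for_status()
for score in resp.json()["data"]:
    print(score["traceId"], score["name"], score["value"])
```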
