Datasets

Datasets are collections of test cases used to evaluate your LLM application systematically. Each dataset contains items, each with an input and, ideally, an expected output and metadata, that you run your AI system against to measure quality, detect regressions, and compare configurations.

Why Datasets Matter

LLM applications need structured evaluation beyond ad-hoc testing. Datasets let you:

  • Test your system against consistent, repeatable inputs

  • Compare performance across prompt versions, models, or configurations

  • Detect regressions before they reach production

  • Build a library of edge cases and known failure modes

  • Establish baselines for quality measurement


Creating Datasets

You can create datasets through the UI or programmatically via the InteractiveAI SDK. To create one through the UI:

  • Navigate to Improvement > Datasets in the sidebar

  • Click + New Dataset

  • Enter a name, optional description, and optional metadata

  • Click Create Dataset
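When creating datasets programmatically, the same three fields apply: a required name, an optional description, and optional metadata. The following is an illustrative sketch of assembling that record in plain Python; the function and field names are assumptions for illustration, not the actual InteractiveAI SDK API.

```python
from typing import Any, Optional


def build_dataset_payload(
    name: str,
    description: Optional[str] = None,
    metadata: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a dataset record: required name, optional
    description and metadata (hypothetical field names)."""
    if not name.strip():
        raise ValueError("A dataset name is required")
    payload: dict[str, Any] = {"name": name}
    if description is not None:
        payload["description"] = description
    if metadata is not None:
        payload["metadata"] = metadata
    return payload


payload = build_dataset_payload(
    "checkout-regressions",
    description="Known failure modes in the checkout flow",
    metadata={"owner": "qa-team"},
)
print(payload["name"])  # prints "checkout-regressions"
```

Keeping description and metadata optional mirrors the UI form, where only the name is required.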

Adding Items to a Dataset

Each item represents a single test case with an input, expected output, and metadata. You can populate datasets through the UI or SDK.
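The three-field shape of an item can be sketched as a small data class; the class and field names below are assumptions for illustration, not the SDK's actual types.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DatasetItem:
    """One test case: what goes in, what should come out,
    and free-form metadata (e.g. source, tags)."""
    input: dict[str, Any]
    expected_output: Optional[dict[str, Any]] = None  # optional but recommended
    metadata: dict[str, Any] = field(default_factory=dict)


item = DatasetItem(
    input={"question": "What is the refund window?"},
    expected_output={"answer": "30 days"},
    metadata={"source": "manual", "tags": ["billing"]},
)
```

An expected output is optional here because some evaluations (e.g. LLM-as-judge scoring) can run on inputs alone.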

There are three ways to add items through the UI:

  1. From the Items tab:

    1. Open a dataset from the Datasets list

    2. Click the Items tab

    3. Click + New Item

    4. Enter the input, expected output, and metadata

    5. Save the item

  2. From a production trace:

    1. Open any trace in the Observability view

    2. Click Add to Dataset in the trace detail view

    3. Select the target dataset

    4. Optionally edit the input, expected output, and metadata

    5. Confirm to add the item


This method is useful for building test cases from real user interactions.
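Conceptually, this flow copies a trace's observed input and output into a new item, with the output serving as the expected output unless you edit it. A minimal sketch, assuming a trace shaped as a dict with `id`, `input`, and `output` keys (these field names are assumptions for illustration):

```python
from typing import Any


def trace_to_item(trace: dict[str, Any], dataset: list[dict[str, Any]]) -> None:
    """Turn a production trace into a dataset item.
    The trace field names here are assumptions, not the real trace schema."""
    dataset.append({
        "input": trace["input"],
        # The observed output becomes the expected output;
        # in the UI you can edit it before confirming.
        "expected_output": trace["output"],
        # Record provenance so the item can be traced back later.
        "metadata": {"source_trace_id": trace["id"]},
    })


dataset_items: list[dict[str, Any]] = []
trace_to_item(
    {"id": "tr_123", "input": {"question": "..."}, "output": {"answer": "..."}},
    dataset_items,
)
```

Recording the source trace ID in metadata preserves the link back to the real interaction the item came from.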

  3. Via CSV upload:

    1. When creating a new dataset, drag and drop a CSV file or click to upload

    2. Map CSV columns to input, expected output, and metadata fields

    3. The items will be created automatically from your CSV rows
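The column-mapping step can be sketched with the standard library's `csv` module: each row becomes one item, with chosen columns feeding the input, expected output, and metadata fields. The CSV contents and mapping below are hypothetical examples.

```python
import csv
import io

# Hypothetical CSV export; real column names come from your file.
csv_text = """question,answer,topic
What is the refund window?,30 days,billing
How do I reset my password?,Use the account settings page,auth
"""

# The mapping you would choose at upload time: which column feeds
# the input, which the expected output, and which become metadata.
mapping = {"input": "question", "expected_output": "answer", "metadata": ["topic"]}

items = []
for row in csv.DictReader(io.StringIO(csv_text)):
    items.append({
        "input": row[mapping["input"]],
        "expected_output": row[mapping["expected_output"]],
        "metadata": {col: row[col] for col in mapping["metadata"]},
    })

print(len(items))  # prints 2: one item per CSV row
```

Each data row maps to exactly one item, so a 500-row CSV yields 500 items.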
