Datasets
Datasets are collections of test cases used to evaluate your LLM application systematically. Each dataset contains items (an input, ideally an expected output, and optional metadata) that you run your AI system against to measure quality, detect regressions, and compare configurations.
Why Datasets Matter
LLM applications need structured evaluation beyond ad-hoc testing. Datasets let you:
Test your system against consistent, repeatable inputs
Compare performance across prompt versions, models, or configurations
Detect regressions before they reach production
Build a library of edge cases and known failure modes
Establish baselines for quality measurement
Creating Datasets
You can create datasets through the UI or programmatically via the InteractiveAI SDK. To create one in the UI:
Navigate to Improvement > Datasets in the sidebar
Click + New Dataset
Enter a name, optional description, and optional metadata
Click Create Dataset
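Programmatic creation uses the same fields as the UI flow. A minimal sketch of the payload, assuming a hypothetical `client.create_dataset` method (the SDK's actual method and field names may differ):

```python
# Hypothetical sketch: the InteractiveAI SDK's exact method names are not
# documented in this section, so the client call is left as a comment.
# The payload mirrors the UI fields: name, optional description, optional metadata.
dataset = {
    "name": "support-faq-eval",
    "description": "Regression suite for the support chatbot",
    "metadata": {"owner": "ml-platform"},
}
# Hypothetical call, assuming an authenticated SDK client object:
# client.create_dataset(**dataset)
```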

Adding Items to a Dataset
Each item represents a single test case with an input, expected output, and metadata. You can populate datasets through the UI or SDK.
There are three ways to add items through the UI:
From the Items tab:
Open a dataset from the Datasets list
Click the Items tab
Click + New Item
Enter the input, expected output, and metadata
Save the item

From a production trace:
Open any trace in the Observability view
Click Add to Dataset in the trace detail view
Select the target dataset
Optionally edit the input, expected output, and metadata
Confirm to add the item

This method is useful for building test cases from real user interactions.
Via CSV upload:
When creating a new dataset, drag and drop a CSV file or click to upload
Map CSV columns to input, expected output, and metadata fields
The items will be created automatically from your CSV rows
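A CSV for upload might look like the following (column names are illustrative; you choose the mapping to input, expected output, and metadata during upload):

```
input,expected_output,category
"What is the capital of France?","Paris",geography
"2 + 2","4",arithmetic
```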

There are three ways to add items through the SDK:
You can add a single item:
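A minimal sketch of a single-item payload, assuming a hypothetical `client.datasets.create_item` method (the SDK's actual call signature may differ); the fields match those described above:

```python
# Hypothetical sketch: the real SDK call is shown only as a comment,
# since its exact name is an assumption. The item carries the three
# fields this doc describes: input, expected output, and metadata.
item = {
    "input": {"question": "What is the capital of France?"},
    "expected_output": {"answer": "Paris"},
    "metadata": {"category": "geography", "difficulty": "easy"},
}
# Hypothetical call: client.datasets.create_item("capital-cities", **item)
```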
You can add multiple items at once:
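Batch insertion takes a list of the same item shape; a sketch assuming a hypothetical `client.datasets.create_items` batch method:

```python
# Hypothetical sketch: each element has the same shape as a single item;
# the batch call name is an assumption and is left as a comment.
items = [
    {
        "input": {"question": "2 + 2"},
        "expected_output": {"answer": "4"},
        "metadata": {"category": "arithmetic"},
    },
    {
        "input": {"question": "What is the capital of Japan?"},
        "expected_output": {"answer": "Tokyo"},
        "metadata": {"category": "geography"},
    },
]
# Hypothetical call: client.datasets.create_items("smoke-tests", items)
```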
You can create dataset items linked to production traces:
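A trace-linked item adds a reference back to the production trace it came from; both the `trace_id` field and the call below are assumptions about the SDK's shape:

```python
# Hypothetical sketch: "trace_id" is an assumed field linking the item
# to the production trace it was captured from. The expected output can
# be left unset and filled in after human review.
item = {
    "input": {"question": "How do I reset my password?"},
    "expected_output": None,  # fill in after reviewing the trace, if desired
    "metadata": {"source": "production"},
    "trace_id": "trace_abc123",  # hypothetical identifier of the source trace
}
# Hypothetical call: client.datasets.create_item("support-faq", **item)
```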