# Datasets

## Overview

Create and manage datasets and dataset items used for evaluation runs.

***

## `create_dataset` [(source)](https://github.com/interactive-ai/interactiveai-python-sdk/blob/main/interactiveai/_client/client.py#L2811)

Create a dataset with the given name on InteractiveAI.

```python
create_dataset(
    *,
    name: str,
    description: str | None = None,
    metadata: Any | None = None,
    input_schema: Any | None = None,
    expected_output_schema: Any | None = None,
) -> Dataset
```

**Parameters**

* `name` — Name of the dataset to create.
* `description` — Description of the dataset. Defaults to None.
* `metadata` — Additional metadata. Defaults to None.
* `input_schema` — JSON Schema for validating dataset item inputs. When set, all new items will be validated against this schema.
* `expected_output_schema` — JSON Schema for validating dataset item expected outputs. When set, all new items will be validated against this schema.

**Returns**

Dataset: The created dataset as returned by the InteractiveAI API.

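**Example**

A minimal sketch of creating a dataset whose items are validated against JSON Schemas. It assumes `client` is an `Interactive()` instance from the `interactiveai` package; the dataset name, description, metadata, and both schemas are hypothetical values chosen for illustration. The client is passed in as a parameter so the snippet can be exercised without network access.

```python
from typing import Any

# Hypothetical JSON Schema for item inputs: an object with a required "country" string.
INPUT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {"country": {"type": "string"}},
    "required": ["country"],
    "additionalProperties": False,
}

# Hypothetical JSON Schema for expected outputs: a plain string (the capital city).
EXPECTED_OUTPUT_SCHEMA: dict[str, Any] = {"type": "string"}


def create_capital_cities_dataset(client: Any) -> Any:
    """Create a schema-validated dataset; `client` is assumed to be an Interactive() instance."""
    return client.create_dataset(
        name="country_capitals",
        description="Country to capital city lookups",
        metadata={"owner": "eval-team"},  # hypothetical metadata
        input_schema=INPUT_SCHEMA,
        expected_output_schema=EXPECTED_OUTPUT_SCHEMA,
    )


# Usage (requires the SDK and credentials):
# from interactiveai import Interactive
# dataset = create_capital_cities_dataset(Interactive())
```

Once the schemas are set, new items whose `input` or `expected_output` do not match them should be rejected, so mistakes surface at upload time rather than during an evaluation run.
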
***

## `create_dataset_item` [(source)](https://github.com/interactive-ai/interactiveai-python-sdk/blob/main/interactiveai/_client/client.py#L2840)

Create a dataset item.

If an item with the given `id` already exists, it is updated (upserted) rather than duplicated.

```python
create_dataset_item(
    *,
    dataset_name: str,
    input: Any | None = None,
    expected_output: Any | None = None,
    metadata: Any | None = None,
    source_trace_id: str | None = None,
    source_observation_id: str | None = None,
    status: DatasetStatus | None = None,
    id: str | None = None,
) -> DatasetItem
```

**Parameters**

* `dataset_name` — Name of the dataset in which the dataset item should be created.
* `input` — Input data. Defaults to None. Can contain any dict, list or scalar.
* `expected_output` — Expected output data. Defaults to None. Can contain any dict, list or scalar.
* `metadata` — Additional metadata. Defaults to None. Can contain any dict, list or scalar.
* `source_trace_id` — ID of the source trace. Defaults to None.
* `source_observation_id` — ID of the source observation. Defaults to None.
* `status` — Status of the dataset item. Defaults to ACTIVE for newly created items.
* `id` — ID of the dataset item. Defaults to None. Provide your own ID to deduplicate dataset items. The ID must be globally unique and cannot be reused across datasets.

**Returns**

DatasetItem: The created dataset item as returned by the InteractiveAI API.

**Example**

```python
from interactiveai import Interactive

interactiveai = Interactive()

# Upload an item to the InteractiveAI dataset named "capital_cities"
interactiveai.create_dataset_item(
    dataset_name="capital_cities",
    input={"input": {"country": "Italy"}},
    expected_output={"expected_output": "Rome"},
    metadata={"foo": "bar"}
)
```
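
Because an item is upserted when its `id` already exists, one way to make a re-runnable upload script idempotent is to derive each `id` deterministically from the item's content. The sketch below is an illustrative pattern, not SDK functionality: it hashes the dataset name together with the JSON-serialized input into a UUIDv5, so the same input maps to the same ID within a dataset but to different IDs in different datasets (IDs cannot be reused across datasets). `stable_item_id`, `upload_items`, and the namespace UUID are hypothetical, and `client` is assumed to be an `Interactive()` instance.

```python
import json
import uuid
from typing import Any, Iterable

# Hypothetical namespace; any fixed UUID works as long as it never changes.
ITEM_ID_NAMESPACE = uuid.UUID("6f1c2d3e-4a5b-4c6d-8e7f-9a0b1c2d3e4f")


def stable_item_id(dataset_name: str, item_input: Any) -> str:
    """Derive a deterministic ID from the dataset name and the item's input."""
    # sort_keys makes logically equal dicts serialize identically.
    payload = json.dumps({"dataset": dataset_name, "input": item_input}, sort_keys=True)
    return str(uuid.uuid5(ITEM_ID_NAMESPACE, payload))


def upload_items(client: Any, dataset_name: str, rows: Iterable[tuple[Any, Any]]) -> list[str]:
    """Upsert (input, expected_output) pairs; re-running updates rather than duplicates."""
    ids = []
    for item_input, expected in rows:
        item_id = stable_item_id(dataset_name, item_input)
        client.create_dataset_item(
            dataset_name=dataset_name,
            input=item_input,
            expected_output=expected,
            id=item_id,
        )
        ids.append(item_id)
    return ids


# Usage (requires the SDK and credentials):
# from interactiveai import Interactive
# upload_items(Interactive(), "country_capitals",
#              [({"country": "Italy"}, "Rome"), ({"country": "France"}, "Paris")])
```

Keying the ID on the input alone means that re-uploading the same input with a corrected `expected_output` updates the existing item instead of adding a second one.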

***

## `get_dataset` [(source)](https://github.com/interactive-ai/interactiveai-python-sdk/blob/main/interactiveai/_client/client.py#L2304)

Fetch a dataset by its name.

```python
get_dataset(
    name: str,
    *,
    fetch_items_page_size: int | None = 50,
) -> 'DatasetClient'
```

**Parameters**

* `name` — The name of the dataset to fetch.
* `fetch_items_page_size` — Page size used when fetching the dataset's items; all items are fetched in chunks of this size. Defaults to 50.

**Returns**

DatasetClient: The dataset with the given name.

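**Example**

A sketch of fetching a dataset and turning its items into (input, expected output) pairs for an evaluation loop. This page does not document the attributes of `DatasetClient`, so the sketch assumes, as an unverified guess, that the fetched items are exposed as an `items` list whose entries carry `input` and `expected_output` attributes; check the linked source before relying on these names. `load_pairs` is a hypothetical helper, and `client` is assumed to be an `Interactive()` instance.

```python
from typing import Any


def load_pairs(client: Any, name: str, page_size: int = 100) -> list[tuple[Any, Any]]:
    """Fetch a dataset and return (input, expected_output) pairs for every item."""
    # `name` is positional; `fetch_items_page_size` is keyword-only. A larger
    # page size means fewer, larger chunks while the items are fetched.
    dataset = client.get_dataset(name, fetch_items_page_size=page_size)
    # ASSUMPTION: DatasetClient exposes fetched items as `.items`, each with
    # `.input` and `.expected_output`; verify against the SDK source.
    return [(item.input, item.expected_output) for item in dataset.items]


# Usage (requires the SDK and credentials):
# from interactiveai import Interactive
# pairs = load_pairs(Interactive(), "capital_cities")
```
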
***

## `get_dataset_run` [(source)](https://github.com/interactive-ai/interactiveai-python-sdk/blob/main/interactiveai/_client/client.py#L2320)

Fetch a dataset run by dataset name and run name.

```python
get_dataset_run(
    *,
    dataset_name: str,
    run_name: str,
) -> DatasetRunWithItems
```

**Parameters**

* `dataset_name` — The name of the dataset.
* `run_name` — The name of the run.

**Returns**

DatasetRunWithItems: The dataset run with its items.

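**Example**

Since runs are addressed by name within a dataset, a common pattern is fetching two or more runs side by side to compare them. The hypothetical helper below does that using only the documented keyword-only arguments; `client` is assumed to be an `Interactive()` instance, and the run names in the usage comment are made up. Inspecting the returned `DatasetRunWithItems` objects is left to the caller, because their fields are not documented on this page.

```python
from typing import Any, Iterable


def fetch_runs(client: Any, dataset_name: str, run_names: Iterable[str]) -> dict[str, Any]:
    """Fetch each named run of one dataset, keyed by run name."""
    # Both arguments are keyword-only in the documented signature.
    return {
        run_name: client.get_dataset_run(dataset_name=dataset_name, run_name=run_name)
        for run_name in run_names
    }


# Usage (requires the SDK and credentials; run names are hypothetical):
# from interactiveai import Interactive
# runs = fetch_runs(Interactive(), "capital_cities", ["baseline", "prompt-v2"])
```
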
***

## `get_dataset_runs` [(source)](https://github.com/interactive-ai/interactiveai-python-sdk/blob/main/interactiveai/_client/client.py#L2336)

Fetch all runs for a dataset.

```python
get_dataset_runs(
    *,
    dataset_name: str,
    page: int | None = None,
    limit: int | None = None,
) -> PaginatedDatasetRuns
```

**Parameters**

* `dataset_name` — The name of the dataset.
* `page` — Page number, starts at 1.
* `limit` — Maximum number of runs returned per page.

**Returns**

PaginatedDatasetRuns: Paginated list of dataset runs.

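**Example**

A sketch of walking every page of runs. It assumes, without confirmation from this page, that `PaginatedDatasetRuns` exposes the current page's runs as a `data` list. It stops when a page comes back with fewer than `limit` runs, so it does not rely on any total-count metadata. `iter_all_runs` is a hypothetical helper, and `client` is assumed to be an `Interactive()` instance.

```python
from typing import Any, Iterator


def iter_all_runs(client: Any, dataset_name: str, limit: int = 50) -> Iterator[Any]:
    """Yield every run of a dataset, requesting one page at a time."""
    page = 1  # pages start at 1
    while True:
        result = client.get_dataset_runs(dataset_name=dataset_name, page=page, limit=limit)
        # ASSUMPTION: the runs for this page are in `result.data`.
        runs = list(result.data)
        yield from runs
        if len(runs) < limit:
            # A short (or empty) page means there is nothing left to fetch.
            break
        page += 1


# Usage (requires the SDK and credentials):
# from interactiveai import Interactive
# for run in iter_all_runs(Interactive(), "capital_cities"):
#     print(run)
```

If the server caps `limit` below the value you pass, this stop condition would end early; reading the response's pagination metadata instead, if it has any, avoids that.
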
***

## `delete_dataset_run` [(source)](https://github.com/interactive-ai/interactiveai-python-sdk/blob/main/interactiveai/_client/client.py#L2357)

Delete a dataset run and all its run items. This action is irreversible.

```python
delete_dataset_run(
    *,
    dataset_name: str,
    run_name: str,
) -> DeleteDatasetRunResponse
```

**Parameters**

* `dataset_name` — The name of the dataset.
* `run_name` — The name of the run.

**Returns**

DeleteDatasetRunResponse: Confirmation of deletion.


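**Example**

Because deletion is irreversible, cleanup scripts may want an explicit opt-in. The hypothetical helper below defaults to a dry run that only reports which runs it would delete, and calls `delete_dataset_run` only when `dry_run=False`. `client` is assumed to be an `Interactive()` instance; the helper is not part of the SDK.

```python
from typing import Any, Iterable


def delete_runs(
    client: Any,
    dataset_name: str,
    run_names: Iterable[str],
    *,
    dry_run: bool = True,
) -> list[str]:
    """Delete the named runs, or only list them when dry_run is True (the default)."""
    targets = list(run_names)
    if dry_run:
        # Nothing is deleted; the caller can review the list first.
        return targets
    for run_name in targets:
        # Irreversible: removes the run and all of its run items.
        client.delete_dataset_run(dataset_name=dataset_name, run_name=run_name)
    return targets


# Usage (requires the SDK and credentials; run names are hypothetical):
# from interactiveai import Interactive
# print(delete_runs(Interactive(), "capital_cities", ["old-run"]))                 # preview
# delete_runs(Interactive(), "capital_cities", ["old-run"], dry_run=False)         # delete
```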
