# Datasets

## Overview

Create and manage datasets and dataset items used for evaluation runs.

***

## `create_dataset` [(source)](https://github.com/interactive-ai/interactiveai-python-sdk/blob/main/interactiveai/_client/client.py#L2811)

Create a dataset with the given name on InteractiveAI.

```python
create_dataset(
    *,
    name: str,
    description: str | None = None,
    metadata: Any | None = None,
    input_schema: Any | None = None,
    expected_output_schema: Any | None = None,
) -> Dataset
```

**Parameters**

* `name` — Name of the dataset to create.
* `description` — Description of the dataset. Defaults to None.
* `metadata` — Additional metadata. Defaults to None.
* `input_schema` — JSON Schema for validating dataset item inputs. When set, all new items will be validated against this schema.
* `expected_output_schema` — JSON Schema for validating dataset item expected outputs. When set, all new items will be validated against this schema.

**Returns**

Dataset: The created dataset as returned by the InteractiveAI API.
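The following sketch shows how the schema parameters might be used. It creates a `capital_cities` dataset whose item inputs must be objects with a single string `country` field and whose expected outputs must be plain strings. It assumes that `input_schema` and `expected_output_schema` accept plain Python dicts containing JSON Schema, and that `client` is an `Interactive()` instance like the one in the `create_dataset_item` example. The helper name `create_capital_cities_dataset` and the metadata values are hypothetical.

```python
from typing import Any

# JSON Schema for each item's input: an object with exactly one string field, "country".
CAPITAL_INPUT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {"country": {"type": "string"}},
    "required": ["country"],
    "additionalProperties": False,
}

# JSON Schema for each item's expected output: a bare string such as "Rome".
CAPITAL_OUTPUT_SCHEMA: dict[str, Any] = {"type": "string"}


def create_capital_cities_dataset(client: Any) -> Any:
    """Hypothetical helper: create the dataset with both schemas attached.

    `client` is assumed to be an `interactiveai.Interactive` instance.
    """
    return client.create_dataset(
        name="capital_cities",
        description="Countries mapped to their capital cities",
        metadata={"owner": "eval-team"},  # hypothetical free-form metadata
        input_schema=CAPITAL_INPUT_SCHEMA,
        expected_output_schema=CAPITAL_OUTPUT_SCHEMA,
    )
```

With the real SDK, this would be called as `create_capital_cities_dataset(Interactive())`. According to the parameter descriptions, the schemas are checked against items created after they are set.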

***

## `create_dataset_item` [(source)](https://github.com/interactive-ai/interactiveai-python-sdk/blob/main/interactiveai/_client/client.py#L2840)

Create a dataset item.

If an item with the given `id` already exists, it is updated (upserted) instead of duplicated.

```python
create_dataset_item(
    *,
    dataset_name: str,
    input: Any | None = None,
    expected_output: Any | None = None,
    metadata: Any | None = None,
    source_trace_id: str | None = None,
    source_observation_id: str | None = None,
    status: DatasetStatus | None = None,
    id: str | None = None,
) -> DatasetItem
```

**Parameters**

* `dataset_name` — Name of the dataset in which the dataset item should be created.
* `input` — Input data. Defaults to None. Can contain any dict, list or scalar.
* `expected_output` — Expected output data. Defaults to None. Can contain any dict, list or scalar.
* `metadata` — Additional metadata. Defaults to None. Can contain any dict, list or scalar.
* `source_trace_id` — ID of the source trace. Defaults to None.
* `source_observation_id` — ID of the source observation. Defaults to None.
* `status` — Status of the dataset item. Defaults to ACTIVE for newly created items.
* `id` — ID of the dataset item. Defaults to None. Provide your own ID to deduplicate dataset items. The ID must be globally unique and cannot be reused across datasets.

**Returns**

DatasetItem: The created dataset item as returned by the InteractiveAI API.

**Example**

```python
from interactiveai import Interactive

interactiveai = Interactive()

# Upload one item to the InteractiveAI dataset named "capital_cities"
interactiveai.create_dataset_item(
    dataset_name="capital_cities",
    input={"input": {"country": "Italy"}},
    expected_output={"expected_output": "Rome"},
    metadata={"foo": "bar"}
)
```
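Passing `id` turns `create_dataset_item` into an upsert. If you derive the ID deterministically from the item's content, you can re-run an upload script without creating duplicates. The sketch below shows one way to do this. It is not an SDK feature. It hashes the dataset name together with the canonical JSON of the input using `uuid.uuid5`. As a result, identical inputs map to the same ID within a dataset, while the same input in another dataset gets a different ID (IDs cannot be reused across datasets). The namespace UUID and the names `stable_item_id` and `upload_items` are hypothetical, and the sketch assumes that inputs are JSON-serializable.

```python
import json
import uuid
from typing import Any, Iterable

# Hypothetical fixed namespace; any constant UUID works as long as it never changes.
ITEM_ID_NAMESPACE = uuid.UUID("6f1c2d3e-4a5b-4c6d-8e7f-901a2b3c4d5e")


def stable_item_id(dataset_name: str, item_input: Any) -> str:
    """Derive a deterministic item ID from the dataset name and the item input."""
    # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} serialize identically.
    canonical = json.dumps(item_input, sort_keys=True, separators=(",", ":"))
    # Including the dataset name keeps IDs distinct across datasets.
    return str(uuid.uuid5(ITEM_ID_NAMESPACE, f"{dataset_name}\n{canonical}"))


def upload_items(
    client: Any, dataset_name: str, rows: Iterable[tuple[Any, Any]]
) -> list[str]:
    """Upsert (input, expected_output) pairs; re-running updates instead of duplicating."""
    ids = []
    for item_input, expected in rows:
        item_id = stable_item_id(dataset_name, item_input)
        client.create_dataset_item(
            dataset_name=dataset_name,
            input=item_input,
            expected_output=expected,
            id=item_id,
        )
        ids.append(item_id)
    return ids
```

For example, `upload_items(Interactive(), "capital_cities", [({"country": "Italy"}, "Rome")])` produces the same item ID on every run. Because the JSON is serialized with sorted keys, reordering the keys of an input dict does not change its ID.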

***

## `get_dataset` [(source)](https://github.com/interactive-ai/interactiveai-python-sdk/blob/main/interactiveai/_client/client.py#L2304)

Fetch a dataset by its name.

```python
get_dataset(
    name: str,
    *,
    fetch_items_page_size: int | None = 50,
) -> DatasetClient
```

**Parameters**

* `name` — The name of the dataset to fetch.
* `fetch_items_page_size` — All items of the dataset will be fetched in chunks of this size. Defaults to 50.

**Returns**

DatasetClient: The dataset with the given name.
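You can estimate the effect of `fetch_items_page_size` with simple arithmetic. If every chunk except the last is full, a dataset of `n` items is fetched in `ceil(n / page_size)` chunks. The hypothetical helper below lists the chunk sizes. It models only the arithmetic and is not the SDK's fetching code.

```python
def chunk_sizes(total_items: int, page_size: int = 50) -> list[int]:
    """Sizes of the chunks needed to fetch `total_items` items, `page_size` at a time."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    if total_items < 0:
        raise ValueError("total_items cannot be negative")
    full_chunks, remainder = divmod(total_items, page_size)
    # All full chunks first, then one partial chunk if items are left over.
    return [page_size] * full_chunks + ([remainder] if remainder else [])
```

For example, a 120-item dataset with the default page size arrives as chunks of 50, 50, and 20 items. Calling `get_dataset("capital_cities", fetch_items_page_size=200)` would fetch the same items in a single chunk. Larger pages reduce the number of round trips but make each response larger.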

***

## `get_dataset_run` [(source)](https://github.com/interactive-ai/interactiveai-python-sdk/blob/main/interactiveai/_client/client.py#L2320)

Fetch a dataset run by dataset name and run name.

```python
get_dataset_run(
    *,
    dataset_name: str,
    run_name: str,
) -> DatasetRunWithItems
```

**Parameters**

* `dataset_name` — The name of the dataset.
* `run_name` — The name of the run.

**Returns**

DatasetRunWithItems: The dataset run with its items.
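A common task is fetching several named runs of the same dataset so you can compare them side by side. The hypothetical helper `fetch_runs` below calls `get_dataset_run` once per distinct run name and returns the results keyed by run name. It relies only on the documented keyword arguments and leaves the inspection of the returned `DatasetRunWithItems` objects to the caller.

```python
from typing import Any, Iterable


def fetch_runs(client: Any, dataset_name: str, run_names: Iterable[str]) -> dict[str, Any]:
    """Fetch several runs of one dataset, keyed by run name (hypothetical helper)."""
    runs: dict[str, Any] = {}
    for run_name in run_names:
        if run_name in runs:
            continue  # skip duplicate names instead of fetching the same run twice
        runs[run_name] = client.get_dataset_run(
            dataset_name=dataset_name,
            run_name=run_name,
        )
    return runs
```

For example, `fetch_runs(Interactive(), "capital_cities", ["baseline", "candidate"])` returns a dict with one entry per run.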

***

## `get_dataset_runs` [(source)](https://github.com/interactive-ai/interactiveai-python-sdk/blob/main/interactiveai/_client/client.py#L2336)

Fetch all runs for a dataset.

```python
get_dataset_runs(
    *,
    dataset_name: str,
    page: int | None = None,
    limit: int | None = None,
) -> PaginatedDatasetRuns
```

**Parameters**

* `dataset_name` — The name of the dataset.
* `page` — Page number, starting at 1.
* `limit` — Maximum number of runs per page.

**Returns**

PaginatedDatasetRuns: Paginated list of dataset runs.
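Because `get_dataset_runs` returns one page at a time, collecting every run requires looping over the pages. The sketch below assumes that `PaginatedDatasetRuns` exposes the runs of the current page as a `data` list. This section does not document that attribute, so treat it as an assumption. The loop stops at the first page that contains fewer than `limit` runs. The name `iter_all_dataset_runs` is hypothetical.

```python
from typing import Any, Iterator


def iter_all_dataset_runs(client: Any, dataset_name: str, limit: int = 50) -> Iterator[Any]:
    """Yield every run of `dataset_name`, requesting `limit` runs per page."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    page = 1  # page numbers start at 1
    while True:
        result = client.get_dataset_runs(dataset_name=dataset_name, page=page, limit=limit)
        runs = result.data  # assumption: the page's runs are in a `data` list
        yield from runs
        if len(runs) < limit:
            return  # a short (or empty) page means there are no further pages
        page += 1
```

When the total number of runs is an exact multiple of `limit`, this approach makes one extra request that returns an empty page. A response field that reports the total page count, if one exists, would avoid that request.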

***

## `delete_dataset_run` [(source)](https://github.com/interactive-ai/interactiveai-python-sdk/blob/main/interactiveai/_client/client.py#L2357)

Delete a dataset run and all its run items. This action is irreversible.

```python
delete_dataset_run(
    *,
    dataset_name: str,
    run_name: str,
) -> DeleteDatasetRunResponse
```

**Parameters**

* `dataset_name` — The name of the dataset.
* `run_name` — The name of the run.

**Returns**

DeleteDatasetRunResponse: Confirmation of deletion.
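Deletion cannot be undone, so cleanup scripts may want an explicit safeguard. The hypothetical helper below calls `delete_dataset_run` only when the caller passes `confirm=True`. Otherwise it prints what it would have deleted and returns `None`. This dry-run convention is an illustration and not SDK behavior.

```python
from typing import Any, Optional


def delete_run_safely(
    client: Any,
    dataset_name: str,
    run_name: str,
    *,
    confirm: bool = False,
) -> Optional[Any]:
    """Delete a dataset run only when `confirm` is True; otherwise perform a dry run."""
    if not confirm:
        # Dry run: nothing is sent to the API.
        print(f"Dry run: would delete run {run_name!r} of dataset {dataset_name!r}")
        return None
    # Irreversible: removes the run and all of its run items.
    return client.delete_dataset_run(dataset_name=dataset_name, run_name=run_name)
```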
