# Ragas

Ragas is an open-source library for model-based evaluation of Retrieval-Augmented Generation (RAG) pipelines. Many of its metrics are reference-free, so you can assess system performance without ground-truth answers. Ragas measures aspects such as faithfulness, answer relevancy, and context precision.

Combining Ragas with InteractiveAI enables evaluation and monitoring of Retrieval-Augmented Generation pipelines with detailed scoring and analytics.

### Prerequisites

* InteractiveAI account with API credentials
* LLM provider credentials (OpenAI, Ollama, or another supported provider; the examples below use OpenAI)

***

### Installation

```bash
pip install interactiveai datasets ragas langchain-openai openai
```

***

### Configuration

Set your API credentials as environment variables:

```python
import os

# InteractiveAI credentials
# Obtain keys from Settings > API Keys in the dashboard
os.environ["INTERACTIVEAI_PUBLIC_KEY"] = "pk-..."
os.environ["INTERACTIVEAI_SECRET_KEY"] = "sk-..."

# Model provider credentials
os.environ["OPENAI_API_KEY"] = "sk-proj-..."
```
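To catch a missing key before any client is created, you can check the required variables up front. This is a minimal sketch; the helper name `missing_env_vars` is hypothetical and not part of the InteractiveAI SDK.

```python
import os

# Variables this guide reads; extend the tuple if you use another provider.
REQUIRED_VARS = (
    "INTERACTIVEAI_PUBLIC_KEY",
    "INTERACTIVEAI_SECRET_KEY",
    "OPENAI_API_KEY",
)


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty, in the order given."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing credentials:", ", ".join(missing))
```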

Initialize the client and confirm connectivity:

```python
from interactiveai import Interactive

client = Interactive(
    public_key=os.environ["INTERACTIVEAI_PUBLIC_KEY"],
    secret_key=os.environ["INTERACTIVEAI_SECRET_KEY"],
)

if client.auth_check():
    print("Connection established")
else:
    print("Authentication failed - verify credentials")
```

***

### Available Metrics

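Ragas provides many metrics for RAG evaluation. This guide uses three:

* **Faithfulness**: Measures how factually consistent the generated answer is with the retrieved context
* **Answer Relevancy** (`ResponseRelevancy` in code): Assesses how pertinent the generated answer is to the given prompt
* **Context Precision**: Evaluates whether relevant chunks in the retrieved context are ranked above irrelevant ones. The `LLMContextPrecisionWithoutReference` variant judges relevance against the generated answer, so no reference answer is needed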

```python
from ragas.metrics import (
    Faithfulness,
    ResponseRelevancy,
    LLMContextPrecisionWithoutReference,
)

metrics = [
    Faithfulness(),
    ResponseRelevancy(),
    LLMContextPrecisionWithoutReference(),
]
```
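Once an LLM has produced the intermediate judgments, the final scores reduce to simple arithmetic. As described in the Ragas documentation, faithfulness is the fraction of claims in the answer that the retrieved context supports, context precision averages precision@k over the ranks that hold a relevant chunk, and answer relevancy is the mean cosine similarity between the original question and questions generated from the answer. The sketch below reproduces only that final step from hand-written verdicts and vectors; in Ragas the LLM extracts claims and judges relevance, and these function names are illustrative, not the library's API.

```python
import math


def faithfulness_score(claim_verdicts):
    """Fraction of answer claims supported by the context (1 = supported, 0 = not)."""
    if not claim_verdicts:
        return 0.0
    return sum(claim_verdicts) / len(claim_verdicts)


def context_precision_at_k(relevance):
    """Mean of precision@k over the ranks k whose chunk is relevant (1) rather than not (0)."""
    hits = 0
    total = 0.0
    for k, relevant in enumerate(relevance, start=1):
        if relevant:
            hits += 1
            total += hits / k  # precision@k, counted only at relevant ranks
    return total / hits if hits else 0.0


def _cosine(a, b):
    # Assumes non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def answer_relevancy_score(question_emb, generated_question_embs):
    """Simplified: mean cosine similarity between the question and generated questions."""
    sims = [_cosine(question_emb, g) for g in generated_question_embs]
    return sum(sims) / len(sims) if sims else 0.0
```

For example, relevance verdicts `[1, 0, 1]` give a context precision of (1/1 + 2/3) / 2 ≈ 0.83, while `[1, 1, 0]` (the irrelevant chunk moved to the bottom) gives 1.0, which is why the metric rewards good ranking and not just good recall.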

***

### Initializing Metrics

Configure metrics with your preferred LLM and embeddings:

```python
from ragas.run_config import RunConfig
from ragas.metrics.base import MetricWithLLM, MetricWithEmbeddings
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

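def init_ragas_metrics(metrics, llm, embedding):
    """Attach the LLM and embeddings each metric needs, then initialize it."""
    run_config = RunConfig()  # default timeout and retry settings, shared by all metrics
    for metric in metrics:
        if isinstance(metric, MetricWithLLM):
            metric.llm = llm
        if isinstance(metric, MetricWithEmbeddings):
            metric.embeddings = embedding
        metric.init(run_config)

# langchain_openai default models; pass model="..." to choose specific ones
llm = ChatOpenAI()
emb = OpenAIEmbeddings()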

init_ragas_metrics(
    metrics,
    llm=LangchainLLMWrapper(llm),
    embedding=LangchainEmbeddingsWrapper(emb),
)
```

***

### Evaluation Approaches

You can run evaluations in two ways:

* **Score Each Trace**: Evaluate every trace individually for granular, per-request insight. This is comprehensive but incurs evaluation cost on every request.
* **Score as Batch**: Periodically sample traces and score them together. This costs less but gives only approximate, aggregate estimates.
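Batch scoring needs a rule for which traces to score. The batch example in this guide draws a random sample; an alternative, sketched here as an assumption rather than an InteractiveAI feature, is to hash each trace ID so a fixed fraction of traces is selected and the same trace always gets the same decision across reruns. The function name `should_score` and the 10% rate are illustrative.

```python
import hashlib


def should_score(trace_id: str, rate: float) -> bool:
    """Deterministically select roughly `rate` of traces by hashing the trace ID."""
    if rate <= 0:
        return False
    if rate >= 1:
        return True
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


ids = [f"trace-{i}" for i in range(10_000)]
selected = sum(should_score(t, 0.1) for t in ids)
print(f"selected {selected} of {len(ids)} traces")
```

Setting `rate=1.0` amounts to scoring each trace; lowering it trades precision of the aggregate estimate for lower evaluation cost.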

#### 1. Scoring Individual Traces

Define a utility function to compute scores:

```python
from ragas.dataset_schema import SingleTurnSample

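async def score_with_ragas(query, chunks, answer):
    """Score one query/contexts/answer triple with every metric in `metrics`."""
    # The sample is identical for every metric, so build it once.
    sample = SingleTurnSample(
        user_input=query,
        retrieved_contexts=chunks,
        response=answer,
    )
    scores = {}
    for m in metrics:
        scores[m.name] = await m.single_turn_ascore(sample)
    return scores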
```
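`score_with_ragas` awaits each metric in turn. Because the metric calls are independent, they can also run concurrently with `asyncio.gather`. The sketch below shows the pattern with a hypothetical `FakeMetric` stand-in (it only sleeps and returns a fixed value) so it runs without an LLM; with real Ragas metrics you would keep the `single_turn_ascore(sample)` call from the function above. It also shows driving the coroutine from a plain script with `asyncio.run`.

```python
import asyncio


class FakeMetric:
    """Hypothetical stand-in for a Ragas metric: a name plus an async scoring method."""

    def __init__(self, name, value, delay=0.1):
        self.name = name
        self.value = value
        self.delay = delay

    async def single_turn_ascore(self, sample):
        await asyncio.sleep(self.delay)  # simulates an LLM round trip
        return self.value


async def score_concurrently(metrics, sample):
    """Score one sample with every metric at once and return {name: score}."""
    results = await asyncio.gather(*(m.single_turn_ascore(sample) for m in metrics))
    return {m.name: r for m, r in zip(metrics, results)}


fake_metrics = [FakeMetric("faithfulness", 0.9), FakeMetric("answer_relevancy", 0.8)]
scores = asyncio.run(score_concurrently(fake_metrics, sample={"user_input": "hi"}))
print(scores)
```

In a Jupyter notebook an event loop is already running, so `asyncio.run` raises an error there; use `await score_concurrently(...)` directly instead. Concurrent calls can also hit provider rate limits faster than sequential ones.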

Integrate scoring into your RAG pipeline:

```python
from interactiveai import Interactive

client = Interactive(
    public_key=os.environ["INTERACTIVEAI_PUBLIC_KEY"],
    secret_key=os.environ["INTERACTIVEAI_SECRET_KEY"],
    host=os.environ.get("INTERACTIVEAI_HOST", "https://app.interactiveai.com")
)

question = "What are the key benefits of vector databases?"
contexts = ["Vector databases enable similarity search...", "They support high-dimensional data..."]
answer = "Vector databases provide efficient similarity search and support for high-dimensional embeddings."

with client.start_as_current_span(name="rag") as trace:
    trace_id = trace.trace_id
    
    with client.start_as_current_span(
        name="retrieval",
        input={"question": question},
        output={"contexts": contexts}
    ):
        pass

    with client.start_as_current_span(
        name="generation",
        input={"question": question, "contexts": contexts},
        output={"answer": answer}
    ):
        pass

    ragas_scores = await score_with_ragas(question, contexts, answer)

print("Ragas Scores:", ragas_scores)
```

Attach scores to the trace:

```python
for m in metrics:
    client.create_score(
        name=m.name,
        value=ragas_scores[m.name],
        trace_id=trace_id
    )
```

#### 2. Batch Scoring

For high-traffic applications, batch scoring reduces cost while providing performance estimates.

Retrieve traces from the InteractiveAI Platform:

```python
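def get_traces(name=None, limit=None, user_id=None):
    """Fetch traces page by page, stopping once `limit` traces are collected."""
    all_data = []
    page = 1

    while True:
        response = client.api.trace.list(
            name=name, page=page, user_id=user_id
        )
        if not response.data:
            break  # no more pages
        all_data.extend(response.data)
        page += 1
        # limit=None means fetch every page
        if limit is not None and len(all_data) >= limit:
            break

    return all_data[:limit]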
```
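The same pagination can be written as a generator, which keeps the page loop separate from the limit logic and stops requesting pages as soon as enough items are collected. In this sketch `fetch_page` is a hypothetical callable standing in for `client.api.trace.list(...).data`; the fake fetcher serves a fixed list two items per page.

```python
from itertools import islice


def iter_items(fetch_page, start_page=1):
    """Yield items page by page until fetch_page returns an empty page."""
    page = start_page
    while True:
        items = fetch_page(page)
        if not items:
            return
        yield from items
        page += 1


def collect(fetch_page, limit=None):
    """Return up to `limit` items (all items if limit is None)."""
    return list(islice(iter_items(fetch_page), limit))


# Fake paginated source: 5 items served two per page.
DATA = ["t1", "t2", "t3", "t4", "t5"]
pages_requested = []


def fake_fetch(page):
    pages_requested.append(page)
    start = (page - 1) * 2
    return DATA[start:start + 2]


print(collect(fake_fetch, limit=3))
```

With `limit=3` only pages 1 and 2 are requested, because `islice` stops pulling from the generator once it has three items.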

Sample some of those traces and build an evaluation batch from their `retrieval` and `generation` observations:

```python
from random import sample

NUM_TRACES_TO_SAMPLE = 3
traces = get_traces(name="rag", limit=5)
# random.sample raises ValueError if asked for more items than exist
traces_sample = sample(traces, min(NUM_TRACES_TO_SAMPLE, len(traces)))

evaluation_batch = {
    "question": [],
    "contexts": [],
    "answer": [],
    "trace_id": [],
}

for t in traces_sample:
    # Reset per trace so values never carry over from a previous trace
    question = contexts = answer = None
    observations = [client.api.observations.get(o) for o in t.observations]
    for o in observations:
        if o.name == "retrieval":
            question = o.input["question"]
            contexts = o.output["contexts"]
        if o.name == "generation":
            answer = o.output["answer"]
    # Skip traces missing a retrieval or generation observation
    if question is None or contexts is None or answer is None:
        continue
    evaluation_batch["question"].append(question)
    evaluation_batch["contexts"].append(contexts)
    evaluation_batch["answer"].append(answer)
    evaluation_batch["trace_id"].append(t.id)
```

Run batch evaluation:

```python
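from datasets import Dataset
from ragas import evaluate
from ragas.metrics import Faithfulness, ResponseRelevancy

ds = Dataset.from_dict(evaluation_batch)
batch_metrics = [Faithfulness(), ResponseRelevancy()]
# Pass the LLM and embeddings configured earlier; otherwise Ragas falls
# back to its default models for these uninitialized metrics.
results = evaluate(ds, metrics=batch_metrics, llm=llm, embeddings=emb)

print(results)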
```
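Because batch scoring looks at a sample, the per-metric mean is only an estimate of performance across all traffic. One simple way to express that uncertainty, not something Ragas or InteractiveAI computes for you, is a normal-approximation confidence interval around the sample mean. This sketch uses only the standard library, clamps the interval to [0, 1] on the assumption that scores lie in that range, and uses made-up example scores; in practice you might pass a column such as `results.to_pandas()["faithfulness"].tolist()`.

```python
import math
import statistics


def mean_with_ci(scores, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation ~95% interval."""
    clean = [s for s in scores if s is not None and not math.isnan(s)]
    if not clean:
        raise ValueError("no valid scores")
    mean = statistics.fmean(clean)
    if len(clean) < 2:
        return mean, mean, mean  # a single score gives no spread estimate
    stderr = statistics.stdev(clean) / math.sqrt(len(clean))
    return mean, max(0.0, mean - z * stderr), min(1.0, mean + z * stderr)


faithfulness_sample = [0.9, 0.75, 1.0, 0.6, 0.85]  # example values only
mean, low, high = mean_with_ci(faithfulness_sample)
print(f"faithfulness ~ {mean:.2f} (95% CI {low:.2f} to {high:.2f})")
```

With only a handful of sampled traces, as in the example above, expect a wide interval; the normal approximation is rough at such small sample sizes.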

Push scores back to InteractiveAI:

```python
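import math

df = results.to_pandas()
df["trace_id"] = ds["trace_id"]

for _, row in df.iterrows():
    for metric_name in ["faithfulness", "answer_relevancy"]:
        value = row[metric_name]
        # Ragas records NaN when a metric fails for a row; skip those
        if math.isnan(value):
            continue
        client.create_score(
            name=metric_name,
            value=float(value),
            trace_id=row["trace_id"],
        )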
```

***

### Trace Visibility

The InteractiveAI dashboard displays:

* Traces with attached Ragas scores for quick performance overview
* Score-based filtering to identify low-quality responses
* Analytics with drill-down into user segments and use cases
* Score trends across time periods for performance monitoring

