Ragas

Ragas is an open-source library for model-based evaluation of RAG pipelines. It supports reference-free evaluation, so system performance can be assessed without ground-truth data. Ragas measures aspects such as faithfulness, answer relevancy, and context precision.

Combining Ragas with InteractiveAI enables evaluation and monitoring of Retrieval-Augmented Generation pipelines with detailed scoring and analytics.

Prerequisites

  • InteractiveAI account with API credentials

  • LLM provider credentials (OpenAI, Ollama, or other supported provider)


Installation

pip install interactiveai datasets ragas llama_index openai

Configuration

Set your API credentials as environment variables:

import os

# InteractiveAI credentials
# Obtain keys from Settings > API Keys in the dashboard
os.environ["INTERACTIVEAI_PUBLIC_KEY"] = "pk-..."
os.environ["INTERACTIVEAI_SECRET_KEY"] = "sk-..."

# Model provider credentials
os.environ["OPENAI_API_KEY"] = "sk-proj-..."

Initialize the client and confirm connectivity:
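
The exact client constructor and connectivity call depend on the InteractiveAI SDK, so they are left out of this sketch. A minimal pre-flight check can still confirm that the variables from the Configuration step are set before any client is created. The helper names `missing_credentials` and `require_credentials` are hypothetical and belong to neither library.

```python
import os

# Variable names taken from the Configuration step above.
REQUIRED_VARS = (
    "INTERACTIVEAI_PUBLIC_KEY",
    "INTERACTIVEAI_SECRET_KEY",
    "OPENAI_API_KEY",
)


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def require_credentials(env=None):
    """Raise early, with a readable message, if any credential is missing."""
    missing = missing_credentials(env)
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
```

Calling `require_credentials()` with no arguments checks the real process environment; passing a dictionary makes the check easy to unit test.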


Available Metrics

Ragas provides several metrics for RAG evaluation:

  • Faithfulness: Measures factual consistency between the generated answer and provided context

  • Answer Relevancy: Assesses how pertinent the generated answer is to the given prompt

  • Context Precision: Evaluates whether the relevant chunks in the retrieved context are ranked above the irrelevant ones
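
To make two of these definitions concrete, the sketch below computes them from hand-labeled inputs. It assumes the commonly documented formulations: context precision as the mean of precision@k over the ranks that hold a relevant chunk, and faithfulness as the share of answer claims supported by the context. In Ragas the relevance and support verdicts come from an LLM judge; here they are supplied by hand, and both function names are illustrative.

```python
def context_precision_at_k(relevance):
    """Mean of precision@k over ranks holding a relevant chunk.

    relevance: 0/1 flags, one per retrieved chunk, in rank order.
    """
    hits = 0
    weighted = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            weighted += hits / rank  # precision@rank, counted at relevant ranks only
    return weighted / hits if hits else 0.0


def faithfulness_ratio(supported_claims, total_claims):
    """Share of the answer's claims that the context supports."""
    return supported_claims / total_claims if total_claims else 0.0


# A relevant chunk ranked first scores higher than the same chunk ranked last.
top_ranked = context_precision_at_k([1, 0, 0])
bottom_ranked = context_precision_at_k([0, 0, 1])
```

With these inputs the first ranking scores 1.0 and the second scores one third, which is why the metric rewards retrievers that surface relevant chunks early.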


Initializing Metrics

Configure metrics with your preferred LLM and embeddings:
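
The wiring pattern can be sketched without the library installed: one judge LLM is shared by every metric, and an embedding model is attached only to metrics that need it (answer relevancy typically does, while faithfulness and context precision rely on the LLM alone). `StubMetric` and `configure_metrics` are stand-ins for illustration, not Ragas classes or functions, and the string placeholders stand in for real model objects.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional


@dataclass
class StubMetric:
    """Stand-in for a metric object; NOT a Ragas class."""
    name: str
    needs_embeddings: bool = False
    llm: Optional[Any] = None
    embeddings: Optional[Any] = None


def configure_metrics(metrics: Iterable[StubMetric], llm: Any, embeddings: Any) -> List[StubMetric]:
    """Attach the shared judge LLM, plus embeddings where a metric needs them."""
    configured = []
    for metric in metrics:
        metric.llm = llm
        if metric.needs_embeddings:
            metric.embeddings = embeddings
        configured.append(metric)
    return configured


metrics = configure_metrics(
    [
        StubMetric("faithfulness"),
        StubMetric("answer_relevancy", needs_embeddings=True),
        StubMetric("context_precision"),
    ],
    llm="judge-llm-placeholder",  # assumption: replace with a real LLM wrapper
    embeddings="embedding-model-placeholder",  # assumption: replace with a real embedding model
)
```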


Evaluation Approaches

Two methods exist for running evaluations:

  • Score Each Trace: Evaluate every trace individually for granular performance insights. More expensive but comprehensive.

  • Score as Batch: Sample traces periodically and score them together. Lower cost with approximate performance estimates.

1. Scoring Individual Traces

Define a utility function to compute scores:
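
A minimal sketch of such a utility, assuming each metric can be called as an async function of the question, the retrieved contexts, and the answer. That signature (`MetricFn`) is hypothetical; real Ragas metrics expose their own scoring methods, which would be wrapped to fit it. The toy metric exists only so the example runs without an LLM.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical metric signature: (question, contexts, answer) -> score in [0, 1].
MetricFn = Callable[[str, List[str], str], Awaitable[float]]


async def score_trace(
    question: str,
    contexts: List[str],
    answer: str,
    metrics: Dict[str, MetricFn],
) -> Dict[str, float]:
    """Run every metric on one question/contexts/answer triple concurrently."""
    names = list(metrics)
    values = await asyncio.gather(
        *(metrics[name](question, contexts, answer) for name in names)
    )
    return dict(zip(names, values))


async def toy_context_overlap(question: str, contexts: List[str], answer: str) -> float:
    """Toy stand-in: fraction of answer words that also appear in the contexts."""
    context_words = set(" ".join(contexts).lower().split())
    answer_words = answer.lower().split()
    if not answer_words:
        return 0.0
    return sum(word in context_words for word in answer_words) / len(answer_words)


scores = asyncio.run(
    score_trace(
        "What is the capital of France?",
        ["Paris is the capital of France."],
        "paris is the capital",
        {"toy_context_overlap": toy_context_overlap},
    )
)
```

Running the metrics concurrently matters in practice because each real metric makes one or more LLM calls, so sequential scoring multiplies latency per trace.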

Integrate scoring into your RAG pipeline:

Attach scores to the trace:
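
The integration and attachment steps can be sketched together with injected callables, so the example does not depend on any particular SDK. Every parameter here is an assumption: `retrieve` and `generate` stand in for the RAG pipeline, `score` for the scoring utility, and `record_score` for whatever call the InteractiveAI client provides to attach a named score to a trace.

```python
from typing import Callable, Dict, List


def answer_and_score(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    score: Callable[[str, List[str], str], Dict[str, float]],
    record_score: Callable[[str, str, float], None],
    trace_id: str,
) -> str:
    """Run retrieval and generation, then score the result and record each score."""
    contexts = retrieve(question)
    answer = generate(question, contexts)
    for name, value in score(question, contexts, answer).items():
        record_score(trace_id, name, value)  # one score entry per metric
    return answer


# Usage with in-memory stand-ins for every dependency.
recorded = []
result = answer_and_score(
    "What does Ragas evaluate?",
    retrieve=lambda q: ["Ragas evaluates RAG pipelines."],
    generate=lambda q, ctx: ctx[0],
    score=lambda q, ctx, a: {"faithfulness": 1.0},
    record_score=lambda trace_id, name, value: recorded.append((trace_id, name, value)),
    trace_id="trace-123",
)
```

Passing the dependencies in as arguments keeps the scoring step testable in isolation and makes it straightforward to swap the in-memory recorder for a real client call.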

2. Batch Scoring

For high-traffic applications, batch scoring reduces cost while providing performance estimates.

Retrieve traces from the InteractiveAI Platform:

Build an evaluation batch:
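
A minimal sketch of this step, assuming each fetched trace has already been reduced to a dictionary with `id`, `question`, `contexts`, and `answer` keys. Those keys and the output column names are assumptions; the column names an evaluator expects vary by Ragas version. Keeping `trace_id` as its own column lets scores be matched back to traces later.

```python
import random
from typing import Dict, List


def build_eval_batch(traces: List[dict], sample_size: int, seed: int = 0) -> Dict[str, list]:
    """Sample traces without replacement and reshape them into columns."""
    rng = random.Random(seed)  # seeded so the same sample can be reproduced
    sampled = rng.sample(traces, min(sample_size, len(traces)))
    batch = {"question": [], "contexts": [], "answer": [], "trace_id": []}
    for trace in sampled:
        batch["question"].append(trace["question"])
        batch["contexts"].append(trace["contexts"])
        batch["answer"].append(trace["answer"])
        batch["trace_id"].append(trace["id"])
    return batch


# Five fake traces, of which three are sampled for evaluation.
traces = [
    {"id": f"t{i}", "question": f"q{i}", "contexts": [f"c{i}"], "answer": f"a{i}"}
    for i in range(5)
]
batch = build_eval_batch(traces, sample_size=3)
```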

Run batch evaluation:

Push scores back to InteractiveAI:
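
Before upload, the per-row results need to be paired with their trace IDs. The sketch below assumes the batch evaluation yields one list of values per metric, in the same row order as the trace IDs, and that a failed row shows up as NaN. The record layout (`trace_id`, `name`, `value`) is hypothetical; the actual upload call depends on the InteractiveAI SDK and is not shown.

```python
import math
from typing import Dict, List


def batch_scores_to_records(
    trace_ids: List[str],
    metric_columns: Dict[str, List[float]],
) -> List[dict]:
    """Pair each per-row metric value with its trace ID, skipping NaN rows."""
    records = []
    for name, values in metric_columns.items():
        if len(values) != len(trace_ids):
            raise ValueError(f"{name}: expected {len(trace_ids)} values, got {len(values)}")
        for trace_id, value in zip(trace_ids, values):
            if math.isnan(float(value)):
                continue  # assumption: NaN marks a row that could not be scored
            records.append({"trace_id": trace_id, "name": name, "value": float(value)})
    return records


records = batch_scores_to_records(
    ["t1", "t2"],
    {"faithfulness": [0.9, float("nan")], "answer_relevancy": [0.8, 0.7]},
)
```

Each resulting record can then be sent with whichever score-creation call the client exposes, one request per record or in bulk if the SDK supports it.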


Trace Visibility

The InteractiveAI dashboard displays:

  • Traces with attached Ragas scores for quick performance overview

  • Score-based filtering to identify low-quality responses

  • Analytics with drill-down into user segments and use cases

  • Score trends across time periods for performance monitoring
