# Embeddings

Embeddings transform text into numerical vectors that encode semantic meaning. These vector representations enable machine learning applications to process and compare text mathematically. The InteractiveAI Router offers a unified interface for accessing embedding models across multiple providers.

### Understanding Embeddings

When text is converted to an embedding, it becomes a point in a high-dimensional vector space. Semantically similar texts occupy nearby positions in this space. For example, "refund policy" and "return guidelines" will have vectors that are close together, while "refund policy" and "server configuration" will be far apart.

This mathematical representation of meaning forms the backbone of many production AI systems.
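
To make this concrete, here is a toy illustration with hand-picked three-dimensional vectors (real embeddings have hundreds or thousands of dimensions). The vectors and the resulting scores are illustrative only, not output from any model:

```python
import numpy as np

# Hand-picked toy vectors; real embeddings have far more dimensions.
refund_policy     = np.array([0.9, 0.1, 0.0])
return_guidelines = np.array([0.8, 0.3, 0.1])
server_config     = np.array([0.0, 0.2, 0.9])

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 means more similar."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(refund_policy, return_guidelines))  # high (~0.96)
print(cosine_similarity(refund_policy, server_config))      # low  (~0.02)
```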

### Use Cases

**Retrieval-Augmented Generation (RAG)**: Build pipelines that fetch relevant context from your knowledge base before generating responses. Embeddings determine which documents should be included in the LLM's context window.

**Semantic Search**: Convert your document corpus and user queries into embeddings, then rank results by vector similarity. Unlike keyword matching, this approach understands meaning and surfaces relevant results even when exact terms don't match.

**Recommendation Engines**: Generate embeddings for content items (articles, products, support tickets) and user behavior to identify similar items. Vector proximity reveals relationships that keyword analysis would miss.

**Anomaly Detection**: Flag unusual content by identifying embeddings that fall outside normal patterns in your dataset.

**Document Classification**: Assign texts to categories or group related content by measuring embedding distances. Documents with similar embeddings typically address related subjects.

**Duplicate Detection**: Identify identical or near-identical content by comparing embeddings. This method catches duplicates even when the text has been reworded or paraphrased.
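
As a minimal sketch of that last use case: flag a pair of texts as near-duplicates when the cosine similarity of their embeddings exceeds a threshold. The 0.9 cutoff below is an assumption to tune against your own data:

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.9  # assumed cutoff; tune on your own data

def is_near_duplicate(embedding_a, embedding_b, threshold=SIMILARITY_THRESHOLD):
    """Treat two texts as near-duplicates when their embedding
    vectors point in almost the same direction."""
    a, b = np.array(embedding_a), np.array(embedding_b)
    similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return similarity >= threshold

print(is_near_duplicate([1.0, 0.0], [0.99, 0.05]))  # True for these toy vectors
```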

***

### Generating Embeddings

#### Single Text Request

Send a POST request to `/api/v1/embeddings` with your text and chosen model:

{% tabs %}
{% tab title="Python" %}

```python
import requests

response = requests.post(
  "https://app.interactive.ai/api/v1/embeddings",
  headers={
    "Authorization": f"Bearer <LLMROUTER_API_KEY>",
    "Content-Type": "application/json",
  },
  json={
      "model": "openai/text-embedding-3-large",
      "input": "Customer requests refund for order placed within return window"
  }
)

data = response.json()
embedding = data["data"][0]["embedding"]
print(f"Embedding dimension: {len(embedding)}")

```

{% endtab %}

{% tab title="Shell" %}

```bash
curl https://app.interactive.ai/api/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LLMROUTER_API_KEY" \
  -d '{
    "model": "anthropic/claude-3-sonnet",
    "input": "Customer requests refund for order placed within return window"
  }'
```

{% endtab %}
{% endtabs %}
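
The Python example above reads the vector from `data["data"][0]["embedding"]`, which implies a response shaped roughly like the following. The surrounding fields shown here (`index`, `model`) are assumptions based on common OpenAI-compatible formats, and the vector is truncated for display:

```json
{
  "data": [
    {
      "index": 0,
      "embedding": [0.0123, -0.0456, 0.0789]
    }
  ],
  "model": "openai/text-embedding-3-large"
}
```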

#### Multiple Texts in One Request

Process several texts simultaneously by passing an array of strings:

{% tabs %}
{% tab title="Python" %}

```python
import requests

response = requests.post(
  "https://app.interactive.ai/api/v1/embeddings",
  headers={
    "Authorization": f"Bearer <LLMROUTER_API_KEY>",
    "Content-Type": "application/json",
  },
  json={
      "model": "openai/text-embedding-3-large",
      "input": [
          "How do I reset my password?",
          "What are the pricing tiers for enterprise accounts?",
          "My invoice shows incorrect charges for last month"
      ]
  }
)

data = response.json()
for i, item in enumerate(data["data"]):
  print(f"Embedding {i}: {len(item['embedding'])} dimensions")

```

{% endtab %}

{% tab title="Shell" %}

```bash
curl https://app.interactive.ai/api/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LLMROUTER_API_KEY" \
  -d '{
    "model": "anthropic/claude-3-sonnet",
    "input": [
      "How do I reset my password?",
      "What are the pricing tiers for enterprise accounts?",
      "My invoice shows incorrect charges for last month"
    ]
  }'
```

{% endtab %}
{% endtabs %}
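
Embeddings come back one per input, in the same order as the input array, as the indexed loop above implies. A minimal sketch continuing from the Python tab, where `data` holds the parsed response:

```python
texts = [
    "How do I reset my password?",
    "What are the pricing tiers for enterprise accounts?",
    "My invoice shows incorrect charges for last month",
]

# The response preserves input order, so zip pairs each text with its vector.
embeddings_by_text = {
    text: item["embedding"] for text, item in zip(texts, data["data"])
}
```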

### Supported Models

The InteractiveAI Router connects to embedding models from various providers. View the complete catalog at:

`https://interactiveai.com/models?fmt=cards&output_modalities=embeddings`

Fetch the list of available models programmatically:

{% tabs %}
{% tab title="Python" %}

```python
import requests

response = requests.get(
  "https://app.interactive.ai/api/v1/embeddings/models",
  headers={
    "Authorization": f"Bearer <LLMROUTER_API_KEY>",
  }
)

models = response.json()
for model in models["data"]:
  print(f"{model['id']}: {model.get('context_length', 'N/A')} tokens")

```

{% endtab %}

{% tab title="Shell" %}

```bash
curl https://app.interactive.ai/api/v1/embeddings/models \
  -H "Authorization: Bearer <LLMROUTER_API_KEY>"
```

{% endtab %}
{% endtabs %}

***

### Implementation Example: Semantic Search

The following example demonstrates a complete semantic search implementation:

```python
import requests
import numpy as np

LLMROUTER_API_KEY = "<LLMROUTER_API_KEY>"

# Sample documents
documents = [
    "To reset your password, navigate to Settings > Security > Change Password",
    "Enterprise plans include dedicated support and custom SLAs",
    "Billing cycles run from the 1st to the last day of each month",
    "API rate limits vary by subscription tier",
    "Two-factor authentication can be enabled in your security settings"
]

def cosine_similarity(a, b):
  """Calculate cosine similarity between two vectors"""
  dot_product = np.dot(a, b)
  magnitude_a = np.linalg.norm(a)
  magnitude_b = np.linalg.norm(b)
  return dot_product / (magnitude_a * magnitude_b)

def semantic_search(query, documents):
  """Perform semantic search using embeddings"""
  # Generate embeddings for query and all documents
  response = requests.post(
    "https://app.interactive.ai/api/v1/embeddings",
    headers={
      "Authorization": f"Bearer {LLMROUTER_API_KEY}",
      "Content-Type": "application/json",
    },
    json={
      "model": "openai/text-embedding-3-large",
      "input": [query] + documents
    }
  )
  
  data = response.json()
  query_embedding = np.array(data["data"][0]["embedding"])
  doc_embeddings = [np.array(item["embedding"]) for item in data["data"][1:]]
  
  # Calculate similarity scores
  results = []
  for i, doc in enumerate(documents):
    similarity = cosine_similarity(query_embedding, doc_embeddings[i])
    results.append({"document": doc, "similarity": similarity})
  
  # Sort by similarity (highest first)
  results.sort(key=lambda x: x["similarity"], reverse=True)
  
  return results

# Search for documents related to account security
results = semantic_search("How do I secure my account?", documents)
print("Search results:")
for i, result in enumerate(results):
  print(f"{i + 1}. {result['document']} (similarity: {result['similarity']:.4f})")

```

Expected output:

```
Search results:
1. Two-factor authentication can be enabled in your security settings (similarity: 0.8456)
2. To reset your password, navigate to Settings > Security > Change Password (similarity: 0.7892)
3. API rate limits vary by subscription tier (similarity: 0.3124)
4. Enterprise plans include dedicated support and custom SLAs (similarity: 0.2876)
5. Billing cycles run from the 1st to the last day of each month (similarity: 0.2341)
```

### Recommendations

**Choose Models Based on Requirements**: Embedding models present trade-offs between speed, cost, and quality. Compact models like `qwen/qwen3-embedding-0.6b` or `openai/text-embedding-3-small` respond quickly at lower cost. Larger models like `openai/text-embedding-3-large` produce higher-quality vectors. Test several options against your specific data.

**Combine Texts into Single Requests**: When you need embeddings for multiple strings, send them together in one API call rather than making separate requests. This approach minimizes latency and reduces costs.
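
A minimal batching sketch, assuming a batch size of 100 per request (actual per-request limits vary by model and provider):

```python
import requests

API_URL = "https://app.interactive.ai/api/v1/embeddings"
BATCH_SIZE = 100  # assumed; check your model's per-request limits

def embed_all(texts, api_key, model="openai/text-embedding-3-large"):
    """Embed a long list of texts in fixed-size batches
    instead of one request per text."""
    embeddings = []
    for start in range(0, len(texts), BATCH_SIZE):
        batch = texts[start:start + BATCH_SIZE]
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            json={"model": model, "input": batch},
        )
        response.raise_for_status()
        embeddings.extend(item["embedding"] for item in response.json()["data"])
    return embeddings
```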

**Store and Reuse Embeddings**: The same input always produces the same embedding vector. Cache these results in a database or vector store to eliminate redundant API calls.
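
Because results are deterministic, a cache key only needs the model name and the exact input text. A minimal in-process sketch; `embed_fn` stands in for whatever function wraps the API call, and production systems would typically persist results to a database or vector store instead of a dict:

```python
import hashlib

_cache = {}  # maps (model, text-hash) -> embedding vector

def cached_embedding(text, model, embed_fn):
    """Return a cached vector when available; otherwise call
    embed_fn(text, model) once and store the result."""
    key = (model, hashlib.sha256(text.encode("utf-8")).hexdigest())
    if key not in _cache:
        _cache[key] = embed_fn(text, model)
    return _cache[key]
```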

**Use Cosine Similarity for Comparisons**: When measuring distance between embeddings, cosine similarity outperforms Euclidean distance in high-dimensional spaces because it focuses on directional alignment rather than absolute magnitude.

**Respect Token Limits**: Each model enforces a maximum input length. Documents exceeding this threshold require chunking or truncation. Consult model specifications before processing long texts.

**Split Documents at Natural Boundaries**: When dividing lengthy content, break at paragraph or section boundaries rather than arbitrary character positions. This approach maintains the semantic coherence of each chunk.
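
A minimal chunking sketch that splits on blank lines (paragraph boundaries) and packs whole paragraphs into chunks under a size budget. The 2,000-character budget is an assumed stand-in for your model's real token limit:

```python
MAX_CHUNK_CHARS = 2000  # assumed stand-in for the model's token limit

def chunk_document(text, max_chars=MAX_CHUNK_CHARS):
    """Split on paragraph boundaries, packing whole paragraphs
    into chunks that stay under the size budget."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        # An oversized single paragraph passes through whole and may
        # still need truncation downstream.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```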

***

### Controlling Provider Selection

The `provider` parameter lets you specify which providers handle your embedding requests. Common reasons to use this:

* Restricting data to specific providers for compliance
* Prioritizing providers based on cost or performance
* Accessing provider-specific capabilities

**Example configuration:**

```json
{
  "model": "openai/text-embedding-3-small",
  "input": "Your text here",
  "provider": {
    "order": ["openai", "azure"],
    "allow_fallbacks": true,
    "data_collection": "deny"
  }
}
```

***

### Errors

| Code                        | Cause                                                                                                             |
| --------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| **400 Bad Request**         | Malformed input or missing required fields. Verify your request structure.                                        |
| **401 Unauthorized**        | Missing or invalid API key. Confirm your key is correct and included in the Authorization header.                 |
| **402 Payment Required**    | Account balance depleted. Add credits to continue.                                                                |
| **404 Not Found**           | Model does not exist or does not support embeddings. Double-check the model identifier.                           |
| **429 Too Many Requests**   | Rate limit hit. Implement backoff and retry logic (see the sketch after this table).                              |
| **529 Provider Overloaded** | Upstream provider capacity exceeded. Set `allow_fallbacks: true` to route to alternative providers automatically. |
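
A minimal retry sketch with exponential backoff, treating 429 and 529 as retryable:

```python
import time
import requests

def post_with_backoff(url, headers, payload, max_retries=5):
    """Retry on 429/529 with exponential backoff; raise on other errors."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code not in (429, 529):
            response.raise_for_status()
            return response
        time.sleep(2 ** attempt)  # waits 1s, 2s, 4s, 8s, 16s
    raise RuntimeError(f"Still rate-limited after {max_retries} retries")
```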

***

### Constraints

* **No Streaming Support**: Embedding responses are delivered complete; incremental delivery is not available.
* **Input Length Restrictions**: Models impose maximum token limits. Inputs beyond this threshold are truncated or rejected.
* **Deterministic Results**: Given identical input, embeddings are always identical. Temperature and randomness parameters do not apply.
* **Language Performance Varies**: Model effectiveness differs across languages. Review documentation to confirm support for your target languages.
