# Embeddings
The AI Wrapper now exposes an OpenAI-compatible `/v1/embeddings` endpoint, so RAG pipelines can route through PromptGate just like chat. Send an OpenAI-shaped request, get an OpenAI-shaped response, regardless of which provider answers.
## Request

```
POST /api/{projectUuid}/v1/embeddings
Authorization: Bearer pg_live_…
Content-Type: application/json
```

```json
{
  "model": "openai:text-embedding-3-small",
  "input": "Embed this sentence."
}
```

`input` accepts a single string or an array of strings (up to the provider's batch limit). Optional fields:
| Field | Where it applies |
|---|---|
| `dimensions` | OpenAI / Mistral — request a smaller vector |
| `input_type` | Cohere — `search_document` (default), `search_query`, `classification`, `clustering` |
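A minimal sketch of assembling a request body on the client side, normalizing `input` to a list and passing the optional fields through. The helper name and its validation are illustrative, not part of PromptGate:

```python
def build_embeddings_request(model: str, inputs, **optional) -> dict:
    """Build an OpenAI-shaped /v1/embeddings body.

    `inputs` may be a single string or a list of strings; optional
    fields such as `dimensions` or `input_type` pass through untouched.
    (Hypothetical helper, shown for illustration only.)
    """
    if isinstance(inputs, str):
        inputs = [inputs]
    allowed = {"dimensions", "input_type"}
    unknown = set(optional) - allowed
    if unknown:
        raise ValueError(f"unsupported optional fields: {sorted(unknown)}")
    return {"model": model, "input": inputs, **optional}

body = build_embeddings_request(
    "openai:text-embedding-3-small",
    "Embed this sentence.",
    dimensions=256,
)
# body["input"] == ["Embed this sentence."], body["dimensions"] == 256
```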
## Response

```json
{
  "object": "list",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.1, 0.2, …] }
  ],
  "model": "text-embedding-3-small",
  "usage": { "prompt_tokens": 3, "total_tokens": 3 }
}
```

Same envelope no matter who served it. Vectors come back as `float[]` (already parsed from JSON).
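Because the envelope is stable across providers, pulling vectors out is the same everywhere. A sketch (the helper name is an assumption) that orders results by their `index` field before returning them:

```python
def extract_vectors(response: dict) -> list:
    """Return embedding vectors from an OpenAI-shaped envelope,
    ordered by each item's `index` field. (Illustrative helper.)"""
    items = sorted(response["data"], key=lambda d: d["index"])
    return [item["embedding"] for item in items]

resp = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
    "model": "text-embedding-3-small",
    "usage": {"prompt_tokens": 3, "total_tokens": 3},
}
vectors = extract_vectors(resp)
# vectors == [[0.1, 0.2], [0.3, 0.4]]
```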
Provider matrix
Section titled “Provider matrix”| Provider | Implementation | Notes |
|---|---|---|
| OpenAI | text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002 | dimensions supported on -3-* models |
| Mistral | mistral-embed | Same OpenAI shape |
| Groq | (not currently — Groq doesn’t host embeddings) | Returns 400 |
| Together AI | OpenAI-shaped embeddings | Pass-through |
| Ollama | local embeddings models | Pass-through |
| Cohere | embed-english-v3.0, embed-multilingual-v3.0, … | Translated: texts + input_type upstream, normalized response |
| Anthropic | (Anthropic has no embeddings API) | Returns 400 |
| (planned) | — | Returns 400 |
If you point the wrapper at a provider without embeddings support, you get a 400 with `"error.message": "Provider :p does not support embeddings."` — fail loud, don’t pretend.
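The matrix above implies two behaviours: Cohere requests get translated (`input` becomes `texts`, with an `input_type` default), and providers without embeddings fail loudly. A sketch of both, assuming hypothetical helper names and a generic error message rather than the wrapper's exact internals:

```python
# Providers from the matrix that return 400 for embeddings.
NO_EMBEDDINGS = {"groq", "anthropic"}

def to_cohere_payload(body: dict) -> dict:
    """Translate an OpenAI-shaped body into Cohere's embed shape:
    `texts` (always a list) plus `input_type`, defaulting to
    search_document as the table documents. (Illustrative helper.)"""
    texts = body["input"]
    if isinstance(texts, str):
        texts = [texts]
    return {
        "model": body["model"].split(":", 1)[-1],  # drop the provider prefix
        "texts": texts,
        "input_type": body.get("input_type", "search_document"),
    }

def check_provider(provider: str):
    """Mirror the fail-loud behaviour: 400 for unsupported providers.
    (Sketch only; the real error message interpolates the provider.)"""
    if provider in NO_EMBEDDINGS:
        return 400, {"error": {"message": f"Provider {provider} does not support embeddings."}}
    return 200, None
```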
## Routing via aliases

The same wrapper-alias / preset machinery that routes chat works for embeddings. Define an alias `embed:fast` in the wrapper and route it to `openai:text-embedding-3-small`. Clients call the alias; you swap the underlying provider at any time without touching their code.
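The resolution step can be sketched as a simple lookup that falls back to pass-through. The alias name comes from the example above; the map and resolver are illustrative, not PromptGate internals:

```python
# Hypothetical alias map: edit the target to re-route every client at once.
ALIASES = {"embed:fast": "openai:text-embedding-3-small"}

def resolve_model(model: str) -> str:
    """Resolve a wrapper alias to its concrete provider:model target;
    non-alias model strings pass through unchanged. (Sketch only.)"""
    return ALIASES.get(model, model)

resolve_model("embed:fast")      # "openai:text-embedding-3-small"
resolve_model("mistral:mistral-embed")  # unchanged pass-through
```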
## Cost & observability

Today, embedding calls are not yet logged into `gateway_logs` (the table is shaped for chat); they go through to the provider directly. We’ll add a dedicated `embedding_logs` table or extend `gateway_logs` with an `op_type` column in a follow-up. For now, embeddings live “outside” the cost dashboard, and the provider’s own usage dashboard is the source of truth for the bill.
## Limitations (v1)

- No caching yet (the chat-cache hash is shaped around `messages`, not `input`).
- No streaming (embeddings have no streaming concept).
- Single endpoint (the wrapper). AI Gateway endpoints don’t have an `embeddings` request shape — they’re chat-shaped by design.
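To make the first limitation concrete, here is a sketch of what an `input`-shaped cache key might look like, hashing `model` plus the normalized input list. This is purely illustrative of the future direction — nothing like it exists in v1:

```python
import hashlib
import json

def embedding_cache_key(body: dict) -> str:
    """Hypothetical cache key for embeddings (NOT implemented in v1).

    The existing chat cache hashes `messages`; an embeddings cache would
    hash `model` plus the normalized `input` instead, so a bare string
    and a one-element list produce the same key.
    """
    inputs = body["input"]
    if isinstance(inputs, str):
        inputs = [inputs]
    payload = json.dumps({"model": body["model"], "input": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```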