
Embeddings

The AI Wrapper now exposes `/v1/embeddings` (OpenAI-compatible), so RAG pipelines can route through PromptGate just like chat: send the OpenAI-shaped request, get the OpenAI-shaped response, regardless of which provider answers.

```http
POST /api/{projectUuid}/v1/embeddings
Authorization: Bearer pg_live_…
Content-Type: application/json

{
  "model": "openai:text-embedding-3-small",
  "input": "Embed this sentence."
}
```
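As a minimal client-side sketch, the request above can be assembled like this. The helper name `build_embeddings_request` and the host placeholder are ours, not part of PromptGate; only the path, headers, and body shape come from the docs.

```python
import json

# Hypothetical helper: packages the endpoint, headers, and body shown
# above. Replace the host placeholder with your PromptGate deployment.
def build_embeddings_request(project_uuid, api_key, model, input_):
    url = f"https://YOUR-PROMPTGATE-HOST/api/{project_uuid}/v1/embeddings"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "input": input_})
    return url, headers, body

url, headers, body = build_embeddings_request(
    "my-project",
    "pg_live_xxx",
    "openai:text-embedding-3-small",
    "Embed this sentence.",
)
```

Any HTTP client (`requests`, `httpx`, `fetch`) can then send it as a plain POST.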

`input` accepts a single string or an array of strings (up to the provider's batch limit). Optional fields:

| Field | Where it applies |
| --- | --- |
| `dimensions` | OpenAI / Mistral — request a smaller vector |
| `input_type` | Cohere — `search_document` (default), `search_query`, `classification`, `clustering` |
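Two illustrative request bodies showing the optional fields above (the concrete values are examples, not defaults):

```python
# OpenAI / Mistral: dimensions trims the vector length.
openai_req = {
    "model": "openai:text-embedding-3-small",
    "input": ["first chunk", "second chunk"],  # array input, batched
    "dimensions": 256,
}

# Cohere: input_type tells the model how the text will be used.
cohere_req = {
    "model": "cohere:embed-english-v3.0",
    "input": "what is a vector database?",
    "input_type": "search_query",
}
```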
```json
{
  "object": "list",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.1, 0.2, …] }
  ],
  "model": "text-embedding-3-small",
  "usage": { "prompt_tokens": 3, "total_tokens": 3 }
}
```

Same envelope no matter who served it. Vectors come back as float[] (already parsed from JSON).
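Since the vectors arrive as plain float arrays, a consumer can use them directly. A small sketch, with a made-up two-item response in the envelope above:

```python
import math

# Illustrative response; real vectors have hundreds of dimensions.
resp = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2, 0.3]},
        {"object": "embedding", "index": 1, "embedding": [0.1, 0.2, 0.25]},
    ],
    "model": "text-embedding-3-small",
    "usage": {"prompt_tokens": 6, "total_tokens": 6},
}

# Sort by index: the envelope keys each vector to its input position.
vectors = [
    item["embedding"]
    for item in sorted(resp["data"], key=lambda d: d["index"])
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

similarity = cosine(vectors[0], vectors[1])
```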

| Provider | Implementation | Notes |
| --- | --- | --- |
| OpenAI | `text-embedding-3-small`, `text-embedding-3-large`, `text-embedding-ada-002` | `dimensions` supported on `-3-*` models |
| Mistral | `mistral-embed` | Same OpenAI shape |
| Groq | (not currently — Groq doesn't host embeddings) | Returns 400 |
| Together AI | OpenAI-shaped embeddings | Pass-through |
| Ollama | local embedding models | Pass-through |
| Cohere | `embed-english-v3.0`, `embed-multilingual-v3.0`, … | Translated: `texts` + `input_type` upstream, normalized response |
| Anthropic | (Anthropic has no embeddings API) | Returns 400 |
| Google | (planned) | Returns 400 |

If you point the wrapper at a provider without embeddings support, you get a 400 with "error.message": "Provider :p does not support embeddings." — fail loud, don’t pretend.
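A caller that wants to fail over rather than crash can watch for that 400. A sketch under the error shape above; the exception class and helper are our names, and we match on a substring since the docs show `:p` as a placeholder for the provider:

```python
class EmbeddingsNotSupported(Exception):
    """Raised when the routed provider has no embeddings API."""

def check_embeddings_response(status_code, body):
    # body is the parsed JSON error envelope from the wrapper.
    message = body.get("error", {}).get("message", "")
    if status_code == 400 and "does not support embeddings" in message:
        raise EmbeddingsNotSupported(message)
    return body

# Simulated wrapper response for a provider without embeddings.
err_body = {"error": {"message": "Provider :p does not support embeddings."}}
try:
    check_embeddings_response(400, err_body)
    raised = False
except EmbeddingsNotSupported:
    raised = True
```

On `EmbeddingsNotSupported`, retry with a model string that routes to a provider from the supported rows of the table.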

The same wrapper-alias / preset machinery that routes chat works for embeddings. Define an alias (`model: "embed:fast"`) in the wrapper and route it to `openai:text-embedding-3-small`. Clients call the alias; you swap the underlying provider at any time without touching their code.
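The alias idea, reduced to its essence (this is an illustration of the concept, not PromptGate's internals):

```python
# Wrapper-side mapping: clients only ever see the alias on the left.
aliases = {"embed:fast": "openai:text-embedding-3-small"}

def resolve_model(requested):
    # Unknown names fall through as literal provider:model strings.
    return aliases.get(requested, requested)

model = resolve_model("embed:fast")
```

Swapping providers is a one-line change to the mapping; every client request body keeps saying `"model": "embed:fast"`.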

Today: embedding calls are not yet logged to gateway_logs (the table is shaped for chat); they go straight through to the provider. We'll add a dedicated embedding_logs table or extend gateway_logs with an op_type column in a follow-up — for now, embeddings live "outside" the cost dashboard, and the provider's own usage dashboard is the source of truth for the bill.

  • No caching yet (chat-cache hash is shaped around messages, not input).
  • No streaming (embeddings have no streaming concept).
  • Single endpoint (the wrapper). AI Gateway endpoints don’t have an embeddings request shape — they’re chat-shaped by design.

© Akyros Labs LLC. All rights reserved.