Providers Overview

PromptGate ships with 8 built-in provider adapters. Adding a credential and picking a model is enough to start routing — no extra config needed.

| Provider | Base URL | Adapter strategy |
| --- | --- | --- |
| OpenAI | https://api.openai.com/v1 | OpenAI-compatible (uses max_completion_tokens) |
| Anthropic | https://api.anthropic.com/v1 | Native (lifts system messages, x-api-key auth) |
| Google Gemini | provider-specific | Native |
| Mistral | https://api.mistral.ai/v1 | OpenAI-compatible |
| Groq | https://api.groq.com/openai/v1 | OpenAI-compatible |
| Together AI | https://api.together.xyz/v1 | OpenAI-compatible (Llama, Mixtral, DeepSeek, Qwen) |
| Ollama | $OLLAMA_BASE_URL | OpenAI-compatible, local |
| Cohere | https://api.cohere.com/v2 | Native (uppercase finish_reason mapping, p not top_p) |

Five of them (OpenAI, Mistral, Groq, Together, Ollama) share an OpenAiCompatibleProvider abstract base, so adding a new OpenAI-shaped provider is ~30 lines. The other three (Anthropic, Google, Cohere) implement ProviderContract directly because their APIs aren’t OpenAI-shaped.
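To make the shared-base idea concrete, here is a minimal sketch of what an OpenAI-shaped provider subclass might look like. Everything except the OpenAiCompatibleProvider name and the Groq base URL is an assumption for illustration, not PromptGate's actual internals:

```python
class OpenAiCompatibleProvider:
    """Illustrative stand-in for the shared base: common URL and auth
    logic for any OpenAI-shaped chat API."""
    base_url: str = ""

    def chat_url(self) -> str:
        # All OpenAI-compatible providers expose /chat/completions.
        return f"{self.base_url}/chat/completions"

    def auth_headers(self, api_key: str) -> dict:
        # Standard Bearer auth; providers with other schemes (e.g.
        # Anthropic's x-api-key) don't use this base.
        return {"Authorization": f"Bearer {api_key}"}


class GroqProvider(OpenAiCompatibleProvider):
    # A new OpenAI-shaped provider mostly just supplies its base URL.
    base_url = "https://api.groq.com/openai/v1"
```

This is why such adapters stay around ~30 lines: the subclass contributes little beyond a base URL and any per-provider overrides.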

See Adding a Provider if you want to add one.

Some practical recommendations:

| Job | Pick |
| --- | --- |
| General-purpose, best output quality | Anthropic Claude Sonnet 4 / 4.5 / 4.6, OpenAI gpt-4o |
| Cheap + fast | Groq llama-3.1-8b-instant, OpenAI gpt-4o-mini |
| Long context | Anthropic Claude (200k+), Google Gemini |
| Vendor-lock-free open weights | Together AI Mixtral / Llama, Ollama local |
| Privacy / on-prem | Ollama |
| Multilingual | Cohere Command R+, Mistral Large |

OpenAI

Uses max_completion_tokens rather than the classic max_tokens, as required by newer chat models. The adapter handles the renaming internally; you set max_output_tokens on the endpoint and it lands as max_completion_tokens in the upstream request.
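The renaming can be sketched as a one-field rewrite on the outgoing payload. The field names here (max_output_tokens on the gateway side, max_completion_tokens upstream) come from the text above; the function itself is illustrative, not PromptGate's code:

```python
def to_openai_payload(endpoint_cfg: dict) -> dict:
    """Rename the gateway-side max_output_tokens to OpenAI's
    max_completion_tokens before the upstream request (sketch)."""
    payload = dict(endpoint_cfg)  # don't mutate the caller's config
    if "max_output_tokens" in payload:
        payload["max_completion_tokens"] = payload.pop("max_output_tokens")
    return payload
```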

API key prefix: sk-….

Anthropic

Auth header is x-api-key, not Authorization: Bearer …. Requires the anthropic-version: 2023-06-01 header.

System messages are lifted from messages[] into a top-level system field (Anthropic’s API doesn’t accept system inside messages). The adapter does this automatically.
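The lift amounts to splitting system messages out of the conversation and concatenating them into one top-level string. A minimal sketch (the function name and signature are assumptions):

```python
def lift_system_messages(messages: list[dict]):
    """Split OpenAI-style messages into (system, remaining_messages):
    system messages become one top-level string, as Anthropic expects."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Return None when there was no system message at all.
    return ("\n".join(system_parts) or None), rest
```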

API key prefix: sk-ant-….

Google Gemini

Uses Google’s chat-style API. Currently configured for the public Generative Language API. The API key is passed in the query string.

Mistral

Plain OpenAI-compatible. Uses the classic max_tokens. Drop in your Mistral La Plateforme key.

Groq

Plain OpenAI-compatible. Famously fast (custom hardware). Works best with Llama / Mixtral / Gemma models.

API key prefix: gsk_….

Together AI

Plain OpenAI-compatible. Wide model catalogue: Llama 3.x, Mixtral, DeepSeek, Qwen, Code Llama, etc. Set provider_model to the full Together identifier (e.g. mistralai/Mixtral-8x7B-Instruct-v0.1).

API key prefix: tk_….

Ollama

Local. Configure the base URL via OLLAMA_BASE_URL (default http://localhost:11434/v1). An auth header is sent but ignored by Ollama; any non-empty placeholder works.

Use this when you want local models with zero data egress.
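Resolving the base URL from the environment with the documented default can be sketched as follows (the helper name is an assumption; the default comes from the text above):

```python
import os

def ollama_base_url() -> str:
    # Fall back to local Ollama's OpenAI-compatible endpoint when
    # OLLAMA_BASE_URL is not set.
    return os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434/v1")
```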

Cohere

The most divergent of the eight. Cohere v2 chat differs as follows:

  • Response is message.content[] (array of text blocks) instead of choices[]
  • finish_reason values are uppercase: COMPLETE, MAX_TOKENS, STOP_SEQUENCE, TOOL_CALL, ERROR
  • Top-p parameter is named p, not top_p
  • Temperature is capped at 1.0 (OpenAI allows 2.0)

The adapter handles all of these — your endpoint config uses the same fields as any other provider, but the upstream call uses Cohere’s wire format.
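A sketch of what that translation might look like in both directions, based only on the differences listed above (the function names and the exact finish_reason mapping targets are assumptions):

```python
# Map Cohere's uppercase finish reasons onto OpenAI-style values
# (target values are illustrative assumptions).
FINISH_REASONS = {
    "COMPLETE": "stop",
    "MAX_TOKENS": "length",
    "STOP_SEQUENCE": "stop",
    "TOOL_CALL": "tool_calls",
    "ERROR": "error",
}

def to_cohere_params(params: dict) -> dict:
    """Outbound: rename top_p to p and clamp temperature to Cohere's 1.0 cap."""
    out = dict(params)
    if "top_p" in out:
        out["p"] = out.pop("top_p")
    if "temperature" in out:
        out["temperature"] = min(out["temperature"], 1.0)
    return out

def normalize_cohere(resp: dict) -> dict:
    """Inbound: flatten message.content[] text blocks into an
    OpenAI-style choices[] response."""
    text = "".join(
        block["text"]
        for block in resp["message"]["content"]
        if block.get("type") == "text"
    )
    return {
        "choices": [{
            "message": {"role": "assistant", "content": text},
            "finish_reason": FINISH_REASONS.get(resp["finish_reason"], "stop"),
        }]
    }
```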

API key prefix: usually co_….

Enabling and disabling providers

The admin Providers page lets you enable/disable any of the eight gateway-wide. A disabled provider rejects every request that targets it with a 503 Provider disabled in this gateway. Use this to, for example, disable OpenAI temporarily during an incident without touching credentials.

See Provider Settings.

Failover chains

Endpoint configurations support a failover chain: a list of (credential, model) pairs that are tried in order when the primary fails. The chain is provider-aware, so you can fail over from OpenAI to Anthropic, or to Groq’s faster Llama variant.
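The try-in-order behavior can be sketched as a simple loop over the chain. Everything here is illustrative (in particular, call_provider is a stand-in for whatever actually issues the upstream request, and a real implementation would retry only on retryable errors):

```python
def complete_with_failover(chain, request, call_provider):
    """Try each (credential, model) pair in order; return the first
    successful result, or re-raise the last failure (sketch)."""
    last_error = None
    for credential, model in chain:
        try:
            return call_provider(credential, model, request)
        except Exception as exc:  # real code: only retryable errors
            last_error = exc
    raise last_error
```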

See AI Endpoints → Tab 2.


Next: Credentials.


© Akyros Labs LLC. All rights reserved.