Recipe — Multi-provider AI Wrapper
The AI Wrapper isn’t just an OpenAI proxy — it’s a multi-provider router. This recipe sets up three aliases that map to three different providers, so the same client code can target whichever backend is appropriate for the task.
End state: your client sends model: "fast" for chat, model: "smart" for hard reasoning, model: "cheap" for batch jobs — and PromptGate routes each to the right upstream behind the scenes.
Prerequisites
- AI Wrapper project created (see OpenAI via Gateway)
- Credentials registered for at least three providers (OpenAI, Anthropic, Groq)
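The steps below reference PG_URL, PG_UUID, PG_TOKEN, and PG_BASE. A minimal Python setup, assuming the base path follows the /api/{uuid}/v1 shape used by the discovery call in Step 3:

```python
import os

# Wrapper coordinates used by the examples in this recipe.
# PG_BASE follows the URL shape shown in Step 3: {PG_URL}/api/{PG_UUID}/v1
PG_URL = os.environ["PG_URL"]      # base URL of your PromptGate deployment
PG_UUID = os.environ["PG_UUID"]    # UUID of this AI Wrapper project
PG_TOKEN = os.environ["PG_TOKEN"]  # a token issued for this wrapper
PG_BASE = f"{PG_URL}/api/{PG_UUID}/v1"
```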
Step 1 — Bind providers in the wrapper
Project sidebar → Providers. Tick Enabled + assign a credential for each:
| Provider | Credential | Status |
|---|---|---|
| OpenAI | OpenAI Production | ✅ |
| Anthropic | Anthropic Production | ✅ |
| Groq | Groq Production | ✅ |
Other providers stay disabled.
Step 2 — Define aliases
Project sidebar → Aliases. Add three:
| Alias | Provider | Model |
|---|---|---|
| fast | groq | llama-3.1-8b-instant |
| smart | anthropic | claude-sonnet-4-6-20251001 |
| cheap | openai | gpt-4o-mini |
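Conceptually, the table defines a lookup from friendly name to upstream target. The sketch below is purely illustrative (PromptGate performs this resolution server-side; client code never sees it):

```python
# Illustrative only: the alias table above, expressed as a lookup.
ALIASES = {
    "fast":  ("groq", "llama-3.1-8b-instant"),
    "smart": ("anthropic", "claude-sonnet-4-6-20251001"),
    "cheap": ("openai", "gpt-4o-mini"),
}

def resolve(alias: str) -> tuple[str, str]:
    """Map an alias to the (provider, upstream_model) pair it routes to."""
    return ALIASES[alias]
```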
Step 3 — Verify discovery
curl $PG_URL/api/$PG_UUID/v1/models \
  -H "Authorization: Bearer $PG_TOKEN" | jq

Expected output (excerpt):
{ "object": "list", "data": [ { "id": "fast", "object": "model", "owned_by": "promptgate", "is_alias": true }, { "id": "smart", "object": "model", "owned_by": "promptgate", "is_alias": true }, { "id": "cheap", "object": "model", "owned_by": "promptgate", "is_alias": true }, { "id": "openai:*", "object": "model", "owned_by": "promptgate", "is_alias": false }, { "id": "anthropic:*", "object": "model", "owned_by": "promptgate", "is_alias": false }, { "id": "groq:*", "object": "model", "owned_by": "promptgate", "is_alias": false } ]}So clients can use the aliases OR provider:model directly.
Step 4 — Pick the right alias per call
from openai import OpenAI

client = OpenAI(base_url=PG_BASE, api_key=PG_TOKEN)

# Quick UI chat — Groq is fast and cheap
client.chat.completions.create(model="fast", messages=[...])

# Complex reasoning — Anthropic Sonnet
client.chat.completions.create(model="smart", messages=[...])

# Batch summarisation — OpenAI mini, lots of throughput
client.chat.completions.create(model="cheap", messages=[...])

The client code is identical — the model picker becomes the router.
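If the alias choice lives in one place, the swap in Step 5 touches even less code. A hedged sketch reusing the client above; TASK_ALIAS and ask() are illustrative names, not part of PromptGate:

```python
# Illustrative helper: centralise the task -> alias choice so call sites
# never mention a provider or an upstream model name.
TASK_ALIAS = {
    "chat": "fast",        # quick UI chat
    "reasoning": "smart",  # complex reasoning
    "batch": "cheap",      # high-throughput batch jobs
}

def ask(task: str, messages: list[dict]) -> str:
    response = client.chat.completions.create(
        model=TASK_ALIAS[task],
        messages=messages,
    )
    return response.choices[0].message.content

reply = ask("reasoning", [{"role": "user", "content": "Explain the CAP theorem in two sentences."}])
```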
Step 5 — Swap a provider, change nothing client-side
You realise Groq’s llama-3.1-8b-instant is occasionally flaky and you’d rather use OpenAI’s gpt-4o-mini for fast too:
- Edit the `fast` alias.
- Change Provider to `openai`, Model to `gpt-4o-mini`.
- Save.
Client code didn’t change. Next request through model: "fast" lands at OpenAI.
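One way to confirm the swap from the client side is to inspect the model reported in the response. This assumes the wrapper echoes the upstream model name back, as OpenAI-compatible backends typically do; it is a sketch, not verified behaviour:

```python
# After re-pointing the "fast" alias, the same call should now be served by OpenAI.
response = client.chat.completions.create(
    model="fast",
    messages=[{"role": "user", "content": "Which backend served this?"}],
)
# If the wrapper passes the upstream model name through, this now reads
# gpt-4o-mini rather than llama-3.1-8b-instant.
print(response.model)
```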
Step 6 — Per-provider observability
In Live Logs, filter by `provider:groq` to see only Groq-served requests, or by `model:llama-3.1-8b-instant` to see Llama traffic specifically. The Metrics page shows a per-provider breakdown out of the box.
This is invaluable when:
- One provider has an outage — you can spot it in seconds.
- You’re A/B-testing two backends — see latency / token cost per provider (a rough client-side sketch follows this list).
- You’re cost-attributing — separate spend by provider.
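For the A/B case, the Metrics page is the source of truth, but a rough client-side comparison is easy to sketch. Purely illustrative, reusing the client and aliases from earlier steps:

```python
import time

def time_alias(alias: str, prompt: str) -> float:
    """Wall-clock latency in seconds for one completion on the given alias."""
    start = time.perf_counter()
    client.chat.completions.create(
        model=alias,
        messages=[{"role": "user", "content": prompt}],
    )
    return time.perf_counter() - start

prompt = "Summarise the CAP theorem in one sentence."
print("fast :", time_alias("fast", prompt))
print("cheap:", time_alias("cheap", prompt))
```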
Step 7 — Hard cap per alias (optional)
Aliases themselves don’t carry rate limits or budgets — those are endpoint-level features. If you want per-alias enforcement:
- Create an AI Gateway project (separate from this AI Wrapper).
- Make one endpoint per “alias”, with the right provider/model/credential.
- Configure the rate limit / budget on each endpoint.
Trade-off: clients now use the AI Gateway’s /api/{uuid}/{slug} URL shape, not the OpenAI-compatible /v1/chat/completions. So you lose drop-in OpenAI SDK support but gain per-endpoint policy.
For most use cases the wrapper is enough — gate at the token level (issue separate tokens for separate apps) and rely on the global guardrails for content safety.
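In practice, token-level gating is just separate clients with separate wrapper tokens, one per app. A minimal sketch; PG_TOKEN_CHATBOT and PG_TOKEN_BATCH are hypothetical names for two separately issued tokens:

```python
import os
from openai import OpenAI

# One wrapper URL, one token per app: traffic from each app can then be
# gated and attributed separately.
chatbot = OpenAI(base_url=PG_BASE, api_key=os.environ["PG_TOKEN_CHATBOT"])
batch = OpenAI(base_url=PG_BASE, api_key=os.environ["PG_TOKEN_BATCH"])

chatbot.chat.completions.create(model="fast", messages=[{"role": "user", "content": "Hi"}])
batch.chat.completions.create(model="cheap", messages=[{"role": "user", "content": "Summarise these tickets."}])
```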
What you’ve built
Section titled “What you’ve built”- ✅ One URL for clients, three providers behind it.
- ✅ Friendly model names (`fast` / `smart` / `cheap`) decoupled from upstream.
- ✅ Live observability per provider.
- ✅ Trivially swappable backends — edit an alias, no code change.
Next: Proxy GitHub via OAuth.