Rate Limits
PromptGate’s rate limits are per-endpoint, with independent per-minute and per-hour windows. They can be configured on both AI Gateway endpoints and API Gateway endpoints. When a limit is exceeded, the gateway returns HTTP 429 with a Retry-After header so well-behaved clients can back off precisely.
Where they live
In the endpoint config:
| Field | AI Gateway | API Gateway |
|---|---|---|
| rate_limit_per_minute | ✅ | ✅ |
| rate_limit_per_hour | ✅ | ✅ |
Both are integers. null means no limit for that window. You can set one, both, or neither.
How it works
Each endpoint has two independent counters, scoped by namespace + endpoint UUID + the bucket window:

    rl:ep:{endpoint-uuid}:m:{minute-bucket}
    rl:ep:{endpoint-uuid}:h:{hour-bucket}

(ep for AI Gateway endpoints, ag for API Gateway endpoints; the namespaces are independent, so keys never collide.)
The bucket number is floor(now / 60) for the minute window or floor(now / 3600) for the hour window, where now is the current time in seconds. So all requests in the same wall-clock minute share a bucket, regardless of when the first one arrived.
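The bucket arithmetic can be sketched in a few lines. This is an illustration in Python rather than the gateway's own code; the function names are hypothetical, and the "seconds left" formula is an assumption about how the Retry-After value is derived.

```python
import math

# Window lengths in seconds, as described above.
WINDOW_SECONDS = {"minute": 60, "hour": 3600}


def bucket_index(now: float, window: str) -> int:
    """Fixed-window bucket number: floor(now / window length)."""
    return math.floor(now / WINDOW_SECONDS[window])


def seconds_left_in_bucket(now: float, window: str) -> int:
    """Whole seconds until the current bucket rolls over (between 1 and the window length)."""
    size = WINDOW_SECONDS[window]
    return size - int(now) % size


# A timestamp 22 seconds into minute bucket 35198421.
now = 35198421 * 60 + 22
print(bucket_index(now, "minute"), seconds_left_in_bucket(now, "minute"))
```

Two requests one second apart can land in different buckets if a minute boundary falls between them, which is why the counters are aligned to the wall clock rather than to the first request.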
Increment happens atomically:
1. Cache::add(key, 0, ttl): initialises the bucket on first hit (no-op if it already exists).
2. Cache::increment(key): atomic +1, returns the new value.
3. If the new value > the cap, return 429 + Retry-After: <seconds left in bucket>.
This works with any cache driver (file, array, redis, database) because it relies only on add and increment, primitives that every Laravel cache driver implements. How strictly atomic increment is under heavy concurrency depends on the driver.
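The add-then-increment flow above can be modelled with a small in-memory stand-in for the cache. This is a Python sketch for illustration only: MemoryCache and check are hypothetical names, TTL handling is left out, and the real gateway runs on Laravel's cache layer.

```python
import math


class MemoryCache:
    """Stand-in for the cache store; only add and increment are modelled."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        # Store only if the key is absent; returns True when it was created.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def increment(self, key):
        # +1 and return the new value (a real store does this atomically).
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


def check(cache, endpoint_uuid, limit, now, window=60, tag="m"):
    """Return (allowed, retry_after_seconds) for one request at time `now`."""
    if limit is None:  # null means no limit for this window
        return True, 0
    bucket = math.floor(now / window)
    key = f"rl:ep:{endpoint_uuid}:{tag}:{bucket}"
    cache.add(key, 0)  # initialise the bucket on first hit
    count = cache.increment(key)
    if count > limit:
        return False, window - int(now) % window
    return True, 0


cache = MemoryCache()
for t in (1000, 1001, 1002, 1020):
    print(t, check(cache, "abc-123", limit=2, now=t))
```

With a limit of 2, the third request in the bucket is rejected and the request at t=1020 succeeds again because it falls in the next minute bucket.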
What it doesn’t do (yet)
- ❌ Per-token rate limits, e.g. “user X can make 10 RPS”. Roadmap.
- ❌ Per-IP rate limits. Roadmap.
- ❌ Sliding windows. Bucket boundaries are wall-clock-aligned, not request-aligned. With a 1/min limit, requests at 13:59:55 and 14:00:05 both succeed because each falls in a different bucket, even though they are only 10 seconds apart. Sliding-window is more accurate but more expensive.
- ❌ Token-bucket style burst allowances. The current shape is fixed-window.
For most gateway use cases, fixed-window is fine. If you need precise SLA-grade rate limiting, put a dedicated rate limiter (Envoy, Kong, etc.) in front of PromptGate.
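The boundary case from the sliding-window item can be reproduced with a short simulation. This is a Python sketch; allowed_fixed_window is a hypothetical helper, and timestamps are given as seconds since midnight for readability.

```python
from collections import Counter


def allowed_fixed_window(timestamps, limit, window=60):
    """Return one True/False per request under a fixed-window limit."""
    counts = Counter()
    decisions = []
    for t in timestamps:
        bucket = t // window  # wall-clock-aligned bucket
        counts[bucket] += 1
        decisions.append(counts[bucket] <= limit)
    return decisions


t_135955 = 13 * 3600 + 59 * 60 + 55  # 13:59:55
t_140005 = 14 * 3600 + 5             # 14:00:05
t_140015 = 14 * 3600 + 15            # 14:00:15

# Ten seconds apart but in different buckets: both pass a 1/min limit.
print(allowed_fixed_window([t_135955, t_140005], limit=1))
# Ten seconds apart in the same bucket: the second is rejected.
print(allowed_fixed_window([t_140005, t_140015], limit=1))
```

In the worst case a client can send a full window's allowance just before a boundary and another full allowance just after it.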
Behaviour
When the limit is hit:

    HTTP/1.1 429 Too Many Requests
    Retry-After: 38
    Content-Type: application/json

    {
      "ok": false,
      "error": "Rate limit exceeded.",
      "scope": "minute",
      "retry_after": 38
    }

The scope field tells you which window tripped (minute or hour). retry_after is the number of seconds remaining in the breached bucket, so well-behaved clients can wait exactly the right amount before retrying.
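On the client side, a response like the one above can be turned into a wait time. The following Python sketch is one possible approach, not an official client: retry_delay is a hypothetical helper, and the 1-second fallback for a 429 without a usable hint is an arbitrary assumption.

```python
import json


def retry_delay(status: int, headers: dict, body: str):
    """Return seconds to wait before retrying, or None if not rate limited."""
    if status != 429:
        return None
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("retry-after")
    if value is None:
        # Fall back to the retry_after field in the JSON body.
        try:
            value = json.loads(body).get("retry_after")
        except (ValueError, AttributeError):
            value = None
    try:
        return max(0, int(value))
    except (TypeError, ValueError):
        return 1  # assumed default when no numeric hint is available


body = '{"ok": false, "error": "Rate limit exceeded.", "scope": "minute", "retry_after": 38}'
print(retry_delay(429, {"Retry-After": "38"}, body))
```

A caller would sleep for the returned number of seconds and then retry. Adding a small random jitter on top helps avoid many clients retrying in the same instant when the bucket rolls over.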
Configuration
AI Gateway
Endpoint wizard → Tab 3 — Limits:

    Rate limit per minute: 60 (or empty = unlimited)
    Rate limit per hour: 1000 (or empty = unlimited)

API Gateway
Endpoint form → Rate Limits section. Same fields, same semantics.
Examples
Public-facing demo

    rate_limit_per_minute: 6
    rate_limit_per_hour: 60

The endpoint accepts about one request every 10 seconds on average. Sustained at that pace, traffic uses up the hourly cap of 60 after about 10 minutes and is then rejected until the next hour bucket begins.
Internal service caller

    rate_limit_per_minute: 600
    rate_limit_per_hour: 30000

Protects against runaway bugs without throttling normal traffic.
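When both limits are set, it helps to know how long traffic at the full per-minute rate can run before the hourly cap trips. The Python sketch below does that arithmetic for the two examples above. The function name is hypothetical, and it assumes traffic starts at the beginning of an hour bucket and fills every minute bucket completely.

```python
import math


def minutes_to_exhaust_hour_cap(per_minute, per_hour):
    """Minutes of sustained max-rate traffic until the hourly cap is used up.

    Returns None when either limit is unset or the hourly cap cannot be
    reached at the per-minute rate.
    """
    if per_minute is None or per_hour is None:
        return None
    if per_minute * 60 <= per_hour:
        return None  # the minute limit alone keeps traffic under the hourly cap
    return math.ceil(per_hour / per_minute)


print(minutes_to_exhaust_hour_cap(6, 60))       # public-facing demo
print(minutes_to_exhaust_hour_cap(600, 30000))  # internal service caller
```

A small result means the hourly cap is the limit callers will actually feel; a result of None means only the per-minute limit matters.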
Disabled (default)

    rate_limit_per_minute: null
    rate_limit_per_hour: null

No limits. Useful for trusted internal callers where you’d rather catch problems via budgets and audit logs.
Stacking with budgets
Rate limits and budgets are independent:
- Rate limits = “how often can you call?”
- Budgets = “how much can you spend?”
A token-cheap endpoint can have a high rate limit and a low monthly budget. A token-expensive endpoint can have a tight rate limit and a generous budget. Set them according to what each one actually protects.
See Budgets.
Cache driver
Production should use Redis (CACHE_STORE=redis). The default database driver works but adds a DB hit per request, which is fine for low traffic and not great at scale. The array driver does not persist across requests, so it effectively disables rate limits; use it only in tests.
Inspecting current usage
There’s no first-class “current bucket count” UI. To check via Tinker:

    docker compose exec app php artisan tinker
    \Illuminate\Support\Facades\Cache::get('rl:ep:abc-123:m:35198421');

Here abc-123 stands for your endpoint’s UUID and the integer is the current minute bucket.
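Working out the current bucket number by hand is error-prone, so a small helper can build the key to paste into Tinker. This Python sketch assumes the key layout shown earlier on this page; counter_key is a hypothetical name, and the ag variant of the key is inferred from the namespace note rather than confirmed.

```python
import time


def counter_key(endpoint_uuid, window="m", namespace="ep", now=None):
    """Build the counter key for the bucket containing `now` (default: current time)."""
    seconds = {"m": 60, "h": 3600}[window]
    if now is None:
        now = time.time()
    return f"rl:{namespace}:{endpoint_uuid}:{window}:{int(now // seconds)}"


# Key for the current minute bucket of an AI Gateway endpoint.
print(counter_key("abc-123"))
# Key for a fixed timestamp, useful when checking a past bucket.
print(counter_key("abc-123", "m", "ep", now=2111905282))
```

The clock on the machine running the helper needs to agree with the gateway's clock, otherwise the key may point at a neighbouring bucket.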
Behaviour summary
| Configuration | Result |
|---|---|
| Both null | No limit. |
| Per-minute only | Per-minute enforced; per-hour unlimited. |
| Per-hour only | Per-hour enforced; per-minute unlimited. |
| Both set | Both enforced independently. The first to trip fires 429. |
© Akyros Labs LLC. All rights reserved.