
Guardrails

A guardrail is a check that runs against every chat request before the provider is called. PromptGate ships four built-in guardrails and a 3-level inheritance model for configuring them.

| Guardrail | What it does | Modes |
| --- | --- | --- |
| PII Filter | Detects emails, IBANs, credit cards, SSNs, phone numbers, IPs, custom regexes, plus optional LLM-based contextual detection of person names and addresses | mask / block |
| Prompt Injection | Scans for known jailbreak / instruction-override patterns | block |
| Keyword Blocklist | Project-defined word/phrase list | block |
| Content Length | Min / max input length cap | block (rejects with 422) |

Each one has its own page with the detection rules, configuration, and edge cases.

Guardrail rules are defined at three scopes. They merge from broadest to narrowest:

Global (admin → Guardrails)
Project (project sidebar → Guardrails)
Endpoint (endpoint wizard → Guardrails tab) [coming]

For each guardrail key (e.g. pii_filter):

  • Global config is the default.
  • Project config overrides global. Setting pii_filter.enabled = false at project scope turns it off everywhere in that project, even if global has it on.
  • Endpoint config (when wired) overrides project.

Merging is shallow: each level replaces the entire rule rather than deep-merging its fields. To inherit the project's PII config but change mode for one endpoint, copy the whole rule down to the endpoint and edit mode there.
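The shallow merge can be illustrated with a minimal sketch. The helper name effectiveRules and the sample types values are hypothetical and not taken from PromptGate's code; only the "narrower scope replaces the whole rule" behaviour comes from the description above.

```php
<?php
// Hypothetical helper, not PromptGate's actual implementation: it only
// mimics the documented "narrower scope replaces the whole rule" merge.
function effectiveRules(array $global, array $project = [], array $endpoint = []): array
{
    // array_replace() overwrites each top-level key with the value from the
    // right-most array that defines it; nested fields are NOT merged.
    return array_replace($global, $project, $endpoint);
}

// Sample rule shapes; the "types" values are illustrative assumptions.
$global = [
    'pii_filter'       => ['enabled' => true, 'mode' => 'mask', 'types' => ['email', 'iban']],
    'prompt_injection' => ['enabled' => true],
];

// The project rule only sets enabled and mode, so the global "types"
// list is dropped for this project instead of being inherited.
$project = [
    'pii_filter' => ['enabled' => true, 'mode' => 'block'],
];

$rules = effectiveRules($global, $project);

echo json_encode($rules['pii_filter']), PHP_EOL; // {"enabled":true,"mode":"block"}
```

Because the project rule replaces the global one wholesale, any field it leaves out is gone rather than inherited, which is why the whole rule has to be copied down before editing a single field.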

In the AI Gateway / AI Wrapper request pipeline:

1. Auth + scope check
2. Rate limit
3. Budget check
4. >>> Guardrails <<< ← here
5. Input schema validation
6. Prompt apply
7. Provider call
8. Output schema validation
9. Log

Guardrails run on a single string: the content of every message in the request, joined with \n. They do not see the system prompt that PromptGate prepends afterwards.
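As a rough sketch of what a guardrail receives, the snippet below joins message contents with a newline. The function name guardrailInput and the message shape (a list of arrays with role and content keys) are assumptions for illustration, not PromptGate's internal API.

```php
<?php
// Assumed request shape: a list of ['role' => ..., 'content' => ...] messages.
// guardrailInput() is a hypothetical name used only for this illustration.
function guardrailInput(array $messages): string
{
    // Take each message's content and join them with a newline,
    // producing the single string that guardrails are run against.
    return implode("\n", array_column($messages, 'content'));
}

$messages = [
    ['role' => 'user',      'content' => 'My email is jane@example.com'],
    ['role' => 'assistant', 'content' => 'Thanks, noted.'],
    ['role' => 'user',      'content' => 'Please summarise our chat.'],
];

$text = guardrailInput($messages);
echo $text, PHP_EOL;
```

One consequence worth keeping in mind: since every message is checked, earlier turns in a conversation are scanned as well, not just the latest user message.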

If a guardrail throws (block mode), the request is rejected with 422 before any provider work happens — no tokens, no cost.

Project sidebar → Guardrails. Shows a card per guardrail key with:

  • The current effective state (enabled / disabled, mode, types, words, etc.)
  • Source — Inherited (from global) / Project (locally configured)
  • Configure / Toggle actions

Top-right user menu → Guardrails. Same UI, but configures the gateway-wide defaults that every project inherits unless overridden.

Toggling a guardrail on/off persists immediately via AJAX (POST /projects/{project}/guardrails/policy). A toast confirms the save. No “Save” button.

The configure modal also persists on save: edit the JSON-ish form, click apply, and the rule is written back without a page reload.

Rules live in guardrail_configs:

id
scope     ('global' | 'project' | 'endpoint')
scope_id  (project_id or endpoint_id; null for global)
rules     (json: { "pii_filter": {...}, "prompt_injection": {...}, ... })

One row per scope. The rules JSON is a map keyed by guardrail key, each value being whatever shape the guardrail expects.

| Configuration | What happens |
| --- | --- |
| enabled: false | Guardrail skipped entirely. |
| enabled: true, mode: "mask" (PII only) | Runs, redacts matched substrings with [<TYPE> REDACTED], request continues. |
| enabled: true, mode: "block" | Runs, throws 422 on first match; request rejected. |
| Custom config (PII custom_patterns, blocklist words, etc.) | Merged with built-ins. |
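The three outcomes in the table can be sketched with a toy email rule. This is not the PII Filter's real code: the function name applyEmailRule, the regex, the EMAIL type label, and the fallback to block when no mode is set are all assumptions made for the example.

```php
<?php
// Toy stand-in for a single PII detector. The regex, the "EMAIL" label and
// the default mode are assumptions, not PromptGate's actual rules.
function applyEmailRule(string $text, array $config): string
{
    if (!($config['enabled'] ?? false)) {
        return $text; // enabled: false -> guardrail skipped entirely
    }

    $pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/';

    if (($config['mode'] ?? 'block') === 'mask') {
        // mask: redact every match and let the request continue
        return preg_replace($pattern, '[EMAIL REDACTED]', $text) ?? $text;
    }

    // block: reject on the first match with a 422 exception code
    if (preg_match($pattern, $text) === 1) {
        throw new RuntimeException('Request blocked: E-Mail detected in input.', 422);
    }

    return $text;
}

$input  = 'Contact jane@example.com today';
$masked = applyEmailRule($input, ['enabled' => true, 'mode' => 'mask']);
echo $masked, PHP_EOL; // Contact [EMAIL REDACTED] today
```

Calling the same function with mode set to block on this input throws instead of returning, which corresponds to the 422 rejection described in the table.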

The 422 response body:

{
  "ok": false,
  "error": "Request blocked: E-Mail detected in input."
}

The error code is carried by the HTTP status (422) rather than the body; the message names the guardrail or rule that fired.

Implement the GuardrailContract interface:

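namespace App\Services\Guardrails;

interface GuardrailContract
{
    // Guardrail key used in the rules JSON, e.g. "pii_filter".
    public function key(): string;
    // Short display name.
    public function label(): string;
    // One-line explanation of what the guardrail checks.
    public function description(): string;
    // Returns the (possibly modified) text, or throws to block the request.
    public function process(string $text, array $config): string;
}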

process() receives the concatenated message text and the rule config. Either:

  • Return the (possibly modified) text — request continues.
  • Throw RuntimeException(..., 422) — request blocked.

Register in GuardrailService::__construct:

$this->register(new MyCustomGuardrail());
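A minimal sketch of what MyCustomGuardrail might look like follows. Everything specific to it is hypothetical: the url_limit key, the max_urls config field and its default of 3, and the URL regex. The interface is repeated without its namespace so the file runs on its own, and the sketch assumes the enabled flag is checked before process() is called.

```php
<?php
// Contract copied from this page; namespace omitted so the file is standalone.
interface GuardrailContract
{
    public function key(): string;
    public function label(): string;
    public function description(): string;
    public function process(string $text, array $config): string;
}

// Hypothetical guardrail: blocks requests that contain too many URLs.
// The "url_limit" key and "max_urls" config field are illustrative only.
final class MyCustomGuardrail implements GuardrailContract
{
    public function key(): string { return 'url_limit'; }
    public function label(): string { return 'URL Limit'; }
    public function description(): string { return 'Blocks input with more URLs than allowed.'; }

    public function process(string $text, array $config): string
    {
        $max   = (int) ($config['max_urls'] ?? 3);
        // preg_match_all() returns the number of matches (or false on error).
        $count = (int) preg_match_all('~https?://\S+~i', $text);

        if ($count > $max) {
            // Throwing with code 422 signals that the request is blocked.
            throw new RuntimeException(
                "Request blocked: {$count} URLs found, limit is {$max}.",
                422
            );
        }

        return $text; // unchanged text: the request continues
    }
}

$guardrail = new MyCustomGuardrail();
$text      = 'see https://a.example and http://b.example';
$passed    = $guardrail->process($text, ['max_urls' => 2]);
```

With max_urls set to 1, the same input would throw a RuntimeException with code 422 instead of returning the text.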

A first-class plugin path is on the roadmap (see Plugins).


Next: PII Filter — the most-used guardrail.


© Akyros Labs LLC. All rights reserved.