Reversible Redaction
Reversible Redaction is the headline guardrail for the Agent Proxy use case. It closes the gap that traditional PII masking leaves: if you mask `john@acme.com` to `[REDACTED]` in the prompt, the LLM responds about `[REDACTED]` and your user gets a useless answer. With reversible redaction, the LLM sees an opaque token; your user sees their original data back; nobody loses utility, and the LLM provider never sees the real value.
How it works
- Inbound (before the LLM call) — every message body is scanned with the configured detectors (email / phone / IBAN / credit card / SSN / IPv4, plus admin custom patterns). Each match is replaced with a stable opaque token like `[[EMAIL_001]]`. The mapping `token → real` is kept in memory for the request only.
- Outbound (after the response) — the assistant content and any tool-call arguments are scanned for the same tokens. Every token is substituted back with the original value before the response leaves PromptGate.

Result: the LLM saw `[[EMAIL_001]]`; the client saw `john@acme.com`. Both ends are consistent.
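To make the flow concrete, here is a minimal sketch of the redact/restore round trip in PHP. The detector regexes, function names, and data structures are illustrative assumptions, not PromptGate's internals; only the `[[KIND_NNN]]` token shape and the per-request, in-memory mapping come from the description above.

```php
<?php
// Sketch only: redact on the way in, restore on the way out.
// Detector patterns and helper names are assumptions for this example.
function redact(string $text, array &$map, array &$counters): string
{
    $detectors = [
        'EMAIL' => '/[\w.+-]+@[\w-]+\.[\w.]+/',
        'IPV4'  => '/\b(?:\d{1,3}\.){3}\d{1,3}\b/',
    ];
    foreach ($detectors as $kind => $pattern) {
        $text = preg_replace_callback($pattern, function ($m) use ($kind, &$map, &$counters) {
            // Reuse the existing token if the same value already appeared in this request.
            $existing = array_search($m[0], $map, true);
            if ($existing !== false) {
                return $existing;
            }
            $counters[$kind] = ($counters[$kind] ?? 0) + 1;
            $token = sprintf('[[%s_%03d]]', $kind, $counters[$kind]);
            $map[$token] = $m[0]; // kept in memory for this request only
            return $token;
        }, $text);
    }
    return $text;
}

function restore(string $text, array $map): string
{
    // Substitute every token back before the response leaves the proxy.
    return strtr($text, $map);
}

$map = [];
$counters = [];
$prompt = redact('Email john@acme.com about the invoice.', $map, $counters);
// The LLM sees: "Email [[EMAIL_001]] about the invoice."
$reply = restore('I drafted a message to [[EMAIL_001]].', $map);
// The client sees: "I drafted a message to john@acme.com."
```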
Properties
- Stable per request, fresh across requests. The same value reused in one prompt maps to the same token (so the LLM treats them as one entity). Different requests get fresh counters; tokens that correlate across requests would themselves be a privacy leak.
- Never persisted. The mapping lives only inside the request lifecycle. There is no database row and no log entry containing `john@acme.com → EMAIL_001`. Audit logs see only the redacted form.
- The token format `[[KIND_NNN]]` is intentional. Brackets are unusual enough that LLMs preserve them verbatim when echoing input, and unique enough not to collide with normal text.
- Tool-call arguments are restored too. If the LLM calls `send_email({ to: "[[EMAIL_001]]" })`, the client sees `to: "john@acme.com"` — your downstream tool never has to reverse anything (see the sketch below).
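The sketch below illustrates that last point for an OpenAI-style tool call: because the arguments arrive as a JSON string, substituting on the raw string restores tokens wherever they sit, including nested values. The tool-call shape and the helper name are assumptions for the example, not the product's actual API.

```php
<?php
// Illustrative only: restore redaction tokens inside tool-call arguments.
function restoreToolCallArguments(array $toolCall, array $map): array
{
    // Arguments are a JSON-encoded string, so a plain string substitution
    // covers every field, however deeply nested.
    $toolCall['function']['arguments'] = strtr($toolCall['function']['arguments'], $map);
    return $toolCall;
}

$map = ['[[EMAIL_001]]' => 'john@acme.com'];
$call = [
    'function' => [
        'name'      => 'send_email',
        'arguments' => '{"to":"[[EMAIL_001]]","subject":"Invoice"}',
    ],
];
$restored = restoreToolCallArguments($call, $map);
// $restored['function']['arguments'] === '{"to":"john@acme.com","subject":"Invoice"}'
```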
Configure
Under Guardrails → Reversible Redaction:
| Field | Effect |
|---|---|
| Enabled | Toggle the policy. Inheritance: Global → Project. |
| Detector kinds | Pick from email / phone / iban / credit_card / ssn / ipv4. Default: all. |
| Custom patterns | Admin-defined regex pairs (label, pattern). The label becomes the token prefix. |
Project-level config overrides global. Disable at the project level to opt out of an inherited rule.
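The override rule is easiest to see as a merge where any project-level key wins. The sketch below assumes a simple key-value representation; the array keys (`enabled`, `detectors`, `custom_patterns`) are hypothetical and only meant to make the inheritance behaviour concrete.

```php
<?php
// Sketch only: resolve the effective policy from Global → Project inheritance.
// Key names are assumptions, not PromptGate's stored format.
function effectivePolicy(array $global, array $project = []): array
{
    // Any key the project defines wins; everything else is inherited from global.
    return array_merge($global, $project);
}

$global = [
    'enabled'   => true,
    'detectors' => ['email', 'phone', 'iban', 'credit_card', 'ssn', 'ipv4'],
];

// A project that narrows the detectors and adds its own pattern:
$project = [
    'detectors'       => ['email', 'credit_card'],
    'custom_patterns' => [['label' => 'Order ID', 'pattern' => '/\bORD-\d{6}\b/']],
];

$policy = effectivePolicy($global, $project);
// enabled: true (inherited), detectors: overridden by the project,
// custom_patterns: the project's own addition. Setting 'enabled' => false
// at the project level would opt the project out of the inherited rule.
```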
Custom patterns
Common cases admins add per project:

- Order ID: `/\bORD-\d{6}\b/` → `[[ORDER_ID_001]]`
- Customer: `/CUST-[A-Z0-9]{8}/` → `[[CUSTOMER_001]]`
- Internal: `/\b[A-Z]{3}\d{4}\b/` → `[[INTERNAL_001]]`

Patterns must compile under PHP `preg_match` (PCRE). The Guardrails configure modal validates compilation server-side and rejects bad regex with a friendly error.
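That compile check amounts to asking PCRE whether the pattern parses. Here is a minimal sketch, assuming nothing beyond standard `preg_match` behaviour; the function name and sample patterns are illustrative, not the modal's actual code.

```php
<?php
// Sketch only: a pattern is accepted if PCRE can compile it.
// preg_match() returns false on a pattern that fails to compile; the @
// suppresses the compilation warning so the result can be reported cleanly.
function isValidCustomPattern(string $pattern): bool
{
    return @preg_match($pattern, '') !== false;
}

var_dump(isValidCustomPattern('/\bORD-\d{6}\b/'));   // bool(true)
var_dump(isValidCustomPattern('/CUST-[A-Z0-9{8}/')); // bool(false): missing terminating ]
```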
Limitations
- Wrapper / Agent-Proxy only. AI Gateway endpoints (`POST /api/{uuid}/{slug}`) don't run reversible redaction yet — they have a fixed prompt and a fixed provider, so PII masking via the existing PII Filter is usually the right tool for them.
- String-only. Image / file inputs aren't scanned (the LLM sees them as base64 anyway; redaction wouldn't survive).
- Streaming responses. Tokens in streaming chunks are restored, but in v1 the substitution happens after the full response is buffered, so streaming-with-redaction still works but loses the first-byte latency advantage.