Policies
How DefenseClaw decides — repo policies (OPA/Rego), guardrail rule packs (regex + LLM judge), scanner policies, and the suppression layer that keeps your alert volume sane.
DefenseClaw records a normalized runtime action—allow, alert, confirm, or block—for guardrail events. Connector adapters then map that action onto the host's actual response vocabulary, such as allow/ask/deny. Admission, firewall, sandbox, and asset-policy evaluations have their own domain-specific outputs.
Guided example · Synthetic runtime event
Trace a runtime verdict from event to action
Follow normalization, deterministic matching, suppressions, severity, and the active action mapping.
```yaml
rules:
  - id: shell.data-egress
    match: sensitive_source_and_external_destination
    severity: high
judge:
  enabled: false
suppressions:
  trusted_destinations: []
actions:
  high: block
```
Result: the HIGH runtime finding maps to block, and the final stage emits a decision record.
What this example shows, and what it does not
What it shows
- The ordered stages that assemble a runtime verdict
- How the optional judge differs from deterministic rules
What it does not do
- Evaluate Rego or scanner output in the browser
- Reuse runtime actions as skill or MCP admission actions
What you just saw
The guided trace normalizes a connector event, matches a deterministic rule, checks suppressions, skips the optional judge when it is disabled, assigns severity, and resolves the active runtime action. Runtime guardrail actions are separate from the skill_actions and mcp_actions admission mappings used by scanners.
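The ordered stages in the trace can be sketched in Go. This is a minimal illustration under stated assumptions, not DefenseClaw's implementation: the Event, Rule, Policy, and Decision types, the evaluate function, and the toy sensitiveToExternal matcher standing in for sensitive_source_and_external_destination are all hypothetical.

```go
package main

import "strings"

// Event is a hypothetical normalized connector event.
type Event struct {
	Command     string
	Destination string
}

// Rule mirrors the example rule: an ID, a matcher, and a severity.
type Rule struct {
	ID       string
	Match    func(Event) bool
	Severity string
}

// Policy bundles the pieces the trace walks through (hypothetical type).
type Policy struct {
	Rules               []Rule
	JudgeEnabled        bool
	Judge               func(Event, string) string // optional severity override
	TrustedDestinations []string
	Actions             map[string]string // severity -> runtime action
}

// Decision is a hypothetical decision record.
type Decision struct {
	RuleID   string
	Severity string
	Action   string
}

// sensitiveToExternal is a toy stand-in for the
// sensitive_source_and_external_destination matcher.
func sensitiveToExternal(e Event) bool {
	sensitive := strings.Contains(e.Command, ".ssh/") || strings.Contains(e.Command, ".env")
	external := e.Destination != "" && e.Destination != "localhost"
	return sensitive && external
}

// evaluate runs the stages in order: normalize, match, suppress,
// optional judge, severity, action.
func evaluate(p Policy, e Event) Decision {
	// Normalize: lowercase and trim the fields the rules look at.
	e.Command = strings.ToLower(strings.TrimSpace(e.Command))
	e.Destination = strings.ToLower(strings.TrimSpace(e.Destination))
	for _, r := range p.Rules {
		// Deterministic match.
		if !r.Match(e) {
			continue
		}
		// Suppressions: a trusted destination removes this finding.
		if contains(p.TrustedDestinations, e.Destination) {
			continue
		}
		// Optional judge: skipped when disabled, as in the example.
		severity := r.Severity
		if p.JudgeEnabled && p.Judge != nil {
			severity = p.Judge(e, severity)
		}
		// Resolve the active runtime action; unmapped severities allow.
		action, ok := p.Actions[severity]
		if !ok {
			action = "allow"
		}
		return Decision{RuleID: r.ID, Severity: severity, Action: action}
	}
	return Decision{Severity: "none", Action: "allow"}
}

// contains reports whether s is in list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}
```

For simplicity the sketch returns the first unsuppressed match instead of aggregating every finding into one verdict, which a real pipeline would do.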
Policy creator
Build a complete policy from a preset, see live OPA-WASM verdicts in your browser, and export every YAML / data.json / Rego file ready to paste into ~/.defenseclaw/policies/.
Recipes
133 pre-cooked rules, suppressions, and judge categories. Search, filter, and copy the YAML directly.
Suppression cookbook
Tune the four suppression layers (pre-judge / finding / tool / correlator) without losing real signals.
Regex cookbook
RE2-compatible patterns for secrets, prompt injection, and exfiltration markers, with counterexamples and traps to avoid.
How the layers fit together
Plain local rules cover many secrets and explicit prompt-injection patterns. The optional judge handles ambiguous and semantic cases. Suppressions remove known noise before the verdict is aggregated. The session correlator independently watches persisted findings for cross-event patterns such as the lethal trifecta. OPA/Rego also powers admission, firewall, sandbox, audit, and direct guardrail-policy evaluation.
Layer 1 — Repo policies (OPA / Rego)
The top-level policy file (policies/default.yaml and friends) is the source used to generate the active OPA data document. It declares admission behavior, severity-to-action mappings, scanner overrides, guardrail thresholds, firewall policy, audit settings, and related operator defaults:
```yaml
name: default
admission:
  scan_on_install: true
  allow_list_bypass_scan: true
skill_actions:
  critical:
    file: quarantine
    runtime: disable
    install: block
  high:
    file: quarantine
    runtime: disable
    install: block
guardrail:
  block_threshold: 4
  alert_threshold: 2
  hilt:
    enabled: false
    min_severity: HIGH
  cisco_trust_level: full
firewall:
  default_action: deny
  allowed_ports: [443, 80]
```

defenseclaw policy activate <name> normalizes this YAML into the active data.json consumed by the bundled Rego modules. Each package exposes domain-specific results rather than one universal decision object:
| Domain | Rego package | Primary outputs |
|---|---|---|
| admission | defenseclaw.admission | verdict, reason |
| guardrail | defenseclaw.guardrail | action, severity, reason, scanner_sources |
| firewall | defenseclaw.firewall | action, rule_name |
| sandbox | defenseclaw.sandbox | allowed/denied endpoints, permissions, skills |
| audit | defenseclaw.audit | retain, retain_reason, export_to |
| skill_actions | defenseclaw.skill_actions | runtime/file/install actions and boolean helpers |
You can hand-edit the Rego — the engine is just OPA — but most operators stay in the YAML layer because the bundled Rego covers the common cases.
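The skill_actions block in the policy YAML above is a lookup from severity tier to one action per surface (file, runtime, install). A minimal Go sketch of that lookup; the SkillActions type and the shouldBlockInstall helper are hypothetical and are not the names of the bundled Rego's boolean helpers.

```go
package main

import "strings"

// SurfaceActions holds the per-surface actions for one severity tier.
type SurfaceActions struct {
	File    string // e.g. "quarantine"
	Runtime string // e.g. "disable"
	Install string // e.g. "block"
}

// SkillActions maps an upper-case severity tier to its actions (hypothetical type).
type SkillActions map[string]SurfaceActions

// defaultSkillActions mirrors the critical and high tiers from the example YAML.
var defaultSkillActions = SkillActions{
	"CRITICAL": {File: "quarantine", Runtime: "disable", Install: "block"},
	"HIGH":     {File: "quarantine", Runtime: "disable", Install: "block"},
}

// lookup returns the actions for a severity, or the zero value (no action)
// when the tier is not configured.
func (s SkillActions) lookup(severity string) SurfaceActions {
	return s[strings.ToUpper(severity)]
}

// shouldBlockInstall is a hypothetical boolean helper built on the table.
func (s SkillActions) shouldBlockInstall(severity string) bool {
	return s.lookup(severity).Install == "block"
}
```

Tiers that the YAML leaves out (MEDIUM and LOW here) fall through to an empty action, so in this sketch they neither block nor quarantine.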
How the engine evaluates
input → query package output rules → domain-specific result → normalized caller result → emit audit / telemetry event
Guardrail scanner and hook events separately preserve matching rule IDs, evaluation IDs, reasons, and severity in their structured audit payloads. Those fields power the per-rule breakdowns in Splunk and Grafana; they are not a universal reasons[] field returned by every Rego package.
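As an illustration of how per-rule breakdowns can be derived from those structured payloads, here is a minimal Go sketch. The AuditEvent field names and the breakdownByRule function are assumptions, not the documented audit schema.

```go
package main

import "sort"

// AuditEvent is a hypothetical shape for a guardrail audit payload.
type AuditEvent struct {
	EvaluationID string
	RuleIDs      []string
	Severity     string
	Reason       string
}

// RuleCount is one row of a per-rule breakdown.
type RuleCount struct {
	RuleID string
	Count  int
}

// breakdownByRule counts rule-ID occurrences across events, sorted by
// descending count and then by rule ID for stable output.
func breakdownByRule(events []AuditEvent) []RuleCount {
	counts := map[string]int{}
	for _, ev := range events {
		for _, id := range ev.RuleIDs {
			counts[id]++
		}
	}
	rows := make([]RuleCount, 0, len(counts))
	for id, n := range counts {
		rows = append(rows, RuleCount{RuleID: id, Count: n})
	}
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].Count != rows[j].Count {
			return rows[i].Count > rows[j].Count
		}
		return rows[i].RuleID < rows[j].RuleID
	})
	return rows
}
```

A dashboard query over the same fields performs the equivalent group-by; the sketch only shows why the rule IDs must be preserved per event.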
Layer 2 — Guardrail rule packs
The guardrail rule pack is the content the gateway evaluates against — the actual regex patterns, judge prompts, and category taxonomies. Three packs ship out of the box:
permissive
Permissive runtime posture: CRITICAL blocks, HIGH alerts, and lower severities allow. Pair with observe mode for a non-enforcing pilot.
default
Balanced runtime posture: CRITICAL blocks, HIGH/MEDIUM alert, and LOW allows; HITL can turn eligible HIGH findings into confirm.
strict
Strict runtime posture: MEDIUM and above block, LOW alerts. Choose this when false negatives cost more than interruptions.
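The three postures can be read as severity-to-action tables. A minimal Go sketch of the mappings described above; runtimeAction is a hypothetical function, and its hitl flag models only the default pack's optional confirm for eligible HIGH findings (observe mode is not modeled).

```go
package main

// runtimeAction maps a finding severity to a runtime action for one of the
// bundled packs, following the posture descriptions above.
func runtimeAction(pack, severity string, hitl bool) string {
	switch pack {
	case "permissive":
		switch severity {
		case "CRITICAL":
			return "block"
		case "HIGH":
			return "alert"
		}
		return "allow"
	case "strict":
		switch severity {
		case "CRITICAL", "HIGH", "MEDIUM":
			return "block"
		case "LOW":
			return "alert"
		}
		return "allow"
	default: // the "default" pack
		switch severity {
		case "CRITICAL":
			return "block"
		case "HIGH":
			if hitl {
				return "confirm"
			}
			return "alert"
		case "MEDIUM":
			return "alert"
		}
		return "allow"
	}
}
```

Reading the packs side by side makes the trade-off explicit: moving from permissive to strict shifts each severity one or two steps toward block.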
Each pack is a directory:
```text
policies/guardrail/<pack>/
  rules/
    local-patterns.yaml   # triage phrases, PII regexes, secrets, exfil
    commands.yaml         # shell and execution rules
    sensitive-paths.yaml  # credential/config paths
    c2.yaml               # command-and-control indicators
    cognitive.yaml        # cognitive tampering
    enterprise-data.yaml  # enterprise data exposure
    secrets.yaml          # credential signatures
    trust-exploit.yaml    # prompt/trust exploitation
    custom-org.yaml       # your org's regex extensions (optional)
  judge/
    injection.yaml
    pii.yaml
    tool-injection.yaml
    sensitive-tools.yaml
  suppressions.yaml       # the suppression layer (see below)
```

Pick a pack at setup time:
```shell
defenseclaw setup guardrail --rule-pack strict
```

--rule-pack only accepts the three bundled profiles (default, strict, permissive). To run a custom pack from your own directory, point guardrail.rule_pack_dir at it in ~/.defenseclaw/config.yaml; see Authoring custom rule packs below for the full workflow.
Rule shape
Rules are YAML for portability and review:
```yaml
version: 1
# Literal substring matches (case-folded against the normalized
# triage view, so "/ etc / passwd" still matches "/etc/passwd").
injection:
  - "ignore previous"
  - "ignore all instructions"
  - "jailbreak"
# Regex matches. Compiled at load time; bad regexes are logged and
# dropped (the rest of the file still applies).
injection_regexes:
  - 'ignore\s+(?:all\s+)?(?:previous|prior|above|your)\s+(?:instructions|rules|directives|guidelines)'
# Per-family pattern sets the gateway scans on every request.
# Omitting any key keeps the compiled-in default for that family.
# Setting `field: []` explicitly clears it (rare, used in testbed
# profiles that want a family disabled).
secrets:
  - "sk-"
  - "ghp_"
  - "-----begin rsa"
pii_requests:
  - "social security number"
  - "credit card number"
pii_data_regexes:
  - '\b\d{3}-\d{2}-\d{4}\b'
exfiltration:
  - "/etc/passwd"
  - "exfiltrate"
```

Three-state semantics for each top-level key:
- Omit the key entirely to keep the compiled-in default for that family. Useful when you only want to tweak one thing without copying the whole baseline.
- field: [...] is an operator override that replaces the default wholesale.
- field: [] means the operator explicitly cleared the family. Mostly used in testbed profiles; rarely useful in production.
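The three states map naturally onto Go's distinction between a nil slice and an empty one. A minimal sketch under stated assumptions: it decodes JSON with encoding/json so that it needs only the standard library (the real loader reads YAML), and the patternFile type and the resolve and loadSecrets functions are hypothetical.

```go
package main

import "encoding/json"

// patternFile holds one family; a nil slice means the key was omitted.
type patternFile struct {
	Secrets []string `json:"secrets"`
}

// resolve applies the three-state rule for one family:
// omitted (nil) keeps the default, [...] replaces it, [] clears it.
func resolve(defaults, override []string) []string {
	if override == nil {
		return defaults
	}
	return override
}

// loadSecrets decodes an override document and resolves it against defaults.
func loadSecrets(doc string, defaults []string) ([]string, error) {
	var f patternFile
	if err := json.Unmarshal([]byte(doc), &f); err != nil {
		return nil, err
	}
	return resolve(defaults, f.Secrets), nil
}
```

encoding/json leaves an omitted field as a nil slice but decodes an explicit [] into an empty, non-nil slice, which is exactly the distinction the "omit" and "clear" states need.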
The bundled default, strict, and permissive local-patterns.yaml files each carry the full baseline 1:1 with what the gateway ships compiled-in. The Go test TestLocalPatternsDefaultsParity fails CI if the bundled YAML drifts from the Go source, so editing one without the other can't silently downgrade posture.
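The comments in the rule file describe two loading behaviors: literal matches run against a case-folded triage view, and regexes that fail to compile are dropped without discarding the rest of the file. A minimal Go sketch under assumptions: the whitespace-removing triage function is a guess that is merely consistent with the "/ etc / passwd" example, running regexes against lowercased text is likewise assumed, and triage, compilePatterns, and matches are hypothetical names. Go's regexp package uses RE2 syntax, so it rejects the same constructs (such as lookbehind) that RE2-compatible packs must avoid.

```go
package main

import (
	"log"
	"regexp"
	"strings"
)

// triage builds a case-folded view with whitespace removed, so
// "/ ETC / passwd" and "/etc/passwd" compare equal (an assumed normalization).
func triage(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), "")
}

// compilePatterns compiles each regex, logging and skipping any that fail.
func compilePatterns(patterns []string) []*regexp.Regexp {
	var out []*regexp.Regexp
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			log.Printf("dropping invalid regex %q: %v", p, err)
			continue
		}
		out = append(out, re)
	}
	return out
}

// matches reports whether text hits any literal (compared on the triage
// view) or any regex (run against the lowercased text, whitespace kept).
func matches(text string, literals []string, regexes []*regexp.Regexp) bool {
	view := triage(text)
	for _, lit := range literals {
		if strings.Contains(view, triage(lit)) {
			return true
		}
	}
	lower := strings.ToLower(text)
	for _, re := range regexes {
		if re.MatchString(lower) {
			return true
		}
	}
	return false
}
```

Keeping whitespace for the regex pass matters because patterns such as the injection regex above rely on \s+ between words.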
Layer 3 — LLM judge
The LLM judge is an optional second opinion. Three judge configuration files ship out of the box: injection, PII, and tool-injection. Data exfiltration is a category inside the tool-injection judge rather than a fourth exfil.yaml file. The judges are particularly useful for:
- Subtle prompt injection that doesn't match a regex.
- Tool calls whose stated purpose disagrees with their effects (most common in MCP).
- Categorical content checks — "does this prompt try to extract proprietary code?"
Shared severity rubric
All three judges evaluate against the same severity rubric. The rubric is embedded directly in each judge's system prompt so verdicts stay consistent across categories.
| Tier | Meaning |
|---|---|
| CRITICAL | Direct unambiguous harm, provable from the content alone. Credential exfil, destructive shell, jailbreak succeeded, SSN/passport disclosed. |
| HIGH | Clear adversarial intent or high-impact sensitive data. Prompt injection with explicit override, /etc/passwd probe, phone number in completion. |
| MEDIUM | Suspicious but ambiguous. Benign readings are plausible. |
| LOW | Weak indicator; context-dependent (e.g. user self-disclosing their own email). |
| NONE | No concern. |
Signal-strength scoring
For each category a judge flags, it also emits a signal_strength label derived from two booleans: unambiguous (no plausible benign reading) and high_impact (hard-to-reverse damage possible).
| unambiguous | high_impact | signal_strength |
|---|---|---|
| ✓ | ✓ | strong_signal |
| ✓ | ✗ | signal |
| ✗ | ✓ | needs_review |
| ✗ | ✗ | weak_signal |
The Go-side verdict aggregation uses these labels to keep weakly corroborated flags from stacking to CRITICAL. A weak_signal on a single category downgrades to MEDIUM/alert; a strong_signal on a structural category (destructive command, exfil channel) escalates to CRITICAL. The calibration metrics for this path are reproducible via go test -run TestEval; see the evaluation corpus README for the scorecard.
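The signal_strength table is a pure function of the two booleans. A minimal Go sketch; signalStrength and capWeakSingle are hypothetical names, the NONE-to-CRITICAL ranking is assumed from the rubric order, and the cap implements only the single-category weak_signal downgrade stated above, not the strong_signal escalation or the full aggregation.

```go
package main

// rank orders the rubric tiers from NONE to CRITICAL.
var rank = map[string]int{"NONE": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}

// signalStrength derives the label from the two judge booleans,
// following the table above.
func signalStrength(unambiguous, highImpact bool) string {
	switch {
	case unambiguous && highImpact:
		return "strong_signal"
	case unambiguous:
		return "signal"
	case highImpact:
		return "needs_review"
	default:
		return "weak_signal"
	}
}

// capWeakSingle caps a lone weak_signal category at MEDIUM; it never
// raises a lower severity and leaves multi-category results unchanged.
func capWeakSingle(flaggedCategories int, label, severity string) string {
	if flaggedCategories == 1 && label == "weak_signal" && rank[severity] > rank["MEDIUM"] {
		return "MEDIUM"
	}
	return severity
}
```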
Judge configuration is split by detector under judge/. For example, the injection judge declares its categories and severity mapping in judge/injection.yaml:
```yaml
version: 1
name: injection
enabled: true
categories:
  "Instruction Manipulation":
    finding_id: JUDGE-INJ-INSTRUCT
    severity: HIGH
  "Obfuscation":
    finding_id: JUDGE-INJ-OBFUSC
    severity: CRITICAL
min_categories_for_high: 1
single_category_max_severity: HIGH
min_categories_for_critical: 2
```

The judge call goes through the same Bifrost pipeline as everything else and uses the unified LLM key. Token usage is emitted as a judge.call event so you can monitor cost from your dashboards.
You can target a different model than the rest of the stack by setting guardrail.judge.llm in config.yaml:
```yaml
guardrail:
  judge:
    enabled: true
    llm:
      provider: anthropic
      model: claude-3-5-haiku-20241022
      api_key_env: DEFENSECLAW_LLM_KEY
```

Layer 4 — Suppressions
Suppressions are the difference between "we deployed DefenseClaw" and "we use DefenseClaw daily." Three suppression flavours ship with every rule pack:
pre_judge_strips — redact before the judge sees it
```yaml
pre_judge_strips:
  - id: STRIP-SYSTEM-SENDER
    pattern: '\b(cli|system|bot|admin)\b'
    context: "System sender metadata injected by agent framework"
    applies_to: [pii]
```

The redaction happens before the LLM judge is invoked, so the matched text never reaches a third-party API even when the judge is enabled. This is useful for fields that can never legitimately contain prompt content (auth headers, API tokens).
finding_suppressions — silence known-good signatures
```yaml
finding_suppressions:
  - id: SUPP-EMAIL-CHATID
    finding_pattern: JUDGE-PII-EMAIL
    entity_pattern: '^19:[a-f0-9\-]+@unq\.gbl\.spaces$'
    reason: "Teams chatId format, not email address"
```

When both patterns match, that entity is removed from the judge result before the final verdict is aggregated. Use narrow patterns and an auditable reason; the current schema has no expiry field, so time-bounded exceptions must be removed by your configuration-management workflow.
tool_suppressions — scope HITL prompts
```yaml
tool_suppressions:
  - tool_pattern: '^(graph_auth_status|session_status|get_status)$'
    suppress_findings: [JUDGE-PII-USER]
    reason: "Status check tools return expected system metadata"
```

These remove only the listed findings for matching tool names. Other findings from the same call still participate in the verdict.
Why three layers?
| Layer | Where it runs | Use it for |
|---|---|---|
| pre_judge_strips | Before the LLM judge | Keeping secrets out of third-party APIs |
| finding_suppressions | After the PII judge fires | Narrow entity exceptions with a required reason; the schema has no expiry field |
| tool_suppressions | After a tool-scoped PII judge result | Dropping only named finding IDs for matching tools |
Together they make it realistic to leave the gateway in action mode without operator fatigue.
Layer 5 — Session correlator
The first four layers evaluate each event in isolation. The correlator is the cross-event layer: it watches the recent-finding stream for the same session and raises a synthetic CORR-* meta-finding when a pattern of individually-benign-looking events combines into something worse.
This closes a class of attacks where no single event crosses the action threshold on its own, but the sequence across turns is an exfil path — for example, Simon Willison's lethal trifecta (untrusted content + sensitive data access + external egress in the same session).
How it runs
Every finding DefenseClaw records is tagged with a data axis (one or more of ingress_untrusted, sensitive_access, egress_external) and — on the tool-call surface — a tool capability class (read_fs, write_fs, exec_shell, network_fetch, send_message, none). Regex rules and judge categories resolve through the canonical mapping in internal/guardrail/axes.go, so patterns can reason across detectors without hard-coding rule IDs.
Bundled patterns
Four patterns ship in internal/guardrail/defaults/correlation-patterns.yaml:
| Pattern ID | Window | Match severity | Trigger |
|---|---|---|---|
| LETHAL-TRIFECTA | 30 events | CRITICAL | Ordered axes: untrusted ingress, then sensitive access, then external egress. |
| TRIFECTA-WITH-FINGERPRINT-MATCH | 30 events | CRITICAL | Same content fingerprint seen in a sensitive_access finding and a later egress_external finding (direct exfil, not just temporal coincidence). |
| ESCALATION-CHAIN | 10 events | CRITICAL | MEDIUM → HIGH → HIGH severity progression inside the same session (an attacker iterating on a prompt). |
| DESTRUCTIVE-FLOW | 50 events | CRITICAL | An exec_shell finding at HIGH+ in the same session window as a sensitive_access finding. |
Each pattern sets its own window size — ESCALATION-CHAIN wants the progression tight; DESTRUCTIVE-FLOW tolerates 50 events between a credential read and an rm -rf.
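The ordered-axes trigger for LETHAL-TRIFECTA amounts to finding an in-order subsequence inside a session's recent window. A minimal Go sketch under assumptions: the Finding type and lethalTrifecta function are hypothetical, axes are plain strings, and the window is counted over the findings passed in rather than over all session events.

```go
package main

// Finding is a hypothetical recorded finding with its data axes.
type Finding struct {
	RuleID string
	Axes   []string // e.g. "ingress_untrusted", "sensitive_access", "egress_external"
}

// hasAxis reports whether f is tagged with axis.
func hasAxis(f Finding, axis string) bool {
	for _, a := range f.Axes {
		if a == axis {
			return true
		}
	}
	return false
}

// lethalTrifecta reports whether the last `window` findings contain, in order,
// untrusted ingress, then sensitive access, then external egress, and returns
// the contributing rule IDs when they do.
func lethalTrifecta(session []Finding, window int) ([]string, bool) {
	if len(session) > window {
		session = session[len(session)-window:]
	}
	order := []string{"ingress_untrusted", "sensitive_access", "egress_external"}
	var contributing []string
	step := 0
	for _, f := range session {
		// Greedily take the earliest finding for each axis in turn.
		if step < len(order) && hasAxis(f, order[step]) {
			contributing = append(contributing, f.RuleID)
			step++
		}
	}
	if step < len(order) {
		return nil, false
	}
	return contributing, true
}
```

Greedy earliest matching is sufficient here: if any in-order subsequence exists in the window, taking the first qualifying finding for each axis also finds one.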
What the operator sees
When a pattern matches, the correlator writes a synthetic row into scan_findings:
```yaml
scanner: "correlator"
rule_id: "CORR-LETHAL-TRIFECTA"
severity: CRITICAL
tags: ["correlation", "LETHAL-TRIFECTA"]
description: |
  contributing findings:
  - INJ-IGNORE-ALL at 2026-05-12T14:03:11Z
  - SENSITIVE-PATH-SSH-KEY at 2026-05-12T14:05:42Z
  - SRC-FETCH at 2026-05-12T14:06:18Z
```

The same row is emitted to configured observability sinks: Splunk, webhooks, and the TUI audit panel. The four bundled patterns currently emit CRITICAL findings, so they route to operator attention without per-sink threshold changes.
The correlator is post-event detection, not an in-flight block of the request that completed the pattern. It writes a synthetic scan summary with a block verdict, plus a CRITICAL CORR-* finding for sinks and operator workflows, but it does not retroactively stop the contributing action or automatically quarantine later requests.
Why alert-only, not block
Promoting a later request in the session to deny based on an earlier pattern match has two honest costs:
- It inverts the existing "each request is evaluated on its own" contract.
- It can cause spurious blocks on legitimate parallel/retry traffic in the same session.
Ways to act on correlations without changing the gateway runtime:
- Tail scan_findings for rule_id LIKE 'CORR-%' and invoke a custom script.
- Wire the existing Splunk / webhook sink to any CRITICAL severity; CORR-* inherits automatically.
- Feed the sink event into an external response workflow that disables a connector, revokes a credential, or opens an incident under your organization's policy.
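The first two options amount to filtering findings by rule-ID prefix or severity. A minimal Go sketch; the Finding fields and the route function are hypothetical, and the returned strings stand in for whatever script or workflow your organization invokes.

```go
package main

import "strings"

// Finding is a hypothetical row read from scan_findings or a sink event.
type Finding struct {
	Scanner  string
	RuleID   string
	Severity string
}

// route decides how an external workflow might react to a finding:
// correlator findings open an incident, other CRITICAL findings page,
// and everything else is ignored.
func route(f Finding) string {
	switch {
	case strings.HasPrefix(f.RuleID, "CORR-"):
		return "open-incident"
	case f.Severity == "CRITICAL":
		return "page"
	default:
		return "ignore"
	}
}
```

Checking the CORR- prefix first lets correlations take a different path even though they also satisfy the CRITICAL rule.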
Authoring custom rule packs
The path of least resistance is to extend rather than replace. Custom rule packs live on disk and are pointed at via the guardrail.rule_pack_dir config key — there is no built-in linter or fixture-runner today, so the workflow is git-driven.
Locate the active rule-pack directory. The bundled packs ship inside the installed Python package; the operator-editable copies live under ~/.defenseclaw/policies/guardrail/{default,strict,permissive}/. Find the active one with:
```shell
awk '/rule_pack_dir/ {print $2}' ~/.defenseclaw/config.yaml
```

Copy the closest pack.
```shell
cp -r ~/.defenseclaw/policies/guardrail/default \
  ~/.defenseclaw/policies/guardrail/my-org
```

Add a rules file alongside local-patterns.yaml. Keep your custom rules in their own file so updates to the bundled patterns don't conflict.
Validate the surrounding config. There is no rule-pack-specific CLI linter today. defenseclaw config validate checks config.yaml itself (including the rule_pack_dir value), but it does not parse every YAML file inside that directory:
```shell
defenseclaw config validate
```

Point the gateway at the new directory. Edit ~/.defenseclaw/config.yaml and set:
```yaml
guardrail:
  rule_pack_dir: ~/.defenseclaw/policies/guardrail/my-org
```

Then restart the gateway so it rebuilds the local rule set and judge from the new directory:
```shell
defenseclaw-gateway restart
```

defenseclaw-gateway policy reload is narrower: it recompiles the OPA modules under policy_dir; it is not the rule-pack reload command.
defenseclaw setup guardrail --rule-pack <name> is for switching between the three bundled packs (default, strict, permissive); custom directories go through rule_pack_dir.
Operator commands
The policy group manages named OPA / asset policies (stored under ~/.defenseclaw/policies/) — different surface from the guardrail rule packs above. Subcommands:
```shell
defenseclaw policy list                 # every named policy on disk
defenseclaw policy show my-policy       # normalized summary of one policy
defenseclaw policy create my-policy -d "Production policy"
defenseclaw policy activate my-policy   # set as active
defenseclaw policy delete my-policy     # remove
defenseclaw policy validate             # validates data.json schema + compiles bundled Rego
defenseclaw policy test --verbose       # runs `opa test` against the bundled Rego (requires `opa` on PATH)
defenseclaw policy edit actions --severity high --runtime disable --policy-name my-policy
```

policy show and policy activate always take a policy name, not a filesystem path. policy show renders a normalized summary rather than the source YAML.

policy validate checks that data.json parses, that every severity tier in actions and scanner_overrides has the required fields, and that the bundled Rego modules compile (it is not an opa fmt --diff). Both validate and test only accept --rego-dir to override the bundled Rego location; neither takes a path argument or --fixtures. There is no policy diff subcommand; compare custom YAML files with diff or another YAML-aware tool.
See also
- Defaults — what each shipped rule pack actually sets, and how to pick one based on risk tolerance
- Setup Guardrail — the CLI that wires the chosen pack into your connectors
- Unified LLM key — how the LLM judge resolves its provider key
- Reference → Configuration — every config key the policy layer reads