Policies
How DefenseClaw decides — repo policies (OPA/Rego), guardrail rule packs (regex + LLM judge), scanner policies, and the suppression layer that keeps your alert volume sane.
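DefenseClaw makes one decision per event: allow, ask, or deny. The plumbing that turns a prompt or a tool call into that decision lives in four layers, each covered in its own section below: repo policies (OPA / Rego), guardrail rule packs, the LLM judge, and suppressions.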
You can enable any subset. Plain regex is enough for many secrets and obvious prompt-injection patterns; the LLM judge catches the long tail; OPA Rego owns the operator's organisational rules; suppressions keep all of it usable.
Layer 1 — Repo policies (OPA / Rego)
The top-level policy file (policies/default.yaml and friends) is the operator's source of truth. It declares what the gateway should do for each domain — admission, guardrail thresholds, sandbox, audit, skill actions:
name: default
admission:
  scanners:
    skill-scanner:
      enabled: true
      block_severity_min: high
      quarantine_severity_min: medium
    mcp-scanner:
      enabled: true
      block_severity_min: high
guardrail:
  judge:
    enabled: true
    min_severity: medium
  rule_pack: default
skill_actions:
  default: ask
  first_party_allow_list:
    - cisco-ai-defense/*

These YAML blocks are compiled into OPA Rego inputs. Custom Rego modules under policies/rego/ then evaluate per-domain decisions:
| Domain | Rego entry point | Inputs |
|---|---|---|
| admission | policies/rego/admission.rego | scan result, skill / MCP metadata, allow lists |
| guardrail | policies/rego/guardrail.rego | rule matches, judge result, severity floor |
| firewall | policies/rego/firewall.rego | egress destination, agent identity |
| sandbox | policies/rego/sandbox.rego | tool call, sandbox profile |
| audit | policies/rego/audit.rego | event kind, sink targets |
| skill_actions | policies/rego/skill_actions.rego | skill source, scan severity |
You can hand-edit the Rego — the engine is just OPA — but most operators stay in the YAML layer because the bundled Rego covers the common cases.
How the engine evaluates
1. Gateway → Policy engine (OPA): input
2. Policy engine (OPA) → Rego module: data.defenseclaw.domain.decision
3. Rego module → Policy engine (OPA): decision and reasons
4. Policy engine (OPA) → Gateway: verdict
5. Gateway → Sinks: emit decision event
Every decision carries the matching reasons[] so the audit event can name the rule that fired. That's what powers the per-rule breakdowns in Splunk and Grafana.
Layer 2 — Guardrail rule packs
The guardrail rule pack is the content the gateway evaluates against — the actual regex patterns, judge prompts, and category taxonomies. Three packs ship out of the box:
- permissive: Observe-only baseline. Patterns log but don't block. Good for first-week pilots.
- default: Sensible production defaults. Blocks high-severity secrets, prompts for medium.
- strict: Belt-and-suspenders. Blocks medium+ on most categories. Choose this for regulated workloads.
Each pack is a directory:
policies/guardrail/<pack>/
  rules/
    local-patterns.yaml   # injection, secrets, PII, etc.
    custom-org.yaml       # your org's regex extensions (optional)
  judge/
    judge.yaml            # categories, severities, prompt template
  suppressions.yaml       # the suppression layer (see below)

Pick a pack at setup time:

defenseclaw setup guardrail --rule-pack strict

--rule-pack only accepts the three bundled profiles (default, strict, permissive). To run a custom pack from your own directory, point guardrail.rule_pack_dir at it in ~/.defenseclaw/config.yaml; see Authoring custom rule packs below for the full workflow.
Rule shape
Rules are YAML for portability and review:
version: 1
injection:
  description: "Prompt injection / jailbreak signatures"
  injection_regexes:
    - id: ignore_previous
      pattern: "(?i)ignore (all )?previous (instructions|messages)"
      severity: high
secrets:
  - id: aws_access_key
    pattern: "AKIA[0-9A-Z]{16}"
    severity: high
    suppress_signature: "{{ matched | sha256 }}"
pii_requests:
  - id: ssn_format
    pattern: "\\b\\d{3}-\\d{2}-\\d{4}\\b"
    severity: medium

By default, any rule with severity: high triggers deny and any rule with severity: medium triggers ask (HITL). The thresholds are configurable per rule pack.
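The default severity mapping can be illustrated with a minimal Python sketch. It is not the gateway's implementation: the `RULES` list, the `category/id` naming, the `evaluate` helper, and the "strictest verdict wins" tie-break are all assumptions made for illustration, and the patterns are copied from the example rules above.

```python
import re

# Hypothetical in-memory form of the example rules (names are illustrative).
RULES = [
    {"id": "injection/ignore_previous",
     "pattern": r"(?i)ignore (all )?previous (instructions|messages)",
     "severity": "high"},
    {"id": "secrets/aws_access_key",
     "pattern": r"AKIA[0-9A-Z]{16}",
     "severity": "high"},
    {"id": "pii_requests/ssn_format",
     "pattern": r"\b\d{3}-\d{2}-\d{4}\b",
     "severity": "medium"},
]

# Default mapping described in the text: high -> deny, medium -> ask.
SEVERITY_TO_VERDICT = {"high": "deny", "medium": "ask"}
VERDICT_RANK = {"allow": 0, "ask": 1, "deny": 2}


def evaluate(text, rules=RULES):
    """Return (verdict, reasons): the strictest verdict and the ids of matching rules."""
    verdict, reasons = "allow", []
    for rule in rules:
        if re.search(rule["pattern"], text):
            reasons.append(rule["id"])
            candidate = SEVERITY_TO_VERDICT.get(rule["severity"], "allow")
            # Assumption: when several rules match, the strictest verdict wins.
            if VERDICT_RANK[candidate] > VERDICT_RANK[verdict]:
                verdict = candidate
    return verdict, reasons
```

Calling `evaluate("my ssn is 123-45-6789")` yields an `ask` verdict with the SSN rule listed as the reason, which mirrors how each decision carries the rules that fired.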
Layer 3 — LLM judge
The LLM judge is an optional second opinion that fires when the deterministic rules don't have a clean answer. It's particularly useful for:
- Subtle prompt injection that doesn't match a regex.
- Tool calls whose stated purpose disagrees with their effects (most common in MCP).
- Categorical content checks — "does this prompt try to extract proprietary code?"
Configuration lives in judge.yaml inside each rule pack:
version: 1
model_hint: gpt-4o-mini
categories:
  - id: prompt_injection
    severity_floor: medium
  - id: data_exfiltration
    severity_floor: high
  - id: code_execution_intent
    severity_floor: medium
prompt_template: |
  You are reviewing the following agent message for security issues.
  Respond with the most-severe matching category and a one-line reason.
  ...
tool_suppressions:
  - tool: fs.read_text
    when_path_matches: "^/Users/.+/Documents/.*"
    suppress: true

The judge call goes through the same Bifrost pipeline as everything else and uses the unified LLM key. Token usage is emitted as a judge.call event so you can monitor cost from your dashboards.
You can target a different model than the rest of the stack by setting guardrail.judge.llm in config.yaml:
guardrail:
  judge:
    enabled: true
    llm:
      provider: anthropic
      model: claude-3-5-haiku-20241022
      api_key_env: DEFENSECLAW_LLM_KEY

Layer 4 — Suppressions
Suppressions are the difference between "we deployed DefenseClaw" and "we use DefenseClaw daily." Three suppression flavours ship with every rule pack:
pre_judge_strips — redact before the judge sees it
pre_judge_strips:
  - field: prompt
    pattern: "secret_[a-zA-Z0-9]+"
    replacement: "secret_***"

The redaction happens before the LLM judge is invoked, so the secret never crosses a third-party API even when the judge is enabled. Useful for fields that can never legitimately contain prompt content (auth headers, API tokens).
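A minimal Python sketch of the redaction step, under stated assumptions: the event is modelled as a plain dict, `PRE_JUDGE_STRIPS` and `strip_for_judge` are hypothetical names, and the gateway's real regex engine and event shape may differ.

```python
import re

# Hypothetical in-memory form of the pre_judge_strips entry above.
PRE_JUDGE_STRIPS = [
    {"field": "prompt", "pattern": r"secret_[a-zA-Z0-9]+", "replacement": "secret_***"},
]


def strip_for_judge(event, strips=PRE_JUDGE_STRIPS):
    """Return a redacted copy of `event`; the original dict is left untouched."""
    redacted = dict(event)
    for strip in strips:
        field = strip["field"]
        value = redacted.get(field)
        if isinstance(value, str):
            # A callable replacement keeps the replacement text literal
            # (no backslash or group-reference expansion).
            redacted[field] = re.sub(
                strip["pattern"], lambda _m, s=strip: s["replacement"], value
            )
    return redacted
```

Only the copy returned by `strip_for_judge` would be handed to the judge in this sketch; the unredacted event stays available for the deterministic rules.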
finding_suppressions — silence known-good signatures
finding_suppressions:
  - rule: secrets/aws_access_key
    signature: "5e884898...sha256-of-known-test-key"
    expires_at: "2026-12-31T00:00:00Z"
    reason: "Public test fixture, see commit abc1234"

When a finding's signature matches, the verdict is downgraded to allow and the event is emitted with suppressed_by: <reason>. This is the right place to put one-off exceptions: they're auditable, they expire, and they show up in the dashboards as a separate panel.
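The matching logic can be sketched in Python as follows. This assumes the signature is the hex SHA-256 digest of the matched text encoded as UTF-8 (suggested by `{{ matched | sha256 }}` in the rule shape, but not confirmed), and that a suppression stops applying once `expires_at` has passed. The helper names `signature` and `find_suppression` are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def signature(matched):
    """Hex SHA-256 of the matched text (assumed UTF-8), mirroring `{{ matched | sha256 }}`."""
    return hashlib.sha256(matched.encode("utf-8")).hexdigest()


def find_suppression(rule, matched, suppressions, now=None):
    """Return the reason of the first unexpired suppression for this finding, else None."""
    now = now or datetime.now(timezone.utc)
    sig = signature(matched)
    for entry in suppressions:
        # fromisoformat on older Pythons does not accept a trailing "Z".
        expires = datetime.fromisoformat(entry["expires_at"].replace("Z", "+00:00"))
        if entry["rule"] == rule and entry["signature"] == sig and now < expires:
            return entry["reason"]
    return None
```

A non-`None` result would correspond to downgrading the verdict to allow and recording the reason in `suppressed_by`; after the expiry date the same finding would be evaluated normally again.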
tool_suppressions — scope HITL prompts
tool_suppressions:
  - tool: fs.read_text
    when_path_matches: "^/Users/.+/Documents/.*"
    suppress: true
  - tool: shell.execute
    when_command_matches: "^git (status|diff|log)"
    suppress: true

These keep the operator from being asked to approve cosmetic actions (git status, opening ~/Documents files) while still surfacing them in the audit log. The suppression is narrow: anything outside the predicate still prompts.
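A minimal Python sketch of how such predicates might gate the HITL prompt. The mapping from `when_path_matches` / `when_command_matches` to `path` / `command` arguments, the dict-shaped tool call, and the `should_prompt` helper are assumptions for illustration, not the gateway's actual data model.

```python
import re

# Hypothetical in-memory form of the tool_suppressions entries above.
TOOL_SUPPRESSIONS = [
    {"tool": "fs.read_text", "when_path_matches": r"^/Users/.+/Documents/.*", "suppress": True},
    {"tool": "shell.execute", "when_command_matches": r"^git (status|diff|log)", "suppress": True},
]

# Assumed mapping from predicate key to the tool-call argument it inspects.
PREDICATE_ARGS = {"when_path_matches": "path", "when_command_matches": "command"}


def should_prompt(tool, args, suppressions=TOOL_SUPPRESSIONS):
    """Return False only when a suppression for this tool matches its arguments."""
    for entry in suppressions:
        if entry["tool"] != tool or not entry.get("suppress"):
            continue
        for key, arg in PREDICATE_ARGS.items():
            if key in entry and re.search(entry[key], str(args.get(arg, ""))):
                return False
    return True
```

Under this sketch's `re.search` semantics, the `^git (status|diff|log)` pattern has no end anchor, so a command that merely starts with `git status` would also match. Whether the gateway matches the same way is not stated here, so it may be worth anchoring custom patterns explicitly.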
Why three layers?
| Layer | Where it runs | Use it for |
|---|---|---|
| pre_judge_strips | Before the LLM judge | Keeping secrets out of third-party APIs |
| finding_suppressions | After rules + judge fire | One-off exceptions with audit trail and expiry |
| tool_suppressions | At HITL prompt time | Cutting noise without changing what gets logged |
Together they make it realistic to leave the gateway in action mode without operator fatigue.
Authoring custom rule packs
The path of least resistance is to extend rather than replace. Custom rule packs live on disk and are pointed at via the guardrail.rule_pack_dir config key — there is no built-in linter or fixture-runner today, so the workflow is git-driven.
1. Locate the active rule-pack directory. The bundled packs ship inside the installed Python package; the operator-editable copies live under ~/.defenseclaw/policies/guardrail/{default,strict,permissive}/. Find the active one with:

awk '/rule_pack_dir/ {print $2}' ~/.defenseclaw/config.yaml

2. Copy the closest pack.

cp -r ~/.defenseclaw/policies/guardrail/default \
  ~/.defenseclaw/policies/guardrail/my-org

3. Add a rules file alongside local-patterns.yaml. Keep your custom rules in their own file so updates to the bundled patterns don't conflict.

4. Validate the surrounding config. There is no rule-pack-specific linter, but defenseclaw config validate will reject obvious schema breaks before the gateway tries to load the directory:

defenseclaw config validate

5. Point the gateway at the new directory. Edit ~/.defenseclaw/config.yaml and set:

guardrail:
  rule_pack_dir: ~/.defenseclaw/policies/guardrail/my-org

Then reload (no restart needed):

defenseclaw-gateway policy reload

defenseclaw setup guardrail --rule-pack <name> is for switching between the three bundled packs (default, strict, permissive); custom directories go through rule_pack_dir.
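Because there is no rule-pack linter, a small pre-commit style check can catch malformed patterns before a reload. The following Python sketch is an assumption-laden illustration: it expects the rules file to have already been parsed into dicts and lists (for example with a YAML library), `find_bad_patterns` is a hypothetical helper, and Python's `re` syntax may not match the gateway's regex engine exactly.

```python
import re


def find_bad_patterns(node, path=""):
    """Walk parsed rule data and return (location, error) for patterns that fail to compile."""
    problems = []
    if isinstance(node, dict):
        if "pattern" in node:
            try:
                re.compile(node["pattern"])
            except (re.error, TypeError) as exc:
                # Location is the enclosing key path plus the rule id, if present.
                problems.append((f"{path}/{node.get('id', '?')}", str(exc)))
        for key, value in node.items():
            problems.extend(find_bad_patterns(value, f"{path}/{key}"))
    elif isinstance(node, list):
        for item in node:
            problems.extend(find_bad_patterns(item, path))
    return problems
```

An empty result only means every pattern compiles under Python's `re`; it does not replace `defenseclaw config validate` or a test run against the gateway itself.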
Operator commands
The policy group manages named OPA / asset policies (stored under ~/.defenseclaw/policies/) — different surface from the guardrail rule packs above. Subcommands:
defenseclaw policy list # every named policy on disk
defenseclaw policy show <name> # full content of one policy
defenseclaw policy create <name> -d "..." # scaffold a new one from a preset
defenseclaw policy activate <name> # set as active
defenseclaw policy delete <name> # remove
defenseclaw policy validate # validates data.json schema + compiles bundled Rego
defenseclaw policy test [-v] # runs `opa test` against the bundled Rego (requires `opa` on PATH)
defenseclaw policy edit <section> ...      # actions / scanner / guardrail / firewall

policy show and policy activate always take a policy name, not a filesystem path. policy validate checks that data.json parses, that every severity tier in actions and scanner_overrides has the required fields, and that the bundled Rego modules compile (it is not an opa fmt --diff). Both validate and test only accept --rego-dir to override the bundled Rego location; neither takes a path argument or --fixtures. There is no policy diff subcommand; compare named policies by reading them back through policy show <a> and policy show <b>.
See also
- Defaults — what each shipped rule pack actually sets, and how to pick one based on risk tolerance
- Setup Guardrail — the CLI that wires the chosen pack into your connectors
- Unified LLM key — how the LLM judge resolves its provider key
- Reference → Configuration — every config key the policy layer reads