Policies

How DefenseClaw decides — repo policies (OPA/Rego), guardrail rule packs (regex + LLM judge), scanner policies, and the suppression layer that keeps your alert volume sane.

DefenseClaw makes one decision per event: allow, ask, or deny. The plumbing that turns a prompt or a tool call into that decision lives in four layers:

  1. Event: prompt · tool · finding
  2. Pre-judge strips: redact · truncate
  3. Regex rules: deterministic
  4. LLM judge: optional second opinion
  5. Suppression layer: allow lists · signatures
  6. Repo policy: OPA · Rego
  7. Verdict: allow · ask · deny

Cheap deterministic rules first. The judge is only consulted when needed. Suppression is the last line before the verdict is recorded.

You can enable any subset. Plain regex is enough for many secrets and obvious prompt-injection patterns; the LLM judge catches the long tail; OPA Rego owns the operator's organisational rules; suppressions keep all of it usable.
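As a rough mental model of that ordering, the sketch below runs regex rules first, consults a judge only when no rule matched, and applies suppression last. It is illustrative Python, not DefenseClaw's code: `decide`, the `(id, pattern, severity)` rule tuples, the `judge` and `is_suppressed` callables, the severity ordering, and the reading of "only when needed" as "no regex match" are all assumptions. The repo-policy stage is left out for brevity.

```python
import re

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}  # assumed ordering


def decide(text, rules, judge=None, is_suppressed=None):
    """Return (verdict, reasons) for one event. Hypothetical helper."""
    # Stage 1: cheap deterministic rules.
    findings = [(rule_id, severity)
                for rule_id, pattern, severity in rules
                if re.search(pattern, text)]

    # Stage 2 (assumed trigger): the judge runs only if no regex rule matched.
    if not findings and judge is not None:
        judged = judge(text)  # expected to return (category, severity) or None
        if judged is not None:
            findings.append(judged)

    # Stage 3: suppression is applied last, just before the verdict.
    if is_suppressed is not None:
        findings = [f for f in findings if not is_suppressed(f[0])]

    if not findings:
        return "allow", []

    worst = max(SEVERITY_RANK[severity] for _, severity in findings)
    if worst >= SEVERITY_RANK["high"]:
        verdict = "deny"
    elif worst == SEVERITY_RANK["medium"]:
        verdict = "ask"
    else:
        verdict = "allow"
    return verdict, [rule_id for rule_id, _ in findings]
```

Because each stage is optional in this sketch, passing only `rules` models a regex-only deployment, and adding `judge` or `is_suppressed` models enabling the other layers.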

Layer 1 — Repo policies (OPA / Rego)

The top-level policy file (policies/default.yaml and friends) is the operator's source of truth. It declares what the gateway should do for each domain — admission, guardrail thresholds, sandbox, audit, skill actions:

policies/default.yaml (excerpt)
name: default
admission:
  scanners:
    skill-scanner:
      enabled: true
      block_severity_min: high
      quarantine_severity_min: medium
    mcp-scanner:
      enabled: true
      block_severity_min: high

guardrail:
  judge:
    enabled: true
    min_severity: medium
  rule_pack: default

skill_actions:
  default: ask
first_party_allow_list:
  - cisco-ai-defense/*

These YAML blocks are compiled into OPA Rego inputs. Custom Rego modules under policies/rego/ then evaluate per-domain decisions:

| Domain | Rego entry point | Inputs |
| --- | --- | --- |
| admission | policies/rego/admission.rego | scan result, skill / MCP metadata, allow lists |
| guardrail | policies/rego/guardrail.rego | rule matches, judge result, severity floor |
| firewall | policies/rego/firewall.rego | egress destination, agent identity |
| sandbox | policies/rego/sandbox.rego | tool call, sandbox profile |
| audit | policies/rego/audit.rego | event kind, sink targets |
| skill_actions | policies/rego/skill_actions.rego | skill source, scan severity |

You can hand-edit the Rego — the engine is just OPA — but most operators stay in the YAML layer because the bundled Rego covers the common cases.
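The bundled Rego is not reproduced here. As a hedged illustration of how admission thresholds such as block_severity_min and quarantine_severity_min might be read, the Python sketch below maps a scanner's worst finding to a decision plus reasons. The function name, the low/medium/high ordering, and the "deny" and "quarantine" decision labels are assumptions made for the example, not the shipped policy.

```python
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}  # assumed ordering


def admission_decision(max_severity, scanner_cfg):
    """Map a scanner's worst finding to {"decision", "reasons"}. Hypothetical."""
    if not scanner_cfg.get("enabled", False):
        return {"decision": "allow", "reasons": ["scanner disabled"]}
    if max_severity is None:  # the scan produced no findings
        return {"decision": "allow", "reasons": []}

    rank = SEVERITY_RANK[max_severity]

    # Check the stricter threshold first.
    block_min = scanner_cfg.get("block_severity_min")
    if block_min and rank >= SEVERITY_RANK[block_min]:
        return {"decision": "deny",
                "reasons": [f"{max_severity} >= block_severity_min ({block_min})"]}

    quarantine_min = scanner_cfg.get("quarantine_severity_min")
    if quarantine_min and rank >= SEVERITY_RANK[quarantine_min]:
        return {"decision": "quarantine",
                "reasons": [f"{max_severity} >= quarantine_severity_min ({quarantine_min})"]}

    return {"decision": "allow", "reasons": []}
```

With the skill-scanner values from the excerpt, a medium finding would be quarantined under these assumptions, while the mcp-scanner block (which sets no quarantine threshold) would let it through.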

How the engine evaluates

  1. Gateway → Policy engine (OPA): input
  2. Policy engine (OPA) → Rego module: data.defenseclaw.domain.decision
  3. Rego module → Policy engine (OPA): decision and reasons
  4. Policy engine (OPA) → Gateway: verdict
  5. Gateway → Sinks: emit decision event

Every decision carries the matching reasons[] so the audit event can name the rule that fired. That's what powers the per-rule breakdowns in Splunk and Grafana.

Layer 2 — Guardrail rule packs

The guardrail rule pack is the content the gateway evaluates against: the actual regex patterns, judge prompts, and category taxonomies. Three packs ship out of the box: default, strict, and permissive.

Each pack is a directory:

policies/guardrail/<pack>/
  rules/
    local-patterns.yaml       # injection, secrets, PII, etc.
    custom-org.yaml           # your org's regex extensions (optional)
  judge/
    judge.yaml                # categories, severities, prompt template
  suppressions.yaml           # the suppression layer (see below)

Pick a pack at setup time:

defenseclaw setup guardrail --rule-pack strict

--rule-pack only accepts the three bundled profiles (default, strict, permissive). To run a custom pack from your own directory, point guardrail.rule_pack_dir at it in ~/.defenseclaw/config.yaml — see Authoring custom rule packs below for the full workflow.

Rule shape

Rules are YAML for portability and review:

policies/guardrail/default/rules/local-patterns.yaml (excerpt)
version: 1
injection:
  description: "Prompt injection / jailbreak signatures"
  injection_regexes:
    - id: ignore_previous
      pattern: "(?i)ignore (all )?previous (instructions|messages)"
      severity: high
secrets:
  - id: aws_access_key
    pattern: "AKIA[0-9A-Z]{16}"
    severity: high
    suppress_signature: "{{ matched | sha256 }}"
pii_requests:
  - id: ssn_format
    pattern: "\\b\\d{3}-\\d{2}-\\d{4}\\b"
    severity: medium

By default, a match with severity: high triggers deny and a match with severity: medium triggers ask (human-in-the-loop, HITL). The thresholds are configurable per rule pack.
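Before adding a pattern to a pack, it can help to try it against sample strings. The sketch below does that in Python for the three patterns from the excerpt. It assumes the gateway's regex engine treats these patterns the same way Python's `re` module does, which this page does not confirm. The `section/id` rule names follow the form used by finding_suppressions, and `match_rules` is a hypothetical helper.

```python
import re

# Patterns copied from the excerpt. In double-quoted YAML, "\\b\\d{3}"
# unescapes to \b\d{3}, which is what the raw strings below contain.
RULES = [
    ("injection/ignore_previous",
     r"(?i)ignore (all )?previous (instructions|messages)", "high"),
    ("secrets/aws_access_key", r"AKIA[0-9A-Z]{16}", "high"),
    ("pii_requests/ssn_format", r"\b\d{3}-\d{2}-\d{4}\b", "medium"),
]


def match_rules(text):
    """Return [(rule_id, matched_text, severity)] for every rule that fires."""
    hits = []
    for rule_id, pattern, severity in RULES:
        m = re.search(pattern, text)
        if m:
            hits.append((rule_id, m.group(0), severity))
    return hits


if __name__ == "__main__":
    for sample in ("Ignore previous messages",
                   "key=AKIAIOSFODNN7EXAMPLE",
                   "SSN 123-45-6789.",
                   "order 1234-56-7890"):
        print(sample, "->", match_rules(sample))
```

The last sample is a useful negative case: the `\b` word boundaries keep the SSN pattern from firing inside a longer digit run.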

Layer 3 — LLM judge

The LLM judge is an optional second opinion that fires when the deterministic rules don't have a clean answer. It's particularly useful for:

  • Subtle prompt injection that doesn't match a regex.
  • Tool calls whose stated purpose disagrees with their effects (most common in MCP).
  • Categorical content checks — "does this prompt try to extract proprietary code?"

Configuration lives in judge.yaml inside each rule pack:

policies/guardrail/default/judge/judge.yaml (excerpt)
version: 1
model_hint: gpt-4o-mini
categories:
  - id: prompt_injection
    severity_floor: medium
  - id: data_exfiltration
    severity_floor: high
  - id: code_execution_intent
    severity_floor: medium
prompt_template: |
  You are reviewing the following agent message for security issues.
  Respond with the most-severe matching category and a one-line reason.
  ...
tool_suppressions:
  - tool: fs.read_text
    when_path_matches: "^/Users/.+/Documents/.*"
    suppress: true

The judge call goes through the same Bifrost pipeline as everything else and uses the unified LLM key. Token usage is emitted as a judge.call event so you can monitor cost from your dashboards.
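One plausible reading of severity_floor is "the lowest severity a category can be reported at", with the policy's guardrail.judge.min_severity then deciding whether the judge result counts at all. The sketch below encodes that reading in Python. Both the interpretation and the helper names are assumptions, so check the bundled Rego before relying on it.

```python
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}  # assumed ordering

# Floors copied from the judge.yaml excerpt.
CATEGORY_FLOORS = {
    "prompt_injection": "medium",
    "data_exfiltration": "high",
    "code_execution_intent": "medium",
}


def effective_severity(category, judged_severity, floors=CATEGORY_FLOORS):
    """Raise the judge's severity to the category floor, if one is set."""
    floor = floors.get(category)
    if floor is None:
        return judged_severity
    # Pick whichever of the two severities ranks higher.
    return max(judged_severity, floor, key=SEVERITY_RANK.__getitem__)


def judge_result_counts(category, judged_severity, min_severity="medium"):
    """True if the floored severity reaches the policy's min_severity."""
    severity = effective_severity(category, judged_severity)
    return SEVERITY_RANK[severity] >= SEVERITY_RANK[min_severity]
```

Under this reading, a data_exfiltration hit is always treated as high, even when the model itself rates it low.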

You can target a different model than the rest of the stack by setting guardrail.judge.llm in config.yaml:

guardrail:
  judge:
    enabled: true
    llm:
      provider: anthropic
      model: claude-3-5-haiku-20241022
      api_key_env: DEFENSECLAW_LLM_KEY

Layer 4 — Suppressions

Suppressions are the difference between "we deployed DefenseClaw" and "we use DefenseClaw daily." Three suppression flavours ship with every rule pack:

pre_judge_strips — redact before the judge sees it

policies/guardrail/default/suppressions.yaml
pre_judge_strips:
  - field: prompt
    pattern: "secret_[a-zA-Z0-9]+"
    replacement: "secret_***"

The redaction happens before the LLM judge is invoked, so the secret never crosses a third-party API even when the judge is enabled. Useful for fields that can never legitimately contain prompt content (auth headers, API tokens).
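The effect of a strip entry can be previewed with a few lines of Python. This is a minimal sketch that assumes each entry is applied as a plain regex substitution on the named field. `apply_strips` is a hypothetical helper, not a DefenseClaw API.

```python
import re

# Entry copied from the suppressions.yaml excerpt.
PRE_JUDGE_STRIPS = [
    {"field": "prompt", "pattern": r"secret_[a-zA-Z0-9]+", "replacement": "secret_***"},
]


def apply_strips(event, strips=PRE_JUDGE_STRIPS):
    """Return a redacted copy of the event; the original dict is untouched."""
    redacted = dict(event)
    for strip in strips:
        field = strip["field"]
        value = redacted.get(field)
        if isinstance(value, str):
            redacted[field] = re.sub(strip["pattern"], strip["replacement"], value)
    return redacted


if __name__ == "__main__":
    print(apply_strips({"prompt": "deploy with secret_abc123 please"}))
```

Only the copy handed to the judge is changed in this sketch, so any downstream stage that needs the raw event still has it.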

finding_suppressions — silence known-good signatures

finding_suppressions:
  - rule: secrets/aws_access_key
    signature: "5e884898...sha256-of-known-test-key"
    expires_at: "2026-12-31T00:00:00Z"
    reason: "Public test fixture, see commit abc1234"

When a finding's signature matches, the verdict is downgraded to allow and the event is emitted with suppressed_by: <reason>. This is the right place to put one-off exceptions — they're auditable, they expire, and they show up in the dashboards as a separate panel.
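To produce a signature for a new entry, the suppress_signature template in the rule excerpt suggests hashing the matched text with SHA-256. The sketch below assumes a lowercase hex digest of the UTF-8 bytes and an exact, unexpired match on rule plus signature. The encoding, the matching semantics, and the helper names are assumptions.

```python
import hashlib
from datetime import datetime, timezone


def signature(matched):
    """SHA-256 hex digest of the matched text (assumed encoding: UTF-8)."""
    return hashlib.sha256(matched.encode("utf-8")).hexdigest()


def find_suppression(rule, matched, suppressions, now=None):
    """Return the first unexpired entry matching rule and signature, else None."""
    now = now or datetime.now(timezone.utc)
    sig = signature(matched)
    for entry in suppressions:
        # Normalise the trailing "Z" so older Python versions can parse it.
        expires_at = datetime.fromisoformat(entry["expires_at"].replace("Z", "+00:00"))
        if entry["rule"] == rule and entry["signature"] == sig and now < expires_at:
            return entry
    return None


if __name__ == "__main__":
    # AWS's documented example key, used here as a stand-in test fixture.
    print(signature("AKIAIOSFODNN7EXAMPLE"))
```

Checking expires_at at match time is what makes an exception lapse on its own: once the date passes, the same finding is no longer suppressed in this sketch.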

tool_suppressions — scope HITL prompts

tool_suppressions:
  - tool: fs.read_text
    when_path_matches: "^/Users/.+/Documents/.*"
    suppress: true
  - tool: shell.execute
    when_command_matches: "^git (status|diff|log)"
    suppress: true

These keep the operator from being asked to approve cosmetic actions (git status, opening ~/Documents files) while still surfacing them in the audit log. The suppression is narrow — anything outside the predicate still prompts.
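How narrow a predicate really is depends on its anchors, so it is worth testing candidates before shipping them. The sketch below assumes each `when_<arg>_matches` key is a regex searched against the tool-call argument named `<arg>`. That mapping and the `prompt_suppressed` helper are assumptions for illustration.

```python
import re

# Entries copied from the suppressions.yaml excerpt.
TOOL_SUPPRESSIONS = [
    {"tool": "fs.read_text", "when_path_matches": r"^/Users/.+/Documents/.*", "suppress": True},
    {"tool": "shell.execute", "when_command_matches": r"^git (status|diff|log)", "suppress": True},
]


def prompt_suppressed(tool, args, entries=TOOL_SUPPRESSIONS):
    """True if some entry for this tool has all of its predicates satisfied."""
    for entry in entries:
        if entry.get("tool") != tool or not entry.get("suppress"):
            continue
        # "when_path_matches" -> argument name "path", and so on.
        predicates = {key[len("when_"):-len("_matches")]: pattern
                      for key, pattern in entry.items()
                      if key.startswith("when_") and key.endswith("_matches")}
        if all(re.search(pattern, str(args.get(arg, "")))
               for arg, pattern in predicates.items()):
            return True
    return False


if __name__ == "__main__":
    for command in ("git status", "git push origin main", "git status && rm -rf build"):
        print(command, "->", prompt_suppressed("shell.execute", {"command": command}))
```

Under these search semantics the git pattern is anchored only at the start, so `git status && rm -rf build` also matches. If the gateway behaves the same way, tightening the pattern (for example with an end anchor) keeps the suppression as narrow as intended.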

Why three suppression flavours?

| Layer | Where it runs | Use it for |
| --- | --- | --- |
| pre_judge_strips | Before the LLM judge | Keeping secrets out of third-party APIs |
| finding_suppressions | After rules + judge fire | One-off exceptions with audit trail and expiry |
| tool_suppressions | At HITL prompt time | Cutting noise without changing what gets logged |

Together they make it realistic to leave the gateway in action mode without operator fatigue.

Authoring custom rule packs

The path of least resistance is to extend rather than replace. Custom rule packs live on disk and are pointed at via the guardrail.rule_pack_dir config key — there is no built-in linter or fixture-runner today, so the workflow is git-driven.

Locate the active rule-pack directory. The bundled packs ship inside the installed Python package; the operator-editable copies live under ~/.defenseclaw/policies/guardrail/{default,strict,permissive}/. Find the active one with:

awk '/rule_pack_dir/ {print $2}' ~/.defenseclaw/config.yaml

Copy the closest pack.

cp -r ~/.defenseclaw/policies/guardrail/default \
      ~/.defenseclaw/policies/guardrail/my-org

Add a rules file alongside local-patterns.yaml. Keep your custom rules in their own file so updates to the bundled patterns don't conflict.

Validate the surrounding config. There is no rule-pack-specific linter, but defenseclaw config validate will reject obvious schema breaks before the gateway tries to load the directory:

defenseclaw config validate

Point the gateway at the new directory. Edit ~/.defenseclaw/config.yaml and set:

guardrail:
  rule_pack_dir: ~/.defenseclaw/policies/guardrail/my-org

Then reload (no restart needed):

defenseclaw-gateway policy reload

defenseclaw setup guardrail --rule-pack <name> is for switching between the three bundled packs (default, strict, permissive); custom directories go through rule_pack_dir.
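Since there is no rule-pack linter, a small pre-check in CI can catch broken patterns before a reload. The sketch below walks an already-parsed rules document (load the YAML with whatever parser you use) and reports duplicate ids, patterns that fail to compile, and unexpected severities. It compiles with Python's `re`, which may differ from the gateway's regex engine, and the allowed severity set is an assumption.

```python
import re

ALLOWED_SEVERITIES = {"low", "medium", "high"}  # assumed set


def iter_rules(node):
    """Yield every mapping that looks like a rule (has both id and pattern)."""
    if isinstance(node, dict):
        if "id" in node and "pattern" in node:
            yield node
        else:
            for value in node.values():
                yield from iter_rules(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_rules(item)


def lint_rules(doc):
    """Return a list of human-readable problems; empty means nothing found."""
    problems, seen = [], set()
    for rule in iter_rules(doc):
        rule_id = rule["id"]
        if rule_id in seen:
            problems.append(f"{rule_id}: duplicate id")
        seen.add(rule_id)
        try:
            re.compile(rule["pattern"])
        except re.error as exc:
            problems.append(f"{rule_id}: bad pattern ({exc})")
        if rule.get("severity") not in ALLOWED_SEVERITIES:
            problems.append(f"{rule_id}: unexpected severity {rule.get('severity')!r}")
    return problems
```

The recursive walk is deliberate: the excerpt shows rules both directly under a section (secrets) and nested one level deeper (injection_regexes), and this handles either shape.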

Operator commands

The policy command group manages named OPA / asset policies (stored under ~/.defenseclaw/policies/), a different surface from the guardrail rule packs above. Subcommands:

defenseclaw policy list                       # every named policy on disk
defenseclaw policy show <name>                # full content of one policy
defenseclaw policy create <name> -d "..."     # scaffold a new one from a preset
defenseclaw policy activate <name>            # set as active
defenseclaw policy delete <name>              # remove
defenseclaw policy validate                   # validates data.json schema + compiles bundled Rego
defenseclaw policy test [-v]                  # runs `opa test` against the bundled Rego (requires `opa` on PATH)
defenseclaw policy edit <section> ...         # actions / scanner / guardrail / firewall

policy show and policy activate always take a policy name, not a filesystem path. policy validate checks that data.json parses, that every severity tier in actions and scanner_overrides has the required fields, and that the bundled Rego modules compile (it is not an opa fmt --diff). Both validate and test only accept --rego-dir to override the bundled Rego location; neither takes a path argument or --fixtures. There is no policy diff subcommand — compare named policies by reading them back through policy show <a> and policy show <b>.
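Reading two policies back and comparing them can be scripted. The sketch below shells out to `defenseclaw policy show <name>` for each policy and prints a unified diff of the two outputs. It assumes `policy show` writes the policy text to stdout, and the helper names and the `strict-copy` policy name are made up for the example.

```python
import difflib
import subprocess


def show_policy(name):
    """Capture the output of `defenseclaw policy show <name>`."""
    result = subprocess.run(
        ["defenseclaw", "policy", "show", name],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def diff_text(a, b, a_name="a", b_name="b"):
    """Unified diff of two policy texts; empty string when they are identical."""
    return "".join(difflib.unified_diff(
        a.splitlines(keepends=True),
        b.splitlines(keepends=True),
        fromfile=a_name, tofile=b_name,
    ))


if __name__ == "__main__":
    left, right = "default", "strict-copy"  # hypothetical policy names
    print(diff_text(show_policy(left), show_policy(right), left, right))
```

Keeping the diff logic separate from the subprocess call means it can be unit-tested without the CLI installed.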
