Defaults
What every fresh DefenseClaw install ships with — three OPA policies (permissive / default / strict), three matching guardrail rule packs, the operator-config defaults, and how to pick the combination that fits your team's risk tolerance.
DefenseClaw ships with opinionated defaults that are immediately useful without tuning, and stay out of your way until you opt into stricter behaviour. This page documents what actually ships — grounded in the policy YAMLs in policies/, the schema in internal/config/config.go, and what defenseclaw setup guardrail actually writes.
The two layers you can swap
DefenseClaw separates admission policy from runtime guardrail rules. They ship as matching triples but are independent knobs.
Admission policy (OPA)
What happens when a skill / MCP / plugin gets installed or executed. Lives in policies/<name>.yaml, activated by defenseclaw policy activate <name>.
Guardrail rule pack
The regex patterns, LLM-judge prompts, and suppressions that gate prompts and completions in flight. Lives in policies/guardrail/<name>/, pointed at by guardrail.rule_pack_dir.
Both layers ship in three named profiles — default, strict, permissive — and you flip them independently:
```bash
defenseclaw policy activate default              # OPA layer
defenseclaw setup guardrail --rule-pack default  # Guardrail layer
```
defenseclaw policy activate strict does not change guardrail.rule_pack_dir, and vice versa. If you want strict everywhere, flip both. See docs/GUARDRAIL_RULE_PACKS.md for the rationale.
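Because neither layer follows the other, a small shell wrapper can keep them in step. The sketch below is only an illustration: set_profile and the DEFENSECLAW_BIN variable are hypothetical names (the variable exists so the function can be dry-run with echo), and the two commands inside are the ones shown above.

```bash
# Flip both layers to the same named profile.
# Hypothetical helper: set_profile and DEFENSECLAW_BIN are not part of DefenseClaw.
set_profile() {
  local profile="$1"
  local bin="${DEFENSECLAW_BIN:-defenseclaw}"

  # Only the three shipped profile names are accepted in this sketch.
  case "$profile" in
    default|strict|permissive) ;;
    *) echo "unknown profile: $profile" >&2; return 2 ;;
  esac

  # OPA layer first, then the guardrail layer; stop if the first step fails.
  "$bin" policy activate "$profile" || return 1
  "$bin" setup guardrail --rule-pack "$profile"
}
```

Running DEFENSECLAW_BIN=echo set_profile strict prints the two commands instead of executing them, which is a cheap way to review what would run.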
OPA admission policy — what each profile ships
The OPA policy file (policies/<name>.yaml) drives admission decisions: what happens to a finding by severity, whether the allow-list lets first-party assets bypass scanning, and what threshold a Cisco AI Defense verdict has to clear before it blocks.
| Knob | permissive | default | strict |
|---|---|---|---|
| admission.allow_list_bypass_scan | true | true | false |
| skill_actions.critical | quarantine + disable + block | quarantine + disable + block | quarantine + disable + block |
| skill_actions.high | none + enable + none | quarantine + disable + block | quarantine + disable + block |
| skill_actions.medium | none + enable + none | none + enable + none | quarantine + disable + block |
| scanner_overrides | empty | empty | per-scanner medium-severity blocks for mcp and plugin |
| guardrail.block_threshold | 4 (CRITICAL) | 4 (CRITICAL) | 2 (MEDIUM) |
| guardrail.alert_threshold | 3 (HIGH) | 2 (MEDIUM) | 1 (LOW) |
| guardrail.cisco_trust_level | advisory | full | full |
| guardrail.hilt.enabled | false | false | true |
| guardrail.hilt.min_severity | HIGH | HIGH | HIGH |
Severity ranks are the rego convention from policies/rego/guardrail.rego: 1 = LOW, 2 = MEDIUM, 3 = HIGH, 4 = CRITICAL. cisco_trust_level: advisory means even Cisco AI Defense's own verdicts are surfaced but never escalated to a block.
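To make the threshold rows concrete, here is a minimal sketch of how a severity could be compared against block_threshold and alert_threshold. It assumes a finding blocks when its rank is at or above the block threshold and alerts when it is at or above the alert threshold. The real decision lives in policies/rego/guardrail.rego and may weigh more inputs (cisco_trust_level, for one, is not modelled here). Both function names are hypothetical.

```bash
# Map a severity name to the rank convention quoted above.
severity_rank() {
  case "$1" in
    LOW) echo 1 ;;
    MEDIUM) echo 2 ;;
    HIGH) echo 3 ;;
    CRITICAL) echo 4 ;;
    *) echo "unknown severity: $1" >&2; return 2 ;;
  esac
}

# guardrail_verdict SEVERITY BLOCK_THRESHOLD ALERT_THRESHOLD
# Assumption: ">=" comparison against each threshold, block checked first.
guardrail_verdict() {
  local rank
  rank="$(severity_rank "$1")" || return 2
  if [ "$rank" -ge "$2" ]; then
    echo block
  elif [ "$rank" -ge "$3" ]; then
    echo alert
  else
    echo allow
  fi
}
```

Under these assumptions, a HIGH finding with the default thresholds (guardrail_verdict HIGH 4 2) yields alert, while a MEDIUM finding with the strict thresholds (guardrail_verdict MEDIUM 2 1) yields block.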
The columns are deliberately conservative. We'd rather you opt into stricter behaviour than have an upgrade silently start blocking your traffic.
Guardrail rule pack — what each profile ships
The rule pack directory (policies/guardrail/<name>/) holds the regex YAMLs, judge prompts, sensitive-tool definitions, and suppressions the in-flight scanner consumes.
| Pack | rules/ files | judge/ prompts | suppressions.yaml | sensitive-tools.yaml |
|---|---|---|---|---|
| permissive | injection, secrets, c2, commands, cognitive, trust-exploit, sensitive-paths, local-patterns, enterprise-data | injection, pii, tool-injection (less aggressive thresholds) | broad (git status, doc reads suppressed) | minimal |
| default | same nine families as permissive | injection, pii, tool-injection (medium thresholds) | moderate | balanced |
| strict | same nine + tighter regex | injection, pii, tool-injection (low thresholds) | none | aggressive |
Switching the rule pack does not enable the LLM judge — that's a separate guardrail.judge.enabled toggle in your operator config (default: false). Flipping the rule pack only changes which prompt YAMLs the judge will run if you've enabled it.
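If you copy a shipped pack to build your own, a quick layout check can catch a missing piece before the scanner does. The sketch below assumes the four table headers above (rules/, judge/, suppressions.yaml, sensitive-tools.yaml) mirror the on-disk names, which this page implies but does not spell out. check_rule_pack is a hypothetical helper, not a DefenseClaw command.

```bash
# check_rule_pack DIR
# Lists any of the four expected entries missing from a rule pack directory.
# Returns 0 when all are present, 1 otherwise.
check_rule_pack() {
  local dir="$1"
  local missing=0
  local entry
  for entry in rules judge suppressions.yaml sensitive-tools.yaml; do
    if [ ! -e "$dir/$entry" ]; then
      echo "missing: $entry"
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would be check_rule_pack ~/.defenseclaw/policies/guardrail/default, using the directory that guardrail.rule_pack_dir points at.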
What setup guardrail actually writes
A vanilla defenseclaw init && defenseclaw setup guardrail --connector openclaw --rule-pack default produces a config that looks like this — schema validated against internal/config/config.go:
```yaml
mode: action
guardrail:
  enabled: true
  mode: action
  rule_pack_dir: /Users/<you>/.defenseclaw/policies/guardrail/default
  hook_fail_mode: open
  judge:
    enabled: false             # opt in via --judge-model
  hilt:
    enabled: false             # opt in via --human-approval
    min_severity: HIGH
privacy:
  disable_redaction: false
audit_sinks: []                # JSONL fallback at ~/.defenseclaw/gateway.jsonl always writes
webhooks: []                   # add via `defenseclaw setup webhook add ...`
claude_code:
  enabled: false               # toggled when you pick claude-code in setup guardrail
codex:
  enabled: false
```
Three things to notice that contradict folklore:
- The LLM judge is OFF by default. It only flips on if you pass --judge-model to setup guardrail or answer "yes" to the interactive judge prompt. The viper default is guardrail.judge.enabled = false (internal/config/config.go:2162). Keeping it off keeps cost predictable; flip it on once you have a DEFENSECLAW_LLM_KEY configured.
- HILT is OFF by default. The shipped severity floor is HIGH, but enabled: false means it never prompts. --human-approval flips it on; --hilt-min-severity adjusts the floor.
- Only the JSONL fallback writes. audit_sinks: [] means no Splunk, no OTLP. The gateway still tails everything to ~/.defenseclaw/gateway.jsonl for defenseclaw alerts and the TUI. Wire external sinks via setup splunk or setup local-observability.
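To confirm these three defaults against your own file without reading the whole thing, a dotted-key lookup is enough. The helper below is a rough sketch, not a YAML parser: yaml_get is a hypothetical name, and it only understands the simple subset used on this page (two-space indentation, key: value lines, trailing comments). defenseclaw config show remains the authoritative view.

```bash
# yaml_get FILE DOTTED.KEY
# Prints the scalar value at a dotted key path; exits 1 if the key is absent.
yaml_get() {
  awk -v want="$2" '
    /^[ \t]*#/ { next }                      # skip comment-only lines
    /^[ \t]*$/ { next }                      # skip blank lines
    {
      line = $0
      sub(/[ \t]+#.*$/, "", line)            # drop trailing comments
      indent = line
      sub(/[^ ].*$/, "", indent)             # keep only the leading spaces
      depth = length(indent) / 2             # assumes two spaces per level
      sub(/^ +/, "", line)
      key = line; sub(/:.*$/, "", key)
      val = line; sub(/^[^:]*:[ \t]*/, "", val)
      path[depth] = key
      full = path[0]
      for (i = 1; i <= depth; i++) full = full "." path[i]
      if (full == want) { print val; found = 1; exit }
    }
    END { exit (found ? 0 : 1) }
  ' "$1"
}
```

For example, yaml_get ~/.defenseclaw/config.yaml guardrail.judge.enabled prints the judge toggle, and the same call with guardrail.hilt.enabled or audit_sinks covers the other two bullets.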
Tuning by risk tolerance
You usually don't need a custom policy or rule pack — just a few knob changes.
"I'm in pilot, just observe"
```bash
defenseclaw policy activate permissive
defenseclaw setup guardrail --rule-pack permissive
```
Nothing below CRITICAL blocks (block threshold = CRITICAL, Cisco trust = advisory). Everything still flows to the audit log and JSONL so you can review what would have happened. Recommended for the first week of any rollout.
"Move fast, stop only the obvious harm"
```bash
defenseclaw policy activate default
defenseclaw setup guardrail --rule-pack default
defenseclaw setup guardrail --human-approval --hilt-min-severity high
```
Default rules; HILT prompts only on HIGH+. Most engineering teams in the early/middle phase land here.
"Regulated workload, lock it down"
```bash
defenseclaw policy activate strict
defenseclaw setup guardrail --rule-pack strict
defenseclaw setup guardrail --human-approval --hilt-min-severity low \
  --judge-model openai/gpt-4o-mini
```
Strict policy (block ≥ MEDIUM, no allow-list bypass), strict rule pack (tightest regex + suppressions empty), HILT on every LOW+ event, LLM judge enabled. Combine with the bundled OpenShell sandbox profile and an MCP allow-list (in policies/strict.yaml's first_party_allow_list).
"I trust the scanner, raise its bar specifically"
The asset-class behavior is independent of the rule pack. Edit the active policy YAML directly:
```yaml title="policies/default.yaml override (apply with defenseclaw policy activate default)"
scanner_overrides:
  mcp:
    medium:              # was none/enable/none
      file: quarantine
      runtime: disable
      install: block
```
Then re-activate so OPA picks up the change:
```bash
defenseclaw policy activate default
```
What defenseclaw init doesn't change
A few defaults are intentionally fixed unless you edit ~/.defenseclaw/config.yaml directly:
| Knob | Default | Why fixed |
|---|---|---|
| ~/.defenseclaw/gateway.jsonl (JSONL fallback path) | always written | Reliability fallback: the gateway must always have a writable place to log when external sinks fail |
| guardrail.hook_fail_mode | open | Conservative: a malformed hook response shouldn't take the agent down |
| guardrail.judge.timeout | 30s | Hot-path latency budget for the judge |
| guardrail.judge.adjudication_timeout | 5s | Per-prompt adjudication budget |
| guardrail.detection_strategy | regex_judge | Tested baseline: regex first, judge for medium+ findings |
| Bifrost retry policy | 3 attempts, exp backoff | Tested LLM-routing baseline |
If you need to change any of these, edit ~/.defenseclaw/config.yaml directly, then run defenseclaw config validate to confirm the file still matches the schema.
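Hand edits are easier to live with when a failed validation rolls itself back. The following is a sketch under two loudly stated assumptions: that defenseclaw config validate checks the default config path and exits non-zero when the schema check fails (this page does not document its exit codes), and that edit_config_checked and DEFENSECLAW_BIN are hypothetical names introduced only for this example.

```bash
# edit_config_checked [CONFIG_PATH]
# Back up the config, open it in $EDITOR, then validate; restore the backup
# if validation fails. Hypothetical helper, not a DefenseClaw command.
edit_config_checked() {
  local cfg="${1:-$HOME/.defenseclaw/config.yaml}"
  local bin="${DEFENSECLAW_BIN:-defenseclaw}"
  local backup="$cfg.bak.$$"

  cp "$cfg" "$backup" || return 1
  "${EDITOR:-vi}" "$cfg"

  # Assumption: "config validate" exits non-zero when the schema check fails.
  if "$bin" config validate; then
    rm -f "$backup"
  else
    echo "validation failed; restoring $cfg" >&2
    mv "$backup" "$cfg"
    return 1
  fi
}
```

Called with no argument, it edits ~/.defenseclaw/config.yaml, so the file being edited and the file being validated are the same one.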
Per-connector overrides
You can override per-connector in ~/.defenseclaw/config.yaml for the small number of cases where one agent needs different behaviour:
```yaml
claude_code:
  enabled: true
  mode: action
  fail_mode: open      # LEGACY hint, not consumed by hooks; see Reference → Fail modes
codex:
  enabled: true
  mode: observe        # softer for Codex than for Claude Code
```
The connector overlay is shallow-merged on top of the global config, so you only need to specify what changes. (As of this writing, the OPA policy is global; there's no per-connector policy override surface yet.)
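For readers unfamiliar with the term, a shallow merge copies the global keys and then lets each overlay key replace its global counterpart wholesale. The sketch below models that on flat key=value lines rather than real YAML. shallow_merge is a hypothetical helper, and the key names in the usage note are placeholders borrowed from the YAML above, not a statement of which global keys a connector may override.

```bash
# shallow_merge GLOBAL_FILE OVERLAY_FILE
# Each file holds flat "key=value" lines standing in for top-level config keys.
# Overlay keys replace global keys; keys the overlay omits are inherited.
shallow_merge() {
  awk -F= '
    /^[ \t]*$/ { next }                         # ignore blank lines
    {
      key = $1
      if (!(key in value)) order[++n] = key     # remember first-seen order
      value[key] = substr($0, length(key) + 2)  # later files overwrite earlier ones
    }
    END { for (i = 1; i <= n; i++) print order[i] "=" value[order[i]] }
  ' "$1" "$2"
}
```

Given a global file containing mode=action and enabled=false and an overlay containing only mode=observe, the merged result keeps enabled=false and switches mode to observe. Flat keys cannot show the other half of "shallow": in the usual sense of the term, a nested map supplied by the overlay replaces the whole global map instead of being merged key by key.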
Inspect the active defaults
```bash
defenseclaw config show          # rendered ~/.defenseclaw/config.yaml (secrets masked)
defenseclaw policy list          # all policies on disk + which is active
defenseclaw policy show <name>   # full content of one policy
```
config show always renders the resolved configuration (base + env-var overlay), so you can see the effective values without spelunking. Use --reveal to also show resolved secret values (still masked in the output for safety).
policy show <name> prints the policy YAML for the named file (default, strict, permissive, or any custom policies/<name>.yaml you've added). There's no built-in "dump a single rule by id" — for that, grep the rule pack directly:
```bash
grep -rn "rule_id_you_care_about" "$(awk '/rule_pack_dir/ {print $2}' ~/.defenseclaw/config.yaml)"
```
Reset to defaults
There's no --reset flag. Two real paths exist:
Soft reset (most common): just re-run setup with the defaults you want. setup guardrail overwrites the relevant guardrail.* keys idempotently:
```bash
defenseclaw setup guardrail --rule-pack default --no-human-approval
defenseclaw policy activate default
```
Hard reset (start from zero): defenseclaw uninstall archives ~/.defenseclaw/ to a timestamped backup, so you can roll back:
```bash
defenseclaw uninstall
defenseclaw init
defenseclaw setup guardrail
```
See also
- Policies: the layered architecture (regex → judge → suppressions → OPA admission)
- Setup Guardrail: the CLI that consumes these defaults
- HITL: what guardrail.hilt.enabled and min_severity actually change for the operator
- Reference → Fail modes: the three "fail open vs closed" knobs disambiguated
- Reference → Configuration: every key surfaced here, with type and default
- docs/GUARDRAIL_RULE_PACKS.md: the canonical engineering doc on the OPA-vs-rule-pack split