Defaults
What every fresh DefenseClaw install ships with — three OPA policies (permissive / default / strict), three matching guardrail rule packs, the operator-config defaults, and how to pick the combination that fits your team's risk tolerance.
DefenseClaw ships with opinionated defaults that are immediately useful without tuning, and stay out of your way until you opt into stricter behaviour. This page documents what actually ships — grounded in the policy YAMLs in policies/, the schema in internal/config/config.go, and what defenseclaw setup guardrail actually writes.
The two layers you can swap
DefenseClaw separates admission policy from runtime guardrail rules. They ship as matching triples but are independent knobs.
Admission policy (OPA)
What happens when a skill / MCP / plugin gets installed or executed. Lives in policies/<name>.yaml, activated by defenseclaw policy activate <name>.
Guardrail rule pack
The regex patterns, LLM-judge prompts, and suppressions that gate prompts and completions in flight. Lives in policies/guardrail/<name>/, pointed at by guardrail.rule_pack_dir.
Both layers ship in three named profiles — default, strict, permissive — and you flip them independently:
defenseclaw policy activate default # OPA layer
defenseclaw setup guardrail --rule-pack default # Guardrail layer
defenseclaw policy activate strict does not change guardrail.rule_pack_dir, and vice versa. If you want strict everywhere, flip both. See docs/GUARDRAIL_RULE_PACKS.md for the rationale.
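Because the two layers are independent, it is easy to flip one and forget the other. The sketch below prints the pair of commands that move both layers to the same profile. The helper name profile_commands is hypothetical and not part of the DefenseClaw CLI; the two commands it prints are the ones shown above.

```bash
#!/usr/bin/env bash
# profile_commands is a hypothetical helper, not a DefenseClaw command.
# It prints (does not run) the commands that set both layers to one profile.
profile_commands() {
  local profile="$1"
  case "$profile" in
    default|strict|permissive) ;;                          # the three shipped profiles
    *) echo "unknown profile: $profile" >&2; return 1 ;;
  esac
  echo "defenseclaw policy activate $profile"              # OPA layer
  echo "defenseclaw setup guardrail --rule-pack $profile"  # Guardrail layer
}

profile_commands strict
```

Printing rather than executing keeps the helper safe to try; after reviewing the output, piping it to sh would apply both changes.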
OPA admission policy — what each profile ships
The OPA policy file (policies/<name>.yaml) drives admission decisions: what happens to a finding by severity, whether the allow-list lets first-party assets bypass scanning, and what threshold a Cisco AI Defense verdict has to clear before it blocks.
| Knob | permissive | default | strict |
|---|---|---|---|
| admission.allow_list_bypass_scan | true | true | false |
| skill_actions.critical | quarantine + disable + block | quarantine + disable + block | quarantine + disable + block |
| skill_actions.high | none + enable + none | quarantine + disable + block | quarantine + disable + block |
| skill_actions.medium | none + enable + none | none + enable + none | quarantine + disable + block |
| scanner_overrides | empty | MCP LOW/MEDIUM and plugin MEDIUM/HIGH overrides | MCP LOW/MEDIUM and plugin MEDIUM/HIGH overrides |
| guardrail.block_threshold | 4 (CRITICAL) | 4 (CRITICAL) | 2 (MEDIUM) |
| guardrail.alert_threshold | 3 (HIGH) | 2 (MEDIUM) | 1 (LOW) |
| guardrail.cisco_trust_level | advisory | full | full |
| guardrail.hilt.enabled | false (key omitted) | false | false (key omitted) |
| guardrail.hilt.min_severity | HIGH | HIGH | HIGH |
Severity ranks are the rego convention from policies/rego/guardrail.rego: 1 = LOW, 2 = MEDIUM, 3 = HIGH, 4 = CRITICAL. cisco_trust_level: advisory means even Cisco AI Defense's own verdicts are surfaced but never escalated to a block.
The columns are deliberately conservative. We'd rather you opt into stricter behaviour than have an upgrade silently start blocking your traffic.
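To make the threshold rows concrete, here is a minimal sketch of how a finding's severity rank could be compared against block_threshold and alert_threshold. It assumes a finding blocks or alerts when its rank is greater than or equal to the threshold (consistent with strict's "block ≥ MEDIUM"). The function names and the "allow" label are hypothetical, and the sketch ignores cisco_trust_level and observe mode.

```bash
#!/usr/bin/env bash
# Map a severity name to the numeric rank used by the thresholds.
severity_rank() {
  case "$1" in
    LOW) echo 1 ;;
    MEDIUM) echo 2 ;;
    HIGH) echo 3 ;;
    CRITICAL) echo 4 ;;
    *) echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

# classify SEVERITY BLOCK_THRESHOLD ALERT_THRESHOLD
# Assumption: rank >= threshold triggers the action; block wins over alert.
classify() {
  local rank
  rank="$(severity_rank "$1")" || return 1
  if [ "$rank" -ge "$2" ]; then
    echo block
  elif [ "$rank" -ge "$3" ]; then
    echo alert
  else
    echo allow   # hypothetical label for "below both thresholds"
  fi
}

# Thresholds copied from the table: name, block_threshold, alert_threshold.
for row in "permissive 4 3" "default 4 2" "strict 2 1"; do
  read -r name block alert <<< "$row"
  echo "$name: MEDIUM finding -> $(classify MEDIUM "$block" "$alert")"
done
```

Under these assumptions the same MEDIUM finding is ignored by permissive, alerted on by default, and blocked by strict, which is the practical difference between the three columns.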
Guardrail rule pack — what each profile ships
The rule pack directory (policies/guardrail/<name>/) holds the regex YAMLs, judge prompts, sensitive-tool definitions, and suppressions the in-flight scanner consumes.
| Pack | rules/ files | judge/ prompts | suppressions.yaml | sensitive-tools.yaml |
|---|---|---|---|---|
| permissive | c2, cognitive, commands, enterprise-data, local-patterns, secrets, sensitive-paths, trust-exploit | pii and tool-injection (higher judge thresholds); injection ships disabled | broadest; additionally suppresses all IP findings and selected file-inspection PII | same six sensitive tool definitions as the other packs |
| default | same eight families; balanced variants where profiles differ | injection, pii, tool-injection | private/loopback IPs, platform IDs, expected system metadata | same six sensitive tool definitions as the other packs |
| strict | same eight families; stricter variants where profiles differ | injection, pii, tool-injection (lower judge thresholds) | minimal structural suppressions; no tool suppressions | same six sensitive tool definitions as the other packs |
All three packs share the same severity rubric and the same signal_strength output schema — only the per-category thresholds and suppression scope differ between packs. Switching the rule pack does not enable the LLM judge — that's a separate guardrail.judge.enabled toggle in your operator config (default: false). Flipping the rule pack only changes which prompt YAMLs the judge will run if you've enabled it.
What setup guardrail actually writes
After defenseclaw init, an explicit defenseclaw setup guardrail --connector openclaw --mode action --rule-pack default --non-interactive produces the relevant configuration below (unrelated generated fields omitted):
claw:
  mode: openclaw
guardrail:
  enabled: true
  mode: action
  rule_pack_dir: /Users/<you>/.defenseclaw/policies/guardrail/default
  hook_fail_mode: closed
  judge:
    enabled: false # opt in via --judge-model
  hilt:
    enabled: false # opt in via --human-approval
    min_severity: HIGH
privacy:
  disable_redaction: false
audit_sinks: [] # no external audit sink; local SQLite + JSONL still write
webhooks: [] # add via `defenseclaw setup webhook add ...`
claude_code:
  enabled: false # toggled when you pick claude-code in setup guardrail
codex:
  enabled: false
Three things to notice that contradict folklore:
- The LLM judge is OFF by default. It only flips on if you pass --judge-model to setup guardrail or answer "yes" to the interactive judge prompt. The schema default is guardrail.judge.enabled = false in internal/config/config.go. Keeping it off keeps cost predictable; flip it on once you have a DEFENSECLAW_LLM_KEY configured.
- HILT is OFF by default. The shipped severity floor is HIGH, but enabled: false means it never prompts. --human-approval flips it on; --hilt-min-severity adjusts the floor.
- The built-in local stores still write. audit_sinks: [] means no external audit-event destination. SQLite audit_events powers defenseclaw alerts, the TUI, and audit export; ~/.defenseclaw/gateway.jsonl remains the live structured runtime log. Wire external sinks via setup splunk or setup local-observability.
Tuning by risk tolerance
You usually don't need a custom policy or rule pack — just a few knob changes.
"I'm in pilot, just observe"
defenseclaw policy activate permissive
defenseclaw setup guardrail \
--connector openclaw \
--mode observe \
--rule-pack permissive \
--restart \
--non-interactive
Observe mode prevents enforcement even though the permissive policy's action-mode block threshold remains CRITICAL. Everything still flows to the audit log and JSONL so you can review what would have happened. Recommended for the first week of any rollout.
"Move fast, stop only the obvious harm"
defenseclaw policy activate default
defenseclaw setup guardrail \
--connector openclaw \
--mode action \
--rule-pack default \
--human-approval \
--hilt-min-severity high \
--restart \
--non-interactive
Default rules; CRITICAL blocks and HIGH can prompt on OpenClaw's native ask surface. Most engineering teams in the early/middle phase land here.
"Regulated workload, lock it down"
defenseclaw policy activate strict
defenseclaw setup guardrail \
--connector openclaw \
--mode action \
--rule-pack strict \
--detection-strategy regex_judge \
--judge-model openai/gpt-4o-mini \
--restart \
--non-interactive
Strict policy (block ≥ MEDIUM, no allow-list bypass), strict rule pack (stricter profile variants and minimal suppressions), and the LLM judge enabled. Because the strict block threshold runs before HITL, MEDIUM-and-higher findings block rather than prompt. Combine this with the bundled OpenShell sandbox profile and a reviewed first-party allow-list.
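The interaction between the block threshold and human approval can be sketched as a simple ordering: the block check runs first, and a prompt is only possible for findings that did not block. This is an illustration under that assumption, not the gateway's actual code; the function name decide and the "pass" label are hypothetical. Ranks follow 1 = LOW through 4 = CRITICAL.

```bash
#!/usr/bin/env bash
# decide RANK BLOCK_THRESHOLD HITL_ENABLED HITL_MIN_RANK
# Hypothetical helper illustrating "block threshold runs before HITL".
decide() {
  local rank="$1" block="$2" hitl_enabled="$3" hitl_min="$4"
  if [ "$rank" -ge "$block" ]; then
    echo block     # block threshold is checked first
  elif [ "$hitl_enabled" = "true" ] && [ "$rank" -ge "$hitl_min" ]; then
    echo prompt    # human approval only applies to findings that did not block
  else
    echo pass      # hypothetical label for "neither blocked nor prompted"
  fi
}

# A HIGH finding (rank 3) with human approval on and a HIGH floor (rank 3):
echo "default (block at 4): $(decide 3 4 true 3)"
echo "strict  (block at 2): $(decide 3 2 true 3)"
```

With the default policy the HIGH finding reaches the approval prompt; with the strict policy it is blocked before approval is considered, so enabling human approval alongside strict adds little for MEDIUM-and-higher findings.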
"I trust the scanner, raise its bar specifically"
The asset-class behavior is independent of the rule pack. Edit the active policy YAML directly:
```yaml title="policies/default.yaml override (apply with defenseclaw policy activate default)"
scanner_overrides:
  mcp:
    medium: # was none/enable/none
      file: quarantine
      runtime: disable
      install: block
```
Then re-activate so OPA picks up the change:
```bash
defenseclaw policy activate default
```
What defenseclaw init doesn't change
A few defaults are intentionally fixed unless you edit ~/.defenseclaw/config.yaml directly:
| Knob | Default | Why fixed |
|---|---|---|
| ~/.defenseclaw/gateway.jsonl (JSONL fallback path) | always written | Reliability fallback: the gateway must always have a writable place to log when external sinks fail |
| guardrail.hook_fail_mode | closed on new installs | Malformed/unauthorized hook responses fail closed; upgrades from the pre-change default are migrated to open for compatibility |
| guardrail.judge.timeout | 30s | Hot-path latency budget for the judge |
| guardrail.judge.adjudication_timeout | 5s | Per-prompt adjudication budget |
| guardrail.detection_strategy | regex_judge | Tested baseline: regex first, judge for medium+ findings |
| Bifrost retry policy | 3 attempts, exp backoff | Tested LLM-routing baseline |
If you need to change any of these, edit ~/.defenseclaw/config.yaml directly, then run defenseclaw config validate to confirm the schema.
Per-connector overrides (guardrail.connectors)
When you run more than one hook connector from a single gateway, override guardrail policy per connector under guardrail.connectors.<name> in ~/.defenseclaw/config.yaml. Every field is optional and inherits the global guardrail.* value when unset, so a connector block only carries what differs:
guardrail:
  mode: action # global default
  hook_fail_mode: closed
  connectors:
    claudecode:
      mode: action # enforce for Claude Code
    codex:
      mode: observe # softer for Codex than for Claude Code
      hook_fail_mode: open # explicit softer override for Codex
claw.mode flips to multi automatically once more than one connector is active. Manage these blocks with defenseclaw setup <connector> (choosing Add) and the defenseclaw guardrail ... --connector X command group; see Setup → Multi-connector and Reference → Configuration. The OPA admission policy is still global: there's no per-connector policy override surface yet.
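The inheritance rule (a connector field falls back to the global guardrail.* value when unset) can be illustrated with shell parameter expansion. This is only a sketch of the rule as described here, not how the gateway resolves configuration; effective_value is a hypothetical helper and the values mirror the example config.

```bash
#!/usr/bin/env bash
# effective_value GLOBAL [OVERRIDE]
# Hypothetical helper: prints the connector override when it is set and
# non-empty, otherwise falls back to the global value.
effective_value() {
  local global="$1" override="${2:-}"
  echo "${override:-$global}"
}

# Global guardrail.* values from the example config.
global_mode=action
global_fail_mode=closed

# claudecode sets only mode; codex sets both fields.
echo "claudecode mode:           $(effective_value "$global_mode" action)"
echo "claudecode hook_fail_mode: $(effective_value "$global_fail_mode")"
echo "codex mode:                $(effective_value "$global_mode" observe)"
echo "codex hook_fail_mode:      $(effective_value "$global_fail_mode" open)"
```

In this example claudecode ends up with hook_fail_mode closed because it inherits the global value, while codex resolves to observe and open from its own block.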
Legacy top-level connector blocks are deprecated
Older installs used top-level claude_code: / codex: blocks (the AgentHookConfig fields) for per-connector overrides:
claude_code:
  enabled: true
  mode: action
  fail_mode: open # LEGACY hint, NOT consumed by hooks; see Reference → Fail modes
These are still parsed for backward compatibility, but fail_mode here does nothing at runtime (see Reference → Fail modes). Prefer guardrail.connectors.<name> for new configuration: it's the surface the per-connector CLI writes and the gateway resolves at request time.
Inspect the active defaults
defenseclaw config show # rendered ~/.defenseclaw/config.yaml (secrets masked)
defenseclaw policy list # all policies on disk + which is active
defenseclaw policy show default # normalized summary of one named policy
config show always renders the resolved configuration (base + env-var overlay) so you can see the effective values without spelunking. Use --reveal to also show resolved secret values (still masked in the output for safety).
policy show <name> prints a normalized summary of the named file (default,
strict, permissive, or any custom policies/<name>.yaml you've added).
It does not dump the source YAML or individual guardrail rules. To find a rule
by id, search the configured rule-pack directory directly:
grep -rn "rule_id_you_care_about" "$(awk '/rule_pack_dir/ {print $2}' ~/.defenseclaw/config.yaml)"
Reset to defaults
There's no --reset flag. Two real paths exist:
Soft reset (most common) — just re-run setup with the defaults you want. setup guardrail overwrites the relevant guardrail.* keys idempotently:
defenseclaw setup guardrail --rule-pack default --no-human-approval
defenseclaw policy activate default
Hard reset (start from zero): defenseclaw uninstall archives ~/.defenseclaw/ to a timestamped backup, so you can roll back:
defenseclaw uninstall
defenseclaw init
defenseclaw setup guardrail
See also
- Policies: the layered architecture (regex → judge → suppressions → OPA admission)
- Setup Guardrail: the CLI that consumes these defaults
- HITL: what guardrail.hilt.enabled and min_severity actually change for the operator
- Reference → Fail modes: the three "fail open vs closed" knobs disambiguated
- Reference → Configuration: every key surfaced here, with type and default
- docs/GUARDRAIL_RULE_PACKS.md: the canonical engineering doc on the OPA-vs-rule-pack split