Defaults

What every fresh DefenseClaw install ships with — three OPA policies (permissive / default / strict), three matching guardrail rule packs, the operator-config defaults, and how to pick the combination that fits your team's risk tolerance.

DefenseClaw ships with opinionated defaults that are immediately useful without tuning, and stay out of your way until you opt into stricter behaviour. This page documents what actually ships — grounded in the policy YAMLs in policies/, the schema in internal/config/config.go, and what defenseclaw setup guardrail actually writes.

The two layers you can swap

DefenseClaw separates admission policy from runtime guardrail rules. They ship as matching triples but are independent knobs.

Admission policy (OPA)

What happens when a skill / MCP / plugin gets installed or executed. Lives in policies/<name>.yaml, activated by defenseclaw policy activate <name>.

Guardrail rule pack

The regex patterns, LLM-judge prompts, and suppressions that gate prompts and completions in flight. Lives in policies/guardrail/<name>/, pointed at by guardrail.rule_pack_dir.

Both layers ship in three named profiles — default, strict, permissive — and you flip them independently:

```bash
defenseclaw policy activate default               # OPA layer
defenseclaw setup guardrail --rule-pack default   # Guardrail layer
```

defenseclaw policy activate strict does not change guardrail.rule_pack_dir, and vice versa. If you want strict everywhere, flip both. See docs/GUARDRAIL_RULE_PACKS.md for the rationale.
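Because the two layers move independently, a mixed setup is easy to end up with unnoticed. The following is a minimal sketch of a drift check, not part of DefenseClaw: the function name is hypothetical, the active policy name is passed in by hand (for example after reading `defenseclaw policy list`), and the rule pack name is assumed to be the last path component of guardrail.rule_pack_dir, as in the shipped layout.

```python
from pathlib import PurePosixPath


def layers_match(active_policy: str, rule_pack_dir: str) -> bool:
    """Return True when the OPA policy and the guardrail rule pack share a profile name.

    Assumes the rule pack name is the final component of rule_pack_dir,
    e.g. ~/.defenseclaw/policies/guardrail/strict -> "strict".
    """
    rule_pack = PurePosixPath(rule_pack_dir.rstrip("/")).name
    return active_policy == rule_pack


# Strict policy with the default rule pack: a valid but mixed setup.
print(layers_match("strict", "/Users/me/.defenseclaw/policies/guardrail/default"))
# Strict on both layers.
print(layers_match("strict", "/Users/me/.defenseclaw/policies/guardrail/strict"))
```

A mismatch is not an error, since the layers are designed to be flipped separately; the check only makes an unintended mix visible.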

OPA admission policy — what each profile ships

The OPA policy file (policies/<name>.yaml) drives admission decisions: what happens to a finding by severity, whether the allow-list lets first-party assets bypass scanning, and what threshold a Cisco AI Defense verdict has to clear before it blocks.

| Knob | permissive | default | strict |
|---|---|---|---|
| admission.allow_list_bypass_scan | true | true | false |
| skill_actions.critical | quarantine + disable + block | quarantine + disable + block | quarantine + disable + block |
| skill_actions.high | none + enable + none | quarantine + disable + block | quarantine + disable + block |
| skill_actions.medium | none + enable + none | none + enable + none | quarantine + disable + block |
| scanner_overrides | empty | empty | per-scanner medium-severity blocks for mcp and plugin |
| guardrail.block_threshold | 4 (CRITICAL) | 4 (CRITICAL) | 2 (MEDIUM) |
| guardrail.alert_threshold | 3 (HIGH) | 2 (MEDIUM) | 1 (LOW) |
| guardrail.cisco_trust_level | advisory | full | full |
| guardrail.hilt.enabled | false | false | true |
| guardrail.hilt.min_severity | HIGH | HIGH | HIGH |

Severity ranks are the rego convention from policies/rego/guardrail.rego: 1 = LOW, 2 = MEDIUM, 3 = HIGH, 4 = CRITICAL. cisco_trust_level: advisory means even Cisco AI Defense's own verdicts are surfaced but never escalated to a block.
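To make the threshold rows concrete, here is a small illustrative model of how the ranks and thresholds could combine. It is a sketch under assumptions, not the shipped logic (that lives in policies/rego/guardrail.rego): it assumes a finding blocks when its rank is at or above block_threshold and alerts when it is at or above alert_threshold, and it ignores cisco_trust_level and HILT entirely. The function name is hypothetical.

```python
# Severity ranks as documented for policies/rego/guardrail.rego.
SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}

# Threshold values copied from the table above.
PROFILES = {
    "permissive": {"block_threshold": 4, "alert_threshold": 3},
    "default": {"block_threshold": 4, "alert_threshold": 2},
    "strict": {"block_threshold": 2, "alert_threshold": 1},
}


def guardrail_decision(profile: str, severity: str) -> str:
    """Illustrative only: block at or above block_threshold, alert at or above
    alert_threshold, otherwise allow. The comparison operators are an assumption."""
    rank = SEVERITY_RANK[severity]
    thresholds = PROFILES[profile]
    if rank >= thresholds["block_threshold"]:
        return "block"
    if rank >= thresholds["alert_threshold"]:
        return "alert"
    return "allow"


# Print the decision for every severity under each shipped profile.
for name in PROFILES:
    print(name, {sev: guardrail_decision(name, sev) for sev in SEVERITY_RANK})
```

Under these assumptions a MEDIUM finding is allowed by permissive, raises an alert under default, and blocks under strict, which is the gradient the three columns are meant to express.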

The shipped values are deliberately conservative. We'd rather you opt into stricter behaviour than have an upgrade silently start blocking your traffic.
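The skill_actions and scanner_overrides rows in the table above can be read as a per-severity lookup that yields a (file, runtime, install) triple. The sketch below models that lookup. It assumes a scanner-specific override wins over the profile-wide row, which is a guess about resolution order rather than something this page states, and `resolve_action` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple

Action = Tuple[str, str, str]  # (file, runtime, install)

BLOCK: Action = ("quarantine", "disable", "block")
ALLOW: Action = ("none", "enable", "none")

# skill_actions rows from the table above. LOW is not listed there, so it is omitted.
SKILL_ACTIONS: Dict[str, Dict[str, Action]] = {
    "permissive": {"critical": BLOCK, "high": ALLOW, "medium": ALLOW},
    "default": {"critical": BLOCK, "high": BLOCK, "medium": ALLOW},
    "strict": {"critical": BLOCK, "high": BLOCK, "medium": BLOCK},
}

Overrides = Dict[str, Dict[str, Action]]  # scanner -> severity -> action


def resolve_action(
    profile: str,
    severity: str,
    scanner: Optional[str] = None,
    overrides: Optional[Overrides] = None,
) -> Action:
    """Assumption: a scanner-specific override takes precedence over the profile row."""
    if scanner and overrides:
        scoped = overrides.get(scanner, {})
        if severity in scoped:
            return scoped[severity]
    return SKILL_ACTIONS[profile][severity]


# Default profile with a medium-severity override for the mcp scanner only.
overrides: Overrides = {"mcp": {"medium": BLOCK}}
print(resolve_action("default", "medium", scanner="mcp", overrides=overrides))
print(resolve_action("default", "medium", scanner="plugin", overrides=overrides))
```

With that override in place, a medium finding from the mcp scanner resolves to quarantine, disable, block, while the same severity from the plugin scanner keeps the default none, enable, none.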

Guardrail rule pack — what each profile ships

The rule pack directory (policies/guardrail/<name>/) holds the regex YAMLs, judge prompts, sensitive-tool definitions, and suppressions the in-flight scanner consumes.

| Pack | rules/ files | judge/ prompts | suppressions.yaml | sensitive-tools.yaml |
|---|---|---|---|---|
| permissive | injection, secrets, c2, commands, cognitive, trust-exploit, sensitive-paths, local-patterns, enterprise-data | injection, pii, tool-injection (less aggressive thresholds) | broad: git status, doc reads suppressed | minimal |
| default | same nine families as permissive | injection, pii, tool-injection (medium thresholds) | moderate | balanced |
| strict | same nine + tighter regex | injection, pii, tool-injection (low thresholds) | none | aggressive |

Switching the rule pack does not enable the LLM judge — that's a separate guardrail.judge.enabled toggle in your operator config (default: false). Flipping the rule pack only changes which prompt YAMLs the judge will run if you've enabled it.
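The relationship between the two settings can be sketched as follows. This is an illustration, not DefenseClaw code: it assumes judge prompts are YAML files directly under the rule pack's judge/ directory, and the function name and the file names used in the demo are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def judge_prompts_to_run(judge_enabled: bool, rule_pack_dir: Path) -> List[str]:
    """Return prompt file names under <rule_pack_dir>/judge, or [] when the judge is off.

    Assumes prompts are *.yaml files directly inside the judge/ subdirectory.
    """
    if not judge_enabled:
        return []
    return sorted(p.name for p in (rule_pack_dir / "judge").glob("*.yaml"))


# Build a throwaway rule pack with hypothetical file names to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    pack = Path(tmp) / "strict"
    (pack / "judge").mkdir(parents=True)
    for name in ("injection.yaml", "pii.yaml", "tool-injection.yaml"):
        (pack / "judge" / name).write_text("# placeholder\n", encoding="utf-8")
    print(judge_prompts_to_run(False, pack))  # judge off: the rule pack choice is irrelevant
    print(judge_prompts_to_run(True, pack))   # judge on: the rule pack decides which prompts
```

The toggle decides whether anything runs at all; the rule pack only decides what runs once it does.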

What setup guardrail actually writes

A vanilla defenseclaw init && defenseclaw setup guardrail --connector openclaw --rule-pack default produces a config that looks like this — schema validated against internal/config/config.go:

```yaml title="~/.defenseclaw/config.yaml (after init + setup guardrail --rule-pack default)"
mode: action

guardrail:
  enabled: true
  mode: action
  rule_pack_dir: /Users/<you>/.defenseclaw/policies/guardrail/default
  hook_fail_mode: open
  judge:
    enabled: false                    # opt in via --judge-model
  hilt:
    enabled: false                    # opt in via --human-approval
    min_severity: HIGH

privacy:
  disable_redaction: false

audit_sinks: []                        # JSONL fallback at ~/.defenseclaw/gateway.jsonl always writes
webhooks: []                           # add via `defenseclaw setup webhook add ...`

claude_code:
  enabled: false                       # toggled when you pick claude-code in setup guardrail
codex:
  enabled: false
```

Three things to notice that contradict folklore:

  • The LLM judge is OFF by default. It only flips on if you pass --judge-model to setup guardrail or answer "yes" to the interactive judge prompt. The viper default is guardrail.judge.enabled = false (internal/config/config.go:2162). Keeping it off keeps cost predictable; flip it on once you have a DEFENSECLAW_LLM_KEY configured.
  • HILT is OFF by default. The shipped severity floor is HIGH, but enabled: false means it never prompts. --human-approval flips it on; --hilt-min-severity adjusts the floor.
  • Only the JSONL fallback writes. audit_sinks: [] means no Splunk, no OTLP — the gateway still tails everything to ~/.defenseclaw/gateway.jsonl for defenseclaw alerts and the TUI. Wire external sinks via setup splunk or setup local-observability.
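Since the JSONL fallback is always written, it is a convenient place to look when no external sink is wired up. The sketch below tallies events in that file by an arbitrary key. The event schema is not documented on this page, so the `severity` field used in the example is a hypothetical name; inspect a line of your own file before relying on it.

```python
import json
from collections import Counter
from pathlib import Path


def tally_events(jsonl_path: Path, key: str) -> Counter:
    """Count events in a JSONL file by the value of `key`.

    Blank lines, malformed lines, and non-object lines are skipped.
    """
    counts: Counter = Counter()
    with jsonl_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # e.g. a partially written last line
            if isinstance(event, dict):
                counts[str(event.get(key, "<missing>"))] += 1
    return counts


if __name__ == "__main__":
    path = Path.home() / ".defenseclaw" / "gateway.jsonl"
    if path.exists():
        # "severity" is a hypothetical field name, used here only for illustration.
        print(tally_events(path, "severity"))
```

For day-to-day review, defenseclaw alerts and the TUI read the same file; a script like this is only useful for ad hoc counting.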

Tuning by risk tolerance

You usually don't need a custom policy or rule pack — just a few knob changes.

"I'm in pilot, just observe"

```bash
defenseclaw policy activate permissive
defenseclaw setup guardrail --rule-pack permissive
```

Almost nothing blocks: only CRITICAL findings reach the block threshold, and Cisco AI Defense verdicts are advisory. Everything still flows to the audit log and JSONL so you can review what would have happened. Recommended for the first week of any rollout.

"Move fast, stop only the obvious harm"

```bash
defenseclaw policy activate default
defenseclaw setup guardrail --rule-pack default
defenseclaw setup guardrail --human-approval --hilt-min-severity high
```

Default rules; HILT prompts only on HIGH+. Most engineering teams in the early/middle phase land here.
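The effect of the HILT floor can be modelled in a few lines. This is a sketch under an assumption, namely that a human is prompted when HILT is enabled and the event's severity rank is at or above guardrail.hilt.min_severity; the function name is hypothetical and the real decision is made by the gateway.

```python
SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


def hilt_prompts(enabled: bool, min_severity: str, event_severity: str) -> bool:
    """Assumption: prompt when HILT is enabled and the event is at or above the floor."""
    if not enabled:
        return False
    return SEVERITY_RANK[event_severity.upper()] >= SEVERITY_RANK[min_severity.upper()]


# Shipped default: floor is HIGH but HILT is disabled, so nothing prompts.
print(hilt_prompts(False, "HIGH", "CRITICAL"))
# After --human-approval --hilt-min-severity high: only HIGH and CRITICAL prompt.
print(hilt_prompts(True, "high", "MEDIUM"))
print(hilt_prompts(True, "high", "HIGH"))
```

Lowering the floor to low, as in the regulated profile further down, makes every ranked event a candidate for a prompt under the same rule.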

"Regulated workload, lock it down"

```bash
defenseclaw policy activate strict
defenseclaw setup guardrail --rule-pack strict
defenseclaw setup guardrail --human-approval --hilt-min-severity low \
  --judge-model openai/gpt-4o-mini
```

Strict policy (block ≥ MEDIUM, no allow-list bypass), strict rule pack (tightest regex + suppressions empty), HILT on every LOW+ event, LLM judge enabled. Combine with the bundled OpenShell sandbox profile and an MCP allow-list (in policies/strict.yaml's first_party_allow_list).

"I trust the scanner, raise its bar specifically"

The asset-class behavior is independent of the rule pack. Edit the active policy YAML directly:

```yaml title="policies/default.yaml override (apply with defenseclaw policy activate default)"
scanner_overrides:
  mcp:
    medium:                # was none/enable/none
      file: quarantine
      runtime: disable
      install: block
```

Then re-activate so OPA picks up the change:

```bash
defenseclaw policy activate default
```

What defenseclaw init doesn't change

A few defaults are intentionally fixed unless you edit ~/.defenseclaw/config.yaml directly:

| Knob | Default | Why fixed |
|---|---|---|
| ~/.defenseclaw/gateway.jsonl (JSONL fallback path) | always written | Reliability fallback: the gateway must always have a writable place to log when external sinks fail |
| guardrail.hook_fail_mode | open | Conservative: a malformed hook response shouldn't take the agent down |
| guardrail.judge.timeout | 30s | Hot-path latency budget for the judge |
| guardrail.judge.adjudication_timeout | 5s | Per-prompt adjudication budget |
| guardrail.detection_strategy | regex_judge | Tested baseline: regex first, judge for medium+ findings |
| Bifrost retry policy | 3 attempts, exp backoff | Tested LLM-routing baseline |

If you need to change any of these, edit ~/.defenseclaw/config.yaml directly, then run defenseclaw config validate to confirm the file still matches the schema.

Per-connector overrides

You can override per-connector in ~/.defenseclaw/config.yaml for the small number of cases where one agent needs different behaviour:

```yaml
claude_code:
  enabled: true
  mode: action
  fail_mode: open                # LEGACY hint, not consumed by hooks; see Reference → Fail modes

codex:
  enabled: true
  mode: observe                  # softer for Codex than for Claude Code
```

The connector overlay is shallow-merged on top of the global config, so you only need to specify what changes. (As of this writing, the OPA policy is global — there's no per-connector policy override surface yet.)
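A shallow merge has one consequence worth spelling out: a top-level key in the overlay replaces the global value wholesale, and nested mappings are not merged key by key. The sketch below illustrates this with plain dictionaries. Which global keys actually participate in the merge is not stated here, so the keys in the example (in particular a nested `judge` mapping inside an overlay) are hypothetical and chosen only to show the mechanics.

```python
from typing import Any, Dict


def effective_connector_config(
    global_cfg: Dict[str, Any], overlay: Dict[str, Any]
) -> Dict[str, Any]:
    """Shallow merge: overlay keys replace global keys at the top level only."""
    return {**global_cfg, **overlay}


# Hypothetical global values, used only to demonstrate the merge.
global_cfg = {"mode": "action", "judge": {"enabled": False, "timeout": "30s"}}

# Only the keys that differ need to appear in the overlay.
codex_overlay = {"enabled": True, "mode": "observe"}
print(effective_connector_config(global_cfg, codex_overlay))

# A nested mapping in the overlay replaces the whole global mapping:
# "timeout" is gone from the result because the merge is not recursive.
nested_overlay = {"judge": {"enabled": True}}
print(effective_connector_config(global_cfg, nested_overlay))
```

If you override a nested block for a connector, restate every key in that block that you still want to apply.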

Inspect the active defaults

```bash
defenseclaw config show                  # rendered ~/.defenseclaw/config.yaml (secrets masked)
defenseclaw policy list                  # all policies on disk + which is active
defenseclaw policy show <name>           # full content of one policy
```

config show always renders the resolved configuration — base + env-var overlay — so you can see the effective values without spelunking. Use --reveal to also show resolved secret values (still masked in the output for safety).

policy show <name> prints the policy YAML for the named file (default, strict, permissive, or any custom policies/<name>.yaml you've added). There's no built-in "dump a single rule by id" — for that, grep the rule pack directly:

```bash
grep -rn "rule_id_you_care_about" "$(awk '/rule_pack_dir/ {print $2}' ~/.defenseclaw/config.yaml)"
```

Reset to defaults

There's no --reset flag. Two real paths exist:

Soft reset (most common) — just re-run setup with the defaults you want. setup guardrail overwrites the relevant guardrail.* keys idempotently:

```bash
defenseclaw setup guardrail --rule-pack default --no-human-approval
defenseclaw policy activate default
```

Hard reset (start from zero): defenseclaw uninstall archives ~/.defenseclaw/ to a timestamped backup, so you can roll back:

```bash
defenseclaw uninstall
defenseclaw init
defenseclaw setup guardrail
```

See also