defenseclaw setup guardrail

The central command. Routes LLM traffic through the Go guardrail proxy, configures observe vs action mode, picks the connector, scanner, rule pack, judge, and HITL behaviour, then restarts the gateway.

defenseclaw setup guardrail is the operator command. It picks the connector, picks the mode, points the scanners at the right backend, optionally enables the LLM judge, and configures human-in-the-loop. Every other setup verb in DefenseClaw is a thin wrapper around this one.

Run it interactively the first time: the wizard explains each choice and picks safe defaults. Once you are happy with the configuration, re-run with explicit flags (or use --non-interactive) for unattended setups and CI.

Watch the interactive flow

[Animation: a replay of a real defenseclaw setup guardrail session. The prompts appear in this order: connector pick, integration mode, observe vs action, hook fail-mode, scanner backend, judge, advanced options, and the final summary. Defaults are shown in [brackets]; pressing Enter accepts the default.]

Same setup, no prompts

Every choice the wizard makes has a flag (with a couple of documented exceptions, below). The CI-friendly equivalent of the demo above is one command:

defenseclaw setup guardrail \
  --non-interactive \
  --connector claudecode \
  --mode observe \
  --scanner-mode local \
  --detection-strategy regex_only \
  --restart

Pass --non-interactive (or --accept-defaults) to skip every prompt; missing flags fall back to the same defaults the wizard would have offered. The judge stays off until you pass --judge-model (or pick a strategy that uses it); --detection-strategy regex_only here just makes that explicit.
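Because missing flags fall back to the wizard's defaults, a CI job only needs to pass the knobs it wants to pin. The sketch below is one hypothetical way to do that: it emits a flag only when the matching environment variable is set. The function name and the GUARDRAIL_* variable names are assumptions made for this example; only the defenseclaw flags themselves come from this page.

```shell
# Hypothetical CI helper, not part of DefenseClaw.
# GUARDRAIL_CONNECTOR, GUARDRAIL_MODE and GUARDRAIL_SCANNER_MODE are made-up
# variable names. A flag is emitted only when its variable is non-empty, so
# every unset knob is left to the tool's own default.
guardrail_command() {
  cmd="defenseclaw setup guardrail --non-interactive"
  if [ -n "${GUARDRAIL_CONNECTOR:-}" ]; then
    cmd="$cmd --connector $GUARDRAIL_CONNECTOR"
  fi
  if [ -n "${GUARDRAIL_MODE:-}" ]; then
    cmd="$cmd --mode $GUARDRAIL_MODE"
  fi
  if [ -n "${GUARDRAIL_SCANNER_MODE:-}" ]; then
    cmd="$cmd --scanner-mode $GUARDRAIL_SCANNER_MODE"
  fi
  echo "$cmd --restart"
}

# Print the command for review before wiring it into a pipeline.
guardrail_command
```

The helper only prints the command line, which keeps it safe to run anywhere. It assumes the values contain no spaces.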

Prompt → flag mapping

Each row corresponds to one prompt in the animation. Default shows what pressing Enter selects; Flag is the CI-shaped equivalent. Rows tagged interactive-only have no direct flag — the Note explains the workaround.

| Prompt (default) | Flag equivalent | Note |
| --- | --- | --- |
| Which agent framework? (default: previous selection or auto-detected) | --connector / --agent | Accepts claudecode, codex, cursor, windsurf, geminicli, copilot, hermes, openclaw, zeptoclaw. |
| Select integration mode (Claude Code / Codex only; default: 2 = full guardrail) | interactive-only | Use setup claude-code or setup codex for the observability-only branch from a script. |
| Enable guardrail? (default: Y) | implicit when --connector is set | Pass --disable to roll the guardrail back. |
| Select mode (default: observe) | --mode observe\|action | observe logs; action enforces. |
| Select hook fail mode (default: open) | interactive-only here | Change later with defenseclaw guardrail fail-mode <open\|closed>. Only asked on first setup or when --mode changes. |
| Human approval for risky actions? (action mode only, default: current) | --human-approval / --no-human-approval | Skipped entirely in observe mode. |
| Approval minimum severity (if HITL on, default: high) | --hilt-min-severity high\|medium\|low\|critical | Click renders the choices lowercase; case-insensitive matching, so HIGH also works. |
| Select scanner engine (default: local) | --scanner-mode local\|remote\|both | Wizard never picks both; pass the flag to run the union. |
| API endpoint / key env / timeout (remote scanner only) | --cisco-endpoint, --cisco-api-key-env, --cisco-timeout-ms | Defaults inherit from existing aid.* config. |
| Enable LLM judge? (default: current) | implicit when --judge-model is set | Pass --detection-strategy regex_only to force the judge off; the bare guardrail command has no --no-judge toggle. |
| Select judge strategy (if judge on, default: regex_judge) | --detection-strategy regex_only\|regex_judge\|judge_first | |
| Inherit unified LLM key for judge? (default: Y) | --judge-api-key-env | Override with a different env var to use a separate key. |
| Configure judge fallback models? (default: N) | interactive-only | Edit guardrail.judge.fallback_models in ~/.defenseclaw/config.yaml to script. |
| Configure advanced options? (default: N) | | Gates the next three prompts. |
| Guardrail proxy port (default: 4000) | --port | |
| Custom block message? (action mode only) | --block-message | |
| Disable redaction? (default: current) | --disable-redaction / --enable-redaction | Only disable inside trusted, single-tenant environments. |

Three genuinely interactive-only choices on this command: integration mode (Claude Code / Codex split), hook fail mode (use defenseclaw guardrail fail-mode afterward), and judge fallback models (edit config.yaml). Everything else round-trips through flags.
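For an unattended run that also needs a specific hook fail mode, one option is to chain the two documented commands. The wrapper below is a hypothetical sketch: run, setup_guardrail_unattended and the DRY_RUN variable are names invented for this example, and it prints the commands by default instead of executing them. It does not cover integration mode or judge fallback models, and this page does not say whether the gateway needs another restart after the fail mode changes.

```shell
# Hypothetical wrapper, not part of DefenseClaw.
# Prints commands by default; set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# $1  connector (for example claudecode)
# $2  mode: observe | action
# $3  hook fail mode: open | closed
setup_guardrail_unattended() {
  # The fail-mode command only runs if setup succeeded.
  run defenseclaw setup guardrail --non-interactive \
    --connector "$1" --mode "$2" --restart &&
  run defenseclaw guardrail fail-mode "$3"
}

setup_guardrail_unattended claudecode action closed
```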

Two modes you have to choose between

observe

Log findings to the audit DB and sinks. Block nothing. Run in this mode for at least a week before promoting to action.

action

Block on configured severities. CRITICAL always blocks. HIGH blocks unless HITL is on.
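The difference between the two modes can be sketched as a small decision function. This is an illustration only; the real verdict logic lives in the guardrail proxy. The function name is hypothetical, and the sketch assumes that only CRITICAL and HIGH are configured to block and that the HITL minimum severity is high.

```shell
# Hypothetical sketch of the verdict for one finding; not DefenseClaw's code.
# Assumptions: only CRITICAL and HIGH are configured to block, and the HITL
# minimum severity is "high". Any mode other than observe is treated as action.
#   $1  mode: observe | action
#   $2  severity: CRITICAL | HIGH | MEDIUM | LOW
#   $3  "1" when human approval (HITL) is enabled
guardrail_verdict() {
  mode="$1"; severity="$2"; hitl="$3"

  # observe never blocks; findings are only logged.
  if [ "$mode" = "observe" ]; then echo "log"; return; fi

  case "$severity" in
    CRITICAL) echo "block" ;;   # always blocks in action mode
    HIGH)     if [ "$hitl" = "1" ]; then echo "ask"; else echo "block"; fi ;;
    *)        echo "log" ;;     # assumed not configured to block
  esac
}

guardrail_verdict action HIGH 1   # prints "ask"
```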

Connector resolution

DefenseClaw resolves the connector in this order; the first source that yields a value wins, so steps 2 to 5 only apply when you omit --connector:

  1. --connector flag (operator intent always wins)
  2. Existing guardrail.connector if you have run setup before
  3. <data_dir>/picked_connector hint written by scripts/install.sh --connector ...
  4. Filesystem auto-detection (only in interactive mode)
  5. Fallback to openclaw
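The precedence above can be sketched as a short shell function. It illustrates only the ordering and is not DefenseClaw's actual implementation: resolve_connector and its positional arguments are hypothetical, and the caller is assumed to have gathered each candidate value already (an empty string means the source is not available).

```shell
# Hypothetical sketch of the resolution order; not DefenseClaw's own code.
# Arguments (empty string = source not available):
#   $1  value of --connector
#   $2  existing guardrail.connector from config
#   $3  contents of <data_dir>/picked_connector
#   $4  result of filesystem auto-detection
#   $5  "1" when running interactively
resolve_connector() {
  flag="$1"; configured="$2"; hint="$3"; detected="$4"; interactive="$5"

  if [ -n "$flag" ]; then echo "$flag"; return; fi
  if [ -n "$configured" ]; then echo "$configured"; return; fi
  if [ -n "$hint" ]; then echo "$hint"; return; fi
  # Auto-detection is only consulted in interactive mode.
  if [ "$interactive" = "1" ] && [ -n "$detected" ]; then
    echo "$detected"; return
  fi
  echo "openclaw"
}

# The install hint wins over auto-detection: prints "codex".
resolve_connector "" "" "codex" "cursor" "1"
```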

Non-interactive recipes by mode

Observe mode:

defenseclaw setup guardrail \
  --non-interactive \
  --connector claudecode \
  --mode observe \
  --scanner-mode local \
  --restart

The audit DB fills up with every prompt and tool call. Nothing blocks. Open defenseclaw tui for the live audit panel, or tail -f ~/.defenseclaw/gateway.jsonl | jq for a scripted view.

Action mode:

defenseclaw setup guardrail \
  --non-interactive \
  --connector claudecode \
  --mode action \
  --scanner-mode local \
  --rule-pack default \
  --restart

CRITICAL findings block immediately. HIGH findings block. Operators see the block reason in the agent UI; the audit log captures the verdict.

Action mode with human approval (HITL):

defenseclaw setup guardrail \
  --non-interactive \
  --connector claudecode \
  --mode action \
  --rule-pack strict \
  --human-approval \
  --hilt-min-severity high \
  --restart

HIGH findings pause for operator approval. CRITICAL still blocks unconditionally. On Claude Code, the approval prompt surfaces as a native PreToolUse ask; on Codex, it downgrades to a TUI prompt. See the HITL page for the full matrix.

Action mode with the LLM judge:

export DEFENSECLAW_LLM_KEY=<your-key>

defenseclaw setup guardrail \
  --non-interactive \
  --connector claudecode \
  --mode action \
  --detection-strategy regex_judge \
  --judge-model anthropic/claude-sonnet-4-20250514 \
  --judge-api-key-env DEFENSECLAW_LLM_KEY \
  --restart

Regex still runs first (cheap and offline). The judge adjudicates anything regex flagged as ambiguous. Detection strategy judge_first flips the order — useful when regex is too noisy.

What setup writes

  - config.yaml
  - picked_connector
  - settings.json (DefenseClaw hook entries appended)

A hash-checked backup is stored before edits; teardown restores or surgically removes only DefenseClaw-owned entries. See the per-connector pages for the exact files mutated for each agent.
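The idea of a hash-checked backup can be illustrated generically: store a copy, record its digest, and refuse to restore if the copy no longer matches. The sketch below is not DefenseClaw's implementation. The function names and the .bak / .bak.sha256 file naming are assumptions for this example, and it relies on GNU coreutils' sha256sum being on PATH.

```shell
# Generic illustration of a hash-checked backup; not DefenseClaw's own code.
# Assumes GNU coreutils' sha256sum is available.

# Copy $1 to $1.bak and record the backup's SHA-256 in $1.bak.sha256.
backup_with_hash() {
  cp "$1" "$1.bak" &&
  sha256sum < "$1.bak" > "$1.bak.sha256"
}

# Restore $1 from $1.bak only if the backup still matches its recorded hash.
restore_if_intact() {
  expected="$(cat "$1.bak.sha256")" || return 1
  actual="$(sha256sum < "$1.bak")" || return 1
  if [ "$expected" = "$actual" ]; then
    cp "$1.bak" "$1"
  else
    echo "backup hash mismatch for $1; not restoring" >&2
    return 1
  fi
}
```

Checking the digest before restoring means a backup that was modified or corrupted after it was taken is never copied over the live file.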

Verify it worked

defenseclaw doctor
defenseclaw status
defenseclaw alerts --limit 25

doctor prints the full health report. status shows the resolved connector + enforcement flags (it has no flags of its own — it always reports the active connector from config and /health). alerts lists recent decisions as a table; pass --show <n> to expand a specific row. For a live stream, open defenseclaw tui (interactive) or tail -f ~/.defenseclaw/gateway.jsonl | jq (scripted).

Interactive vs non-interactive — every command

DefenseClaw is interactive-first. When you run a setup or init verb at a terminal, you get a wizard with sane defaults and inline help. The same commands accept flags for unattended runs.

The matrix below is the source of truth. ✓ means "supported"; ✗ means "not supported on this command — see Notes for the workaround". A blank cell in the Notes column means there's no asymmetry worth flagging.

| Command | Interactive wizard | Non-interactive flags | Notes |
| --- | --- | --- | --- |
| defenseclaw init | ✓ default on a TTY | --non-interactive --yes + per-knob flags | Equivalent to quickstart once the choices are made; both call bootstrap.run_first_run(). |
| defenseclaw quickstart | ✗ never prompts | ✓ always | Zero-prompt by design. There is no wizard variant; use init for that. |
| defenseclaw setup guardrail | ✓ default | --non-interactive + flags | Three prompts have no flag on this command; see the prompt → flag mapping above. |
| defenseclaw setup claude-code | ✓ confirm prompt | --yes | Light wrapper that only configures observability. Full guardrail still goes through setup guardrail. |
| defenseclaw setup codex | ✓ confirm prompt | --yes | Same shape as setup claude-code. |
| defenseclaw setup cursor / setup windsurf / setup geminicli / setup copilot / setup hermes | ✓ confirm prompt | --yes | Generated wrappers: confirmation only, then delegate. |
| defenseclaw setup openclaw / setup zeptoclaw | ✓ confirm prompt | --yes | Confirm is skippable; the full guardrail wizard is not available here, so pass setup guardrail flags after the confirm. |
| defenseclaw setup splunk | ✓ wizard when no mode flag | --non-interactive + --logs / --enterprise / --o11y | |
| defenseclaw setup local-observability | ✗ no prompts | ✓ flags only | Compose-style up/down/logs subcommands; never asks. |
| defenseclaw setup mcp-scanner | ✓ wizard | --non-interactive | |
| defenseclaw setup skill-scanner | ✓ wizard | --non-interactive | |
| defenseclaw registry add / edit / remove / sync | ✓ prompts for missing args | --non-interactive + required flags | sync (not refresh) fetches + scans + promotes. Module is built dual-path on purpose. |
| defenseclaw doctor | conditional: only with --fix | --fix --yes | Default run is read-only and never prompts. |
| defenseclaw agent discover | ✗ no prompts | ✓ flags / --json | Read-only inventory command. |
| defenseclaw aibom scan | ✗ no prompts | ✓ flags only | Read-only. |
| defenseclaw policy ... | ✗ no prompts | ✓ flags only | Pure CLI for repeatable policy authoring. |
| defenseclaw tui | ✓ full TUI | | Interactive only. Bubbletea dashboard with audit, alerts, logs, inventory panels. |
| defenseclaw alerts | ✗ no prompts | --limit, --show <n>, acknowledge / dismiss subcommands | Snapshot view of recent alerts; not a live tail. |
| defenseclaw-gateway audit export (Go binary) | ✗ no prompts | --output, --limit, --include-activity | JSONL export of audit_events from the SQLite DB. |
| defenseclaw-gateway (sidecar daemon) | ✗ no prompts | ✓ flags only | Long-running gateway with --config, --port, --log-level. Started for you by setup guardrail --restart. |

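The interactive-only corners worth knowing about are the integration-mode submenu inside setup guardrail (Claude Code / Codex) and the judge fallback-models prompt. Hook fail mode also has no flag on setup guardrail, but it is scriptable through defenseclaw guardrail fail-mode. Everything else has a flag.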

The non-interactive-only group is quickstart, setup local-observability, policy, agent discover, aibom scan, defenseclaw-gateway audit export, and gateway start — all by design (CI-shaped, read-only, or daemon).

Common follow-ups