defenseclaw setup guardrail

The central command. Routes LLM traffic through the Go guardrail proxy, configures observe vs action mode, picks the connector, scanner, rule pack, judge, and HITL behaviour, then restarts the gateway.

defenseclaw setup guardrail is the operator command. It picks the connector, picks the mode, points the scanners at the right backend, optionally enables the LLM judge, and configures human-in-the-loop. Every other setup verb in DefenseClaw is a thin wrapper around this one.

Run it interactively the first time: the wizard explains each choice and picks safe defaults. Once you are happy with the configuration, re-run with explicit flags (or use --non-interactive) for unattended setups and CI.

Watch the interactive flow

The animation below replays a real defenseclaw setup guardrail session: connector pick, integration mode, observe vs action, hook fail-mode, scanner backend, judge, advanced options, and the final summary.

Defaults shown in [brackets]. Pressing Enter accepts the default; the orange characters represent the operator's reply.

Same setup, no prompts

Every choice the wizard makes has a flag (with a couple of documented exceptions, below). The CI-friendly equivalent of the demo above is one command:

defenseclaw setup guardrail \
  --non-interactive \
  --connector claudecode \
  --mode observe \
  --scanner-mode local \
  --detection-strategy regex_only \
  --restart

Pass --non-interactive (or --accept-defaults) to skip every prompt; missing flags fall back to the same defaults the wizard would have offered. The judge stays off until you pass --judge-model (or pick a strategy that uses it); --detection-strategy regex_only here just makes that explicit.
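For CI, it can help to assemble the command in one place and review it before it runs. The sketch below is a minimal illustration, not part of DefenseClaw: `build_guardrail_cmd` is a hypothetical helper that only prints the invocation, using the flags shown above and rejecting any mode other than the two documented ones.

```shell
#!/bin/sh
# Sketch only: prints a non-interactive setup command instead of running it.
# build_guardrail_cmd is a hypothetical helper, not part of DefenseClaw.
# Usage: build_guardrail_cmd [connector] [mode]

build_guardrail_cmd() {
  connector="${1:-claudecode}"
  mode="${2:-observe}"

  # Only the two documented modes are accepted.
  case "$mode" in
    observe|action) ;;
    *) echo "unsupported mode: $mode" >&2; return 2 ;;
  esac

  printf '%s\n' "defenseclaw setup guardrail --non-interactive --connector $connector --mode $mode --scanner-mode local --detection-strategy regex_only --restart"
}
```

Calling `build_guardrail_cmd codex action` prints the command for a Codex install in action mode; a dry-run CI step can log that line before a later step executes it.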

Don't hand-roll the flags

The Command generator builds a non-interactive defenseclaw setup guardrail invocation for any connector with all the knobs below — mode, scanner, judge, HITL, advanced — and surfaces validation warnings inline.

Prompt → flag mapping

Each row corresponds to one prompt in the animation. Default shows what pressing Enter selects; Flag is the CI-shaped equivalent. Rows tagged interactive-only have no direct flag — the Note explains the workaround.

Prompt (default)	Flag equivalent	Note
Which agent framework? (default: previous selection or auto-detected)	`--connector` / `--agent`	Explicitly selects the connector being configured. Accepts `claudecode`, `codex`, `cursor`, `windsurf`, `geminicli`, `copilot`, `hermes`, `openhands`, `antigravity`, `opencode`, `omnigent`, `openclaw`, `zeptoclaw`.
Direct-to-upstream quick setup (default: observe)	`defenseclaw setup <connector> --mode observe\|action`	Claude Code, Codex, Cursor, Windsurf, Gemini CLI, Copilot CLI, Hermes, OpenHands, Antigravity, OpenCode, and OmniGent all have one-line hook or policy setup aliases.
Enable guardrail? (default: Y)	implicit when `--connector` is set	Pass `--disable` to roll the guardrail back.
Select mode (default: observe)	`--mode observe\|action`	`observe` logs; `action` enforces.
Select hook fail mode (default: current config; closed on a fresh install)	interactive-only here	Change later with `defenseclaw guardrail fail-mode <open\|closed>`. Only asked on first setup or when `--mode` changes.
Human approval for risky actions? (action mode only, default: current)	`--human-approval` / `--no-human-approval`	Skipped entirely in observe mode.
Approval minimum severity (if HITL on, default: high)	`--hilt-min-severity high\|medium\|low\|critical`	Click renders the choices in lowercase; matching is case-insensitive, so `HIGH` also works.
Select scanner engine (default: local)	`--scanner-mode local\|remote\|both`	Wizard never picks `both` — pass the flag to run the union.
API endpoint / key env / timeout (remote scanner only)	`--cisco-endpoint`, `--cisco-api-key-env`, `--cisco-timeout-ms`	Defaults inherit from existing `aid.*` config.
Enable LLM judge? (default: current)	implicit when `--judge-model` is set	Pass `--detection-strategy regex_only` to force the judge off; the bare guardrail command has no `--no-judge` toggle.
Select judge strategy (if judge on, default: regex_judge)	`--detection-strategy regex_only\|regex_judge\|judge_first`	—
Inherit unified LLM key for judge? (default: Y)	`--judge-api-key-env`, or `--inherit-llm` to copy the connector's agent-side LLM block wholesale	Override with a different env var to use a separate key, or `--inherit-from guardrail.judge` to start from the previous judge config.
Who consumes the unified LLM key? (proxy connectors only)	`--llm-role judge_only\|judge_and_agent`	Defaults to `judge_only` for hook-based connectors (Claude Code, Codex, ...) and `judge_and_agent` for proxy connectors (OpenClaw, ZeptoClaw). See Unified LLM key → Hook-based vs proxy-based connectors.
Judge provider / region / instance (judge on, advanced)	`--judge-provider`, `--judge-region`, `--judge-instance-name`	Use `--judge-provider bedrock\|vertex_ai\|azure\|anthropic\|openai\|...` and the matching regional flags below to point the judge at a different backend than the agent LLM. `--judge-instance-name` binds to a `~/.defenseclaw/custom-providers.json` overlay entry.
Configure judge fallback models? (default: N)	interactive-only	To script this, edit `guardrail.judge.fallback_models` in `~/.defenseclaw/config.yaml`.
Configure advanced options? (default: N)	—	Gates the next three prompts.
Guardrail proxy port (default: 4000)	`--port`	—
Custom block message? (action mode only)	`--block-message`	—
Disable redaction? (default: current)	`--disable-redaction` / `--enable-redaction`	Only disable inside trusted, single-tenant environments.

On a multi-connector install, --connector <name> scopes this setup run to that connector. When --block-message is supplied without --connector, setup treats it as broad operator intent: it updates the shared block message and reconciles active connector overrides so one connector does not keep stale block text.

Two choices on this command are genuinely interactive-only: hook fail mode (use defenseclaw guardrail fail-mode afterward) and judge fallback models (edit config.yaml). Everything else round-trips through flags.

Two modes you have to choose between

observe

Log findings to the audit DB and sinks. Block nothing. Run this for at least a week before promoting to action.

action

Apply the selected policy thresholds. In the default balanced profile, CRITICAL blocks, HIGH alerts or confirms with HITL, and MEDIUM alerts.
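The difference between the two modes can be read as a small decision table. The sketch below is an illustration under stated assumptions, not DefenseClaw's implementation: `verdict` is a hypothetical function, it models only the default balanced profile, it assumes the HITL minimum severity is high, and it treats LOW as allowed, as the action recipe on this page describes.

```shell
#!/bin/sh
# Sketch only: models the outcome per mode and severity for the default
# balanced profile. verdict is a hypothetical helper, not a DefenseClaw API.
# Usage: verdict MODE SEVERITY [on|off]   (third argument: human approval)

verdict() {
  mode="$1"
  severity="$2"
  hitl="${3:-off}"

  # observe never blocks: findings are only logged.
  if [ "$mode" = "observe" ]; then
    echo "log"
    return 0
  fi

  case "$severity" in
    CRITICAL) echo "block" ;;
    HIGH)
      # With human approval on (minimum severity high), HIGH asks for confirmation.
      if [ "$hitl" = "on" ]; then echo "confirm"; else echo "alert"; fi
      ;;
    MEDIUM) echo "alert" ;;
    LOW) echo "allow" ;;
    *) echo "unknown severity: $severity" >&2; return 2 ;;
  esac
}
```

For example, `verdict action HIGH on` prints `confirm`, while `verdict observe CRITICAL` prints `log` because observe mode blocks nothing.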

Connector resolution

When you omit --connector, DefenseClaw resolves it in this order:

1. --connector flag (operator intent always wins)
2. Existing guardrail.connector, if you have run setup before
3. <data_dir>/picked_connector hint written by scripts/install.sh --connector ...
4. Filesystem auto-detection (interactive mode only)
5. Fallback to openclaw
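The resolution order above can be sketched as a short function. This is an illustration only: `resolve_connector` and its arguments are hypothetical, and the sketch assumes the picked_connector hint file holds the connector name on its first line, which the source does not specify.

```shell
#!/bin/sh
# Sketch only: mirrors the documented resolution order.
# resolve_connector is hypothetical; the hint-file format is an assumption.
# Usage: resolve_connector FLAG EXISTING HINT_FILE DETECTED INTERACTIVE(yes|no)

resolve_connector() {
  flag="$1"
  existing="$2"
  hint_file="$3"
  detected="$4"
  interactive="$5"

  # 1. An explicit --connector value always wins.
  if [ -n "$flag" ]; then echo "$flag"; return 0; fi

  # 2. Connector saved by a previous setup run (guardrail.connector).
  if [ -n "$existing" ]; then echo "$existing"; return 0; fi

  # 3. Installer hint file, if it exists and is non-empty.
  if [ -s "$hint_file" ]; then head -n 1 "$hint_file"; return 0; fi

  # 4. Filesystem auto-detection counts only in interactive mode.
  if [ "$interactive" = "yes" ] && [ -n "$detected" ]; then
    echo "$detected"
    return 0
  fi

  # 5. Nothing matched: fall back to openclaw.
  echo "openclaw"
}
```

Note the consequence for CI: with no flag, no saved config, and no hint file, a non-interactive run lands on `openclaw` even when another agent is installed, so passing `--connector` explicitly is the safer habit.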

Tabs by mode (non-interactive recipes)

defenseclaw setup guardrail \
  --non-interactive \
  --connector claudecode \
  --mode observe \
  --scanner-mode local \
  --restart

In observe mode, the audit DB fills up with every prompt and tool call, and nothing blocks. Open defenseclaw tui for the live audit panel, or tail -f ~/.defenseclaw/gateway.jsonl | jq for a scripted view.

defenseclaw setup guardrail \
  --non-interactive \
  --connector claudecode \
  --mode action \
  --scanner-mode local \
  --rule-pack default \
  --restart

With the default balanced profile, CRITICAL findings block immediately, HIGH and MEDIUM findings alert, and LOW findings allow. Operators see the block or alert context supported by their connector; the audit log captures every verdict.

defenseclaw setup guardrail \
  --non-interactive \
  --connector claudecode \
  --mode action \
  --rule-pack default \
  --human-approval \
  --hilt-min-severity high \
  --restart

With human approval enabled and the minimum severity set to high, HIGH findings are eligible for confirmation. CRITICAL still blocks unconditionally. On Claude Code, PreToolUse can surface a native ask; on Codex, confirm falls back to an alert/system message with raw_action preserved and does not create a TUI approval. See the HITL page for the full matrix.

export DEFENSECLAW_LLM_KEY='replace-with-your-key'

defenseclaw setup guardrail \
  --non-interactive \
  --connector claudecode \
  --mode action \
  --detection-strategy regex_judge \
  --judge-model anthropic/claude-sonnet-4-20250514 \
  --judge-api-key-env DEFENSECLAW_LLM_KEY \
  --restart

Regex still runs first (cheap and offline). The judge adjudicates anything regex flagged as ambiguous. Detection strategy judge_first flips the order — useful when regex is too noisy.
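Because the judge recipe reads its key from the environment variable named by --judge-api-key-env, an unattended run benefits from checking that variable first. The sketch below is a hypothetical pre-flight guard, not a DefenseClaw feature: `require_key_env` fails when the variable is unset, empty, or still holds the placeholder from the export line above.

```shell
#!/bin/sh
# Sketch only: pre-flight check for the judge key variable.
# require_key_env is a hypothetical helper, not part of DefenseClaw.
# Usage: require_key_env VARIABLE_NAME

require_key_env() {
  name="$1"

  # Accept only a valid shell variable name before handing it to eval.
  case "$name" in
    ''|[0-9]*|*[!A-Za-z0-9_]*)
      echo "invalid variable name: $name" >&2
      return 2
      ;;
  esac

  # Indirect lookup: read the value of the variable whose name is in $name.
  eval "value=\${$name:-}"

  if [ -z "$value" ] || [ "$value" = "replace-with-your-key" ]; then
    echo "$name is unset or still the placeholder" >&2
    return 1
  fi
}
```

A CI step could run `require_key_env DEFENSECLAW_LLM_KEY` and continue to the setup command only when it succeeds.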

defenseclaw setup guardrail \
  --non-interactive \
  --connector claudecode \
  --mode action \
  --detection-strategy regex_judge \
  --judge-provider bedrock \
  --judge-model us.anthropic.claude-sonnet-4-6 \
  --judge-bedrock-region us-east-1 \
  --judge-bedrock-auth-mode iam_credentials \
  --judge-bedrock-access-key-env AWS_ACCESS_KEY_ID \
  --judge-bedrock-secret-key-env AWS_SECRET_ACCESS_KEY \
  --judge-bedrock-inference-profile us. \
  --restart

Routes the judge through AWS Bedrock instead of a SaaS endpoint. boto3 ships in the base install, so no extra pip install is needed. Swap --judge-provider vertex_ai + --judge-vertex-{project-id,region,auth-mode,service-account-json-env} for GCP Vertex, or --judge-provider azure + --judge-azure-{endpoint,api-version,auth-mode,deployment-alias} for Azure OpenAI. For self-signed lab endpoints, add --judge-tls-ca-cert-file /etc/ssl/lab-root.pem. See Unified LLM key → Regional providers for the full matrix.

If the same Bedrock / Vertex / Azure posture is already configured on a custom-provider overlay entry (region, auth mode, deployment aliases, TLS — see Bedrock / Vertex AI / Azure on a custom instance), the judge can inherit it with a single --judge-instance-name <name> instead of repeating every flag. Role-level judge flags still win field-by-field, so a shared overlay can supply auth credentials while --judge-bedrock-region pins a different region per environment.
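The per-environment override described above might look like the following sketch. Everything specific here is an assumption for illustration: the overlay entry name `shared-bedrock`, the environment names, and the region choices are hypothetical, and the sketch assumes the overlay entry already supplies the provider, model, and credentials. The helper only prints the command.

```shell
#!/bin/sh
# Sketch only: one shared overlay entry, one region per environment.
# shared-bedrock, staging, and prod are hypothetical names.
# Usage: judge_cmd_for_env ENVIRONMENT

region_for_env() {
  case "$1" in
    staging) echo "us-west-2" ;;
    prod)    echo "us-east-1" ;;
    *) echo "unknown environment: $1" >&2; return 2 ;;
  esac
}

judge_cmd_for_env() {
  # The role-level region flag overrides whatever the overlay entry sets.
  region="$(region_for_env "$1")" || return 2

  printf '%s\n' "defenseclaw setup guardrail --non-interactive --connector claudecode --mode action --detection-strategy regex_judge --judge-instance-name shared-bedrock --judge-bedrock-region $region --restart"
}
```

Running `judge_cmd_for_env staging` prints the staging variant; the overlay stays identical across environments and only the region flag changes.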

What setup writes

config.yaml

picked_connector

settings.json (DefenseClaw hook entries appended)

A hash-checked backup is stored before edits; teardown restores or surgically removes only DefenseClaw-owned entries. See the per-connector pages for the exact files mutated for each agent.

Verify it worked

defenseclaw doctor
defenseclaw status
defenseclaw alerts --limit 25

doctor prints the full health report. status shows enforcement flags plus a per-connector block for every active connector. It has no flags of its own and always reports the full Agents roster from config and /health, with an identical layout whether one or N connectors are wired. alerts lists recent decisions as a table; pass --connector <name> to filter by connector attribution and --show <n> to expand a specific row. For a live stream, open defenseclaw tui (interactive) or tail -f ~/.defenseclaw/gateway.jsonl | jq (scripted).
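In a pipeline, the three checks can be chained so the job stops at the first failure. The sketch below is a hypothetical wrapper, not part of DefenseClaw; it assumes each command exits non-zero when something is wrong, which the source does not state.

```shell
#!/bin/sh
# Sketch only: runs the three verification commands in order.
# verify_guardrail is a hypothetical helper; the exit-code behaviour of
# doctor and status is an assumption.

verify_guardrail() {
  # Fail early with a clear message if the CLI is not installed.
  if ! command -v defenseclaw >/dev/null 2>&1; then
    echo "defenseclaw not found on PATH" >&2
    return 127
  fi

  defenseclaw doctor || return 1
  defenseclaw status || return 1
  defenseclaw alerts --limit 25
}
```

Call `verify_guardrail` as the last step after setup; its exit status is that of the first failing command, or of the alerts listing when the earlier checks pass.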

Interactive vs non-interactive — every command

DefenseClaw is interactive-first. When you run a setup or init verb at a terminal, you get a wizard with sane defaults and inline help. The same commands accept flags for unattended runs.

The matrix below is the source of truth. ✓ means "supported"; ✗ means "not supported on this command — see Notes for the workaround". A blank cell in the Notes column means there's no asymmetry worth flagging.

Command	Interactive wizard	Non-interactive flags	Notes
`defenseclaw init`	✓ default on a TTY	✓ `--non-interactive --yes` + per-knob flags	Equivalent to `quickstart` once choices are made; both call `bootstrap.run_first_run()`.
`defenseclaw quickstart`	✗ never prompts	✓ always	Zero-prompt by design. There is no wizard variant — use `init` for that.
`defenseclaw setup guardrail`	✓ default	✓ `--non-interactive` + flags	Two prompts have no flag — see callout above.
`defenseclaw setup claude-code`	✓ confirm prompt	✓ `--yes`, `--mode`, `--restart/--no-restart`, policy flags	Direct-to-upstream wrapper; observe by default, action returns supported lifecycle verdicts.
`defenseclaw setup codex`	✓ confirm prompt	✓ `--yes`, `--mode`, `--restart/--no-restart`, policy flags	Same shape as `setup claude-code`.
`defenseclaw setup cursor` / `setup windsurf` / `setup geminicli` / `setup copilot` / `setup openhands` / `setup antigravity` / `setup hermes` / `setup opencode` / `setup omnigent`	✓ confirm prompt	✓ `--yes`, `--mode`, `--restart/--no-restart`, connector policy flags	Generated wrappers; observe by default, action uses the connector's native hook or policy decision surface.
`defenseclaw setup openclaw` / `setup zeptoclaw`	✓ confirm prompt	✓ `--yes`	Confirm is skippable; the full guardrail wizard is not available here — pass `setup guardrail` flags after the confirm.
`defenseclaw setup splunk`	✓ wizard when no mode flag	✓ `--non-interactive` + `--logs` / `--enterprise` / `--o11y`
`defenseclaw setup local-observability`	✗ no prompts	✓ flags only	Compose-style up/down/logs subcommands; never asks.
`defenseclaw setup mcp-scanner`	✓ wizard	✓ `--non-interactive`
`defenseclaw setup skill-scanner`	✓ wizard	✓ `--non-interactive`
`defenseclaw registry add / edit / remove / sync`	✓ prompts for missing args	✓ `--non-interactive` + required flags	`sync` (not `refresh`) fetches + scans + promotes. Module is built dual-path on purpose.
`defenseclaw doctor`	conditional — only with `--fix`	✓ `--fix --yes`	Default run is read-only and never prompts.
`defenseclaw agent discover`	✗ no prompts	✓ flags / `--json`	Read-only inventory command.
`defenseclaw aibom scan`	✗ no prompts	✓ flags only	Read-only.
`defenseclaw policy ...`	✗ no prompts	✓ flags only	Pure CLI for repeatable policy authoring.
`defenseclaw tui`	✓ full TUI	—	Interactive only — Textual dashboard with audit, alerts, logs, inventory, and setup panels.
`defenseclaw alerts`	✗ no prompts	✓ `--limit`, `--show <n>`, `acknowledge` / `dismiss` subcommands	Snapshot view of recent alerts; not a live tail.
`defenseclaw-gateway audit export` (Go binary)	✗ no prompts	✓ `--output`, `--limit`, `--include-activity`	JSONL export of `audit_events` from the SQLite DB.
`defenseclaw-gateway` (sidecar daemon)	✗ no prompts	✓ flags only	Long-running gateway — `--config`, `--port`, `--log-level`. Started for you by `setup guardrail --restart`.

The two interactive-only corners worth knowing about are both inside setup guardrail: the hook fail-mode prompt and the judge fallback-models prompt. Everything else has a flag.

The non-interactive-only group is quickstart, setup local-observability, policy, agent discover, aibom scan, defenseclaw-gateway audit export, and gateway start — all by design (CI-shaped, read-only, or daemon).

Add vs Replace: keeping more than one connector wired

Add: keep the existing connector(s) and layer the new one in. The gateway now enforces guardrail policy for both, each under its own guardrail.connectors.<name> block, and claw.mode flips to multi. This is the multi-connector path.
Replace: tear down the previous connector's wiring and make the new one the single active connector (documented in Changing connectors).

Proxy connectors (OpenClaw, ZeptoClaw) can't be Add peers — they own the traffic plane, so only one runs at a time.
