defenseclaw setup guardrail

The central command. Routes LLM traffic through the Go guardrail proxy, configures observe vs action mode, picks the connector, scanner, rule pack, judge, and HITL behaviour, then restarts the gateway.

defenseclaw setup guardrail is the operator command. It picks the connector, picks the mode, points the scanners at the right backend, optionally enables the LLM judge, and configures human-in-the-loop. Every other setup verb in DefenseClaw is a thin wrapper around this one.

Run it interactively the first time: the wizard explains each choice and picks safe defaults. Once you are happy with the configuration, re-run with explicit flags (or use --non-interactive) for unattended setups and CI.

Watch the interactive flow

The animation replays a real defenseclaw setup guardrail session: connector pick, integration mode, observe vs action, hook fail-mode, scanner backend, judge, advanced options, and the final summary.

Defaults shown in [brackets]. Pressing Enter accepts the default; the orange characters represent the operator's reply.

Same setup, no prompts

Every choice the wizard makes has a flag (with a couple of documented exceptions, below). The CI-friendly equivalent of the demo above is one command:

defenseclaw setup guardrail \
  --non-interactive \
  --connector claudecode \
  --mode observe \
  --scanner-mode local \
  --detection-strategy regex_only \
  --restart

Pass --non-interactive (or --accept-defaults) to skip every prompt; missing flags fall back to the same defaults the wizard would have offered. The judge stays off until you pass --judge-model (or pick a strategy that uses it); --detection-strategy regex_only here just makes that explicit.

Don't hand-roll the flags

The Command generator builds a non-interactive defenseclaw setup guardrail invocation for any connector with all the knobs below — mode, scanner, judge, HITL, advanced — and surfaces validation warnings inline.

Prompt → flag mapping

Each row corresponds to one prompt in the animation. Default shows what pressing Enter selects; Flag is the CI-shaped equivalent. Rows tagged interactive-only have no direct flag — the Note explains the workaround.

| Prompt (default) | Flag equivalent | Note |
| --- | --- | --- |
| Which agent framework? (default: previous selection or auto-detected) | --connector / --agent | Explicitly selects the connector being configured. Accepts claudecode, codex, cursor, windsurf, geminicli, copilot, hermes, openhands, antigravity, opencode, omnigent, openclaw, zeptoclaw. |
| Direct-to-upstream quick setup (default: observe) | defenseclaw setup <connector> --mode observe\|action | Claude Code, Codex, Cursor, Windsurf, Gemini CLI, Copilot CLI, Hermes, OpenHands, Antigravity, OpenCode, and OmniGent all have one-line hook or policy setup aliases. |
| Enable guardrail? (default: Y) | implicit when --connector is set | Pass --disable to roll the guardrail back. |
| Select mode (default: observe) | --mode observe\|action | observe logs; action enforces. |
| Select hook fail mode (default: current config; closed on a fresh install) | interactive-only here | Change later with defenseclaw guardrail fail-mode <open\|closed>. Only asked on first setup or when --mode changes. |
| Human approval for risky actions? (action mode only, default: current) | --human-approval / --no-human-approval | Skipped entirely in observe mode. |
| Approval minimum severity (if HITL on, default: high) | --hilt-min-severity high\|medium\|low\|critical | Click renders the choices lowercase; matching is case-insensitive, so HIGH also works. |
| Select scanner engine (default: local) | --scanner-mode local\|remote\|both | The wizard never picks both; pass the flag to run the union. |
| API endpoint / key env / timeout (remote scanner only) | --cisco-endpoint, --cisco-api-key-env, --cisco-timeout-ms | Defaults inherit from existing aid.* config. |
| Enable LLM judge? (default: current) | implicit when --judge-model is set | Pass --detection-strategy regex_only to force the judge off; the bare guardrail command has no --no-judge toggle. |
| Select judge strategy (if judge on, default: regex_judge) | --detection-strategy regex_only\|regex_judge\|judge_first | |
| Inherit unified LLM key for judge? (default: Y) | --judge-api-key-env, or --inherit-llm to copy the connector's agent-side LLM block wholesale | Override with a different env var to use a separate key, or --inherit-from guardrail.judge to start from the previous judge config. |
| Who consumes the unified LLM key? (proxy connectors only) | --llm-role judge_only\|judge_and_agent | Defaults to judge_only for hook-based connectors (Claude Code, Codex, ...) and judge_and_agent for proxy connectors (OpenClaw, ZeptoClaw). See Unified LLM key → Hook-based vs proxy-based connectors. |
| Judge provider / region / instance (judge on, advanced) | --judge-provider, --judge-region, --judge-instance-name | Use --judge-provider bedrock\|vertex_ai\|azure\|anthropic\|openai\|... and the matching regional flags below to point the judge at a different backend than the agent LLM. --judge-instance-name binds to a ~/.defenseclaw/custom-providers.json overlay entry. |
| Configure judge fallback models? (default: N) | interactive-only | Edit guardrail.judge.fallback_models in ~/.defenseclaw/config.yaml to script. |
| Configure advanced options? (default: N) | | Gates the next three prompts. |
| Guardrail proxy port (default: 4000) | --port | |
| Custom block message? (action mode only) | --block-message | |
| Disable redaction? (default: current) | --disable-redaction / --enable-redaction | Only disable inside trusted, single-tenant environments. |

On a multi-connector install, --connector <name> scopes this setup run to that connector. When --block-message is supplied without --connector, setup treats it as broad operator intent: it updates the shared block message and reconciles active connector overrides so one connector does not keep stale block text.

Two genuinely interactive-only choices on this command: hook fail mode (use defenseclaw guardrail fail-mode afterward), and judge fallback models (edit config.yaml). Everything else round-trips through flags.

Two modes you have to choose between

observe

Log findings to the audit DB and sinks. Block nothing. Run this for at least a week before promoting.

action

Apply the selected policy thresholds. In the default balanced profile, CRITICAL blocks, HIGH alerts or confirms with HITL, and MEDIUM alerts.
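
As a rough mental model of the two modes, the mapping from mode and severity to outcome under the default balanced profile can be sketched in shell. This is an illustration, not DefenseClaw code: the function name balanced_action and its arguments are hypothetical, the hitl argument assumes the approval minimum severity is left at its default of high, and the LOW row follows the action-mode recipe later on this page.

```bash
#!/usr/bin/env bash
# Illustrative model only: balanced_action is a hypothetical helper, not part of DefenseClaw.
# Usage: balanced_action <mode> <severity> [hitl: on|off]
balanced_action() {
  local mode="$1" hitl="${3:-off}"
  local severity
  # Normalise the severity so "high" and "HIGH" behave the same.
  severity="$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')"

  # observe mode logs findings and never blocks.
  if [ "$mode" = "observe" ]; then
    echo "log"
    return 0
  fi

  # action mode, default balanced profile.
  case "$severity" in
    CRITICAL) echo "block" ;;
    HIGH)
      # Assumes the HITL minimum severity is the default (high).
      if [ "$hitl" = "on" ]; then echo "confirm"; else echo "alert"; fi ;;
    MEDIUM) echo "alert" ;;
    LOW) echo "allow" ;;
    *) echo "unknown severity: $2" >&2; return 1 ;;
  esac
}

balanced_action observe CRITICAL   # prints "log"
balanced_action action CRITICAL    # prints "block"
balanced_action action high on     # prints "confirm"
```

The sketch makes one point concrete: promoting from observe to action changes outcomes only for findings at MEDIUM and above, which is why a period in observe first is a cheap way to see what would have been blocked.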

Connector resolution

DefenseClaw resolves the connector in this order; steps 2 to 5 apply only when you omit --connector:

  1. --connector flag (operator intent always wins)
  2. Existing guardrail.connector if you have run setup before
  3. <data_dir>/picked_connector hint written by scripts/install.sh --connector ...
  4. Filesystem auto-detection (only in interactive mode)
  5. Fallback to openclaw
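
The precedence list above amounts to "first non-empty source wins". A minimal sketch of that logic follows; resolve_connector is a hypothetical function, and its four arguments stand in for values that DefenseClaw reads itself (the flag, the saved config, the install hint, and the auto-detection result).

```bash
#!/usr/bin/env bash
# Illustrative model of the precedence list; resolve_connector is hypothetical.
# Usage: resolve_connector <flag> <configured> <hint> <detected>
# Pass an empty string for any source that is absent. In non-interactive runs
# the caller would pass "" for <detected>, since auto-detection is interactive-only.
resolve_connector() {
  local flag="$1" configured="$2" hint="$3" detected="$4"
  if   [ -n "$flag" ];       then echo "$flag"        # 1. --connector flag
  elif [ -n "$configured" ]; then echo "$configured"  # 2. existing guardrail.connector
  elif [ -n "$hint" ];       then echo "$hint"        # 3. picked_connector hint
  elif [ -n "$detected" ];   then echo "$detected"    # 4. filesystem auto-detection
  else                            echo "openclaw"     # 5. fallback
  fi
}

resolve_connector "" "codex" "cursor" ""   # prints "codex"
resolve_connector "" "" "" ""              # prints "openclaw"
```

In CI it is usually simpler to pass --connector explicitly, so the result does not depend on leftover config or install hints on the runner.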

Non-interactive recipes by mode

defenseclaw setup guardrail \
  --non-interactive \
  --connector claudecode \
  --mode observe \
  --scanner-mode local \
  --restart

The audit DB fills up with every prompt and tool call. Nothing blocks. Open defenseclaw tui for the live audit panel, or tail -f ~/.defenseclaw/gateway.jsonl | jq for a scripted view.

defenseclaw setup guardrail \
  --non-interactive \
  --connector claudecode \
  --mode action \
  --scanner-mode local \
  --rule-pack default \
  --restart

With the default balanced profile, CRITICAL findings block immediately, HIGH and MEDIUM findings alert, and LOW findings allow. Operators see the block or alert context supported by their connector; the audit log captures every verdict.

defenseclaw setup guardrail \
  --non-interactive \
  --connector claudecode \
  --mode action \
  --rule-pack default \
  --human-approval \
  --hilt-min-severity high \
  --restart

HIGH findings are eligible for confirmation. CRITICAL still blocks unconditionally. On Claude Code, PreToolUse can surface a native ask; on Codex, confirm falls back to an alert/system message with raw_action preserved and does not create a TUI approval. See the HITL page for the full matrix.

export DEFENSECLAW_LLM_KEY='replace-with-your-key'

defenseclaw setup guardrail \
  --non-interactive \
  --connector claudecode \
  --mode action \
  --detection-strategy regex_judge \
  --judge-model anthropic/claude-sonnet-4-20250514 \
  --judge-api-key-env DEFENSECLAW_LLM_KEY \
  --restart

Regex still runs first (cheap and offline). The judge adjudicates anything regex flagged as ambiguous. Detection strategy judge_first flips the order — useful when regex is too noisy.
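
The judge recipe above names its key through --judge-api-key-env, so an unattended script can fail early when that variable is missing. The guard below is a sketch: require_env is a hypothetical helper, and the assumption that the key must already be exported when setup runs is not stated on this page.

```bash
#!/usr/bin/env bash
# Hypothetical CI guard, not part of DefenseClaw.
# Succeeds only when the named variable is exported and non-empty,
# i.e. when a child process such as the setup command could read it.
require_env() {
  local name="$1"
  if [ -z "$(printenv "$name")" ]; then
    echo "missing or empty environment variable: $name" >&2
    return 1
  fi
}
```

A script would call require_env DEFENSECLAW_LLM_KEY before the setup command and stop if it returns non-zero. Using printenv rather than a plain variable test also catches a key that was assigned but never exported.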

defenseclaw setup guardrail \
  --non-interactive \
  --connector claudecode \
  --mode action \
  --detection-strategy regex_judge \
  --judge-provider bedrock \
  --judge-model us.anthropic.claude-sonnet-4-6 \
  --judge-bedrock-region us-east-1 \
  --judge-bedrock-auth-mode iam_credentials \
  --judge-bedrock-access-key-env AWS_ACCESS_KEY_ID \
  --judge-bedrock-secret-key-env AWS_SECRET_ACCESS_KEY \
  --judge-bedrock-inference-profile us. \
  --restart

Routes the judge through AWS Bedrock instead of a SaaS endpoint. boto3 ships in the base install, so no extra pip install is needed. Swap --judge-provider vertex_ai + --judge-vertex-{project-id,region,auth-mode,service-account-json-env} for GCP Vertex, or --judge-provider azure + --judge-azure-{endpoint,api-version,auth-mode,deployment-alias} for Azure OpenAI. For self-signed lab endpoints, add --judge-tls-ca-cert-file /etc/ssl/lab-root.pem. See Unified LLM key → Regional providers for the full matrix.

If the same Bedrock / Vertex / Azure posture is already configured on a custom-provider overlay entry (region, auth mode, deployment aliases, TLS — see Bedrock / Vertex AI / Azure on a custom instance), the judge can inherit it with a single --judge-instance-name <name> instead of repeating every flag. Role-level judge flags still win field-by-field, so a shared overlay can supply auth credentials while --judge-bedrock-region pins a different region per environment.
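
The field-by-field precedence described above can be pictured as "use the role-level flag if it was given, otherwise fall back to the overlay entry". The helper below is a hypothetical illustration of that rule for a single field, not DefenseClaw's merge code, and eu-west-1 is an arbitrary example value.

```bash
#!/usr/bin/env bash
# Hypothetical one-field model of the precedence rule.
# Usage: effective_field <role_flag_value> <overlay_value>
# An empty first argument means the role-level flag was not passed.
effective_field() {
  printf '%s\n' "${1:-$2}"
}

# Region: the role-level flag is set, so it overrides the overlay.
effective_field "eu-west-1" "us-east-1"   # prints "eu-west-1"
# Auth mode: no role-level flag, so the overlay value is inherited.
effective_field "" "iam_credentials"      # prints "iam_credentials"
```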

What setup writes

  • config.yaml
  • picked_connector
  • settings.json (DefenseClaw hook entries appended)

A hash-checked backup is stored before edits; teardown restores or surgically removes only DefenseClaw-owned entries. See the per-connector pages for the exact files mutated for each agent.

Verify it worked

defenseclaw doctor
defenseclaw status
defenseclaw alerts --limit 25

doctor prints the full health report. status shows enforcement flags plus a per-connector block for every active connector (it has no flags of its own — it always reports the full Agents roster from config and /health, identical layout whether one or N connectors are wired). alerts lists recent decisions as a table; pass --connector <name> to filter by connector attribution and --show <n> to expand a specific row. For a live stream, open defenseclaw tui (interactive) or tail -f ~/.defenseclaw/gateway.jsonl | jq (scripted).
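
For unattended runs, the three checks above can be chained so a pipeline stops at the first failure. This is a sketch under two assumptions that this page does not confirm: that each command exits non-zero when something is wrong, and that running them back to back is safe. DEFENSECLAW_BIN is a hypothetical override introduced here only so the function can be exercised without the real CLI.

```bash
#!/usr/bin/env bash
# Sketch of a post-setup check for CI; verify_guardrail is a hypothetical helper.
# Assumes each command signals failure through a non-zero exit status.
verify_guardrail() {
  # DEFENSECLAW_BIN is a hypothetical override for testing; defaults to the real CLI.
  local bin="${DEFENSECLAW_BIN:-defenseclaw}"
  "$bin" doctor            || { echo "doctor failed" >&2; return 1; }
  "$bin" status            || { echo "status failed" >&2; return 1; }
  "$bin" alerts --limit 25 || { echo "alerts failed" >&2; return 1; }
  echo "guardrail checks passed"
}
```

Call verify_guardrail right after the setup command. Nothing runs when the block is sourced, so the function can be kept in a shared CI helper file.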

Interactive vs non-interactive — every command

DefenseClaw is interactive-first. When you run a setup or init verb at a terminal, you get a wizard with sane defaults and inline help. The same commands accept flags for unattended runs.

The matrix below is the source of truth. ✓ means "supported"; ✗ means "not supported on this command — see Notes for the workaround". A blank cell in the Notes column means there's no asymmetry worth flagging.

| Command | Interactive wizard | Non-interactive flags | Notes |
| --- | --- | --- | --- |
| defenseclaw init | ✓ default on a TTY | --non-interactive --yes + per-knob flags | Identical to the wizard mode of quickstart once choices are made; both call bootstrap.run_first_run(). |
| defenseclaw quickstart | ✗ never prompts | ✓ always | Zero-prompt by design. There is no wizard variant; use init for that. |
| defenseclaw setup guardrail | ✓ default | --non-interactive + flags | Two prompts have no flag; see callout above. |
| defenseclaw setup claude-code | ✓ confirm prompt | --yes, --mode, --restart/--no-restart, policy flags | Direct-to-upstream wrapper; observe by default, action returns supported lifecycle verdicts. |
| defenseclaw setup codex | ✓ confirm prompt | --yes, --mode, --restart/--no-restart, policy flags | Same shape as setup claude-code. |
| defenseclaw setup cursor / setup windsurf / setup geminicli / setup copilot / setup openhands / setup antigravity / setup hermes / setup opencode / setup omnigent | ✓ confirm prompt | --yes, --mode, --restart/--no-restart, connector policy flags | Generated wrappers; observe by default, action uses the connector's native hook or policy decision surface. |
| defenseclaw setup openclaw / setup zeptoclaw | ✓ confirm prompt | --yes | Confirm is skippable; the full guardrail wizard is not available here, so pass setup guardrail flags after the confirm. |
| defenseclaw setup splunk | ✓ wizard when no mode flag | --non-interactive + --logs / --enterprise / --o11y | |
| defenseclaw setup local-observability | ✗ no prompts | ✓ flags only | Compose-style up/down/logs subcommands; never asks. |
| defenseclaw setup mcp-scanner | ✓ wizard | --non-interactive | |
| defenseclaw setup skill-scanner | ✓ wizard | --non-interactive | |
| defenseclaw registry add / edit / remove / sync | ✓ prompts for missing args | --non-interactive + required flags | sync (not refresh) fetches + scans + promotes. Module is built dual-path on purpose. |
| defenseclaw doctor | conditional, only with --fix | --fix --yes | Default run is read-only and never prompts. |
| defenseclaw agent discover | ✗ no prompts | ✓ flags / --json | Read-only inventory command. |
| defenseclaw aibom scan | ✗ no prompts | ✓ flags only | Read-only. |
| defenseclaw policy ... | ✗ no prompts | ✓ flags only | Pure CLI for repeatable policy authoring. |
| defenseclaw tui | ✓ full TUI | | Interactive only. Textual dashboard with audit, alerts, logs, inventory, and setup panels. |
| defenseclaw alerts | ✗ no prompts | --limit, --show <n>, acknowledge / dismiss subcommands | Snapshot view of recent alerts; not a live tail. |
| defenseclaw-gateway audit export (Go binary) | ✗ no prompts | --output, --limit, --include-activity | JSONL export of audit_events from the SQLite DB. |
| defenseclaw-gateway (sidecar daemon) | ✗ no prompts | ✓ flags only | Long-running gateway; takes --config, --port, --log-level. Started for you by setup guardrail --restart. |

The two interactive-only corners worth knowing about: the integration-mode submenu inside setup guardrail (Claude Code / Codex), and the judge fallback-models prompt. Everything else has a flag.

The non-interactive-only group is quickstart, setup local-observability, policy, agent discover, aibom scan, defenseclaw-gateway audit export, and gateway start — all by design (CI-shaped, read-only, or daemon).

Common follow-ups

Add vs Replace: keeping more than one connector wired

When you run a hook setup (defenseclaw setup codex, setup claude-code, …) and a different connector is already wired, DefenseClaw asks whether to Add or Replace:

  • Add — keep the existing connector(s) and layer the new one in. The gateway now enforces guardrail policy for both, each under its own guardrail.connectors.<name> block, and claw.mode flips to multi. This is the multi-connector path.
  • Replace — tear down the previous connector's wiring and make the new one the single active connector (documented in Changing connectors).

Proxy connectors (OpenClaw, ZeptoClaw) can't be Add peers — they own the traffic plane, so only one runs at a time.