Overview

The guardrail is a LiteLLM-compatible reverse proxy that sits between OpenClaw and the upstream model. A scanner pipeline inspects every prompt before it leaves for the model and every completion before it returns to the agent, and produces a verdict (allow, warn, block). In observe mode the verdict is only logged; in action mode it is enforced. The setup wizard installs the OpenClaw plugin, patches ~/.openclaw/openclaw.json so the agent runtime talks to the guardrail on 127.0.0.1:4000, and makes sure the sidecar reflects the chosen mode.

Interactive setup

defenseclaw setup guardrail

The interactive wizard walks through five questions in order, showing the current value and the defaults:

1. Mode: observe (log-only), action (block in-line), or disabled.
2. Scanner mode: local (DefenseClaw rule packs plus an optional judge) or remote (Cisco AI Defense cloud scanner).
3. Port: default 4000. Shares port space with LiteLLM; reuses an existing master_key when one is present in ~/.openclaw/config.yaml.
4. Detection strategy: how rule packs classify content. See judge vs regex vs regex-judge.
5. LLM judge: optional. Choose the model, provider, and API-key env var. Each judge invocation is metered via the OpenTelemetry GenAI semantic conventions so spend can be capped downstream.

On save the wizard:

1. Installs the OpenClaw plugin into ~/.defenseclaw/extensions/defenseclaw/ (from the bundled tarball, npm i, or a dev symlink, whichever source is available).
2. Patches ~/.openclaw/openclaw.json to add a guardrail block pointing at 127.0.0.1:4000 with the shared master_key.
3. Writes ~/.defenseclaw/.env with DEFENSECLAW_LLM_KEY (when a judge is configured) and any provider-specific variables.
4. Restarts the gateway and OpenClaw by default so the updated openclaw.json and guardrail settings are active.
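
The openclaw.json patch described above can be pictured as a small, reversible dictionary update. The sketch below is illustrative only: the schema of the guardrail block is not documented on this page, so the field names `base_url` and `master_key` inside it, and both function names, are assumptions.

```python
import copy
import json


def patch_openclaw_config(config, master_key, host="127.0.0.1", port=4000):
    """Return a copy of the config with a guardrail block added.

    The keys inside the block are hypothetical placeholders.
    """
    patched = copy.deepcopy(config)
    patched["guardrail"] = {
        "base_url": f"http://{host}:{port}",  # assumed field name
        "master_key": master_key,             # assumed field name
    }
    return patched


def revert_openclaw_patch(config):
    """Return a copy of the config with the guardrail block removed."""
    reverted = copy.deepcopy(config)
    reverted.pop("guardrail", None)
    return reverted


if __name__ == "__main__":
    before = {"agent": {"name": "example"}}
    after = patch_openclaw_config(before, master_key="sk-example")
    print(json.dumps(after, indent=2))
```

Working on a copy keeps the original settings intact, which is what makes a later revert a plain removal of the added block.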

Non-interactive setup

Every prompt has a flag equivalent — use these in CI:

defenseclaw setup guardrail \
  --mode observe \
  --scanner-mode local \
  --port 4000 \
  --detection-strategy regex_judge \
  --judge-model gpt-4o-mini \
  --judge-api-base https://api.openai.com/v1 \
  --judge-api-key-env DEFENSECLAW_LLM_KEY

Flag	Values	Default
`--mode`	`observe`, `action`	`observe`
`--scanner-mode`	`local`, `remote`	`local`
`--port`	integer, 1024–65535	`4000`
`--detection-strategy`	`regex_only`, `regex_judge`, `judge_first`	`regex_judge`
`--judge-model`	any LiteLLM-compatible slug	unset
`--judge-api-base`	URL	matches the slug
`--judge-api-key-env`	env var name	`DEFENSECLAW_LLM_KEY`
`--disable`	flag (takes no value)	off; when set, disables the guardrail and reverts the `openclaw.json` patch

See the autogenerated setup command page for the full flag table.
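
For CI scripts that assemble the command from pipeline variables, it can help to validate values against the table above before invoking the CLI. The following is a minimal sketch; the function name `build_setup_command` is hypothetical, and only flags and value ranges listed on this page are used.

```python
import shlex
from typing import List, Optional

MODES = ("observe", "action")
SCANNER_MODES = ("local", "remote")
STRATEGIES = ("regex_only", "regex_judge", "judge_first")


def build_setup_command(
    mode: str = "observe",
    scanner_mode: str = "local",
    port: int = 4000,
    detection_strategy: str = "regex_judge",
    judge_model: Optional[str] = None,
    judge_api_key_env: str = "DEFENSECLAW_LLM_KEY",
) -> List[str]:
    """Validate the values and return an argv list for subprocess.run."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    if scanner_mode not in SCANNER_MODES:
        raise ValueError(f"scanner mode must be one of {SCANNER_MODES}")
    if detection_strategy not in STRATEGIES:
        raise ValueError(f"detection strategy must be one of {STRATEGIES}")
    if not 1024 <= port <= 65535:
        raise ValueError("port must be between 1024 and 65535")

    argv = [
        "defenseclaw", "setup", "guardrail",
        "--mode", mode,
        "--scanner-mode", scanner_mode,
        "--port", str(port),
        "--detection-strategy", detection_strategy,
    ]
    # Judge flags are only added when a judge model is configured.
    if judge_model is not None:
        argv += ["--judge-model", judge_model,
                 "--judge-api-key-env", judge_api_key_env]
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_setup_command(mode="action")))
```

Returning an argv list rather than a single string avoids shell quoting problems when the result is passed to `subprocess.run`.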

Picking a mode

Mode	Behavior	When to use
`observe`	Log verdicts, never block, never rewrite	Day 0–7 rollouts; tuning suppressions; SIEM baselining
`action`	Enforce verdicts: `block` short-circuits, `warn` annotates, `allow` passes through	Once observe has been quiet for ≥24h at your traffic volume
`disabled`	Plugin remains installed but the guardrail proxy is not bound; `openclaw.json` is reverted	Break-glass during an incident

Flip between observe and action with defenseclaw setup guardrail --mode <x> --non-interactive; the command restarts the gateway by default. Use --disable when you need to turn the proxy off.
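
The difference between the two active modes comes down to what happens to a verdict once it exists. The sketch below models the behavior described in the table; it is not the guardrail's implementation, and the `Outcome` type and `apply_verdict` function are hypothetical names.

```python
from dataclasses import dataclass

VERDICTS = ("allow", "warn", "block")


@dataclass(frozen=True)
class Outcome:
    forwarded: bool  # the prompt or completion continues on its way
    annotated: bool  # a warning is attached to the traffic


def apply_verdict(mode: str, verdict: str) -> Outcome:
    """Map a (mode, verdict) pair to what happens to the traffic."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    if mode == "observe":
        # Observe never blocks and never rewrites; the verdict is only logged.
        return Outcome(forwarded=True, annotated=False)
    if mode == "action":
        if verdict == "block":
            return Outcome(forwarded=False, annotated=False)  # short-circuit
        if verdict == "warn":
            return Outcome(forwarded=True, annotated=True)
        return Outcome(forwarded=True, annotated=False)  # allow passes through
    raise ValueError(f"unknown mode: {mode!r}")
```

Under this model, a `block` verdict in observe mode still forwards the traffic, which is why a quiet observe period is a useful signal before switching to action.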

Picking a scanner mode

Scanner mode	Needs	Latency profile
`local`	Default rule packs under `~/.defenseclaw/policy/guardrail/`	Single-digit ms per request (regex only); ~150–400ms when the judge is on
`remote`	`CISCO_AI_DEFENSE_API_KEY`	Single-digit to double-digit ms per request, bounded by network latency to the cloud scanner

For airgapped deployments only `local` is viable. Use `remote` only when `CISCO_AI_DEFENSE_API_KEY` is set and the configured Cisco endpoint is reachable.
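
A preflight check along these lines can catch a misconfigured scanner mode before the wizard runs. This is a sketch under stated assumptions: the function name and the `airgapped` parameter are hypothetical, and only the environment variable named in the table is checked (endpoint reachability is left out).

```python
from typing import List, Mapping


def scanner_mode_problems(
    scanner_mode: str,
    env: Mapping[str, str],
    airgapped: bool = False,
) -> List[str]:
    """Return a list of reasons the chosen scanner mode cannot work."""
    if scanner_mode not in ("local", "remote"):
        return [f"unknown scanner mode: {scanner_mode!r}"]

    problems: List[str] = []
    if scanner_mode == "remote":
        if airgapped:
            problems.append("remote needs network access; only local works when airgapped")
        if not env.get("CISCO_AI_DEFENSE_API_KEY"):
            problems.append("CISCO_AI_DEFENSE_API_KEY is not set")
    return problems
```

In a real script the second argument would typically be `os.environ`; an empty list means no problem was found.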

Picking a detection strategy

This controls how rule packs flag content, not whether the guardrail itself is on. See the deep-dive: judge vs regex.

Strategy	How each rule decides	Typical use
`regex_only`	Deterministic pattern match against configured rules	Floor of every deployment; fast, explainable, zero LLM spend
`regex_judge`	Regex triages; matches are sent to the LLM judge for final verdict	High-precision tuning when you need regex recall but want to suppress false positives with semantic context
`judge_first`	The judge runs before regex fallback	Research mode only — expensive, slower; needed when rule signals cannot be captured in regex
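
The three strategies differ mainly in when the judge is consulted. The sketch below illustrates that control flow and is not DefenseClaw's code: the judge is modeled as a hypothetical callable that returns `True` (flag), `False` (clear), or `None` (no answer), and falling back to the regex result when the judge gives no answer is an assumption.

```python
import re
from typing import Callable, Optional, Sequence

# Hypothetical judge interface: True = flag, False = clear, None = no answer.
Judge = Callable[[str], Optional[bool]]


def regex_hit(text: str, patterns: Sequence[str]) -> bool:
    """Return True when any rule pattern matches the text."""
    return any(re.search(p, text, re.IGNORECASE) for p in patterns)


def is_flagged(
    text: str,
    patterns: Sequence[str],
    strategy: str,
    judge: Optional[Judge] = None,
) -> bool:
    """Decide whether the text is flagged under the given strategy."""
    if strategy == "regex_only":
        return regex_hit(text, patterns)
    if strategy not in ("regex_judge", "judge_first"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    if judge is None:
        raise ValueError(f"{strategy} needs a judge")

    if strategy == "regex_judge":
        # Regex triages: the judge is only called on regex matches.
        if not regex_hit(text, patterns):
            return False
        answer = judge(text)
        return True if answer is None else answer

    # judge_first: the judge runs on everything; regex is the fallback.
    answer = judge(text)
    return regex_hit(text, patterns) if answer is None else answer
```

The sketch makes the cost difference visible: `regex_judge` pays for a judge call only on regex matches, while `judge_first` pays for one on every request.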

Verify it worked

# Assumes $master_key holds the master_key from ~/.openclaw/config.yaml and that jq is installed.
defenseclaw status | grep -i guardrail
# Count the models the proxy reports.
curl -s http://127.0.0.1:4000/v1/models -H "Authorization: Bearer $master_key" | jq '.data | length'
# Send a test completion through the proxy and print its id.
curl -s -X POST http://127.0.0.1:4000/v1/chat/completions \
  -H "Authorization: Bearer $master_key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}' | jq .id
defenseclaw tui   # inspect live verdicts
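
The model-count check can also be run from a Python CI step. This is a minimal sketch that assumes the same endpoint and bearer-token header as the curl commands above; the function name `count_models` and the injectable `opener` parameter (added so the function can be exercised without a live proxy) are hypothetical.

```python
import json
import urllib.request


def count_models(master_key, base_url="http://127.0.0.1:4000",
                 opener=urllib.request.urlopen):
    """Return the number of models listed by the proxy's /v1/models endpoint."""
    request = urllib.request.Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {master_key}"},
    )
    # urlopen raises on connection errors and on HTTP error statuses such as 401.
    with opener(request, timeout=5) as response:
        payload = json.load(response)
    return len(payload["data"])
```

A CI step might fail the build when `count_models(...)` raises or returns zero, which covers both an unbound proxy and a rejected key.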

Undo

defenseclaw setup guardrail --disable

This reverts the openclaw.json patch, flips guardrail.enabled=false in config.yaml, and restarts the affected processes. The plugin tree remains so re-enabling is a one-command round-trip.

Troubleshooting

Symptom	Cause	Fix
`ValueError: master_key not found`	`~/.openclaw/config.yaml` missing/incomplete	Run `openclaw` once to create it; or set `litellm.master_key` manually
`Plugin install failed: no extension source available`	None of the install sources (bundled tarball, `npm`, dev symlink) was available	Run `defenseclaw doctor` to see which source is missing
Verdicts not enforced after `--mode action`	Sidecar did not reload	Run `defenseclaw-gateway restart`
Judge calls return `401`	API key env var not set	Run `defenseclaw keys set DEFENSECLAW_LLM_KEY`
