Overview
The guardrail is a LiteLLM-compatible reverse proxy that sits between OpenClaw and the upstream model. Every prompt and every completion is inspected by a scanner pipeline before it leaves or returns, producing a verdict (allow, warn, block) that is either logged (observe) or enforced (action). The wizard installs the OpenClaw plugin, patches ~/.openclaw/openclaw.json so the agent runtime talks to the guardrail on 127.0.0.1:4000, and makes sure the sidecar reflects the chosen mode.
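The relationship between a verdict and the mode can be modeled in a few lines. This is an illustrative sketch, not DefenseClaw's implementation: `Verdict`, `Mode`, `apply_verdict`, and the shape of the returned dict are hypothetical names. It also assumes verdicts are still logged in action mode, which this page does not state.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"


class Mode(Enum):
    OBSERVE = "observe"
    ACTION = "action"


def apply_verdict(verdict: Verdict, mode: Mode) -> dict:
    """Describe what happens to one request (hypothetical result shape)."""
    # Assumption: the verdict is logged in both modes.
    outcome = {"logged": True, "forwarded": True, "annotated": False}
    if mode is Mode.OBSERVE:
        # observe: log only, never block and never rewrite.
        return outcome
    if verdict is Verdict.BLOCK:
        # action + block: the request is short-circuited.
        outcome["forwarded"] = False
    elif verdict is Verdict.WARN:
        # action + warn: the request passes through with an annotation.
        outcome["annotated"] = True
    return outcome
```

Under these assumptions, a `block` verdict changes what happens to traffic only in action mode. In observe mode the same verdict just produces a log entry.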
Interactive setup
defenseclaw setup guardrail
The interactive wizard walks through five questions in order, showing the current value and the default for each:
- Mode: observe (log-only), action (block in-line), disabled.
- Scanner mode: local (DefenseClaw rule packs + optional judge) or remote (Cisco AI Defense cloud scanner).
- Port: default 4000. Shares port space with LiteLLM; reuses an existing master_key when present in ~/.openclaw/config.yaml.
- Detection strategy: how rule packs classify content. See judge vs regex vs regex-judge.
- LLM judge: optional. Choose model, provider, and API-key env var. Each judge invocation is metered via OpenTelemetry GenAI semconv so spend can be capped downstream.
On save the wizard:
- Installs the OpenClaw plugin into ~/.defenseclaw/extensions/defenseclaw/ (from the bundled tarball, npm i, or a dev symlink, whichever path is available).
- Patches ~/.openclaw/openclaw.json to add a guardrail block pointing at 127.0.0.1:4000 with the shared master_key.
- Writes ~/.defenseclaw/.env with DEFENSECLAW_LLM_KEY (when a judge is configured) and any provider-specific variables.
- Restarts the gateway and OpenClaw by default so the updated openclaw.json and guardrail settings are active.
Non-interactive setup
Every prompt has a flag equivalent — use these in CI:
defenseclaw setup guardrail \
--mode observe \
--scanner-mode local \
--port 4000 \
--detection-strategy regex_judge \
--judge-model gpt-4o-mini \
--judge-api-base https://api.openai.com/v1 \
--judge-api-key-env DEFENSECLAW_LLM_KEY
| Flag | Values | Default |
|---|---|---|
| --mode | observe, action | observe |
| --scanner-mode | local, remote | local |
| --port | integer, 1024–65535 | 4000 |
| --detection-strategy | regex_only, regex_judge, judge_first | regex_judge |
| --judge-model | any LiteLLM-compatible slug | unset |
| --judge-api-base | URL | matches the slug |
| --judge-api-key-env | env var name | DEFENSECLAW_LLM_KEY |
| --disable | flag | disable guardrail, revert openclaw.json patch |
See the autogenerated setup command page for the full flag table.
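In CI it can help to validate values against the table above before invoking the CLI. The helper below is a hedged sketch: `build_setup_argv` and `ALLOWED` are hypothetical names. The flags, allowed values, and defaults are copied from the table, and the CLI itself remains the authority on what it accepts.

```python
import shlex

# Allowed values copied from the flag table; the CLI may accept more.
ALLOWED = {
    "mode": {"observe", "action"},
    "scanner_mode": {"local", "remote"},
    "detection_strategy": {"regex_only", "regex_judge", "judge_first"},
}


def build_setup_argv(mode="observe", scanner_mode="local", port=4000,
                     detection_strategy="regex_judge", judge_model=None,
                     judge_api_key_env="DEFENSECLAW_LLM_KEY"):
    """Return an argv list for `defenseclaw setup guardrail`, or raise ValueError."""
    chosen = {
        "mode": mode,
        "scanner_mode": scanner_mode,
        "detection_strategy": detection_strategy,
    }
    for name, value in chosen.items():
        if value not in ALLOWED[name]:
            raise ValueError(f"{name}={value!r} not in {sorted(ALLOWED[name])}")
    if not 1024 <= port <= 65535:
        raise ValueError(f"port {port} outside 1024-65535")
    argv = [
        "defenseclaw", "setup", "guardrail",
        "--mode", mode,
        "--scanner-mode", scanner_mode,
        "--port", str(port),
        "--detection-strategy", detection_strategy,
    ]
    if judge_model:
        # Judge flags are only emitted when a judge model is requested.
        argv += ["--judge-model", judge_model,
                 "--judge-api-key-env", judge_api_key_env]
    return argv


# Print the command a CI job would run (pass the list to subprocess.run to execute it).
print(shlex.join(build_setup_argv(judge_model="gpt-4o-mini")))
```

Building an argv list rather than a shell string avoids quoting problems when values come from pipeline variables.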
Picking a mode
| Mode | Behavior | When to use |
|---|---|---|
| observe | Log verdicts, never block, never rewrite | Day 0–7 rollouts; tuning suppressions; SIEM baselining |
| action | Enforce verdicts: block short-circuits, warn annotates, allow passes through | Once observe has been quiet for ≥24h at your traffic volume |
| disabled | Plugin remains installed but the guardrail proxy is not bound; openclaw.json is reverted | Break-glass during an incident |
Flip between observe and action with defenseclaw setup guardrail --mode <x> --non-interactive; the command restarts the gateway by default. Use --disable when you need to turn the proxy off.
Picking a scanner mode
| Scanner mode | Needs | Latency profile |
|---|---|---|
| local | Default rule packs under ~/.defenseclaw/policy/guardrail/ | Single-digit ms per request (regex only); ~150–400ms when the judge is on |
| remote | CISCO_AI_DEFENSE_API_KEY | Single-digit to double-digit ms per request, bounded by network to the cloud scanner |
For airgapped deployments only local is viable. Use remote only when CISCO_AI_DEFENSE_API_KEY is set and the configured Cisco endpoint is reachable.
Picking a detection strategy
This controls how rule packs flag content, not whether the guardrail itself is on. See the deep-dive: judge vs regex.
| Strategy | How each rule decides | Typical use |
|---|---|---|
| regex_only | Deterministic pattern match against configured rules | Floor of every deployment; fast, explainable, zero LLM spend |
| regex_judge | Regex triages; matches are sent to the LLM judge for final verdict | High-precision tuning when you need regex recall but want to suppress false positives with semantic context |
| judge_first | The judge runs before regex fallback | Research mode only: expensive and slower; needed when rule signals cannot be captured in regex |
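The three strategies in the table differ mainly in when the judge is consulted. The sketch below is one possible reading, not the rule-pack engine itself: `classify` and the `Judge` callable are hypothetical. It reads "regex fallback" in judge_first as "use the regex result if the judge call fails", which is an assumption.

```python
import re
from typing import Callable, Optional

# A judge is modeled as a callable returning True when it deems the text a violation.
Judge = Callable[[str], bool]


def classify(text: str, pattern: re.Pattern, strategy: str,
             judge: Optional[Judge] = None) -> bool:
    """Return True when a single rule flags the text under the given strategy."""
    if strategy == "regex_only":
        # Deterministic: the pattern alone decides.
        return pattern.search(text) is not None
    if strategy not in ("regex_judge", "judge_first"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    if judge is None:
        raise ValueError(f"{strategy} requires a judge")
    if strategy == "regex_judge":
        # Regex triages; only regex matches are sent to the judge,
        # which gives the final verdict.
        return pattern.search(text) is not None and judge(text)
    # judge_first: the judge runs first. Assumption: the regex result is
    # used only when the judge call fails.
    try:
        return judge(text)
    except Exception:
        return pattern.search(text) is not None
```

With regex_judge the judge is never called for text the pattern does not match, which is why its spend scales with regex hits rather than with total traffic.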
Verify it worked
defenseclaw status | grep -i guardrail
curl -s http://127.0.0.1:4000/v1/models -H "Authorization: Bearer $master_key" | jq '.data | length'
curl -s -X POST http://127.0.0.1:4000/v1/chat/completions \
-H "Authorization: Bearer $master_key" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}' | jq .id
defenseclaw tui # inspect live verdicts
Undo
defenseclaw setup guardrail --disable
This reverts the openclaw.json patch, flips guardrail.enabled=false in config.yaml, and restarts the affected processes. The plugin tree remains so re-enabling is a one-command round-trip.
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| ValueError: master_key not found | ~/.openclaw/config.yaml missing/incomplete | Run openclaw once to create it; or set litellm.master_key manually |
| Plugin install failed: no extension source available | Neither the bundled tarball, npm, nor a dev symlink worked | defenseclaw doctor to see which source is missing |
| Verdicts not enforced after --mode action | Sidecar did not reload | defenseclaw-gateway restart |
| Judge calls return 401 | API key env var not set | defenseclaw keys set DEFENSECLAW_LLM_KEY |