LLM guardrail — DefenseClaw

Overview

The guardrail is a LiteLLM-compatible reverse proxy that sits between OpenClaw and the upstream model. Every prompt and every completion is inspected by a scanner pipeline before it leaves or returns, producing a verdict (allow, warn, block) that is either logged (observe) or enforced (action). The wizard installs the OpenClaw plugin, patches ~/.openclaw/openclaw.json so the agent runtime talks to the guardrail on 127.0.0.1:4000, and makes sure the sidecar reflects the chosen mode.

Interactive setup

defenseclaw setup guardrail

The interactive wizard walks through five questions in order, showing the current value and the defaults:

  1. Mode: observe (log-only), action (block in-line), or disabled.
  2. Scanner mode: local (DefenseClaw rule packs + optional judge) or remote (Cisco AI Defense cloud scanner).
  3. Port: default 4000. Shares port space with LiteLLM; reuses an existing master_key when present in ~/.openclaw/config.yaml.
  4. Detection strategy: how rule packs classify content. See judge vs regex vs regex-judge.
  5. LLM judge: optional. Choose model, provider, and API-key env var. Each judge invocation is metered via OpenTelemetry GenAI semconv so spend can be capped downstream.

On save the wizard:

  • Installs the OpenClaw plugin into ~/.defenseclaw/extensions/defenseclaw/ (from the bundled tarball, npm i, or a dev symlink — whichever path is available).
  • Patches ~/.openclaw/openclaw.json to add a guardrail block pointing at 127.0.0.1:4000 with the shared master_key.
  • Writes ~/.defenseclaw/.env with DEFENSECLAW_LLM_KEY (when a judge is configured) and any provider-specific variables.
  • Restarts the gateway and OpenClaw by default so the updated openclaw.json and guardrail settings are active.
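
The effects listed above can be spot-checked once the wizard exits. The following is a minimal sketch, not something DefenseClaw ships: check_guardrail_artifacts is a hypothetical helper that only tests whether the documented paths exist, and it treats ~/.defenseclaw/.env as optional on the assumption that the file may be absent when no judge is configured. It takes the home directory as an argument so it can be tried against a scratch directory.

```shell
# Hypothetical helper: reports whether the paths the wizard is documented to
# touch exist under the given home directory (defaults to $HOME).
check_guardrail_artifacts() {
  home_dir="${1:-$HOME}"
  missing=0

  if [ -d "$home_dir/.defenseclaw/extensions/defenseclaw" ]; then
    echo "ok: plugin directory"
  else
    echo "missing: plugin directory"
    missing=1
  fi

  if [ -f "$home_dir/.openclaw/openclaw.json" ]; then
    echo "ok: openclaw.json"
  else
    echo "missing: openclaw.json"
    missing=1
  fi

  # Assumption: .env is optional, so its absence is reported but not counted.
  if [ -f "$home_dir/.defenseclaw/.env" ]; then
    echo "ok: .env"
  else
    echo "note: no .env found"
  fi

  return "$missing"
}
```

Calling check_guardrail_artifacts with no argument inspects the current user's home directory; a non-zero exit status means at least one required path is missing. It does not inspect the contents of openclaw.json.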

Non-interactive setup

Every prompt has a flag equivalent — use these in CI:

defenseclaw setup guardrail \
  --mode observe \
  --scanner-mode local \
  --port 4000 \
  --detection-strategy regex_judge \
  --judge-model gpt-4o-mini \
  --judge-api-base https://api.openai.com/v1 \
  --judge-api-key-env DEFENSECLAW_LLM_KEY
| Flag | Values | Default |
|---|---|---|
| --mode | observe, action | observe |
| --scanner-mode | local, remote | local |
| --port | integer, 1024–65535 | 4000 |
| --detection-strategy | regex_only, regex_judge, judge_first | regex_judge |
| --judge-model | any LiteLLM-compatible slug | unset |
| --judge-api-base | URL | matches the slug |
| --judge-api-key-env | env var name | DEFENSECLAW_LLM_KEY |
| --disable | flag | disable guardrail, revert openclaw.json patch |

See the autogenerated setup command page for the full flag table.
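
In CI it can help to reject bad values before the setup command runs at all. The sketch below assumes the accepted values are exactly those in the flag table; the valid_* function names are hypothetical and are not part of the defenseclaw CLI.

```shell
# Hypothetical pre-flight validators for a CI wrapper. The accepted values
# mirror the documented flags; the function names are illustrative only.
valid_mode() {
  case "$1" in
    observe|action) return 0 ;;
    *) return 1 ;;
  esac
}

valid_scanner_mode() {
  case "$1" in
    local|remote) return 0 ;;
    *) return 1 ;;
  esac
}

valid_strategy() {
  case "$1" in
    regex_only|regex_judge|judge_first) return 0 ;;
    *) return 1 ;;
  esac
}

valid_port() {
  # Reject empty strings, non-digits, and over-long inputs before comparing.
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "${#1}" -le 5 ] || return 1
  [ "$1" -ge 1024 ] && [ "$1" -le 65535 ]
}
```

A pipeline step might call each validator on its inputs and abort with a clear message on the first failure, so a typo surfaces as a validation error rather than a half-applied setup.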

Picking a mode

| Mode | Behavior | When to use |
|---|---|---|
| observe | Log verdicts, never block, never rewrite | Day 0–7 rollouts; tuning suppressions; SIEM baselining |
| action | Enforce verdicts: block short-circuits, warn annotates, allow passes through | Once observe has been quiet for ≥24h at your traffic volume |
| disabled | Plugin remains installed but the guardrail proxy is not bound; openclaw.json is reverted | Break-glass during an incident |

Flip between observe and action with defenseclaw setup guardrail --mode <x> --non-interactive; the command restarts the gateway by default. Use --disable when you need to turn the proxy off.
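
How a verdict plays out under each mode can be written as a small lookup. This is an illustration of the documented behavior, not DefenseClaw's implementation; guardrail_outcome and its output labels are hypothetical.

```shell
# Hypothetical illustration: maps (mode, verdict) to what happens to a request.
guardrail_outcome() {
  mode="$1"
  verdict="$2"
  case "$mode" in
    observe)
      # Observe never blocks or rewrites; every verdict is only logged.
      echo "logged" ;;
    action)
      case "$verdict" in
        block) echo "blocked" ;;
        warn)  echo "annotated" ;;
        allow) echo "passed" ;;
        *) echo "unknown verdict: $verdict" >&2; return 2 ;;
      esac ;;
    disabled)
      # The proxy is not bound, so nothing is inspected.
      echo "not-inspected" ;;
    *)
      echo "unknown mode: $mode" >&2; return 2 ;;
  esac
}
```

For example, guardrail_outcome observe block prints logged while guardrail_outcome action block prints blocked, which is why a quiet observe period is a reasonable gate before switching to action.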

Picking a scanner mode

| Scanner mode | Needs | Latency profile |
|---|---|---|
| local | Default rule packs under ~/.defenseclaw/policy/guardrail/ | Single-digit ms per request (regex only); ~150–400ms when the judge is on |
| remote | CISCO_AI_DEFENSE_API_KEY | Single-digit → double-digit ms per request, bounded by network to the cloud scanner |

For airgapped deployments, only local is viable. Use remote only when CISCO_AI_DEFENSE_API_KEY is set and the configured Cisco endpoint is reachable.
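
The two preconditions for remote can be checked before running setup. In this sketch, remote_scanner_viable is a hypothetical helper: it confirms that CISCO_AI_DEFENSE_API_KEY is non-empty and then runs a reachability probe supplied by the caller, because the endpoint URL is deployment-specific and is not given here.

```shell
# Hypothetical helper: succeeds only if the API key is set AND the
# caller-supplied probe command (all arguments) exits successfully.
remote_scanner_viable() {
  [ -n "${CISCO_AI_DEFENSE_API_KEY:-}" ] || return 1
  # Without a probe there is no evidence the endpoint is reachable.
  [ "$#" -gt 0 ] || return 1
  "$@" >/dev/null 2>&1
}
```

A wrapper could pass --scanner-mode remote when this check succeeds and fall back to local otherwise; the probe might be a curl call against whatever endpoint your deployment is configured with.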

Picking a detection strategy

This controls how rule packs flag content, not whether the guardrail itself is on. See the deep-dive: judge vs regex.

| Strategy | How each rule decides | Typical use |
|---|---|---|
| regex_only | Deterministic pattern match against configured rules | Floor of every deployment; fast, explainable, zero LLM spend |
| regex_judge | Regex triages; matches are sent to the LLM judge for final verdict | High-precision tuning when you need regex recall but want to suppress false positives with semantic context |
| judge_first | The judge runs before regex fallback | Research mode only: expensive and slower; needed when rule signals cannot be captured in regex |
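
Because the judge is what drives LLM spend and latency, it is useful to reason about when each strategy calls it. The sketch below encodes one reading of the strategies: regex_only never calls the judge, regex_judge calls it only for content that regex flagged, and judge_first calls it on every evaluation. The judge_invoked helper is hypothetical, and the judge_first case in particular is an assumption drawn from "the judge runs before regex fallback".

```shell
# Hypothetical illustration: does a given strategy call the LLM judge?
# regex_matched is "yes" or "no".
judge_invoked() {
  strategy="$1"
  regex_matched="$2"
  case "$strategy" in
    regex_only)  return 1 ;;                      # never calls the judge
    regex_judge) [ "$regex_matched" = "yes" ] ;;  # only regex hits escalate
    judge_first) return 0 ;;                      # assumption: judge always runs
    *) echo "unknown strategy: $strategy" >&2; return 2 ;;
  esac
}
```

Under this reading, regex_judge costs judge calls in proportion to the regex hit rate, while judge_first costs one per evaluation, which matches its "expensive, slower" description.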

Verify it worked

defenseclaw status | grep -i guardrail
# $master_key must hold the LiteLLM master key (litellm.master_key in ~/.openclaw/config.yaml)
curl -s http://127.0.0.1:4000/v1/models -H "Authorization: Bearer $master_key" | jq '.data | length'
curl -s -X POST http://127.0.0.1:4000/v1/chat/completions \
  -H "Authorization: Bearer $master_key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}' | jq .id
defenseclaw tui   # inspect live verdicts
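
Since setup restarts the gateway, the first probe may run before the proxy is listening again. A small retry wrapper avoids false failures in scripts. This is a generic sketch: retry_probe is a hypothetical helper, not a DefenseClaw command, and the attempt count and delay are arbitrary.

```shell
# Hypothetical helper: re-run a probe command until it succeeds or the
# attempt budget is exhausted. Usage: retry_probe ATTEMPTS DELAY_SECONDS CMD...
retry_probe() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For instance, retry_probe 10 1 curl -fsS -o /dev/null -H "Authorization: Bearer $master_key" http://127.0.0.1:4000/v1/models waits up to roughly ten seconds for the models endpoint to answer before giving up.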

Undo

defenseclaw setup guardrail --disable

This reverts the openclaw.json patch, flips guardrail.enabled=false in config.yaml, and restarts the affected processes. The plugin tree remains so re-enabling is a one-command round-trip.

Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| ValueError: master_key not found | ~/.openclaw/config.yaml missing/incomplete | Run openclaw once to create it; or set litellm.master_key manually |
| Plugin install failed: no extension source available | Neither the bundled tarball, npm, nor a dev symlink worked | defenseclaw doctor to see which source is missing |
| Verdicts not enforced after --mode action | Sidecar did not reload | defenseclaw-gateway restart |
| Judge calls return 401 | API key env var not set | defenseclaw keys set DEFENSECLAW_LLM_KEY |
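
For the 401 case, a quick check is whether the key actually landed in ~/.defenseclaw/.env. The sketch below assumes that file uses plain NAME=value lines, which is not confirmed here; env_file_has_key is a hypothetical helper and never prints the key's value.

```shell
# Hypothetical helper: succeeds if ENV_FILE defines KEY_NAME with a non-empty
# value. Assumes dotenv-style NAME=value lines, optionally prefixed by "export".
env_file_has_key() {
  env_file="$1"
  key_name="$2"
  [ -f "$env_file" ] || return 1
  grep -Eq "^(export[[:space:]]+)?${key_name}=.+" "$env_file"
}
```

Running env_file_has_key "$HOME/.defenseclaw/.env" DEFENSECLAW_LLM_KEY and checking the exit status shows whether the key is missing or empty before you re-run the fix from the table.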
