OpenClaw

The reference proxy connector. DefenseClaw ships a TypeScript plugin that wires OpenClaw's fetch interceptor and before_tool_call hook directly into the gateway.

OpenClaw is the connector DefenseClaw was designed against. It ships a first-party TypeScript plugin (extensions/defenseclaw/) that hooks into fetch interception and OpenClaw's before_tool_call lifecycle so every prompt, response, and tool call lands in the DefenseClaw gateway.

Full OpenClaw end-to-end: prompt arrives → gateway inspects → policy decides → HITL approves → tool runs → audit row written.

Setup

defenseclaw setup openclaw --mode observe --restart
defenseclaw setup openclaw --mode action --human-approval --rule-pack default --restart

setup openclaw is an alias for defenseclaw setup guardrail --connector openclaw and inherits every guardrail flag. Unlike Claude Code / Codex, the proxy is always in the data path here: there is no observability-only branch, only --mode observe (log without blocking) vs. --mode action (enforce).
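The difference between the two modes can be sketched in a few lines. This is a minimal illustration, assuming a simplified allow/block verdict; the type and function names (GuardrailMode, PolicyVerdict, applyMode) are hypothetical and not part of DefenseClaw's API.

```typescript
// Hypothetical names throughout: these are not DefenseClaw's real types.
type GuardrailMode = "observe" | "action";
type PolicyVerdict = "allow" | "block";

interface ModeOutcome {
  forwarded: boolean; // whether the request continues to its destination
  logged: boolean;    // whether the verdict is recorded
}

// In both modes the proxy sees the traffic; only "action" acts on a block verdict.
function applyMode(mode: GuardrailMode, verdict: PolicyVerdict): ModeOutcome {
  if (mode === "observe") {
    // Observe: log the verdict but never stop the request.
    return { forwarded: true, logged: true };
  }
  // Action: enforce the verdict.
  return { forwarded: verdict === "allow", logged: true };
}
```

In this sketch a block verdict under observe is still forwarded, which is what makes observe useful for measuring what action mode would have stopped.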

What this command sets vs. leaves at defaults

The flags above explicitly set the connector, the mode, optional HITL, and an optional rule pack. Every other knob falls back to the values DefenseClaw ships with, which are schema-defined in internal/config/config.go and documented on the Defaults page.

| Knob | Value when omitted | Flag to override |
| --- | --- | --- |
| Scanner backend | `local` (bundled regex packs, zero key) | `--scanner-mode local\|remote\|both` |
| Rule pack | unset → built-in baseline (no overlay) | `--rule-pack default\|strict\|permissive` |
| LLM judge | off (regex-only triage) | `--judge-model <model>` plus `--judge-api-key-env` |
| Detection strategy | `regex_judge` if judge is on, else regex-only | `--detection-strategy regex_only\|regex_judge\|judge_first` |
| HITL | off (no operator approval prompts) | `--human-approval` plus `--hilt-min-severity ...` |
| HITL minimum severity | `HIGH` (when `--human-approval` is on; stored uppercase in config) | `--hilt-min-severity low\|medium\|high\|critical` (case-insensitive) |
| Hook fail-mode | current config; `closed` on a fresh install (`open` is retained for migrated legacy configs) | `defenseclaw guardrail fail-mode <open\|closed>` (no flag) |
| Proxy port | `4000` | `--port <int>` |
| Block message | empty (uses built-in copy) | `--block-message "<text>"` |
| Redaction | enabled | `--disable-redaction` (trusted single-tenant only) |
| Verify after setup | on | `--no-verify` |

See the full flag reference for the complete table or run defenseclaw setup guardrail --help.
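To make the fallback behaviour in the defaults table concrete, here is a rough sketch of how omitted knobs could resolve. The GuardrailSettings shape and resolveSettings function are illustrative assumptions; the real schema lives in internal/config/config.go and may be organised differently.

```typescript
// Hypothetical shape; field names are illustrative, not the real config keys.
type SeverityLabel = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";
type DetectionStrategy = "regex_only" | "regex_judge" | "judge_first";

interface GuardrailSettings {
  scannerMode: "local" | "remote" | "both";
  rulePack: "default" | "strict" | "permissive" | null; // null = built-in baseline
  judgeModel: string | null;                            // null = judge off
  detectionStrategy: DetectionStrategy;
  humanApproval: boolean;
  hiltMinSeverity: SeverityLabel; // only consulted when humanApproval is on
  port: number;
  blockMessage: string;           // empty = built-in copy
  redaction: boolean;
  verify: boolean;
}

type GuardrailOverrides =
  Partial<Omit<GuardrailSettings, "hiltMinSeverity">> & { hiltMinSeverity?: string };

const SEVERITY_LABELS: readonly SeverityLabel[] = ["LOW", "MEDIUM", "HIGH", "CRITICAL"];

// The flag is case-insensitive; the stored value is uppercase.
function normalizeSeverity(raw: string): SeverityLabel {
  const upper = raw.toUpperCase();
  for (const label of SEVERITY_LABELS) {
    if (label === upper) return label;
  }
  throw new Error("unknown severity: " + raw);
}

function resolveSettings(overrides: GuardrailOverrides): GuardrailSettings {
  const judgeModel = overrides.judgeModel ?? null;
  return {
    scannerMode: overrides.scannerMode ?? "local",
    rulePack: overrides.rulePack ?? null,
    judgeModel,
    // The strategy default depends on whether a judge is configured.
    detectionStrategy:
      overrides.detectionStrategy ?? (judgeModel !== null ? "regex_judge" : "regex_only"),
    humanApproval: overrides.humanApproval ?? false,
    hiltMinSeverity: normalizeSeverity(overrides.hiltMinSeverity ?? "HIGH"),
    port: overrides.port ?? 4000,
    blockMessage: overrides.blockMessage ?? "",
    redaction: overrides.redaction ?? true,
    verify: overrides.verify ?? true,
  };
}
```

Calling resolveSettings({}) yields the "value when omitted" column, and passing only judgeModel flips the detection strategy to regex_judge without any other flag.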

Common variations — pick the recipe that fits your phase

defenseclaw setup openclaw \
  --mode observe \
  --rule-pack permissive \
  --restart

The proxy is in the data path but nothing blocks. Every prompt, response, and tool call lands in ~/.defenseclaw/gateway.jsonl. Run this for at least a week before promoting to --mode action; see Defaults → tuning by risk tolerance.
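One way to review an observe-mode run before promoting is to tally the audit rows by severity. The sketch below is only an illustration: it assumes each JSONL row carries a string "severity" field, which is a guess about the gateway.jsonl schema, so inspect a few real rows first. Reading the file is left to the caller so the function stays self-contained.

```typescript
// ASSUMPTION: each JSONL row has a string "severity" field. The real
// gateway.jsonl schema may differ; check a few rows before relying on this.
function tallyBySeverity(jsonl: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines
    let row: unknown;
    try {
      row = JSON.parse(trimmed);
    } catch {
      continue; // skip rows that are not valid JSON
    }
    if (typeof row !== "object" || row === null) continue;
    const severity = (row as Record<string, unknown>)["severity"];
    // Rows without a severity are counted under "NONE".
    const key = typeof severity === "string" ? severity.toUpperCase() : "NONE";
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```

A high count in the bucket that action mode would block suggests tuning the rule pack before switching modes.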

defenseclaw setup openclaw \
  --mode action \
  --human-approval \
  --hilt-min-severity high \
  --restart

HIGH findings can pause for operator approval; CRITICAL still blocks unconditionally. OpenClaw's bundled DefenseClaw plugin provides a native chat-origin approval surface, so approvals reach the agent UI directly. See the HITL page for the per-connector matrix.

export DEFENSECLAW_LLM_KEY='replace-with-your-key'

defenseclaw setup openclaw \
  --mode action \
  --human-approval \
  --hilt-min-severity high \
  --detection-strategy regex_judge \
  --judge-model anthropic/claude-sonnet-4-20250514 \
  --judge-api-key-env DEFENSECLAW_LLM_KEY \
  --restart

Adds the LLM judge as a second pass on regex-flagged prompts. Expect a cost of a few cents per turn; in exchange, the judge meaningfully cuts false positives and handles semantic jailbreak phrasing that regex alone gets wrong.
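The two-pass shape of regex_judge can be sketched as follows. This is a simplified illustration under stated assumptions: the pattern list is made up, and the judge is passed in as a plain synchronous callback (JudgeFn, a hypothetical name) rather than a real model call, which would be asynchronous.

```typescript
// Hypothetical sketch: the patterns and judge callback are stand-ins,
// not DefenseClaw's real rule packs or judge client.
type JudgeFn = (prompt: string) => boolean; // true = judge confirms the finding

const SAMPLE_PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions/i,
  /reveal (the )?system prompt/i,
];

interface TriageResult {
  flagged: boolean;     // final decision
  regexHit: boolean;    // whether any pattern matched
  judgeCalled: boolean; // whether the (paid) judge was consulted
}

function regexJudge(prompt: string, judge: JudgeFn): TriageResult {
  const regexHit = SAMPLE_PATTERNS.some((pattern) => pattern.test(prompt));
  if (!regexHit) {
    // Clean prompts never reach the judge, so cost scales with regex hits only.
    return { flagged: false, regexHit: false, judgeCalled: false };
  }
  // Second pass: the judge can overturn a regex hit, which removes false positives.
  return { flagged: judge(prompt), regexHit: true, judgeCalled: true };
}
```

Because the judge only sees prompts that regex already flagged, this ordering trades judge coverage for cost; the judge_first strategy presumably reverses that trade-off.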

defenseclaw policy activate strict
defenseclaw setup openclaw \
  --mode action \
  --rule-pack strict \
  --restart

Block MEDIUM and above, alert on LOW, and do not offer approval for findings that have already crossed the block threshold. Pair with the OpenShell sandbox profile and a reviewed first-party allow-list for full lockdown.
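The interaction between the block threshold and HITL approval in the recipes above can be modelled with two thresholds. This is a sketch under assumptions: blockAt and approvalAt are illustrative names rather than config keys, and treating everything below both thresholds as alert-only is a guess.

```typescript
// Hypothetical model: "blockAt" and "approvalAt" are illustrative, not config keys.
const SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"] as const;
type FindingSeverity = (typeof SEVERITY_ORDER)[number];
type FindingAction = "alert" | "pause_for_approval" | "block";

interface ThresholdPolicy {
  blockAt: FindingSeverity;           // this severity and above always block
  approvalAt: FindingSeverity | null; // null = HITL off
}

function rank(severity: FindingSeverity): number {
  return SEVERITY_ORDER.indexOf(severity);
}

function decide(severity: FindingSeverity, policy: ThresholdPolicy): FindingAction {
  // Findings past the block threshold are never offered for approval.
  if (rank(severity) >= rank(policy.blockAt)) return "block";
  if (policy.approvalAt !== null && rank(severity) >= rank(policy.approvalAt)) {
    return "pause_for_approval";
  }
  return "alert"; // ASSUMPTION: below both thresholds, findings are only alerted
}

// The HITL recipe: HIGH pauses for approval, CRITICAL blocks unconditionally.
const hitlRecipe: ThresholdPolicy = { blockAt: "CRITICAL", approvalAt: "HIGH" };
// The strict recipe: MEDIUM and above block, so approval is never reached.
const strictPack: ThresholdPolicy = { blockAt: "MEDIUM", approvalAt: "HIGH" };
```

Under this model, enabling --human-approval alongside the strict pack changes nothing for MEDIUM and above, because the block check runs first.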

Decision aids — should I turn this on?

Human-in-the-loop (HITL)

When --human-approval is worth it. OpenClaw approvals reach chat-origin sessions via the bundled plugin, not just the TUI.

Mode + judge recipes

Side-by-side bash for observe / action / action+HITL / action+judge — copy-paste ready.

Defaults & rule packs

What permissive / default / strict actually ship, and which one matches your risk tolerance.

Interactive wizard

Animated terminal demo of the prompt-by-prompt setup flow — the safest path the first time.

Not sure what to pick? Run defenseclaw setup guardrail (no flags) — the interactive wizard walks you through every choice with safe defaults pre-selected and inline help. The Prompt → flag mapping table gives you the CI-shaped command for the same configuration.

Files DefenseClaw will modify

openclaw.json (plugin allow / load entries)

A hash-checked backup of openclaw.json is stored before edits; teardown restores or surgically removes only DefenseClaw-owned entries.
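The backup-then-surgical-removal idea can be illustrated with a small sketch. Everything specific here is an assumption: the config shape (plugins.allow / plugins.load arrays), the "defenseclaw" entry name, and the djb2-style checksum standing in for whatever hash DefenseClaw actually uses. The source also does not say how teardown chooses between restoring and removing, so the two paths are shown as separate functions.

```typescript
// ASSUMPTIONS: config shape, entry name, and checksum are all illustrative.
interface OpenClawPluginConfig {
  plugins: { allow: string[]; load: string[] };
}

const OWNED_ENTRY = "defenseclaw";

// Small non-cryptographic checksum (djb2-style, 32-bit) standing in for the real hash.
function checksum(text: string): number {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) + hash + text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

interface ConfigBackup {
  content: string;
  checksum: number;
}

// Taken before any edit, so teardown has a known-good copy to fall back on.
function takeBackup(content: string): ConfigBackup {
  return { content, checksum: checksum(content) };
}

// A backup is only safe to restore if it still matches its recorded checksum.
function backupIsIntact(backup: ConfigBackup): boolean {
  return checksum(backup.content) === backup.checksum;
}

// Surgical path: drop only the owned entries and keep everything the user added.
function removeOwnedEntries(liveContent: string): string {
  const config = JSON.parse(liveContent) as OpenClawPluginConfig;
  config.plugins.allow = config.plugins.allow.filter((entry) => entry !== OWNED_ENTRY);
  config.plugins.load = config.plugins.load.filter((entry) => entry !== OWNED_ENTRY);
  return JSON.stringify(config, null, 2);
}
```

The surgical path matters when the user has edited openclaw.json after setup: restoring the backup wholesale would discard those edits.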

What the plugin does

01 User → OpenClaw: prompt
02 OpenClaw → Plugin: before fetch (LLM request)
03 Plugin → Gateway: POST /v1/inspect
04 Gateway → Plugin: allow / block / pause
05 Plugin → OpenClaw: forward (or reject)
06 OpenClaw → Plugin: before_tool_call(name, args)
07 Plugin → Gateway: POST /v1/tool/inspect
08 Gateway → Plugin: verdict
09 Plugin → OpenClaw: allow / block / pause
10 OpenClaw → User: response

Three interception points let DefenseClaw inspect every interesting moment in OpenClaw's lifecycle. Plugin is the bundled DefenseClaw plugin; Gateway is defenseclaw-gateway.
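The flow above can be sketched as two hook handlers that both defer to the gateway. This is a simplified illustration, not the bundled plugin's code: only the two endpoint paths come from the flow, while the payload shapes, the InspectFn callback (a synchronous stand-in for the HTTP POST), and the way fail-mode is applied when the gateway cannot answer are assumptions.

```typescript
// Hypothetical types; only the two endpoint paths come from the flow above.
type GatewayVerdict = "allow" | "block" | "pause";
type FailMode = "open" | "closed";

// Stand-in for the HTTP call to the gateway; a real plugin would await a POST.
type InspectFn = (path: string, payload: unknown) => GatewayVerdict;

function inspectOrFail(
  inspect: InspectFn,
  path: string,
  payload: unknown,
  failMode: FailMode,
): GatewayVerdict {
  try {
    return inspect(path, payload);
  } catch {
    // ASSUMPTION: fail-mode decides the outcome when the gateway cannot answer.
    return failMode === "closed" ? "block" : "allow";
  }
}

// Steps 02-05: inspect the outbound LLM request before forwarding it.
function onBeforeFetch(body: string, inspect: InspectFn, failMode: FailMode): GatewayVerdict {
  return inspectOrFail(inspect, "/v1/inspect", { body }, failMode);
}

// Steps 06-09: inspect a tool call before OpenClaw runs it.
function onBeforeToolCall(
  toolName: string,
  args: unknown,
  inspect: InspectFn,
  failMode: FailMode,
): GatewayVerdict {
  return inspectOrFail(inspect, "/v1/tool/inspect", { name: toolName, args }, failMode);
}
```

Routing both hooks through one helper keeps the fail-open or fail-closed decision in a single place, so a gateway outage behaves the same for prompts and tool calls.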

Hook capabilities

Block events

before_tool_call
fetch_request
fetch_response

Native ask events

before_tool_call

OpenClaw supports DefenseClaw approval prompts for tool actions through its bundled plugin. Approvals reach chat-origin sessions directly.
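The capability lists above amount to a small lookup table. A minimal encoding might look like this; the HookEvent type and helper names are hypothetical and only mirror the events listed on this page.

```typescript
// Hypothetical encoding of the capability lists above.
type HookEvent = "before_tool_call" | "fetch_request" | "fetch_response";

const BLOCK_EVENTS: readonly HookEvent[] = ["before_tool_call", "fetch_request", "fetch_response"];
const ASK_EVENTS: readonly HookEvent[] = ["before_tool_call"];

// Every listed event can be blocked.
function canBlock(event: HookEvent): boolean {
  return BLOCK_EVENTS.indexOf(event) !== -1;
}

// Only tool calls have a native approval prompt.
function canAsk(event: HookEvent): boolean {
  return ASK_EVENTS.indexOf(event) !== -1;
}
```

The asymmetry is the point: fetch traffic can be blocked but, per the lists above, only before_tool_call can surface an approval prompt.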

Subprocess policy

sandbox — see SANDBOX.md in the source repo for the full openshell-sandbox setup. The connector wires DefenseClaw into the sandbox's syscall and filesystem policy.

Disable

defenseclaw setup guardrail --disable

Restores ~/.openclaw/openclaw.json from the backup, removes the plugin entries, and stops the proxy.