Claude Code
The Claude Code connector wires the PreToolUse, PostToolUse, UserPromptSubmit, Stop, and PermissionRequest hooks plus the native OTel exporter. Native ask is supported on PreToolUse for human-in-the-loop (HITL) approval.
The Claude Code connector wires DefenseClaw into Anthropic's documented hook surfaces. By default it does not insert a proxy in the data path: Claude Code talks directly to its native upstream, and DefenseClaw inspects via hooks and native OTel. A proxy enters the data path only when enforcement is enabled.
Setup
defenseclaw setup claude-code
The shortcut alias. Pins claw.mode=claudecode, wires hooks and native OTel, and leaves enforcement off.
defenseclaw setup guardrail --connector claudecode --mode action --human-approval --restart
Flips guardrail.claudecode_enforcement_enabled=true so the proxy listener binds and Claude Code's ANTHROPIC_BASE_URL is pointed at the local DefenseClaw proxy (e.g. http://127.0.0.1:4000/c/claudecode). The proxy then forwards inspected, allowed traffic to the real upstream provider via the gateway's Bifrost-backed router. Hooks now consult the gateway for an allow/block decision on every tool call.
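To see which path a given shell is on, one option is to inspect ANTHROPIC_BASE_URL directly. The following is a minimal sketch, assuming the proxy listens on a loopback address as in the example URL above; the function name uses_local_proxy is hypothetical and not part of DefenseClaw.

```shell
# Hypothetical helper, not a DefenseClaw command.
# Succeeds when ANTHROPIC_BASE_URL points at a loopback address,
# which is what the enforcement setup is described as configuring.
uses_local_proxy() {
  case "${ANTHROPIC_BASE_URL:-}" in
    http://127.0.0.1:*|http://localhost:*) return 0 ;;
    *) return 1 ;;
  esac
}

if uses_local_proxy; then
  echo "Claude Code traffic goes through a local proxy: ${ANTHROPIC_BASE_URL}"
else
  echo "Claude Code talks directly to its upstream"
fi
```

An unset or non-loopback value means the hooks-and-OTel-only path is in effect for that shell.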
What this command sets vs. leaves at defaults
The four flags above explicitly set: connector, mode, HITL, and post-setup restart. Every other knob falls back to the values DefenseClaw ships with — schema-defined in internal/config/config.go and documented on the Defaults page.
| Knob | Value when omitted | Flag to override |
|---|---|---|
| Scanner backend | local (bundled regex packs, zero key) | `--scanner-mode local\|remote\|both` |
| Rule pack | unset → built-in baseline (no overlay) | `--rule-pack default\|strict\|permissive` |
| LLM judge | off (regex-only triage) | `--judge-model <model>` plus `--judge-api-key-env` |
| Detection strategy | regex_judge if judge is on, else regex-only | `--detection-strategy regex_only\|regex_judge\|judge_first` |
| HITL minimum severity | HIGH (when --human-approval is on; stored uppercase in config) | `--hilt-min-severity low\|medium\|high\|critical` (case-insensitive) |
| Hook fail-mode | open (allow on guardrail-side failure) | `defenseclaw guardrail fail-mode <open\|closed>` (no flag) |
| Proxy port | 4000 | `--port <int>` |
| Block message | empty (uses built-in copy) | `--block-message "<text>"` |
| Redaction | enabled | `--disable-redaction` (trusted single-tenant only) |
| Verify after setup | on | `--no-verify` |
See the full flag reference for the complete table or run defenseclaw setup guardrail --help.
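The hook fail-mode row is easier to reason about with a concrete shape. The sketch below is illustrative only: hook_decision is a hypothetical wrapper, and the exit-code contract of the wrapped check command (0 = allow, 2 = block, anything else = the guardrail itself failed) is an assumption, not DefenseClaw's documented behavior.

```shell
# Hypothetical wrapper illustrating open vs. closed fail-mode.
# Usage: hook_decision <open|closed> <check-command> [args...]
# Assumed contract for the check command: 0 = allow, 2 = block,
# any other status = the guardrail side failed to produce a verdict.
hook_decision() {
  fail_mode="$1"; shift
  status=0
  "$@" || status=$?
  case "$status" in
    0|2) return "$status" ;;   # a real verdict: pass it through unchanged
    *)
      if [ "$fail_mode" = "closed" ]; then
        return 2               # fail closed: treat the failure as a block
      fi
      return 0                 # fail open: treat the failure as an allow
      ;;
  esac
}
```

With open, a crashed or unreachable guardrail never stops the agent; with closed, it always does. That is the availability-versus-safety trade the setting controls.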
Common variations — pick the recipe that fits your phase
defenseclaw setup guardrail \
--connector claudecode \
--mode observe \
--rule-pack permissive \
--restart
Nothing blocks. Every prompt and tool call lands in ~/.defenseclaw/gateway.jsonl. Run this for at least a week before promoting; see Defaults → tuning by risk tolerance.
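While observe mode is running, a quick look at the audit log shows whether events are arriving. This is a minimal sketch that assumes only what the .jsonl extension implies (one JSON object per line); the helper names are hypothetical and no field names are assumed.

```shell
# Hypothetical helpers for eyeballing the observe-mode audit log.
LOG="$HOME/.defenseclaw/gateway.jsonl"

# Number of recorded events, assuming one JSON object per line.
event_count() { wc -l < "$1" | tr -d ' '; }

# Print the last N events (default 5) exactly as they were written.
last_events() { tail -n "${2:-5}" "$1"; }

if [ -f "$LOG" ]; then
  echo "events so far: $(event_count "$LOG")"
  last_events "$LOG" 5
fi
```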
export DEFENSECLAW_LLM_KEY=<your-key>
defenseclaw setup guardrail \
--connector claudecode \
--mode action \
--human-approval \
--hilt-min-severity high \
--detection-strategy regex_judge \
--judge-model anthropic/claude-sonnet-4-20250514 \
--judge-api-key-env DEFENSECLAW_LLM_KEY \
--restart
Adds the LLM judge as a second pass on regex-flagged prompts. Costs a few cents per turn; cuts false positives meaningfully on semantic jailbreaks regex misses.
defenseclaw policy activate strict
defenseclaw setup guardrail \
--connector claudecode \
--mode action \
--human-approval \
--hilt-min-severity low \
--rule-pack strict \
--restart
Blocks at MEDIUM and above, permits no allow-list bypass, and triggers HITL on every LOW+ event. Pair with the OpenShell sandbox profile and an MCP allow-list for full lockdown.
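The minimum-severity threshold used in these recipes amounts to an ordered comparison over low, medium, high, and critical. A minimal sketch, assuming that ordering and case-insensitive matching as described in the defaults table; the function names are hypothetical and not part of the DefenseClaw CLI.

```shell
# Hypothetical helpers; names are not part of the DefenseClaw CLI.
# Map a severity label (any case) to a rank; fail on unknown labels.
severity_rank() {
  case "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')" in
    LOW) echo 1 ;;
    MEDIUM) echo 2 ;;
    HIGH) echo 3 ;;
    CRITICAL) echo 4 ;;
    *) return 1 ;;
  esac
}

# Succeeds when an event of severity $1 meets the minimum severity $2.
# Returns 2 if either label is not recognised.
needs_approval() {
  event=$(severity_rank "$1") || return 2
  minimum=$(severity_rank "$2") || return 2
  [ "$event" -ge "$minimum" ]
}
```

For example, needs_approval medium HIGH fails, while needs_approval low low succeeds, which is why a low threshold puts every flagged event in front of a reviewer.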
Decision aids — should I turn this on?
Human-in-the-loop (HITL)
When to enable --human-approval, the per-connector ask matrix, and how Claude Code's native PreToolUse ask works.
Mode + judge recipes
Side-by-side bash for observe / action / action+HITL / action+judge — copy-paste ready.
Defaults & rule packs
What permissive / default / strict actually ship, and which one matches your risk tolerance.
Interactive wizard
Animated terminal demo of the prompt-by-prompt setup flow — the safest path the first time.
Not sure what to pick? Run defenseclaw setup guardrail (no flags) — the interactive wizard walks you through every choice with safe defaults pre-selected and inline help. Once you're happy with the setup, the Prompt → flag mapping table gives you the CI-shaped command for the same configuration.
Files DefenseClaw will modify
DefenseClaw stores a hash-checked backup of settings.json before edits. Teardown restores it byte-for-byte; if the file drifted, only DefenseClaw-owned entries are surgically removed.
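The general idea of a hash-checked backup can be sketched in a few lines of shell. This is an illustration of the technique, not DefenseClaw's implementation: the helper names, the .bak and .sha256 suffixes, and the use of sha256sum are all assumptions, and the surgical removal of owned entries from a drifted file is not covered.

```shell
# Hypothetical helpers illustrating a hash-checked backup.
# Assumes GNU coreutils sha256sum is available.
sha() { sha256sum "$1" | cut -d' ' -f1; }

# Copy the file aside and record the backup's checksum next to it.
backup_file() {
  cp -p "$1" "$1.bak"
  sha "$1.bak" > "$1.bak.sha256"
}

# Restore byte-for-byte, but only if the backup still matches
# the checksum recorded when it was taken.
restore_file() {
  [ "$(sha "$1.bak")" = "$(cat "$1.bak.sha256")" ] || return 1
  cp -p "$1.bak" "$1"
}
```

Calling backup_file on a settings file before editing it, then restore_file on teardown, returns the original bytes unless the backup itself was altered in between.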
Hook capabilities
Block events
- PreToolUse
- PostToolUse
- UserPromptSubmit
- Stop
- PermissionRequest
Native ask events
- PreToolUse
Claude Code is one of the few connectors that supports native PreToolUse ask. HITL approvals surface inside the agent UI itself, so the operator never has to leave Claude Code to decide.
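For orientation, a PreToolUse hook can request an ask by printing a JSON decision on stdout. The sketch below assumes the hookSpecificOutput shape from Anthropic's Claude Code hooks reference; the emit_ask function and the reason text are hypothetical, and this is not the hook script DefenseClaw installs.

```shell
# Hypothetical PreToolUse hook body: defer the decision to the operator.
# The reason string must not contain double quotes or backslashes,
# because this sketch does no JSON escaping.
emit_ask() {
  printf '{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"ask","permissionDecisionReason":"%s"}}\n' "$1"
}

emit_ask "Flagged at HIGH severity; review before running"
```

The reason string is what the operator sees alongside the approval prompt inside the agent UI.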
Telemetry channels at boot
When to enable enforcement
The enforcement flag (guardrail.claudecode_enforcement_enabled) gates the proxy-redirect path, which covers three things: pointing Claude Code's ANTHROPIC_BASE_URL at the local proxy (which then forwards upstream via Bifrost), the reserved-id strip, and the subprocess sandbox. Until you flip it on, observability runs end-to-end via hooks and OTel without any data-path modification.
Flip it on when:
- The audit log has at least a week of data and you've tuned the rule pack.
- HITL is on for HIGH so risky calls pause for review instead of blocking outright.
- Operators have access to either the agent UI (for native PreToolUse ask) or the DefenseClaw TUI.
Disable
defenseclaw setup guardrail --disable