Codex
The Codex connector wires DefenseClaw into Codex's documented config.toml hooks (UserPromptSubmit, PreToolUse, PermissionRequest, PostToolUse, Stop, SessionStart), its native OpenTelemetry exporter, and the notify bridge for agent-turn-complete events.
Setup
defenseclaw setup codex
Pins claw.mode=codex, wires hooks + OTel + notify bridge, and leaves enforcement off.
defenseclaw setup guardrail --connector codex --mode action --restart
Flips guardrail.codex_enforcement_enabled=true so the proxy listener binds and Codex's openai_base_url (in ~/.codex/config.toml) points at the local DefenseClaw proxy (e.g. http://127.0.0.1:4000/c/codex). The proxy then forwards inspected, allowed traffic to the real upstream provider via the gateway's Bifrost-backed router.
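One way to confirm the redirect took effect is to read the value back out of config.toml. The sketch below is illustrative only: codex_base_url is a hypothetical helper name, and it assumes openai_base_url is written as a plain double-quoted key = "value" line, which may not hold for every config layout.

```shell
# Hypothetical helper (not part of DefenseClaw): print the base URL that
# Codex is currently configured to use.
# Assumes a line of the form: openai_base_url = "http://..."
codex_base_url() {
  config="${1:-$HOME/.codex/config.toml}"
  # Print only the quoted value of the openai_base_url key, if present.
  sed -n 's/^[[:space:]]*openai_base_url[[:space:]]*=[[:space:]]*"\([^"]*\)".*$/\1/p' "$config"
}
```

Calling codex_base_url with no argument reads ~/.codex/config.toml. With enforcement on, the printed value should be a loopback address such as the example above; empty output suggests the key is not set in that form.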
What this command sets vs. leaves at defaults
The three flags above explicitly set the connector, the mode, and the post-setup restart. Every other knob falls back to the values DefenseClaw ships with, which are schema-defined in internal/config/config.go and documented on the Defaults page.
| Knob | Value when omitted | Flag to override |
|---|---|---|
| Scanner backend | local (bundled regex packs, zero key) | --scanner-mode local\|remote\|both |
| Rule pack | unset → built-in baseline (no overlay) | --rule-pack default\|strict\|permissive |
| LLM judge | off (regex-only triage) | --judge-model <model> plus --judge-api-key-env |
| Detection strategy | regex_judge if judge is on, else regex-only | --detection-strategy regex_only\|regex_judge\|judge_first |
| HITL | off (no operator approval prompts) | --human-approval plus --hilt-min-severity ... |
| Hook fail-mode | open (allow on guardrail-side failure) | defenseclaw guardrail fail-mode <open\|closed> (no flag) |
| Proxy port | 4000 | --port <int> |
| Block message | empty (uses built-in copy) | --block-message "<text>" |
| Redaction | enabled | --disable-redaction (trusted single-tenant only) |
| Verify after setup | on | --no-verify |
See the full flag reference for the complete table or run defenseclaw setup guardrail --help.
Common variations — pick the recipe that fits your phase
defenseclaw setup guardrail \
--connector codex \
--mode observe \
--rule-pack permissive \
--restart
Nothing blocks. Every prompt and tool call lands in ~/.defenseclaw/gateway.jsonl. Run this for at least a week before promoting; see Defaults → tuning by risk tolerance.
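During the observation window it can help to check that the audit log is actually accumulating records. The following is a minimal sketch: gateway_log_summary is a hypothetical helper name, and it assumes only that the file holds one JSON record per line (as the .jsonl extension suggests), making no assumption about the fields inside each record.

```shell
# Hypothetical helper (not part of DefenseClaw): summarize the gateway log.
gateway_log_summary() {
  log="${1:-$HOME/.defenseclaw/gateway.jsonl}"
  # One record per line, so the line count is the record count.
  # tr strips the padding some wc implementations add.
  printf 'records: %s\n' "$(wc -l < "$log" | tr -d ' ')"
  # Show the three most recent records for a quick eyeball check.
  tail -n 3 "$log"
}
```

Running gateway_log_summary at the start and end of a Codex session gives a rough sense of how much traffic is being captured before you promote to action mode.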
defenseclaw setup guardrail \
--connector codex \
--mode action \
--human-approval \
--hilt-min-severity high \
--restart
HIGH findings pause for operator approval; CRITICAL still blocks unconditionally. Codex has no native ask surface, so HITL approvals downgrade to confirm verdicts in the DefenseClaw TUI / audit log. Make sure operators are reachable there. See the HITL page for the full per-connector matrix.
export DEFENSECLAW_LLM_KEY=<your-key>
defenseclaw setup guardrail \
--connector codex \
--mode action \
--human-approval \
--hilt-min-severity high \
--detection-strategy regex_judge \
--judge-model anthropic/claude-sonnet-4-20250514 \
--judge-api-key-env DEFENSECLAW_LLM_KEY \
--restart
Adds the LLM judge as a second pass on regex-flagged prompts. It costs a few cents per turn and meaningfully cuts false positives on the semantic jailbreak cases that regex alone handles poorly.
defenseclaw policy activate strict
defenseclaw setup guardrail \
--connector codex \
--mode action \
--human-approval \
--hilt-min-severity low \
--rule-pack strict \
--restart
Blocks ≥ MEDIUM, permits no allow-list bypass, and applies HITL to every LOW+ event. Pair with the OpenShell sandbox profile and an MCP allow-list for full lockdown.
Decision aids — should I turn this on?
Human-in-the-loop (HITL)
When --human-approval is worth it. Codex downgrades to confirm verdicts since it has no native ask — operators see them in the TUI / audit log.
Mode + judge recipes
Side-by-side bash for observe / action / action+HITL / action+judge — copy-paste ready.
Defaults & rule packs
What permissive / default / strict actually ship, and which one matches your risk tolerance.
Interactive wizard
Animated terminal demo of the prompt-by-prompt setup flow — the safest path the first time.
Not sure what to pick? Run defenseclaw setup guardrail (no flags) — the interactive wizard walks you through every choice with safe defaults pre-selected and inline help. The Prompt → flag mapping table gives you the CI-shaped command for the same configuration.
Files DefenseClaw will modify
The [hooks], [otel], and [notify] blocks are owned by DefenseClaw; everything else in config.toml is preserved verbatim.
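Because setup rewrites parts of config.toml, it can be useful to snapshot the file first and diff it afterwards to see exactly which blocks changed. This is a minimal sketch using standard cp and diff; the helper names and the .pre-defenseclaw suffix are hypothetical choices, not something DefenseClaw creates or expects.

```shell
# Hypothetical helpers (not part of DefenseClaw).

# Copy config.toml aside before running setup, preserving mode and timestamps.
snapshot_codex_config() {
  config="${1:-$HOME/.codex/config.toml}"
  cp -p "$config" "$config.pre-defenseclaw"
}

# Show a unified diff between the snapshot and the current file.
# diff exits 0 when the files match and 1 when they differ.
diff_codex_config() {
  config="${1:-$HOME/.codex/config.toml}"
  diff -u "$config.pre-defenseclaw" "$config"
}
```

Run snapshot_codex_config before setup and diff_codex_config after it. If the ownership statement above holds, the diff should be confined to the [hooks], [otel], and [notify] blocks, plus openai_base_url once enforcement is enabled.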
Hook capabilities
Block events
- UserPromptSubmit
- PreToolUse
- PermissionRequest
- PostToolUse
- Stop
- SessionStart
Native ask events
None. Codex has no native ask surface here. Confirm verdicts are downgraded with raw_action preserved so operators can review the original action in the TUI / OpenClaw plugin.
Telemetry channels at boot
When to enable enforcement
guardrail.codex_enforcement_enabled gates the proxy-redirect path: pointing openai_base_url at the local proxy (which then proxies upstream via Bifrost), the reserved-id strip, and the subprocess sandbox. Observability runs end-to-end without it. Flip it on when:
- The Codex audit log has stable signal (tune your rule pack first).
- You have a working HITL escalation path — Codex falls back to confirm verdicts so operators need to be reachable.
Disable
defenseclaw setup guardrail --disable
Claude Code
Claude Code connector wires PreToolUse, PostToolUse, UserPromptSubmit, Stop, and PermissionRequest hooks plus the native OTel exporter. Native ask is supported on PreToolUse for HITL.
OpenClaw
The reference proxy connector. DefenseClaw ships a TypeScript plugin that wires OpenClaw's fetch interceptor and before_tool_call hook directly into the gateway.