Stop Claude Code from running rm -rf

Wire DefenseClaw into Claude Code, observe for a week, then promote to action mode and watch a destructive shell command never reach the disk.

"I want to observe my Claude Code, and make sure it doesn't run rm -rf on my home directory."

That sentence is the original DefenseClaw user story. Here is the entire flow, end to end.

Outcome

By the end of this story, you'll have:

  • Claude Code wired to DefenseClaw via native hooks + OTel.
  • A week of audit data in ~/.defenseclaw/audit.db proving what your agent actually does.
  • An action mode policy that blocks rm -rf <home subtree> with a CRITICAL finding.
  • An audit row showing the block, with the rule reference for the next reviewer.

Wire Claude Code (observability only)

defenseclaw setup claude-code

This is the observability-only alias. Claude Code talks directly to its native upstream. DefenseClaw inspects via PreToolUse / PostToolUse / UserPromptSubmit / Stop / PermissionRequest hooks plus the native OTel exporter.

Watch the audit log fill up

Open the live dashboard:

defenseclaw tui

Or stream the JSONL fan-out from the gateway:

tail -f ~/.defenseclaw/gateway.jsonl | jq 'select(.connector == "claudecode")'

Open Claude Code and use it normally. Every prompt, tool call, and response shows up. Run this for a week — you want a real picture of your agent's behaviour before the first block decision.
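At the end of the week, a per-severity tally is often more useful than the raw stream. The following is a minimal sketch, not a DefenseClaw tool: it assumes each line of gateway.jsonl is a JSON object carrying the `connector` and `severity` fields that the jq filters in this story rely on, and the `summarize` helper and the `UNSET` label are hypothetical names.

```python
import json
from collections import Counter
from pathlib import Path


def summarize(lines, connector="claudecode"):
    """Count events per severity for one connector.

    Assumes one JSON object per line with `connector` and `severity`
    keys. Events without a severity are counted under "UNSET".
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written trailing line
        if not isinstance(event, dict):
            continue
        if event.get("connector") != connector:
            continue
        counts[event.get("severity", "UNSET")] += 1
    return counts


if __name__ == "__main__":
    log = Path.home() / ".defenseclaw" / "gateway.jsonl"
    if log.exists():
        with log.open() as fh:
            for severity, n in summarize(fh).most_common():
                print(f"{severity:10} {n}")
```

A tally dominated by low-severity events with a handful of HIGH entries gives you a baseline to compare against once blocking is switched on.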

Inspect the rule pack

ls ~/.defenseclaw/policies/guardrail/default/
cat ~/.defenseclaw/policies/guardrail/default/shell.yaml

The default pack ships with a CRITICAL rule for recursive deletes against home-directory paths. The strict pack adds dot-file targeting and /var/lib/*; the permissive pack drops to HIGH and only matches ~/.
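To make the idea concrete, here is a rough sketch of the kind of check such a rule performs. It is not the contents of shell.yaml: the `is_recursive_home_delete` helper, the list of home prefixes, and the tokenizing approach are all illustrative assumptions, and a real rule pack likely covers more shell syntax (sudo, pipelines, command chaining) than this does.

```python
import shlex

# Hypothetical list of prefixes that indicate a home-directory path.
HOME_PREFIXES = ("~", "$HOME", "${HOME}", "/home/", "/Users/")


def is_recursive_home_delete(command: str) -> bool:
    """Return True for a plain `rm` with a recursive flag and a home target."""
    try:
        tokens = shlex.split(command)  # no tilde or variable expansion happens here
    except ValueError:
        return False  # unbalanced quotes; a real rule might flag this instead
    if not tokens or tokens[0] != "rm":
        return False

    recursive = False
    targets = []
    for tok in tokens[1:]:
        if tok.startswith("--"):
            recursive = recursive or tok == "--recursive"
        elif tok.startswith("-") and len(tok) > 1:
            # Short flags may be bundled, e.g. -rf or -Rf.
            recursive = recursive or "r" in tok or "R" in tok
        else:
            targets.append(tok)

    return recursive and any(t.startswith(HOME_PREFIXES) for t in targets)


if __name__ == "__main__":
    for cmd in ("rm -rf ~/projects/old-experiments", "rm ~/notes.txt"):
        print(cmd, "->", is_recursive_home_delete(cmd))
```

The check requires both a recursive flag and a home-directory target, so neither `rm ~/notes.txt` nor `rm -rf /tmp/build` matches.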

Promote to action mode with HITL

defenseclaw setup guardrail \
  --connector claudecode \
  --mode action \
  --rule-pack default \
  --human-approval \
  --hilt-min-severity high \
  --restart

Three flags matter here:

  • --mode action — CRITICAL findings now block.
  • --human-approval — HIGH findings pause for operator approval.
  • --hilt-min-severity high — MEDIUM and LOW still log; only HIGH and above prompt.

Claude Code supports native PreToolUse ask, so the approval prompt surfaces inside the agent UI itself.
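The interaction of the three flags can be summarized as a small decision function. This is a sketch of the behaviour described above, not DefenseClaw's implementation: the `decide` function, the `Severity` enum, and the outcome strings are hypothetical, and the treatment of HIGH findings when human approval is off (log only) is an assumption.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def decide(severity, mode="action", human_approval=True,
           min_prompt=Severity.HIGH):
    """Map a finding's severity to "block", "ask", or "log"."""
    if mode != "action":
        return "log"  # observability only: record, never interfere
    if severity == Severity.CRITICAL:
        return "block"  # --mode action
    if human_approval and severity >= min_prompt:
        return "ask"  # --human-approval with the minimum-severity threshold
    return "log"  # assumption: everything else is only recorded


if __name__ == "__main__":
    for sev in Severity:
        print(sev.name, "->", decide(sev))
```

With the flags used in this story, CRITICAL maps to block, HIGH to ask, and MEDIUM and LOW to log.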

Trigger the rule

In Claude Code:

Please run rm -rf ~/projects/old-experiments to free disk space.

The PreToolUse hook fires, the gateway scores the command, the default rule pack flags it CRITICAL, and the hook returns block. Claude Code never executes the command and surfaces the block reason to the user.
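On the Claude Code side, a PreToolUse hook communicates its decision as JSON on stdout, using the `hookSpecificOutput` object with a `permissionDecision` of `allow`, `deny`, or `ask`. The sketch below shows how a gateway verdict might be translated into that shape. The verdict dictionary (`decision`, `reason`) and the `to_pretooluse_output` helper are assumptions made for illustration; the source does not show DefenseClaw's wire format.

```python
import json

# Hypothetical gateway decisions mapped to Claude Code permission decisions.
_DECISION_MAP = {"block": "deny", "ask": "ask"}


def to_pretooluse_output(verdict: dict) -> str:
    """Render a gateway verdict as Claude Code PreToolUse hook output."""
    # Anything that is not an explicit block or ask is allowed in this sketch.
    decision = _DECISION_MAP.get(verdict.get("decision"), "allow")
    return json.dumps({
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": decision,
            "permissionDecisionReason": verdict.get("reason", ""),
        }
    })


if __name__ == "__main__":
    print(to_pretooluse_output({
        "decision": "block",
        "reason": "recursive delete targeting user home subtree",
    }))
```

Treating unknown decisions as `allow` is a fail-open choice made here for brevity; a stricter hook could map them to `ask` instead.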

Confirm in the audit log

defenseclaw alerts --limit 10
# or live-tail the JSONL fan-out filtered by severity:
tail -f ~/.defenseclaw/gateway.jsonl \
  | jq 'select(.connector == "claudecode" and (.severity == "HIGH" or .severity == "CRITICAL"))'

The blocked command shows up as an entry like this:

2026-05-08T14:02:11Z  CRITICAL  shell.dangerous-rm  blocked
  command:    rm -rf ~/projects/old-experiments
  reason:     recursive delete targeting user home subtree
  rule:       policies/guardrail/default/shell.yaml:14
  connector:  claudecode
  hook:       PreToolUse
  decision:   block
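
The `rule:` field points a reviewer at the exact rule that fired. A small helper can split that reference into a file path and a line number. This is a hypothetical sketch: it assumes the reference is relative to ~/.defenseclaw (which matches the rule pack location shown earlier) and uses a `path:line` format, as in the entry above.

```python
from pathlib import Path


def resolve_rule_ref(ref: str, root: Path = Path.home() / ".defenseclaw"):
    """Split "path/to/rule.yaml:14" into (absolute path, line number).

    Returns a line number of None when the reference has no numeric suffix.
    """
    path_part, sep, line_part = ref.rpartition(":")
    if not sep or not line_part.isdigit():
        return root / ref, None
    return root / path_part, int(line_part)


if __name__ == "__main__":
    path, line = resolve_rule_ref("policies/guardrail/default/shell.yaml:14")
    print(path, line)
```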

Why this works the way it does

  1. User → Claude Code: rm -rf ~/projects/old-experiments
  2. Claude Code → PreToolUse: PreToolUse(command)
  3. PreToolUse → Gateway: POST /api/v1/claudecode/hook
  4. Gateway → Policy: evaluate(command)
  5. Policy → Gateway: CRITICAL · shell.dangerous-rm
  6. Gateway → PreToolUse: block · recursive delete
  7. PreToolUse → Claude Code: block
  8. Claude Code → User: I can't run that. DefenseClaw blocked it.

The block decision is made by the gateway, not Claude Code. The hook is the messenger.
