Troubleshooting — DefenseClaw

Overview

Most problems belong to one subsystem, and each subsystem has its own focused troubleshooting page.

This page covers issues that span subsystems. If you already know which subsystem is failing, start with that subsystem's page.

First commands to run

defenseclaw doctor
defenseclaw status
defenseclaw-gateway status
sqlite3 ~/.defenseclaw/audit.db \
  "SELECT timestamp, action, severity FROM audit_events ORDER BY timestamp DESC LIMIT 10;"
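To see at a glance which of the status checks above is unhealthy, a small wrapper can run them in sequence. This is a minimal sketch: `run_checks` is a hypothetical helper, and it assumes each command exits non-zero when it detects a problem, which you should confirm for your version.

```shell
# Run each check and report which ones exit non-zero.
# Assumption: an unhealthy check exits non-zero; verify this for your version.
run_checks() {
  local failed=0 cmd
  for cmd in "$@"; do
    if sh -c "$cmd" >/dev/null 2>&1; then
      printf 'ok    %s\n' "$cmd"
    else
      printf 'FAIL  %s\n' "$cmd"
      failed=1
    fi
  done
  return "$failed"
}

# Example invocation:
# run_checks "defenseclaw doctor" "defenseclaw status" "defenseclaw-gateway status"
```

The function returns 1 if any check failed, so it can gate a follow-up step in a script.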

If those commands all look healthy and you still have a problem, the issue is likely at a subsystem boundary: sidecar configuration, audit persistence, sink delivery, or TUI filtering.

Boundary issues

Guardrail sees traffic but nothing in the audit store

  • Confirm audit_db points at the database you are querying: defenseclaw config show --format json.
  • The gateway process may have a stale audit DB handle after a crash: restart the sidecar with your supervisor.
  • Disk full: df -h ~/.defenseclaw/ — the sidecar stops writing when the partition is full.
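The disk-full check can be scripted so it fails loudly instead of relying on reading `df` output by eye. This is a sketch only: `free_mb` and `check_free_space` are hypothetical helpers, and the 500 MB default threshold is an illustrative value, not a DefenseClaw setting.

```shell
# Print the space available (in MB) on the filesystem that holds a directory.
free_mb() {
  # df -Pk: POSIX output format, 1024-byte blocks; column 4 is "Available".
  df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print int($4 / 1024) }'
}

# Fail when free space under a directory drops below a minimum (default 500 MB).
check_free_space() {
  local dir=$1 min_mb=${2:-500} avail
  avail=$(free_mb "$dir")
  if [ -z "$avail" ]; then
    echo "error: could not read free space for ${dir}"
    return 2
  fi
  if [ "$avail" -lt "$min_mb" ]; then
    echo "LOW: ${avail} MB free under ${dir} (minimum ${min_mb} MB)"
    return 1
  fi
  echo "ok: ${avail} MB free under ${dir}"
}

# Example invocation:
# check_free_space ~/.defenseclaw 500
```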

Audit store has the verdict but the TUI doesn't show it

  • Restart the TUI so it re-reads local state.
  • Check panel filters before assuming the row is missing.
  • Export a small audit slice with defenseclaw-gateway audit export --limit 20 and compare it with the panel.

Policy reload returns success, but decisions don't change

  • Confirm you edited the active policy directory, not a stale copy under another DEFENSECLAW_HOME.
  • Re-run the relevant policy validation/test command from Policy testing.
  • Retest with content that should match the new rule; additive rules only change decisions for matching traffic.

Sandbox runs but violations don't appear in Splunk

  • First confirm the local audit database has the sandbox-related action.
  • Then run defenseclaw setup observability list and defenseclaw setup observability test splunk-main.
  • Check the sink actions and min_severity filters in audit_sinks[].

Clock skew

Timestamps matter. If your sinks are rejecting events with "too old" errors:

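timedatectl status        # Linux: shows whether the system clock is NTP-synchronized
sntp -sS time.apple.com   # macOS: syncs the clock against Apple's NTP server (needs root)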

Many SaaS sinks reject events when clock drift exceeds 5 minutes.
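If you can obtain a reference timestamp (for example from an NTP query or a sink's HTTP `Date` header), the 5-minute rule reduces to a simple comparison of Unix timestamps. The helper below is a hypothetical sketch of that arithmetic; how you fetch the reference time is left to your environment.

```shell
# Report whether two Unix timestamps differ by more than a limit (default 300 s).
# Returns 0 when the skew exceeds the limit, 1 otherwise.
skew_exceeds() {
  local local_ts=$1 remote_ts=$2 limit=${3:-300} diff
  diff=$((local_ts - remote_ts))
  if [ "$diff" -lt 0 ]; then
    diff=$((-diff))
  fi
  echo "skew: ${diff}s"
  [ "$diff" -gt "$limit" ]
}

# Example invocation (remote_epoch is a value you supply):
# skew_exceeds "$(date +%s)" "$remote_epoch" && echo "clock skew above 5 minutes"
```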

Disk pressure

The gateway becomes disk-bound when audit.db or gateway.jsonl grows too large:

  • gateway.jsonl uses lumberjack defaults from internal/gatewaylog/writer.go: 50 MB, 5 backups, 30 days, compressed.
  • The audit DB is SQLite WAL-backed. Keep the data directory on a filesystem with enough free space and back it up like other local state.
  • Use defenseclaw-gateway audit export --limit 1000 --output audit-events.jsonl before pruning or archiving externally.
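The rotation defaults above imply a rough ceiling for gateway.jsonl and its backups: one active file plus the retained backups, each up to the maximum size. The sketch below works through that arithmetic and adds a helper for checking a file's current size. Both function names are hypothetical, and because backups are compressed, real usage should sit below the computed bound.

```shell
# Upper bound on disk used by a rotated log: one active file plus its backups.
# Compressed backups are normally smaller, so real usage sits below this.
max_log_mb() {
  local max_size_mb=$1 max_backups=$2
  echo $(( max_size_mb * (max_backups + 1) ))
}

# Size of a file in whole MB (rounded up), or 0 if it does not exist.
file_mb() {
  [ -f "$1" ] || { echo 0; return 0; }
  du -k "$1" | awk '{ print int(($1 + 1023) / 1024) }'
}

max_log_mb 50 5   # documented defaults: 50 MB, 5 backups; prints 300

# Example invocation:
# file_mb ~/.defenseclaw/audit.db
```

The audit DB has no comparable built-in ceiling in the text above, so `file_mb` on audit.db is the figure to watch over time.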

Memory pressure

  • The webhook dispatcher caps concurrent deliveries at 20 and suppresses duplicate target/action pairs during the cooldown window.
  • The gatewaylog writer writes JSONL synchronously, then runs fanout callbacks outside the writer lock.
  • OTel buffers are owned by the OTel SDK/exporter configuration.

If the sidecar uses more than 200 MB RSS at steady state, investigate: that is well above the expected footprint.
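To compare the sidecar's resident memory against the 200 MB guideline, you can read RSS from `ps` (reported in KB) and convert it. This is a sketch: `rss_check` is a hypothetical helper, and the process name in the usage comment is an assumption that may not match how your supervisor launches the sidecar.

```shell
# Convert an RSS value in KB (as printed by `ps -o rss=`) to MB
# and compare it with a limit (default 200 MB).
rss_check() {
  local rss_kb=$1 limit_mb=${2:-200} rss_mb
  rss_mb=$((rss_kb / 1024))
  if [ "$rss_mb" -gt "$limit_mb" ]; then
    echo "HIGH: ${rss_mb} MB RSS (limit ${limit_mb} MB)"
    return 1
  fi
  echo "ok: ${rss_mb} MB RSS"
}

# Example invocation (assumes the process is named defenseclaw-gateway):
# pid=$(pgrep -o -f defenseclaw-gateway) && rss_check "$(ps -o rss= -p "$pid")"
```

Sample it several times over a few minutes; a single reading during a burst of traffic does not represent steady state.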

When to open a bug

Before opening a GitHub issue, run:

defenseclaw doctor --json-output > doctor.json
defenseclaw config show --format json > config.redacted.json
defenseclaw-gateway audit export --limit 1000 --output audit-events.jsonl
tail -n 1000 ~/.defenseclaw/gateway.jsonl > gateway.tail.jsonl

Review those files for deployment-specific details such as hostnames, tokens, and internal URLs, redact anything sensitive, and then attach them to the issue.
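A quick keyword scan can help with reviewing the collected files before they leave your environment. The helper below is a hypothetical sketch with an illustrative keyword list; it does not replace reading the files, and a clean result does not prove they are safe to share.

```shell
# List lines (with line numbers) that look like they carry credentials.
# The keyword list is illustrative; extend it for your deployment.
# Exit status follows grep: 0 if anything matched, 1 if nothing matched.
scan_for_secrets() {
  grep -n -i -E 'token|secret|password|api[_-]?key|authorization' "$@"
}

# Example invocation:
# scan_for_secrets doctor.json config.redacted.json audit-events.jsonl gateway.tail.jsonl
```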
