Tune PII false positives — DefenseClaw

Problem

The default PII rule pack is aggressive by design, because real PII exfiltration is the most common failure class. The first 72 hours in mode: observe typically produce a batch of false positives from:

  • Addresses in sales reply templates
  • Test credit-card numbers (4111-1111-1111-1111)
  • Public sample datasets
  • Documentation referencing emails that aren't actually PII

You want to suppress these while still catching real exfil.

Solution

Step 1: Triage

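Export about two days of observe findings (the --limit value below caps the export at 5,000 records; raise it if your traffic needs more):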

defenseclaw-gateway audit export --limit 5000 --output /tmp/audit.jsonl
jq 'select(.details | test("pii:"; "i"))' /tmp/audit.jsonl > /tmp/pii-sample.jsonl

Review the 50 most frequent findings:

jq -r '.details' /tmp/pii-sample.jsonl | sort | uniq -c | sort -rn | head -50
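The jq pipeline above groups on the full details string, so two records for the same finding with slightly different wording land in separate buckets. A minimal Python sketch of grouping by finding ID instead is shown here. It assumes each exported record has a string details field that embeds IDs such as pii:credit-card (the same field the jq filter reads); the sample records and the count_findings helper are hypothetical.

```python
import json
import re
from collections import Counter

# Assumption: finding IDs appear inside the "details" string as "pii:<name>".
FINDING_RE = re.compile(r"pii:[a-z0-9_-]+", re.IGNORECASE)

def count_findings(lines):
    """Count PII finding IDs across an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        details = str(record.get("details") or "")
        # Count each distinct finding ID once per record.
        for finding in {m.lower() for m in FINDING_RE.findall(details)}:
            counts[finding] += 1
    return counts

# Hypothetical sample records, for illustration only.
sample = [
    '{"details": "pii:credit-card in prompt"}',
    '{"details": "pii:credit-card detected (HIGH)"}',
    '{"details": "PII:email in completion"}',
    '{"details": "secret:aws-key"}',
]

counts = count_findings(sample)
for finding, n in counts.most_common(50):
    print(f"{n:6d}  {finding}")
```

To run it against the real export, pass an open file instead of the sample list, for example count_findings(open("/tmp/pii-sample.jsonl")).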

Step 2: Classify

For each top reason, decide:

  • True positive — real exfil or leak. Leave alone.
  • False positive from specific context — strip, don't drop. E.g., test CC numbers.
  • False positive from specific source — suppress, keep audit. E.g., known sales reply template.

Step 3: Write suppressions

# ~/.defenseclaw/policy/guardrail/default/suppressions.yaml
- id: suppress-test-credit-cards
  type: strip
  reason: "Test CC numbers are not PII"
  direction: prompt
  match_finding: pii:credit-card
  replace:
    regex: '4111[-\s]?1111[-\s]?1111[-\s]?1111'
    with: "[TEST-CC]"

- id: suppress-sample-ssn
  type: strip
  reason: "Sample SSNs are not PII"
  direction: prompt
  match_finding: pii:ssn
  replace:
    regex: '000-00-0000|123-45-6789'
    with: "[SAMPLE-SSN]"

- id: suppress-docs-emails
  type: finding
  reason: "Support mailbox, not user PII"
  match_finding: pii:email
  match_content_regex: 'support@(example|your-company)\.com'

- id: suppress-public-address-book
  type: finding
  reason: "Sales templates stored in shared/templates"
  when:
    request_header_matches:
      name: x-dc-session-id
      regex: 'sales-replies-.*'
  match_finding: pii:address

See Suppressions for the full schema.
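Before deploying, it can help to check the regexes against a few known-good and known-bad strings. The sketch below does this with Python's re module. It assumes the gateway's regex engine treats these patterns the same way Python does, which this page does not confirm; the test strings are made up for illustration.

```python
import re

# Patterns copied from the suppressions above.
# Assumption: the gateway's regex flavor agrees with Python's `re` here.
TEST_CC = re.compile(r"4111[-\s]?1111[-\s]?1111[-\s]?1111")
SAMPLE_SSN = re.compile(r"000-00-0000|123-45-6789")
DOCS_EMAIL = re.compile(r"support@(example|your-company)\.com")

# (pattern, input, should it match?) -- illustrative inputs only.
cases = [
    (TEST_CC, "4111-1111-1111-1111", True),
    (TEST_CC, "4111 1111 1111 1111", True),
    (TEST_CC, "4111111111111111", True),
    (TEST_CC, "4222-2222-2222-2222", False),  # any other number stays flagged
    (SAMPLE_SSN, "123-45-6789", True),
    (SAMPLE_SSN, "123-45-6780", False),
    (DOCS_EMAIL, "support@example.com", True),
    (DOCS_EMAIL, "alice@example.com", False),
]

for pattern, text, expected in cases:
    matched = pattern.search(text) is not None
    assert matched == expected, (pattern.pattern, text)

# What a strip-style replacement would produce for one input.
stripped = TEST_CC.sub("[TEST-CC]", "card: 4111 1111 1111 1111, exp 12/30")
print(stripped)  # card: [TEST-CC], exp 12/30
```

The patterns are unanchored, so they also match inside longer strings (for example, a 17-digit number that contains the test card number). Whether that matters depends on how the scanner tokenizes content.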

Step 4: Test

Before reload, dry-run:

curl -s -X POST http://127.0.0.1:18970/v1/guardrail/evaluate \
  -H "Content-Type: application/json" \
  -H "X-DefenseClaw-Client: docs" \
  -d '{"direction":"prompt","mode":"observe","scanner_mode":"local","local_result":{"action":"block","severity":"HIGH","reason":"pii:credit-card"},"content_length":42}' | jq .

Expected: a guardrail policy decision for the supplied local scan result. For end-to-end content stripping, send a real prompt through the guardrail proxy on port 4000 after reload.

Step 5: Reload

defenseclaw-gateway policy reload

The reload is atomic — in-flight requests still see the old suppressions; new requests see the new ones.

Step 6: Watch

Monitor the ratio:

index=defenseclaw scope="guardrail" findings{}="pii:*"
| stats count(eval(suppressed)) as suppressed count as total by finding
| eval ratio = round(suppressed / total * 100, 1)

Aim for a suppression ratio of ≤ 20% per finding. A higher ratio means the suppressions are effectively disabling the rule, and the rule itself should be rewritten instead.
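As a worked example of the ratio arithmetic, the following sketch computes the same per-finding percentage offline and flags findings above the 20% budget. It assumes each event carries a finding field and a boolean suppressed field, mirroring the names used in the query; the event data and the suppression_ratios helper are hypothetical.

```python
from collections import defaultdict

BUDGET_PERCENT = 20.0  # threshold taken from the guidance above

def suppression_ratios(events):
    """Return {finding: percent of events suppressed}, rounded to one decimal."""
    totals = defaultdict(int)
    suppressed = defaultdict(int)
    for event in events:
        finding = event["finding"]
        totals[finding] += 1
        if event.get("suppressed"):
            suppressed[finding] += 1
    return {f: round(suppressed[f] / totals[f] * 100, 1) for f in totals}

# Hypothetical events: 3 of 20 credit-card findings and 9 of 30 email
# findings were suppressed.
events = (
    [{"finding": "pii:credit-card", "suppressed": True}] * 3
    + [{"finding": "pii:credit-card", "suppressed": False}] * 17
    + [{"finding": "pii:email", "suppressed": True}] * 9
    + [{"finding": "pii:email", "suppressed": False}] * 21
)

ratios = suppression_ratios(events)
over_budget = sorted(f for f, r in ratios.items() if r > BUDGET_PERCENT)
print(ratios)
print(over_budget)
```

Here pii:credit-card comes out at 15.0% (within budget) and pii:email at 30.0% (over budget), so the email rule is the one to rewrite rather than suppress further.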

Anti-patterns

  • Don't suppress a whole rule family (match_finding: pii:*). Always scope as narrowly as possible.
  • Don't suppress by wildcard content (match_content_regex: '.*'). That's a functional disable.
  • Don't skip the reason field. Future-you will not remember why.
  • Don't suppress CRITICAL findings without elevating to blocked.
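The first three anti-patterns are mechanical enough to check before a reload. The sketch below lints suppression rules that have already been parsed into Python dictionaries (for example, with a YAML parser). The lint_suppressions helper is hypothetical and not part of DefenseClaw, and it does not cover the CRITICAL-severity rule, which needs severity data that the suppression entries on this page do not carry.

```python
def lint_suppressions(rules):
    """Return (rule id, problem) tuples for the anti-patterns listed above."""
    problems = []
    for rule in rules:
        rule_id = rule.get("id", "<no id>")

        # Anti-pattern 1: suppressing a whole rule family, e.g. pii:*
        if str(rule.get("match_finding", "")).endswith("*"):
            problems.append((rule_id, "wildcard match_finding"))

        # Anti-pattern 2: a content regex that matches everything
        content_regex = rule.get("match_content_regex")
        if content_regex is not None and content_regex.strip("^$") in ("", ".*", ".+"):
            problems.append((rule_id, "match-everything content regex"))

        # Anti-pattern 3: missing or empty reason
        if not str(rule.get("reason", "")).strip():
            problems.append((rule_id, "missing reason"))
    return problems

# Illustrative rules: the first is copied from Step 3, the other two are
# deliberately bad examples.
rules = [
    {
        "id": "suppress-docs-emails",
        "type": "finding",
        "reason": "Support mailbox, not user PII",
        "match_finding": "pii:email",
        "match_content_regex": r"support@(example|your-company)\.com",
    },
    {"id": "too-broad", "type": "finding", "match_finding": "pii:*"},
    {
        "id": "disable-all-emails",
        "type": "finding",
        "reason": "noise",
        "match_finding": "pii:email",
        "match_content_regex": ".*",
    },
]

problems = lint_suppressions(rules)
for rule_id, problem in problems:
    print(f"{rule_id}: {problem}")
```

A check like this could run in CI or as a pre-reload step; the regex test is intentionally simple and will not catch every pattern that is equivalent to a match-all.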
