Problem
The default PII rule pack is aggressive — it has to be, because real PII exfil is the most common failure class. The first 72 hours of mode: observe typically produce a batch of false positives from:
- Addresses in sales reply templates
- Test credit-card numbers (4111-1111-1111-1111)
- Public sample datasets
- Documentation referencing emails that aren't actually PII
You want to suppress these while still catching real exfil.
Solution
Step 1: Triage
Export two days of observe findings:
defenseclaw-gateway audit export --limit 5000 --output /tmp/audit.jsonl
jq 'select(.details | test("pii:"; "i"))' /tmp/audit.jsonl > /tmp/pii-sample.jsonl
Eyeball the top 50 by finding:
jq -r '.details' /tmp/pii-sample.jsonl | sort | uniq -c | sort -rn | head -50
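If jq is not available on the triage host, the same tally can be sketched in Python. This assumes, as the jq filters above do, that each line of the export is a JSON object with a string `details` field. The function name `top_pii_findings` is hypothetical and not part of DefenseClaw.

```python
import json
from collections import Counter


def top_pii_findings(lines, limit=50):
    """Count PII findings by their `details` string, most frequent first.

    `lines` is any iterable of JSONL strings, such as an open file handle.
    The `details` field name is an assumption taken from the jq filters,
    not from a published schema.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        record = json.loads(line)
        details = record.get("details")
        # Mirror jq's test("pii:"; "i"): case-insensitive substring match.
        if isinstance(details, str) and "pii:" in details.lower():
            counts[details] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = [
        '{"details": "pii:credit-card in prompt"}',
        '{"details": "pii:credit-card in prompt"}',
        '{"details": "PII:email in prompt"}',
        '{"details": "secret:aws-key"}',
    ]
    for details, count in top_pii_findings(sample):
        print(count, details)
```

For a real export, pass the open file instead of the sample list, for example `top_pii_findings(open("/tmp/audit.jsonl"))`.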
Step 2: Classify
For each top reason, decide:
- True positive — real exfil or leak. Leave alone.
- False positive from specific context — strip, don't drop. E.g., test CC numbers.
- False positive from specific source — suppress, keep audit. E.g., known sales reply template.
Step 3: Write suppressions
# ~/.defenseclaw/policy/guardrail/default/suppressions.yaml
- id: suppress-test-credit-cards
  type: strip
  reason: "Test CC numbers are not PII"
  direction: prompt
  match_finding: pii:credit-card
  replace:
    regex: '4111[-\s]?1111[-\s]?1111[-\s]?1111'
    with: "[TEST-CC]"
- id: suppress-sample-ssn
  type: strip
  reason: "Sample SSNs are not PII"
  direction: prompt
  match_finding: pii:ssn
  replace:
    regex: '000-00-0000|123-45-6789'
    with: "[SAMPLE-SSN]"
- id: suppress-docs-emails
  type: finding
  reason: "Support mailbox, not user PII"
  match_finding: pii:email
  match_content_regex: 'support@(example|your-company)\.com'
- id: suppress-public-address-book
  type: finding
  reason: "Sales templates stored in shared/templates"
  when:
    request_header_matches:
      name: x-dc-session-id
      regex: 'sales-replies-.*'
  match_finding: pii:address
See Suppressions for the full schema.
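Before committing a strip or content regex, it can help to check it against samples that should and should not match. The sketch below uses Python's `re` module as a stand-in. Which regex engine the gateway uses is an open assumption here, and it may differ in edge cases, so treat a pass as a sanity check rather than proof. The helper name `check_pattern` and the sample strings are illustrative.

```python
import re

# Patterns copied from the suppressions in this step.
TEST_CC = r"4111[-\s]?1111[-\s]?1111[-\s]?1111"
SAMPLE_SSN = r"000-00-0000|123-45-6789"
DOCS_EMAIL = r"support@(example|your-company)\.com"


def check_pattern(pattern, should_match, should_not_match):
    """Return a list of human-readable failures; empty means all samples pass."""
    compiled = re.compile(pattern)
    failures = []
    for text in should_match:
        if not compiled.search(text):
            failures.append(f"expected match: {text!r}")
    for text in should_not_match:
        if compiled.search(text):
            failures.append(f"unexpected match: {text!r}")
    return failures


if __name__ == "__main__":
    print(check_pattern(
        TEST_CC,
        should_match=["4111-1111-1111-1111", "4111 1111 1111 1111", "4111111111111111"],
        should_not_match=["4242-4242-4242-4242", "4111-1111-1111-1112"],
    ))
    print(check_pattern(
        DOCS_EMAIL,
        should_match=["support@example.com"],
        should_not_match=["jane.doe@example.com", "support@example.org"],
    ))
```

Note that these patterns are unanchored. Under Python's search semantics, `customer-support@example.com` also matches the support-mailbox pattern, so adding negative samples like that one is a cheap way to find over-broad suppressions.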
Step 4: Test
Before reload, dry-run:
curl -s -X POST http://127.0.0.1:18970/v1/guardrail/evaluate \
-H "Content-Type: application/json" \
-H "X-DefenseClaw-Client: docs" \
-d '{"direction":"prompt","mode":"observe","scanner_mode":"local","local_result":{"action":"block","severity":"HIGH","reason":"pii:credit-card"},"content_length":42}' | jq .
Expected: a guardrail policy decision for the supplied local scan result. For end-to-end content stripping, send a real prompt through the guardrail proxy on port 4000 after reload.
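To dry-run several findings in one pass, the curl call above can be wrapped in a small Python helper. This is a sketch that reuses only the URL, headers, and payload fields shown in the curl example. The function names are hypothetical, and because the response shape is not specified here, the parsed JSON is returned untouched.

```python
import json
import urllib.request

# Endpoint copied from the curl example in this step.
EVALUATE_URL = "http://127.0.0.1:18970/v1/guardrail/evaluate"


def build_evaluate_request(finding, direction="prompt", content_length=42):
    """Build (but do not send) a dry-run request for one local scan finding."""
    payload = {
        "direction": direction,
        "mode": "observe",
        "scanner_mode": "local",
        "local_result": {"action": "block", "severity": "HIGH", "reason": finding},
        "content_length": content_length,
    }
    return urllib.request.Request(
        EVALUATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-DefenseClaw-Client": "docs"},
        method="POST",
    )


def evaluate(finding, timeout=5):
    """Send the dry-run request and return the parsed JSON response."""
    request = build_evaluate_request(finding)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the gateway running locally, calling `evaluate("pii:credit-card")` sends the same request as the curl command, and looping over `pii:ssn`, `pii:email`, and `pii:address` covers the other rules from Step 3.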
Step 5: Reload
defenseclaw-gateway policy reload
The reload is atomic — in-flight requests still see the old suppressions; new requests see the new ones.
Step 6: Watch
Monitor the ratio:
index=defenseclaw scope="guardrail" findings{}="pii:*"
| stats count(eval(suppressed)) as suppressed count as total by finding
| eval ratio = round(suppressed / total * 100, 1)
Aim for ≤ 20% suppression ratio per finding. Higher means you're effectively disabling the rule and should rewrite it instead.
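The same ratio and the 20% guideline can be checked outside the search tool, for example against an exported batch of events. The sketch below assumes each event is a dict with a `finding` string and a boolean `suppressed` flag. These field names mirror the query above but are assumptions about the event shape, and `suppression_ratios` is a hypothetical helper.

```python
def suppression_ratios(events, threshold=20.0):
    """Return {finding: (ratio_percent, over_threshold)} for a list of events."""
    totals = {}
    suppressed = {}
    for event in events:
        finding = event["finding"]
        totals[finding] = totals.get(finding, 0) + 1
        if event.get("suppressed"):
            suppressed[finding] = suppressed.get(finding, 0) + 1
    report = {}
    for finding, total in totals.items():
        # Percentage of this finding's events that were suppressed.
        ratio = round(100 * suppressed.get(finding, 0) / total, 1)
        report[finding] = (ratio, ratio > threshold)
    return report


if __name__ == "__main__":
    events = (
        [{"finding": "pii:email", "suppressed": True}] * 3
        + [{"finding": "pii:email", "suppressed": False}] * 7
        + [{"finding": "pii:ssn", "suppressed": True}]
        + [{"finding": "pii:ssn", "suppressed": False}] * 4
    )
    for finding, (ratio, over) in suppression_ratios(events).items():
        print(finding, ratio, "REVIEW" if over else "ok")
```

In this sample, `pii:email` sits at 30% and is flagged for review, while `pii:ssn` at exactly 20% stays within the guideline.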
Anti-patterns
- Don't suppress a whole rule family (match_finding: pii:*). Always scope as narrowly as possible.
- Don't suppress by wildcard content (match_content_regex: '.*'). That's a functional disable.
- Don't skip the reason field. Future-you will not remember why.
- Don't suppress CRITICAL findings without elevating to blocked.
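Three of these anti-patterns can be caught mechanically before a reload. The sketch below lints suppression rules that have already been parsed into dicts, for example with a YAML loader. The key names come from the examples in Step 3, `lint_suppression` is a hypothetical helper, and the CRITICAL-severity check is left out because a rule alone does not carry severity.

```python
# Regexes that match any content and so disable the rule in practice.
MATCH_ALL_REGEXES = {".*", ".+", "^.*$"}


def lint_suppression(rule):
    """Return a list of anti-pattern warnings for one suppression rule dict."""
    warnings = []
    finding = rule.get("match_finding") or ""
    if finding.endswith("*"):
        warnings.append("match_finding uses a wildcard; scope it to one finding")
    content_regex = (rule.get("match_content_regex") or "").strip()
    if content_regex in MATCH_ALL_REGEXES:
        warnings.append("match_content_regex matches everything")
    if not (rule.get("reason") or "").strip():
        warnings.append("missing reason")
    return warnings


if __name__ == "__main__":
    rules = [
        {"id": "too-broad", "type": "finding", "match_finding": "pii:*",
         "match_content_regex": ".*"},
        {"id": "suppress-docs-emails", "type": "finding",
         "reason": "Support mailbox, not user PII", "match_finding": "pii:email",
         "match_content_regex": r"support@(example|your-company)\.com"},
    ]
    for rule in rules:
        print(rule["id"], lint_suppression(rule))
```

A check like this fits in a pre-commit hook or CI job on the suppressions file, so an over-broad rule is rejected before it reaches `policy reload`.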