Guardrail overview — DefenseClaw

Overview

The guardrail is a Go proxy that listens on the loopback interface, port 4000 by default. It receives LLM traffic, inspects request prompts before the upstream call, inspects completions after the upstream call, and records a ScanVerdict with action, severity, reason, findings, and scanner attribution. In observe mode, findings are logged and traffic continues. In action mode, a block verdict is returned in-band.
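
The difference between observe and action mode can be illustrated with a minimal sketch. The struct below only mirrors the verdict fields named above; its field names, types, and the "allow"/"block" action values are assumptions for illustration, not the actual ScanVerdict definition, and shouldBlock is a hypothetical helper.

```go
package main

import "fmt"

// ScanVerdict mirrors the fields listed in the overview.
// Field names and types are assumptions, not the real definition.
type ScanVerdict struct {
	Action   string   // assumed values: "allow" or "block"
	Severity string   // assumed to be a severity label such as "HIGH"
	Reason   string   // human-readable explanation
	Findings []string // identifiers of matched findings
	Scanner  string   // which scanner produced the verdict
}

// shouldBlock is a hypothetical helper: a verdict is enforced only in
// action mode; in observe mode it is recorded and traffic continues.
func shouldBlock(mode string, v ScanVerdict) bool {
	return mode == "action" && v.Action == "block"
}

func main() {
	v := ScanVerdict{
		Action:   "block",
		Severity: "HIGH",
		Reason:   "prompt injection pattern",
		Scanner:  "local-rules",
	}
	fmt.Println(shouldBlock("observe", v)) // false: logged only
	fmt.Println(shouldBlock("action", v))  // true: block returned in-band
}
```

The same verdict yields different outcomes depending only on the mode, which is why observe mode is a safe way to evaluate rules before enforcing them.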

The local guardrail path is built around internal/gateway/guardrail.go::GuardrailInspector. It can run deterministic regex/rule-pack checks, optional Cisco AI Defense remote inspection, optional LLM judge checks, and final OPA policy evaluation through policies/rego/guardrail.rego.

Inspection surfaces

| Surface | Runtime direction | Code path | Notes |
| --- | --- | --- | --- |
| Request messages | prompt | GuardrailProxy.handleChatCompletion -> Inspect | Runs before the upstream model call. |
| Non-streaming response text | completion | handleNonStreamingRequest -> Inspect | Runs after the model response is available. |
| Streaming response text | completion | handleStreamingRequest -> InspectMidStream and final Inspect | Mid-stream checks are regex-only; final inspection runs after accumulation. |
| OpenAI-format tool calls | tool_call | inspectToolCalls -> ScanAllRules | High or critical tool-call findings block in action mode. |
| Gateway tool results | completion | EventRouter.inspectToolResult | Only configured sensitive tools are inspected. |
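
The tool-call row can be read as a severity gate. The following is a rough sketch under stated assumptions: blocksToolCall is a hypothetical function, severities are assumed to be plain strings, and the case-insensitive comparison is a convenience of the sketch rather than documented behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// blocksToolCall is a hypothetical gate: in action mode, any finding
// with high or critical severity blocks the tool call. In any other
// mode nothing is blocked.
func blocksToolCall(mode string, severities []string) bool {
	if mode != "action" {
		return false
	}
	for _, s := range severities {
		// Severity labels are assumed to be strings; casing is normalised here.
		switch strings.ToUpper(s) {
		case "HIGH", "CRITICAL":
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(blocksToolCall("action", []string{"low", "high"})) // true
	fmt.Println(blocksToolCall("action", []string{"medium"}))      // false
	fmt.Println(blocksToolCall("observe", []string{"critical"}))   // false
}
```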

Detection defaults

The code defaults are set in internal/config/defaults.go and mirrored by internal/config/config.go.

| Key | Default | Meaning |
| --- | --- | --- |
| guardrail.enabled | false | The proxy does not bind unless enabled. |
| guardrail.mode | observe | Verdicts are recorded but not enforced. |
| guardrail.scanner_mode | both | Local rules and Cisco AI Defense can both contribute when configured. |
| guardrail.detection_strategy | regex_judge | Global strategy when a direction-specific override is absent. |
| guardrail.detection_strategy_completion | regex_only | Completion path uses the deterministic path by default. |
| guardrail.judge_sweep | true | regex_judge can run a full judge sweep when regex triage found no signal. |
| guardrail.stream_buffer_bytes | 1024 | Initial streaming text buffer before first action-mode flush. |
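
The fallback from a direction-specific strategy to the global one can be sketched as follows. The Config struct, the Overrides map, and strategyFor are hypothetical names chosen for illustration; only the two default values come from the table above, and the real GuardrailConfig may model overrides differently.

```go
package main

import "fmt"

// Config holds only the strategy settings from the defaults table.
// The struct shape is an assumption made for this sketch.
type Config struct {
	DetectionStrategy string            // guardrail.detection_strategy
	Overrides         map[string]string // per-direction overrides (assumed shape)
}

// defaultConfig reproduces the documented defaults: a global
// regex_judge strategy and a regex_only override for completions.
func defaultConfig() Config {
	return Config{
		DetectionStrategy: "regex_judge",
		Overrides:         map[string]string{"completion": "regex_only"},
	}
}

// strategyFor returns the override for a direction when one is set,
// and the global strategy otherwise.
func (c Config) strategyFor(direction string) string {
	if s, ok := c.Overrides[direction]; ok && s != "" {
		return s
	}
	return c.DetectionStrategy
}

func main() {
	cfg := defaultConfig()
	fmt.Println(cfg.strategyFor("prompt"))     // regex_judge
	fmt.Println(cfg.strategyFor("completion")) // regex_only
}
```

With these defaults, prompts go through regex triage plus the judge, while completions stay on the deterministic regex path unless the override is changed.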

Core pages

| Page | What it covers |
| --- | --- |
| Architecture | Proxy, inspector, rules, judge, OPA, and runtime update boundaries. |
| Configuration | Actual guardrail.* keys in GuardrailConfig. |
| Rule packs | Built-in YAML packs and generated rule inventory. |
| Writing rules | The rule YAML shape loaded by internal/guardrail/rulepack.go. |
| Suppressions | Pre-judge strips, PII suppressions, and tool suppressions. |
| Sensitive tools | Tool-result inspection settings. |
| Judge vs regex | Strategy trade-offs and the exact decision path. |
| Verdict cache | The process-local LLM judge verdict cache. |
| Multi-turn | ContextTracker bounds and current runtime limits. |
| Notification queue | The 50-item TTL system-message queue used after blocks. |
| Providers | Provider catalog and adapter matrix. |
| Streaming | Initial buffering, mid-stream scans, and block chunks. |
| Tuning | Source-backed knobs for observe, action, strategy, and policy thresholds. |
| Troubleshooting | Failure modes that map to real code paths. |

Related