
Streaming — DefenseClaw

Overview

Streaming completions are handled in internal/gateway/proxy.go::handleStreamingRequest and passthrough streaming is handled in handlePassthrough. Mid-stream inspection calls internal/gateway/guardrail.go::InspectMidStream, which always delegates to inspectRegexOnly; the LLM judge is not called per chunk.

OpenAI-compatible streaming path

| Step | Source behavior |
| --- | --- |
| Initial buffer | In action mode, chunks are held until accumulated text reaches stream_buffer_bytes or the stream ends. Default is 1024. |
| Initial preblock | Before flushing a short or first buffer, the proxy calls InspectMidStream on accumulated text. |
| Mid-stream cadence | After the initial flush, action mode scans each time accumulated text grows by at least 500 characters. |
| Block response | If a mid-stream scan returns action=block, the proxy emits a provider-shaped block chunk and then data: [DONE]. |
| Tool-call chunks | In action mode, streamed tool-call deltas are buffered until post-stream tool-call inspection passes. |
| Final inspection | When accumulated completion text exists, the proxy runs regular Inspect on the full completion. |
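
The buffering and cadence rules in the table can be sketched as a small gate that decides when accumulated text should be scanned. This is an illustrative sketch only: streamGate, newStreamGate, and shouldScan are hypothetical names, not identifiers from proxy.go, and the sketch simplifies by using one length for both the byte threshold and the 500-character cadence.

```go
package main

// streamGate is a hypothetical helper (not DefenseClaw's actual type) that
// tracks when accumulated stream text is due for a mid-stream scan.
type streamGate struct {
	bufferBytes  int  // mirrors stream_buffer_bytes (default 1024)
	scanInterval int  // growth required between scans after the first flush
	flushed      bool // true once the initial buffer has been released
	lastScanLen  int  // accumulated length at the most recent scan
}

// newStreamGate returns a gate using the defaults described in the table.
func newStreamGate() *streamGate {
	return &streamGate{bufferBytes: 1024, scanInterval: 500}
}

// shouldScan reports whether text of the given accumulated length should be
// inspected now. streamEnded marks the end of the upstream stream.
func (g *streamGate) shouldScan(accumulatedLen int, streamEnded bool) bool {
	if !g.flushed {
		// Initial buffer: hold chunks until the threshold or end of stream,
		// then scan once before the first flush.
		if accumulatedLen >= g.bufferBytes || streamEnded {
			g.flushed = true
			g.lastScanLen = accumulatedLen
			return true
		}
		return false
	}
	// Mid-stream cadence: scan again after enough new text has arrived.
	if accumulatedLen-g.lastScanLen >= g.scanInterval {
		g.lastScanLen = accumulatedLen
		return true
	}
	return false
}
```

With the defaults, a stream that grows to 600, 1024, 1400, and 1524 characters is scanned at 1024 (initial flush) and again at 1524 (500 more characters), while a short stream is scanned once when it ends.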

Passthrough streaming path

For provider-native passthrough responses, the proxy parses data: SSE frames, accumulates the extracted text, and calls the same InspectMidStream function. In action mode, the first flush waits until accumulated text reaches stream_buffer_bytes, the upstream read ends, or a 1 MiB initial-buffer safety cap is hit.
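
As a rough illustration of the passthrough path, the sketch below pulls payloads out of data: lines and evaluates the three first-flush conditions. The names dataPayloads, readyToFlush, and initialBufferCap are hypothetical, and the assumption that the 1 MiB cap is measured against raw buffered bytes is mine rather than something the source states.

```go
package main

import "strings"

// initialBufferCap is the 1 MiB safety cap mentioned in the text.
// Assumption: the cap applies to raw buffered bytes.
const initialBufferCap = 1 << 20

// dataPayloads returns the payload of every "data:" line in a block of raw
// SSE text. Other fields (event:, id:, comments) and blank lines are skipped.
func dataPayloads(raw string) []string {
	var out []string
	for _, line := range strings.Split(raw, "\n") {
		line = strings.TrimSuffix(line, "\r") // tolerate CRLF line endings
		if !strings.HasPrefix(line, "data:") {
			continue
		}
		payload := strings.TrimPrefix(line, "data:")
		// SSE allows a single optional space after the colon.
		payload = strings.TrimPrefix(payload, " ")
		out = append(out, payload)
	}
	return out
}

// readyToFlush reports whether the first action-mode flush may proceed:
// enough text accumulated, upstream finished, or the safety cap reached.
func readyToFlush(textLen, bufferedBytes, streamBufferBytes int, upstreamEnded bool) bool {
	return textLen >= streamBufferBytes ||
		upstreamEnded ||
		bufferedBytes >= initialBufferCap
}
```

Extracting displayable text from each payload is provider-specific and is left out here; a caller would decode each payload, append the text to its accumulator, and consult readyToFlush before releasing the first buffer.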

What does not exist

| Unsupported doc claim | Source-backed correction |
| --- | --- |
| guardrail.streaming.inspect_chunks | There is no nested streaming config block. |
| guardrail.streaming.min_chunk_bytes: 64 | The actual key is guardrail.stream_buffer_bytes, default 1024. |
| Per-token or every-SSE-frame judge calls | Mid-stream inspection is regex-only. |
| A 503 trailer or event: defenseclaw block event | The proxy writes normal provider-shaped block chunks and terminates SSE with [DONE]. |
| Chunk inspection in observe mode | Mid-stream blocking checks run on the action-mode path. Observe mode forwards streamed chunks and relies on final logging. |
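
To make the block behavior concrete, here is a minimal sketch of what an ordinary block chunk followed by [DONE] could look like on an OpenAI-compatible stream. The function blockFrames and its struct types are hypothetical, and the choice of finish_reason "stop" and the message text are assumptions; the fields the proxy actually sets may differ.

```go
package main

import "encoding/json"

// The types below model only the parts of an OpenAI-style streaming chunk
// that this sketch needs.
type delta struct {
	Content string `json:"content"`
}

type choice struct {
	Index        int    `json:"index"`
	Delta        delta  `json:"delta"`
	FinishReason string `json:"finish_reason"`
}

type chunk struct {
	Object  string   `json:"object"`
	Choices []choice `json:"choices"`
}

// blockFrames returns the SSE text for a block: one normal-looking chunk
// carrying the block message, then the standard [DONE] terminator.
// No custom event name or HTTP trailer is involved.
func blockFrames(message string) (string, error) {
	body, err := json.Marshal(chunk{
		Object: "chat.completion.chunk",
		Choices: []choice{{
			Index:        0,
			Delta:        delta{Content: message},
			FinishReason: "stop", // assumption: the real value may differ
		}},
	})
	if err != nil {
		return "", err
	}
	return "data: " + string(body) + "\n\n" + "data: [DONE]\n\n", nil
}
```

Because the output is shaped like any other chunk, a client that already handles the provider's stream format needs no special case to see the block message.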

Why mid-stream is regex-only

InspectMidStream intentionally avoids judge calls. The code comments call out the reason: LLM judge calls are too slow for per-chunk scanning, so mid-stream uses deterministic checks for high-severity signatures and reserves semantic judging for pre-call or post-call inspection.
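
A deterministic check of this kind can be sketched as a loop over precompiled patterns. The sketch is not the contents of inspectRegexOnly: scanSignatures, verdict, and both patterns are illustrative assumptions chosen to show why such a scan is cheap enough to run repeatedly on a growing buffer.

```go
package main

import "regexp"

// verdict is a hypothetical result type for this sketch.
type verdict struct {
	Action string // "allow" or "block"
	Rule   string // name of the matching signature, if any
}

// signatures holds example high-severity patterns. These are illustrative
// and are not DefenseClaw's actual rule set.
var signatures = []struct {
	name string
	re   *regexp.Regexp
}{
	{"private-key", regexp.MustCompile(`-----BEGIN (RSA |EC |OPENSSH )?PRIVATE KEY-----`)},
	{"aws-access-key", regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)},
}

// scanSignatures runs every pattern against the accumulated text and blocks
// on the first match. It makes no network or model calls, so its cost is
// bounded by the text length and the number of patterns.
func scanSignatures(text string) verdict {
	for _, s := range signatures {
		if s.re.MatchString(text) {
			return verdict{Action: "block", Rule: s.name}
		}
	}
	return verdict{Action: "allow"}
}
```

Go's regexp package guarantees matching time linear in the input size, which suits a scan that reruns as the buffer grows; a judge call adds a model round trip each time.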
