
Architecture — Skill Scanner


Skill Scanner is a modular security scanner built around a central orchestrator and pluggable analyzers. Scans execute in two phases: deterministic analysis first, then LLM-powered analysis enriched with Phase 1 context.


Scanning Pipeline

Every scan follows the same six-stage pipeline:

1. Load and Pre-process — SkillLoader parses the skill's SKILL.md frontmatter, discovers files recursively, classifies file types, and extracts referenced-file hints. ContentExtractor safely unpacks any embedded archives (ZIP, TAR) with layered protections against zip bombs, path traversal, and symlink attacks.

2. Phase 1: Deterministic Analyzers — All non-LLM analyzers run: static (YAML + YARA + Python checks), bytecode, pipeline, behavioral (if enabled), VirusTotal, AI Defense, and trigger. The scanner collects validated binary files and unreferenced scripts for later enrichment.

3. Phase 2: LLM Analyzers with Enrichment — LLM and meta analyzers receive enrichment context built from Phase 1 results: file inventory, type distribution, magic mismatches, and top critical/high findings. This structured context gives the LLM a picture of what deterministic analysis already found.

4. Post-Processing — Policy enforcement: suppress VT-validated binary findings, enforce disabled rules, apply severity overrides, compute analyzability scores, normalize/deduplicate findings, annotate co-occurrence metadata, and attach the policy fingerprint.

5. Cleanup — Temporary extraction directories are removed. A ScanResult is built with findings, timing, analyzer names, analyzability score, and scan metadata.

6. Reporting — The ScanResult is passed to the chosen reporter (summary, JSON, Markdown, table, SARIF, or HTML).
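The six stages above can be condensed into a short sketch. Everything here is illustrative (function and parameter names are invented); the real orchestration lives in core/scanner.py:

```python
# Illustrative sketch of the six-stage pipeline; names are hypothetical,
# not the actual skill_scanner API.

def run_scan(skill, phase1, phase2, post_process, report):
    findings = []
    for analyzer in phase1:                  # Phase 1: deterministic analyzers
        findings.extend(analyzer(skill))
    # Enrichment context summarizes Phase 1 results for the LLM analyzers.
    context = {"prior_findings": list(findings)}
    for analyzer in phase2:                  # Phase 2: LLM analyzers + context
        findings.extend(analyzer(skill, context))
    findings = post_process(findings)        # policy, dedup, severity overrides
    return report(findings)                  # hand off to the chosen reporter

# Toy usage with stand-in analyzers:
p1 = [lambda s: [("static", "eval() call")]]
p2 = [lambda s, ctx: [("llm", f"reviewed {len(ctx['prior_findings'])} prior findings")]]
dedup = lambda fs: list(dict.fromkeys(fs))
result = run_scan("my-skill", p1, p2, dedup, lambda fs: fs)
```

The key structural point is that Phase 2 analyzers always receive a context object derived from Phase 1, never a blank slate.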


Core Components

Scanner Orchestrator

The SkillScanner class in core/scanner.py runs the full pipeline. It manages analyzer lifecycle, enrichment context building, post-processing, and cleanup. For directory scans, it adds cross-skill analysis (description overlap, data relay patterns, shared external URLs).
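The description-overlap part of cross-skill analysis can be approximated with a set-similarity check. This is a hypothetical sketch (the real heuristics in the orchestrator are not documented here); a Jaccard score over word sets flags near-duplicate descriptions:

```python
# Hypothetical sketch of a description-overlap check for directory scans;
# the threshold and similarity measure are illustrative assumptions.

def description_overlap(desc_a: str, desc_b: str) -> float:
    """Jaccard similarity over lowercase word sets."""
    a, b = set(desc_a.lower().split()), set(desc_b.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def flag_overlapping_skills(descriptions: dict, threshold: float = 0.8):
    """Return pairs of skill names whose descriptions are near-duplicates."""
    names = sorted(descriptions)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if description_overlap(descriptions[x], descriptions[y]) >= threshold
    ]
```

Two skills with nearly identical descriptions are suspicious because they may be competing to intercept the same user intent.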

Skill Loader

SkillLoader in core/loader.py handles:

  • Validating skill directory structure and SKILL.md presence
  • Parsing YAML frontmatter (name, description, metadata)
  • Recursive file discovery (excluding .git internals)
  • File-type classification (python, bash, markdown, binary, other)
  • Lenient mode fallback to .md files when SKILL.md is absent
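Frontmatter parsing is the first of those responsibilities. The real loader uses a full YAML parser; this hand-rolled sketch only handles flat `key: value` pairs and exists purely to show the delimiter convention:

```python
# Simplified, illustrative SKILL.md frontmatter parser (flat keys only).

def parse_frontmatter(text: str) -> dict:
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}                      # no frontmatter block at all
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":      # closing delimiter ends the block
            return meta
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return {}                          # unterminated block: treat as absent

skill_md = """---
name: pdf-summarizer
description: Summarizes PDF attachments
---
# Instructions
"""
```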

Analyzer Factory

analyzer_factory.py is the single source of truth for analyzer assembly. CLI, API, pre-commit hook, and eval runners all use this factory to ensure parity.

  • build_core_analyzers(policy) — static, bytecode, pipeline (gated by policy)
  • build_analyzers(policy, use_behavioral, use_llm, ...) — adds optional analyzers based on flags
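The factory's layering might look like the following sketch. The policy keys and return values are placeholders, not the real analyzer_factory.py API; the point is that flag-driven analyzers are always stacked on top of the core set:

```python
# Hedged sketch of the two factory functions; keys and names are invented.

def build_core_analyzers(policy):
    analyzers = []
    if policy.get("enable_static", True):
        analyzers.append("static_analyzer")
    if policy.get("enable_bytecode", True):
        analyzers.append("bytecode_analyzer")
    if policy.get("enable_pipeline", True):
        analyzers.append("pipeline_analyzer")
    return analyzers

def build_analyzers(policy, use_behavioral=False, use_llm=False):
    analyzers = build_core_analyzers(policy)   # always start from the core set
    if use_behavioral:
        analyzers.append("behavioral_analyzer")
    if use_llm:
        analyzers.append("llm_analyzer")       # would also require an API key
    return analyzers
```

Because every entry point calls the same two functions, a CLI scan and an API scan with equivalent flags run identical analyzer sets.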

Content Extractor

Safely unpacks archives embedded in skills with protections for zip bombs, nesting depth, path traversal, symlinks, and total size/file count limits.
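A minimal safety gate for ZIP members might look like this sketch. The limits are assumptions for illustration; the real ContentExtractor enforces more layers (nesting depth, symlink checks, per-file caps):

```python
# Illustrative archive safety gate with hypothetical thresholds.
import zipfile

MAX_TOTAL_UNCOMPRESSED = 100 * 1024 * 1024   # assumed 100 MiB total cap
MAX_FILE_COUNT = 1000                         # assumed member-count cap

def is_safe_zip(path: str) -> bool:
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        if len(infos) > MAX_FILE_COUNT:
            return False                      # too many members
        total = sum(i.file_size for i in infos)
        if total > MAX_TOTAL_UNCOMPRESSED:
            return False                      # possible zip bomb
        for i in infos:
            name = i.filename
            if name.startswith(("/", "\\")) or ".." in name.split("/"):
                return False                  # path traversal attempt
    return True
```

Checking declared sizes before extraction is what makes zip-bomb detection cheap: nothing is decompressed until the manifest passes.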


Analyzer Inventory

Core (Policy-Driven)

| Analyzer | Detection Method |
| --- | --- |
| static_analyzer | YAML signatures + YARA rules + inventory checks |
| bytecode_analyzer | Python bytecode/source consistency |
| pipeline_analyzer | Shell pipeline taint analysis and command-risk checks |

Optional (Flag-Driven)

| Analyzer | Detection Method | Requires |
| --- | --- | --- |
| behavioral_analyzer | AST dataflow + cross-file correlation | --use-behavioral |
| llm_analyzer | Semantic threat analysis with structured schema | API key + --use-llm |
| meta_analyzer | Second-pass LLM validation/filtering | API key + --enable-meta |
| virustotal_analyzer | Binary hash lookup + optional upload | API key + --use-virustotal |
| aidefense_analyzer | Cisco AI Defense cloud inspection | API key + --use-aidefense |
| trigger_analyzer | Overly broad trigger/description checks | --use-trigger |
| cross_skill_scanner | Multi-skill coordination detection | --check-overlap |

Policy System

ScanPolicy in core/scan_policy.py centralizes all runtime configuration across 14 sections:

  • File limits and thresholds
  • Rule scoping and docs-path behavior
  • Command safety tiers
  • Hidden file allowlists
  • Severity overrides and disabled rules
  • Output deduplication and metadata behavior
  • Core analyzer toggles

Three built-in presets: strict, balanced (default), and permissive.
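One plausible way the presets could map onto policy knobs is sketched below. The field names and values are invented for illustration and do not mirror core/scan_policy.py; they only show the strict-to-permissive gradient:

```python
# Hypothetical preset sketch; fields and values are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ScanPolicy:
    max_file_size_kb: int      # file-limit knob
    scan_docs_paths: bool      # docs-path behavior
    fail_on_severity: str      # gating threshold

PRESETS = {
    "strict":     ScanPolicy(max_file_size_kb=256,  scan_docs_paths=True,  fail_on_severity="MEDIUM"),
    "balanced":   ScanPolicy(max_file_size_kb=1024, scan_docs_paths=True,  fail_on_severity="HIGH"),
    "permissive": ScanPolicy(max_file_size_kb=4096, scan_docs_paths=False, fail_on_severity="CRITICAL"),
}
```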

See Scan Policies for configuration details.


Data Models

Primary data structures in core/models.py:

| Model | Purpose |
| --- | --- |
| Skill | Loaded skill package with files and metadata |
| Finding | Individual security finding with severity, category, location, and remediation |
| ScanResult | Single-skill scan output with findings, timing, and analyzability |
| Report | Multi-skill scan output aggregating multiple ScanResult objects |
| Severity | Enum: CRITICAL, HIGH, MEDIUM, LOW, INFO, SAFE |
| ThreatCategory | Enum: prompt injection, data exfiltration, command injection, etc. |
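A condensed sketch of how these models fit together is shown below; the actual fields in core/models.py may differ, and the `worst_severity` helper is an invented convenience:

```python
# Illustrative data-model sketch; real fields in core/models.py may differ.
from dataclasses import dataclass, field
from enum import Enum

class Severity(Enum):
    CRITICAL = 5
    HIGH = 4
    MEDIUM = 3
    LOW = 2
    INFO = 1
    SAFE = 0

@dataclass
class Finding:
    rule_id: str
    severity: Severity
    category: str            # e.g. "prompt_injection"
    location: str            # file path and line
    remediation: str = ""

@dataclass
class ScanResult:
    skill_name: str
    findings: list = field(default_factory=list)

    @property
    def worst_severity(self) -> Severity:
        # Hypothetical helper: highest severity across findings, SAFE if none.
        if not self.findings:
            return Severity.SAFE
        return max((f.severity for f in self.findings), key=lambda s: s.value)
```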

Entry Points

| Entry Point | Source | Description |
| --- | --- | --- |
| CLI | skill_scanner/cli/cli.py | Main command-line interface |
| API | skill_scanner/api/router.py | FastAPI REST server |
| Pre-commit | skill_scanner/hooks/pre_commit.py | Git hook integration |
| SDK | skill_scanner/__init__.py | Python library import |

All entry points use analyzer_factory.py for consistent analyzer construction.


Rule Packs

Built-in detection rules live in skill_scanner/data/packs/:

| Pack | Contents |
| --- | --- |
| core | YAML signatures, YARA rules, Python checks — the main detection pack |
| atr | Additional threat research signatures |

Each pack has a pack.yaml manifest that declares its rules and metadata.


Threat Taxonomy

Every finding is normalized to Cisco AI framework mappings (AITech/AISubtech) so findings from different analyzers use consistent category labels. The taxonomy can be overridden at runtime for custom organizational classifications.


Extension Points

To add new detection capabilities:

  1. Add an analyzer class inheriting BaseAnalyzer
  2. Register the construction path in analyzer_factory.py
  3. Add policy knobs in scan_policy.py (if needed)
  4. Add tests under tests/
  5. Document CLI/API toggles
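Step 1 might look like the sketch below. The BaseAnalyzer interface shown here is assumed for illustration, not copied from the package, and the analyzer itself (a naive credential-string check) is a deliberately simple example:

```python
# Minimal sketch of a custom analyzer; the BaseAnalyzer interface and the
# skill representation ({path: contents}) are assumptions for illustration.

class BaseAnalyzer:
    name = "base"

    def analyze(self, skill) -> list:
        raise NotImplementedError

class HardcodedTokenAnalyzer(BaseAnalyzer):
    """Flags files that appear to embed credential-like strings."""
    name = "hardcoded_token_analyzer"
    SUSPICIOUS = ("api_key=", "token=", "password=")

    def analyze(self, skill) -> list:
        findings = []
        for path, text in skill.items():
            for marker in self.SUSPICIOUS:
                if marker in text.lower():
                    findings.append({
                        "analyzer": self.name,
                        "file": path,
                        "detail": f"possible credential ({marker!r})",
                    })
        return findings
```

After writing the class, steps 2–5 wire it into the factory, expose any policy knobs, and cover it with tests.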

For rule-based detection, prefer extending skill_scanner/data/packs/core/ (signatures, YARA, Python checks) before adding analyzer-level logic.

See Writing Custom Rules for the full guide.