Pickle Fuzzer

A structure-aware test case generator for Python pickle parsers and validators. pickle-fuzzer generates complex, valid pickle bytecode across all protocol versions (0-5) for use in fuzzing and testing pickle parsing implementations.

Unlike traditional fuzzers that generate random bytes, pickle-fuzzer understands pickle's structure and generates syntactically valid pickle bytecode that exercises edge cases, complex opcode sequences, and protocol-specific features.


Key Use Cases

  • Fuzzing pickle parsers and validators for security vulnerabilities
  • Testing pickle implementations across different Python versions
  • Generating test cases for custom pickle-based serialization systems
  • Discovering edge cases in pickle handling code

Features

  • Multi-Protocol Support -- Generate pickles for all protocol versions (0-5)
  • Stack/Memo Simulation -- Simulates pickle machine stack and memo to ensure valid opcode sequences
  • Comprehensive Opcode Coverage -- Supports all standard pickle opcodes including FRAME, EXT, GLOBAL, etc.
  • Parallel Generation -- Generate multiple pickle files concurrently
  • Configurable Output -- Single file or batch generation modes
  • Deterministic Fuzzing -- Optional seed-based generation for reproducibility
  • Python Bindings -- Integration with Python-based fuzzing tools like Atheris

Installation

Prerequisites: Rust 1.70 or later

# Build from source
git clone https://github.com/cisco-ai-defense/pickle-fuzzer
cd pickle-fuzzer
cargo build --release

# Or install from crates.io
cargo install pickle-fuzzer

Usage

Generate Pickle Files

# Generate a random pickle file
pickle-fuzzer output.pkl

# Generate 100 pickle files in the samples directory
pickle-fuzzer --dir samples --samples 100

# Specific protocol version
pickle-fuzzer --protocol 4 output.pkl

# Deterministic generation
pickle-fuzzer --seed 42 output.pkl
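
Once a batch has been written, it can help to check which opcodes the samples actually exercise. The following is a minimal sketch that uses only Python's standard pickletools module, not any pickle-fuzzer API. The helper names opcode_histogram and summarize_dir are illustrative, and the directory passed in is assumed to be one produced by the batch command above.

```python
import pickle
import pickletools
from collections import Counter
from pathlib import Path


def opcode_histogram(data: bytes) -> Counter:
    """Count opcode names in one pickle without executing it."""
    return Counter(op.name for op, _arg, _pos in pickletools.genops(data))


def summarize_dir(sample_dir: str) -> Counter:
    """Aggregate opcode counts over every file in a directory."""
    total = Counter()
    for path in sorted(Path(sample_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            total += opcode_histogram(path.read_bytes())
        except Exception:
            # Mutated samples may be malformed on purpose; skip those.
            continue
    return total


if __name__ == "__main__":
    # Stand-in sample from the standard library, so the sketch runs
    # even when no generated files are available.
    demo = pickle.dumps({"a": [1, 2, 3]}, protocol=4)
    print(opcode_histogram(demo).most_common(5))
```

pickletools.genops only decodes the opcode stream, so nothing in the sample is imported or called while it is being counted.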

Command-Line Options

Option                       Description
--dir <DIR>                  Output directory for batch generation
--protocol <0-5>             Pickle protocol version
--samples <N>                Number of samples to generate (default: 10000)
--seed <SEED>                Seed for reproducible generation
--min-opcodes <N>            Minimum opcodes to generate (default: 60)
--max-opcodes <N>            Maximum opcodes to generate (default: 300)
--mutators <MUTATOR>         Enable mutators (bitflip, boundary, offbyone, typeconfusion, etc.)
--mutation-rate <0.0-1.0>    Mutation probability (default: 0.1)
--allow-ext                  Allow EXT* opcodes (requires extension registry)
--allow-buffer               Allow buffer opcodes (requires buffer support)

GitHub Action

name: Pickle Fuzzing
on: [pull_request]
jobs:
  fuzz:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: cisco-ai-defense/pickle-fuzzer@v1
        with:
          mode: cli
          output_dir: samples
          samples: 200

For Atheris harness mode:

- uses: cisco-ai-defense/pickle-fuzzer@v1
  with:
    mode: atheris
    harness: fuzz_harness.py
    harness_args: "-max_total_time=60"

Python Bindings

from pickle_fuzzer import Generator

gen = Generator(protocol=3)

# Generate a random pickle
pickle_bytes = gen.generate()

# Generate from fuzzer input (deterministic)
fuzzer_data = b"some_fuzzer_input"
pickle_bytes = gen.generate_from_bytes(fuzzer_data)

# Configure generation
gen.set_opcode_range(10, 50)
gen.reset()

Integration with Atheris

import pickle
import sys

import atheris
from pickle_fuzzer.fuzzer import PickleMutator

mutator = PickleMutator(protocol=3)

@atheris.instrument_func
def test_one_input(data: bytes):
    # Derive a pickle from the raw fuzzer input
    pickle_bytes = mutator.mutate(data, max_size=10000)
    try:
        pickle.loads(pickle_bytes)
    except Exception:
        # Rejected input is expected; only crashes and hangs are of interest
        pass

atheris.Setup(sys.argv, test_one_input)
atheris.Fuzz()
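
When the target is a validator rather than a loader, the harness body can call a function that inspects the pickle without executing it. The sketch below uses the standard library's pickletools.dis as a stand-in validator; classify is a hypothetical helper name, and a real harness would substitute the validator under test.

```python
import io
import pickle
import pickletools


def classify(data: bytes) -> str:
    """Return "accepted" if pickletools can disassemble data, else "rejected".

    pickletools.dis decodes the opcode stream and tracks stack depth and
    memo use without importing or calling anything from the pickle.
    """
    try:
        pickletools.dis(data, out=io.StringIO())  # discard the listing
    except Exception:
        return "rejected"
    return "accepted"


if __name__ == "__main__":
    good = pickle.dumps([1, "two", 3.0], protocol=2)
    print(classify(good))        # well-formed pickle
    print(classify(good[:-1]))   # STOP opcode cut off
```

Inside an Atheris harness, classify(pickle_bytes) would take the place of the pickle.loads call in the try block.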

Performance

Metric                             Performance
Small pickles (10-30 opcodes)      ~5.2 µs
Medium pickles (60-300 opcodes)    ~48 µs
Large pickles (200-500 opcodes)    ~154 µs
Single-threaded throughput         ~10,000 pickles/sec
Multi-core (8 cores)               ~80,000 pickles/sec

Protocol    Time      Use Case
V0          33 µs     Fastest -- ASCII-based, legacy
V1          41 µs     Binary
V2          46 µs     Fast, PROTO opcode
V3          48 µs     Default, balanced
V4          139 µs    FRAME support
V5          134 µs    Out-of-band buffers

How It Works

pickle-fuzzer uses a stack-based approach to generate valid pickle bytecode:

  1. Stack/Memo Simulation -- Maintains an internal stack and memo that mirrors the pickle machine's behavior
  2. Opcode Validation -- Only emits opcodes that are valid given the current stack state
  3. Protocol Compliance -- Respects protocol version constraints for opcode availability
  4. Random Generation -- Uses the arbitrary crate for deterministic random data generation
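
The four steps above can be illustrated with a toy generator. This is a simplified Python sketch, not pickle-fuzzer's Rust code: it covers only seven protocol 2 opcodes, tracks coarse type tags instead of real values, omits the memo, and uses Python's random module in place of the arbitrary crate. The function name generate and its parameters are illustrative.

```python
import pickle
import random


def generate(seed: int, n_ops: int = 20) -> bytes:
    """Emit a valid protocol 2 pickle by simulating the unpickler's stack."""
    rng = random.Random(seed)          # seeded, so output is reproducible
    out = bytearray(b"\x80\x02")       # PROTO 2
    stack = []                         # simulated stack of type tags

    for _ in range(n_ops):
        # Pushes are always legal; other opcodes depend on stack state.
        choices = ["NONE", "NEWTRUE", "BININT1", "EMPTY_LIST"]
        if stack:
            choices.append("POP")
        if len(stack) >= 2:
            choices.append("TUPLE2")
            if stack[-2] == "list":
                choices.append("APPEND")

        op = rng.choice(choices)
        if op == "NONE":
            out += b"N"
            stack.append("other")
        elif op == "NEWTRUE":
            out += b"\x88"
            stack.append("other")
        elif op == "BININT1":
            out += b"K" + bytes([rng.randrange(256)])
            stack.append("other")
        elif op == "EMPTY_LIST":
            out += b"]"
            stack.append("list")
        elif op == "POP":
            out += b"0"
            stack.pop()
        elif op == "TUPLE2":
            out += b"\x86"
            del stack[-2:]
            stack.append("other")
        elif op == "APPEND":
            out += b"a"
            stack.pop()                # value is consumed, list stays

    # Leave exactly one object for STOP to return.
    if not stack:
        out += b"N"
        stack.append("other")
    out += b"0" * (len(stack) - 1)
    out += b"."
    return bytes(out)


if __name__ == "__main__":
    print(pickle.loads(generate(seed=42)))
```

Because each opcode is chosen only from those the simulated stack permits, every output loads cleanly, and the same seed always yields the same bytes.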

Architecture

Component                       Description
Generator (src/generator.rs)    Core test case generation engine with stack/memo simulation
Opcodes (src/opcodes.rs)        Complete opcode definitions for all protocol versions (0-5)
Stack (src/stack.rs)            Simulates the pickle virtual machine stack
State (src/state.rs)            Manages generator state including memo table and protocol version
Mutators (src/mutators/)        Optional mutation strategies for controlled variations
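
To see what a per-protocol opcode table looks like in practice, CPython ships one in pickletools.opcodes, where each entry records the protocol that introduced it. The sketch below queries that table; it reflects CPython's definitions, not the contents of src/opcodes.rs, and the helper names are illustrative.

```python
import pickletools


def introduced_in(protocol: int) -> list:
    """Opcode names first available in exactly this protocol version."""
    return sorted(op.name for op in pickletools.opcodes if op.proto == protocol)


def opcodes_available(protocol: int) -> list:
    """Opcode names a generator may emit when targeting this protocol."""
    return sorted(op.name for op in pickletools.opcodes if op.proto <= protocol)


if __name__ == "__main__":
    for version in range(6):
        print(version, len(opcodes_available(version)), introduced_in(version))
```

A generator targeting protocol 3, for instance, would draw only from opcodes_available(3), which excludes FRAME and the buffer opcodes.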

Documentation

Guide           Description
Contributing    How to contribute
Developing      Development setup and workflows
Testing         Testing procedures and validation
Benchmarks      Performance benchmarks
Fuzzing         Fuzzing pickle-fuzzer itself

License

Apache 2.0 -- See LICENSE for details.