Pickle Fuzzer
A structure-aware test case generator for Python pickle parsers and validators. pickle-fuzzer generates complex, valid pickle bytecode across all protocol versions (0-5) for use in fuzzing and testing pickle parsing implementations.
Unlike traditional fuzzers that generate random bytes, pickle-fuzzer understands pickle's structure and generates syntactically valid pickle bytecode that exercises edge cases, complex opcode sequences, and protocol-specific features.
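To see why structure awareness matters, the sketch below compares how often purely random bytes survive a pickle opcode walk against how often well-formed pickles do. It uses only the Python standard library (pickletools.genops decodes opcodes without executing them). The helper names parses_cleanly and random_blob are made up for this illustration and are not part of pickle-fuzzer.

```python
import pickle
import pickletools
import random


def parses_cleanly(data: bytes) -> bool:
    """Return True if pickletools can decode the opcode stream up to STOP.

    genops only decodes opcodes and their arguments; it never runs the pickle.
    """
    try:
        for _opcode, _arg, _pos in pickletools.genops(data):
            pass
        return True
    except Exception:
        return False


def random_blob(rng: random.Random, size: int = 32) -> bytes:
    """Produce `size` uniformly random bytes, as a naive fuzzer would."""
    return bytes(rng.randrange(256) for _ in range(size))


rng = random.Random(0)
random_ok = sum(parses_cleanly(random_blob(rng)) for _ in range(1000))
valid_ok = sum(
    parses_cleanly(pickle.dumps({"n": n, "items": [n] * 3}, protocol=n % 6))
    for n in range(1000)
)
print(f"random bytes that decode: {random_ok}/1000")
print(f"valid pickles that decode: {valid_ok}/1000")
```

Most random inputs are rejected at the first unknown opcode or truncated argument, so they rarely reach the deeper parser logic that structurally valid input exercises.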
Key Use Cases
- Fuzzing pickle parsers and validators for security vulnerabilities
- Testing pickle implementations across different Python versions
- Generating test cases for custom pickle-based serialization systems
- Discovering edge cases in pickle handling code
Features
- Multi-Protocol Support -- Generate pickles for all protocol versions (0-5)
- Stack/Memo Simulation -- Simulates pickle machine stack and memo to ensure valid opcode sequences
- Comprehensive Opcode Coverage -- Supports all standard pickle opcodes, including FRAME, EXT1/EXT2/EXT4, and GLOBAL
- Parallel Generation -- Generate multiple pickle files concurrently
- Configurable Output -- Single file or batch generation modes
- Deterministic Fuzzing -- Optional seed-based generation for reproducibility
- Python Bindings -- Integration with Python-based fuzzing tools like Atheris
Installation
Prerequisites: Rust 1.70 or later
# Build from source
git clone https://github.com/cisco-ai-defense/pickle-fuzzer
cd pickle-fuzzer
cargo build --release
# Or install from crates.io
cargo install pickle-fuzzer
Usage
Generate Pickle Files
# Generate a random pickle file
pickle-fuzzer output.pkl
# Generate 100 pickle files in the samples directory
pickle-fuzzer --dir samples --samples 100
# Specific protocol version
pickle-fuzzer --protocol 4 output.pkl
# Deterministic generation
pickle-fuzzer --seed 42 output.pkl
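After a batch run, it can be useful to confirm what was written before feeding it to a target. The following is a minimal sketch that walks a directory and counts opcodes per file using only the standard library, without executing any pickle. It assumes the samples directory from the command above; the function names are hypothetical, and the script makes no assumption about how pickle-fuzzer names its output files.

```python
import pickletools
from pathlib import Path


def count_opcodes(path: Path) -> int:
    """Count the opcodes in one pickle file without executing it."""
    data = path.read_bytes()
    return sum(1 for _ in pickletools.genops(data))


def summarize(directory: str) -> dict[str, int]:
    """Map each file name in `directory` to its opcode count (-1 if malformed)."""
    summary = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        try:
            summary[path.name] = count_opcodes(path)
        except Exception:
            # pickletools raises on unknown opcodes or truncated arguments
            summary[path.name] = -1
    return summary


if __name__ == "__main__":
    # "samples" matches the --dir value used in the example command
    for name, count in summarize("samples").items():
        print(f"{name}: {count} opcodes")
```

A count of -1 only means that pickletools could not decode the file; with mutators enabled, that may be the intended result.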
Command-Line Options
| Option | Description |
|---|---|
| --dir <DIR> | Output directory for batch generation |
| --protocol <0-5> | Pickle protocol version |
| --samples <N> | Number of samples to generate (default: 10000) |
| --seed <SEED> | Seed for reproducible generation |
| --min-opcodes <N> | Minimum opcodes to generate (default: 60) |
| --max-opcodes <N> | Maximum opcodes to generate (default: 300) |
| --mutators <MUTATOR> | Enable mutators (bitflip, boundary, offbyone, typeconfusion, etc.) |
| --mutation-rate <0.0-1.0> | Mutation probability (default: 0.1) |
| --allow-ext | Allow EXT* opcodes (requires extension registry) |
| --allow-buffer | Allow buffer opcodes (requires buffer support) |
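To give a feel for how a mutator and a mutation rate interact, here is a minimal Python sketch of two mutation strategies applied with a per-byte probability. This is an illustration only: the functions bitflip and off_by_one are hypothetical, they operate on raw bytes, and this README does not describe how pickle-fuzzer's own mutators choose their targets.

```python
import random


def bitflip(data: bytes, rate: float, rng: random.Random) -> bytes:
    """Flip one randomly chosen bit in each byte with probability `rate`."""
    out = bytearray(data)
    for i in range(len(out)):
        if rng.random() < rate:
            out[i] ^= 1 << rng.randrange(8)
    return bytes(out)


def off_by_one(data: bytes, rate: float, rng: random.Random) -> bytes:
    """Add or subtract 1 (mod 256) from each byte with probability `rate`."""
    out = bytearray(data)
    for i in range(len(out)):
        if rng.random() < rate:
            out[i] = (out[i] + rng.choice((-1, 1))) % 256
    return bytes(out)


if __name__ == "__main__":
    rng = random.Random(42)          # fixed seed, so the run is repeatable
    original = bytes(range(16))
    print(bitflip(original, 0.1, rng).hex())
    print(off_by_one(original, 0.1, rng).hex())
```

With a rate of 0.1, roughly one byte in ten is altered, which tends to leave most of the structure intact while still probing boundary handling.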
GitHub Action
name: Pickle Fuzzing
on: [pull_request]
jobs:
  fuzz:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: cisco-ai-defense/pickle-fuzzer@v1
        with:
          mode: cli
          output_dir: samples
          samples: 200
For Atheris harness mode:
- uses: cisco-ai-defense/pickle-fuzzer@v1
  with:
    mode: atheris
    harness: fuzz_harness.py
    harness_args: "-max_total_time=60"
Python Bindings
from pickle_fuzzer import Generator
gen = Generator(protocol=3)
# Generate a random pickle
pickle_bytes = gen.generate()
# Generate from fuzzer input (deterministic)
fuzzer_data = b"some_fuzzer_input"
pickle_bytes = gen.generate_from_bytes(fuzzer_data)
# Configure generation
gen.set_opcode_range(10, 50)
gen.reset()
Integration with Atheris
import pickle
import sys

import atheris
from pickle_fuzzer.fuzzer import PickleMutator

mutator = PickleMutator(protocol=3)

@atheris.instrument_func
def test_one_input(data: bytes):
    # Derive a pickle from the raw fuzzer input
    pickle_bytes = mutator.mutate(data, max_size=10000)
    try:
        pickle.loads(pickle_bytes)
    except Exception:
        # Ordinary exceptions are swallowed; crashes and hangs are still reported
        pass

atheris.Setup(sys.argv, test_one_input)
atheris.Fuzz()
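The harness above passes generated pickles to pickle.loads, which resolves any global a pickle names and can call it. If that is not what you want to exercise, one option is the "restricting globals" pattern from the Python pickle documentation. The sketch below is a stricter variant that blocks every global lookup; the names NoGlobalsUnpickler and safe_loads are hypothetical, and whether blocking globals suits your target is an assumption to check.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global (class or function)."""

    def find_class(self, module, name):
        # Called for GLOBAL, STACK_GLOBAL, INST and extension lookups
        raise pickle.UnpicklingError(f"global {module}.{name} is blocked")


def safe_loads(data: bytes):
    """Stand-in for pickle.loads that only builds plain built-in values."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    print(safe_loads(pickle.dumps({"a": [1, 2, 3]})))
    try:
        # Hand-written protocol-0 pickle that tries to look up os.system
        safe_loads(b"cos\nsystem\n(S'echo hi'\ntR.")
    except pickle.UnpicklingError as exc:
        print("rejected:", exc)
```

Swapping safe_loads in for pickle.loads inside test_one_input keeps the opcode decoding and stack handling under test while skipping global resolution.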
Performance
| Metric | Performance |
|---|---|
| Small pickles (10-30 opcodes) | ~5.2 us |
| Medium pickles (60-300 opcodes) | ~48 us |
| Large pickles (200-500 opcodes) | ~154 us |
| Single-threaded throughput | ~10,000 pickles/sec |
| Multi-core (8 cores) | ~80,000 pickles/sec |

Generation time by protocol version:

| Protocol | Time | Use Case |
|---|---|---|
| V0 | 33 us | Fastest -- ASCII-based, legacy |
| V1 | 41 us | Binary |
| V2 | 46 us | Fast, PROTO opcode |
| V3 | 48 us | Default, balanced |
| V4 | 139 us | FRAME support |
| V5 | 134 us | Out-of-band buffers |
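The protocol differences noted in the table (PROTO from version 2, FRAME from version 4) can be observed with the standard library alone. This sketch pickles one value under every protocol and lists the opcodes used. It relies on Python's own pickler rather than pickle-fuzzer, and opcode_names is a helper name invented for the example.

```python
import pickle
import pickletools


def opcode_names(obj, protocol: int) -> list[str]:
    """Return the opcode names Python's pickler emits for `obj`."""
    data = pickle.dumps(obj, protocol=protocol)
    return [op.name for op, _arg, _pos in pickletools.genops(data)]


value = {"key": [1, 2.5, b"bytes"]}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    names = opcode_names(value, protocol)
    print(f"protocol {protocol}: {' '.join(names)}")
```

Protocol 5's out-of-band buffer opcodes do not appear here because they are only emitted for pickle.PickleBuffer objects serialized with a buffer_callback.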
How It Works
pickle-fuzzer uses a stack-based approach to generate valid pickle bytecode:
- Stack/Memo Simulation -- Maintains an internal stack and memo that mirrors the pickle machine's behavior
- Opcode Validation -- Only emits opcodes that are valid given the current stack state
- Protocol Compliance -- Respects protocol version constraints for opcode availability
- Random Generation -- Uses the arbitrary crate to derive generation choices from a seed or from fuzzer input, so identical input produces identical output
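The four points above can be sketched in a few dozen lines of Python. This illustrates the stack/memo idea only; it is not pickle-fuzzer's Rust implementation. The opcode subset, the "list"/"atom" type tags, and the function name generate are assumptions made for the example.

```python
import pickle
import random


def generate(rng: random.Random, n_ops: int) -> bytes:
    """Emit a small valid protocol-2 pickle by simulating stack and memo."""
    out = bytearray(b"\x80\x02")   # PROTO 2 header
    stack: list[str] = []          # simulated stack: "list" or "atom" per item
    memo: list[str] = []           # simulated memo: type tag stored at each index

    for _ in range(n_ops):
        # Opcode validation: offer only opcodes whose preconditions hold now.
        choices = ["NONE", "BININT1", "EMPTY_LIST"]
        if stack and len(memo) < 256:
            choices.append("BINPUT")          # needs an item to memoize
        if memo:
            choices.append("BINGET")          # needs a populated memo slot
        if len(stack) >= 2:
            choices.append("TUPLE2")          # pops two items
            if stack[-2] == "list":
                choices.append("APPEND")      # needs a list under the top item

        op = rng.choice(choices)
        if op == "NONE":
            out += b"N"
            stack.append("atom")
        elif op == "BININT1":
            out += b"K" + bytes([rng.randrange(256)])
            stack.append("atom")
        elif op == "EMPTY_LIST":
            out += b"]"
            stack.append("list")
        elif op == "BINPUT":
            out += b"q" + bytes([len(memo)])  # 1-byte memo index
            memo.append(stack[-1])
        elif op == "BINGET":
            index = rng.randrange(len(memo))
            out += b"h" + bytes([index])
            stack.append(memo[index])
        elif op == "TUPLE2":
            out += b"\x86"
            del stack[-2:]
            stack.append("atom")
        elif op == "APPEND":
            out += b"a"
            stack.pop()                       # the list itself stays on the stack

    if not stack:                             # STOP needs one item to return
        out += b"N"
    out += b"."                               # STOP
    return bytes(out)


if __name__ == "__main__":
    data = generate(random.Random(42), 30)
    print(data)
    print(pickle.loads(data))                 # safe: this subset has no globals
```

Because every choice is drawn from a seeded random.Random, the same seed reproduces the same byte string, which mirrors the seed-based reproducibility described above.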
Architecture
| Component | Description |
|---|---|
| Generator (src/generator.rs) | Core test case generation engine with stack/memo simulation |
| Opcodes (src/opcodes.rs) | Complete opcode definitions for all protocol versions (0-5) |
| Stack (src/stack.rs) | Simulates the pickle virtual machine stack |
| State (src/state.rs) | Manages generator state including memo table and protocol version |
| Mutators (src/mutators/) | Optional mutation strategies for controlled variations |
Documentation
| Guide | Description |
|---|---|
| Contributing | How to contribute |
| Developing | Development setup and workflows |
| Testing | Testing procedures and validation |
| Benchmarks | Performance benchmarks |
| Fuzzing | Fuzzing pickle-fuzzer itself |
License
Apache 2.0 -- See LICENSE for details.