The fuzzing workflow here grew out of research on file-parsing libraries — targets like llama.cpp, libdicom and libbiosig that ingest untrusted binary formats and have historically received less scrutiny than network-facing code. AFL++ with AddressSanitizer is the standard toolchain for this: ASAN surfaces the exact bug class on first reproduction, which collapses the triage step considerably. Most of the campaigns I run are against C/C++ parsers for niche formats where seed corpora are sparse and custom dictionaries matter.
Overview
Fuzzing is the primary methodology for discovering memory corruption vulnerabilities in C/C++ targets. AFL++ with AddressSanitizer (ASAN) is the standard toolchain.
Target selection
Targets organized by file format and parser library:
- CDF: libgsf
- DICOM: libdicom, grassroots dicom
- EGI/VHDR: libbiosig
- FLAC: miniaudio
- GGUF: llama.cpp
- JP2K: nvidia nvjpeg2000
- Node: libigl
- OTF/TTF/PDF: Adobe Acrobat Reader, Foxit Reader, xpdf
Campaign setup
mkdir -pv ~/campaigns/<target>/{source,input,output}
Building with instrumentation
export LLVM_CONFIG="llvm-config-13"
export CC=afl-clang-fast
export CXX=afl-clang-fast++
export AFL_USE_ASAN=1
./configure --prefix=<campaign-dir> --disable-shared
make clean && make && make install
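The --disable-shared flag builds the library statically, so its code is linked directly into the instrumented target binary instead of being loaded at run time from a separate shared object that may not be instrumented.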
Running a campaign
afl-fuzz -i input/ -o output/ -- ./bin/target_binary @@
ASAN configuration
Key environment variables for crash analysis:
export ASAN_OPTIONS="halt_on_error=1:symbolize=1:detect_leaks=0"
export ASAN_OPTIONS="$ASAN_OPTIONS:detect_stack_use_after_return=1"
export ASAN_OPTIONS="$ASAN_OPTIONS:strict_string_checks=1"
Crash analysis workflow
- Reproduce: run the crashing input against the ASAN-instrumented binary
- Classify: ASAN report identifies the bug class (heap-buffer-overflow, use-after-free, stack-overflow, etc.)
- GDB inspection: bt, info registers, examine memory at crash site
- Minimize: reduce the PoC to the smallest triggering input
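The classify step can be scripted. A minimal sketch, assuming the report contains the usual "SUMMARY: AddressSanitizer: <class> ..." line; the function name asan_bug_class is hypothetical and not part of AFL++ or ASAN:

```shell
# asan_bug_class (hypothetical helper): read an ASAN report on stdin and
# print the bug class from its SUMMARY line, e.g. heap-buffer-overflow.
# Prints nothing if the input contains no ASAN summary.
asan_bug_class() {
    sed -n 's/^SUMMARY: AddressSanitizer: \([^ ]*\).*/\1/p' | head -n 1
}

# Usage (not run here): ./target crash_input 2>&1 | asan_bug_class
```

An empty result usually means the input did not reproduce or the binary was not built with ASAN, which is worth checking before going to GDB.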
PoC minimization
afl-tmin -i crash_input -o minimized_input -- ./target @@
Batch reproducibility check:
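# Replay every saved crash; AFL++ 3.0 and later store them under output/default/crashes/
for f in output/default/crashes/id:*; do
    echo "=== $f ==="
    # Cap each run at 5 seconds and show only the top of the ASAN report
    timeout 5 ./target "$f" 2>&1 | head -5
done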
Security impact triage
Prioritize crashes by exploitability:
- Control flow hijacking — overwritten function pointers, vtable corruption, RIP/PC control
- Write primitive — arbitrary or bounded write to attacker-controlled address
- Information leak — out-of-bounds read exposing heap/stack content
- Denial of service — null deref, assertion failure, infinite loop
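A first-pass sort into these buckets can be read off the access line of the ASAN report. The sketch below rests on an assumption of mine and is not an exploitability verdict: it treats a WRITE access as a write-primitive candidate and a READ access as an information-leak candidate, and sends everything else (including SEGV reports, which carry no such line) to manual review. Control flow hijacking still has to be confirmed in GDB. The name triage_bucket and the bucket labels are hypothetical.

```shell
# triage_bucket (hypothetical helper): read an ASAN report on stdin and
# print a rough priority bucket.
# ASSUMPTION: the "WRITE of size" / "READ of size" line is only a proxy
# for impact; every result still needs manual confirmation.
triage_bucket() {
    report=$(cat)
    if printf '%s\n' "$report" | grep -q '^WRITE of size'; then
        echo "write-primitive-candidate"
    elif printf '%s\n' "$report" | grep -q '^READ of size'; then
        echo "info-leak-candidate"
    else
        echo "dos-or-manual-review"
    fi
}

# Usage (not run here): ./target crash_input 2>&1 | triage_bucket
```

Keeping the labels as "candidates" is deliberate: a bounded one-byte write and a fully controlled write land in the same bucket here, and only inspection of the crash site separates them.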
Custom mutations
AFL++ dictionaries improve coverage for structured formats. Example for PNG:
# png.dict
header_png="\x89PNG\x0d\x0a\x1a\x0a"
chunk_IHDR="IHDR"
chunk_IDAT="IDAT"
chunk_IEND="IEND"
chunk_PLTE="PLTE"
Use with: afl-fuzz -x png.dict ...
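For niche formats without a ready-made dictionary, entries can be lifted from the magic bytes of a seed file. A minimal sketch, assuming head, od, tr and sed are available; dict_entry is a hypothetical helper, not an AFL++ tool. It escapes every byte as \xNN, which the dictionary parser accepts for any byte value:

```shell
# dict_entry (hypothetical helper): print an AFL++ dictionary line for the
# first <nbytes> bytes of <file>, with every byte written as \xNN.
# Usage: dict_entry <name> <file> <nbytes>
dict_entry() {
    # od prints space-separated hex bytes; strip whitespace, then prefix
    # each two-digit pair with \x.
    hex=$(head -c "$3" "$2" | od -An -v -tx1 | tr -d ' \n' | sed 's/../\\x&/g')
    printf '%s="%s"\n' "$1" "$hex"
}
```

Run against a PNG seed with dict_entry header_png seed.png 8, this prints header_png="\x89\x50\x4e\x47\x0d\x0a\x1a\x0a". The output is less readable than a hand-written entry but avoids escaping mistakes.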
AFL++ installation
sudo apt install build-essential git python3-dev automake flex bison \
libglib2.0-dev libpixman-1-dev python3-setuptools clang
git clone https://github.com/AFLplusplus/AFLplusplus
cd AFLplusplus && make distrib && sudo make install
What to watch during a run
Campaigns against format parsers tend to surface heap-buffer-overflows and out-of-bounds reads within the first few hours, then slow down as coverage plateaus. When afl-whatsup shows no new paths for a prolonged period, check whether the seed corpus covers the format's variants: a missing magic byte or an unseen chunk type is often why coverage stalls. Crash count alone is not triage. A single root cause frequently generates hundreds of distinct crash inputs, so minimize and deduplicate before reporting.
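One coarse way to deduplicate is to group crash inputs by the ASAN SUMMARY line, which names the bug class and the top frame. This is a sketch under that assumption, with a hypothetical function name group_crashes; distinct bugs can share a top frame and one bug can surface at several, so treat the groups as a starting point:

```shell
# group_crashes (hypothetical helper): replay each crash file against an
# ASAN-instrumented target and count inputs per SUMMARY line.
# Usage: group_crashes <target> <crash-file>...
group_crashes() {
    target=$1; shift
    for f in "$@"; do
        # First SUMMARY line is the signature; empty if ASAN reported nothing.
        sig=$(timeout 5 "$target" "$f" 2>&1 | grep -m 1 '^SUMMARY: ' || true)
        printf '%s\n' "${sig:-NO_ASAN_REPORT}"
    done | sort | uniq -c | sort -rn
}

# Usage (not run here): group_crashes ./target <crashes-dir>/id:*
```

Each output line is a count followed by a signature, largest group first. Inputs that land under NO_ASAN_REPORT either timed out or did not reproduce.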
See also
- Binary Exploitation — exploiting the bugs fuzzing finds
- Malware Analysis — PE format knowledge aids harness writing