SpectralKV: Fourier-Wavelet Hybrid Methods for KV Cache Compression

Author: C. Keasey | Affiliation: CKZ Data Labs Ltd
Paper: paper.md (full research paper with analysis)
Code: All compressors (PyTorch + MLX), evaluation suite, and raw results

Key Results (Qwen2.5-0.5B, 4K-token document, budget=512)

Cache Compression

All spectral pruning methods: 50.3 → 7.1 MB (7.1× compression, 86% saved)
TurboQuant-4bit: 1.0× (no savings in simulated-dequant mode)
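
Those numbers are self-consistent: 50.3 MB / 7.1 MB ≈ 7.1×, and 1 − 7.1/50.3 ≈ 0.86, i.e. 86% of the cache memory saved.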

Attention Output Perturbation (AOP; lower is better)

| Method           | AOP_mean | AOP_max | Rank |
|------------------|----------|---------|------|
| WaveletKV        | 0.484    | 3.278   | 1    |
| WaveletTriAttn   | 0.515    | 3.346   | 2    |
| WaveletFourierKV | 0.556    | 3.711   | 3    |
| FourierKV        | 0.563    | 3.814   | 4    |
| TriAttentionKV   | 0.579    | 3.698   | 5    |
| TurboQuant-4bit  | 0.592    | 3.581   | 6    |
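
For orientation, here is a minimal sketch of how an attention-output-perturbation metric of this kind can be computed for a single head. It assumes AOP is the per-query L2 error between attention outputs under the full and the compressed cache; paper.md has the exact definition behind the numbers above.

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # Standard scaled dot-product attention for one head.
    scores = q @ k.transpose(-1, -2) / k.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def aop(q, k, v, k_comp, v_comp):
    # Per-query L2 distance between full-cache and compressed-cache outputs.
    err = (attention(q, k, v) - attention(q, k_comp, v_comp)).norm(dim=-1)
    return err.mean().item(), err.max().item()

# Toy check: a 4096-token cache pruned to a 512-token budget (keep the last 512).
q = torch.randn(128, 64)          # queries        [n_queries, head_dim]
k, v = torch.randn(2, 4096, 64)   # full KV cache  [seq_len, head_dim] each
aop_mean, aop_max = aop(q, k, v, k[-512:], v[-512:])
```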

Perplexity (ND-PPL; lower is better)

| Method           | PPL_comp | ND-PPL | Rank |
|------------------|----------|--------|------|
| TurboQuant-4bit  | 1.09     | +0.079 | 1    |
| WaveletTriAttn   | 3.37     | +2.353 | 2    |
| TriAttentionKV   | 4.45     | +3.421 | 3    |
| WaveletKV        | 4.99     | +3.960 | 4    |
| WaveletFourierKV | 5.20     | +4.172 | 5    |
| FourierKV        | 5.65     | +4.613 | 6    |
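
Reading off the table, ND-PPL appears to be the perplexity increase over the uncompressed full-cache baseline, which sits near 1.0 on this document (e.g. 1.09 − 0.079 ≈ 1.01 for TurboQuant). Treat that as an inferred reading; paper.md has the exact definition.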

Multi-Turn Cache Drift (5 turns; lower drift is better)

| Method           | T1 AOP | T5 AOP | Drift | Profile         |
|------------------|--------|--------|-------|-----------------|
| WaveletKV        | 3.10   | 3.24   | +0.14 | Stable          |
| WaveletTriAttn   | 3.60   | 3.65   | +0.05 | Stable          |
| TriAttentionKV   | 3.35   | 3.41   | +0.06 | Stable          |
| FourierKV        | 3.54   | 3.91   | +0.37 | Increasing      |
| WaveletFourierKV | 4.50   | 4.06   | −0.44 | Self-correcting |
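
Drift is the turn-5 AOP minus the turn-1 AOP (e.g. 4.06 − 4.50 = −0.44 for WaveletFourierKV, the only negative drift in the table).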

Generation Throughput

| Method           | Tok/s | Speedup | Token Match % |
|------------------|-------|---------|---------------|
| FourierKV        | 33.2  | 1.07×   | 3.1%          |
| WaveletKV        | 33.2  | 1.00×   | 6.2%          |
| WaveletFourierKV | 32.4  | 0.98×   | 3.1%          |
| WaveletTriAttn   | 33.1  | 0.99×   | 1.5%          |
| TriAttentionKV   | 33.0  | 1.03×   | 3.1%          |
| TurboQuant-4bit  | 26.3  | 0.80×   | 1.5%          |
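
Token Match % presumably measures exact agreement with the uncompressed model's greedy output; the uniformly low values (1.5% to 6.2%) indicate that all methods diverge from the baseline generation quickly at this budget, even where the perturbation metrics are small.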

Key Findings

  1. WaveletKV achieves the lowest attention perturbation: 18% lower AOP_mean than TurboQuant, validating multi-resolution localization
  2. WaveletTriAttn achieves the best perplexity among pruning methods: the trigonometric prior captures positional distance preferences
  3. All spectral methods deliver 86% memory savings at throughput parity (~33 tok/s)
  4. WaveletFourierKV uniquely self-corrects over multiple turns: the cascaded DWT→FFT captures complementary temporal structure
  5. No single method dominates every metric; choose by deployment scenario

Architecture

spectral_kv/
├── __init__.py
├── compressors.py           # PyTorch/CUDA (FourierKV, WaveletKV, WaveletFourierKV, etc.)
├── compressors_mlx.py       # MLX for Apple Silicon
├── metrics.py               # AOP, spectral fidelity, reconstruction metrics
├── benchmark.py             # Synthetic + real-model benchmark
├── eval_comprehensive.py    # Full 4-phase evaluation suite
results/
├── eval_v3.json             # Raw results from the comprehensive evaluation
├── benchmark_results.json   # Synthetic benchmark results
paper.md                     # Full research paper

Quick Start

pip install torch transformers accelerate PyWavelets scipy tabulate

# Run comprehensive evaluation
python -m spectral_kv.eval_comprehensive

# Run synthetic benchmark
python -m spectral_kv.benchmark
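
The scoring idea behind the spectral pruners can also be illustrated standalone. The sketch below is a plausible FourierKV-style scorer, not the repo's actual API: the function names and the low-pass scoring rule are hypothetical. Each cache position is scored by how much energy it retains after a low-pass reconstruction of the key sequence, and the top-budget positions are kept in temporal order.

```python
import torch

def spectral_scores(keys: torch.Tensor, keep_frac: float = 0.25) -> torch.Tensor:
    # keys: [seq_len, head_dim] key cache for one head.
    spec = torch.fft.rfft(keys, dim=0)                     # FFT along the time axis
    mask = torch.zeros(spec.shape[0], 1)
    mask[: max(1, int(keep_frac * spec.shape[0]))] = 1.0   # hard low-pass cutoff
    recon = torch.fft.irfft(spec * mask, n=keys.shape[0], dim=0)
    return recon.norm(dim=-1)                              # per-position energy after low-pass

def prune_cache(keys, values, budget=512):
    # Keep the `budget` highest-scoring positions, preserving temporal order.
    idx = spectral_scores(keys).topk(budget).indices.sort().values
    return keys[idx], values[idx]

k, v = torch.randn(2, 4096, 64)                   # toy 4096-token single-head cache
k_small, v_small = prune_cache(k, v, budget=512)  # [512, 64] each
```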

Improvements (v2→v3)

  1. Tukey window: smooth cosine roll-off in the frequency mask, eliminating Gibbs ringing (improvements 1, 2, and 4 are sketched after this list)
  2. Cascaded DWT→FFT: per-band FFT within each wavelet scale for joint time-frequency scoring
  3. Adaptive alpha: a non-stationarity measurement drives the wavelet/Fourier blend ratio
  4. Recency + norm bias: 20% recency + 20% key-norm weighting in all spectral scorers
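
Below is a combined sketch of improvements 1, 2, and 4, under stated assumptions: the per-band weighting and z-score normalization are illustrative choices, and while the 20%/20% bias split follows the list above, assigning the remaining 60% to the spectral term is an assumption (compressors.py has the exact formulas).

```python
import numpy as np
import pywt
from scipy.signal.windows import tukey

def tukey_lowpass_mask(n_bins, keep_frac=0.25, alpha=0.5):
    # Flat passband with a cosine roll-off to zero: the smooth edge is what
    # suppresses the Gibbs ringing a hard cutoff would cause (improvement 1).
    cutoff = max(2, int(keep_frac * n_bins))
    mask = np.zeros(n_bins)
    mask[:cutoff] = tukey(2 * cutoff, alpha=alpha)[cutoff:]  # descending half
    return mask

def hybrid_scores(keys, wavelet="db4", level=3):
    # keys: [seq_len, head_dim] key cache for one head; returns one score per position.
    seq_len = keys.shape[0]
    spectral = np.zeros(seq_len)
    # Improvement 2 (cascaded DWT->FFT): decompose along time, weight each
    # wavelet band by its Tukey-masked low-frequency FFT energy fraction, and
    # spread per-coefficient energy back over the positions it covers.
    for band in pywt.wavedec(keys, wavelet, level=level, axis=0):
        spec = np.abs(np.fft.rfft(band, axis=0))
        weight = (spec * tukey_lowpass_mask(spec.shape[0])[:, None]).sum() / (spec.sum() + 1e-8)
        per_coeff = np.linalg.norm(band, axis=-1)
        reps = int(np.ceil(seq_len / len(per_coeff)))
        spectral += weight * np.repeat(per_coeff, reps)[:seq_len]
    # Improvement 4: blend 20% recency + 20% key-norm bias into the scorer.
    recency = np.linspace(0.0, 1.0, seq_len)
    key_norm = np.linalg.norm(keys, axis=-1)
    z = lambda x: (x - x.mean()) / (x.std() + 1e-8)   # put all terms on one scale
    return 0.6 * z(spectral) + 0.2 * z(recency) + 0.2 * z(key_norm)

# Usage: keep the 512 best-scoring positions of a 4096-token cache.
scores = hybrid_scores(np.random.randn(4096, 64).astype(np.float32))
keep = np.sort(np.argsort(scores)[-512:])
```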

License

Apache 2.0
