SpectralKV: Fourier-Wavelet Hybrid Methods for KV Cache Compression

Author: C. Keasey | Affiliation: CKZ Data Labs Ltd
Paper: paper.md (full research paper with analysis)
Code: All compressors (PyTorch + MLX), evaluation suite, and raw results

Key Results (Qwen2.5-0.5B, 4K-token document, budget=512)

Cache Compression

All spectral pruning methods: 50.3 → 7.1 MB (7.1× compression, 86% saved)
TurboQuant-4bit: 1.0× (no savings in simulated-dequant mode)
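
Those numbers are self-consistent: 50.3 MB / 7.1 MB ≈ 7.1×, and 1 − 7.1/50.3 ≈ 0.86, i.e. 86% of the cache memory saved.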

Attention Output Perturbation (AOP; lower is better)

| Method           | AOP_mean | AOP_max | Rank |
|------------------|----------|---------|------|
| WaveletKV        | 0.484    | 3.278   | 1    |
| WaveletTriAttn   | 0.515    | 3.346   | 2    |
| WaveletFourierKV | 0.556    | 3.711   | 3    |
| FourierKV        | 0.563    | 3.814   | 4    |
| TriAttentionKV   | 0.579    | 3.698   | 5    |
| TurboQuant-4bit  | 0.592    | 3.581   | 6    |
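
For orientation, here is a minimal sketch of how an attention-output-perturbation metric of this kind can be computed for a single head. It assumes AOP is the per-query L2 error between attention outputs under the full and the compressed cache; paper.md has the exact definition behind the numbers above.

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # Standard scaled dot-product attention for one head.
    scores = q @ k.transpose(-1, -2) / k.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def aop(q, k, v, k_comp, v_comp):
    # Per-query L2 distance between full-cache and compressed-cache outputs.
    err = (attention(q, k, v) - attention(q, k_comp, v_comp)).norm(dim=-1)
    return err.mean().item(), err.max().item()

# Toy check: a 4096-token cache pruned to a 512-token budget (keep the last 512).
q = torch.randn(128, 64)          # queries        [n_queries, head_dim]
k, v = torch.randn(2, 4096, 64)   # full KV cache  [seq_len, head_dim] each
aop_mean, aop_max = aop(q, k, v, k[-512:], v[-512:])
```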

Perplexity (ND-PPL; lower is better)

| Method           | PPL_comp | ND-PPL | Rank |
|------------------|----------|--------|------|
| TurboQuant-4bit  | 1.09     | +0.079 | 1    |
| WaveletTriAttn   | 3.37     | +2.353 | 2    |
| TriAttentionKV   | 4.45     | +3.421 | 3    |
| WaveletKV        | 4.99     | +3.960 | 4    |
| WaveletFourierKV | 5.20     | +4.172 | 5    |
| FourierKV        | 5.65     | +4.613 | 6    |
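
Reading off the table, ND-PPL appears to be the perplexity increase over the uncompressed full-cache baseline, which sits near 1.0 on this document (e.g. 1.09 − 0.079 ≈ 1.01 for TurboQuant). Treat that as an inferred reading; paper.md has the exact definition.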

Multi-Turn Cache Drift (5 turns; lower drift is better)

| Method           | T1 AOP | T5 AOP | Drift | Profile         |
|------------------|--------|--------|-------|-----------------|
| WaveletKV        | 3.10   | 3.24   | +0.14 | Stable          |
| WaveletTriAttn   | 3.60   | 3.65   | +0.05 | Stable          |
| TriAttentionKV   | 3.35   | 3.41   | +0.06 | Stable          |
| FourierKV        | 3.54   | 3.91   | +0.37 | Increasing      |
| WaveletFourierKV | 4.50   | 4.06   | −0.44 | Self-correcting |
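
Drift is the turn-5 AOP minus the turn-1 AOP (e.g. 4.06 − 4.50 = −0.44 for WaveletFourierKV, the only negative drift in the table).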

Generation Throughput

| Method           | Tok/s | Speedup | Token Match % |
|------------------|-------|---------|---------------|
| FourierKV        | 33.2  | 1.07×   | 3.1%          |
| WaveletKV        | 33.2  | 1.00×   | 6.2%          |
| WaveletFourierKV | 32.4  | 0.98×   | 3.1%          |
| WaveletTriAttn   | 33.1  | 0.99×   | 1.5%          |
| TriAttentionKV   | 33.0  | 1.03×   | 3.1%          |
| TurboQuant-4bit  | 26.3  | 0.80×   | 1.5%          |
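
Token Match % presumably measures exact agreement with the uncompressed model's greedy output; the uniformly low values (1.5% to 6.2%) indicate that all methods diverge from the baseline generation quickly at this budget, even where the perturbation metrics are small.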

Key Findings

  1. WaveletKV achieves the lowest attention perturbation: 18% lower AOP_mean than TurboQuant, validating multi-resolution localization
  2. WaveletTriAttn achieves the best perplexity among pruning methods: the trigonometric prior captures positional distance preferences
  3. All spectral methods deliver 86% memory savings at throughput parity (~33 tok/s)
  4. WaveletFourierKV uniquely self-corrects over multiple turns: the cascaded DWT→FFT captures complementary temporal structure
  5. No single method dominates every metric; choose by deployment scenario

Architecture

spectral_kv/
├── __init__.py
├── compressors.py           # PyTorch/CUDA (FourierKV, WaveletKV, WaveletFourierKV, etc.)
├── compressors_mlx.py       # MLX for Apple Silicon
├── metrics.py               # AOP, spectral fidelity, reconstruction metrics
├── benchmark.py             # Synthetic + real-model benchmark
├── eval_comprehensive.py    # Full 4-phase evaluation suite
results/
├── eval_v3.json             # Raw results from the comprehensive evaluation
├── benchmark_results.json   # Synthetic benchmark results
paper.md                     # Full research paper

Quick Start

pip install torch transformers accelerate PyWavelets scipy tabulate

# Run comprehensive evaluation
python -m spectral_kv.eval_comprehensive

# Run synthetic benchmark
python -m spectral_kv.benchmark
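
The scoring idea behind the spectral pruners can also be illustrated standalone. The sketch below is a plausible FourierKV-style scorer, not the repo's actual API: the function names and the low-pass scoring rule are hypothetical. Each cache position is scored by how much energy it retains after a low-pass reconstruction of the key sequence, and the top-budget positions are kept in temporal order.

```python
import torch

def spectral_scores(keys: torch.Tensor, keep_frac: float = 0.25) -> torch.Tensor:
    # keys: [seq_len, head_dim] key cache for one head.
    spec = torch.fft.rfft(keys, dim=0)                     # FFT along the time axis
    mask = torch.zeros(spec.shape[0], 1)
    mask[: max(1, int(keep_frac * spec.shape[0]))] = 1.0   # hard low-pass cutoff
    recon = torch.fft.irfft(spec * mask, n=keys.shape[0], dim=0)
    return recon.norm(dim=-1)                              # per-position energy after low-pass

def prune_cache(keys, values, budget=512):
    # Keep the `budget` highest-scoring positions, preserving temporal order.
    idx = spectral_scores(keys).topk(budget).indices.sort().values
    return keys[idx], values[idx]

k, v = torch.randn(2, 4096, 64)                   # toy 4096-token single-head cache
k_small, v_small = prune_cache(k, v, budget=512)  # [512, 64] each
```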

Improvements (v2→v3)

  1. Tukey window: smooth cosine roll-off in the frequency mask, eliminating Gibbs ringing (improvements 1, 2, and 4 are sketched after this list)
  2. Cascaded DWT→FFT: per-band FFT within each wavelet scale for joint time-frequency scoring
  3. Adaptive alpha: a non-stationarity measurement drives the wavelet/Fourier blend ratio
  4. Recency + norm bias: 20% recency + 20% key-norm weighting in all spectral scorers
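
Below is a combined sketch of improvements 1, 2, and 4, under stated assumptions: the per-band weighting and z-score normalization are illustrative choices, and while the 20%/20% bias split follows the list above, assigning the remaining 60% to the spectral term is an assumption (compressors.py has the exact formulas).

```python
import numpy as np
import pywt
from scipy.signal.windows import tukey

def tukey_lowpass_mask(n_bins, keep_frac=0.25, alpha=0.5):
    # Flat passband with a cosine roll-off to zero: the smooth edge is what
    # suppresses the Gibbs ringing a hard cutoff would cause (improvement 1).
    cutoff = max(2, int(keep_frac * n_bins))
    mask = np.zeros(n_bins)
    mask[:cutoff] = tukey(2 * cutoff, alpha=alpha)[cutoff:]  # descending half
    return mask

def hybrid_scores(keys, wavelet="db4", level=3):
    # keys: [seq_len, head_dim] key cache for one head; returns one score per position.
    seq_len = keys.shape[0]
    spectral = np.zeros(seq_len)
    # Improvement 2 (cascaded DWT->FFT): decompose along time, weight each
    # wavelet band by its Tukey-masked low-frequency FFT energy fraction, and
    # spread per-coefficient energy back over the positions it covers.
    for band in pywt.wavedec(keys, wavelet, level=level, axis=0):
        spec = np.abs(np.fft.rfft(band, axis=0))
        weight = (spec * tukey_lowpass_mask(spec.shape[0])[:, None]).sum() / (spec.sum() + 1e-8)
        per_coeff = np.linalg.norm(band, axis=-1)
        reps = int(np.ceil(seq_len / len(per_coeff)))
        spectral += weight * np.repeat(per_coeff, reps)[:seq_len]
    # Improvement 4: blend 20% recency + 20% key-norm bias into the scorer.
    recency = np.linspace(0.0, 1.0, seq_len)
    key_norm = np.linalg.norm(keys, axis=-1)
    z = lambda x: (x - x.mean()) / (x.std() + 1e-8)   # put all terms on one scale
    return 0.6 * z(spectral) + 0.2 * z(recency) + 0.2 * z(key_norm)

# Usage: keep the 512 best-scoring positions of a 4096-token cache.
scores = hybrid_scores(np.random.randn(4096, 64).astype(np.float32))
keep = np.sort(np.argsort(scores)[-512:])
```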

License

Apache 2.0
