Dnotitia Unveils STAR-KV, Achieving UP to 20x KV Cache Compression, Selected as an ICML 2026 Spotlight Paper
2 days ago
Introduces a low-rank-based approach to KV cache compression, one of the key bottlenecks in long-context AISpeeds up attention computation by up to 6.9x and overall generation throughput by up to…