KV Cache

Research papers, repositories, and articles about the KV cache


Information-Aware KV Cache Compression for Long Reasoning

InfoKV combines attention scores with an information-theoretic signal that tracks how much each token affects future predictions. This lets the model evict uninformative tokens while keeping rare but important ones, improving long-context reasoning under a tight memory budget. For anyone struggling with KV cache growth, it points to a smarter eviction policy. ([huggingface.co](https://huggingface.co/papers/2606.26875))

Jushi Kai, Zhuiri Xiao
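
To make the eviction idea in the InfoKV summary concrete, here is a minimal sketch of a score-mixing policy. It is not the paper's algorithm: the function names, the min-max normalization, the linear mix, and the `alpha` weight are all assumptions made for illustration, and the per-token information scores are taken as given rather than computed.

```python
from typing import List


def normalize(xs: List[float]) -> List[float]:
    """Min-max scale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def select_tokens_to_keep(
    attn_scores: List[float],
    info_scores: List[float],
    budget: int,
    alpha: float = 0.5,  # hypothetical mixing weight, not from the paper
) -> List[int]:
    """Return the positions of the `budget` tokens with the highest mixed score."""
    if len(attn_scores) != len(info_scores):
        raise ValueError("score lists must have the same length")
    if budget < 0:
        raise ValueError("budget must be non-negative")
    n = len(attn_scores)
    if budget >= n:
        return list(range(n))
    a = normalize(attn_scores)
    s = normalize(info_scores)
    # Assumed combination rule: a weighted sum of the two normalized signals.
    mixed = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, s)]
    top = sorted(range(n), key=lambda i: mixed[i], reverse=True)[:budget]
    return sorted(top)  # keep the original token order


attn = [0.9, 0.1, 0.05, 0.6, 0.2]
info = [0.2, 0.1, 0.95, 0.3, 0.1]
kept_mixed = select_tokens_to_keep(attn, info, budget=3)
kept_attn_only = select_tokens_to_keep(attn, info, budget=3, alpha=1.0)
```

In this toy input, token 2 receives almost no attention but has a high information score, so the mixed policy keeps it while an attention-only policy (`alpha=1.0`) evicts it.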

KV-CoRE: Benchmarking Data-Dependent Low-Rank Compressibility of KV-Caches in LLMs

KV-CoRE systematically measures how well key-value caches can be compressed with low-rank methods across layers and tasks. The results help engineers identify where cache compression saves memory without significantly hurting accuracy.

Jian Chen, Zhuoran Wang
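
One way to picture low-rank compressibility, as described in the KV-CoRE summary, is to ask how many singular values a cache matrix needs to retain most of its energy. The sketch below assumes that metric (the smallest rank preserving a chosen fraction of the squared singular values); the summary does not specify the benchmark's actual measure, and `rank_for_energy` and the synthetic matrices are hypothetical.

```python
import numpy as np


def rank_for_energy(cache: np.ndarray, energy: float = 0.99) -> int:
    """Smallest rank whose top singular values keep `energy` of the squared spectrum."""
    s = np.linalg.svd(cache, compute_uv=False)  # singular values, descending
    total = float(np.sum(s ** 2))
    if total == 0.0:
        return 0
    cumulative = np.cumsum(s ** 2) / total
    # First index where the cumulative energy reaches the target, as a 1-based rank.
    r = int(np.searchsorted(cumulative, energy)) + 1
    return min(r, len(s))  # guard against floating-point shortfall at the tail


rng = np.random.default_rng(0)
# Synthetic stand-ins for a (tokens x head_dim) key or value matrix.
compressible = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 16))
incompressible = rng.standard_normal((64, 16))

r_low = rank_for_energy(compressible, 0.999)
r_high = rank_for_energy(incompressible, 0.99)
```

A matrix built from two factors needs at most rank 2, while a dense random matrix needs most of its 16 dimensions. Running a measurement like this per layer and per task would show where truncation is cheap and where it is not.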