arXiv · Apr 2026 · NVIDIA · TensorRT-LLM · Sparse Attention
GVR Top-K
Innovation: A temporal-correlation threshold search predicts compact candidate sets losslessly.
Impact: 1.88x average Top-K speedup; up to 9.3% generation gain; default DSv4 indexer path.
Evidence: TensorRT-LLM PRs, NVIDIA tech blog, arXiv paper, three filed patents.
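
The card does not spell out the algorithm, so the following is only a minimal sketch of one way a temporal-correlation threshold search could work. It is not the TensorRT-LLM kernel or the paper's method. It assumes that scores change little between consecutive decode steps, so the previous step's k-th largest score is a good threshold guess for the current step. The function name `topk_guess_verify`, the `max_rounds` parameter, and the threshold-relaxation schedule are all hypothetical. The selection stays exact because a candidate set is accepted only when at least k scores clear the threshold, which guarantees that every true Top-K element is in it.

```python
import heapq
import random


def topk_guess_verify(scores, k, guess=None, max_rounds=4):
    """Return (top-k indices, k-th largest score), using a guessed threshold.

    Hypothetical illustration: `guess` is a threshold carried over from the
    previous step. The result always equals a full exact top-k.
    """
    n = len(scores)
    if not 0 < k <= n:
        raise ValueError("k must be in 1..len(scores)")

    candidates = range(n)  # fallback: treat every index as a candidate
    if guess is not None:
        threshold = guess
        step = max(abs(guess), 1.0) * 0.05  # assumed relaxation schedule
        for _ in range(max_rounds):
            kept = [i for i, s in enumerate(scores) if s >= threshold]
            if len(kept) >= k:
                # Verified: at least k scores clear the threshold, so the
                # k-th largest score does too and no top-k element is lost.
                candidates = kept
                break
            # Guess was too high: lower the threshold and try again.
            threshold -= step
            step *= 2

    # Exact selection, restricted to the (usually much smaller) candidate set.
    top = heapq.nlargest(k, candidates, key=scores.__getitem__)
    return top, scores[top[-1]]


# Two consecutive steps with strongly correlated scores (synthetic data).
random.seed(0)
prev = [random.gauss(0.0, 1.0) for _ in range(4096)]
_, kth = topk_guess_verify(prev, 64)  # first step: no guess available
curr = [s + random.gauss(0.0, 0.05) for s in prev]
top, _ = topk_guess_verify(curr, 64, guess=kth)  # reuse the threshold

exact = heapq.nlargest(64, range(len(curr)), key=curr.__getitem__)
assert sorted(top) == sorted(exact)
```

When the guess is good, the exact selection runs over roughly k candidates instead of the full score vector. When it is poor, the sketch falls back to a full scan, so correctness never depends on the prediction.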