NVIDIA
Senior Architect, Compute DL Architecture
Building TensorRT-LLM inference infrastructure for the latest NVIDIA GPUs, with a focus on sparse attention, LLM-native algorithm design, GPU kernel optimization, long-context decoding, and system-algorithm co-design.