Senior role on NVIDIA’s deep learning inference team, focused on analyzing and improving the performance of large language models (LLMs) and vision-language models across NVIDIA GPUs. The engineer works on GPU-accelerated deep learning software and open-source frameworks (e.g., TensorRT-LLM, vLLM, SGLang, Triton) to optimize large-scale LLM and GenAI inference for datacenter deployments.
Category
LLM / Generative AI Engineer