Senior Deep Learning Software Engineer, LLM Performance

NVIDIA | Santa Clara, United States

$184k - $357k USD (verified)

Job Description

A senior role on NVIDIA’s deep learning inference team, focused on analyzing and improving the performance of large language and vision-language models across NVIDIA GPUs. The engineer works on GPU‑accelerated deep learning software and open‑source frameworks (e.g., TensorRT-LLM, vLLM, SGLang, Triton) to optimize large‑scale LLM and GenAI inference for datacenter deployments.

Responsibilities

Analyze, profile, and optimize LLM, VLM, and GenAI workloads for low‑latency, high‑throughput inference across NVIDIA accelerators.
Implement and tune inference, serving, and deployment algorithms within frameworks such as TensorRT-LLM, vLLM, SGLang, and Triton.
Collaborate with research, hardware, and systems teams to identify cross‑stack optimizations that improve end‑to‑end model performance.
Stay current with generative‑AI research and prototype emerging test‑time compute techniques (e.g., speculation, retrieval‑based refinement) for production use.

Benefits

Base salary range approximately $184,000–$356,500 depending on level and location, plus eligibility for equity and standard NVIDIA benefits.

Ready to Apply?

Applications go directly to NVIDIA's career portal.