Nvidia

Senior Deep Learning Software Engineer, LLM Performance

Nvidia | Santa Clara, United States
$184k - $357k USD (verified)

Job Description

A senior role on NVIDIA’s deep learning inference team, focused on analyzing and improving the performance of large language models (LLMs) and vision-language models (VLMs) on NVIDIA GPUs. The engineer works on GPU‑accelerated deep learning software and open‑source frameworks (e.g., TensorRT-LLM, vLLM, SGLang, Triton) to optimize large‑scale LLM and GenAI inference for datacenter deployments.

Responsibilities

  • Analyze, profile, and optimize LLM, VLM, and GenAI workloads for low‑latency, high‑throughput inference across NVIDIA accelerators.
  • Implement and tune inference, serving, and deployment algorithms within frameworks such as TensorRT-LLM, vLLM, SGLang, and Triton.
  • Collaborate with research, hardware, and systems teams to identify cross‑stack optimizations that improve end‑to‑end model performance.
  • Stay current with generative‑AI research and prototype emerging test‑time compute techniques (e.g., speculative decoding, retrieval‑based refinement) for production use.

Benefits

Base salary range of approximately $184,000–$356,500, depending on level and location, plus eligibility for equity and standard NVIDIA benefits.

Category

LLM / Generative AI Engineer

Applications go directly to Nvidia's career portal