Software Engineer – GenAI Inference

Databricks | San Francisco, United States | Onsite

$142k - $205k USD (verified)

Job Description

Databricks is hiring a Software Engineer for GenAI inference to design, develop, and optimize the inference engine that powers the Databricks Foundation Model API, ensuring large language model serving is fast, scalable, and efficient across the full inference stack.

Responsibilities

Design and implement the inference engine for Databricks’ Foundation Model API, including routing, batching, scheduling, and memory management for large‑scale LLMs.
Collaborate with researchers to bring new architectures and features (e.g., sparsity, activation compression, MoE) into the serving stack.
Optimize inference for latency, throughput, memory efficiency, and hardware utilization across GPUs and accelerators.
Build instrumentation, tracing, and profiling tools to uncover bottlenecks and guide optimizations.
Ensure reliability, reproducibility, and fault tolerance, including A/B launches, rollbacks, and model versioning.
Work cross‑functionally with platform, infrastructure, and security teams to integrate inference into Databricks’ distributed environment.

Benefits

The local pay range is $142,200–$204,600 USD, plus eligibility for an annual bonus, equity, and comprehensive benefits.
Databricks-standard benefits include health coverage and region-specific perks, as documented on mybenefitsnow.com/databricks.

Ready to Apply?

Applications go directly to Databricks’ career portal.
