
AI Inference Engineer

Perplexity AI | San Francisco, United States | Hybrid

Job Description

An infrastructure‑focused role on Perplexity’s AI team, responsible for the large‑scale deployment and optimization of LLM inference (Python/Rust/C++, PyTorch, Triton, CUDA, Kubernetes) and for building the APIs and platforms that serve real‑time queries for the answer engine and agents.

Responsibilities

  • Develop and maintain APIs for AI inference used by internal teams and external customers.
  • Benchmark and resolve bottlenecks across the inference stack (compute, networking, batching, caching).
  • Improve reliability and observability of inference systems and participate in incident response.
  • Implement cutting‑edge LLM inference optimizations informed by current research.

Benefits

  • Cash compensation range of $190,000–$250,000, plus equity.
  • Comprehensive health, dental, and vision insurance for employees and dependents, plus a 401(k) plan.
  • Hybrid work centered on the San Francisco Bay Area office.

Category

MLOps / AI Infrastructure

Posted

11/5/2025

Ready to Apply?

Applications go directly to Perplexity AI's career portal.

Apply on Perplexity AI