
AI Inference Engineer

Perplexity AI | San Francisco, United States | Hybrid

Job Description

An infrastructure‑focused role on Perplexity’s AI team, responsible for the large‑scale deployment and optimization of LLM inference (Python/Rust/C++, PyTorch, Triton, CUDA, Kubernetes) and for building the APIs and platforms that serve real‑time queries for the answer engine and agents.

Responsibilities

  • Develop and maintain APIs for AI inference used by internal teams and external customers.
  • Benchmark and resolve bottlenecks across the inference stack (compute, networking, batching, caching).
  • Improve reliability and observability of inference systems and participate in incident response.
  • Implement cutting‑edge LLM inference optimizations informed by current research.

Benefits

  • Cash compensation range of $190,000–$250,000, plus equity.
  • Comprehensive health, dental, and vision insurance for employees and dependents, plus a 401(k) plan.
  • Hybrid work centered on the San Francisco Bay Area office.

Category

MLOps / AI Infrastructure

Posted

11/5/2025

Ready to Apply?

Applications go directly to Perplexity AI's career portal.

Apply on Perplexity AI