Principal engineer leading CoreWeave’s next-generation GPU inference platform, architecting ultra-low-latency, large-scale model serving across massive GPU clusters.
Responsibilities
Define technical roadmap for high-throughput, low-latency inference
Design Kubernetes-native control-plane components for model serving
Implement optimizations like micro-batching and KV-cache reuse (a micro-batching sketch follows this list)
Build observability, debugging and rollout tooling for models
Mentor engineers on large-scale inference best practices
Partner with customers to optimize production AI applications
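As a rough illustration of the micro-batching technique named above, here is a minimal Python sketch: individual inference requests are coalesced into one batched forward pass, flushing when either a batch-size limit or a latency budget is hit. The MicroBatcher class, its batch_size and max_wait_ms parameters, and the toy model_fn are hypothetical illustrations for this posting, not CoreWeave's actual serving stack.

```python
import asyncio
import time


class MicroBatcher:
    """Hypothetical sketch: coalesce single requests into small batches.

    Flushes when batch_size requests have queued up, or when the oldest
    queued request has waited max_wait_ms, whichever comes first.
    """

    def __init__(self, model_fn, batch_size=8, max_wait_ms=5):
        self.model_fn = model_fn        # callable taking a list of inputs
        self.batch_size = batch_size    # flush when this many requests queue up
        self.max_wait_ms = max_wait_ms  # ...or when the oldest has waited this long
        self.queue = asyncio.Queue()

    async def infer(self, x):
        # Enqueue the input with a future the batching loop will resolve.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((x, fut))
        return await fut

    async def run(self):
        while True:
            # Block until at least one request arrives, then start the clock.
            batch = [await self.queue.get()]
            deadline = time.monotonic() + self.max_wait_ms / 1000
            while len(batch) < self.batch_size:
                timeout = deadline - time.monotonic()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            # One batched forward pass, then fan results back out.
            inputs = [x for x, _ in batch]
            outputs = self.model_fn(inputs)
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)


async def main():
    # Toy model: doubles each input; stands in for a batched GPU forward pass.
    batcher = MicroBatcher(model_fn=lambda xs: [x * 2 for x in xs])
    asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.infer(i) for i in range(10)))
    print(results)


asyncio.run(main())
```

In a real serving stack this flush policy is tuned per model: larger batches improve GPU utilization, while the wait budget caps the tail latency the batching adds.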
Benefits
Salary range around 206,000–303,000 USD plus equity and benefits
Work on frontier-scale GPU clusters for major AI customers
Exposure to world-class AI labs and enterprise clients
Category
MLOps / AI Infrastructure
Posted
11/24/2025
Ready to Apply?
Applications go directly to CoreWeave's career portal