
AI Inference Engineer
Perplexity
We are seeking an AI Inference Engineer to join our team, focusing on the deployment of machine learning models for real-time inference using a tech stack that includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. The role involves developing APIs, optimizing inference processes, and enhancing system reliability.
Qualifications
- Experience with ML systems and deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX)
- Familiarity with LLM architectures and inference optimization techniques
- Understanding of GPU architectures
- Experience with GPU kernel programming using CUDA
- Strong programming skills in Python, Rust, or C++
Responsibilities
- Develop APIs for AI inference for internal and external customers
- Benchmark and address bottlenecks in the inference stack
- Improve the reliability and observability of inference systems
- Respond to system outages
- Explore and implement optimizations for LLM inference
