
AI Inference Engineer (London)
Perplexity
We are seeking an AI Inference Engineer to join our team in London, focused on deploying machine learning models for real-time inference. The role involves developing APIs, optimizing the inference stack, and improving system reliability, working with technologies such as Python, Rust, C++, and PyTorch.
Qualifications
- Experience with ML systems and deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX)
- Familiarity with LLM architectures and inference optimization techniques
- Understanding of GPU architectures
- Experience with GPU kernel programming using CUDA
- Proficiency in Python, Rust, and C++
Responsibilities
- Develop APIs for AI inference for internal and external customers
- Benchmark and address bottlenecks in the inference stack
- Improve reliability and observability of systems
- Respond to system outages
- Explore and implement LLM inference optimizations
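To give a flavor of the benchmarking work described above, the sketch below times each stage of a toy inference pipeline to locate the slowest one. It is purely illustrative: the stage names and stand-in computations are assumptions, not Perplexity's actual stack.

```python
import time
from contextlib import contextmanager

# Accumulate wall-clock time spent in each named stage of the pipeline.
@contextmanager
def timed(name, results):
    start = time.perf_counter()
    yield
    results[name] = results.get(name, 0.0) + (time.perf_counter() - start)

def run_request(results):
    # Each stage is a stand-in for real work (tokenization, model
    # forward pass, decoding) in a hypothetical inference pipeline.
    with timed("tokenize", results):
        tokens = list("hello world")
    with timed("forward", results):
        _ = sum(ord(t) for t in tokens)
    with timed("detokenize", results):
        _ = "".join(tokens)

results = {}
for _ in range(100):
    run_request(results)

# The stage with the largest accumulated time is the first
# optimization target.
bottleneck = max(results, key=results.get)
print(bottleneck, results[bottleneck])
```

In practice, per-stage timing like this is only a starting point; production inference stacks typically add percentile latency tracking and GPU-side profiling, since CPU wall-clock timers miss asynchronous device work.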
