AI Inference Engineer (London)

Perplexity | London
Full-time | Python, Rust, C++, +5 more

We are seeking an AI Inference Engineer to join our team in London, focusing on the deployment of machine learning models for real-time inference. The role involves developing APIs, optimizing inference stacks, and enhancing system reliability using technologies like Python, Rust, C++, and PyTorch.

Qualifications

  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with LLM architectures and inference optimization techniques
  • Understanding of GPU architectures
  • Experience with GPU kernel programming using CUDA
  • Proficiency in Python, Rust, and C++

Responsibilities

  • Develop APIs for AI inference for internal and external customers
  • Benchmark and address bottlenecks in the inference stack
  • Improve reliability and observability of systems
  • Respond to system outages
  • Explore and implement LLM inference optimizations
