AI Inference Engineer

Perplexity · San Francisco
Full-time · Python, Rust, C++

We are seeking an AI Inference Engineer to join our team and deploy machine learning models for real-time inference. Our stack includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. The role involves developing APIs, optimizing inference performance, and improving system reliability.

Qualifications

  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with LLM architectures and inference optimization techniques
  • Understanding of GPU architectures
  • Experience with GPU kernel programming using CUDA
  • Strong programming skills in Python, Rust, or C++

Responsibilities

  • Develop AI inference APIs for internal and external customers
  • Benchmark and address bottlenecks in the inference stack
  • Improve reliability and observability of systems
  • Respond to system outages
  • Explore and implement optimizations for LLM inference

Similar Jobs