
AI Inference Engineer
Perplexity
We are seeking an AI Inference Engineer to join our team, focusing on the deployment of machine learning models for real-time inference using a tech stack that includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. The role involves developing APIs, optimizing inference processes, and enhancing system reliability.
Qualifications
- Experience with ML systems and deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX)
- Familiarity with LLM architectures and inference optimization techniques
- Understanding of GPU architectures
- Experience with GPU kernel programming using CUDA
- Strong programming skills in Python, Rust, or C++
Responsibilities
- Develop APIs for AI inference for internal and external customers
- Benchmark and address bottlenecks in the inference stack
- Improve the reliability and observability of inference systems
- Respond to system outages
- Explore and implement optimizations for LLM inference
