
Software Engineer, Inference – AMD GPU Enablement

OpenAI
OpenAI is seeking a Software Engineer for its Inference team to optimize and scale inference infrastructure on AMD GPU platforms. The role spans low-level kernel performance and high-level distributed execution, with a focus on improving inference performance and collaborating with teams across the company to ensure AI models run smoothly on new hardware.
Qualifications
- Experience writing or porting GPU kernels using HIP, CUDA, or Triton (a minimal HIP sketch follows this list).
- Familiarity with communication libraries like NCCL/RCCL.
- Experience with distributed inference systems and scaling models across accelerators.
- Problem-solving skills for end-to-end performance challenges across hardware and system libraries.
- Excitement to work in a fast-moving team building new infrastructure from first principles.
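
For context on the kind of kernel work the first qualification describes, here is a minimal SAXPY kernel written in HIP. This is an illustrative sketch only: the kernel name, problem size, and launch parameters are made up for the example and are not taken from OpenAI's stack.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// SAXPY: y = a * x + y, one element per thread.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);

    float *dx, *dy;
    hipMalloc(&dx, n * sizeof(float));
    hipMalloc(&dy, n * sizeof(float));
    hipMemcpy(dx, hx.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dy, hy.data(), n * sizeof(float), hipMemcpyHostToDevice);

    // Launch with 256-thread blocks; hipLaunchKernelGGL is HIP's portable
    // launch form (the CUDA-style <<<>>> syntax also works under hipcc).
    int threads = 256, blocks = (n + threads - 1) / threads;
    hipLaunchKernelGGL(saxpy, dim3(blocks), dim3(threads), 0, 0,
                       n, 2.0f, dx, dy);

    hipMemcpy(hy.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);  // expect 4.0 = 2*1 + 2
    hipFree(dx);
    hipFree(dy);
    return 0;
}
```

Porting CUDA code to this form is often mechanical (the HIPIFY tools translate most API calls), which is why the posting treats HIP and CUDA experience as interchangeable.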
Responsibilities
- Own bring-up, correctness and performance of the OpenAI inference stack on AMD hardware.
- Integrate internal model-serving infrastructure (e.g., vLLM, Triton) with AMD GPU-backed systems.
- Debug and optimize distributed inference workloads across memory, network, and compute layers.
- Validate correctness, performance, and scalability of model execution on large GPU clusters.
- Collaborate with teams to design and optimize high-performance GPU kernels using HIP, Triton, or other frameworks.
- Build, integrate, and tune collective communication libraries (e.g., RCCL) for parallel model execution (see the all-reduce sketch after this list).
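
As a rough illustration of the RCCL work in the last responsibility, the sketch below runs a single-process all-reduce across all visible GPUs. RCCL mirrors the NCCL API, so ncclCommInitAll, ncclAllReduce, and the group calls are the standard collective-library entry points; the header path, buffer sizes, and single-process setup are assumptions made for this example, not a description of OpenAI's serving code.

```cpp
#include <rccl/rccl.h>       // header path may vary by ROCm install (<rccl.h>)
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    int ndev = 0;
    hipGetDeviceCount(&ndev);

    // One communicator per visible GPU, all in this process.
    std::vector<ncclComm_t> comms(ndev);
    std::vector<int> devs(ndev);
    for (int i = 0; i < ndev; ++i) devs[i] = i;
    ncclCommInitAll(comms.data(), ndev, devs.data());

    const size_t count = 1 << 20;  // illustrative buffer size
    std::vector<float*> sendbuf(ndev), recvbuf(ndev);
    std::vector<hipStream_t> streams(ndev);
    for (int i = 0; i < ndev; ++i) {
        hipSetDevice(i);
        hipMalloc(&sendbuf[i], count * sizeof(float));
        hipMalloc(&recvbuf[i], count * sizeof(float));
        hipStreamCreate(&streams[i]);
    }

    // Group the per-device calls so the library launches them as one
    // collective instead of deadlocking on sequential blocking calls.
    ncclGroupStart();
    for (int i = 0; i < ndev; ++i)
        ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < ndev; ++i) {
        hipSetDevice(i);
        hipStreamSynchronize(streams[i]);
        hipFree(sendbuf[i]);
        hipFree(recvbuf[i]);
        hipStreamDestroy(streams[i]);
        ncclCommDestroy(comms[i]);
    }
    printf("all-reduce complete on %d device(s)\n", ndev);
    return 0;
}
```

Tuning work of the kind the posting describes typically happens around calls like these: choosing collective algorithms and protocols, sizing and fusing messages, and overlapping communication with compute on the streams.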


