OpenAI logo

Software Engineer, Frontier Systems

OpenAISan Francisco
FullTimepythonsqldevops+5 more
Apply Now
OpenAI logo

Software Engineer, Frontier Systems

OpenAI

Apply Now

The Frontier Systems team at OpenAI is responsible for building and maintaining the largest supercomputers used for advanced AI model training. The Software Engineer role focuses on developing critical infrastructure to ensure the reliability and efficiency of these systems, minimizing disruptions during large-scale training runs.

Qualification

  • 7+ years of industry experience in software engineering
  • Proficiency in Python and shell scripting
  • Comfort with SQL, PromQL, and Pandas for data analysis
  • Experience in developing reproducible analyses
  • Ability to balance building and operationalizing solutions

Responsibility

  • Own and improve system health checks for hyperscale supercomputers
  • Lead investigations into hardware failures and system-level bugs
  • Build automation to monitor and resolve issues across thousands of machines
  • Ensure uninterrupted operations for researchers during model training
  • Develop reproducible analyses and operationalize solutions

Similar Jobs