Staff Software Engineer, Slurm

Crusoe, San Francisco, CA, US
Full-time

Crusoe is seeking a Staff Software Engineer to join its cloud software team, focusing on building and operating Slurm as a managed cloud service. The role involves designing and scaling systems for GPU-accelerated and high-performance computing, and contributing to cloud infrastructure that supports AI workloads sustainably.

Qualifications

  • 7+ years of experience in software engineering, particularly in systems engineering.
  • Experience in distributed systems, cloud, or HPC environments is essential.
  • 2+ years of programming experience in Go.
  • Strong proficiency in other systems programming languages.
  • Experience with Kubernetes and cloud infrastructure.

Responsibilities

  • Lead the development and engineering of the managed Slurm offering for AI/ML and HPC customers.
  • Contribute to scalable and robust software solutions aligned with Crusoe Cloud's strategic objectives.
  • Design, build, and maintain Kubernetes operators and controllers for managing Slurm clusters.
  • Drive integration of GPU acceleration in the Slurm environment, including scheduling and resource allocation.
  • Ensure proper use of high-performance networking technologies for distributed GPU workloads.
  • Implement features like multi-tenancy, cluster lifecycle management, and auto-scaling for Slurm services.
  • Develop scalable systems to compete with leading managed services.
  • Support peer development through knowledge sharing and technical guidance.
