

Staff Software Engineer, Slurm

Crusoe
Crusoe is seeking a Staff Software Engineer to join its cloud software team, focusing on building and operating Slurm as a managed cloud service. The role involves designing and scaling systems for GPU-accelerated and high-performance computing, and contributing to innovative cloud infrastructure that supports AI workloads sustainably.
Qualifications
- 7+ years of software engineering experience, particularly in systems engineering.
- Experience in distributed systems, cloud, or HPC environments is essential.
- 2+ years of programming experience in Go (Golang).
- Strong proficiency in other systems programming languages.
- Experience with Kubernetes and cloud infrastructure.
Responsibilities
- Lead the development and engineering of the managed Slurm offering for AI/ML and HPC customers.
- Contribute to scalable and robust software solutions aligned with Crusoe Cloud's strategic objectives.
- Design, build, and maintain Kubernetes operators and controllers for managing Slurm clusters.
- Drive integration of GPU acceleration in the Slurm environment, including scheduling and resource allocation.
- Ensure proper use of high-performance networking technologies for distributed GPU workloads.
- Implement features like multi-tenancy, cluster lifecycle management, and auto-scaling for Slurm services.
- Develop scalable systems to compete with leading managed services.
- Support peer development through knowledge sharing and technical guidance.




