
Software Engineer, Internal Infrastructure (North America)

Cohere
Cohere is seeking a Software Engineer for its Internal Infrastructure team to build and operate Kubernetes GPU superclusters across multiple clouds, supporting AI researchers in optimizing infrastructure for model training. The role emphasizes collaboration, stability, scalability, and observability in AI workloads, with opportunities for growth at various career levels.
Qualifications
- Deep experience running Kubernetes clusters at scale
- Strong programming skills in Go or Python
- Experience with Cloud Native infrastructure and Infrastructure as Code
- Preference for contributing to Open Source solutions
- Self-directed and adaptable, with a strong ability to identify and solve problems
Responsibilities
- Build and operate Kubernetes compute superclusters across multiple clouds
- Partner with cloud providers to optimize infrastructure costs, performance, and reliability for AI workloads
- Work closely with research teams to understand their infrastructure needs and improve the stability, performance, and efficiency of model training
- Design and build resilient, scalable systems for training AI models with intuitive user interfaces for researchers
- Encourage software best practices and participate in team processes such as knowledge sharing, reviews, and on-call duties




