Software Engineer, Infrastructure

Exa, San Francisco, California

Exa is seeking an Infrastructure Engineer to join its Infrastructure Team, which builds the foundational tooling and infrastructure for Exa's AI-driven search engine. The role involves designing and operating large-scale systems, with a focus on GPU clusters and Kubernetes orchestration, to improve performance and reliability.

Qualifications

  • Experience with large-scale infrastructure design and operation.
  • Familiarity with GPU clusters and Kubernetes.
  • Knowledge of cloud batch job systems.
  • Strong focus on reliability, observability, and optimization.
  • Ability to work in a fast-paced engineering environment.

Responsibilities

  • Design and operate large-scale infrastructure, including GPU clusters and Kubernetes clusters.
  • Build GPU cluster orchestration using Kubernetes.
  • Scale AWS batch job systems to manage map-reduce jobs across thousands of machines.
  • Develop GPU scheduling software to optimize cluster utilization.
  • Implement observability tooling for production systems.
