Staff Site Reliability Engineer

Replit · Foster City, CA (Hybrid; in office Mon, Wed, Fri)
Full-time · DevOps, Kubernetes, Docker

Replit is seeking a Staff Site Reliability Engineer to join its SRE team and ensure the reliability, scalability, and performance of its infrastructure. The role involves implementing automation, establishing best practices, and mentoring the engineering team to prioritize reliability.

Qualifications

  • Proven experience in Site Reliability Engineering or related fields.
  • Strong knowledge of cloud platforms, particularly GCP.
  • Experience with Kubernetes, Docker, and infrastructure automation tools like Terraform or Pulumi.
  • Ability to design and implement observability solutions and performance monitoring.
  • Experience in incident management and leading post-mortem processes.

Responsibilities

  • Architect and implement observability solutions including monitoring, logging, and tracing.
  • Define and drive reliability standards by working with product and engineering teams to track SLOs and SLIs.
  • Lead incident management and response during high-impact incidents, conducting post-mortems and implementing preventative measures.
  • Drive automation and infrastructure as code, improving CI/CD pipelines and creating self-healing systems.
  • Optimize performance on Kubernetes and collaborate with teams to enhance cloud deployments.
  • Debug and harden distributed systems to improve reliability and performance.
