Software Engineer - Site Reliability Engineer (SRE)

Lovelace, Lovelace HQ
Full Time, devops, aws, terraform, +5 more

Lovelace AI is seeking a highly skilled Site Reliability Engineer (SRE) to ensure the availability, scalability, and performance of AI-powered applications and infrastructure. The role involves bridging software development and operations, focusing on automation and engineering principles to maintain and improve systems.

Qualifications

  • 5+ years of experience in site reliability engineering, DevOps, or systems administration.
  • Proven track record of managing complex infrastructure and troubleshooting production issues.
  • Experience with automation tools like Terraform, Ansible, or CloudFormation.
  • Strong understanding of monitoring and observability solutions.
  • Ability to collaborate with software engineering teams on system design.

Responsibilities

  • Design, implement, and maintain monitoring, alerting, and observability solutions.
  • Lead troubleshooting efforts for complex production issues and provide root cause analysis.
  • Develop and maintain automation scripts and infrastructure as code using tools like Terraform and Ansible.
  • Collaborate with software engineering teams to ensure new services are scalable and reliable.
  • Participate in on-call rotations to respond to platform emergencies and alerts.
  • Analyze system performance and recommend optimizations for scalability and efficiency.
  • Implement best practices in deployment, monitoring, and incident management.
  • Conduct post-incident reviews and document solutions in a knowledge base.
