
Site Reliability Engineer (SRE) - Media Production Infrastructure

Monks
Monks is looking for a highly skilled Site Reliability Engineer (SRE) to join its Platform Engineering team, supporting a media production environment for a global technology company. The role focuses on ensuring the high availability, performance, and resilience of critical systems, with responsibilities spanning infrastructure management, storage administration, and monitoring.
Qualifications
- 14+ years of experience in Site Reliability Engineering or related fields.
- Strong expertise in Storage Area Network (SAN) management and troubleshooting.
- Proficient in networking and system administration, including DNS and directory services.
- Experience with monitoring tools and creating custom dashboards for system observability.
- Ability to provide on-site and remote support in a 24/7 operational environment.
Responsibilities
- Maintain and troubleshoot all production hardware, servers, and storage infrastructure, focusing on the Storage Area Network (SAN).
- Perform maintenance and support for the SAN environment, including firmware and software updates for fiber switches, RAIDs, and tape systems.
- Manage directory services and network services (DNS, static IPs, subnet masks), and configure shares and permissions on the SAN.
- Manage and improve custom dashboards for 24/7 monitoring of systems, RAIDs, temperature sensors, and backup/archive processes.
- Contribute to the development and maintenance of custom applications and dashboards that support media workflows.
- Provide active on-site support and participate in a 24/7 on-call rotation for critical interventions.
- Manage the backup and archive environment, maintain tape systems, and prepare projects for archiving to the cloud.



