
HPC Operations Engineer

Jump Trading
Jump Trading Group seeks a hands-on HPC Operations Engineer to manage Linux HPC environments at scale. The role involves providing operational support, solving complex problems, and collaborating on global projects. The company values innovation, collaboration, and a competitive spirit in a unique culture focused on research and technology in financial markets.
Qualifications
- Experience managing Linux HPC environments at scale.
- Strong problem-solving skills and ability to handle complex operational work.
- Familiarity with RDMA fabrics, parallel filesystems, HPC batch schedulers, and FUSE filesystems.
- Ability to write code for automation and diagnostics in multiple programming languages.
- Experience with cybersecurity requirements and IT policies.
Responsibilities
- Provide front-line operational support for 24/7 Linux HPC compute, storage, and interconnects.
- Resolve problem reports and answer questions from members of Jump's research community, managing the entire problem lifecycle.
- Respond to alerts in a timely fashion.
- Participate in large, coordinated maintenance operations, including during evenings and weekends.
- Work on global projects across a wide range of infrastructure.
- Write code for diagnosing, resolving, and triaging difficult problems and automating frequently performed tasks.
- Collaborate with team members and across teams to write code and build testing infrastructure in multiple programming languages.
- Manage relationships with outside vendors, including domestic and international travel.
- Implement and support performance monitoring and fault monitoring systems.
- Develop and improve systems and user documentation.



