GPU Systems Engineer

wehrtyou • London, United Kingdom; New York, NY, United States; Seattle, Washington, United States

Full Time | on-site | python, gpu, linux, devops, docker, ansible

Hudson River Trading (HRT) is seeking GPU Systems Engineers to enhance its HPC/AI research environment. The role involves collaborating with experts to manage large-scale infrastructure, including GPU clusters and petabyte-scale storage, and to ensure 24/7 operation of trading and research systems.

Qualifications

5+ years of experience in large-scale Linux systems engineering in HPC, AI or distributed infrastructure roles
Extensive experience in Linux system installation, performance tuning, and troubleshooting
Expertise in troubleshooting distributed GPU workloads
Deep knowledge of GPU optimization and performance
Proficiency in Python scripting and automation frameworks
Experience with NVIDIA technologies beyond CUDA, such as NCCL, GPUDirect RDMA, and NVLink
Familiarity with configuration management tools (e.g. Salt, Ansible, Puppet, Chef)
Comfortable diagnosing complex system issues at the hardware, OS, and network levels
Strong communication and organizational skills; able to collaborate across diverse technical teams
Thrives in fast-paced environments and is excited by high-impact work

Responsibilities

Design, build, and optimize large-scale distributed GPU compute clusters
Identify and resolve performance bottlenecks in GPU workloads across compute, storage, and networking layers
Collaborate with research and development teams to profile, benchmark, and fine-tune GPU-based workloads
Automate system deployment, monitoring, and troubleshooting across thousands of nodes
Collaborate with research and engineering teams to support evolving workloads
Own critical infrastructure projects from concept to implementation and support
Test and deploy new hardware and software, and partner with vendors to resolve complex issues
