DataOps Engineer (AI Platform Engineer)

Full Time | USD 80,000 – 180,000 per year (estimated) | devops, docker, aws

Exness Global Limited

At Exness, we are not just a leading trading broker—we’ve reimagined what it takes to be a leader. With 40M+ trades a day and 2,000+ people across 13 countries, we combine scale, care, and real tech to make trading better for 1M+ clients worldwide.

Recognised globally as a Best Place to Work, we’re a people-first company where long-term wins always matter more. As part of our team, you will shape the future of fintech with real technology, care, and purpose.

Why this role matters

You will design and operate an on-prem AI platform for deploying and scaling models, working across multi-node GPU clusters, distributed systems, and Kubernetes. You will be responsible for building reliable and efficient infrastructure for large-scale model inference, ensuring optimal GPU utilization, performance, and availability of the platform.

The role is based in our Limassol office, Cyprus. In case of relocation, we offer full relocation support for you and your family to make your move smooth and worry-free.

What you'll actually do

  • Collaborate closely with infrastructure teams on selecting and configuring GPU servers, high-performance networking, and RDMA-enabled clusters.
  • Configure and manage GPU MIG (Multi-Instance GPU) partitions based on workload requirements and model characteristics.
  • Ensure reliable and scalable GPU operations in Kubernetes, including runtime integration, device plugins, and GPU scheduling capabilities.
  • Design, deploy, and maintain model serving runtimes, including vLLM, ONNX Runtime, SGLang, NVIDIA Triton Inference Server, and KServe, ensuring high performance, scalability, and efficient GPU utilization.
  • Build and maintain CI/CD pipelines and tooling for model packaging, versioning, and deployment, enabling reliable model delivery for internal teams.
  • Build and maintain platform tooling for model lifecycle management, including experiment tracking, model versioning, and registry systems (e.g. MLflow).
  • Enable infrastructure and workflows for model fine-tuning and adaptation (e.g. LoRA), focusing on scalability, reproducibility, and automation within the platform.
  • Develop and support internal tooling for managing model inputs and configurations (e.g. prompt templates), enabling consistent and reusable model usage patterns.
  • Conduct performance testing and evaluation of multi-node GPU clusters to identify and resolve bottlenecks.
  • Build and maintain observability for GPU clusters and model workloads, including metrics such as GPU utilization, memory usage, throughput, and latency.
  • Integrate tracing for model inference workflows to provide end-to-end visibility into requests and model behavior.
  • Ensure compliance with security requirements for platform development.
  • Evaluate and benchmark model inference performance across different runtimes, hardware setups, and configurations to guide platform optimization.

Who we’re looking for

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field
  • 5+ years of experience in infrastructure, platform engineering, or distributed systems, preferably in environments involving machine learning or GPU workloads
  • Strong experience with Kubernetes, including deploying and operating production workloads
  • Experience with Linux-based environments
  • Strong programming skills in Python and/or Go
  • Experience with GPU infrastructure, including the NVIDIA or AMD stack and multi-GPU environments, will be considered highly advantageous
  • Understanding of distributed systems and multi-node workloads
  • Experience with model serving and inference systems (e.g. vLLM, ONNX Runtime, SGLang, NVIDIA Triton Inference Server, KServe)
  • Experience with CI/CD pipelines and automation for deploying services or models
  • Experience with monitoring and observability tools (metrics, tracing, logging)
  • Nice to have: familiarity with networking concepts relevant to distributed systems (e.g. RDMA, high-performance networking)
  • Good communication and problem-solving skills
  • Ability to use advanced English for different work and business purposes
  • Critical thinking and attention to detail
  • Decision-making skills and the ability to adapt to change
  • Ability to write concise and clear documentation
  • Ability to handle constructive criticism and build relationships with the team to achieve common goals

What we offer along the way

  • Competitive salary and annual performance bonus
  • Full relocation support for you and your family — flights, housing, visas, and legal assistance included
  • Top-tier health insurance with full family coverage — medical, dental, vision, mental health — plus life insurance for peace of mind
  • Unlimited learning opportunities: external courses, English lessons, career and leadership development
  • Education allowance covering school and kindergarten fees
  • 21 working days of annual leave, plus public holidays and fully paid sick, maternity, and paternity leave
  • Employee appreciation program: branded gifts, birthday days off, celebration budgets for weddings, newborns, and milestones
  • “Get to know Team” trips — meet colleagues across our global hubs, along with company-wide offsites that raise the bar
  • Employee share scheme — grow with us
  • Branded MINI Cooper Countryman company car and private parking
  • Free in-house sports clubs, Sanctum Club gym access, and jet skis
  • Access to a Corporate doctor
  • Exclusive discount program with cafes, gyms, and local services
  • Expat tax perks: up to 50% income tax exemption
  • Support with the naturalisation process for relocated employees

At Exness, we know that changing jobs - and changing countries - is a big step. That’s why relocation with Exness is different. We make it smooth, supported, and truly life-changing.

What your journey looks like

  1. Intro call with Recruiter (30 minutes)
  2. English check (if needed)
  3. Tech interview (90 minutes)
  4. Behavioural interview (60 minutes)

What it's like here

Curious about what working at Exness really looks like? Follow us on Instagram and LinkedIn.

We share the real Exness experience - our people, ideas, moments, and everything in between.

Sounds like you? Apply.

Please note: We occasionally amend or withdraw Exness jobs and reserve the right to do so at any time, including prior to the advertised closing date. Before applying, you are advised to read our data protection policy. This policy describes the processing that may be associated with your personal data and informs you that your personal data may be transferred to Exness/Exness Group companies around the world. Exness Group and its approved recruitment consultants will never ask you for a fee to process or consider your application for a career with Exness. Anyone who demands such a fee is not an authorized Exness representative and you are strongly advised to refuse any such demand.

At Exness, we're an equal opportunity employer where every individual is valued. No matter your race, color, religion, sex, national origin, sexual orientation, gender identity or disability, we welcome you. As an international fintech company, we embrace the richness of our diverse team, respecting each individual and promoting gender equality for all genders in our workforce.
