Member of Technical Staff - Training Infrastructure Engineer

Liquid AI, San Francisco

Liquid is seeking a Member of Technical Staff - Training Infrastructure Engineer to design and implement scalable training infrastructure for AI models. The role involves optimizing data pipelines, communication patterns, and checkpointing mechanisms for large-scale multimodal models. Liquid, spun out of MIT, focuses on building efficient AI systems that operate on-device and at the edge.

Qualifications

  • Extensive experience in building distributed training infrastructure for language and multimodal models
  • Hands-on expertise in frameworks like PyTorch Distributed, DeepSpeed, or Megatron-LM
  • Deep understanding of hardware accelerators and networking topologies
  • Skilled at identifying and resolving performance bottlenecks in training pipelines
  • Experience with diverse data types (text, images, video, audio) and building efficient data pipelines

Responsibilities

  • Design and implement high-performance, scalable training infrastructure for GPU clusters
  • Build robust data loading systems to eliminate I/O bottlenecks
  • Develop sophisticated checkpointing mechanisms for model recovery
  • Optimize communication patterns between nodes for distributed training
  • Collaborate with engineers and researchers to enhance training infrastructure
