
Member of Technical Staff - Training Infrastructure Engineer

Liquid AI
Liquid is seeking a Member of Technical Staff - Training Infrastructure Engineer to design and implement scalable training infrastructure for AI models. The role involves optimizing data pipelines, communication patterns, and checkpointing mechanisms for large-scale multimodal models. Liquid, spun out of MIT, focuses on building efficient AI systems that operate on-device and at the edge.
Qualifications
- Extensive experience in building distributed training infrastructure for language and multimodal models
- Hands-on expertise in frameworks like PyTorch Distributed, DeepSpeed, or Megatron-LM
- Deep understanding of hardware accelerators and networking topologies
- Skilled at identifying and resolving performance bottlenecks in training pipelines
- Experience with diverse data types (text, images, video, audio) and building efficient data pipelines
Responsibilities
- Design and implement high-performance, scalable training infrastructure for GPU clusters
- Build robust data loading systems to eliminate I/O bottlenecks
- Develop sophisticated checkpointing mechanisms for model recovery
- Optimize communication patterns between nodes for distributed training
- Collaborate with engineers and researchers to enhance training infrastructure




