
Senior Software Engineer, Training Efficiency
Waymo
Waymo, an autonomous driving technology company, seeks a Senior Software Engineer for its ML Infrastructure team. The role focuses on enhancing the efficiency of input data pipelines for large-scale machine learning training workloads, collaborating with researchers and engineers to optimize performance and scalability of ML systems.
Qualifications
- B.S. in Computer Science or Math, or 5+ years of equivalent real-world experience.
- Proficiency in distributed systems design and an understanding of ML data pipeline optimization.
- Experience with ML frameworks, including TensorFlow and JAX.
- Hands-on experience with libraries such as Grain or the tf.data service.
- Solid programming skills in Python and C++.
- Practical familiarity with profiling tools to uncover performance bottlenecks.
Responsibilities
- Design and improve distributed input data pipelines for large-scale ML training workloads.
- Collaborate with researchers and ML engineers to resolve bottlenecks in data pipeline performance.
- Improve the runtime goodput of ML training workloads by optimizing input data processing systems for scalability and reliability.
- Implement and maintain advanced ML infrastructure tools, including ML Pathways, Grain, JAX, and TensorFlow.
- Evaluate and integrate modern technologies to enhance performance and scalability of ML systems.
- Promote best practices for distributed systems architecture and contribute to technical leadership within the team.