
Research Engineer, Pretraining Scaling

Anthropic
Anthropic is seeking a Research Engineer for its ML Performance and Scaling team, which trains Anthropic's production pretrained models and keeps those runs reliable and efficient. The role blends research and engineering, and calls for deep technical expertise in large-scale ML systems and a passion for the field.
Qualifications
- Hands-on experience training large language models
- Deep expertise with JAX, TPU, PyTorch, or large-scale distributed systems
- Enjoyment of both research and engineering work, ideally in roughly a 50/50 split
- Strong problem-solving skills and ability to work under pressure during model launches
- Experience with performance optimization and observability in ML systems
Responsibilities
- Own critical aspects of the production pretraining pipeline, including model operations, performance optimization, observability, and reliability
- Debug and resolve complex issues across the full stack, from hardware errors and networking to training dynamics and evaluation infrastructure
- Design and run experiments to improve training efficiency, reduce step time, increase uptime, and enhance model performance
- Respond to on-call incidents during model launches, diagnosing problems quickly and coordinating solutions across teams
- Build and maintain production logging, monitoring dashboards, and evaluation infrastructure
- Add new capabilities to the training codebase, such as long context support or novel architectures
- Collaborate closely with teammates across SF and London, as well as with Tokens, Architectures, and Systems teams
- Contribute to the team's institutional knowledge by documenting systems, debugging approaches, and lessons learned
