Member of Technical Staff, Training Performance Engineer

Cohere, London
Full-time, Remote, Python

Cohere is seeking a Member of Technical Staff, Training Performance Engineer to optimize the performance of advanced language models and systems. The role involves improving training metrics, enhancing model performance, and developing training tools in a collaborative, remote-friendly environment. Candidates should have strong software engineering skills and experience with machine learning frameworks.

Qualifications

  • Extremely strong software engineering skills.
  • Proficiency in Python and related ML frameworks and compiler stacks such as JAX, PyTorch, and XLA/MLIR.
  • Experience writing kernels for GPUs using CUDA and Triton.
  • Experience with large-scale distributed training strategies.
  • Familiarity with autoregressive sequence models, such as Transformers.

Responsibilities

  • Design and write high-performance, scalable software for training.
  • Understand architectural modifications and design choices and their effects on training throughput and quality.
  • Write low-level CUDA and Triton kernels to get the most performance out of accelerators.
  • Research, implement, and experiment with ideas on supercompute and data infrastructure.
  • Collaborate with top researchers in the field.
