Member of Technical Staff, Pre-Training Data

Cohere, Toronto

Cohere is seeking a Machine Learning Engineer specializing in pretraining data to develop and manage data pipelines for advanced language models. The role involves end-to-end management of training data, ensuring quality and diversity, and contributing to the company's mission of enhancing AI capabilities.

Qualifications

  • Experience in machine learning and data engineering.
  • Proficiency in data processing and pipeline development.
  • Strong understanding of natural language processing and AI models.
  • Ability to work with diverse data sources and ensure data quality.
  • Familiarity with data modeling techniques and optimization strategies.

Responsibilities

  • Design and build scalable data pipelines for diverse datasets including web data, code data, multilingual corpora, and synthetic data.
  • Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance.
  • Develop robust data modeling techniques for optimal training efficiency.
  • Research and implement innovative data curation methods.
  • Bridge the gap between raw data and AI models to improve training metrics.
