
Member of Technical Staff, Pre-Training Data

Cohere
Cohere is seeking a Machine Learning Engineer specializing in pretraining data to develop and manage data pipelines for advanced language models. The role involves end-to-end management of training data, ensuring quality and diversity, and contributing to the company's mission of enhancing AI capabilities.
Qualifications
- Experience in machine learning and data engineering.
- Proficiency in data processing and pipeline development.
- Strong understanding of natural language processing and AI models.
- Ability to work with diverse data sources and ensure data quality.
- Familiarity with data modeling techniques and optimization strategies.
Responsibilities
- Design and build scalable data pipelines for diverse datasets including web data, code data, multilingual corpora, and synthetic data.
- Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance.
- Develop robust data modeling techniques for optimal training efficiency.
- Research and implement innovative data curation methods.
- Bridge the gap between raw data and AI models to improve training metrics.




