
Staff Research Engineer, Model Efficiency

Cohere
Cohere is seeking a Staff Research Engineer for its Model Efficiency team, which focuses on making Large Language Models (LLMs) more efficient. The role involves developing and deploying techniques that improve model performance while maintaining quality, in a diverse, inclusive, remote-friendly environment.
Qualifications
- PhD in Machine Learning or a related field.
- Understanding of LLM architecture and optimization techniques.
- Significant experience with model efficiency enhancement techniques.
- Strong software engineering skills.
- Publications at top-tier conferences (ICLR, ACL, NeurIPS).
- Ability to work in a fast-paced, high-ambiguity start-up environment.
- Passion for mentoring others.
Responsibilities
- Develop, prototype, and deploy techniques to improve model efficiency in production.
- Optimize LLM inference given resource constraints.
- Explore model architecture and MoE routing optimization.
- Implement decoding and inference-time algorithm improvements.
- Collaborate on software/hardware co-design for GPU acceleration.



