
Research Manager, Interpretability

Anthropic
The Research Manager for Interpretability at Anthropic will lead efforts to reverse engineer neural networks and advance AI safety through mechanistic interpretability. The role involves managing a dedicated team focused on understanding how AI models work internally and on ensuring their reliability and trustworthiness for users and society.
Qualifications
- PhD in a relevant field (e.g., computer science, machine learning, neuroscience)
- Strong background in machine learning and neural networks
- Experience with interpretability research or related fields
- Proven track record of publishing research in top-tier conferences
- Excellent communication and collaboration skills
Responsibilities
- Lead research initiatives on mechanistic interpretability of neural networks
- Develop methodologies to reverse engineer AI models
- Collaborate with researchers and engineers to enhance AI safety
- Publish findings and contribute to the scientific community
- Mentor and guide team members in interpretability research