Research Scientist, Interpretability

Anthropic · San Francisco, CA
Full Time · machine-learning, ai, python, +5 more
Apply Now
Anthropic is seeking a Research Scientist for its Interpretability team, which works to understand and improve the safety of AI systems through mechanistic interpretability. The role involves reverse-engineering trained neural networks to enhance trust in and reliability of AI technologies.

Qualifications

  • PhD in a relevant field (e.g., computer science, machine learning, neuroscience)
  • Strong background in machine learning and neural networks
  • Experience with interpretability research or related areas
  • Proficiency in programming languages such as Python
  • Ability to work collaboratively in a research team

Responsibilities

  • Conduct research on mechanistic interpretability of neural networks
  • Develop methodologies to reverse-engineer trained models
  • Collaborate with a team of researchers and engineers
  • Publish findings in relevant scientific forums
  • Contribute to the design and implementation of interpretability tools
