
Machine Learning Ops Engineer

Stratum AI
We are seeking a Machine Learning Ops Engineer to join our Infrastructure Team, focusing on building and maintaining a platform for AI models in the mining industry. This remote-first position in Canada involves collaboration with Technical Services and Platform teams to deliver valuable solutions to clients.
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience in software development and ML engineering
- 3+ years of industry experience
- Proficiency in Kubernetes and PyTorch
- Advanced Python programming skills
- Proficiency with data science libraries (numpy, pandas)
- Experience with visualization tools
- Ability to write modular, robust, and tested Python code
- Strong debugging skills for complex ML systems
- Deep learning experience, including implementing neural network models and training workflows
Responsibilities
- Develop robust and well-tested code for core internal tools
  - Create data preprocessing modules for mining data
  - Implement metrics calculations and evaluation pipelines
  - Build visualization tools for 3D models and ML performance metrics
  - Troubleshoot and fix issues in existing metrics code
- Build and maintain our custom end-to-end MLOps platform
  - Implement experiment tracking systems
  - Create model registry with versioning and storage
  - Develop automated testing frameworks
  - Build interfaces between different components of the ML pipeline
- Develop production-grade QA/QC systems for deployed AI models
  - Implement input data validation
  - Create automated alerts for performance issues
  - Set up monitoring for data drift
  - Build dashboards for model performance metrics
- Create specialized tools for mining data
  - Implement spatial data processing utilities
  - Build visualization tools for 3D geological data
  - Develop data converters between different mining data formats
  - Create utilities for coordinate transformations
- Refactor and productionize code created by the client services team
  - Convert notebooks into modular Python packages
  - Implement proper error handling and logging
  - Add comprehensive testing to existing code
  - Improve performance of data processing pipelines
- Provide technical expertise to the client services team
- Manage infrastructure for data processing, model training, and serving
- Mentor junior engineers, perform code reviews, and write documentation
- Proactively identify technical challenges and drive improvement initiatives




