
Principal Engineer - Observability

CoreWeave
CoreWeave is seeking a Principal Engineer to lead the architecture, development, and operations of its Observability product, focusing on enhancing customer experience and system performance for AI workloads. The role combines strategic leadership, advanced solution design, and operational responsibilities to ensure high reliability and performance of observability metrics.
Qualifications
- Extensive experience in software engineering and architecture, particularly in observability and monitoring solutions.
- Strong understanding of AI workloads and their operational requirements.
- Proven track record of designing and implementing high-scale, low-latency systems.
- Experience with telemetry analysis and performance optimization.
- Ability to collaborate effectively with cross-functional teams and communicate with stakeholders at all levels.
Responsibilities
- Lead the strategy and implementation for Observability, ensuring alignment with business goals and performance objectives.
- Design and implement advanced solutions, including low-latency, high-scale Observability pipelines across all products.
- Build solutions that offer insights to customers for rapid troubleshooting of their AI workloads.
- Champion initiatives to improve the reliability, durability, and self-healing capabilities of Observability metrics, and assume operational responsibilities.
- Help shape customer experience by promoting unparalleled visibility into our systems’ performance and reliability with customer-facing metrics and dashboards.
- Analyze telemetry for production systems to identify opportunities for improvement in performance and reliability.
- Develop operational review practices for storage engineering to assess performance against targets and iterate on those targets.
- Act as a trusted advisor to senior leadership, providing insights on storage industry trends.




