
Staff Software Engineer - GenAI Performance and Kernel

Databricks
The Staff Software Engineer for GenAI Performance and Kernel will design, implement, and optimize high-performance GPU kernels for GenAI inference. This role involves leading kernel-level performance engineering, collaborating with ML researchers and product teams, and mentoring other engineers.
Qualifications
- BS/MS/PhD in Computer Science or related field
- Deep hands-on experience writing and tuning compute kernels (CUDA, Triton, OpenCL)
- Strong knowledge of GPU/accelerator architecture
- Experience with advanced optimization techniques
- Familiarity with ML-specific kernel libraries (cuBLAS, cuDNN)
Responsibilities
- Lead the design, implementation, benchmarking, and maintenance of core compute kernels optimized for various hardware backends
- Drive the performance roadmap for kernel-level improvements including vectorization, tensorization, and auto-tuning
- Integrate kernel optimizations with higher-level ML systems
- Build and maintain profiling, instrumentation, and verification tooling
- Lead performance investigations and root-cause analysis on inference bottlenecks
- Establish coding patterns and frameworks for modularizing kernels
- Influence system architecture decisions to enable effective kernel improvements
- Mentor and guide other engineers in low-level performance engineering




