
Software Engineer - GenAI inference
Databricks
The Software Engineer for GenAI inference at Databricks will design, develop, and optimize the inference engine for the Foundation Model API, focusing on large language model (LLM) serving systems. The role involves collaboration with researchers and cross-functional teams to enhance performance and scalability of the inference stack.
Qualifications
- BS/MS/PhD in Computer Science or a related field
- 3+ years of experience in performance-critical systems
- Solid understanding of ML inference internals, including attention, MLPs, and quantization
- Hands-on experience with CUDA and GPU programming
- Comfortable designing and operating distributed systems, including RPC frameworks and memory partitioning
- Ability to uncover and solve performance bottlenecks across various layers
- Experience building instrumentation and profiling tools for ML models
- Ability to collaborate closely with ML researchers
Responsibilities
- Contribute to the design and implementation of the inference engine optimized for large-scale LLM inference
- Collaborate with researchers to integrate new model architectures and features into the engine
- Optimize latency, throughput, memory efficiency, and hardware utilization across GPUs and accelerators
- Build and maintain instrumentation, profiling, and tracing tools to identify bottlenecks
- Develop scalable routing, batching, scheduling, memory management, and dynamic loading mechanisms for inference workloads
- Support reliability, reproducibility, and fault tolerance in inference pipelines
- Integrate with federated, distributed inference infrastructure and manage communication overhead
- Document and share learnings, contributing to internal best practices and open-source efforts




