
Software Engineer - Model APIs
Baseten
Baseten is seeking a Software Engineer to join its Model Performance team, focusing on developing and optimizing the Model APIs that power its hosted API endpoints for AI models. The role spans distributed systems, model serving, and developer experience, on a high-impact team at a rapidly growing, well-funded AI company.
Qualifications
- 3+ years of experience building and operating distributed systems or large-scale APIs.
- Proven track record of optimizing performance for AI models and APIs.
- Strong understanding of CUDA and GPU programming.
- Experience with benchmarking and performance measurement frameworks.
- Familiarity with API design principles and best practices.
Responsibilities
- Design, build, and operate the Model APIs surface with a focus on advanced inference capabilities.
- Profile and optimize TensorRT-LLM kernels and analyze CUDA kernel performance.
- Implement custom CUDA operators and tune memory allocation patterns for maximum throughput.
- Build comprehensive benchmarking frameworks to measure performance across different model architectures and hardware configurations.
- Productionize performance improvements across runtimes, including speculative decoding and quantization.
- Instrument detailed observability metrics and build repeatable benchmarks for speed and reliability.
- Implement platform fundamentals such as API versioning, validation, and authentication.
- Collaborate with other teams to build developer-friendly model-serving experiences.




