
Member of technical staff (Inference)

hcompany
H is an innovative AI startup focused on developing agentic AI to automate complex tasks and enhance human potential. The Inference team is seeking a technical staff member to optimize inference pipelines and model performance, contributing to cutting-edge AI technology.
Qualifications
- MS or PhD in Computer Science, Machine Learning, or a related field
- Proficient in Python, Rust or C/C++
- Experience with GPU programming, such as CUDA, OpenAI Triton, or Metal
- Experience in model compression and quantization techniques
- Strong communication and presentation skills
- Collaborative mindset, thriving in dynamic teams
Responsibilities
- Develop scalable, low-latency, and cost-effective inference pipelines
- Optimize model performance (memory usage, throughput, and latency) using advanced techniques
- Develop specialized GPU kernels for performance-critical tasks
- Collaborate with research teams on model architectures
- Review state-of-the-art papers to improve inference techniques
- Prioritize and implement state-of-the-art inference techniques




