Inference Engineer

Cartesia, HQ - San Francisco, CA

Cartesia is a pioneering AI company focused on developing real-time multimodal intelligence. The Inference Engineer role involves designing and building scalable model inference systems using advanced machine learning techniques, particularly Transformers and State Space Models. The company values strong engineering skills and offers a collaborative work environment in San Francisco with competitive benefits.

Qualifications

  • Strong engineering skills with experience in complex codebases and writing maintainable code.
  • Experience in building large-scale distributed systems with high performance and reliability.
  • Technical leadership skills with a track record of delivering results in ambiguous situations.
  • Background in inference pipelines with machine learning and generative models.
  • Experience with inference frameworks such as vLLM or SGLang, or with techniques such as continuous batching, is preferred.
  • Familiarity with CUDA, Triton, or similar technologies is preferred.

Responsibilities

  • Design and build a low-latency, scalable, and reliable model inference and serving stack for foundation models.
  • Collaborate with research and product engineering teams to ensure fast and reliable product delivery.
  • Develop robust inference infrastructure and monitoring systems for products.
  • Shape product development and impact the application of AI across devices and applications.
  • Turn state-of-the-art machine learning models and research into practical applications.
