

Software Engineer, Infrastructure (All Levels)
Rad AI
About Rad AI
- Influence the technical direction for infrastructure and platform capabilities that support our rapidly growing AI product suite.
- Architect and evolve our cloud infrastructure (primarily on AWS) across container orchestration (Kubernetes, Elastic Container Service), serverless (e.g., Lambda), virtual machines (e.g., EC2), and data stores to support current and future products.
- Work closely with Platform leadership, product engineering, data, and ML teams to design systems that are robust, observable, and compliant in a healthcare environment.
- Define and drive infrastructure strategy for the Platform org—partnering with engineering leadership to align roadmaps, set standards, and sequence work for maximum business impact.
- Secure networking, identity, and access patterns across environments.
- Improve reliability and operational excellence by defining SLOs, SLIs, and error budgets for core platform services.
- Lead and participate in blameless post-incident reviews, and translate learnings into systemic improvements.
- Own observability and monitoring strategy across logging, metrics, and tracing, ensuring we can detect, debug, and prevent issues efficiently.
- Mentor and level up engineers across Platform and product teams—reviewing design docs, guiding architecture decisions, and modeling high standards for reliability, security, and maintainability.
- Partner with security and compliance stakeholders to ensure our infrastructure and operational practices meet HIPAA and other healthcare requirements.
- Advocate for and implement developer experience improvements, such as better CI/CD workflows, faster feedback loops, and tooling that reduces cognitive load for product teams.
- Bring 4+ years of hands-on infrastructure / platform development experience (or equivalent practical experience) in modern, cloud-native environments, with a track record of owning critical systems in production.
- Have deep expertise with AWS (preferred) and/or GCP, including core networking, compute, storage, and managed services.
- Are highly proficient in at least one programming/scripting language used for infrastructure work (Python preferred).
- Have extensive experience building tooling and automation for other engineers.
- Have strong experience with Kubernetes, containers (Docker), and container orchestration, and understand how to operate these systems reliably at scale.
- Are comfortable with Infrastructure as Code (Terraform preferred; Pulumi or similar) and Git-based workflows.
- Possess solid Linux fundamentals and are comfortable debugging issues at the OS, networking, and application layers.
- Have demonstrable experience leading complex, cross-team initiatives from design through rollout—communicating tradeoffs, aligning stakeholders, de-risking launches, and measuring impact.
- Communicate clearly and empathetically with both technical and non-technical partners, and enjoy mentoring engineers at multiple levels.
- Take a data-informed, pragmatic approach to decision-making—balancing ideal architecture with business needs, delivery timelines, and team capacity.
- Experience in regulated environments (e.g., HIPAA) or prior work in healthcare or health tech.
- Background in platform or security engineering, especially around access control, encryption, auditability, and compliance.
- Experience working closely with ML / data teams or with ML platforms (e.g., Airflow, Ray, ML pipelines, model serving stacks).
- Familiarity with observability stacks (CloudWatch, New Relic, Grafana, OpenTelemetry, etc.).
- Experience designing or operating internal developer platforms, SDKs, or reusable frameworks that standardize how services are built and deployed.
- Prior experience at a fast-growing startup where you've helped scale infrastructure, processes, and teams.
- Comprehensive Medical, Dental, Vision & Life insurance
- HSA (with employer match), FSA, & DCFSA
- 401(k)
- 11 Paid Company Holidays
- Location Flexibility
- Flexible PTO policy
- Annual company-wide offsite
- Periodic team offsites
- Annual equipment stipend
- For roles based outside the US, your recruiter can share more details




