Senior Site Reliability Engineer
Location: Remote
Compensation: To Be Discussed
Reviewed: Wed, Mar 11, 2026
Job Summary
A company is looking for a Senior Site Reliability Engineer.
Key Responsibilities
- Own fleet reliability and lead the strategy for SaaS infrastructure, including defining SLOs and capacity planning
- Design and evolve infrastructure on GCP and AWS, focusing on non-deterministic AI workloads
- Drive operational excellence by evolving incident management practices and leveraging AI for root cause analysis
Required Qualifications
- 5+ years of experience operating cloud infrastructure (GCP and/or AWS) with Terraform and Kubernetes
- Experience or strong interest in operating LLM-based systems or agentic workloads
- Understanding of distributed systems principles and their application in infrastructure decisions
- Proficiency in at least one modern programming language (TypeScript, Java, Go, or Python)
- Ability to communicate complex infrastructure trade-offs to technical and non-technical stakeholders