Senior Site Reliability Engineer

Location: Remote
Compensation: To Be Discussed
Reviewed: Wed, Mar 11, 2026

Job Summary

A company is looking for a Senior Site Reliability Engineer.

Key Responsibilities
  • Own fleet reliability and lead the strategy for SaaS infrastructure, including defining SLOs and capacity planning
  • Design and evolve infrastructure on GCP and AWS, focusing on non-deterministic AI workloads
  • Drive operational excellence by evolving incident management practices and leveraging AI for root cause analysis

Required Qualifications
  • 5+ years of experience operating cloud infrastructure (GCP and/or AWS) with Terraform and Kubernetes
  • Experience or strong interest in operating LLM-based systems or agentic workloads
  • Understanding of distributed systems principles and their application in infrastructure decisions
  • Proficiency in at least one modern programming language (TypeScript, Java, Go, or Python)
  • Ability to communicate complex infrastructure trade-offs to technical and non-technical stakeholders
