Senior Site Reliability Engineer
Location: Remote
Compensation: Salary
Reviewed: Mon, Mar 30, 2026
This job expires in: 18 days
Job Summary
A company is looking for a Senior Site Reliability Engineer.
Key Responsibilities
- Build and maintain observability for AI workloads, including telemetry, dashboards, alerts, and SLO/SLI tracking
- Write automation and tooling to reduce operational toil and improve deployment safety
- Collaborate with product engineering teams to enhance reliability and ensure operational readiness for product releases
Required Qualifications
- 5+ years of experience in SRE, infrastructure engineering, or platform engineering with large-scale distributed systems
- Extensive experience with Kubernetes and containerization at scale
- Experience defining SLOs and using observability tools such as Prometheus and Grafana
- Coding ability in Python or Go for automation and tooling, with experience in CI/CD pipelines
- Interest in or experience with AI/ML infrastructure, model serving, or GPU workloads