Senior Site Reliability Engineer

Location: Remote
Compensation: Salary
Reviewed: Mon, Mar 30, 2026
This job expires in: 18 days

Job Summary

A company is looking for a Senior Site Reliability Engineer.

Key Responsibilities
  • Build and maintain observability for AI workloads, including telemetry, dashboards, alerts, and SLO/SLI tracking
  • Write automation and tooling to reduce operational toil and improve deployment safety
  • Collaborate with product engineering teams to enhance reliability and ensure operational readiness for product releases

Required Qualifications
  • 5+ years of experience in SRE, infrastructure engineering, or platform engineering with large-scale distributed systems
  • Extensive experience with Kubernetes and containerization at scale
  • Experience defining SLOs and using observability tools such as Prometheus and Grafana
  • Coding ability in Python or Go for automation and tooling, with experience in CI/CD pipelines
  • Interest in, or experience with, AI/ML infrastructure, model serving, or GPU workloads
