Site Reliability Engineer

Location: Remote
Compensation: Salary
Reviewed: Wed, Dec 10, 2025
This job expires in: 17 days

Job Summary

A company is looking for a Site Reliability Engineer to enhance observability and reliability practices within a distributed environment.

Key Responsibilities
  • Own and evolve the observability stack using various monitoring tools and AWS services
  • Design and maintain SLIs, SLOs, and error budgets to improve system reliability
  • Support incident investigations and maintain observability cost efficiency

Required Qualifications
  • Hands-on experience with production observability systems like Prometheus and Grafana
  • Experience with Thanos or large-scale metrics systems
  • Strong understanding of SLIs, SLOs, and incident response workflows
  • Solid experience with Kubernetes and Infrastructure as Code (Terraform preferred)
  • Proficiency in scripting or programming (Go, Python, or Bash)
