Principal Site Reliability Engineer

Location: Remote
Compensation: Salary
Reviewed: Fri, May 29, 2026
This job expires in: 30 days

Job Summary

This full-time hybrid or remote role seeks a Principal Site Reliability Engineer to design and implement scalable infrastructure across multiple cloud environments while driving an "automation-first" culture that improves system reliability and observability.

Key responsibilities
  • Design and implement highly available, scalable infrastructure across AWS, Azure, GCP, and bare-metal environments
  • Drive an "automation-first" culture by writing code (Python/Go) to eliminate manual toil and build self-healing systems
  • Act as a lead Incident Commander, developing response playbooks and conducting deep-dive post-incident analyses

Required qualifications
  • 10+ years of experience managing reliability, scalability, and availability for large-scale production services
  • Deep expertise in programming (e.g., Python, Go, or C/C++)
  • Strong background in networking protocols, Linux/FreeBSD systems, and distributed architecture
  • Experience in high-stakes incident management and participation in a 24/7 on-call rotation
  • Proficiency in applying ITIL frameworks and incident data to improve service maturity
