Principal Site Reliability Engineer

Location: Remote
Compensation: Salary
Reviewed: Tue, Jun 30, 2026
This job expires in: 27 days

Job Summary

Seeking a Principal Site Reliability Engineer, either hybrid in San Jose, CA, or fully remote, to provide technical vision and hands-on execution that improve the reliability of a global platform, with a focus on automation and observability across multi-cloud infrastructure.

Key responsibilities
  • Design and implement highly available, scalable infrastructure across AWS, Azure, GCP, and bare-metal environments
  • Drive an "automation-first" culture by writing code in Python/Go to eliminate manual toil and build self-healing systems
  • Act as a lead Incident Commander, developing response playbooks and conducting deep-dive post-incident analyses

Required qualifications
  • 10+ years of experience managing reliability, scalability, and availability for large-scale production services
  • Foundational understanding of AI/ML technologies and experience leveraging AI-driven solutions
  • Deep expertise in programming languages such as Python, Go, or C/C++
  • Strong background in networking protocols, Linux/FreeBSD systems, and distributed architecture
  • Experience with ITIL frameworks and with working from incident data during high-stakes incident management