Site Reliability Engineer
Location: Remote
Compensation: Salary
Reviewed: Fri, May 29, 2026
This job expires in: 30 days
Job Summary
The full-time, remote Site Reliability Engineer will ensure the reliability and scalability of CloudBlue's multi-tenant SaaS platforms by defining SLIs and SLOs, influencing system architecture for fault tolerance, and reducing operational toil through automation and process improvements.
Key responsibilities
- Define and implement SLIs, SLOs, and error budgets for critical CloudBlue services
- Design and operate CloudBlue's observability stack using tools such as Datadog and Grafana
- Act as a senior responder during production incidents, leading incident coordination and service restoration
Required qualifications
- 3+ years of experience as an SRE, DevOps Engineer, or Production Engineer
- Proven experience operating highly available, enterprise-grade, multi-tenant SaaS platforms
- Hands-on experience with observability and monitoring tools such as Datadog and Grafana
- Solid understanding of Linux, networking, and distributed systems fundamentals
- Experience with containerized environments such as Docker and Kubernetes