Focused on ensuring the reliability and scalability of CloudBlue's multi-tenant SaaS platforms, the full-time remote Site Reliability Engineer will define SLIs and SLOs, influence system architecture for fault tolerance, and reduce operational toil through automation and process improvements.

Key responsibilities

Define and implement SLIs, SLOs, and error budgets for critical CloudBlue services
Design and operate CloudBlue's observability stack using tools like Datadog and Grafana
Act as a senior responder during production incidents, leading incident coordination and service restoration

Required qualifications

3+ years of experience as an SRE, DevOps Engineer, or Production Engineer
Proven experience operating highly available, enterprise-grade, multi-tenant SaaS platforms
Hands-on experience with observability and monitoring tools such as Datadog and Grafana
Solid understanding of Linux, networking, and distributed systems fundamentals
Experience with containerized environments such as Docker and Kubernetes
