The full-time Staff Site Reliability Engineer will lead global platform reliability, drive the observability strategy on Google Cloud Platform (GCP), manage complex networking infrastructure, and optimize high-throughput data environments, working remotely from anywhere in the United States or Canada.

Key Responsibilities:

Architect, optimize, and troubleshoot complex networking infrastructure across all OSI layers
Design and scale the unified observability platform using the Grafana Labs suite
Deploy machine learning models for automated anomaly detection and intelligent alerting

Required Qualifications:

8+ years of experience in SRE, Production Engineering, or Distributed Systems infrastructure roles
Expertise in Google Kubernetes Engine (GKE), containerization, and container orchestration
Proven experience managing high-throughput Apache Kafka pipelines and large-scale data environments
Hands-on experience with the Grafana ecosystem, including Grafana Enterprise/Cloud and Prometheus
Advanced proficiency in Go and Python for custom infrastructure tooling and data integration
