The full-time Senior Site Reliability Engineer will ensure the reliability of production systems by monitoring mission-critical services, designing scalable distributed systems, and implementing incident management frameworks. The role offers opportunities for remote work.

Key responsibilities

Monitor and maintain mission-critical production services to ensure maximum uptime
Design and implement scalable distributed systems to facilitate the development of self-driving vehicles
Participate in an on-call rotation to uphold the SLOs and SLAs of production services

Required qualifications

Expertise in at least one scripting language (e.g. Bash, Python)
Fundamental understanding of Linux operating system internals, TCP/IP networking, and storage subsystems
Experience scaling and securing services in the cloud (AWS, GCP) or in cloud-native environments
Experience using infrastructure-as-code principles to automate the creation of infrastructure resources (e.g. Terraform, CloudFormation)
Strong experience implementing and debugging cloud-native and open-source tools such as Kubernetes, etcd, Prometheus, OpenTelemetry, and Istio
