The remote Site Reliability Engineer will focus on monitoring, observability, and alerting: designing, implementing, and maintaining monitoring solutions that keep infrastructure and applications reliable and performant while reducing incident response times.

Key responsibilities

Design, implement, and maintain monitoring solutions using tools like Prometheus, Grafana, and Datadog
Develop actionable alerting strategies and establish runbooks for efficient incident management
Analyze system performance metrics to identify bottlenecks and implement optimizations for efficiency and scalability

Required qualifications

Must be based in Latin America
Proven experience as a Site Reliability Engineer or in a similar role
Proficiency in logging, metrics, and tracing frameworks (e.g., Datadog, Prometheus)
Experience with cloud platforms (preferably Azure) and infrastructure-as-code tools (e.g., Terraform)
Strong programming and scripting skills in Python and Bash
