Site Reliability Engineer
Location: Remote
Compensation: To Be Discussed
Reviewed: Fri, May 29, 2026
This job expires in: 30 days
Job Summary
The remote Site Reliability Engineer will focus on monitoring, observability, and alerting: designing, implementing, and maintaining monitoring solutions that keep infrastructure and applications reliable and performant while reducing incident response times.
Key Responsibilities
- Design, implement, and maintain monitoring solutions using tools like Prometheus, Grafana, and Datadog
- Develop actionable alerting strategies and establish runbooks for efficient incident management
- Analyze system performance metrics to identify bottlenecks and implement optimizations for efficiency and scalability
Required Qualifications
- Must be based in Latin America
- Proven experience as a Site Reliability Engineer or in a similar role
- Proficiency in logging, metrics, and tracing frameworks (e.g., Datadog, Prometheus)
- Experience with cloud platforms (preferably Azure) and infrastructure-as-code tools (e.g., Terraform)
- Strong programming and scripting skills in Python and Bash