Site Reliability Engineer
Location: Remote
Compensation: To Be Discussed
Reviewed: Fri, Jun 19, 2026
This job expires in: 29 days
Job summary
We are seeking a Specialist-level Site Reliability Engineer for a full-time remote role. The position leads observability and incident response efforts, defines instrumentation standards, and mentors engineers across teams.
Key responsibilities
- Own the technical direction of the observability stack, defining instrumentation standards for Java and Node.js services
- Establish meaningful SLIs, SLOs, and error budgets, partnering with engineering and product teams to drive engineering decisions
- Lead major incident response as a senior incident commander and conduct blameless postmortems with actionable follow-through
Required qualifications
- 8+ years in SRE, infrastructure, or platform engineering, with experience at Specialist or Principal level in large-scale production systems
- Deep production experience with Kubernetes (preferably GKE) and a strong observability background, including OpenTelemetry and centralized logging
- Hands-on experience operating stateful services in production, including PostgreSQL, MongoDB Atlas, Redis, or RabbitMQ
- Proven track record leading incident response and SLO programs that influenced engineering behavior
- Strong communication skills in both English and Portuguese, with the ability to collaborate across remote-first teams