Site Reliability Engineer
Location: Remote
Compensation: Salary
Reviewed: Fri, Jun 05, 2026
This job expires in: 30 days
Job Summary
The full-time Site Reliability Engineer (Incident Manager) owns the reliability and operational health of the live production environment: managing incidents from detection to resolution, leading post-mortems, and driving platform improvements. The role can be fully remote or hybrid in Brisbane.
Key responsibilities
- Monitor the live production environment to proactively identify potential issues or anomalies
- Serve as incident commander during outages, managing communications and coordinating cross-functional responses
- Lead post-mortems and root cause analyses for significant incidents, ensuring action items are addressed and shared with the team
Required qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience
- 3-5 years of experience as a Livesite Engineer, Site Reliability Engineer, Incident Manager, or in a comparable production operations role
- Solid knowledge of Linux/Unix systems, networking concepts, and web technologies
- Proficiency in scripting languages (Python, Bash) for automation and tooling
- Hands-on experience with monitoring and alerting tools (e.g., Splunk, Datadog, Prometheus, Grafana)