The full-time Site Reliability Engineer (Incident Manager) owns the reliability and operational health of the live production environment: managing incidents from detection to resolution, leading post-mortems, and driving platform improvements. The role can be fully remote or hybrid in Brisbane.

Key responsibilities

Monitor the live production environment to proactively identify potential issues or anomalies
Serve as incident commander during outages, managing communications and coordinating cross-functional responses
Lead post-mortems and root cause analyses for significant incidents, ensuring action items are addressed and findings are shared with the team

Required qualifications

Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience
3-5 years of experience as a Livesite Engineer, Site Reliability Engineer, Incident Manager, or in a comparable production operations role
Solid knowledge of Linux/Unix systems, networking concepts, and web technologies
Proficiency in scripting languages (Python, Bash) for automation and tooling
Hands-on experience with monitoring and alerting tools (e.g., Splunk, Datadog, Prometheus, Grafana)

