Senior Incident Manager

Location: Remote
Compensation: Salary
Reviewed: Wed, Jun 03, 2026

Job Summary

The full-time Senior Incident Manager will lead critical incident response across AI data center infrastructure, coordinating rapid resolution of service-impacting events, improving operational resilience, and driving incident management best practices in a remote environment.

Key responsibilities
  • Lead the response to critical incidents impacting AI infrastructure and GPU clusters, serving as the Incident Commander during major outages
  • Own the incident response lifecycle, ensuring timely communication and maintaining incident documentation and operational playbooks
  • Conduct post-incident reviews and root cause analysis to identify reliability gaps and implement corrective actions

Required qualifications
  • 8+ years of experience in incident management, site reliability engineering, or infrastructure operations
  • Strong understanding of data center operations, GPU compute clusters, and cloud infrastructure platforms
  • Proven ability to lead high-pressure incident response situations
  • Experience with incident management frameworks (ITIL, SRE, or equivalent) and incident tracking tools such as PagerDuty and ServiceNow
  • Excellent communication and stakeholder management skills