Senior Site Reliability Engineer

Job is Expired
Location: Remote
Compensation: Salary
Reviewed: Mon, Feb 23, 2026

Job Summary

A company is looking for a Senior Site Reliability Engineer, AI Factory.

Key Responsibilities
  • Run commissioning and provisioning for GPU systems and manage firmware versions
  • Monitor hardware state, identify bottlenecks, and ensure peak performance
  • Develop operations strategies and maintain consistency with SLAs across infrastructure

Required Qualifications
  • BS or MS degree in Computer Engineering/Science or related field, with 10+ years of relevant experience
  • Experience managing GPU fleets and improving data center operations
  • Expertise in BMS and power management, and in configuration management solutions
  • Experience with data center inventory management systems and developing QCOW2 images
  • Proven track record of collaboration with multiple teams to achieve operational excellence
