Senior Reliability Engineer

Location: Remote
Compensation: Salary
Reviewed: Mon, Jun 22, 2026
This job expires in: 26 days

Job Summary

Passionate about building world-class reliability systems, the full-time Senior Reliability Engineer will develop and implement an organization-wide reliability strategy for DGX Cloud, focusing on operational excellence and incident response in a 24/7 environment.

Key responsibilities
  • Build and guide the organization-wide reliability strategy, enhancing operational practices
  • Establish and maintain a rigorous SLO program, ensuring high standards across teams
  • Lead incident response for high-severity incidents, promoting effective resolution

Required qualifications
  • 10+ years of industry experience with a Bachelor's or Master's degree, or equivalent experience
  • Deep, hands-on experience with large-scale production systems
  • Strong software engineering skills in Go, Python, or similar languages
  • Proven experience in establishing and maintaining an SLO program
  • Practical experience in reliability fields such as chaos engineering and failure injection
