Senior Reliability Engineer
Location: Remote
Compensation: Salary
Reviewed: Mon, Jun 22, 2026
This job expires in: 26 days
Job Summary
The full-time Senior Reliability Engineer is passionate about building world-class reliability systems and will develop and implement an organization-wide reliability strategy for DGX Cloud, focusing on operational excellence and incident response in a 24/7 environment.
Key Responsibilities
- Build and guide the organization-wide reliability strategy, enhancing operational practices
- Establish and maintain a rigorous SLO program, ensuring high standards across teams
- Lead incident response for high-severity incidents, promoting effective resolution
Required Qualifications
- 10+ years of industry experience with a Bachelor's or Master's degree, or equivalent experience
- Deep, hands-on experience with large-scale production systems
- Strong software engineering skills in Go, Python, or similar languages
- Proven experience in establishing and maintaining an SLO program
- Practical experience in reliability fields such as chaos engineering and failure injection