Principal Software Engineer
Location: Remote
Compensation: Salary
Reviewed: Mon, May 18, 2026
This job expires in: 29 days
Job Summary
The full-time Principal Software Engineer will shape the technical direction of production engineering by defining strategies for large-scale GPU cluster operations, with a focus on automation and reliability in both cloud and on-prem environments.
Key responsibilities
- Define and execute the technical strategy for DGX Cloud cluster operations, emphasizing automation and reliability
- Lead the design and implementation of systems for cluster lifecycle management, validation, and observability
- Mentor engineers and influence cross-functional teams on platform, infrastructure, and operational standards
Required qualifications
- 15+ years of experience building and operating large-scale distributed systems or cloud infrastructure
- Deep expertise in Kubernetes, Linux, infrastructure automation, and production operations
- Strong programming skills in Go, Python, or similar languages
- Proven ability to lead complex cross-organizational technical initiatives
- BS/MS in Computer Science or equivalent experience