Senior HPC Infrastructure Engineer
Job is Expired
Location: Remote
Compensation: Salary
Reviewed: Sat, Aug 30, 2025
Job Summary
A company is seeking a Senior GPU and HPC Infrastructure Engineer for DGX Cloud.
Key Responsibilities
- Contribute to automating datacenter operations, break/fix, and lifecycle management for large-scale machine learning systems
- Implement monitoring and health management capabilities for GPU assets to ensure reliability and scalability
- Build automated test infrastructure for qualifying distributed systems and ensure software integration across engineering teams
Required Qualifications
- 5+ years of software engineering experience on large-scale production systems
- BS in Computer Science, Engineering, Physics, Mathematics, or equivalent experience
- Expert knowledge of a programming language used for systems work (e.g., Go or Python) and of Linux system administration
- Understanding of cluster management systems (Kubernetes, SLURM) and complex distributed systems
- Familiarity with performance, security, and reliability in distributed systems