Senior HPC Infrastructure Engineer

Job is Expired
Location: Remote
Compensation: Salary
Reviewed: Sat, Aug 30, 2025

Job Summary

A company is looking for a Senior GPU and HPC Infrastructure Engineer - DGX Cloud.

Key Responsibilities
  • Contribute to the automation of datacenter operations, break/fix, and lifecycle management for large-scale Machine Learning systems
  • Implement monitoring and health management capabilities for GPU assets to ensure reliability and scalability
  • Build automated test infrastructure for qualifying distributed systems and ensure software integration across engineering teams

Required Qualifications
  • 5+ years of software engineering experience on large-scale production systems
  • BS in Computer Science, Engineering, Physics, Mathematics, or equivalent experience
  • Expert knowledge of a systems programming language (Go, Python) and Linux system administration
  • Understanding of cluster management systems (Kubernetes, SLURM) and complex distributed systems
  • Familiarity with performance, security, and reliability in distributed systems
