Software Engineer for AI Infrastructure
Location: Remote
Compensation: Salary
Reviewed: Thu, Jun 04, 2026
This job expires in: 28 days
Job summary
The full-time Software Engineer for AI Infrastructure will bring up, validate, and debug large-scale AI clusters, with a focus on benchmarking and optimizing distributed training and inference workloads. The role can be performed remotely or onsite in various locations.
Key responsibilities
- Bring up, validate, and debug large-scale AI clusters and end-to-end workloads
- Benchmark AI pre-training, post-training, and inference workloads using NVIDIA AI software stacks
- Perform root-cause analysis of failures and contribute to failure-attribution tooling across the cluster
Required qualifications
- Bachelor's or Master's in Computer Science or a related technical field (or equivalent experience)
- 3+ years of experience developing software for AI, HPC, or systems-level applications
- Hands-on experience with multi-GPU or multi-node workloads and CUDA-aware distributed execution
- Experience debugging and scaling distributed systems
- Strong programming skills in Python and C/C++