Senior Software Engineer
Location: Remote
Compensation: Salary
Reviewed: Thu, Jun 04, 2026
This job expires in: 30 days
Job Summary
The full-time Senior Software Engineer will lead the optimization and benchmarking of distributed training and inference workloads, manage large-scale AI clusters, and ensure efficient performance across NVIDIA GPU platforms. This is a remote position.
Key responsibilities
- Lead the bring-up, validation, and debugging of large-scale AI clusters and end-to-end workloads
- Profile and optimize workload performance using tools such as Nsight Systems and NCCL tests
- Conduct root-cause analysis of failures and build resilience and failure-attribution capabilities for large clusters
Required qualifications
- Bachelor's or Master's degree in Computer Science or a related technical field (or equivalent experience)
- 8+ years of experience in software infrastructure for large-scale AI or HPC systems
- Expertise in debugging and triaging AI applications across the full stack
- Deep hands-on experience with NCCL and CUDA-aware distributed execution
- Proficiency in Python and C/C++ programming