Principal Developer, AI Networking
Location: Remote
Compensation: Salary
Reviewed: Fri, Jun 12, 2026
This job expires in: 26 days
Job Summary
The full-time Principal Developer, AI Networking will profile, analyze, and optimize AI workloads on large-scale GPU and CPU clusters. Working remotely or onsite, the developer will improve distributed deep learning LLM training and inference, with an emphasis on networking and performance analysis.
Key responsibilities
- Characterizing AI workloads and deep learning models for large-scale LLM training and inference on NVIDIA supercomputers
- Benchmarking, profiling, and analyzing performance to identify bottlenecks and optimization opportunities, particularly in networking
- Developing tools for PyTorch trace-based profiling and collaborating with cross-functional teams to provide performance analysis insights
Required qualifications
- B.Sc. in Computer Science, Software Engineering, or equivalent experience
- 15+ years of experience with high-performance networking (RDMA, MPI, NCCL, SHARP)
- Demonstrated ability in performance evaluation techniques and approaches
- Experience with NVIDIA GPUs, CUDA, and deep learning frameworks such as TensorFlow or PyTorch
- Proficiency in programming languages: Python, Bash, and C++