Staff Software Engineer, GPU Infrastructure

Location: Remote

Compensation: To Be Discussed

Reviewed: Thu, Jan 15, 2026

This job expires in: 30 days

Job Category: Information Technology

Weekly Hours: Full Time

Employer Type: Employer

Career Level: Entry Level

Job Summary

A company is looking for a Staff Software Engineer, GPU Infrastructure (HPC).

Key Responsibilities

Build and scale ML-optimized HPC infrastructure by deploying and managing Kubernetes-based GPU/TPU superclusters across multiple clouds
Optimize AI/ML training infrastructure for cost efficiency, reliability, and performance in collaboration with cloud providers
Troubleshoot and resolve complex infrastructure issues to ensure minimal disruption to AI/ML workflows

Required Qualifications

Deep expertise in ML/HPC infrastructure, including experience with GPU/TPU clusters and distributed training frameworks
Proven ability to deploy and manage cloud-native Kubernetes clusters for AI workloads
Proficiency in Python and Go, with a preference for open-source contributions
Familiarity with Linux internals, RDMA networking, and performance optimization for ML workloads
Experience collaborating with AI researchers or ML engineers to address infrastructure challenges

COMPLETE JOB DESCRIPTION

The job description is available to subscribers. Subscribe today to get the full benefits of a premium membership with Virtual Vocations. We offer the largest remote database online...

Apply

Company Overview

Company Company Name

Headquarters Headquarters

Founded Founded

Website

Wikipedia Wikipedia URL

The company description is available to subscribers. Subscribe today to get the full benefits of a premium membership with Virtual Vocations. We offer the largest remote database online...