Platform Support Engineer
Location: Remote
Compensation: To Be Discussed
Reviewed: Fri, May 15, 2026
This job expires in: 30 days
Job Summary
This remote, full-time Platform Support Engineer role supports ML engineers running large-scale training and inference workloads across cloud infrastructure, Kubernetes, and GPU platforms. The work centers on diagnosing failures and improving reliability.
Key Responsibilities
- Partner with customer engineering teams to resolve complex distributed systems and ML infrastructure issues
- Investigate failures involving distributed training, Kubernetes orchestration, and GPU allocation
- Identify patterns in customer issues to drive long-term reliability and operational improvements
Required Qualifications
- Strong software engineering and systems troubleshooting background
- Experience with Kubernetes and containerized environments
- Hands-on experience operating machine learning workloads in production or research environments
- Familiarity with GPU infrastructure and orchestration
- Experience with observability and debugging tools such as Prometheus or Grafana