Staff Machine Learning Engineer

Location: Remote
Compensation: Salary
Reviewed: Fri, Jun 12, 2026
This job expires in: 22 days

Job Summary

The full-time Staff Machine Learning Systems Engineer owns the infrastructure that powers AI, designing, building, and operating production systems for AI workloads. The role is remote and focuses on Kubernetes, CI/CD pipelines, and observability.

Key responsibilities
  • Own and scale the AI compute and deployment platform, including Kubernetes operations and GitOps-based deployment pipelines
  • Build and maintain inference and model-serving infrastructure, ensuring reliable serving patterns for LLM-powered workflows
  • Manage observability and tracing systems, defining SLOs and incident response for AI infrastructure reliability

Required qualifications
  • 8+ years of experience in infrastructure, platform, DevOps, or SRE engineering, with at least 3 years focused on ML/AI systems in production
  • Deep experience with Kubernetes and cloud-native ecosystem tools, including autoscaling and GitOps
  • Strong infrastructure-as-code skills, particularly with Terraform, and experience in secure cloud architecture design
  • Proficiency in Python with experience in building production infrastructure tooling and observability pipelines
  • Experience operating LLM-based systems in production, including inference routing and reliability patterns
