Principal Site Reliability Engineer
Job is Expired
Location: Remote
Compensation: Salary
Reviewed: Wed, Jul 30, 2025
Job Summary
A company is looking for a Principal Site Reliability Engineer, AI Infrastructure.
Key Responsibilities
- Architect and scale globally distributed production systems for AI/ML and HPC across hybrid and multi-cloud environments
- Design and implement automation frameworks to enhance system resilience and operational efficiency
- Lead initiatives to assess operational maturity and establish long-term reliability strategies in collaboration with various teams
Required Qualifications
- 15+ years of experience in SRE, Production Engineering, or Cloud Infrastructure
- Deep expertise in Linux/Unix systems and public/private cloud platforms (AWS, GCP, Azure, OCI)
- Expert-level programming skills in Python and familiarity with languages such as C++, Go, or Rust
- Experience with Kubernetes, microservice orchestration, and observability frameworks
- Degree in Computer Science or related field, or equivalent experience