Senior Site Reliability Engineer
Location: Remote
Compensation: To Be Discussed
Reviewed: Thu, Jun 18, 2026
This job expires in: 30 days
Job Summary
In this full-time, remote position, the Senior Site Reliability Engineer will manage the reliability, scalability, and performance of mission-critical services, drive operational excellence and automation, and serve as a Datadog expert.
Key responsibilities
- Design, implement, and maintain highly available and resilient systems to enhance customer experience
- Define and enforce best practices for monitoring and alerting using Datadog across the AWS environment
- Develop automation tools and software to streamline operational tasks and improve system reliability
- Participate in incident management and post-mortems
Required qualifications
- Demonstrated experience in SRE, Production Engineering, or Platform Engineering roles managing production systems at scale
- Proficiency with Kubernetes, AWS, and infrastructure automation tools like Terraform
- Experience defining and using Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for reliability decisions
- Strong programming or scripting skills in languages such as Python, Go, or Bash for building automation and tooling
- Ability to lead post-mortems and manage complex situations during high-severity incidents