Job Summary
A cloud software company is seeking applicants for a telecommute Senior Principal Site Reliability Engineer position.
Candidates will be responsible for the following:
- Actively developing Service Level Objectives (SLOs) for the platform's most critical services
- Responding to pages generated by automated monitoring and alerting
- Joining an incident response team as a Subject Matter Expert
Skills and requirements include:
- Experience working with complex distributed systems
- Familiarity with how the internet and web applications work
- Comfortable reading and writing code as part of a team in at least one of Ruby, Go, Python, or Erlang
- Experience with AWS services such as EC2, ELB, EKS, and S3
- Experience with cloud computing patterns