Job Summary
A business services company is seeking a Remote Senior Site Reliability Engineer.
Core Responsibilities of this position include:
- Defining standard practices around monitoring, incidents, blameless postmortems, releases and other maintenance activities
- Driving and owning the measurement of SLIs/SLOs and ensuring the team meets set goals for availability and SLAs
- Handling cross-team performance issues end to end: identifying the cause, determining areas for improvement and driving those actions to closure
Position Requirements Include:
- Bachelor’s degree in Computer Science, Computer Engineering or similar discipline
- Strong technical knowledge of cloud infrastructure, distributed systems and reliability practices
- Hands-on experience with tools such as ELK (Elasticsearch, Logstash, Kibana), Grafana, CloudWatch, Jenkins, Jira, etc.
- Strong troubleshooting/problem-solving skills with the ability to make swift, informed judgment calls
- Outstanding written and verbal communication skills with demonstrated ability to communicate effectively with all levels of an organization
- Knowledge of chaos engineering and self-healing automation frameworks