Job Summary
A software company is seeking a Telecommute Principal Service Reliability Engineer.
Candidates will be responsible for the following:
- Developing automation to automatically correct or completely prevent issues in the company's online solution
- Identifying single points of failure and other high-risk architecture issues
- Proposing and implementing more resilient solutions
Qualifications for this position include:
- 10+ years of experience managing Linux servers
- 5+ years of experience with enterprise systems monitoring
- 5+ years of experience with enterprise configuration management tools
- 5+ years of experience programming in at least one object-oriented language
- 3+ years of experience delivering hosted services
- 1+ years of experience with Kubernetes and Docker-based containers
- Solid understanding of standard TCP/IP networking and common protocols