Job Summary
A company is looking for a Site Reliability Engineer to design, implement, and maintain reliable and efficient platforms.
Key Responsibilities:
- Lead the design, implementation, and management of highly available and scalable systems
- Collaborate with cross-functional teams to identify performance bottlenecks, troubleshoot complex issues, and optimize system performance
- Drive automation initiatives to streamline deployment, configuration management, and infrastructure provisioning processes
Required Qualifications:
- Minimum of 10 years of professional experience in a Site Reliability Engineering role or similar capacity
- Strong experience with cloud technologies (e.g., AWS, Azure, GCP) and infrastructure as code (e.g., Terraform, Ansible)
- Proficiency in programming and scripting languages (e.g., Python, Go, Bash) and robotic process automation (RPA) platforms (e.g., Blue Prism, UiPath) to automate tasks and develop tools
- Deep understanding of containerization and orchestration technologies (e.g., Kubernetes, Docker)
- Expertise in implementing and managing monitoring and logging solutions (e.g., Zabbix, Nagios, Prometheus, ELK stack)