Job Summary
An IT company is seeking a remote Senior Site Reliability Engineer for its Machine Learning / Artificial Intelligence Platform.
Candidates will be responsible for the following:
- Designing, building, and maintaining AI/ML infrastructure that supports every stage of the ML workflow
- Influencing architectural decisions with a focus on security, scalability, and high performance
- Fostering sound infrastructure engineering principles and representing engineering values
Applicants must meet the following qualifications:
- BS or MS in Computer Science or a related technical field, or an equivalent combination of graduate degree and work experience
- 5+ years of work experience in site reliability, particularly with cloud providers or large-scale systems
- 5+ years of experience scripting or coding in Python or Bash
- Experience in designing, deploying, and securing cloud-based AI/ML infrastructure
- Experience with containerization technologies, orchestration platforms, and CI/CD frameworks
- Experience in writing code to deploy and automate infrastructure