Job Summary
A data and analytics company has an open position for a Remote Principal Machine Learning Operations Engineer.
Core Responsibilities Include:
- Contributing to a set of patterns and best practices for deploying cloud-based infrastructure
- Supporting services before they are Generally Available
- Defining and managing SLIs, SLOs, and SLAs for services
Must meet the following requirements for consideration:
- Experience operating high-availability, fault-tolerant, scalable, distributed software/infrastructure
- Strong programming background in Scala (Java), Go, or Python
- Deep experience with existing MLOps frameworks (Databricks, Seldon, SageMaker, DVC, etc.)
- Comfortable operating big data infrastructure, ideally with substantial hands-on experience
- Solid understanding of traffic management and networking concepts
- Experience with stream-processing systems (ksqlDB, Spark Streaming, Apache Beam/Flink, etc.)