Job Summary
A company that offers online and mobile food ordering platforms is seeking a Remote Principal Database Site Reliability Engineer.
Core responsibilities of this position include:
- Building a system for providing self-service management of caching clusters
- Raising the top-line reliability of Grubhub by building solutions to make stateful data stores more reliable under heavy load
- Building complex automation systems to allow for large clusters to self-heal in cases of hardware failures
Qualifications for this position include:
- Experience as a Principal-level Site Reliability Engineer working on extremely high throughput systems
- Understanding of how to scale a distributed system, including failure modes and monitoring
- Experience writing code that automates complex operations. We want someone who can write object-oriented, testable code, not just scripts
- Knowledge of the Linux kernel and familiarity with tools to debug performance at a system-call level
- Experience tuning the JVM. You should understand the various garbage collectors and how memory is managed
- Experience with a NoSQL database