Senior ML Engineer

Location: Remote

Compensation: Salary

Reviewed: Mon, May 25, 2026

This job expires in: 30 days

Job Category: Research

Employer Type: Employer

Career Level: Experienced, Senior Level

Job Summary

The full-time, remote Senior ML Engineer will lead the technical direction of LLM inference optimization at Kimchi, focusing on throughput, latency, and cache utilization.

Key responsibilities

Push throughput by implementing continuous batching, speculative decoding, and kernel-level tuning across various inference engines
Cut latency by profiling and identifying bottlenecks in compute, memory bandwidth, scheduling, and networking
Quantize models without quality regression, measuring performance on real workloads and optimizing memory footprint

Required qualifications

5+ years of experience building real ML systems, particularly in inference or training infrastructure
Strong proficiency in Python for production services
Hands-on experience with vLLM, SGLang, or TensorRT-LLM, with an understanding of inference engine performance
Fluency with quantization tradeoffs and experience measuring quality regressions
Comfort with distributed systems and practical failure modes of multi-GPU and multi-node setups

