Senior ML Engineer

Location: Remote

Compensation: Salary

Reviewed: Mon, May 25, 2026

This job expires in: 30 days

Job Category: Research

Employer Type: Employer

Career Level: Experienced, Senior Level

Job Summary

The full-time Senior ML Engineer will lead the technical direction of inference optimization for LLMs, focusing on increasing throughput, reducing latency, and optimizing KV cache utilization. This is a remote, high-autonomy role.

Key responsibilities

Increase throughput through continuous batching, speculative decoding, and kernel-level tuning across various LLM frameworks
Cut latency by profiling and addressing bottlenecks related to compute, memory bandwidth, and scheduling
Optimize KV cache usage with techniques such as paged attention and KV cache quantization to improve throughput

Required qualifications

5+ years of experience building production ML systems, particularly in inference or training infrastructure
Strong proficiency in Python for production services
Hands-on experience with vLLM, SGLang, or TensorRT-LLM, and an understanding of inference engine performance
Fluency in quantization tradeoffs and practical experience measuring quality regressions
Comfort with distributed systems and their failure modes in multi-GPU and multi-node setups
