Technical Lead - AI Inference

Location: Remote
Compensation: To Be Discussed
Reviewed: Thu, May 07, 2026
This job expires in: 25 days

Job Summary

A company is looking for a Technical Lead - AI Inference.

Key Responsibilities
  • Architect and oversee the deployment of high-throughput, low-latency LLM inference pipelines
  • Mentor and lead a small team of developers, conducting code reviews and sprint planning
  • Implement and evaluate state-of-the-art KV cache management solutions to optimize inference

Required Qualifications
  • Proven experience with KV cache reuse, speculative decoding, and continuous batching
  • Deep familiarity with serving frameworks such as vLLM, LMCache, and NIXL
  • Expertise in backend engineering with Python, C++, or Rust, and knowledge of CUDA and GPU memory management
  • Experience with Kubernetes (K8s) for scaling GPU workloads
