LLM Inference Kernel Engineer

Location: Remote
Compensation: To Be Discussed
Reviewed: Thu, Apr 02, 2026
This job expires in: 30 days

Job Summary

A company is looking for an LLM Inference Kernel Engineer (MLA).

Key Responsibilities
  • Design and implement high-performance GPU kernels for large language model inference workloads
  • Optimize CUDA kernels focusing on memory efficiency, execution speed, and latency reduction
  • Collaborate on integrating optimized kernels into modern inference serving frameworks
Required Qualifications
  • Strong experience developing GPU kernels using CUDA C or C++ in performance-critical environments
  • Hands-on experience optimizing inference workloads for large language models
  • Solid understanding of attention mechanisms and their optimized implementations
  • Deep knowledge of GPU architecture, including memory hierarchy and latency tradeoffs
  • Ability to operate in a fast-paced, highly iterative environment with minimal oversight
