LLM Inference Kernel Engineer
Location: Remote
Compensation: To Be Discussed
Reviewed: Thu, Apr 02, 2026
Job Summary
A company is looking for an LLM Inference Kernel Engineer (MLA).
Key Responsibilities
- Design and implement high-performance GPU kernels for large language model inference workloads
- Optimize CUDA kernels focusing on memory efficiency, execution speed, and latency reduction
- Collaborate on integrating optimized kernels into modern inference serving frameworks
Required Qualifications
- Strong experience developing GPU kernels using CUDA C or C++ in performance-critical environments
- Hands-on experience optimizing inference workloads for large language models
- Solid understanding of attention mechanisms and their high-performance implementations
- Deep knowledge of GPU architecture, including memory hierarchy and latency tradeoffs
- Ability to operate in a fast-paced, highly iterative environment with minimal oversight
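To give a concrete sense of the kernel work described above, here is a minimal, hypothetical sketch of one common building block in attention-style inference kernels: a row-wise max reduction using coalesced global loads and a shared-memory tree reduction. It is illustrative only; the kernel name, launch configuration, and problem shape are assumptions, not part of the listing.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Hypothetical example: per-row max over a [rows x cols] matrix, a typical
// first step of a numerically stable softmax in attention kernels.
// One thread block per row; partial maxima are combined in shared memory.
__global__ void row_max(const float* __restrict__ x,
                        float* __restrict__ out,
                        int cols) {
    extern __shared__ float smem[];
    int row = blockIdx.x;
    float m = -INFINITY;
    // Strided loop: consecutive threads touch consecutive addresses,
    // so global-memory reads are coalesced.
    for (int c = threadIdx.x; c < cols; c += blockDim.x)
        m = fmaxf(m, x[row * cols + c]);
    smem[threadIdx.x] = m;
    __syncthreads();
    // Tree reduction in shared memory (blockDim.x assumed a power of two).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            smem[threadIdx.x] = fmaxf(smem[threadIdx.x], smem[threadIdx.x + s]);
        __syncthreads();
    }
    if (threadIdx.x == 0) out[row] = smem[0];
}

// Example launch (assumed shapes):
//   row_max<<<rows, 256, 256 * sizeof(float)>>>(d_x, d_out, cols);
```

The design choice illustrated here, trading extra shared-memory traffic for a single coalesced pass over global memory, is exactly the kind of memory-hierarchy tradeoff the qualifications above refer to.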