LLM Inference Kernel Engineer

Location: Remote
Compensation: To Be Discussed
Reviewed: Thu, Apr 02, 2026
This job expires in: 30 days

Job Summary

A company is looking for an LLM Inference Kernel Engineer (MLA).

Key Responsibilities
  • Design and implement high-performance GPU kernels for large language model inference workloads
  • Optimize CUDA kernels focusing on memory efficiency, execution speed, and latency reduction
  • Collaborate on integrating optimized kernels into modern inference serving frameworks
Required Qualifications
  • Strong experience developing GPU kernels using CUDA C or C++ in performance-critical environments
  • Hands-on experience optimizing inference workloads for large language models
  • Solid understanding of attention mechanisms and their optimized implementations
  • Deep knowledge of GPU architecture, including memory hierarchy and latency tradeoffs
  • Ability to operate in a fast-paced, highly iterative environment with minimal oversight
