The AI Research Engineer is a full-time, fully remote position open to candidates worldwide. The role centers on model serving and inference architectures for advanced AI systems, with a focus on optimizing model deployment and inference strategies.

Key responsibilities

Design and deploy model serving architectures that ensure high throughput and low latency across diverse environments
Build, run, and monitor inference tests in production environments, tracking key performance indicators to validate model performance
Analyze computational efficiency and diagnose bottlenecks in the serving pipeline to optimize infrastructure for scalability and reliability

Required qualifications

A degree in Computer Science or a related field, ideally a PhD in NLP, Machine Learning, or a related area
Proven experience in low-level kernel optimizations and inference optimization on mobile devices
Deep understanding of modern model serving architectures and inference optimization techniques
Strong expertise in writing GPU kernels for mobile devices and developing end-to-end inference pipelines
Knowledge of model compression techniques such as pruning and quantization, and familiarity with diffusion models
