The Architect - Platform Engineer is a full-time remote role responsible for designing and optimizing infrastructure for GenAI and LLM workloads. The engineer will implement scalable solutions, perform GPU profiling, and manage compute-intensive jobs while collaborating with cross-functional teams to deploy AI applications.

Key Responsibilities

Design and implement scalable infrastructure for LLM and GenAI workloads across multi-GPU environments
Perform GPU profiling, benchmarking, and performance optimization for distributed training workloads
Manage and schedule compute-intensive jobs using Slurm-based clusters and OpenShift/Kubernetes environments

Required Qualifications

Strong experience with Slurm and distributed training environments
Hands-on expertise with Red Hat OpenShift and/or Kubernetes
Deep knowledge of the NVIDIA GPU ecosystem (CUDA, cuDNN, NCCL, Triton)
Experience deploying GenAI workloads (LLM fine-tuning, RAG pipelines)
Familiarity with Infrastructure-as-Code tools (Terraform, Ansible)


