Research Scientist, AI Evaluations

Location: Remote

Compensation: To Be Discussed

Reviewed: Thu, May 21, 2026

This job expires in: 30 days

Job Category: Research

Weekly Hours: Full Time

Employer Type: Employer

Education Level: Bachelors, Masters, Doctorate

Job Summary

The full-time, remote Research Scientist, AI Evaluations will lead the design of benchmarks and evaluations: creating trustworthy evaluation tasks for frontier models, validating them against human baselines, and publishing research that shapes the future of AI evaluation datasets.

Key Responsibilities

Design tasks and benchmarks that distinguish capability levels across various AI models
Validate evaluations rigorously by analyzing inter-rater reliability and quantifying signal versus noise
Publish research to establish Protege as a standard-setter for evaluation data and contribute to the AI community

Required Qualifications

Advanced degree (PhD preferred, or MS/BS with equivalent industry experience) in a quantitative field
Hands-on experience evaluating LLMs, agents, or other ML systems
Experience with annotator quality and inter-rater reliability
Excellent scientific writing and communication skills
A bias toward speed in executing projects and delivering results
