Research Scientist, AI Evaluations
Location: Remote
Compensation: To Be Discussed
Reviewed: Thu, May 21, 2026
This job expires in: 30 days
Job summary
The full-time, remote Research Scientist, AI Evaluations will lead the design of benchmarks and evaluations: creating trustworthy evaluation tasks for frontier models, validating them against human baselines, and publishing research that shapes the future of AI evaluation datasets.
Key responsibilities
- Design tasks and benchmarks that distinguish capability levels across various AI models
- Validate evaluations rigorously by analyzing inter-rater reliability and quantifying signal versus noise
- Publish research to establish Protege as a standard-setter for evaluation data and contribute to the AI community
Required qualifications
- Advanced degree (PhD preferred, or MS/BS with equivalent industry experience) in a quantitative field
- Hands-on experience evaluating LLMs, agents, or other ML systems
- Experience with annotator quality and inter-rater reliability
- Excellent scientific writing and communication skills
- A bias toward velocity in project execution and results delivery