Senior Scientist in Synthetic Data
Location: Remote
Compensation: Salary
Reviewed: Wed, Jun 10, 2026
This job expires in: 6 days
Job Summary
The full-time Senior Scientist in Synthetic Data will advance synthetic data generation for training frontier models by building pipelines with LLM-based methods, collaborating with various teams, and contributing to open-source libraries. The role can be performed remotely or onsite.
Key responsibilities
- Build synthetic data generation pipelines that improve LLM training, with a focus on reasoning, coding, and multimodal understanding
- Advance multimodal synthetic data generation in collaboration with NVIDIA's model teams
- Design and maintain open-source libraries and SDKs, ensuring clean APIs and comprehensive documentation
Required qualifications
- PhD in Computer Science, Machine Learning, Statistics, or a related field, or equivalent experience
- 3+ years of research experience in synthetic data generation, generative modeling, or multimodal machine learning
- Deep technical understanding of LLMs and their data requirements for training and inference
- Proven track record of developing or maintaining widely used software libraries
- Strong publication record at major machine learning and AI conferences