Data Acquisition Engineer
Location: Remote
Compensation: To Be Discussed
Reviewed: Mon, May 18, 2026
This job expires in: 28 days
Job Summary
This is a full-time remote Data Acquisition Engineer position focused on developing systems for large-scale web crawling and data acquisition to support the training of frontier models for software development.
Key Responsibilities
- Design and operate a large-scale web crawler for acquiring publicly accessible data
- Develop specialized crawlers targeting high-value sources to enhance data recall
- Collaborate with teams to align data acquisition with model training needs and build ingestion pipelines
Required Qualifications
- Strong background in distributed systems and experience with large-scale data pipelines
- Proficiency in Python and experience with web crawling or large-scale data extraction
- Familiarity with cloud platforms (AWS) and container orchestration (Kubernetes, Docker)
- Understanding of data privacy and responsible crawling practices
- Experience building pre-training datasets for large language models is a plus