
ML Data Pipeline Consultant

Pedro Alves

Freelance · Mid-level · Denver

About the role

Fixed-scope engagement. We have a data pipeline that feeds our ML training jobs. It was built eighteen months ago by a contractor who is no longer available. It works well enough for our current data volume but has two specific problems: incremental updates are not implemented, so we run full reloads on a 6-hour schedule, and the pipeline has no data quality checks, so bad records reach training data and we only find out after a model degrades.

We need a consultant to implement incremental processing and add data quality gates. The pipeline runs on Databricks and the destination is Snowflake. The source systems are a PostgreSQL transactional database and three third-party API feeds.

Timeline: 8 weeks. Deliverables: incremental pipeline implementation, data quality check library with documented rules, and a handover document the internal team can extend. Budget: fixed at $12,000.

We are not looking for a proposal on how to rebuild the pipeline from scratch. We are looking for someone who can work with what exists and fix the specific problems.
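To illustrate the difference between the current full-reload pattern and the incremental processing being asked for, here is a minimal watermark-based sketch in plain Python. The `updated_at` column and record layout are illustrative assumptions, not details from the posting; the real implementation would use Delta Lake change data capture on Databricks.

```python
from datetime import datetime

def plan_incremental_load(last_watermark, rows):
    """Select only rows changed since the last successful run.

    A full reload would reprocess every row on every 6-hour run; an
    incremental load processes only the delta and advances the watermark.
    `updated_at` is a hypothetical change-tracking column.
    """
    delta = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in delta), default=last_watermark)
    return delta, new_watermark

rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 3)},
]
# Only row 2 is newer than the watermark, so only it is reprocessed.
delta, wm = plan_incremental_load(datetime(2024, 1, 2), rows)
```

The same idea carries over to Delta Lake CDC, where the change feed replaces the hand-rolled watermark comparison.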

Key Deliverables


  • Audit the existing pipeline and document current behaviour, data flow, and identified failure modes
  • Implement incremental processing to replace the full-reload pattern — Delta Lake CDC or an agreed equivalent
  • Build a data quality check library covering completeness, referential integrity, and range validation for the three source systems
  • Test the incremental pipeline end-to-end with at least two weeks of production data before handover
  • Deliver a handover document covering architecture, operational runbook, and extension guide
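The data quality check library deliverable could take roughly this shape: a small, testable gate that quarantines bad records with reasons before they reach training data. This is a sketch under stated assumptions; the rule names, record layout, and thresholds are illustrative, not from the posting.

```python
def check_completeness(record, required_fields):
    """Return the required fields that are missing or empty (completeness rule)."""
    return [f for f in required_fields if record.get(f) in (None, "")]

def check_range(record, field, lo, hi):
    """Return True when a value falls outside the agreed range (range rule)."""
    v = record.get(field)
    return v is not None and not (lo <= v <= hi)

def quality_gate(records, required_fields, range_rules):
    """Split a batch into clean rows and quarantined rows with reasons,
    so bad records are stopped before they reach training data."""
    clean, quarantined = [], []
    for r in records:
        reasons = [f"missing:{f}" for f in check_completeness(r, required_fields)]
        reasons += [f"out_of_range:{f}" for f, lo, hi in range_rules
                    if check_range(r, f, lo, hi)]
        (quarantined if reasons else clean).append((r, reasons))
    return clean, quarantined

records = [
    {"id": 1, "price": 5.0},   # passes both rules
    {"id": 2, "price": -1.0},  # out of range
    {"price": 3.0},            # missing id
]
clean, quarantined = quality_gate(records, ["id", "price"], [("price", 0.0, 100.0)])
```

Referential integrity checks would follow the same pattern, with each rule documented alongside its code as the posting requires.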

Requirements

Technical stack required for this engagement.

  • Demonstrated experience implementing incremental data processing in Databricks — Delta Lake change data capture or equivalent
  • Snowflake as a destination: merge operations, incremental loading strategies, and cost-aware design
  • Python for pipeline code — readable, maintainable, documented
  • SQL for the transformation logic that runs inside the pipeline
  • ETL pipeline debugging experience — you can read an existing pipeline and understand it without the original author
  • Strong written English for the handover document — it is a deliverable, not an afterthought
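As one example of the Snowflake merge-operation experience listed above, an incremental load typically lands changed rows in a staging table and upserts them into the target with a MERGE. A sketch of how that statement might be rendered from pipeline code follows; the table and column names are hypothetical.

```python
def build_merge_sql(target, staging, key, cols):
    """Render an upsert MERGE in Snowflake-style SQL (illustrative names).

    Matched rows are updated, new rows inserted: the core of an
    incremental load into the destination instead of a full reload.
    """
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in cols)
    col_list = ", ".join([key] + cols)
    val_list = ", ".join(f"s.{c}" for c in [key] + cols)
    return (
        f"MERGE INTO {target} t USING {staging} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({val_list})"
    )

sql = build_merge_sql("ml_features", "ml_features_stg", "id", ["label", "amount"])
```

Keeping the MERGE keyed and the staging table small is also what keeps the load cost-aware on Snowflake, since only the delta is scanned per run.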

Contract Type

Fixed-price

Level

Mid-level

Budget Range

$12,000

Duration

8 weeks

AI Expertise

MLOps & AI Infrastructure
