
ML Data Pipeline Consultant

Pedro Alves

Freelance · Mid-level · Denver

About the role

Fixed-scope engagement. We have a data pipeline that feeds our ML training jobs. It was built eighteen months ago by a contractor who is no longer available. It works well enough for our current data volume but has two specific problems: incremental updates are not implemented, so we run full reloads on a 6-hour schedule, and the pipeline has no data quality checks, so bad records reach training data and we only find out after a model degrades.

We need a consultant to implement incremental processing and add data quality gates. The pipeline runs on Databricks and the destination is Snowflake. The source systems are a PostgreSQL transactional database and three third-party API feeds.

Timeline: 8 weeks. Deliverables: incremental pipeline implementation, data quality check library with documented rules, and a handover document the internal team can extend. Budget: fixed at $12,000.

We are not looking for a proposal on how to rebuild the pipeline from scratch. We are looking for someone who can work with what exists and fix the specific problems.
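To illustrate the difference between the current full-reload pattern and the incremental processing being asked for, here is a minimal watermark-based sketch in plain Python. The `updated_at` column and record layout are illustrative assumptions, not details from the posting; the real implementation would use Delta Lake change data capture on Databricks.

```python
from datetime import datetime

def plan_incremental_load(last_watermark, rows):
    """Select only rows changed since the last successful run.

    A full reload would reprocess every row on every 6-hour run; an
    incremental load processes only the delta and advances the watermark.
    `updated_at` is a hypothetical change-tracking column.
    """
    delta = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in delta), default=last_watermark)
    return delta, new_watermark

rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 3)},
]
# Only row 2 is newer than the watermark, so only it is reprocessed.
delta, wm = plan_incremental_load(datetime(2024, 1, 2), rows)
```

The same idea carries over to Delta Lake CDC, where the change feed replaces the hand-rolled watermark comparison.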

Key Deliverables


  • Audit the existing pipeline and document current behaviour, data flow, and identified failure modes
  • Implement incremental processing to replace the full-reload pattern — Delta Lake CDC or an agreed equivalent
  • Build a data quality check library covering completeness, referential integrity, and range validation for the three source systems
  • Test the incremental pipeline end-to-end with at least two weeks of production data before handover
  • Deliver a handover document covering architecture, operational runbook, and extension guide
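The data quality check library deliverable could take roughly this shape: a small, testable gate that quarantines bad records with reasons before they reach training data. This is a sketch under stated assumptions; the rule names, record layout, and thresholds are illustrative, not from the posting.

```python
def check_completeness(record, required_fields):
    """Return the required fields that are missing or empty (completeness rule)."""
    return [f for f in required_fields if record.get(f) in (None, "")]

def check_range(record, field, lo, hi):
    """Return True when a value falls outside the agreed range (range rule)."""
    v = record.get(field)
    return v is not None and not (lo <= v <= hi)

def quality_gate(records, required_fields, range_rules):
    """Split a batch into clean rows and quarantined rows with reasons,
    so bad records are stopped before they reach training data."""
    clean, quarantined = [], []
    for r in records:
        reasons = [f"missing:{f}" for f in check_completeness(r, required_fields)]
        reasons += [f"out_of_range:{f}" for f, lo, hi in range_rules
                    if check_range(r, f, lo, hi)]
        (quarantined if reasons else clean).append((r, reasons))
    return clean, quarantined

records = [
    {"id": 1, "price": 5.0},   # passes both rules
    {"id": 2, "price": -1.0},  # out of range
    {"price": 3.0},            # missing id
]
clean, quarantined = quality_gate(records, ["id", "price"], [("price", 0.0, 100.0)])
```

Referential integrity checks would follow the same pattern, with each rule documented alongside its code as the posting requires.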

Requirements

Technical stack required for this engagement.

  • Demonstrated experience implementing incremental data processing in Databricks — Delta Lake change data capture or equivalent
  • Snowflake as a destination: merge operations, incremental loading strategies, and cost-aware design
  • Python for pipeline code — readable, maintainable, documented
  • SQL for the transformation logic that runs inside the pipeline
  • ETL pipeline debugging experience — you can read an existing pipeline and understand it without the original author
  • Strong written English for the handover document — it is a deliverable, not an afterthought
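As one example of the Snowflake merge-operation experience listed above, an incremental load typically lands changed rows in a staging table and upserts them into the target with a MERGE. A sketch of how that statement might be rendered from pipeline code follows; the table and column names are hypothetical.

```python
def build_merge_sql(target, staging, key, cols):
    """Render an upsert MERGE in Snowflake-style SQL (illustrative names).

    Matched rows are updated, new rows inserted: the core of an
    incremental load into the destination instead of a full reload.
    """
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in cols)
    col_list = ", ".join([key] + cols)
    val_list = ", ".join(f"s.{c}" for c in [key] + cols)
    return (
        f"MERGE INTO {target} t USING {staging} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({val_list})"
    )

sql = build_merge_sql("ml_features", "ml_features_stg", "id", ["label", "amount"])
```

Keeping the MERGE keyed and the staging table small is also what keeps the load cost-aware on Snowflake, since only the delta is scanned per run.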

Contract Type

Fixed-price

Level

Mid-level

Budget Range

$12,000

Duration

8 weeks

AI Expertise

MLOps & AI Infrastructure
