We're training large transformer models (7B–30B parameters) and our current training setup is inefficient: single-machine multi-GPU only, with no FSDP, no ZeRO optimisation, and no gradient checkpointing.
We need an ML engineer to set up proper distributed training across our cluster (16 × H100 on GKE). Deliverables: a working distributed training config using PyTorch FSDP, training speed benchmarks before and after, and a runbook for our research team.
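To give a sense of the shape of the FSDP deliverable, here is a minimal sketch of how a model might be wrapped for sharded training on our cluster. It assumes a torchrun launch and placeholder names (`TransformerModel`, `TransformerBlock`) standing in for whatever our actual codebase defines; the final config would need to match our models and data pipeline.

```python
# Minimal FSDP wrapping sketch (illustrative only; model/block classes are placeholders).
# Launched per node with torchrun, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=8 train.py
import functools
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

from my_model import TransformerModel, TransformerBlock  # hypothetical imports


def main():
    # One process per GPU; NCCL backend for H100s.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = TransformerModel()  # placeholder model constructor

    # Shard parameters at transformer-block granularity.
    wrap_policy = functools.partial(
        transformer_auto_wrap_policy, transformer_layer_cls={TransformerBlock}
    )
    # bf16 for params, gradients, and buffers; a sensible default on H100s.
    mp = MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    )
    model = FSDP(
        model,
        auto_wrap_policy=wrap_policy,
        mixed_precision=mp,
        device_id=local_rank,
        use_orig_params=True,  # keeps original param names for optimizers / torch.compile
    )

    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    # ... training loop: forward pass, loss.backward(), optimizer.step() ...

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

The actual deliverable would also cover activation/gradient checkpointing, checkpoint save/load of sharded state, and the GKE job spec; this is just to show the level we're expecting.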
Bonus if you can also set up experiment tracking integration (we use Weights & Biases).
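For the tracking bonus, a sketch of what we have in mind: rank-0-only logging so a multi-process FSDP run produces a single W&B run. The project name here is a placeholder.

```python
# Illustrative Weights & Biases hookup for a distributed run; only rank 0 logs.
import os

import wandb


def is_rank_zero() -> bool:
    return int(os.environ.get("RANK", "0")) == 0


def init_tracking(config: dict) -> None:
    if is_rank_zero():
        wandb.init(project="fsdp-training", config=config)  # project name is hypothetical


def log_step(metrics: dict, step: int) -> None:
    if is_rank_zero():
        wandb.log(metrics, step=step)
```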