Senior MLOps Engineer

Julien Moreau

Full-time · Senior · New York

About the role

We operate a real-time recommendation system that serves 800 million requests per day across 40+ models running in parallel on AWS, with specific workloads on GCP. The infrastructure works, but it is not yet mature. Our model deployment pipeline has too many manual steps, our rollback process takes 45 minutes when it should take five, and our experiment tracking is inconsistent enough that engineers sometimes cannot reproduce a training run from three months ago.

We are hiring a senior MLOps engineer to fix this: not to manage it, but to engineer better solutions. You will be hands-on in the code every day. You will also set engineering standards and review the work of three junior and mid-level engineers. If you have operated ML infrastructure at this scale and you care about building systems that work reliably when it matters most, we want to talk.

Responsibilities

  • Lead the redesign of our model deployment and rollback pipeline
  • Standardise experiment tracking across all ML teams
  • Mentor and review work for three junior and mid-level MLOps engineers
  • Define and enforce MLOps engineering standards and best practices
  • Drive the migration of manual deployment steps into automated, auditable processes

Requirements

  • 6+ years in infrastructure or platform engineering with 3+ years focused on ML systems
  • Deep expertise in Kubernetes for model serving and scaling
  • Airflow pipeline design and operational experience at scale
  • MLflow or a comparable model registry for experiment tracking, versioning, and deployment
  • AWS-native infrastructure (EKS, SageMaker, S3, CloudWatch) is essential; GCP is a bonus
  • Databricks for large-scale feature engineering and training jobs
  • Mature CI/CD design experience for ML, including canary deployments, shadow mode, and automated rollback

Benefits

  • Technical leadership at real scale — 800M requests/day is not a slide deck number
  • Full remote across US time zones
  • $135,000 – $170,000 base salary + equity
  • Conference and research budget: $3,000 annually
  • Equipment refresh every two years

Job Type

Full-time

Level

Senior

Language

English

Salary Range

$135,000 – $170,000

AI Expertise

MLOps & AI Infrastructure
