MLOps Engineer

Luca Ferretti

Full-time · Mid-level · UTC

About the role

We are a fully distributed team of 23 people across 11 time zones. We have not held a synchronous all-hands meeting in two years. Every significant decision starts as a written document, circulates for asynchronous comment, and is recorded permanently in our internal wiki before it is implemented.

Our ML infrastructure reflects this culture. Every pipeline has a runbook. Every model deployment has a written decision record explaining why it was chosen. Every on-call incident produces a written post-mortem that stays in the wiki permanently, not in someone's memory.

We are not perfect. Some of our documentation is out of date. The MLOps engineer we hire will inherit some of that and will be expected to fix it, because undocumented infrastructure is infrastructure that only one person can operate, and that is a risk we take seriously.

The role involves Kubernetes, Airflow, MLflow, and CI/CD. The technical work is solid but not extraordinary. What makes this role unusual is the writing culture. If you write as naturally as you code, this is a very good place to work. If you find the documentation requirement burdensome, this will be a frustrating place.

Responsibilities

  • Own and maintain the ML training and serving infrastructure on Kubernetes
  • Build and document all new MLflow experiment tracking and model registry integrations
  • Design and implement Airflow DAGs for new model retraining pipelines — always with runbooks
  • Take part in on-call rotations (two weeks per quarter) and write a post-mortem for every significant incident
  • Audit existing infrastructure documentation and close the gaps — we will give you dedicated time for this

Requirements

  • 3–5 years of experience in MLOps, platform engineering, or data engineering
  • Kubernetes for ML workload orchestration — you write manifests, debug pod scheduling, and manage resource quotas
  • MLflow for experiment tracking and model registry in a production context — you know its limitations as well as its capabilities
  • Airflow for pipeline orchestration — you've written DAGs that other people maintain without asking you questions
  • CI/CD pipeline ownership using GitHub Actions or equivalent — you treat your pipelines as software
  • Python for infrastructure tooling and automation
  • Strong written English — you will be evaluated on your writing during the interview process as explicitly as on your technical skills

Benefits

  • Truly async work — no Slack noise outside your working hours, no "quick sync" requests
  • Fully remote, with working hours that keep a 4-hour overlap with UTC
  • $88,000 – $108,000 base salary
  • $1,000 annual book and learning budget — we've spent a lot on books on writing and communication specifically
  • All documentation is public internally — your work is visible to the whole company

Language

English

AI Expertise

MLOps & AI Infrastructure
