ML Platform Engineer

Ryan Ashford

Full-time · Mid-level · Chicago

About the role

Role summary: ML Platform Engineer, mid-level.
Company: B2B SaaS, 85 employees, Series A.
Team: Platform Engineering (4 engineers + 1 manager).
Reporting to: Head of Platform.
Stack: Python, Kubernetes on AWS EKS, MLflow, Prefect, Docker, PostgreSQL.

What this role owns: model serving infrastructure, experiment tracking and model registry, CI/CD pipelines for model deployments, monitoring and alerting for model performance.

What this role does not own: model research and training (that's the ML team), product engineering (separate team), data pipelines upstream of model inputs (data engineering).

Current state: We run 12 production models. Deployment is semi-automated. MLflow is partially adopted. Monitoring is reactive rather than proactive. Rollback currently takes about 40 minutes.

Target state: Fully automated deployment with canary releases. MLflow standardised across all teams. Proactive alerting on data drift and performance degradation. Rollback under five minutes.

Timeline to target: Two to three quarters.

What we need from you: Hands-on engineering capability, clear written communication, and an opinion on the right approach before we've told you what to build.

Responsibilities

Automate the model deployment pipeline from registry trigger to canary release on EKS
Standardise MLflow adoption across ML and data science teams with clear documentation
Build proactive monitoring for model performance, data drift, and infrastructure health
Reduce rollback time from the current 40 minutes to under five minutes
Conduct platform onboarding sessions for new ML team members

Requirements

4+ years in platform, DevOps, or MLOps engineering
Kubernetes — you design and debug pod configurations, resource limits, and scaling behaviour yourself
MLflow for experiment tracking, model registry, and serving, with production deployment experience
Python for scripting infrastructure, deployment automation, and monitoring integrations
Hands-on experience with AWS-native services: EKS, ECR, S3, CloudWatch
Prefect or Airflow for workflow orchestration
Docker — you write Dockerfiles and debug image issues without a tutorial

Benefits

Clear scope, defined target state, and direct path to ownership
Fully remote, US time zones preferred
$95,000 – $118,000 base salary + equity
Platform tooling budget: $1,500 quarterly
$2,000 conference and certification budget annually

Job Type

Full-time

Level

Mid-level

Language

English

Salary Range

$95,000 – $118,000

AI Expertise

MLOps & AI Infrastructure
