Role summary: ML Platform Engineer, mid-level.
Company: B2B SaaS, 85 employees, Series A.
Team: Platform Engineering (4 engineers + 1 manager).
Reporting to: Head of Platform.
Stack: Python, Kubernetes on AWS EKS, MLflow, Prefect, Docker, PostgreSQL.
What this role owns: model serving infrastructure, experiment tracking and model registry, CI/CD pipelines for model deployments, monitoring and alerting for model performance.
What this role does not own: model research and training (that's the ML team), product engineering (separate team), data pipelines upstream of model inputs (data engineering).
Current state: We run 12 production models. Deployment is semi-automated. MLflow is partially adopted. Monitoring is reactive rather than proactive. Rollback takes too long.
Target state: Fully automated deployment with canary releases. MLflow standardised across all teams. Proactive alerting on data drift and performance degradation. Rollback under five minutes.
Timeline to target: Two to three quarters.
What we need from you: Hands-on engineering capability, clear written communication, and an opinion on the right approach before we've told you what to build.
Responsibilities
Automate the model deployment pipeline, from registry trigger to canary release on EKS
Standardise MLflow adoption across ML and data science teams with clear documentation
Build proactive monitoring for model performance, data drift, and infrastructure health
Reduce rollback time from the current 40 minutes to under five minutes
Conduct platform onboarding sessions for new ML team members
Requirements
4+ years in platform, DevOps, or MLOps engineering
Kubernetes — you design and debug pod configurations, resource limits, and scaling behaviour yourself
MLflow for experiment tracking, model registry, and serving, with production deployment experience
Python for scripting infrastructure, deployment automation, and monitoring integrations
AWS-native services: EKS, ECR, S3, CloudWatch, with hands-on rather than theoretical experience
Prefect or Airflow for workflow orchestration
Docker — you write Dockerfiles and debug image issues without a tutorial
Benefits
Clear scope, a defined target state, and a direct path to ownership
Fully remote, US time zones preferred
$95,000 – $118,000 base salary + equity
Platform tooling budget: $1,500 quarterly
Conference and certification budget: $2,000 annually