We're moving our ML infrastructure from a setup that was fine when we had three data scientists to one that can support ten and scale to twenty. Right now models are trained on EC2 instances, experiments are tracked in a shared spreadsheet, and deployment is a script someone runs manually from their terminal. We know this is not sustainable. We need an engineer who has done this migration before: containerised model serving, proper experiment tracking with MLflow, a model registry, and automated deployment pipelines. AWS is our primary environment and we're comfortable staying there. We use GCP for some BigQuery workloads that won't move. We use Kubernetes for orchestration. We are not looking to adopt every new tool; we care about reliability and documentation more than novelty. If you are the kind of engineer who writes runbooks, keeps architecture diagrams up to date, and gets annoyed by undocumented infrastructure, we are going to get along extremely well.
Responsibilities
Design and implement the transition from manual deployment to an automated ML pipeline
Set up and standardise MLflow experiment tracking across all data science projects
Build and maintain Kubernetes-based model serving infrastructure on AWS EKS
Create and maintain runbooks, architecture diagrams, and deployment documentation
Work with data scientists to ensure their code is production-ready before handoff
Requirements
4+ years of experience in cloud infrastructure or MLOps
AWS-native expertise — EKS, S3, ECR, CloudWatch, and at least one managed training service
Docker and Kubernetes for containerised model serving and scaling
MLflow for experiment tracking, model versioning, and serving
GCP familiarity (BigQuery, Cloud Run) is a real bonus
Strong Python for scripting infrastructure and deployment automation
Opinionated about documentation — you leave things better than you found them
Benefits
Greenfield MLOps build: you design the standards rather than inherit them
Fully remote
$105,000 – $135,000 base salary + equity
$2,000 annual cloud certification and learning budget
Flexible hours with core overlap 10am–3pm US Eastern