We are a two-sided marketplace for skilled trades, connecting homeowners with vetted electricians, plumbers, and HVAC technicians in 38 metropolitan areas. We have 2.1 million registered homeowners and 48,000 active trade professionals. The network effects are real: a denser supply side in a market makes the demand side convert better, which brings more demand, which makes supply more profitable, which attracts more supply. Our ML systems sit inside that loop. Ranking, matching, quality scoring, fraud detection, and dynamic pricing all feed into marketplace health metrics that we watch the same way an airline watches load factor.

We are looking for a Senior ML Infrastructure Engineer to own the serving layer for the 12 models currently in production and the platform that the data science team uses to deploy new ones. The problems are practical: latency SLAs, model freshness guarantees, feature store reliability, and the kind of operational rigour that keeps a marketplace from degrading quietly over a weekend when no one is watching.
Responsibilities
Own the model serving infrastructure for 12 production models, maintaining latency SLAs and uptime targets
Build and maintain the real-time feature pipeline serving ranking and matching models
Design and implement a feature store supporting both real-time serving and offline training data access
Define and track ML infrastructure SLOs — latency, freshness, error rate — with dashboards and automated alerting
Work with data scientists to establish deployment standards and reduce time from model approval to production
Requirements
6+ years of experience in platform, MLOps, or data engineering, including at least two years supporting a high-traffic production ML system
Kubernetes for multi-tenant ML serving — you've managed resource contention, horizontal pod autoscaling, and serving latency SLOs
Apache Kafka for real-time feature pipelines and event-driven model updates
Apache Spark for batch feature computation and offline training data preparation
MLflow for model registry, versioning, and serving lifecycle management
AWS in depth: EKS, Kinesis, SageMaker, ElastiCache. You've run real serving infrastructure, not prototypes
SQL for the data engineering that supports feature stores and model evaluation pipelines
Benefits
Ownership of the infrastructure that drives health metrics for a two-sided marketplace operating at scale
Full remote, US time zones preferred
$135,000 – $160,000 base salary + equity
$2,000 annual conference and tooling budget
On-call is structured and compensated — two weeks per quarter with clear escalation paths