We are at Series C: 210 employees, $47M ARR, and a machine learning team that has grown from three people to fourteen in eighteen months. Growth at that rate is exciting, and it is also genuinely dangerous for a research function. We've been careful. The models we shipped eighteen months ago still behave the way we designed them to. But we know the risk: the faster you grow, the more likely you are to ship something that works in staging and degrades silently in production because no one had time to think carefully about the evaluation methodology.

We're hiring a Staff ML Scientist to be the person who slows us down in the right moments: someone senior enough to have earned the credibility to say, in a planning meeting, that we're not ready to ship yet, and to have that be received as a contribution rather than an obstacle. The technical work is real: you'll design experiments, evaluate models, and contribute to architecture decisions. But the reason this is a Staff role rather than a Senior role is that we need someone who can shape how the team thinks about rigor: through example, through code review, through documentation standards, and sometimes through difficult conversations.
Responsibilities
Define and enforce model evaluation standards across the ML team — build the frameworks, write the documentation, and model the behavior in your own work
Lead technical design for the two most complex model development tracks currently in planning
Conduct deep-dive reviews of ML experiments from mid-level scientists and identify methodology gaps before they reach production
Own relationships with key research stakeholders across product, data engineering, and leadership
Mentor three to five senior and mid-level scientists and contribute to hiring bar definition
Requirements
8+ years of machine learning with a demonstrable track record of rigorous evaluation methodology in production systems
Deep Python expertise — you write the code that defines how the rest of the team writes code
PyTorch for research and production model development — architecture design, training dynamics, debugging at scale
Strong command of statistical evaluation: effect sizes, multiple testing correction, calibration, and temporal validation
Experience operating at Staff or Principal level on a team that was scaling — you've navigated the gap between research culture and shipping culture
Publications, conference talks, or equivalent public technical contributions are a plus but not a requirement
Benefits
Staff-level ownership in a company with real traction and a machine learning team that is large enough to be interesting
Full remote with optional use of our New York and Toronto offices
$170,000 – $210,000 base salary + meaningful equity at Series C strike price
$3,000 annual research and conference budget
Unlimited PTO with a genuine expectation that people use at least four weeks per year