We've been building for 18 months. We have paying customers. We have a product that works — mostly because we ship fast and fix faster. The AI core is a fine-tuned language model for legal tech: contract clause identification, obligation extraction, and risk flagging in vendor agreements. The base model is good. The fine-tuning is good enough. 'Good enough' is where we are now, and we need to reach 'excellent' before a better-funded competitor does. That's the job: take our existing fine-tuning process — a script one of our co-founders wrote at 2am and that has been incrementally improved ever since — and make it excellent. Better data curation, better training runs, better evaluation, faster iteration. The stack is Python, Hugging Face Transformers, PyTorch, and AWS SageMaker for training jobs. We're a team of eight. Everyone ships code. You'll work directly with the CTO and two ML engineers, and your results will directly shape the product roadmap.
Responsibilities
Redesign and implement the data curation and cleaning pipeline for fine-tuning datasets
Run systematic training experiments comparing fine-tuning strategies and document results clearly
Build an evaluation framework for domain-specific output quality beyond perplexity — task-specific metrics, human eval design
Reduce fine-tuning iteration cycle time from 4–5 days to under 2 days
Collaborate with the product team to identify capability gaps and translate them into training objectives and data requirements
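To give a flavor of the curation work: the pipeline starts with basics like exact-duplicate removal and length filtering before anything fancier. A minimal sketch, using the stdlib only — the function, schema, and thresholds here are hypothetical illustrations, not our actual pipeline:

```python
import hashlib


def curate(records, min_chars=50, max_chars=8000):
    """Toy curation pass: drop exact duplicates and out-of-range examples.

    `records` is a list of dicts with a "text" field (hypothetical schema).
    A real pipeline would add near-duplicate detection, PII scrubbing,
    and label-consistency checks on top of this.
    """
    seen = set()
    kept = []
    for rec in records:
        text = rec["text"].strip()
        if not (min_chars <= len(text) <= max_chars):
            continue  # too short to teach anything, or too long for context
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier record
        seen.add(digest)
        kept.append({**rec, "text": text})
    return kept
```

If you read that and immediately want to argue about near-dedup strategies, you're the kind of candidate we mean by "strong data curation instincts."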
Requirements
3–5 years of ML engineering with 2+ years of hands-on LLM fine-tuning in a production context
Hugging Face — Transformers, PEFT/LoRA, Datasets — production experience, not just tutorial familiarity
PyTorch for custom training logic, debugging, and custom loss functions
AWS SageMaker for managing training jobs and experiment tracking
Strong data curation instincts — you know fine-tuning quality is bottlenecked by dataset quality, and you have opinions about how to fix that
Python for data processing pipelines, training scripts, and evaluation automation
Experience with instruction tuning, RLHF, or DPO is a meaningful advantage
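On the evaluation side, "beyond perplexity" means task-level scoring — for clause extraction, something like precision/recall/F1 over predicted spans. A hedged sketch of the idea (function and tuple format are illustrative, not our framework):

```python
def span_prf(predicted, gold):
    """Exact-match precision/recall/F1 over extracted spans.

    `predicted` and `gold` are sets of (start, end, label) tuples, e.g.
    (120, 180, "indemnification"). Exact match is a deliberately strict
    baseline; a production framework would also credit partial overlaps.
    """
    tp = len(predicted & gold)  # spans matched exactly, including label
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = (2 * precision * recall / denom) if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Designing metrics like this — and the human-eval protocols that validate them — is a core part of the role, not a side task.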
Benefits
Direct ownership of the AI core of a product used daily by paying customers
Full remote, flexible hours
$100,000 – $128,000 base salary + meaningful equity in a pre-Series A company with strong growth
Small team — you'll have context across the full product, not just your layer
Fast iteration: you propose an experiment on Monday, run it by Thursday, and review results by Friday