Have you ever looked at an LLM output and known immediately that the prompt was wrong — not the model, not the data, but the prompt? That's the instinct we're looking for. We are a legal tech company building AI-assisted contract review tools. Our prompts work. They just don't work consistently. We need someone systematic: someone who treats prompting like engineering, not like guess-and-check. You'll own the prompt layer for three of our core product features, with room to expand.
Responsibilities
Own and improve prompts for contract summarisation, clause extraction, and risk flagging
Build a prompt versioning system with performance benchmarks
Run A/B tests on prompt variations and measure accuracy changes
Work with lawyers on the team to validate output quality
Document failure modes and keep a known-issues registry
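To make the versioning-and-benchmarking work concrete, here is a minimal sketch of the kind of harness this role would own. Everything in it is illustrative: `PROMPTS`, `call_model`, and the test case are stand-ins, not our production stack (a real harness would call an LLM API instead of the stub).

```python
# Minimal sketch of a versioned-prompt benchmark harness.
# All names here (PROMPTS, call_model, TEST_CASES) are illustrative
# stand-ins, not production code.

PROMPTS = {
    "clause_extraction_v1": "Extract the governing-law clause from: {doc}",
    "clause_extraction_v2": (
        "You are a contract reviewer. Quote the governing-law clause "
        "verbatim from the contract below.\n\n{doc}"
    ),
}

def call_model(prompt: str) -> str:
    # Stub so the sketch runs offline; a real harness calls an LLM API here.
    return "This Agreement is governed by the laws of Delaware."

TEST_CASES = [
    {"doc": "... This Agreement is governed by the laws of Delaware. ...",
     "expected": "laws of Delaware"},
]

def accuracy(version: str) -> float:
    """Fraction of test cases whose expected phrase appears in the output."""
    hits = 0
    for case in TEST_CASES:
        output = call_model(PROMPTS[version].format(doc=case["doc"]))
        hits += case["expected"] in output
    return hits / len(TEST_CASES)

# A/B comparison: score every prompt version against the same suite.
results = {version: accuracy(version) for version in PROMPTS}
```

The point of the sketch is the shape of the loop — named prompt versions, a fixed test suite, one score per version — not any particular scoring metric.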
Requirements
2+ years of prompt engineering experience on real, shipped products
Able to build and run prompt regression test suites
Familiarity with structured output techniques (JSON mode, function calling)
Strong written communication — you document everything
Plus: experience with legal or financial document use cases
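For the structured-output requirement, this is roughly the level we mean — parsing a JSON-mode response and enforcing a schema. The keys and the sample response below are illustrative assumptions for the sketch, not our actual contract-review format.

```python
import json

# Illustrative schema: these keys are an assumption for this sketch,
# not our production format.
REQUIRED_KEYS = {"clause_type", "text", "risk_level"}

def parse_clause(raw: str) -> dict:
    """Parse a JSON-mode model response and enforce the expected keys."""
    clause = json.loads(raw)  # raises ValueError on malformed JSON
    missing = REQUIRED_KEYS - clause.keys()
    if missing:
        raise ValueError(f"response missing keys: {sorted(missing)}")
    return clause

# The kind of response JSON mode / function calling is meant to guarantee:
sample = ('{"clause_type": "indemnification", '
          '"text": "Vendor shall indemnify...", '
          '"risk_level": "high"}')
clause = parse_clause(sample)
```

Candidates who have lived with free-text model output will recognise why the validation step exists: structured output narrows the failure modes, it does not eliminate them.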
Benefits
Full remote
$85,000 – $110,000 salary
Conference budget
High-quality domain work — not generic chatbot prompts