Senior Prompt Engineer

Sofia Reyes

Full-time · Senior · Chicago

About the role

We build a consumer AI assistant used by 1.4 million people daily. The assistant helps users draft text, navigate complex decisions, and get reliable answers across financial planning, health information, and professional communication. At that scale, every change to a system prompt has measurable effects within hours.

We learned this the hard way: a prompt that tested well on our internal eval suite degraded user satisfaction scores by 3.2 points in its first week in production. That experience is the reason we're hiring a Senior Prompt Engineer whose primary responsibility isn't writing prompts; it's building the evaluation frameworks that tell us whether a prompt is actually better, not just different.

You'll work at the intersection of language design, RAG architecture, and rigorous measurement. You'll own the evaluation methodology. You'll run the A/B tests. You'll document every change with a rationale and test results before it ships.

The role reports to our Head of AI Product and requires overlap with US Central hours.

Responsibilities

  • Own and continuously improve all production system prompts across the assistant's core feature set
  • Design and maintain an automated prompt evaluation framework with clear, reproducible metrics
  • Run controlled A/B tests on prompt variants and report findings with statistical confidence intervals
  • Collaborate with the safety team to identify and remediate failure modes before deployment
  • Document prompt architecture decisions in a format accessible to engineers, product managers, and leadership

Requirements

  • 4+ years of engineering or applied research with 2+ years of focused prompt engineering on production LLM systems
  • Deep hands-on experience with OpenAI and Anthropic APIs — you understand temperature, top-p, and logit bias in production, not just in theory
  • Experience designing automated evaluation pipelines for LLM output quality: helpfulness, accuracy, and safety metrics
  • Experience with LangChain or LlamaIndex for building and continuously improving RAG pipelines
  • Proficiency in Python for evaluation automation, pipeline tooling, and data analysis
  • Familiarity with structured output techniques: function calling, JSON mode, and constrained generation
  • Strong written communication — every prompt change you ship has a documented rationale and evaluation result

Benefits

  • Work at the intersection of LLM research and consumer product at real scale: you see how every change performs for 1.4 million daily users
  • Fully remote within US time zones
  • $120,000 – $150,000 base salary + equity
  • $2,500 annual research and conference budget
  • Access to production evaluation data — real signal, not synthetic benchmarks

Job Type

Full-time

Level

Senior

Language

English

Salary Range

$120,000 – $150,000

AI Expertise

NLP & Prompt Engineering
