AI Engineer

Marco Russo

Full-time · Mid-level · Los Angeles

About the role

Context: we serve 1.2 million users, and our AI features handle 300,000 requests per day. P95 latency for LLM calls is currently 4.2 seconds; our target is 1.8 seconds. We have a hypothesis about where the bottleneck is, and we need an AI engineer who can test it, implement the fix, and own the performance monitoring layer going forward. Direct impact. Measurable output. Clear success criteria from day one.

Responsibilities

  • Profile and diagnose current AI feature latency bottlenecks
  • Implement streaming responses and caching strategies
  • Build latency and cost dashboards for the AI features layer
  • Own the LLM API integration layer and keep it efficient
  • Document all performance improvements with before/after benchmarks

Requirements

  • 2–4 years building AI features with LLM APIs in production
  • Experience optimising LLM call latency (streaming, caching, prompt compression)
  • Strong Python with async programming experience
  • Familiarity with observability tools (Datadog, Grafana, or similar)
  • Able to scope and deliver a performance improvement project independently

Benefits

  • Concrete problem with measurable success criteria
  • Fully remote
  • $95,000 – $125,000
  • Equity package
  • Engineering-first culture — no endless meetings

Job Type

Full-time

Level

Mid-level

Language

English

Salary Range

$95,000 – $125,000

AI Expertise

AI & Machine Learning Engineers
