Context: We serve 1.2 million users, and our AI features handle 300,000 requests per day. Our P95 latency for LLM calls is currently 4.2 seconds; the target is 1.8 seconds. We have a hypothesis about where the bottleneck is, and we need an AI engineer who can test that hypothesis, implement the fix, and own the performance monitoring layer going forward. Direct impact. Measurable output. Clear success criteria from day one.
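For context on the numbers above: P95 latency is the value below which 95% of request latencies fall. A minimal nearest-rank sketch (the sample data below is illustrative, not our production numbers):

```python
import math

def p95(latencies_s):
    """Nearest-rank 95th percentile of a list of latency samples (seconds)."""
    s = sorted(latencies_s)
    idx = max(0, math.ceil(0.95 * len(s)) - 1)
    return s[idx]

# Illustrative: 100 evenly spread samples -> the 95th value
print(p95([x / 10 for x in range(1, 101)]))  # 9.5
```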
Responsibilities
Profile and diagnose current AI feature latency bottlenecks
Implement streaming responses and caching strategies
Build latency and cost dashboards for the AI features layer
Own the LLM API integration layer and keep it efficient
Document all performance improvements with before/after benchmarks
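To make the caching responsibility concrete, here is a minimal sketch of one common strategy: an in-process cache keyed on the full request parameters, applied only to deterministic (temperature 0) calls. `call_llm` is a hypothetical stand-in for the real LLM API client, and all names here are illustrative, not our actual integration layer:

```python
import hashlib
import json

# Hypothetical in-process cache; production would likely use Redis or similar.
_cache = {}

def _cache_key(prompt, model, temperature):
    # Key on every parameter that affects the output, serialized stably.
    payload = json.dumps(
        {"prompt": prompt, "model": model, "temperature": temperature},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_completion(prompt, call_llm, model="example-model", temperature=0.0):
    """Return a completion, skipping the network round trip on a cache hit.

    Only temperature-0 calls are cached, since higher temperatures are
    intentionally non-deterministic.
    """
    key = _cache_key(prompt, model, temperature)
    if temperature == 0.0 and key in _cache:
        return _cache[key]            # cache hit: no API latency
    result = call_llm(prompt)         # cache miss: pay full latency once
    if temperature == 0.0:
        _cache[key] = result
    return result
```

Streaming is complementary: it does not reduce total generation time, but it cuts time-to-first-token, which is what users perceive.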
Requirements
2–4 years building AI features with LLM APIs in production