
Data Platform Engineer — Kafka & Spark

Samuel Torres

Full-time · Senior

About the role

We process 500 million events per day. The stack is Kafka + Spark on AWS, orchestrated via Airflow. It's held up, but end-to-end latency on our critical event streams has crept from 90 seconds to 6+ minutes over the past year, and our ML team keeps missing their feature freshness SLAs.

What we actually need:

  • Scale our Kafka cluster from 12 to 30+ brokers without a weekend maintenance window
  • Get P95 stream latency back under 2 minutes
  • Build a feature store (evaluating Feast vs. Tecton) so ML models can query real-time features without going directly to Kafka
  • Improve observability: when latency spikes, we have no idea where in the pipeline the bottleneck is

The ML team is 8 scientists who are vocal about what they need. You'll work directly with them. They're not always patient; if constant stakeholder requests frustrate you, this role will be hard. If you like having users who depend on your work every day, it's great. DevOps owns the infrastructure, you own the pipeline, and the boundary is usually clear.

Requirements

  • Production experience with Kafka and Spark at scale
  • Strong Python, ideally PySpark
  • AWS experience (MSK, EMR, Glue, or equivalents)
  • Experience building real-time feature pipelines for ML
  • Ability to mentor junior engineers

Job Type

Full-time

Level

Senior

Language

English

Salary Range

$120k – $165k / year

AI Expertise

MLOps & AI Infrastructure
