Our analytics stack is a combination of good decisions made three years ago, legacy choices that made sense at the time, and a few things we built under deadline pressure that have become permanent. We have Kafka streams, a Databricks lakehouse, and Airflow orchestrating approximately 200 DAGs. Not everything talks to everything cleanly. Some of our Spark jobs are slower than they should be. A few pipelines were built by data scientists who had no business writing infrastructure code, and we mean that kindly; they needed to ship fast.

We need an experienced data engineer who can work across this stack: fix what's broken, optimise what's slow, and build new pipelines that don't create more problems than they solve. Roughly 60% of your time will be on existing infrastructure, 40% on new features. If you need perfectly scoped projects handed to you in a specification document, this probably isn't the right fit.
Responsibilities
Maintain and optimise Kafka-to-Databricks data ingestion pipelines
Refactor slow Spark jobs for performance and reliability
Build ETL pipelines for three new data product features shipping in Q3
Improve Airflow DAG monitoring and alerting
Document the data architecture and maintain an accurate lineage map
Requirements
4+ years of production data engineering
Strong command of PySpark and SQL across large datasets
Hands-on experience with Kafka for event streaming
Experience with Databricks — Delta tables, jobs, and notebook-based pipelines
Airflow orchestration — DAG design, dependency management, monitoring
Ability to diagnose and fix performance issues without a DBA on call
Benefits
Senior team with high technical standards — you'll learn from the people around you