We run 8–12 experiments simultaneously across our mobile app. Our current A/B testing is held together with duct tape — different engineers use different significance thresholds, experiment logs live in three different places, and we've had results overturned twice because of novelty effects nobody accounted for.
We need a data scientist to design and implement a proper experimentation platform: standardised metrics, pre-registered experiment plans, sequential testing so results can be monitored as they come in without the false-positive inflation that peeking causes, and a clear governance process for experiment sign-off.
The role is 30% design and documentation work and 70% building the tooling in Python on top of our existing Amplitude and BigQuery stack.