There's something philosophically interesting about reinforcement learning that doesn't show up in supervised learning: the agent is responsible for generating its own experience. It decides what to try, what to ignore, and, in a sense, what to learn. We find that interesting. We also find it practically useful for what we build: adaptive tutoring systems that adjust lesson sequencing and difficulty in real time based on how individual students are responding, without needing a human curriculum designer in the loop for every edge case.

The RL work here is applied and grounded: not robotics or game environments, but real student interaction data, real learning outcomes, and real constraints around data sparsity, safety, and interpretability. We're looking for an engineer who has implemented RL algorithms from the literature, someone who hasn't just run a tutorial PPO implementation against a gym environment, but has actually reasoned about why an algorithm is or isn't working and done something about it. Our core stack is Python and PyTorch; we use Stable-Baselines3 for baselines and build custom environments in most cases.
Responsibilities
Design and implement RL-based lesson sequencing policies for the adaptive tutoring engine
Build and maintain custom simulation environments modelling student learning trajectories
Run controlled experiments comparing RL policies against curriculum baselines
Analyse policy behaviour and failure modes and document findings clearly
Contribute to our offline evaluation framework for safe policy validation before deployment
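To give a concrete flavour of the offline-evaluation work above: the sketch below is a minimal ordinary importance-sampling estimator for off-policy value, a standard textbook technique for validating a policy from logged data before deployment. The function name, data layout, and toy numbers are all illustrative assumptions, not our actual framework.

```python
import numpy as np

def importance_sampling_value(trajectories):
    """Ordinary (trajectory-level) importance sampling estimate of policy value.

    Each trajectory is a list of (p_target, p_behavior, reward) tuples:
    the probability the candidate policy assigns to the logged action,
    the probability the logging (behaviour) policy assigned to it, and
    the observed reward. (Toy data layout, for illustration only.)
    """
    estimates = []
    for traj in trajectories:
        weight = 1.0   # product of per-step likelihood ratios
        ret = 0.0      # undiscounted return of the logged trajectory
        for p_target, p_behavior, reward in traj:
            weight *= p_target / p_behavior
            ret += reward
        estimates.append(weight * ret)
    return float(np.mean(estimates))

# Sanity check: when target and behaviour policies agree, every weight
# is 1 and the estimate reduces to the mean logged return.
logged = [
    [(0.5, 0.5, 1.0), (0.5, 0.5, 1.0)],  # return 2.0
    [(0.5, 0.5, 0.0)],                   # return 0.0
]
value = importance_sampling_value(logged)
```

Ordinary IS is unbiased but high-variance; in practice one would reach for per-decision or weighted variants, which is exactly the kind of trade-off this role involves reasoning about.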
Requirements
3–5 years of software engineering experience, with at least 2 years of focused RL work
PyTorch for implementing and debugging custom RL algorithms
Deep understanding of policy gradient and actor-critic methods (PPO, A3C, SAC), not just at the API level
Experience designing and implementing custom RL environments
NumPy for efficient numerical computation in training loops
Familiarity with Stable-Baselines3 or similar frameworks for rapid prototyping and baselines
Exposure to offline RL or imitation learning is a genuine plus, given our data constraints
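For a sense of what "designing and implementing custom RL environments" means here, the sketch below is a minimal toy environment following the standard Gymnasium-style reset/step interface, exercised with a greedy baseline policy. The class name, state representation, and mastery dynamics are invented for illustration; they are not our student model.

```python
import numpy as np

class ToyStudentEnv:
    """Toy environment with a Gymnasium-style reset/step API.

    State: per-skill mastery levels in [0, 1].
    Action: index of the skill to teach next.
    Reward: the resulting gain in mastery for that skill.
    (All dynamics here are made up for illustration.)
    """

    def __init__(self, n_skills=5, learning_rate=0.3, horizon=20, seed=0):
        self.n_skills = n_skills
        self.learning_rate = learning_rate
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0
        self.mastery = self.rng.uniform(0.0, 0.2, size=self.n_skills)
        return self.mastery.copy(), {}

    def step(self, action):
        before = self.mastery[action]
        # Teaching a skill closes a fixed fraction of its remaining mastery gap.
        self.mastery[action] += self.learning_rate * (1.0 - before)
        reward = self.mastery[action] - before
        self.t += 1
        terminated = bool(self.mastery.min() > 0.95)  # all skills mastered
        truncated = self.t >= self.horizon             # episode length cap
        return self.mastery.copy(), reward, terminated, truncated, {}

# Roll out a greedy "teach the weakest skill" baseline as a sanity check.
env = ToyStudentEnv()
obs, _ = env.reset()
total = 0.0
done = False
while not done:
    action = int(np.argmin(obs))
    obs, reward, terminated, truncated, _ = env.step(action)
    total += reward
    done = terminated or truncated
```

An environment shaped like this drops straight into Stable-Baselines3 once wrapped as a `gymnasium.Env` with declared observation and action spaces, which is how we keep baselines cheap to run.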
Benefits
RL applied to something that genuinely improves how people learn
Full remote across US and EU time zones
$105,000 – $130,000 base salary + equity
Two hours per week of research reading time, formally built into the schedule
Small team where your experimental results directly inform product direction