Reinforcement Learning Engineer

James Okafor

Full-time · Mid-level · Chicago

About the role

There's something philosophically interesting about reinforcement learning that doesn't show up in supervised learning: the agent is responsible for generating its own experience. It decides what to try, what to ignore, and, in a sense, what to learn. We find that interesting. We also find it practically useful for what we build: adaptive tutoring systems that adjust lesson sequencing and difficulty in real time based on how individual students are responding, without needing a human curriculum designer in the loop for every edge case.

The RL work here is applied and grounded: not robotics or game environments, but real student interaction data, real learning outcomes, and real constraints around data sparsity, safety, and interpretability. We're looking for an engineer who has implemented RL algorithms from the literature, someone who has not just plugged a tutorial PPO implementation into a gym environment but has reasoned about why an algorithm is or isn't working and done something about it.

Python and PyTorch are our core stack; we use Stable-Baselines3 for baselines and build custom environments in most cases.
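To give a concrete (and entirely hypothetical) sense of what "custom environments" means in this context, here is a minimal sketch of a toy tutoring environment following the Gymnasium-style `reset`/`step` interface. The mastery dynamics, reward, and episode limits below are invented for illustration and bear no relation to the actual product:

```python
import random

class ToyTutoringEnv:
    """Illustrative sketch only: a toy 'student' following the
    Gymnasium-style reset/step interface. All dynamics are invented."""

    N_SKILLS = 3        # hypothetical number of skills being taught
    N_DIFFICULTIES = 3  # actions: choose a difficulty level 0..2

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.mastery = [self.rng.uniform(0.1, 0.3) for _ in range(self.N_SKILLS)]
        self.steps = 0
        return list(self.mastery), {}

    def step(self, action):
        skill = self.steps % self.N_SKILLS
        # The toy student learns most when difficulty roughly matches
        # current mastery (a crude zone-of-proximal-development stand-in).
        target = action / (self.N_DIFFICULTIES - 1)
        gain = max(0.0, 0.1 - abs(target - self.mastery[skill]))
        self.mastery[skill] = min(1.0, self.mastery[skill] + gain)
        self.steps += 1
        reward = gain
        terminated = all(m > 0.9 for m in self.mastery)
        truncated = self.steps >= 50
        return list(self.mastery), reward, terminated, truncated, {}

# Roll out one episode under a uniform-random baseline policy.
env = ToyTutoringEnv()
obs, info = env.reset(seed=0)
total = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = random.randrange(env.N_DIFFICULTIES)
    obs, reward, terminated, truncated, info = env.step(action)
    total += reward
print(f"episode return under random policy: {total:.3f}")
```

An environment shaped like this drops straight into Stable-Baselines3 once wrapped with proper `observation_space`/`action_space` definitions, which is how the baseline comparisons described below are typically run.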

Responsibilities

  • Design and implement RL-based lesson sequencing policies for the adaptive tutoring engine
  • Build and maintain custom simulation environments modelling student learning trajectories
  • Run controlled experiments comparing RL policies against curriculum baselines
  • Analyse policy behaviour and failure modes and document findings clearly
  • Contribute to our offline evaluation framework for safe policy validation before deployment

Requirements

  • 3–5 years of software engineering with at least 2 years of focused RL work
  • PyTorch for implementing and debugging custom RL algorithms
  • Deep understanding of policy-gradient and actor-critic methods (PPO, A3C, SAC), not just at the API level
  • Experience designing and implementing custom RL environments
  • NumPy for efficient numerical computation in training loops
  • Familiarity with Stable-Baselines3 or similar frameworks for rapid prototyping and baselines
  • Exposure to offline RL or imitation learning is a genuine plus, given our data constraints
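On the "not just at the API level" point: the fluency meant here is being able to write, say, the REINFORCE update directly rather than only calling a library. A minimal NumPy sketch on a toy 3-armed bandit (all reward means invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])  # hypothetical arm reward means
theta = np.zeros(3)                     # softmax policy logits
lr, baseline = 0.1, 0.0

for t in range(2000):
    # Numerically stable softmax over the logits.
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    a = rng.choice(3, p=probs)
    r = rng.normal(true_means[a], 0.1)
    # REINFORCE: grad of log pi(a) w.r.t. logits is one_hot(a) - probs.
    grad_logp = -probs
    grad_logp[a] += 1.0
    baseline += 0.01 * (r - baseline)   # running-mean baseline for variance reduction
    theta += lr * (r - baseline) * grad_logp

best_arm = int(np.argmax(theta))
print("policy favours arm:", best_arm)  # typically the highest-mean arm
```

Being able to derive the `one_hot(a) - probs` gradient by hand, and to recognise when a variance-reduction baseline is or isn't helping, is the level of understanding the bullet above is pointing at.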

Benefits

  • RL applied to something that genuinely improves how people learn
  • Full remote across US and EU time zones
  • $105,000 – $130,000 base salary + equity
  • Research reading time built into the schedule — two hours per week, formally
  • Small team where your experimental results directly inform product direction

Job Type

Full-time

Level

Mid-level

Language

English

Salary Range

$105,000 – $130,000

AI Expertise

AI & Machine Learning Engineers
