Our research group focuses on efficient large language model inference: the problem of making capable models fast and cheap enough to operate in latency-sensitive production environments without unacceptable quality degradation. The work spans speculative decoding, quantisation, knowledge distillation, and architectural modifications that improve inference throughput without the regressions that naïve compression approaches introduce.

We are a team of six researchers and two engineers. Our publication bar is high, and we do not publish for the sake of it. We are currently pursuing an as-yet-unpublished research direction that we believe has significant commercial implications.

We hire researchers who are motivated by hard problems, who write clearly and precisely, and whose experimental work is rigorous enough that a peer can reproduce it from the write-up alone, without requesting additional clarification. If you are in a research role at a university, national lab, or industry group and are ready to work on problems where the output matters beyond citations, we welcome the conversation.
Responsibilities
Conduct original research on LLM inference efficiency with publication-quality rigour
Implement and evaluate novel algorithmic approaches in PyTorch with full reproducibility documentation
Collaborate with the engineering team to translate validated research into production-ready implementations
Produce internal research memos on experimental outcomes — positive and negative — within two weeks of completion
Present research progress at weekly internal seminars and represent the company at external conferences
Requirements
PhD in machine learning, computer science, or a closely related discipline, or equivalent research output
4+ years of research experience with publications at NeurIPS, ICML, ICLR, ACL, or equivalent venues
Deep expertise in at least one of: LLM inference optimisation, model quantisation, knowledge distillation, speculative decoding
PyTorch for research implementation: you can implement novel architectures and training procedures from scratch
Python for experimentation infrastructure, data processing, and rigorous evaluation pipelines
Experimental practice that would satisfy a peer reviewer: careful baselines, ablations, and statistically sound evaluation
Strong technical writing, spanning both research papers and internal technical memos
Benefits
Research problems with direct commercial production implications: your findings get implemented, not filed
Full remote with optional onsite quarters in San Francisco
$150,000 – $190,000 base salary + equity + publication bonus
$5,000 annual research budget covering conferences, compute credits, and tooling
Pre-publication peer review process — your work is internally reviewed before external submission