We build an AI research assistant for universities and research institutions. The system ingests thousands of academic PDFs, extracts and indexes domain knowledge, and answers researcher queries with cited source passages. The core architecture is a retrieval-augmented generation pipeline built on LangChain and LlamaIndex, with a vector store backed by Weaviate. The system is live and has 400 active users.

We need an LLM engineer to extend it — improve retrieval relevance, reduce latency on long documents, handle edge cases around citation attribution, and build evaluation tooling that catches regressions before they reach production. This is maintenance and growth combined. Production experience with RAG systems is required. Experience fine-tuning models with Hugging Face is a meaningful bonus.
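To give candidates a feel for the shape of the system, here is a deliberately minimal sketch of the retrieve-then-cite flow. It is illustrative only: it replaces the real embedding model, Weaviate, and the LLM with an in-memory bag-of-words cosine search, and every name in it (ToyIndex, answer_with_citations, and the sample sources) is hypothetical, not our production code.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyIndex:
    """In-memory stand-in for the vector store (Weaviate in production)."""
    def __init__(self):
        self.passages = []  # list of (passage_text, source_id)

    def add(self, text: str, source: str):
        self.passages.append((text, source))

    def retrieve(self, query: str, k: int = 2):
        # Rank stored passages by similarity to the query vector.
        q = embed(query)
        scored = sorted(self.passages,
                        key=lambda p: cosine(q, embed(p[0])),
                        reverse=True)
        return scored[:k]

def answer_with_citations(index: ToyIndex, query: str):
    # In production an LLM synthesizes an answer from the retrieved
    # passages; here we simply return them with source attribution.
    hits = index.retrieve(query)
    return [{"passage": text, "cited_source": src} for text, src in hits]
```

The real pipeline adds PDF ingestion, chunking, a learned embedding model, and generation, but the retrieval-with-attribution loop above is the part the role centers on.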
Responsibilities
Improve retrieval relevance and answer quality for the academic QA pipeline
Build and maintain an automated evaluation suite for RAG output quality
Reduce end-to-end response latency for queries over long research papers
Handle edge cases around attribution, source overlap, and citation accuracy
Contribute to the technical roadmap for the next generation of the system
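One hypothetical shape for the evaluation suite mentioned above: a regression check that every citation in an answer refers to a passage that was actually retrieved, and that each test case's expected source appears. The answer format, function names, and stub QA functions are all illustrative assumptions, not our existing tooling.

```python
def check_citation_validity(answer: dict, retrieved_sources: set) -> list:
    """Return a list of problems found; an empty list means the answer passes.

    answer: {"text": str, "citations": [source_id, ...]}
    """
    problems = []
    if not answer["citations"]:
        problems.append("no citations")
    for src in answer["citations"]:
        if src not in retrieved_sources:
            # A cited source that was never retrieved is a likely hallucination.
            problems.append(f"hallucinated citation: {src}")
    return problems

def run_regression_suite(cases, qa_fn):
    """cases: list of (query, expected_source_id) pairs.
    qa_fn(query) returns (answer_dict, set_of_retrieved_source_ids)."""
    failures = []
    for query, expected in cases:
        answer, retrieved = qa_fn(query)
        problems = check_citation_validity(answer, retrieved)
        if expected not in answer["citations"]:
            problems.append(f"missing expected source: {expected}")
        if problems:
            failures.append((query, problems))
    return failures
```

A suite like this runs in CI against a fixed set of researcher queries, so attribution regressions surface before a deploy rather than in a user's citation list.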
Requirements
3–5 years of software engineering experience, with 2+ years focused on LLM applications
Hands-on production experience with RAG pipelines — not tutorial projects
Deep familiarity with LangChain and LlamaIndex for orchestration
Strong Python — you write maintainable, testable, well-documented code
Experience calling and evaluating multiple LLM providers via the OpenAI API and equivalents
Familiarity with Hugging Face for model evaluation and fine-tuning is strongly preferred
Benefits
Work used daily by real researchers at leading universities