We will not pretend this is an easy role. We process legal and financial documents in 9 languages. Our current extraction pipeline uses a mix of regex, older Hugging Face models, and a lot of post-processing code that has grown organically over two years. It works, but not well enough for the enterprise clients we are trying to close. We need an NLP engineer who has dealt with messy, real-world document pipelines and knows how to improve them systematically — not someone who will arrive with one new model and assume it fixes everything.
Responsibilities
Audit the existing extraction pipeline and document current failure modes
Fine-tune or replace extraction models where gaps are clearly identified
Build automated evaluation datasets across document types and languages
Improve post-processing logic to handle edge cases more robustly
Work with the product team to define acceptable accuracy thresholds per use case
Requirements
3+ years of NLP engineering experience with document processing systems in production
Strong Python and Hugging Face Transformers skills
Experience with named entity recognition, document classification, or information extraction
Able to evaluate model performance per entity type and per language
Comfortable working with legacy codebases: improves what exists rather than throwing everything away
Benefits
Technically challenging, commercially relevant NLP work