Machine Learning Engineer, LLM Evals & Agent Systems
London, Hybrid
Clapham Junction, 4 days onsite initially
We’re working with an early-stage AI company building production-grade agentic systems and workflow automation products. They are now hiring an AI Engineer to take ownership of their evaluation infrastructure and help shape the direction of their AI capability.
This is not a pure research role or a prompt engineering position. The focus is on production AI systems, eval frameworks, agent orchestration, and engineering reliability.
You will work directly with founders and engineers to evolve their existing V1 eval framework into a scalable, production-ready V2 system integrated into deployment workflows and engineering pipelines.
The environment is fast moving, highly technical, and suited to engineers who enjoy ownership, ambiguity, and building systems end to end.
What You’ll Work On
- Designing and evolving LLM evaluation frameworks for production systems
- Building eval infrastructure directly into deployment and engineering pipelines
- Improving agent reliability, reasoning quality, and orchestration logic
- Defining prompting strategies, sub-agent interactions, and reasoning trade-offs
- Making architectural decisions around latency, reasoning depth, performance, and reliability
- Working closely with founders on product and technical direction
- Helping shape the long-term AI engineering function as the company scales
What They’re Looking For
- 2 to 5 years of backend or software engineering experience
- 1 to 2 years of hands-on AI engineering experience in production environments
- Experience deploying LLM applications or agentic systems into production
- Strong engineering fundamentals across APIs, backend systems, infrastructure, and architecture
- Experience designing evals, benchmarking systems, or AI testing workflows
- Ability to translate business requirements into measurable evaluation frameworks
- Comfortable discussing production failures, trade-offs, and engineering decisions
- Strong ownership mentality and ability to operate in fast-moving environments
Nice To Have
- Experience from start-ups or YC-style environments
- Exposure to multi-agent systems or orchestration frameworks
- Experience integrating eval tooling into CI/CD or deployment systems
- Customer-facing or stakeholder-facing exposure
- Side projects or experimentation around evals, benchmarking, or agent systems
Package
- 0.5% equity
- 30 days holiday plus bank holidays
- Hybrid working in Clapham Junction
- Opportunity to grow into a future Head of AI position as the company scales
Interview Process
- Introductory call
- Founder meeting
- Technical architecture and eval discussion
- Scenario-based onsite session