Machine Learning Engineer, LLM Evals & Agent Systems
London, Hybrid
Clapham Junction, 4 days onsite initially
We’re working with an early-stage AI company building production-grade agentic systems and workflow automation products. They are now hiring an AI Engineer to take ownership of their evaluation infrastructure and help shape the direction of their AI capability.
This is not a pure research role or a prompt engineering position. The focus is on production AI systems, eval frameworks, agent orchestration, and engineering reliability.
You will work directly with founders and engineers to evolve their existing V1 eval framework into a scalable, production-ready V2 system integrated into deployment workflows and engineering pipelines.
The environment is fast moving, highly technical, and suited to engineers who enjoy ownership, ambiguity, and building systems end to end.
What You’ll Work On
- Designing and evolving LLM evaluation frameworks for production systems
- Building eval infrastructure directly into deployment and engineering pipelines
- Improving agent reliability, reasoning quality, and orchestration logic
- Defining prompting strategies, sub-agent interactions, and reasoning trade-offs
- Making architectural decisions around latency, reasoning depth, performance, and reliability
- Working closely with founders on product and technical direction
- Helping shape the long-term AI engineering function as the company scales
What They’re Looking For
- 2 to 5 years of backend or software engineering experience
- 1 to 2 years of hands-on AI engineering experience in production environments
- Experience deploying LLM applications or agentic systems into production
- Strong engineering fundamentals across APIs, backend systems, infrastructure, and architecture
- Experience designing evals, benchmarking systems, or AI testing workflows
- Ability to translate business requirements into measurable evaluation frameworks
- Comfortable discussing production failures, trade-offs, and engineering decisions
- Strong ownership mentality and ability to operate in fast-moving environments
Nice To Have
- Experience from start-ups or YC-style environments
- Exposure to multi-agent systems or orchestration frameworks
- Experience integrating eval tooling into CI/CD or deployment systems
- Customer-facing or stakeholder-facing exposure
- Side projects or experimentation around evals, benchmarking, or agent systems
Package
- 0.5% equity
- 30 days holiday plus bank holidays
- Hybrid working in Clapham Junction
- Opportunity to grow into a future Head of AI position as the company scales
Interview Process
- Introductory call
- Founder meeting
- Technical architecture and eval discussion
- Scenario-based onsite session