2026-03-07 · Petr Kindlmann
LLM-as-Judge: How to Score AI Agent Responses Automatically
Traditional metrics can't fully capture the quality of an AI agent's responses. LLM-as-judge uses one model to score another model's output against human-written criteria. Learn how to write effective judge criteria, set thresholds, and combine judge assertions with deterministic checks in KindLM.