Robust evaluation of LLM summarisation is still an active research area [5]. But librarians don’t need perfect metrics to run useful, repeatable checks. Here are five practical “sanity tests” for each category: lightweight enough to do routinely, but strong enough to surface the common failure modes.