Tagged #agents

A small eval loop for the humanizer skill

May 25, 2026

A case study in using Caliper to evaluate blader/humanizer, tighten voice calibration, and turn the improvement into an upstream contribution with regression coverage.

agents evaluation skills writing

Skills without evals are just optimism

May 9, 2026

Tribal knowledge encoded as an AI skill is still just text until you evaluate it. Ablation baselines, routing regression tests, trajectory autoraters, and the gotchas flywheel keep encoded knowledge from rotting.

agents evaluation skills

Why traditional access control fails with agents

Feb 22, 2026

Everyone is worried about AI reading things it shouldn't. That's the wrong threat model. The problem starts after the agent reads.

ai-safety agents security access-control