Notes from building fewwords.
Arguments I’ll commit to in writing under my own name. Receipts, benchmark numbers, research reading notes, and the things the pitch doesn’t say loudly enough. Short list, slowly added to.
Fourteen postmortems, fourteen YAMLs.
Fourteen real production-agent incidents reconstructed from public postmortems. Each one caught by between one and fourteen lines of YAML. Here are five.
$8.55 to run Claude Sonnet 4.6 as a trajectory judge. Here are the numbers.
One week. 1,980 real agent traces from τ-bench. A reconciled reference built from Opus 4.7 plus a human pass. Here is what Sonnet 4.6 scored as a judge, and what happened when I ran the same traces through fewwords.
Four 2026 papers proved deterministic trajectory verification works. None of them ships.
ToolGate. Solver-Aided Agent Verification. TraceSafe. Verifiably Safe Agents. Four groups converged on the same abstraction in the same year. None of them is a product you can pip install on a Tuesday. That is the gap we built into.
What my benchmark doesn’t measure (yet).
The honest answer to the question every careful reader asks. My drift detection today is check-fire-rate, not learned trajectory shape. Here is the exact gap, what I am doing about it, and why I am saying it in public.
Why YAML. An engineering argument against DSLs, GUIs, and AI-writes-AI.
I tried to build fewwords with a proper DSL. Every platform engineer I showed it to asked the same question. I do not have a rebuttal anymore. Here is why.