Evals, AI experiments, and side projects — from someone who's been in software long enough to know it works, and curious enough to keep breaking it.

Blog

Thinking out loud about gen AI evals: what works, what doesn't, and why it matters.

Projects

Side projects built for fun, curiosity, or just because something seemed possible.

About

20+ years in software. Currently obsessed with making AI actually reliable.